by Bob Thomas
We’ve recently encountered situations where a customer had data that was too complex for a SQL database, so they attempted to utilize a graph database (specifically Neo4j) and developed custom software on top…but those efforts failed due to programming issues, performance problems, or functionality limitations. These customers then selected DataWalk, effectively as a Neo4j alternative (or more generally, as a graph database alternative).
Based on these experiences, in this article we’ll briefly summarize where DataWalk is a good fit as a Neo4j alternative.
Let’s start by comparing and contrasting some of the key capabilities of DataWalk vs. graph databases:
Scriptable Query Languages are available with graph databases, such as Cypher for Neo4j. In contrast, DataWalk is an application, not a database. DataWalk users can do the equivalent of Cypher queries via an intuitive visual interface, by filtering and traversing data sets and their connections. These queries can also be executed via an API, which brings DataWalk close to what a scriptable language offers.
Connected Data is another key consideration. For example, graph databases are fast if you just need to identify connections for a single object. However, they often degrade when identifying connections between populations (e.g., “find all red Toyota cars less than 5 years old, registered in a specific ZIP code to males aged 18-25”). If you need to identify connections between populations, then DataWalk consistently delivers better performance and results, and serves very well as a Neo4j alternative or a graph database alternative.
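To make the "connections between populations" idea concrete, here is a minimal Python sketch of the example query. It is only an illustration of the concept and does not represent how DataWalk or any graph database executes such a query. The data, field names, and the `connected_pairs` function are all hypothetical.

```python
# Hypothetical toy data; every identifier and value is invented for illustration.
people = {
    "p1": {"sex": "M", "age": 22},
    "p2": {"sex": "F", "age": 24},
    "p3": {"sex": "M", "age": 40},
}
cars = {
    "c1": {"make": "Toyota", "color": "red", "age_years": 3, "zip": "94301"},
    "c2": {"make": "Toyota", "color": "red", "age_years": 7, "zip": "94301"},
    "c3": {"make": "Honda", "color": "red", "age_years": 2, "zip": "94301"},
    "c4": {"make": "Toyota", "color": "red", "age_years": 1, "zip": "94301"},
}
# Each link is (car_id, person_id): "this car is registered to this person".
registrations = [("c1", "p1"), ("c2", "p1"), ("c3", "p2"), ("c4", "p3")]


def connected_pairs(cars, people, registrations, zip_code):
    """Return (car, person) links where both ends fall in the filtered populations."""
    # Population 1: red Toyotas under 5 years old in the given ZIP code.
    car_population = {
        car_id
        for car_id, car in cars.items()
        if car["make"] == "Toyota"
        and car["color"] == "red"
        and car["age_years"] < 5
        and car["zip"] == zip_code
    }
    # Population 2: males aged 18-25.
    person_population = {
        person_id
        for person_id, person in people.items()
        if person["sex"] == "M" and 18 <= person["age"] <= 25
    }
    # Keep only the links that connect the two populations.
    return sorted(
        (car_id, person_id)
        for car_id, person_id in registrations
        if car_id in car_population and person_id in person_population
    )


matches = connected_pairs(cars, people, registrations, "94301")
print(matches)
```

The point of the sketch is that both sides of the connection are whole filtered sets rather than a single starting object, which is what distinguishes this kind of query from a single-object lookup.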
External Data Sources can easily be configured in DataWalk, enabling you to add and connect external data with your other available data. Utilizing external data sources with a graph database requires that you write (or utilize) additional software which reaches out to those sources and then loads the results into the database. If external data sources are important to you, then DataWalk is an excellent Neo4j alternative (or graph database alternative).
Graph Algorithms such as finding paths, or finding shortest paths, may become impractically slow on some graph databases for any more than a few hops. DataWalk, on the other hand, can quickly find all paths, or find the shortest path, across many hops. In addition, there are other algorithms that DataWalk can execute that are not necessarily available in Neo4j (without deploying additional tools like Spark).
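For readers unfamiliar with path finding, the following is a minimal Python sketch of a shortest-path search using breadth-first search on an unweighted graph. It is a textbook technique shown for illustration only, not the algorithm used by DataWalk or Neo4j. The graph and the `shortest_path` function are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Breadth-first search; returns the node list of a fewest-hops path, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbor] = node
            if neighbor == goal:
                # Walk the back-pointers from goal to start, then reverse.
                path = [goal]
                while previous[path[-1]] is not None:
                    path.append(previous[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None


# Hypothetical adjacency list: each key maps to its directly connected nodes.
graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}

path = shortest_path(graph, "A", "E")
print(path, "hops:", len(path) - 1)
```

Each additional hop can multiply the number of nodes the search has to visit, which is why hop count tends to dominate the cost of path queries on densely connected data.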
Dirty Data is problematic for graph databases, such that an ETL tool and/or other processes must be implemented in order to ensure clean data and reliable results. In contrast, DataWalk accepts the raw data “as-is” and can easily transform data elements without requiring additional computational resources, while maintaining the lineage and complete audit history of the data.
Data Discovery is simple and intuitive in DataWalk, where analysts easily traverse and filter data on the fly and can instantly go back multiple steps to change the analysis path (i.e., change the visual query sequence). With a graph database, changing a path requires writing a new query and waiting again for the results. DataWalk is more efficient and offers a more practical and maintainable platform than traditional graph query systems.
Data lineage is built into DataWalk. With a graph database, maintaining lineage requires a custom application to be written on top of the database.
Scoring is easily implemented and quickly calculated in DataWalk, but is not very practical with graph databases. For a graph database environment, you can think of scoring as taking hundreds of Cypher queries, weighting each one, and combining the results into a single number for each node. This logic would have to be written on top of the graph database, and would be very cumbersome to maintain.
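The weighted-query view of scoring can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: each rule is a plain Python predicate standing in for one query, and the node data, rule thresholds, weights, and the `score_nodes` function are all hypothetical. It does not reflect DataWalk's actual scoring implementation.

```python
def score_nodes(nodes, weighted_rules):
    """Sum the weights of every rule that matches each node."""
    return {
        node_id: sum(weight for rule, weight in weighted_rules if rule(attrs))
        for node_id, attrs in nodes.items()
    }


# Hypothetical nodes and attributes.
nodes = {
    "acct-1": {"prior_flags": 2, "linked_accounts": 12, "country": "US"},
    "acct-2": {"prior_flags": 0, "linked_accounts": 3, "country": "US"},
}

# Each (predicate, weight) pair stands in for one weighted query.
weighted_rules = [
    (lambda a: a["prior_flags"] > 0, 5),
    (lambda a: a["linked_accounts"] > 10, 3),
    (lambda a: a["country"] != "US", 2),
]

scores = score_nodes(nodes, weighted_rules)
print(scores)
```

With three rules this is trivial, but with hundreds of rules that each require a graph traversal, keeping the rule set, the weights, and the execution logic in sync becomes the maintenance burden described above.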
Binary unstructured content, such as images, PDFs, DOC files, etc., is generally impractical to store in graph databases, but can be stored and linked in DataWalk.
Considering the above, we can now summarize when DataWalk is, and is not, a Neo4j alternative (or more broadly, a graph database alternative):
DataWalk is a viable Neo4j alternative (or graph database alternative) if you have complex data/ontologies and you do not:
• Require a truly scriptable query language
• Need or want to write your own software
• Require open source software with zero license cost
DataWalk then becomes a compelling Neo4j alternative (or graph database alternative) if any of the following are also true:
• You cannot provide clean input data (e.g., via ETL)
• You need to do data discovery, e.g., instantly modify your analysis/queries in flight
• You want to incorporate external data from remote sources
• You require “built in” data lineage (i.e., you don’t want to write custom software for this)
• You need to identify connections between populations
• You need to find paths between objects more than three hops away
• You need to create scores
• You need to store binary unstructured content (e.g., images, PDF files, DOC files)
• You need to create flow diagrams
Neo4j and other graph databases are very good for a number of use cases. We hold them in high regard, and DataWalk is not always a graph database alternative. However, there are some cases where DataWalk can indeed serve as a compelling Neo4j alternative, or more broadly, a graph database alternative.