DataWalk: Different Than Other Neo4j Alternatives
Blog article
by Bob Thomas

The best of the Neo4j alternatives for analyzing complex multi-source data without learning a new query language or writing your own software.

We sometimes encounter customers who need to analyze highly connected data that is too complex for a SQL database, so they tried a graph database such as Neo4j or TigerGraph®. They developed custom software on top of it to address their application needs, but those efforts ultimately failed because of programming issues, performance problems, or functional limitations. Several of these customers then selected DataWalk, effectively as a Neo4j alternative or TigerGraph alternative.

Based on these experiences, in this article we’ll briefly summarize where DataWalk is a good fit as a Neo4j alternative or TigerGraph alternative.

We’ll start by comparing and contrasting some of the key capabilities of DataWalk vs. graph databases.

Both graph databases and DataWalk are applicable to data sets where connections and relationships are key. But graph databases are, first and foremost, databases, while DataWalk is a full-stack data analysis platform for connecting and analyzing complex multi-source data.

Graph databases have scriptable query languages, such as Cypher for Neo4j, and require programming. In contrast, DataWalk is a data analysis application, not a database, and does not require programming. DataWalk users can perform the equivalent of Cypher queries through an intuitive visual interface called the Universe Viewer: by simply traversing and filtering data sets and connections, they can build complex queries that complete quickly. DataWalk queries can also be executed via an API, so DataWalk is not far from offering a scriptable language. The Universe Viewer also models all of an enterprise's data, organized not by source, application, or silo, but in relevant business terms such as customers and transactions.

The nature of the connected data is another key consideration. For example, graph databases are fast if you only need to identify connections for a single object. However, their performance may degrade dramatically when identifying connections between populations, i.e., sets of objects defined by filter criteria (e.g., “all red Toyota cars less than 5 years old, registered in a specific ZIP code to males aged 18-25”). If you need to identify connections between populations, DataWalk consistently delivers superior performance and results, and serves very well as a Neo4j alternative or TigerGraph alternative.
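To make the idea of "connections between populations" more concrete, here is a minimal Python sketch over toy in-memory data. It is an illustration of the concept only, not how DataWalk or any graph database works internally. The `Person` and `Car` records, their fields, and the use of a shared `address_id` as the linking attribute are all hypothetical. The first function selects a whole population with filters; the second links two populations in one set-based pass (a hash join) rather than starting a separate traversal from every object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    person_id: int
    sex: str
    age: int
    address_id: int  # hypothetical attribute used here to link people


@dataclass(frozen=True)
class Car:
    car_id: int
    make: str
    color: str
    age_years: int
    zip_code: str
    owner_id: int


def select_car_population(cars, people_by_id, zip_code):
    """Population A: red Toyotas under 5 years old in zip_code, owned by males aged 18-25."""
    return [
        c for c in cars
        if c.make == "Toyota" and c.color == "red" and c.age_years < 5
        and c.zip_code == zip_code
        and people_by_id[c.owner_id].sex == "M"
        and 18 <= people_by_id[c.owner_id].age <= 25
    ]


def connect_populations(owner_ids, population_b, people_by_id):
    """Return (owner_id, person_id) pairs where an owner from population A
    shares an address with a different person in population B."""
    # Index population B once by the linking attribute (a hash join),
    # instead of starting a separate traversal from every object.
    by_address = {}
    for person in population_b:
        by_address.setdefault(person.address_id, set()).add(person.person_id)
    links = set()
    for owner_id in owner_ids:
        address = people_by_id[owner_id].address_id
        for other_id in by_address.get(address, ()):
            if other_id != owner_id:
                links.add((owner_id, other_id))
    return links


# Example usage with toy data.
people = [
    Person(1, "M", 22, 100),
    Person(2, "F", 40, 100),
    Person(3, "M", 30, 200),
    Person(4, "M", 19, 300),
]
people_by_id = {p.person_id: p for p in people}
cars = [
    Car(10, "Toyota", "red", 3, "12345", 1),
    Car(11, "Toyota", "red", 7, "12345", 4),
    Car(12, "Honda", "red", 2, "12345", 4),
    Car(13, "Toyota", "red", 1, "12345", 3),
]
population_a = select_car_population(cars, people_by_id, "12345")
population_b = [p for p in people if p.age >= 30]
links = connect_populations({c.owner_id for c in population_a}, population_b, people_by_id)
print(links)  # {(1, 2)}
```

Only car 10 satisfies every filter, and its owner (person 1) shares address 100 with person 2, who belongs to the second population.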

The ability to integrate external data sources is another key comparison point. Using external data sources with a graph database requires you to write (or use) additional software that retrieves data from those sources and loads it into the database. With DataWalk, you can easily add external data and connect it with your other data. If external data sources are important to you, DataWalk is an excellent Neo4j alternative or TigerGraph alternative.

Graph algorithms such as path finding and shortest-path search may become impractically slow on some graph databases beyond a few hops. In contrast, DataWalk benchmark results show that, in some scenarios, DataWalk can find all paths, or the shortest path, across many hops much faster than a top-tier graph database. In addition, DataWalk can execute other algorithms that are not necessarily available in Neo4j without deploying additional tools such as Spark.
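A hop is a single edge traversal. As a rough illustration of what a hop-limited shortest-path search does, here is a plain breadth-first search in Python over a hypothetical adjacency-list graph; the function name `shortest_path` and the `max_hops` parameter are illustrative assumptions, and this is not the algorithm DataWalk or any particular graph database uses. In densely connected data, the number of nodes reached can grow quickly with each additional hop, which is one reason multi-hop path queries can become expensive.

```python
from collections import deque


def shortest_path(graph, start, goal, max_hops):
    """Return the shortest start-to-goal path (list of nodes) using at most
    max_hops edges, or None if no such path exists."""
    if start == goal:
        return [start]
    parent = {start: None}  # also serves as the visited set
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor in parent:
                continue
            parent[neighbor] = node
            if neighbor == goal:
                # Walk parent links back to start, then reverse.
                path = [goal]
                while parent[path[-1]] is not None:
                    path.append(parent[path[-1]])
                return path[::-1]
            frontier.append((neighbor, depth + 1))
    return None


graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(graph, "A", "E", max_hops=5))  # ['A', 'B', 'D', 'E']
print(shortest_path(graph, "A", "E", max_hops=2))  # None
```

Because breadth-first search explores nodes in order of hop count, the first time it reaches the goal it has found a path with the fewest hops.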

Dirty data is problematic for graph databases: an ETL tool and/or other processes must be implemented to ensure clean data and reliable results. In contrast, DataWalk accepts your data “as-is” and can easily transform data elements without consuming additional computational resources, while maintaining the lineage and complete audit history of the data.

Data discovery is simple and intuitive in DataWalk: analysts can traverse and filter data on the fly and instantly go back multiple steps to change their analysis path (i.e., change the visual query sequence). With a graph database, changing a path requires writing a new query and waiting again for the results. DataWalk is more efficient and offers a more practical and maintainable platform than traditional graph query systems.

Data lineage is built into DataWalk. With a graph database, maintaining lineage requires writing a custom application on top of the database.

Scoring is easy to implement and quick to calculate in DataWalk, but is not very practical with graph databases. In a graph database environment, you can think of scoring as taking hundreds of Cypher queries, assigning a weight to each, and then executing them to generate a single number. This logic would have to be written on top of the graph database and would be very cumbersome to maintain.
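The scoring idea described above, many individual checks each contributing a weighted signal that is combined into one number, can be sketched as a weighted sum. In this minimal Python illustration each rule is a simple predicate on one record; in the graph database setting, each rule would instead correspond to a separate Cypher query. The rule names, weights, and record fields are hypothetical and do not represent DataWalk's scoring feature or syntax.

```python
# Each rule: (name, weight, predicate). Names, weights, and fields are hypothetical.
RULES = [
    ("linked_to_flagged_entity", 40.0, lambda r: r["flagged_links"] > 0),
    ("many_accounts", 25.0, lambda r: r["account_count"] >= 5),
    ("recent_address_change", 10.0, lambda r: r["days_since_address_change"] < 30),
]


def score(record, rules=RULES):
    """Add up the weights of all matching rules; also return which rules fired."""
    total = 0.0
    fired = []
    for name, weight, predicate in rules:
        if predicate(record):
            total += weight
            fired.append(name)
    return total, fired


record = {"flagged_links": 2, "account_count": 3, "days_since_address_change": 12}
print(score(record))  # (50.0, ['linked_to_flagged_entity', 'recent_address_change'])
```

Returning the list of fired rules alongside the total keeps each score explainable, which matters once the number of weighted rules grows into the hundreds.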

Binary unstructured content, such as images, PDFs, and DOC files, is not practical to store in graph databases, but can be stored and linked in DataWalk.

Considering the above, we can now summarize when DataWalk is, and is not, a Neo4j alternative, TigerGraph alternative, or more broadly, a graph database alternative:

DataWalk is a viable graph database alternative if you have complex data/ontologies and you do not:

  • Require a truly scriptable query language
  • Need or want to write your own software
  • Require open source software with zero license cost

DataWalk then becomes a compelling Neo4j alternative if any of the following are also true:

  • You need to identify connections between populations
  • You need to find paths between objects more than three hops away
  • You cannot provide clean input data (e.g., via ETL)
  • You need to create flow diagrams
  • You need to do data discovery, e.g., instantly modify your analysis/queries in flight
  • You want to incorporate external data from remote sources
  • You require “built-in” data lineage (i.e., you don’t want to write custom software for this)
  • You need to create scores
  • You need to store binary unstructured content (e.g., images, PDF files, DOC files)

Neo4j, TigerGraph, and other graph databases are very good for a number of use cases. We hold them in high regard, and DataWalk is not always an appropriate graph database alternative. However, there are some cases where DataWalk can indeed serve as a compelling graph database alternative.

---

Neo4j is a registered trademark of Neo4j Inc., and TigerGraph is a registered trademark of TigerGraph, Inc.