DataWalk Graph Analytics for Big Data
Blog article
by Bob Thomas

DataWalk delivers superior performance for graph analytics for big data, in a full-stack software platform for data analysis.

 
 
 

Graph analytics (or graph algorithms) are algorithms that identify patterns and relationships between objects. There are several types of graph algorithms, including:

  • Path recognition, such as finding the shortest path between two objects.
  • Centralities, such as finding and visualizing the nodes that have the most connections, that act as bridges between networks of objects, or that represent the people with the greatest influence.
  • Graph characteristics, such as graph density, community detection via clustering, and so forth.
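
To make two of these categories concrete, here is a minimal sketch in plain Python of degree centrality (which node has the most connections) and graph density. It is a generic textbook illustration, not DataWalk code; the function names and the sample people are hypothetical.

```python
from collections import defaultdict

def build_graph(edges):
    """Build an undirected adjacency map from (a, b) pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)

def degree_centrality(graph):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def density(graph):
    """Ratio of existing edges to possible edges in an undirected graph."""
    n = len(graph)
    if n < 2:
        return 0.0
    # Each undirected edge appears in two adjacency sets, hence the division by 2.
    edge_count = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edge_count / (n * (n - 1) / 2)

# Hypothetical sample data: people linked by phone calls.
edges = [("Ann", "Bob"), ("Ann", "Cara"), ("Ann", "Dev"), ("Cara", "Dev")]
graph = build_graph(edges)
centrality = degree_centrality(graph)
most_connected = max(centrality, key=centrality.get)
graph_density = density(graph)
```

In this sample, Ann is connected to every other person, so she has the highest degree centrality. Other centrality measures, such as betweenness for finding "bridge" nodes, follow the same idea but need more involved calculations.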

These graph algorithms can be very useful in a variety of applications. For example, in law enforcement the “find path” graph algorithm can instantly identify the possible connections between two people, which might be a phone call, a particular address, participation in the same crime, or something else. There are numerous other applications and use cases for graph analytics.
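
As an illustration of the "find path" idea, the sketch below uses a breadth-first search to return the shortest chain of connections between two people, along with the type of each connection. This is a generic approach under assumed data, not a description of how DataWalk implements it; the `find_path` function, the link types, and the records are all hypothetical.

```python
from collections import deque

def find_path(links, start, goal):
    """Return the shortest chain of (node, link_type, node) hops, or None."""
    # Build an undirected adjacency map: node -> list of (neighbor, link_type).
    adjacency = {}
    for a, link_type, b in links:
        adjacency.setdefault(a, []).append((b, link_type))
        adjacency.setdefault(b, []).append((a, link_type))
    if start == goal:
        return []
    previous = {start: None}  # node -> (parent, link_type) used to reach it
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor, link_type in adjacency.get(node, []):
            if neighbor in previous:
                continue  # already reached by a path that is at least as short
            previous[neighbor] = (node, link_type)
            if neighbor == goal:
                # Walk back from the goal to the start to rebuild the hops.
                hops = []
                current = goal
                while previous[current] is not None:
                    parent, via = previous[current]
                    hops.append((parent, via, current))
                    current = parent
                return hops[::-1]
            queue.append(neighbor)
    return None

# Hypothetical sample data: people, a phone, an address, and a case.
links = [
    ("Person A", "phone call", "Phone 555-0100"),
    ("Phone 555-0100", "phone call", "Person C"),
    ("Person C", "lives at", "12 Elm St"),
    ("Person B", "lives at", "12 Elm St"),
    ("Person A", "suspect in", "Case 42"),
]
path = find_path(links, "Person A", "Person B")
```

Here `path` links Person A to Person B through a shared phone, an intermediary person, and a shared address, which is the kind of indirect connection an investigator would want surfaced.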

Executing graph analytics on big data is a challenge. Unique DataWalk technology enables exceptional execution speed for graph analytics across very large volumes of data, particularly when the knowledge graph is larger than the available memory (RAM), for the following reasons:

  • DataWalk can do both graph algorithms and OLAP on a single instance of data, without any data movement, under a single permissions structure.
  • Connections between data sets (or more specifically between all data elements in two data sets) are persisted (stored) in DataWalk.
  • The DataWalk engine is optimized to operate on multiple objects at once, which is crucial for querying and pattern recognition.

Unlike other available systems that provide graph analytics for big data, DataWalk can perform some unique extensions of traditional graph algorithms. For example, DataWalk can not only calculate graph algorithms such as “shortest path” between two objects, but can also identify and visualize “all paths” between one object and multiple other objects in a single operation. This is a unique DataWalk capability for graph analytics, and can be a huge time-saver.
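
To show what "all paths between one object and multiple other objects" means, here is a small sketch that collects every simple path (no repeated node) from one source to several targets in a single depth-first traversal, bounded by a hop limit. It illustrates the concept only and says nothing about how DataWalk computes this at scale; the `all_paths` function, the `max_hops` parameter, and the sample graph are hypothetical.

```python
def all_paths(graph, source, targets, max_hops):
    """Collect every simple path from source to each target, up to max_hops."""
    targets = set(targets)
    found = {target: [] for target in targets}
    path = [source]
    on_path = {source}

    def walk(node):
        # Record the current path if it ends on a target.
        if node in targets and len(path) > 1:
            found[node].append(list(path))
        # Stop extending once the hop limit is reached.
        if len(path) - 1 == max_hops:
            return
        for neighbor in sorted(graph.get(node, ())):
            if neighbor in on_path:
                continue  # keep paths simple: never revisit a node
            path.append(neighbor)
            on_path.add(neighbor)
            walk(neighbor)
            on_path.remove(neighbor)
            path.pop()

    walk(source)
    return found

# Hypothetical sample graph as an adjacency map.
graph = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C", "E"},
    "E": {"D"},
}
result = all_paths(graph, "A", ["D", "E"], max_hops=4)
```

One traversal serves all targets because a path that reaches one target keeps extending toward the others. The number of simple paths can grow very quickly with graph size, so a naive enumeration like this one is only practical on small graphs or with a tight hop limit.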

DataWalk can also calculate long paths, i.e., paths between objects connected by many hops. While other solutions may struggle to identify paths over just three hops, DataWalk can efficiently identify paths over many hops.

Note that DataWalk is a full-stack system for data analysis. The back end of the DataWalk system can be thought of as a scale-out graph and relational database hybrid. DataWalk enables you to connect all of your data in a simple visual model organized not around data sources or applications, but instead reorganized around simple relevant entities such as people, phones, vehicles, crimes, and/or anything else. Data organized in this model, called the Universe Viewer, can be visually queried simply by traversing and filtering data sets and their connections. This enables you to easily identify the data for which you want to execute graph algorithms.

 
 
 
 

DataWalk enables users to establish multiple links between nodes, based on rules that can be specified and then automatically maintained in the system. This expands the power of graph analytics in DataWalk and accelerates the execution of graph algorithms. For example, suppose you have two sets of people that you want to connect. In DataWalk you can easily connect people in these sets via a unique identifier, add a second connection based on phone number, a third based on a combination of attributes such as date of birth and address, and a fourth based on physical proximity of home addresses within a specified distance.
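
The four-connection example above can be sketched as a set of named rules applied to every pair of records from the two sets. This is only an illustration of rule-based linking, not DataWalk's rule engine or syntax; the rule names, record fields, and the planar `home_xy_km` coordinates are all hypothetical.

```python
import math
from itertools import product

def within_km(limit_km):
    """Build a rule that links two people whose homes are within limit_km."""
    def rule(a, b):
        (ax, ay), (bx, by) = a["home_xy_km"], b["home_xy_km"]
        return math.hypot(ax - bx, ay - by) <= limit_km
    return rule

# Each rule is a named predicate over a pair of records (hypothetical names).
RULES = {
    "same_id": lambda a, b: a["national_id"] == b["national_id"],
    "same_phone": lambda a, b: a["phone"] == b["phone"],
    "same_dob_and_address": lambda a, b: (
        (a["dob"], a["address"]) == (b["dob"], b["address"])
    ),
    "lives_within_1km": within_km(1.0),
}

def build_links(set_a, set_b, rules):
    """Return (name_a, name_b, rule_name) for every rule a pair satisfies."""
    links = []
    for a, b in product(set_a, set_b):
        for rule_name, rule in rules.items():
            if rule(a, b):
                links.append((a["name"], b["name"], rule_name))
    return links

# Hypothetical sample records for the two sets of people.
set_a = [
    {"name": "A1", "national_id": "111", "phone": "555-0100",
     "dob": "1980-01-01", "address": "12 Elm St", "home_xy_km": (0.0, 0.0)},
]
set_b = [
    {"name": "B1", "national_id": "111", "phone": "555-0199",
     "dob": "1980-01-01", "address": "12 Elm St", "home_xy_km": (0.0, 0.0)},
    {"name": "B2", "national_id": "222", "phone": "555-0100",
     "dob": "1975-05-05", "address": "9 Oak Ave", "home_xy_km": (0.6, 0.5)},
    {"name": "B3", "national_id": "333", "phone": "555-0142",
     "dob": "1990-09-09", "address": "3 Pine Rd", "home_xy_km": (5.0, 5.0)},
]
links = build_links(set_a, set_b, RULES)
```

Because one pair of people can satisfy several rules, the result can hold multiple links between the same two nodes, each labeled with the rule that produced it. Comparing every pair is quadratic, so a real system would need indexing or persisted connections rather than this brute-force loop.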

Compared to even the fastest graph databases, DataWalk can deliver dramatically better performance with graph analytics for big data. The fastest graph databases tend to be in-memory systems, so they can deliver great performance for graph analytics as long as the graph fits into available memory (RAM). In contrast, DataWalk uses a scale-out approach that enables exceptional performance for graph algorithms across vast amounts of data.

Note that a graph analytics comparison of DataWalk vs. graph databases is a bit like comparing apples and oranges. Graph databases are, well, databases, while DataWalk is a data repository and full-stack platform for data analysis. Exceptional capabilities for executing graph analytics are just one of many powerful features of the DataWalk platform.