
by Bob Thomas

DataWalk is a full-stack software platform for data analysis. Its unique technology enables superior performance for graph analytics and graph algorithms.

First, a definition: graph analytics (also called graph algorithms) are analytic algorithms used to identify patterns and relationships between objects in a network graph. A network graph is also known as a link chart.

There are several types of graph analytics or graph algorithms, including:

- Path recognition, which includes things like finding the shortest path between two objects.
- Centralities, such as finding and visualizing the nodes with the most connections, the nodes that act as bridges between networks of objects, or the people with the greatest influence.
- Graph characteristics, such as graph density, community detection via clustering, and so forth.
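To make these categories concrete, here is a minimal sketch in plain Python. It is only an illustration of the general techniques on a toy link chart, not DataWalk's implementation; the `GRAPH` data and the function names are assumptions made up for this example.

```python
from collections import deque

# Hypothetical toy link chart: an undirected graph stored as an adjacency map.
GRAPH = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Bob", "Carol", "Erin"},
    "Erin": {"Dave"},
}


def shortest_path(graph, start, goal):
    """Path recognition: breadth-first search for one shortest path (or None)."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor chain back to the start, then reverse it.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality(graph):
    """Centrality: share of the other nodes each node is directly linked to."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}


def density(graph):
    """Graph characteristic: existing links divided by all possible links."""
    n = len(graph)
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


print(shortest_path(GRAPH, "Alice", "Erin"))
print(degree_centrality(GRAPH)["Dave"])
print(density(GRAPH))
```

Degree centrality is only the simplest centrality measure; finding "bridge" nodes or influential people would typically use other measures such as betweenness centrality.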

These graph algorithms can be very useful in a variety of applications. For example, in law enforcement the “find path” graph algorithm can instantly identify the possible connections between two people, which might be a phone call, a particular address, participation in the same crime, or something else. There are numerous other applications and use cases for graph analytics.

Unique DataWalk technology enables exceptional execution speed of graph analytics across very large volumes of data, particularly when the size of the knowledge graph is larger than the available memory (RAM). The back end of the DataWalk system can be thought of as a scale-out graph and relational database hybrid. DataWalk can do both graph algorithms and OLAP on a single instance of data, without any data movement, under a single permissions structure. Connections between data sets (or more specifically between all data elements in two data sets) are persisted (stored) in DataWalk, and this persistence is key to enabling fast execution of graph algorithms. The DataWalk engine is optimized to operate on multiple objects at once, which is crucial for querying and pattern recognition.

Unlike other available systems that provide graph analytics, DataWalk can perform some unique extensions of traditional graph algorithms. For example, DataWalk can not only calculate graph algorithms such as “shortest path” between two objects, but also identify and visualize “all paths” between one object and multiple other objects in a single operation. This is a unique DataWalk capability for graph analytics, and can be a huge time-saver.
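The idea of an "all paths" query from one object to several others can be sketched as follows. This is a generic depth-first enumeration written for illustration, not a description of how DataWalk executes the operation; the `GRAPH` data, the `all_paths` function, and the `max_hops` parameter are assumptions for this example.

```python
# Hypothetical toy link chart: an undirected graph stored as an adjacency map.
GRAPH = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Dave"},
    "Carol": {"Alice", "Dave"},
    "Dave": {"Bob", "Carol", "Erin"},
    "Erin": {"Dave"},
}


def all_paths(graph, source, targets, max_hops):
    """Collect every simple path from source to each target within max_hops."""
    targets = set(targets)
    found = {target: [] for target in targets}

    def walk(node, path):
        # Record the path whenever it ends on one of the requested targets.
        if node in targets and len(path) > 1:
            found[node].append(list(path))
        # Stop extending once the hop limit is reached.
        if len(path) - 1 == max_hops:
            return
        for neighbor in sorted(graph[node]):
            if neighbor not in path:  # keep paths simple (no repeated nodes)
                path.append(neighbor)
                walk(neighbor, path)
                path.pop()

    walk(source, [source])
    return found


# One call answers the question for several targets at once.
result = all_paths(GRAPH, "Alice", ["Dave", "Erin"], max_hops=3)
for target, paths in sorted(result.items()):
    print(target, paths)
```

A naive enumeration like this grows quickly with the hop limit on dense graphs, which is why the cost of multi-hop path queries matters in practice.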

DataWalk can also calculate long paths, i.e., paths between objects connected by many hops. While other solutions may struggle to identify paths over just three hops, DataWalk can efficiently identify paths over many hops.

DataWalk enables users to establish multiple links between nodes, based on rules that can be specified and then automatically maintained in the system. This expands the power of graph analytics in DataWalk and accelerates the execution of graph algorithms. As a simple example, suppose you have two sets of people that you want to connect. In DataWalk you can easily connect people in these sets via a unique identifier, add a second connection for a phone number, a third connection based on a combination of attributes such as date of birth and address, and a fourth connection based on physical proximity of home addresses within a specified distance.
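The four kinds of connection in this example can be sketched as named rules that are evaluated once and stored as links. This is a conceptual illustration only and does not reflect DataWalk's rule syntax or storage; the `Person` fields, the rule names, the sample records, and the 1 km threshold are all assumptions.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt


@dataclass(frozen=True)
class Person:
    person_id: str
    national_id: str
    phone: str
    date_of_birth: str
    address: str
    lat: float
    lon: float


def distance_km(a, b):
    """Great-circle (haversine) distance between two home coordinates."""
    dlat = radians(b.lat - a.lat)
    dlon = radians(b.lon - a.lon)
    h = sin(dlat / 2) ** 2 + cos(radians(a.lat)) * cos(radians(b.lat)) * sin(dlon / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


# Each rule is a named predicate over a pair of people (hypothetical names).
RULES = {
    "same_id": lambda a, b: a.national_id == b.national_id,
    "same_phone": lambda a, b: a.phone == b.phone,
    "same_dob_and_address": lambda a, b: (a.date_of_birth, a.address) == (b.date_of_birth, b.address),
    "lives_within_1km": lambda a, b: distance_km(a, b) <= 1.0,
}


def build_links(set_a, set_b, rules=RULES):
    """Evaluate every rule on every pair and keep the links that match."""
    links = []
    for a in set_a:
        for b in set_b:
            for name, rule in rules.items():
                if rule(a, b):
                    links.append((a.person_id, b.person_id, name))
    return links


# Made-up sample records for illustration.
SET_A = [Person("A1", "N-100", "555-0100", "1980-01-01", "1 Main St", 52.2297, 21.0122)]
SET_B = [
    Person("B1", "N-100", "555-0199", "1980-01-01", "1 Main St", 52.2297, 21.0122),
    Person("B2", "N-200", "555-0100", "1975-05-05", "9 Oak Ave", 52.4064, 16.9252),
]

for link in build_links(SET_A, SET_B):
    print(link)
```

Because the matched links are stored rather than recomputed per query, a later path search only has to follow them, which is the general benefit of persisting connections that the article describes.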

Compared to even the fastest graph databases, DataWalk can deliver dramatically better performance for graph analytics with large graphs. The fastest graph databases tend to be in-memory systems, so they deliver great performance for graph analytics only as long as the graph fits into available memory (RAM). In contrast, DataWalk utilizes a scale-out approach, enabling exceptional performance for graph algorithms across vast amounts of data.

Note that a graph analytics comparison of DataWalk vs. graph databases is a bit like comparing apples and oranges. Graph databases are... well... databases, while DataWalk is a data repository and full-stack platform for data analysis. Exceptional capabilities for executing graph analytics are just one of many powerful features of the DataWalk platform.