DataWalk Or A Graph Database For Fraud Detection?

Blog article
by Bob Thomas

Customers often come to DataWalk after realizing that building custom software on a graph database for fraud detection doesn’t always work.


We have been approached multiple times by customers who needed to analyze highly connected data that was too complex for a SQL database, so they attempted to develop custom software on a graph database for fraud detection. However, after many months they abandoned these efforts due to programming issues, performance problems, and/or functionality limitations.

In such cases DataWalk can be a compelling alternative to a graph database. DataWalk is a comprehensive, leading-edge data analysis platform with breakthrough capabilities for complex querying. It can enable customers to deploy rapidly, get excellent results quickly, and avoid the challenges of creating and maintaining custom software.

In this short paper we’ll briefly discuss some of the key considerations and tradeoffs when evaluating whether to use DataWalk, or custom software built on a graph database, for fraud detection and investigation applications.


DataWalk works and is available now, while most custom software projects fail

Building custom software solutions is difficult. Various industry studies suggest that two out of three attempts fail, and that the failure rate increases with the scale of the project[1]. DataWalk provides a solution that works, that is available immediately, and that is effectively maintained and frequently updated with new functionality.

Creating risk scores is easy in DataWalk; impractical with graph databases

DataWalk includes a powerful scoring mechanism that enables customers to configure scores themselves (through a simple visual interface), and then easily tune those scores over time to continually improve fraud detection accuracy. This approach has enabled DataWalk customers, over time, to achieve true positive rates over 90%. Creating a scoring mechanism with a graph database is essentially impractical, as it would involve creating dozens or hundreds of queries, weighting each one, and then executing those queries to calculate a score. This would be extremely difficult to program, maintain, and modify.
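To make the "weighted queries" idea concrete, here is a minimal Python sketch of how a score could be computed from weighted rules. It is an illustration only, not DataWalk's scoring mechanism or any graph database's API: the `Rule` type, the rule names, the weights, and the claim fields are all hypothetical, and each lambda stands in for what would be a separate query in a real system.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical record type: a plain dict describing one insurance claim.
Claim = dict


@dataclass
class Rule:
    name: str
    weight: float
    matches: Callable[[Claim], bool]  # stands in for one query


# Illustrative rules and weights only (assumptions, not taken from any product).
RULES = [
    Rule("shares_phone_with_known_fraudster", 40.0,
         lambda c: c["shares_phone_with_fraudster"]),
    Rule("claim_within_30_days_of_policy_start", 25.0,
         lambda c: c["days_since_policy_start"] <= 30),
    Rule("more_than_two_prior_claims", 15.0,
         lambda c: c["prior_claims"] > 2),
]


def risk_score(claim: Claim, rules=RULES) -> float:
    """Sum the weights of every rule that the claim matches."""
    return sum(rule.weight for rule in rules if rule.matches(claim))


claim = {
    "shares_phone_with_fraudster": True,
    "days_since_policy_start": 12,
    "prior_claims": 1,
}
print(risk_score(claim))  # 65.0 (first two rules match)
```

Three toy rules are easy to manage by hand. The maintenance burden described above comes from scaling this to dozens or hundreds of real queries whose weights must be retuned as fraud patterns change.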


With DataWalk you can easily identify patterns that are similar to known fraud patterns

Once a fraud pattern has been identified, it can be “encoded” in DataWalk simply by creating and merging visual queries. Unique DataWalk technology makes such patterns both simple to create and quick to execute. Doing this with a graph database is again impractical, as there is no practical way to merge queries.


DataWalk can effectively handle dirty data

Many enterprises struggle with dirty data. DataWalk is an end-to-end solution that enables dirty data to be ingested as-is and then automatically transformed (“cleaned”), while maintaining lineage and audit history. In contrast, a graph database solution for fraud detection would require a separate ETL tool and ETL process, or would require that a custom cleaning facility be coded on top of the graph database. In either case, there is significant cost and/or development effort, and maintaining lineage is a challenge.


DataWalk is a comprehensive, off-the-shelf platform for data access, analysis, and investigations

DataWalk provides a broad range of functionality valued by teams doing fraud detection and investigation: a single repository where all data is connected; simple visual querying; robust link analysis; the ability to instantly generate a 360-degree view for any entity; dashboards; integrated mapping; reports; and various other facilities. Constructing the equivalent with a custom application built on a graph database would be a lengthy, significant effort.


DataWalk is far superior for identifying connections between populations

If all you need to do is identify connections for a single object, then graph databases are fast. However, graph database performance can become a significant issue if you need to identify connections between populations (e.g., “find all dark SUVs no more than three years old, registered in California to females aged 26-35 who have never been a customer but who have applied for at least two auto loans for cars in model year 2019 or later”).
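The example query above combines three populations (people, vehicles, and loan applications) rather than starting from one known object. The Python sketch below shows the shape of that computation over small in-memory lists. All record fields, the sample data, the set of "dark" colors, and the reference year are assumptions made for illustration; this is not how DataWalk or any graph database executes such a query.

```python
from collections import Counter

CURRENT_YEAR = 2021  # assumption used for the "no more than three years old" test
DARK_COLORS = {"black", "navy", "dark gray"}  # assumption: what counts as "dark"

# Hypothetical sample data.
people = [
    {"id": 1, "sex": "F", "age": 30, "is_customer": False},
    {"id": 2, "sex": "F", "age": 41, "is_customer": False},
    {"id": 3, "sex": "F", "age": 28, "is_customer": True},
]
vehicles = [
    {"owner_id": 1, "type": "SUV", "color": "black", "model_year": 2020, "state": "CA"},
    {"owner_id": 2, "type": "SUV", "color": "black", "model_year": 2020, "state": "CA"},
    {"owner_id": 3, "type": "SUV", "color": "navy", "model_year": 2019, "state": "CA"},
]
loan_applications = [
    {"applicant_id": 1, "model_year": 2019},
    {"applicant_id": 1, "model_year": 2020},
    {"applicant_id": 3, "model_year": 2021},
]


def find_matching_vehicles(people, vehicles, loan_applications,
                           current_year=CURRENT_YEAR):
    # Population 1: females aged 26-35 who have never been a customer.
    eligible_people = {
        p["id"] for p in people
        if p["sex"] == "F" and 26 <= p["age"] <= 35 and not p["is_customer"]
    }
    # Population 2: people with at least two loan applications
    # for cars in model year 2019 or later.
    counts = Counter(
        a["applicant_id"] for a in loan_applications if a["model_year"] >= 2019
    )
    repeat_applicants = {pid for pid, n in counts.items() if n >= 2}

    owners = eligible_people & repeat_applicants
    # Population 3: dark SUVs, at most three years old, registered in
    # California, restricted to owners found in both populations above.
    return [
        v for v in vehicles
        if v["owner_id"] in owners
        and v["type"] == "SUV"
        and v["color"] in DARK_COLORS
        and v["state"] == "CA"
        and current_year - v["model_year"] <= 3
    ]


matches = find_matching_vehicles(people, vehicles, loan_applications)
print([v["owner_id"] for v in matches])  # [1]
```

Each filter scans an entire population before the sets are intersected, which is why this kind of query behaves differently from a traversal that starts at a single object.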


DataWalk enables you to easily integrate external data sources

To use an external data source with a graph database, you will need to have software that connects to that source and then imports the data. In contrast, DataWalk enables you to connect to various external data sources (such as public records data, data from other websites, social media data[2], etc.) and then connect this with your other data.


DataWalk can be superior for graph algorithms

Graph algorithms, such as those for finding paths between objects, can become very slow on a graph database if you need to go beyond just a few hops and work with multiple objects of interest (e.g., whitelist, known fraudsters, etc.). On the other hand, DataWalk performance testing has shown that such graph algorithms may run dramatically faster on DataWalk when compared to highly regarded graph databases.
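For readers unfamiliar with this class of algorithm, the following Python sketch shows a hop-limited breadth-first search from a set of source objects to a set of target objects (for example, from new applicants to known fraudsters). It is a generic textbook technique, not DataWalk's implementation; the function name, the adjacency-dictionary graph format, and the sample node names are assumptions for illustration.

```python
from collections import deque


def shortest_paths_to_targets(graph, sources, targets, max_hops):
    """Multi-source BFS: for each target reachable within max_hops of any
    source, return one shortest path as a list of nodes."""
    parent = {s: None for s in sources}      # also serves as the visited set
    queue = deque((s, 0) for s in sources)
    found = {}
    while queue:
        node, hops = queue.popleft()
        if node in targets and node not in sources:
            # Walk the parent links back to a source to rebuild the path.
            path, cur = [], node
            while cur is not None:
                path.append(cur)
                cur = parent[cur]
            found[node] = path[::-1]
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor not in parent:
                parent[neighbor] = node
                queue.append((neighbor, hops + 1))
    return found


# Hypothetical undirected graph stored as an adjacency dictionary.
graph = {
    "applicant_A": ["phone_1"],
    "phone_1": ["applicant_A", "person_B"],
    "person_B": ["phone_1", "address_9"],
    "address_9": ["person_B", "fraudster_X"],
    "fraudster_X": ["address_9"],
}

paths = shortest_paths_to_targets(graph, {"applicant_A"}, {"fraudster_X"}, max_hops=4)
print(paths["fraudster_X"])
# ['applicant_A', 'phone_1', 'person_B', 'address_9', 'fraudster_X']
```

The number of nodes visited can grow very quickly with each additional hop in a densely connected graph, which is the cost the paragraph above refers to.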


Data discovery is easy in DataWalk

In DataWalk, data discovery and exploration is a simple, intuitive process of traversing and filtering data sets and connections. You can easily revert to an earlier step in your analysis and change your path (i.e., change the visual query sequence). Modifying your path with a graph database requires writing and running another query.


DataWalk has built-in data lineage

In many applications it is important to have the ability to track data lineage. This capability is included in DataWalk, while with a graph database, custom coding needs to be done in order to capture lineage.


DataWalk can utilize images, PDFs, and other types of content

Graph databases cannot store unstructured binary content (e.g., images, PDFs, DOC files, etc.). Such content can be stored and linked in DataWalk.


In summary, though graph databases are an excellent fit for various use cases, for enterprise-class fraud detection deployments DataWalk is typically a far superior solution.


[1] For example, see [link truncated; surviving text fragments: “...'s Annual ... in partial or total failure.” and “Despite larger projects being more ... fail one in ten times.”]

[2] May require third-party software purchased separately.