The DataWalk Graph AI Platform

DataWalk Technology

DataWalk has developed hybrid graph/relational technology to enable quick data integration and fast analysis of vast amounts of complex data.

With the accelerating interest in AI, a key development is the concept of a Graph AI platform. Here we provide an overview of the DataWalk Graph AI platform.

In general, a Graph AI platform brings together a variety of key graph and AI-based functions for integrating, organizing, understanding, and analyzing complex interconnected data. As shown in Figure 1, DataWalk is a graph AI platform and a full-stack solution for data analysis.

Figure 1. The DataWalk graph AI platform

DataWalk utilizes patented technology to rapidly execute complex queries and graph algorithms across vast amounts of complex data. DataWalk technology also enables a prototyping capability that couples an enterprise architecture with the flexibility to make changes, test your own hypotheses, and design new analyses.

Specifically relative to AI, DataWalk utilizes AI facilities to enhance the platform, includes AI applications such as machine learning, and supports functions which improve the effectiveness of external AI tools such as Large Language Models (LLMs).

Relational/Graph Database

A foundational technology of DataWalk is an innovative graph/relational database hybrid. Data and connections are persisted, and data is shared at the data level to facilitate multi-user access and collaboration.

The graph structure enables:

The management and analysis of complex data
The ability to derive value from connections as well as values
Support for graph algorithms which can be run across vast amounts of data
Analysis of the relationships between individual objects on a link chart.

Complementing this is the relational structure, which enables OLAP analytics and traditional analysis of data values. The result is a powerful capability not only to do both graph and relational/OLAP functions, but to do this against the same instance of data without requiring data movement and additional data transformations.
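As a rough illustration of asking both kinds of question against a single copy of the data, the sketch below uses Python's built-in SQLite module. This is not DataWalk's engine or API; the transfers table, its columns, and the account names are assumptions made up for the example.

```python
import sqlite3

# One in-memory table holds the only copy of the data (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (src TEXT, dst TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [("A", "B", 100.0), ("B", "C", 250.0), ("A", "C", 50.0), ("C", "D", 75.0)],
)

# Relational/OLAP-style question: total amount sent per account.
totals = dict(
    conn.execute(
        "SELECT src, SUM(amount) FROM transfers GROUP BY src ORDER BY src"
    ).fetchall()
)

# Graph-style question on the same rows: which accounts can be reached from A
# by following transfers? A recursive query walks the connections.
reachable = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT 'A'
            UNION
            SELECT t.dst FROM transfers t JOIN reach r ON t.src = r.node
        )
        SELECT node FROM reach WHERE node <> 'A' ORDER BY node
        """
    )
]

print(totals)
print(reachable)
```

Both queries read the same rows, so no export or transformation step sits between the value-oriented analysis and the connection-oriented one.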

Knowledge Graph

A knowledge graph enables creation of a graph data model, organizing and visualizing an organization’s knowledge in an intuitive “graph” structure. A knowledge graph - when properly done - is not “data centric”, but is instead “knowledge centric”. Data is not organized around the data sources, but instead can be re-organized around relevant business concepts such as customers, transactions, phone calls, and/or anything else. Knowledge is captured in nodes and edges and can be dynamically used by algorithms for various analytical tasks.
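The idea of reorganizing data around business concepts rather than around sources can be sketched in a few lines of Python. The two source row sets, the node types, and the relation names below are hypothetical and are not taken from DataWalk's data model.

```python
# Hypothetical rows from two separate source systems.
crm_rows = [{"customer_id": "C1", "name": "Ana Lopez", "phone": "555-0100"}]
call_rows = [{"caller": "555-0100", "callee": "555-0199", "seconds": 42}]

nodes = {}  # node id -> {"type": ..., plus properties}
edges = []  # (source id, relation, target id)


def add_node(node_id, node_type, **props):
    """Create the node if it is new, then merge in any extra properties."""
    node = nodes.setdefault(node_id, {"type": node_type})
    node.update(props)


# Customers and their phones become nodes, regardless of which system held them.
for row in crm_rows:
    add_node(row["customer_id"], "Customer", name=row["name"])
    add_node(row["phone"], "Phone")
    edges.append((row["customer_id"], "USES", row["phone"]))

# Each call becomes its own node, linked to the phones on both ends.
for i, row in enumerate(call_rows):
    call_id = f"call-{i}"
    add_node(call_id, "PhoneCall", seconds=row["seconds"])
    add_node(row["caller"], "Phone")
    add_node(row["callee"], "Phone")
    edges.append((row["caller"], "MADE", call_id))
    edges.append((call_id, "TO", row["callee"]))

print(len(nodes), "nodes and", len(edges), "edges")
```

The phone 555-0100 appears in both sources but ends up as a single node, which is what connects the customer to the call.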

Figure 2. Example DataWalk Knowledge Graph

DataWalk’s knowledge graph interface enables data engineers, data scientists and less technical “citizen data scientists” to effectively collaborate and to accelerate results from analysis, e.g., machine learning. The knowledge graph is also available via API, enabling other applications to leverage this data and knowledge.

Not all knowledge graphs are created equal. In general, in the industry there are two types of knowledge graphs: property graphs, and knowledge graphs based on Semantic Web. Property graphs are data centric and have the potential to deliver excellent performance for graph algorithms executed across vast amounts of data, while those based on semantic web are knowledge centric and far more flexible for adapting to changes in the data structure.

DataWalk’s knowledge graph technology, by contrast, delivers the benefits of both property graphs and Semantic Web. The DataWalk knowledge graph is highly flexible (i.e., changes can easily be applied without system interruption), with exceptional performance for graph algorithms and the ability to handle vast amounts of data. For further details, see DataWalk’s No Compromise Knowledge Graph.

Graph Analytics

Visual Querying

The DataWalk knowledge graph (called the Universe Viewer) enables you to directly perform ad-hoc, no-code complex queries via an intuitive visual interface, such that neither technical expertise nor programming skills are required. This is a powerful capability that enables seamless and inexpensive prototyping and hypothesis testing. Queries are created in an iterative manner and visualized in “breadcrumbs”, so that you can clearly understand each step of a complex query and be assured that results will be reliable.

Patented DataWalk technology ensures that complex queries complete, and complete quickly. As shown in Figure 3 below, DataWalk has published benchmark results showing that, unlike traditional relational database systems, which fail to generate a result after a relatively small number of joins, DataWalk maintains linear response time through the equivalent of 600 joins.

Figure 3. DataWalk maintains linear performance through 600 joins.

Link Chart Analytics

Link analysis is a technique for identifying and analyzing relationships and connections between individual data elements. A simple example for an investigative use case is to identify the various connections of a specific individual - perhaps their vehicle, phone number, and known associates - and then to add the connections for each of those entities to detect (visually and/or automatically) if there are suspicious patterns and connections.
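The expansion step described above can be sketched as a breadth-first walk over a small set of links. The entities and links are invented for illustration, and the code is a generic Python sketch rather than DataWalk's link chart facility.

```python
from collections import deque

# Hypothetical known connections, keyed by entity.
links = {
    "person:J.Doe": ["vehicle:ABC-123", "phone:555-0100", "person:R.Roe"],
    "person:R.Roe": ["phone:555-0100", "vehicle:XYZ-789"],
}


def undirected(links):
    """Build an adjacency map in which every link can be followed both ways."""
    adj = {}
    for src, targets in links.items():
        adj.setdefault(src, set())
        for dst in targets:
            adj[src].add(dst)
            adj.setdefault(dst, set()).add(src)
    return adj


def expand(adj, start, max_hops):
    """Return {entity: hop distance} for everything within max_hops of start."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the requested depth
        for neighbor in sorted(adj[node]):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return seen


adj = undirected(links)
one_hop = expand(adj, "person:J.Doe", 1)
two_hops = expand(adj, "person:J.Doe", 2)

# A simple automatic check: non-person entities connected to two or more people.
shared = sorted(
    node
    for node, neighbors in adj.items()
    if not node.startswith("person:")
    and sum(n.startswith("person:") for n in neighbors) >= 2
)
print(shared)
```

Here the second hop surfaces the associate's vehicle, and the shared-entity check flags the phone number used by both people.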

DataWalk is enterprise-class software which includes a robust link analysis facility, with all your data pre-connected and at your fingertips to accelerate results.

Graph Algorithms

Graph algorithms enable automatic identification of patterns and connections between data in a graph. In addition to clustering, DataWalk supports a variety of in-database graph algorithms, including:

Pathfinding and search algorithms, such as Find Paths and Find Shortest Paths.
Centrality algorithms, such as PageRank, Personal PageRank and Eigenvector.
Community detection algorithms, which underpin the clustering described below.
Others such as Link Prediction, Label Propagation, Graph Similarity, etc.
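To make two of the algorithm families listed above concrete, the following is a minimal textbook sketch in plain Python of a shortest-path search and of PageRank. It runs in memory on a toy graph and says nothing about how DataWalk implements its in-database versions; the graph and the parameter defaults are assumptions.

```python
from collections import deque

# Toy directed graph: node -> list of nodes it points to.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


def pagerank(graph, damping=0.85, iterations=50):
    """Power iteration; nodes without outgoing links spread their rank evenly."""
    n = len(graph)
    rank = {node: 1.0 / n for node in graph}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in graph}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                for target in graph:
                    new_rank[target] += damping * rank[node] / n
        rank = new_rank
    return rank


path = shortest_path(graph, "A", "E")
ranks = pagerank(graph)
print(path)
print(max(ranks, key=ranks.get))
```

Pathfinding answers "how are these two things connected", while centrality answers "which nodes matter most given where the links point".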

Clustering is a very useful technique for automatically identifying networks that represent a pattern of interest. This is done by executing graph algorithms such as community detection. An example of clustering is to automatically identify patterns in the data that may be representative of an organized crime group.

DataWalk utilizes proprietary patented technology that enables exceptional performance for automatically identifying clusters. Unlike graph databases, which must calculate clusters from scratch for any new data, DataWalk can incrementally expand clusters and thus operate dramatically faster with lower resource consumption.
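The difference between recomputing clusters and expanding them incrementally can be illustrated with a union-find structure, which merges existing groups as each new link arrives. This is a generic sketch of the simplest kind of clustering (connected components), not DataWalk's patented method; the class name and account identifiers are hypothetical.

```python
class IncrementalClusters:
    """Connected components maintained incrementally with union-find."""

    def __init__(self):
        self.parent = {}

    def find(self, node):
        """Return the representative of node's cluster, compressing the path."""
        self.parent.setdefault(node, node)
        root = node
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[node] != root:
            self.parent[node], node = root, self.parent[node]
        return root

    def add_link(self, a, b):
        """Merge the clusters of a and b; existing clusters are not rebuilt."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def clusters(self):
        groups = {}
        for node in self.parent:
            groups.setdefault(self.find(node), set()).add(node)
        return sorted(sorted(group) for group in groups.values())


clusters = IncrementalClusters()
for a, b in [("acct1", "acct2"), ("acct3", "acct4")]:
    clusters.add_link(a, b)
before = clusters.clusters()

# A single new link arrives; only the two affected clusters are merged.
clusters.add_link("acct2", "acct3")
after = clusters.clusters()
print(before)
print(after)
```

Each new link costs close to constant time here, whereas a from-scratch approach would revisit every node and edge. Note that this simple structure handles added links only, not removed ones.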

Though many vendors support execution of graph algorithms, DataWalk’s unique technology often enables these algorithms to be executed far faster than is possible with other systems. For example, benchmark results of the find path algorithm performed against the high-performance TigerGraph database showed that DataWalk is 2.1 to 3.7 times faster for graphs with low- and medium-degree vertices. DataWalk was also significantly faster in the high-degree tests, though this could not be fully quantified because TigerGraph generated errors in 8 of the 10 tests and could not complete execution of the algorithm. For further details, see the benchmark results.

Scoring

DataWalk includes a facility for easily generating and tuning scores, which in effect identify patterns in a graph. DataWalk’s patented technology enables rapid generation of scores across vast amounts of complex multi-source data without coding. DataWalk also enables you to automatically detect drift in score variance, enabling you to make better decisions. Customers have designed their own risk scores and achieved exceptional results, such as 90% true positives, through continuous, iterative tuning of rules and scores.
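Although DataWalk scoring is configured without code, the underlying idea of weighted rules tuned against confirmed outcomes can be sketched in Python. The rules, weights, field names, and sample records below are all hypothetical, and "share of alerts that are confirmed" is one possible reading of the true-positive figure quoted above.

```python
# Each rule: (name, test on a record, weight added to the score when it fires).
rules = [
    ("many_cash_deposits", lambda r: r["cash_deposits"] >= 5, 40),
    ("new_account", lambda r: r["account_age_days"] < 30, 25),
    ("linked_to_flagged", lambda r: r["flagged_neighbors"] > 0, 35),
]


def score(record):
    """Return (total score, names of the rules that fired)."""
    hits = [(name, weight) for name, test, weight in rules if test(record)]
    return sum(weight for _, weight in hits), [name for name, _ in hits]


def alert_precision(records, threshold):
    """Share of alerts (score >= threshold) that were confirmed cases."""
    alerts = [r for r in records if score(r)[0] >= threshold]
    if not alerts:
        return None
    return sum(r["confirmed"] for r in alerts) / len(alerts)


records = [
    {"cash_deposits": 6, "account_age_days": 10, "flagged_neighbors": 1, "confirmed": True},
    {"cash_deposits": 0, "account_age_days": 400, "flagged_neighbors": 1, "confirmed": False},
    {"cash_deposits": 7, "account_age_days": 200, "flagged_neighbors": 0, "confirmed": True},
    {"cash_deposits": 5, "account_age_days": 20, "flagged_neighbors": 0, "confirmed": False},
]

# Iterative tuning: compare candidate thresholds against confirmed outcomes.
for threshold in (60, 70):
    print(threshold, alert_precision(records, threshold))
```

Raising the threshold in this toy data removes the false alert but also leaves one confirmed case below the line, which is the kind of trade-off iterative tuning has to weigh.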

Inference

DataWalk supports an inference capability that enables you to intelligently derive new knowledge from the data and the business logic. DataWalk can incorporate existing links from external sources, and can calculate new links within our inference engine. It is not required that you generate links between data elements prior to ingesting data in DataWalk. DataWalk keeps track of all changes to data and ensures that all relationships and inferences are up to date.

DataWalk’s inference capability elevates data understanding by seamlessly combining property graph and RDF facilities, offering unparalleled flexibility. It dynamically associates disparate data sources, uncovering novel relationships for enriched insights. Tailor-made for incorporating business rules, DataWalk ensures that strategic decisions are grounded in comprehensive organizational knowledge. Further, it champions explainability and traceability, presenting clear, up-to-date logical pathways behind every inference, thus demystifying complex reasoning processes and fostering trust among stakeholders.
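A minimal sketch of rule-based inference with traceability follows. It assumes a single hypothetical business rule ("two people who use the same phone are associated") and stores, for every inferred link, the facts that support it. The relation names are invented, and the sketch re-derives all links on each call rather than tracking changes as DataWalk is described as doing.

```python
from itertools import combinations

# Known facts as (subject, relation, object) triples.
facts = {
    ("Ana", "USES_PHONE", "555-0100"),
    ("Ben", "USES_PHONE", "555-0100"),
    ("Cy", "USES_PHONE", "555-0199"),
}


def infer_shared_phone(facts):
    """Derive ASSOCIATED_WITH links and keep the supporting facts for each."""
    by_phone = {}
    for subject, relation, obj in sorted(facts):
        if relation == "USES_PHONE":
            by_phone.setdefault(obj, []).append(subject)

    inferred = {}
    for phone, people in by_phone.items():
        for a, b in combinations(people, 2):
            inferred[(a, "ASSOCIATED_WITH", b)] = [
                (a, "USES_PHONE", phone),
                (b, "USES_PHONE", phone),
            ]
    return inferred


inferred = infer_shared_phone(facts)
for link, evidence in inferred.items():
    print(link, "because", evidence)
```

Because every inferred link carries its evidence, a reviewer can see exactly why the link exists, and re-running the rule after the facts change keeps the inferences current.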

Machine Learning with Context

The DataWalk AI/ML Toolkit includes various facilities for enriching the DataWalk platform for data scientists and developers. It focuses on context, artificial intelligence, machine learning, and automation, offering a robust set of tools for data analysis, model development, data exploration, and integration with external services. The AI/ML Toolkit includes an embedded Jupyter Notebook interface, and is designed to work seamlessly with popular languages such as Python. The AI/ML toolkit also enables creating machine learning models over graph embeddings.

All computations are done within the DataWalk database, and this eliminates the performance impacts, computation delays, resource requirements, and operational issues associated with data movement and data access control.

DataWalk supports an automated machine learning pipeline, such that you do not need to maintain dozens of ETL tasks created from hundreds of lines of code.

Graph Embeddings

Though graphs lend themselves to being easily visualized, a different representation may be required in order to support various types of computations on graph data. Featuring the key capabilities of both property graph and semantic web, graph AI software such as DataWalk can convert graph data into a more computationally digestible format called graph embeddings.

Graph embeddings are very useful in reducing the complexity of computations in machine learning (ML) and other AI tasks while retaining the contextual and structural information inherent in graph representations.
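As a deliberately simple stand-in for an embedding, the sketch below turns each node into a short numeric vector built from hand-picked structural features (degree, triangle count, mean neighbor degree). Production embeddings are normally learned rather than hand-built, and nothing here reflects DataWalk's method; the toy graph and feature choice are assumptions.

```python
import math

# Toy undirected graph given as an edge list.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E")]

adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)


def embed(node):
    """Map a node to [degree, triangles it belongs to, mean neighbor degree]."""
    neighbors = adj[node]
    degree = len(neighbors)
    triangles = sum(
        1 for x in neighbors for y in neighbors if x < y and y in adj[x]
    )
    mean_neighbor_degree = sum(len(adj[n]) for n in neighbors) / degree
    return [float(degree), float(triangles), mean_neighbor_degree]


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


embeddings = {node: embed(node) for node in sorted(adj)}
print(embeddings)
print(cosine(embeddings["A"], embeddings["B"]))
```

Once every node is a fixed-length vector, standard ML tooling can consume it directly, while the vector still encodes something about the node's position in the graph.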

DataWalk is particularly well suited to computing graph embeddings on large volumes of data, without requiring specialized hardware (e.g., GPUs). For further details, see Graph Embeddings: A Breakthrough For Detecting High-Risk Accounts & Transactions.

ML Algorithms

DataWalk empowers your analytics with a versatile suite of in-database machine learning capabilities, all accessible through our dedicated Python package. From essential tasks like classification, regression, and clustering to sophisticated anomaly detection, our scalable solutions ensure efficient data processing and insight generation. For those seeking tailor-made analytics, DataWalk also integrates seamlessly with third-party Python packages, allowing for the execution of custom machine learning tasks. Unleash the full potential of your data with DataWalk's comprehensive and flexible machine learning toolkit.

Entity Extraction

Included standard with DataWalk is an entity extraction facility with over 60 languages supported. This facility parses documents to automatically identify and classify text content that represents things like names of people, locations, dates, and various other types of objects. These entities can then be linked to existing data already in DataWalk using various techniques.
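The extract-then-link flow can be sketched with a few regular expressions and a lookup table. Real entity extraction facilities typically rely on trained language models rather than hand-written patterns; the patterns, the location list, and the node identifiers below are illustrative assumptions only.

```python
import re

# Hypothetical patterns: ISO dates and short local phone numbers.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{4}\b"),
}
KNOWN_LOCATIONS = {"Warsaw", "Lisbon"}  # tiny stand-in for a gazetteer


def extract_entities(text):
    """Return (label, value) pairs found in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        found += [(label, match.group()) for match in pattern.finditer(text)]
    found += [
        ("LOCATION", word)
        for word in re.findall(r"\b[A-Z][a-z]+\b", text)
        if word in KNOWN_LOCATIONS
    ]
    return found


# Values already present in the graph, mapped to hypothetical node ids.
existing_nodes = {"555-0100": "phone-node-17"}

text = "On 2024-03-05 the subject called 555-0100 from Warsaw."
entities = extract_entities(text)
linked = [
    (value, existing_nodes[value])
    for _, value in entities
    if value in existing_nodes
]
print(entities)
print(linked)
```

The linking step here is an exact match on the extracted value; fuzzier techniques would be needed for names and addresses.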

Jupyter Notebook and Kedro

With DataWalk you can access your knowledge graph via an embedded Jupyter Notebook interface, which enables you to create a sandbox with a separate data model for machine learning usage. Then you can use a variety of machine learning algorithms to train your model, even for vast amounts of data.

DataWalk also includes a Kedro starter that guides Data Scientists to follow a predefined project structure to create both training and inference pipelines. When the best performing model is selected, it can then be instantly deployed in production without re-coding, with just a single click.

LLM Support

Increasingly popular AI applications, Large Language Models (LLMs) are great for general knowledge, understanding language, and summarizing content. However, they are not trained on domain-specific knowledge or your organization’s internal content, which are critical to providing relevant answers that your organization can trust.

DataWalk’s graph AI platform supports various LLMs (offline and via API) and can augment your LLM with your organization’s structured and unstructured data, so the LLM can reliably answer even internal organization-specific questions. DataWalk incorporates your organization’s knowledge with valuable context and relationship information so that the LLM summary is grounded in facts and hallucinations are reduced. With DataWalk you can simplify LLM deployment as there’s no need to write code to coordinate between your company’s information and your chosen LLM.
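The general pattern of grounding an LLM in organizational facts can be sketched as "retrieve relevant facts, then place them in the prompt". The sketch below stops at building the prompt and calls no model; the facts, the keyword-based retrieval, and the prompt wording are all assumptions, not DataWalk's integration.

```python
# Hypothetical facts drawn from a knowledge graph, as (subject, relation, object).
facts = [
    ("Acme Ltd", "OWNED_BY", "J. Doe"),
    ("Acme Ltd", "REGISTERED_IN", "Lisbon"),
    ("Globex", "OWNED_BY", "R. Roe"),
]


def retrieve(question, facts):
    """Keep facts whose subject or object is mentioned in the question."""
    q = question.lower()
    return [f for f in facts if f[0].lower() in q or f[2].lower() in q]


def build_prompt(question, facts):
    """Assemble a prompt that restricts the model to the retrieved facts."""
    context = retrieve(question, facts)
    lines = [f"- {s} {r.replace('_', ' ').lower()} {o}" for s, r, o in context]
    return (
        "Answer using only the facts below. If they are insufficient, say so.\n"
        "Facts:\n" + "\n".join(lines) + f"\nQuestion: {question}"
    )


prompt = build_prompt("Who owns Acme Ltd?", facts)
print(prompt)
```

Only the facts connected to the entity in the question reach the model, which keeps the answer tied to known relationships and leaves unrelated records out of the context.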

Other Facilities

The DataWalk Graph AI platform also includes:

The ability to easily integrate data from multiple disparate data sources, facilitated by the graph-oriented architecture.
Various facilities for viewing data, including dashboards, the ability to present a 360-degree contextual profile of any data element, and integration with various mapping facilities.
A search facility optimized for vast amounts of data, as well as a simplified Quick Search facility for casual users - on desktop or mobile devices - who simply want to search the DataWalk database and view a summary of the returned result.
The App Center, which enables programs to be safely inserted into DataWalk. These programs can be written by DataWalk, and by certified partners and customers.
System services to facilitate monitoring and management.
An alerts facility, where you can configure both alerts and alert notifications.
A permissions facility, where you can easily assign and manage highly granular permissions, with minimal performance impacts with vast volumes of data. This provides you with one centralized permission management schema to control access to your multi-source data.
An entity resolution capability, which enables you to easily identify and manage matching records.
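As one illustration of the entity resolution item above, matching records can be found by normalizing a few fields into a comparison key and grouping on it. This is a generic sketch with hypothetical records and a deliberately strict exact-key rule; it does not describe how DataWalk resolves entities.

```python
import re
from collections import defaultdict

# Hypothetical records from different sources describing people.
records = [
    {"id": 1, "name": "Ana Lopez", "phone": "555-010-0100"},
    {"id": 2, "name": "LOPEZ, Ana", "phone": "(555) 010 0100"},
    {"id": 3, "name": "Ben Ortiz", "phone": "555-010-0199"},
]


def match_key(record):
    """Normalize name (case, order, punctuation) and phone (digits only)."""
    name_tokens = tuple(sorted(re.findall(r"[a-z]+", record["name"].lower())))
    phone_digits = re.sub(r"\D", "", record["phone"])
    return name_tokens, phone_digits


groups = defaultdict(list)
for record in records:
    groups[match_key(record)].append(record["id"])

# Groups with more than one record are candidate matches for review or merging.
matches = [ids for ids in groups.values() if len(ids) > 1]
print(matches)
```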

Conclusion

Graph AI is an important advancement enabling the organization, understanding, and analysis of complex interconnected data. DataWalk provides a robust Graph AI platform with unique capabilities, including a uniquely powerful knowledge graph, rapid execution of graph algorithms, an embedded Jupyter Notebook that works on data with context, and the ability to handle vast amounts of data. DataWalk both delivers AI functions and provides uniquely effective support for Large Language Models (LLMs). By taking a “knowledge-first” rather than a “data-first” approach, and by allowing the system to be expanded with new knowledge without major rework, DataWalk makes AI tools better.

© 2024 DataWalk Inc. All rights reserved. No portions of this document may be reproduced without prior written consent of DataWalk. Specifications, features, and functionality are subject to change without notice. DataWalk is a registered trademark of DataWalk S.A. All other brands or products are trademarks or registered trademarks of their respective holders and should be treated as such. Revision 0324.