Thanks to the versatile analytical capabilities of its hybrid graph-relational engine, the DataWalk platform is well suited to supplementing LLMs. It allows the creation of LLM agents that solve complex analytical tasks by interacting with the system to dynamically traverse graphs, calculate pivot aggregations, perform searches, and much more. DataWalk can also be used to feed the right data to LLMs through standard approaches such as RAG and GraphRAG, improving the overall performance and reliability of the system. The system’s capabilities in managing knowledge graphs make it a strong platform for advanced AI applications, providing the infrastructure needed to support complex workflows and data integrations.
In today's rapidly evolving tech landscape, Large Language Models (LLMs) have emerged as powerful tools capable of delivering enormous benefits. However, they come with their own set of challenges and limitations. At DataWalk, our Innovation team has been exploring ways to combine LLMs with Knowledge Graph systems to address these challenges and enhance their capabilities. In this paper, we describe DataWalk’s recent research and development efforts, showing how this combination can be used to create more effective and reliable LLM solutions. This paper is intended for those considering how knowledge graphs, generative AI, and LLMs will support their future strategic initiatives, including chief data officers, data architects, and data scientists.
LLMs are known for their general knowledge, generalizability, and in-context learning abilities. However, they also have notable drawbacks:
The popular Retrieval Augmented Generation (RAG) approach involves converting documents into vectors (embeddings) and storing them in a vector database. When a user asks a question, the question is converted into a vector and compared with the document vectors to find the most relevant documents. The LLM then uses these documents to generate a response. This method leverages LLMs' summarization strengths and improves output quality by drawing on private datasets while avoiding the high cost of fine-tuning.
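The retrieval step described above can be sketched in a few lines of Python. This is a minimal illustration, not DataWalk's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, the in-memory list stands in for a vector database, and the sample documents and all function names are made up for the example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list, k: int = 2) -> list:
    """Return the k documents whose vectors are closest to the question vector."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, documents: list, k: int = 2) -> str:
    """Assemble the text that would be sent to the LLM for answer generation."""
    context = "\n".join(f"- {d}" for d in retrieve(question, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Made-up sample documents, for illustration only.
DOCS = [
    "Thomas Miller sent a wire transfer to Person A",
    "Quarterly revenue grew in the retail segment",
    "The compliance team flagged a wire transfer as suspicious",
]

if __name__ == "__main__":
    print(build_prompt("who received a wire transfer from Thomas Miller", DOCS))
```

In a production setup the count vectors would be replaced by dense embeddings from a trained model, and the sorted scan by an approximate nearest-neighbor index, but the flow of embedding, ranking, and prompting stays the same.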
While effective at incorporating recent data, RAG has limitations:
Another method of addressing LLMs' limitations is to combine them with Knowledge Graphs, building on the foundation set by RAG.
GraphRAG aims to address some of the limitations of RAG by incorporating graph-based methods. We explored two variations:
Building on the GraphRAG approaches, we developed a reasoning agent to automate workflows using knowledge graphs. This agent represents a significant advancement in our research, as it provides a dynamic and intelligent way to handle complex and spontaneous tasks that were previously difficult to manage with LLMs alone.
The reasoning agent is designed to generate high-level plans for complex and spontaneous tasks, iterate with users, and generate the steps needed to execute these tasks. Using repeatable code greatly reduces hallucinations and, combined with iterations of user feedback, can nearly eliminate them. This process involves several key steps:
Our approach is inspired by the FlowMind framework, which functions in two primary stages: an initial lecture to the LLM on the task context and available APIs, followed by workflow generation using the APIs and workflow execution to deliver the result to the user.
The first stage involves a lecture on the context, the available APIs, and the underlying knowledge graph ontology. We adhere to a generic lecture recipe to ensure the LLM has a clear understanding of the overall goal and of the scope, inputs, and outputs of the functions in the APIs. The lecture recipe includes three key components:
Stage 2: Workflow Generation and Execution
In the second stage, the LLM generates the workflow using the provided APIs and executes it to deliver the results to the user. An optional feedback loop allows users to review and provide feedback, enabling the system to adjust the generated workflow accordingly.
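The two stages can be sketched as follows. This is a hedged illustration under several assumptions: `find_person` and `count_transactions` are hypothetical API functions (not DataWalk's API), `fake_llm` is a stub that returns canned workflow code in place of a real LLM call, and the toy data is made up. Note that Python's `exec` is not a security sandbox; a real system would need proper isolation before running generated code.

```python
from typing import Optional


# Hypothetical API exposed to the LLM; names and data are illustrative only.
def find_person(name: str) -> dict:
    """Look up a person node by name and return its attributes."""
    return {"name": name, "id": 1}


def count_transactions(person_id: int) -> int:
    """Return the number of transactions linked to a person id."""
    return {1: 3}.get(person_id, 0)


APIS = {"find_person": find_person, "count_transactions": count_transactions}


def build_lecture(task_context: str, apis: dict) -> str:
    """Stage 1: describe the context and the available functions to the LLM."""
    api_docs = "\n".join(f"{name}: {fn.__doc__}" for name, fn in apis.items())
    return (
        f"Context: {task_context}\n"
        f"Available functions:\n{api_docs}\n"
        "Write a function run() that uses only these functions."
    )


def fake_llm(lecture: str, request: str, feedback: Optional[str] = None) -> str:
    """Stub for a real LLM call: returns workflow source code as text."""
    if feedback is None:
        return (
            "def run():\n"
            "    person = find_person('Thomas Miller')\n"
            "    return count_transactions(person['id'])\n"
        )
    # Pretend the feedback asked for the person's name to be included.
    return (
        "def run():\n"
        "    person = find_person('Thomas Miller')\n"
        "    return {'name': person['name'],\n"
        "            'transactions': count_transactions(person['id'])}\n"
    )


def execute(workflow_source: str, apis: dict):
    """Stage 2: run the generated workflow with only the provided APIs in scope."""
    namespace = dict(apis)
    exec(workflow_source, namespace)  # not a sandbox; illustration only
    return namespace["run"]()


def run_agent(request: str, feedback_rounds=()):
    """Generate and execute a workflow, regenerating once per feedback item."""
    lecture = build_lecture("Investigating bank transactions", APIS)
    source = fake_llm(lecture, request)
    result = execute(source, APIS)
    for feedback in feedback_rounds:
        source = fake_llm(lecture, request, feedback)
        result = execute(source, APIS)
    return source, result
```

Because the agent returns the generated source together with the result, the workflow can be reviewed, stored, and rerun, which is where the auditability and reusability of this approach come from.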
This comprehensive approach is illustrated in the following diagram:
The reasoning agent in DataWalk is built with several key components that enhance its usability and efficiency:
Consider a scenario where a banking investigator needs to prepare a report on suspicious transactions involving a client named Thomas Miller. The traditional approach would require the investigator to manually analyze transactions flagged as potentially fraudulent. With the reasoning agent, the process is streamlined:
Create a report describing the people with whom the suspect Thomas Miller made transactions. Calculate the sum of inbound and outbound transactions made with the suspect for each associated person. Then create a written report and save it to the Reports set using the Content attribute.
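To make the request above concrete, the kind of workflow the agent might generate for it can be sketched in plain Python. Everything here is an assumption made for illustration: the transaction tuples are made-up sample data, "inbound" is taken to mean money received by the suspect and "outbound" money sent by the suspect, and the `reports` list with a `Content` key is a hypothetical stand-in for the Reports set. An actual workflow would call the platform's own functions rather than iterate over an in-memory list.

```python
from collections import defaultdict

SUSPECT = "Thomas Miller"

# Made-up sample transactions: (sender, receiver, amount).
TRANSACTIONS = [
    ("Thomas Miller", "Person A", 1200.0),
    ("Person A", "Thomas Miller", 300.0),
    ("Thomas Miller", "Person B", 50.0),
    ("Person C", "Person B", 999.0),  # does not involve the suspect
    ("Person B", "Thomas Miller", 75.0),
    ("Thomas Miller", "Person A", 800.0),
]


def summarize_counterparties(transactions, suspect):
    """Sum inbound and outbound amounts per person who transacted with the suspect.

    Direction is relative to the suspect (an assumption of this sketch):
    outbound = suspect paid the person, inbound = the person paid the suspect.
    """
    totals = defaultdict(lambda: {"inbound": 0.0, "outbound": 0.0})
    for sender, receiver, amount in transactions:
        if sender == suspect and receiver != suspect:
            totals[receiver]["outbound"] += amount
        elif receiver == suspect and sender != suspect:
            totals[sender]["inbound"] += amount
    return dict(totals)


def write_report(summary, suspect):
    """Render the per-person totals as a short plain-text report."""
    lines = [f"Counterparties of {suspect}:"]
    for person in sorted(summary):
        t = summary[person]
        lines.append(
            f"{person}: inbound {t['inbound']:.2f}, outbound {t['outbound']:.2f}"
        )
    return "\n".join(lines)


# Hypothetical stand-in for the Reports set, with the text in a Content attribute.
reports = []
summary = summarize_counterparties(TRANSACTIONS, SUSPECT)
reports.append({"Content": write_report(summary, SUSPECT)})
```

In the agent setting, the LLM would typically be used only for the final narrative text, while the sums come from deterministic code like this, so the figures in the report stay reproducible.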
Knowledge graphs offer a powerful solution to some of the inherent limitations of Large Language Models (LLMs) by providing rich relational and domain-specific context, thereby reducing hallucinations and improving accuracy. DataWalk enhances the reliability and broadens the applicability of LLMs through its ability to perform complex graph analytics on extensive data sets. The introduction of the reasoning agent marks a significant advancement, offering auditability, reusability, and the capability to dynamically handle complex queries. The synergy between DataWalk and LLMs promises to drive the development of increasingly sophisticated analytical solutions. We are enthusiastic about the potential of this technology and anticipate further innovations in this field.
DataWalk is an enterprise-class software application and platform for quickly integrating and analyzing vast amounts of data spread across your various data sources. DataWalk enables both business and IT organizations to improve their efficiency, productivity, and competitiveness, while also meeting key technical and security requirements.
DataWalk meets a variety of needs across your organization by enabling you to fuse, clean, normalize, and connect your vast amounts of siloed data into a single source that can be used to feed AI/ML applications, Business Intelligence tools, and various other applications. DataWalk also enables you to perform sophisticated analyses of your data via complex no-code queries, graph algorithms, link analysis, machine learning, and more.