Frequently Asked Questions

Select your job role below to get answers to the questions that matter to you

For Analysts And Analyst Team Leaders

From an analyst perspective, what is DataWalk?
DataWalk is an Enterprise-class software platform for rapidly connecting numerous large data sets into a unified view/repository, for fast data access, visual analysis and investigations.
Who is DataWalk for?

In general DataWalk is for teams:

  • Doing intelligence analysis, fraud detection/investigation, AML alerting, and KYC 
  • With multiple power user analysts
  • Who have a need to do analysis of data that spans many systems/sources
  • That value a collaborative, Enterprise approach for the analysis system
Who are your customers?
Our customers include U.S. Federal Government agencies such as DoD, DHS, and Department of State; leading financial institutions (e.g., a top-20 bank in the U.S.); national intelligence agencies; insurance companies; larger municipal law enforcement agencies; and various other commercial businesses.
How does DataWalk help me do my job?
In general DataWalk enables you to get much better results, much faster. Specifics depend on your particular application and use case.
How does DataWalk help me get faster, better results?
In many environments data is spread across multiple silos, so locating, collating, and querying/analyzing the desired data is a slow, manual process. In DataWalk all data is pre-connected and at your fingertips, which in itself saves a significant amount of time. DataWalk then further accelerates results by automating key analytical processes; enabling users to easily generate, understand, and quickly run complex queries; and executing graph algorithms that automate the process of finding distant connections and patterns indicative of organized crime groups across all your data.
Is DataWalk just a visualization tool?
No!! DataWalk has powerful visualization capabilities, but DataWalk is built to enable you to generate analytical products. Our system is uniquely suited for analysis (e.g., advanced visual querying of complex data, easily generating and modifying sophisticated scores, easily integrating scripts for complementary functionality via the App Center, and graph analytics).
Is DataWalk just a link analysis tool?
No! DataWalk includes a robust link analysis facility, but link analysis is only one of DataWalk’s capabilities. Others include rapid data integration; easy creation (and quick execution) of complex visual queries; graph analytics across vast amounts of data; an end-to-end solution for machine learning using real data; and serving as an intelligence database.
What do you mean when you say DataWalk is a platform?
DataWalk enables you to build analytical products on top of the system. You can insert other applications into DataWalk, and you can insert DataWalk in other applications.
Does DataWalk support real-time fraud detection?
DataWalk can be part of a real-time decision making process via API integration.
Can DataWalk serve as my intelligence database?
Yes! DataWalk includes an embedded big data repository where any or all of your data can be connected, stored, and securely accessed for instant 360-degree views leveraging all of your data. One place to look for any of your data!
Can less technical users access this intelligence database just to pull a record?
Yes, this is enabled by DataWalk Quick Search, which is available for both desktop and mobile devices. This of course requires that users have appropriate permissions, and all such accesses are logged.
Can DataWalk analyze text?
Yes. DataWalk supports an NLP facility for entity extraction, the ability to process data in foreign languages, and the ability to automatically identify patterns in text content. DataWalk can also effectively parse documents in PDF, Word, and other formats.
Does DataWalk integrate with third party services such as Thomson Reuters CLEAR, LexisNexis, etc.?
Yes. DataWalk has existing integrations with many such services, and integrations with new services can often be generated by DataWalk in a few hours.
What do you mean when you say that DataWalk is an “Enterprise class system”?
  • DataWalk is a server-based system intended to support multiple users and use cases.
  • DataWalk can support separate application environments for production and development/test. 
  • DataWalk can support complex integration of both technology and business processes, and fit into Enterprise environments and workflows. 
  • DataWalk supports an optional high availability ("HA") configuration 
  • Enterprise environments may have more data sources and they may be more complex.
  • Enterprise systems often need to be able to scale to larger numbers of users and vast amounts of storage capacity.
Who are your competitors?
Functionally, Palantir Gotham is probably the most similar competitive product (though DataWalk is far more cost effective than Gotham).
How is DataWalk different from i2 Analyst’s Notebook?
Harris/IBM i2 Analyst’s Notebook is a desktop-based system specifically for link analysis. DataWalk is an Enterprise-class system for link analysis and much more, including the ability to effectively analyze vast amounts of data. It’s like comparing a Ferrari to a motorcycle.
Does DataWalk require any client-side software?
No. DataWalk is a server-based system accessed via a web browser.
Does DataWalk support machine learning?
Yes. DataWalk enables users to easily prepare data for machine learning, including features, vectors and training sets. Once data is prepared, it can be used as an input to the customer’s own models. In addition, DataWalk provides an AutoML/xAI capability that allows analysts to train and explain models automatically. Regardless of whether models are created inside or outside DataWalk, the system enables analysts to produce, deploy, measure and maintain them in DataWalk.
Does DataWalk include case management?
DataWalk includes some rudimentary case management capabilities, but most customers have existing case management systems, and our strategy is to integrate with those.
What are training requirements for DataWalk?
Most analysts can productively use DataWalk after taking our half-day eLearning course. Multi-day training is typically appropriate for power users.
Can I just download your software and do a quick free trial?
No. DataWalk is robust Enterprise-class software, such that a downloadable self-driven trial is not practical.
Is there any way that I can do a trial of DataWalk?
This is a possibility that your DataWalk account manager can consider.
If needed can I just use DataWalk as a drawing tool, to draw nice link charts?
Though DataWalk is not optimized for simple drawing, it is possible to create drawings in the system.

For Data Science Leaders and Data Scientists

From a data science perspective, what is DataWalk?

DataWalk is an enterprise-class software platform for graph and advanced analytics. DataWalk enables you to fully understand your data and produce better data science products by rapidly connecting numerous large data sets into a unified knowledge graph for fast data access, analysis and investigation.

DataWalk can be thought of as a feature store that enables you to easily share, understand and reuse features. Features can be both tabular in nature and/or graph in nature, which is a particularly powerful DataWalk capability. Features can be created from data at scale. You can use ML on vast amounts of data stored in DataWalk without moving data.

Who is DataWalk’s ML/AI solution for?
DataWalk typically fits best for organizations that want to use context (i.e., insight about relationships) in their advanced analytics and AI projects, and who value our solution for other analytic and data management facilities (e.g., complex no-code querying, graph algorithms, knowledge graph and data store).
What part of the Machine Learning pipeline does DataWalk cover?
DataWalk covers and automates the end-to-end ML pipeline, starting from data ingestion and preparation, to feature generation, model training, deployment and monitoring.
What Machine Learning language does DataWalk support?
Can I use Jupyter Notebook for data exploration and model creation?
Yes! DataWalk provides an embedded and integrated Jupyter Notebook interface and offers a Python Machine Learning library that enables data exploration, visualization and machine learning with or without data movement.
What libraries are enabled by DataWalk? Can I import sklearn or use Pandas?
The DataWalk ML facility provides out-of-the-box embedded ML algorithms that enable ML processing at scale, as well as non-embedded libraries such as sklearn and Pandas. If desired, it can be extended with any other Python packages.
What ML algorithms does DataWalk offer?
DataWalk provides a variety of machine learning algorithms for regression, classification, clustering, anomaly detection, time series, and dimensionality reduction tasks.
What tasks can be performed with DataWalk ML?
DataWalk ML enables embedded and non-embedded ML analysis of structured data. For unstructured tasks such as text or image analysis, standard DataWalk features, 3rd-party tools and other non-embedded Python packages can be used.
Does DataWalk fit in enterprise environments?
Yes, all machine learning enterprise processes, such as data governance, security, performance management, and data feed workflow, are enabled and ensured by DataWalk.
Does DataWalk offer ML pipeline automation / MLOps?

Yes, DataWalk automates the machine learning pipeline at various stages: 

  • DataWalk ensures data consistency and no manual actions are required when new data is introduced to the system.
  • A Kedro starter guides data scientists to follow a predefined project structure to create both training and inference pipelines.
  • When the best performing model is selected, it can then be quickly deployed in production without re-coding.
  • DataWalk ML/AI feedback is instantaneous and based on real data. When your models are deployed, then feedback is automatically captured.
  • Data access control is provided continuously
  • Models can be re-trained automatically with new data and then deployed in production
Does DataWalk offer any data engineering tools? How can I connect my data?

Yes. DataWalk enables you to access any data source with a defined interface. Many connectors are currently available, and new connectors can typically be configured in a couple of hours by DataWalk field engineers. You can then quickly link all data in a business representation through the knowledge graph capability. In addition, DataWalk embeds AI technology to automatically complete, clean, overwrite, fix and enrich data so that it can quickly be used for further analysis.

With DataWalk you do not need to think about maintaining dozens of ETL tasks created from hundreds of lines of code. DataWalk is designed to support an entire enterprise data processing pipeline and to automate it efficiently, so that no further optimization is needed, whether for feature generation or any other data manipulation.

How and where can ML models be deployed?
Embedded models created in DataWalk are deployed directly inside DataWalk DB, while models not embedded are deployed using the DataWalk App Center as a container.
What are the key benefits of DataWalk’s ML solution?

DataWalk accelerates Machine Learning across the entire lifecycle and delivers better results by:

  • Enabling ML for vast amounts of data by eliminating data movement
  • Enabling the end-to-end machine learning pipeline in a single platform
  • Deriving new insights from relationships (context enabled by graph)
  • Data governance, analytics and AI unification
  • Automated data engineering pipeline
  • Collaboration at the enterprise level
  • Instant feedback loop
What is “graph ML”?
Graph Machine Learning is a set of tools for processing network data and leveraging the relationships between entities in predictive modeling and analytics tasks. Graph ML is a key DataWalk capability.

For IT

From an IT perspective, what is DataWalk?
DataWalk is an Enterprise-class software platform built around DataWalk’s Graph/Relational Database Hybrid technology.
Can DataWalk integrate with my SSO?
Yes. DataWalk can integrate with any SSO that supports SAML 2.0, LDAP, Kerberos, or Kerberos+CAC.
Can DataWalk integrate into a broader workflow?
Yes, the DataWalk Platform is a multilayered architecture with a RESTful Interface layer. There are also other interfaces to integrate DataWalk in a larger enterprise workflow.
Can DataWalk meet my security requirements?

DataWalk is proven in highly secure environments. DataWalk has Authority To Operate (ATO) in institutions with a Top Secret data security level, as well as in Federal Networks of DoD, DOJ, and others.

Security requirements vary by organization, and the DataWalk team will strive to ensure that our solution will meet your security requirements.

Does DataWalk support an audit capability?
Yes, DataWalk logs every action performed by users. System administrators can configure which information is collected, choosing from 40+ different metrics.
Does DataWalk support a high availability configuration?
Yes, DataWalk can be configured in a high availability configuration with no single point of failure. In addition, data backups can be performed without interrupting system operations.
What type of servers does DataWalk require?
Linux: Red Hat 8.5-8.9, or Amazon Linux 2
How many servers are required in the DataWalk infrastructure?
A minimum DataWalk implementation typically consists of four servers: an Application Server, an Integration Server, a Compute Server, and a Management Server. As DataWalk is a scale-out system, capacity can be expanded by adding Compute Nodes. The number of servers required, and the CPU/RAM requirements of each, will depend on the volume of data, number of users, and any requirements for high availability.
What storage is supported?
Storage options include cloud storage (AWS, Azure Blob), as well as specified on-premise S3-compatible storage arrays.
Can DataWalk be deployed on virtualized environments?
Yes, DataWalk can run on VMware and most other virtual environments.
Can DataWalk be deployed in the cloud?
Yes. DataWalk can run on AWS, Azure, and other cloud environments and supports dynamically scaling the environment by adding or deleting nodes as needed.
What are requirements for network interconnectivity of DataWalk servers?
For connectivity between Application Servers and Compute Nodes, GbE is sufficient. For connectivity between DataWalk Compute Nodes, 10GbE is recommended.
Is DataWalk a SaaS offering?
No, though if needed DataWalk partners may be able to provide a DataWalk SaaS solution. Contact us for more details.
Can DataWalk integrate with my source systems?
Yes. DataWalk connects to many sources, and any new data source that has a defined interface can typically be integrated in a couple of hours.
Does DataWalk integrate with my backup software?
Yes, though the DataWalk backup facility should be used, with the output then collected by your backup software.
Does data need to be transformed/cleaned before being loaded in DataWalk?
No. If needed, data can efficiently be transformed and normalized in DataWalk.
Can I use container technology (e.g., Docker) to deploy DataWalk?
No. DataWalk uses internal technology for creation of containers, but DataWalk itself cannot yet be deployed into a container. This item is on the DataWalk product roadmap.
Can I deploy DataWalk on-premise?
Would DataWalk personnel have access to my data?
Do you support TLS 1.2?

For Architects And Engineering Managers

From an architect’s perspective, what is DataWalk?
DataWalk is a Commercial Off The Shelf (COTS) software platform built on proprietary DataWalk graph vectorization technology. The DataWalk platform enables you to leverage DataWalk’s unique way of computing to build internal and external data products.
Isn’t DataWalk just a link chart visualization tool?
No! DataWalk certainly provides the ability to visualize link charts, but this is a very small part of what we do. DataWalk’s core capabilities are around fusing and analyzing data across many data sources for fast visual querying, graph analysis, and machine learning.
What are DataWalk’s unique technologies and capabilities?

DataWalk:

  • Enables graph-based data exploration on multi-TB data sets.
  • Can effectively and economically analyze huge graphs.
  • Is uniquely suited to calculating complex clusters and automatically discovering hidden/distant relationships.
  • Enables you to easily generate, understand, and quickly run complex queries which traverse multiple data sets.
  • Typically enables you to apply changes to the ontology - and make it actionable - without modifying upstream processes.
  • Can rapidly execute flow analytics on vast amounts of data.
  • Ensures that any request executed via the DataWalk API is horizontally scalable and highly efficient.
  • Delivers reliable results in dynamic environments where new data is being added.
  • Delivers the above capabilities with a dramatic reduction in hardware requirements compared to solutions that operate using in-memory analytics.
Can you tell me more about DataWalk’s ability to handle large graphs?
Yes, DataWalk can economically and effectively analyze ultra-large graphs with billions of nodes/edges. This is enabled by the ability to quickly move data from disk drives to analytics, via DataWalk’s graph vectorization. To our knowledge DataWalk is the only commercially available tool that enables the file system to be used for these types of computations; this enables lower hardware cost by enabling computations on data that doesn’t fit in RAM.
Can you tell me more about calculating complex clusters and automatically discovering hidden/distant relationships?
Yes, DataWalk is uniquely suited to calculating complex clusters and automatically discovering hidden/distant relationships. With DataWalk you don’t need to worry about depth of the traversal, as you can easily reach data at distances greater than six degrees of separation.
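To make "degrees of separation" concrete, the sketch below counts hops between entities with a plain breadth-first search over a small in-memory graph. It is only an illustration of the concept: the node names, the `hop_distances` helper, and the sample chain are hypothetical, and this is not a description of how DataWalk itself performs traversal.

```python
from collections import deque

def hop_distances(adjacency, start):
    """Return the number of hops from `start` to every reachable node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist

# Hypothetical data: a chain of nine entities, n0 - n1 - ... - n8.
edges = [(f"n{i}", f"n{i + 1}") for i in range(8)]

# Build an undirected adjacency list from the edge list.
adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, []).append(b)
    adjacency.setdefault(b, []).append(a)

distances = hop_distances(adjacency, "n0")
# Entities more than six degrees of separation away from n0.
distant = sorted(n for n, d in distances.items() if d > 6)
print(distant)
```

The set of keys in `distances` is also the connected cluster that contains the starting entity, which is the simplest form of the clustering mentioned above.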
Can you tell me more about DataWalk’s flexible ontology?
Yes, DataWalk typically enables you to apply changes to the ontology, load new data, and - unlike a data lake, data warehouse or other mechanisms - makes it actionable without modifying upstream processes. This is enabled by DataWalk’s flexible ontology, which in turn is enabled by DataWalk’s graph vectorization capabilities.
Can you tell me more about DataWalk’s ability to do flow analytics?
DataWalk can rapidly execute flow analytics on vast amounts of data, as DataWalk’s unique approach to data indexing enables execution of graph and OLAP analytics at the same time, without data movement. This enables DataWalk to build pivot tables and aggregations on-the-fly.
Can you tell me more about DataWalk’s ability to deliver reliable results in dynamic environments?
Yes, DataWalk delivers reliable results in dynamic environments where new data is frequently added. DataWalk’s Consistency Assurance facility automatically and correctly inserts new data into the index.
I have many different internal and external data sources. Can you connect to these?
Yes. DataWalk has numerous connectors available for a variety of data sources, and importantly, we can typically configure integrations with new sources in days.
Does DataWalk run on top of a graph database?
No. DataWalk has developed a hybrid graph/relational database.
How is DataWalk different than a graph database?
  • Graph databases require that you write an application (using a graph query language) in order to solve relationship-oriented analytic challenges. In contrast, DataWalk is a full-stack solution where coding is not required.
  • Graph databases are excellent for OLTP applications, while DataWalk is far superior for complex analytical workloads. Pivots cannot be executed in graph databases.
  • DataWalk is optimized to do batch analysis of nodes and edges with minimal (linear) complexity.
  • Graph databases typically run in-memory, but this is not required with DataWalk.
How is DataWalk superior to graph databases for analytics?
Five key things:
  1. A graph database lacks the semantics to summarize data along some attributes, while DataWalk has many functions to do this.
  2. Speed of access/analysis when operating on sets of objects and links.
  3. Far superior graph traversal.
  4. Automatic connection generation.
  5. DataWalk has consistency checks, which graph databases lack.
Can you tell me more about DataWalk’s advantage over graph databases using semantics to summarize data along attributes?
DataWalk has an Online Analytical Processing (OLAP) capability. While a graph database lacks the semantics to summarize data along some attributes, DataWalk has many functions to do this. For example, you may want to create a view of the number of sales reported by each sales rep for each of the last 24 months. This is impossible to do in a graph database without creating a program, while in DataWalk this is a built-in function.
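For readers who want to see what such a summary computes, here is a minimal sketch of the sales-rep-by-month count written by hand in Python. The sample rows and the `pivot_sale_counts` helper are hypothetical and exist only for illustration; in DataWalk the equivalent view is described as a built-in function rather than code you write.

```python
from collections import defaultdict

# Hypothetical sample rows: (sales_rep, month, sale_amount). Not real data.
sales = [
    ("alice", "2024-01", 1200.0),
    ("alice", "2024-01", 300.0),
    ("alice", "2024-02", 950.0),
    ("bob", "2024-01", 400.0),
    ("bob", "2024-03", 700.0),
]

def pivot_sale_counts(rows):
    """Count sales per rep (pivot rows) and per month (pivot columns)."""
    table = defaultdict(lambda: defaultdict(int))
    for rep, month, _amount in rows:
        table[rep][month] += 1
    # Convert nested defaultdicts to plain dicts for easy printing.
    return {rep: dict(months) for rep, months in table.items()}

pivot = pivot_sale_counts(sales)
for rep in sorted(pivot):
    print(rep, pivot[rep])
```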
Why is DataWalk far superior to graph databases for graph traversal?
In contrast to graph databases, DataWalk performs massive joins for graph traversal. With this approach, properly sorting the on-disk tables allows all operations to be merge joins instead of hash joins. This reduces the complexity of solutions from quadratic, O(N^2), to linear, O(N). Further, such operations use sequential disk access, which is more efficient than traversal in graph databases, which in general requires random access to memory. This is analogous to scanning an array versus a linked list: scanning the linked list is slower because every step requires access to a random address in memory, whereas the array scan just proceeds to the next memory cell.
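The general merge-join idea can be sketched in a few lines. The example below is a textbook sort-merge join used to expand a graph by one hop; the `merge_join` function and the sample edges are assumptions made for illustration and do not represent DataWalk's actual engine. Both inputs are sorted by the join key and each is read front to back, so the work is proportional to the input size plus the number of matches produced.

```python
def merge_join(left, right):
    """Join two lists of (key, value) pairs that are both sorted by key.

    Returns (left_value, key, right_value) for every matching pair.
    """
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][0], right[j][0]
        if left_key < right_key:
            i += 1
        elif left_key > right_key:
            j += 1
        else:
            # Keys match: pair every left row with every right row in this group.
            group_start = j
            while i < len(left) and left[i][0] == left_key:
                j = group_start
                while j < len(right) and right[j][0] == left_key:
                    out.append((left[i][1], left_key, right[j][1]))
                    j += 1
                i += 1
    return out

# Hypothetical directed edges (source, destination).
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("d", "e")]

# Left input: edges keyed by destination. Right input: edges keyed by source.
by_destination = sorted((dst, src) for src, dst in edges)
by_source = sorted(edges)

# Each result is a two-hop path: start -> middle -> end.
two_hop_paths = merge_join(by_destination, by_source)
print(two_hop_paths)
```

Repeating the join on its own output extends the traversal by one more hop each time, without any per-node random lookups.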
Why is DataWalk’s ability to automatically generate connections a key advantage relative to graph databases?
A graph database is just that, a database. You must write programs to generate the links between objects and then store those links in the database. DataWalk allows you to specify an unlimited number of algorithmic links between objects, and the DataWalk system then generates these links automatically whenever objects are added or updated.
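As a rough illustration of rule-driven link generation, the sketch below keeps a small in-memory store in which links are recomputed whenever an object is added or updated. Everything here is hypothetical (the `LinkingStore` class, the `same_phone` rule, the sample records) and is not DataWalk's API; it only shows the behavior described above, where links are declared once as rules rather than written and stored by separate programs.

```python
class LinkingStore:
    """Toy object store that maintains links from declarative rules."""

    def __init__(self):
        self.objects = {}   # object id -> attribute dict
        self.rules = []     # list of (rule name, symmetric predicate)
        self.links = set()  # (rule name, smaller id, larger id)

    def add_rule(self, name, predicate):
        """Register a rule and apply it to all objects already stored."""
        self.rules.append((name, predicate))
        ids = sorted(self.objects)
        for index, a in enumerate(ids):
            for b in ids[index + 1:]:
                if predicate(self.objects[a], self.objects[b]):
                    self.links.add((name, a, b))

    def upsert(self, obj_id, attrs):
        """Add or update an object, then regenerate every link touching it."""
        self.links = {link for link in self.links if obj_id not in link[1:]}
        self.objects[obj_id] = attrs
        for other_id, other in self.objects.items():
            if other_id == obj_id:
                continue
            for name, predicate in self.rules:
                if predicate(attrs, other):
                    a, b = sorted((obj_id, other_id))
                    self.links.add((name, a, b))

store = LinkingStore()
# Hypothetical rule: two people are linked if they share a phone number.
store.add_rule(
    "same_phone",
    lambda x, y: x.get("phone") is not None and x.get("phone") == y.get("phone"),
)
store.upsert("p1", {"phone": "555-0100"})
store.upsert("p2", {"phone": "555-0100"})
store.upsert("p3", {"phone": "555-0199"})
links_after_load = set(store.links)

# Updating p3 creates its links automatically; no linking program is rerun.
store.upsert("p3", {"phone": "555-0100"})
links_after_update = set(store.links)
print(sorted(links_after_update))
```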
Can’t I just build the equivalent of DataWalk myself, maybe by using a graph database?
In general this is not practical. First you have the significant issue of building and maintaining an application, and note that building DataWalk has required hundreds of Engineering man-years of development and over ten patents. Second is that - as indicated above - a graph database simply cannot support the same functions as DataWalk.
How can I feel confident that DataWalk can scale to handle vast amounts of data?
DataWalk is architected specifically around the design principles associated with linear scalability. The linear scalability of DataWalk has been proven with deployments analyzing many billions of records, and our customers are using DataWalk to operate on data at levels previously unachievable.
Do I have to load all my data into DataWalk?
No, you don't need to load all the data. Our technology will index the information necessary to support your business users and represent it as a high-hierarchy ontology. This usually means that only part of the data is indexed.
Why do I have to load any data into DataWalk?
In order to execute sophisticated algorithms on vast amounts of data, it is required that the data is indexed. Attempting to execute such algorithms via a federated approach (reaching out to other data sources ad-hoc) is not practical with billions of records.
What is the underlying database engine for DataWalk?
DataWalk has developed a hybrid graph/relational database. The DataWalk application is self-maintaining such that no Database Administrator (DBA) is required, and all maintenance activities are executed via DataWalk application interfaces. You should think of DataWalk as a single application with an embedded database, and you will not need to certify the DataWalk database.
Is DataWalk a monolithic application?
No. DataWalk is based on a multi-layered architecture, and each layer can be distributed in a different network.