DataWalk Technology

DataWalk has developed a hybrid graph/relational database to enable quick integration and fast analysis of vast amounts of complex data.


Full-Stack Analytics Platform

DataWalk is a full-stack analytics platform architected to scale to vast amounts of data, regardless of the shape of your data or the questions you want to ask of it:

  • Ingest without extensive preparation, using passive sinks, an ETL tool, DataWalk connectors, or the REST API. There’s no need to worry about how you will later want to use or analyze the data. Ingestion offers high performance and full security, with no need to map extensive ontologies.
  • Transform data during or after ingest, so that it takes the shape that you want. DataWalk can automatically repeat any transformation steps with new data.
  • Store data with full compression on a highly scalable, secure storage infrastructure.
  • Query data visually and ask any question of your data without limitations. There is no query that is too complex!
  • Visualize your results.
  • The DataWalk API enables you to use all of the above as part of your automated workflow.
  • A User Interface for all of the above eliminates the cost and delays associated with relying on data scientists and scripting.

Unique DataWalk Technology For Data Processing and Querying

As shown in Figure 1, DataWalk is a comprehensive system with various intuitive interfaces for accessing and analyzing data. The application server level takes requests from users or other systems and translates them into a set of operations executed at the graph/relational database layer. DataWalk’s unique scale-out technologies are in the areas of information processing and querying at the database layer.


Hybrid Graph/Relational Database

A foundational technology of DataWalk is an innovative graph/relational database hybrid. Data and connections are persisted, and data is shared at the data level to facilitate multi-user access and collaboration. The graph structure enables management and analysis of complex data, the ability to derive value from connections as well as values, support for graph algorithms which can be run across vast amounts of data, and analysis of the relationships between individual objects on a link chart. Complementing this is the relational structure, which enables OLAP analytics and traditional analysis of data values. The result is a powerful capability not only to do both graph and relational/OLAP functions, but to do this against the same instance of data without requiring data movement.
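
The idea of running graph and relational operations against one instance of data can be illustrated with a toy sketch. This is not DataWalk code: the `records` and `links` structures and both function names are hypothetical, and an in-memory dictionary stands in for the database. It only shows an aggregation (relational view) and a traversal (graph view) reading the same objects without copying them.

```python
from collections import defaultdict, deque

# Hypothetical toy data: one set of records and one set of links,
# shared by both the relational-style and the graph-style function.
records = {
    "p1": {"type": "person", "name": "Ann", "city": "Austin"},
    "p2": {"type": "person", "name": "Bob", "city": "Austin"},
    "p3": {"type": "person", "name": "Cy", "city": "Boston"},
    "a1": {"type": "account", "balance": 120.0},
    "a2": {"type": "account", "balance": 80.0},
}
links = [("p1", "a1"), ("p2", "a1"), ("p2", "a2"), ("p3", "a2")]


def count_by(record_type, field):
    """Relational-style view: roughly GROUP BY field, COUNT(*)."""
    counts = defaultdict(int)
    for rec in records.values():
        if rec["type"] == record_type:
            counts[rec[field]] += 1
    return dict(counts)


def neighbors_within(start, max_hops):
    """Graph-style view: breadth-first search over the same records.

    Returns a dict mapping each reachable object id to its hop distance.
    """
    adjacency = defaultdict(set)
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in adjacency[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    del distance[start]
    return distance
```

Here `count_by("person", "city")` answers a value-oriented question, while `neighbors_within("p1", 2)` answers a connection-oriented one, and neither requires exporting the data to a second store.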


Designed For Scalability

DataWalk is architected to scale in order to enable analysis of vast volumes of data. DataWalk achieves this using a horizontally scalable Massively Parallel Processing (MPP) architecture for storing and processing data, with unique technology that solves the three major problems associated with horizontal scalability, regardless of the business model or data mapping performed:

  1. Even distribution of data across multiple nodes
  2. No data rebalancing needed to execute queries
  3. Maximum information-join on stored content

DataWalk’s unique, commercial-grade data storage solution provides flexible information management with high efficiencies required for deploying Enterprise-class analytical environments. This technology delivers fast, complex, multi-dimensional analyses that are quickly completed on large, multi-billion record data sets.
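
The text does not describe how DataWalk achieves even distribution, so the following is only a sketch of one widely used technique, hash partitioning, shown for illustration. The function names and the `record-N` keys are assumptions made for this example.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Map a record key to a node index using a stable hash.

    SHA-256 is used instead of Python's built-in hash() so that the
    assignment is the same across processes and runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribution(keys, node_count):
    """Count how many keys land on each node."""
    return Counter(node_for(k, node_count) for k in keys)


# Hypothetical workload: 10,000 record keys spread over 4 nodes.
counts = distribution((f"record-{i}" for i in range(10_000)), node_count=4)
```

Because the hash output is close to uniform, each node receives roughly a quarter of the keys. Note that plain modulo hashing reshuffles most keys when the node count changes; schemes such as consistent hashing are typically used to limit that.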



Modeling And Querying Via A Knowledge Graph

The DataWalk Universe Viewer is an enterprise knowledge graph which provides a view of all imported data through an intuitive graphical interface showing how all the data is interconnected. If desired, you can re-organize your data around understandable business objects such as people, transactions, events, or anything else. This abstraction layer enables you to operate at the level of data sets, makes it practical to integrate and analyze vast amounts of data, and enables you to find matching patterns.

The Universe Viewer is the system interface for both modeling and querying, and allows precise identification of complex relationships between data, as well as immediate filtering of both directly and indirectly connected data sets. The Universe Viewer is easily configured for each deployment, without a fixed ontology. Combining business modeling methods with data discovery and data blending creates simple, reproducible structures and analyses. Thus, the integration process is often several hundred times faster than with traditional systems. Integrating different data types and structures, from many sources, into one cohesive picture reflects a natural, human perception of information and makes DataWalk an easy-to-use system for performing complex analytics.


Optimized For Complex Querying

The Universe Viewer allows you to directly perform ad-hoc, no-code complex queries via an intuitive visual interface, such that neither technical expertise nor programming skills are required. Queries are created in an iterative manner, such that you can clearly understand each step of a complex query and be assured that results will be reliable. Patented DataWalk technology ensures that complex queries complete, and complete quickly. As shown in Figure 3 below, DataWalk has published benchmark results showing that, unlike traditional relational database systems, which fail to generate a result after a relatively small number of joins, DataWalk maintains linear response time through 600 joins.


DataWalk Flexible Data Representation

DataWalk utilizes a logical data structure that is presented on the Universe Viewer and is easily modified on the fly. There is no need to make changes to the physical model or disturb system operation to change this structure. The DataWalk structure is highly standardized, with data evenly distributed across many compute nodes to rapidly obtain answers. With DataWalk the cost of changing the logical structure is so low (and easy) that you can experiment with the logical model and freely modify it in real time. For example, you can easily create new connections, edit existing ones, or add new sources and object descriptions.


The DataWalk App Center

The DataWalk App Center enables programs or subroutines to run inside the DataWalk environment, although they are not part of the DataWalk code base. Apps can utilize DataWalk data and analyses, but run in a separate execution space such that an App cannot impact the reliability of the core DataWalk application. Apps can be generated by DataWalk as well as by certified partners and customers, for functions including integrations, features, and machine learning models. The App Center enables open-source libraries to be integrated into DataWalk, thus dramatically expanding the potential scope of system functionality. The App Center includes a graphical user interface, and Apps can also be initiated via the DataWalk API (loosely analogous to AWS Lambda).


DataWalk API – Communication

The DataWalk system is built in Java. Internally, each component of the application is exposed as its own REST service, and external access is similarly supported using APIs. Data and analyses done in DataWalk are easily made available to other programs. The connection to enterprise data stores is easily accommodated with DataWalk technology. To exchange information with platforms such as Oracle, DB2, Hadoop, or other commercial systems, you can take advantage of RESTful access.


DataWalk Link Generator

The DataWalk Link Generator permits complicated analyses to be efficiently executed, based on advanced connection rules. Instead of fixed tables and pre-programmed, pre-designed analytical flows, DataWalk supports flexible, persisted data connections in a logical layer. When something is changed, the entire analytic process updates without the need for programming or interrupting system operation.

A link calculates and stores information on relationships between objects. Links can be generated based on simple rules (e.g., Field A = Field B) or with more advanced business rules to connect data, even in the absence of a primary key/foreign key relationship. Links are aggregated on the fly to generate fast and accurate results. Note that while links typically are automatically calculated based on dependencies and rules (including highly advanced rules with fuzzy logic), DataWalk also enables you to import data with connections already defined.
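
The two kinds of rule mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the Link Generator itself: the data, the function names, and the 0.9 similarity threshold are hypothetical, and the "fuzzy" rule here is plain string similarity from Python's standard `difflib` module rather than whatever fuzzy logic DataWalk applies.

```python
from difflib import SequenceMatcher

# Hypothetical data sets with no primary key/foreign key relationship.
people = [
    {"id": "p1", "name": "Jonathan Smith", "phone": "555-0101"},
    {"id": "p2", "name": "Maria Garcia", "phone": "555-0102"},
]
calls = [
    {"id": "c1", "caller_phone": "555-0101", "caller_name": "Jonathon Smith"},
    {"id": "c2", "caller_phone": "555-0199", "caller_name": "Maria Garcia"},
]


def exact_rule(field_a, field_b):
    """Simple rule: Field A = Field B."""
    return lambda a, b: a[field_a] == b[field_b]


def fuzzy_rule(field_a, field_b, threshold=0.9):
    """Approximate rule: the two strings are at least `threshold` similar."""
    def rule(a, b):
        ratio = SequenceMatcher(
            None, a[field_a].lower(), b[field_b].lower()
        ).ratio()
        return ratio >= threshold
    return rule


def generate_links(left, right, rule):
    """Return (left_id, right_id) pairs for every pair satisfying the rule."""
    return [(a["id"], b["id"]) for a in left for b in right if rule(a, b)]


phone_links = generate_links(people, calls, exact_rule("phone", "caller_phone"))
name_links = generate_links(people, calls, fuzzy_rule("name", "caller_name"))
```

The exact rule links only the pair that shares a phone number, while the fuzzy rule also connects "Jonathan Smith" to the misspelled "Jonathon Smith". The nested loop compares every pair and is meant for clarity only; it would not scale to large data sets.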


DataWalk Data Ingestion

DataWalk uses a very flexible, adaptable, and generic method for importing content. To retrieve information from a relational database you can utilize JDBC. There are a number of available connectors to various data services and commercial solutions. You also can configure custom connectors based on JSON or XML based REST interfaces. Any data ingestion triggers a model recalculation (dependency refresh) process to keep data consistent. External systems can register ingestion events in DataWalk, after which DataWalk will reach into the registered source when appropriate.
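
The dependency refresh described above can be pictured as follows: after a data set is ingested, everything derived from it is recomputed in dependency order. The model below is a hypothetical example (the item names and the `refresh_plan` function are assumptions, not DataWalk identifiers), and it uses the standard-library `graphlib` module available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter

# Hypothetical model: each item maps to the set of items it depends on.
dependencies = {
    "customers": set(),
    "transactions": set(),
    "customer_transaction_links": {"customers", "transactions"},
    "risk_score": {"customer_transaction_links"},
}


def refresh_plan(changed: str) -> list[str]:
    """List the derived items to recompute after `changed` is ingested.

    Items are returned in an order where every item appears after
    the items it depends on.
    """
    # Collect everything that depends, directly or indirectly, on `changed`.
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for item, deps in dependencies.items():
            if item not in affected and deps & affected:
                affected.add(item)
                grew = True
    # Order the whole model topologically, then keep only affected items.
    order = TopologicalSorter(dependencies).static_order()
    return [item for item in order if item in affected and item != changed]
```

For example, `refresh_plan("transactions")` yields the links first and then the score that is computed from them, while ingesting into an item nothing depends on yields an empty plan.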


DataWalk User Permissions & Access Control

A major challenge with analysis of sensitive data is guaranteeing that data and the results of system processing are consistent with user privileges. DataWalk explicitly addresses this challenge with three levels of privileges that support granular permissions while maintaining exceptional performance:

  1. Access to sets of objects per user
  2. Access to an attribute of an object per user
  3. Access to an object using access filters per user


The system administrator defines access filters, per dataset, for a given user or group of users. The filters are applied transparently each time the system is queried by the user. The added value of filters is supported by the following features:

  1. Access rights add little overhead: they are applied by the system while a query is being performed, not after the query has been processed, which increases efficiency.
  2. Access rights are manageable and do not affect system efficiency. Filters set on calculated columns automatically adjust the set of accessible objects as the data changes, and filters can be set on columns that the user cannot access.
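
The three privilege levels and the filter-during-query behavior can be sketched as below. This is a simplified illustration, not DataWalk's implementation: the `Permissions` class, `run_query` function, and sample data are all hypothetical. The access filter is evaluated in the same pass as the user's own condition, and it may reference a column (`region`) that the user is not allowed to see.

```python
from dataclasses import dataclass, field


@dataclass
class Permissions:
    allowed_sets: set                                       # level 1: object sets
    hidden_attributes: dict = field(default_factory=dict)   # level 2: set -> hidden columns
    access_filters: dict = field(default_factory=dict)      # level 3: set -> row predicate


def run_query(data, perms, set_name, predicate=lambda row: True):
    """Run a filtered scan, enforcing permissions during the scan itself."""
    if set_name not in perms.allowed_sets:
        raise PermissionError(f"no access to object set {set_name!r}")
    hidden = perms.hidden_attributes.get(set_name, set())
    access_filter = perms.access_filters.get(set_name, lambda row: True)
    results = []
    for row in data[set_name]:
        # The access filter sees the full row, including hidden columns.
        if not access_filter(row):
            continue
        # The user's predicate only ever sees the visible columns.
        visible = {k: v for k, v in row.items() if k not in hidden}
        if predicate(visible):
            results.append(visible)
    return results


# Hypothetical data and user.
data = {
    "accounts": [
        {"id": 1, "region": "EU", "balance": 500, "owner": "Ann"},
        {"id": 2, "region": "US", "balance": 900, "owner": "Bob"},
        {"id": 3, "region": "EU", "balance": 50, "owner": "Cy"},
    ],
    "cases": [{"id": 10, "status": "open"}],
}
analyst = Permissions(
    allowed_sets={"accounts"},
    hidden_attributes={"accounts": {"region"}},
    access_filters={"accounts": lambda row: row["region"] == "EU"},
)
```

With this setup, `run_query(data, analyst, "accounts", lambda r: r["balance"] > 100)` returns only the EU account with a balance above 100, without the `region` column, and a query against `"cases"` raises `PermissionError`.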

DataWalk Search: Optimized For Vast Amounts Of Data

The DataWalk search facility is optimized to meet the challenge of providing useful results when searching vast amounts of data. DataWalk supports a targeted search facility that enables you to configure the specific fields you want to search, across any of your desired data sources and for any type of entity, delivering results that are far more precise.
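
A minimal sketch of the targeted-search idea follows, assuming a per-entity-type list of searchable fields. The configuration format, function name, and sample records are hypothetical and do not reflect DataWalk's actual search configuration.

```python
# Hypothetical configuration: which fields are searched for each entity type.
search_config = {
    "person": ["name", "alias"],
    "vehicle": ["plate"],
}

records = [
    {"id": "p1", "type": "person", "name": "Ada Stone", "alias": "Stony",
     "notes": "seen near Mill Rd"},
    {"id": "p2", "type": "person", "name": "Ben Mill", "alias": "",
     "notes": "friend of Ada Stone"},
    {"id": "v1", "type": "vehicle", "plate": "STONE-1",
     "notes": "registered to p1"},
]


def targeted_search(records, term, config):
    """Return (record_id, field) for the first configured field that matches.

    Only fields listed in `config` for the record's type are searched,
    so free-text fields such as `notes` cannot produce incidental hits.
    """
    term = term.casefold()
    hits = []
    for rec in records:
        for fld in config.get(rec["type"], []):
            value = rec.get(fld)
            if isinstance(value, str) and term in value.casefold():
                hits.append((rec["id"], fld))
                break  # one hit per record is enough
    return hits
```

Searching for "stone" matches the person named Ada Stone and the vehicle plate, but not the second person, whose only mention of "Stone" is in an unsearched notes field. That exclusion is what makes the result set narrower than an all-fields search.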

