One of the essential needs of Enterprise-class analytic tools is the ability to scale to large volumes of data.
DataWalk technology uses a horizontally scalable architecture for storing and processing data, with a highly standardized, universal physical data model. This universal physical model never changes, and it translates any set of business queries into a finite set of queries that execute with exceptional efficiency. This is made possible by combining characteristics of graph databases (though DataWalk does not actually use a graph database) with the efficiency and scalability of Massively Parallel Processing (MPP).
A unique data storage solution enables flexible information management with the high efficiency required for deploying enterprise-class analytical environments. PiLab’s “Abstraction Engine” technology allows users to pose queries in business terminology, without using SQL or other programming languages. Thanks to this technology, fast, complex, multi-dimensional analyses can be run on large data sets containing billions of records.
PiLab’s FlexStructure technology is what enables the physical structure of DataWalk data to remain unchanged, regardless of the type of implementation, business logic or objects involved. Thus, the logical data model can be flexibly modified on the fly, with no need to change the physical model or interrupt system operation. Because the DataWalk structure is highly standardized, data can be evenly distributed across many nodes, which in turn spreads the computing power needed to get answers quickly evenly across those nodes. With DataWalk, the cost of changing the model of business objects is so low that users can experiment with the logical model and modify it freely in real time (e.g., creating new connections, modifying existing ones, or adding new sources and object descriptions).
DataWalk API – Communication
The DataWalk system is built in Java. Each component of the application exposes its own REST service, so data and analyses produced in DataWalk can easily be made available to other applications. For example, data can be exported to R for statistical or predictive analytics.
Connectivity to data is one of the vital elements of the DataWalk technology. To retrieve information from DataWalk, customers can use RESTful access as well as JDBC and ODBC.
DataWalk Universe Viewer
The Universe Viewer, based on patented technology, provides an aggregated view of imported data sources via a graphical interface. It presents the meaning, topology and catalogs of the data to the end user in a unique way: no additional work is necessary, and both the queries and the answers received can be interpreted very easily. The Universe Viewer also visualizes data and the connections between data sets. It combines a flexible analytical data model with a fully operational analytical environment in which users can directly perform complex analytical queries (analyses, hypotheses) without using SQL, scripts or additional programs. The Universe Viewer allows easy identification of complex relationships between data, as well as immediate filtering of large data sets that are n steps apart (directly or indirectly connected). Combining business modeling methods with data discovery and blending makes it easy to iterate on both structures and analyses. As a consequence, the integration process is several hundred times faster, and the cost of making mistakes is dramatically reduced, compared to traditional systems.
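Filtering of n-distant objects can be pictured as a bounded breadth-first traversal over stored connections. The sketch below is only an illustration of that idea, not DataWalk's implementation: the function name `within_n_hops`, the object identifiers and the representation of links as pairs are all assumptions made for the example.

```python
from collections import deque


def within_n_hops(links, start, n):
    """Return {object: distance} for every object reachable from
    `start` by following at most `n` links (hypothetical helper)."""
    # Build an undirected adjacency map from the list of link pairs.
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == n:
            continue  # do not expand beyond the requested depth
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


# Illustrative data: a chain customer -> contract -> policy -> incident.
links = [
    ("customer:1", "contract:7"),
    ("contract:7", "policy:3"),
    ("policy:3", "incident:9"),
]
reachable = within_n_hops(links, "customer:1", 2)
# "incident:9" is three links away, so it is not part of the result.
```

Objects at distance 1 are directly connected; anything at a greater distance is indirectly connected through intermediate objects.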
DataWalk Link Generator
The DataWalk Link Generator calculates and stores connections on the basis of defined rules so that the system will not have to compute them each time an analysis is performed. The Link Generator enables complicated analyses to be quickly executed, based on advanced connection rules.
Instead of fixed tables and predesigned, preprogrammed analytical flows, DataWalk has a built-in mechanism for flexibly connecting data in a logical layer. When something changes, the entire analytic process continues without the need for programming or interrupting system operation. A link stores precomputed information on relationships between objects. Links can be generated based on simple rules (e.g., Field A = Field B), or on very advanced business rules, which make it possible to connect data even in the absence of a primary key/foreign key relationship. Links can be aggregated on the fly to create composite connections. As a result, more analyses can be carried out on the same data set, so there is no need to create additional data sets.
An ontology is added on the fly. Because data exploration is visual, it is the end user who decides on the meaning of particular objects, descriptive attributes and links. This enables data of different types and structures, and from many sources, to be integrated into one cohesive picture that reflects the natural, human perception of information.
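As a rough illustration of precomputing links from a simple rule such as Field A = Field B, the sketch below materializes links with a hash join and then composes two link sets into a composite connection. It is written under stated assumptions: the function names `generate_links` and `compose_links`, the `id` key and the sample records are hypothetical and do not come from DataWalk.

```python
from collections import defaultdict


def generate_links(left_rows, right_rows, left_field, right_field):
    """Materialize links for the rule left_field = right_field.
    Returns a list of (left id, right id) pairs."""
    # Index the right-hand set once so each left row is matched by lookup.
    index = defaultdict(list)
    for row in right_rows:
        value = row.get(right_field)
        if value is not None:
            index[value].append(row["id"])

    links = []
    for row in left_rows:
        for right_id in index.get(row.get(left_field), []):
            links.append((row["id"], right_id))
    return links


def compose_links(first, second):
    """Combine A-B links and B-C links into composite A-C links."""
    by_source = defaultdict(set)
    for b, c in second:
        by_source[b].add(c)
    return sorted({(a, c) for a, b in first for c in by_source.get(b, ())})


# Illustrative data: customers linked to calls by phone number,
# calls linked to cell towers by an explicit pair list.
customers = [{"id": "c1", "phone": "555-0100"}, {"id": "c2", "phone": "555-0199"}]
calls = [
    {"id": "k1", "caller": "555-0100"},
    {"id": "k2", "caller": "555-0100"},
    {"id": "k3", "caller": "555-0142"},
]
stored_links = generate_links(customers, calls, "phone", "caller")
call_to_tower = [("k1", "t9"), ("k2", "t9"), ("k3", "t4")]
customer_to_tower = compose_links(stored_links, call_to_tower)
```

Because `stored_links` is computed once and kept, later analyses can reuse it instead of re-evaluating the rule, which is the point the text makes about the Link Generator.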
DataWalk Drop Folders
Each time a new structure is designed in DataWalk, the system creates specialized folders, called “Drop Folders”, which accept CSV files. When a CSV file appears in a newly created folder, DataWalk automatically maps its column headings to the created structure, imports the data, and calculates connections between the imported records. In addition, the system has a built-in “Add Excel” function for adding new data from .xlsx files and connecting it to the analytical environment within a few seconds, in order to carry out an analysis or add new filters or conditions to existing analyses. Data is ready to use as soon as it is added, so hypotheses can be verified immediately by extending the analytical context with the new data.
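The header-mapping step can be sketched as follows. This is a minimal illustration that assumes headings are matched to structure fields by ignoring case, spaces and punctuation; the actual matching rules used by DataWalk are not described here, and `map_headers`, `import_csv` and the sample fields are hypothetical names. The CSV content is read from a string rather than a watched folder to keep the example self-contained.

```python
import csv
import io


def map_headers(csv_headers, structure_fields):
    """Map CSV column headings to structure field names,
    comparing only lowercase letters and digits (assumed rule)."""
    def normalize(name):
        return "".join(ch for ch in name.lower() if ch.isalnum())

    by_normalized = {normalize(field): field for field in structure_fields}
    return {
        header: by_normalized[normalize(header)]
        for header in csv_headers
        if normalize(header) in by_normalized
    }


def import_csv(text, structure_fields):
    """Parse CSV text and return rows keyed by structure field names.
    Columns with no matching field are dropped."""
    reader = csv.DictReader(io.StringIO(text))
    mapping = map_headers(reader.fieldnames or [], structure_fields)
    return [
        {mapping[col]: value for col, value in row.items() if col in mapping}
        for row in reader
    ]


sample = "Customer ID,Full Name,Notes\n17,Ada Lovelace,vip\n"
rows = import_csv(sample, ["customer_id", "full_name"])
# rows == [{"customer_id": "17", "full_name": "Ada Lovelace"}]
```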
DataWalk Predicate Permissions – access control
One of the biggest challenges in storing and querying data is ensuring that the data and the results of system processing are consistent with user privileges. DataWalk addresses this challenge by means of three levels of privileges:
- access to sets of objects per user
- access to an attribute of an object per user
- access to an object by means of access predicates per user
The system administrator can define, for each set, a range of filters associated with a given user. The filters are applied transparently each time the user queries the system. This approach has the following advantages:
- the rights are inexpensive to enforce; they are resolved by the system while the query is being performed, not after it has been processed, which increases efficiency and reduces load,
- the rights are manageable and do not affect system efficiency,
- predicates set on calculated columns allow rights to objects to change dynamically as the data changes,
- predicates can be set on columns which cannot be accessed by a user.
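The idea of resolving access predicates while a query runs, including predicates on columns the user cannot see, can be sketched as below. This is an illustrative model only, assuming predicates are plain Python callables; the function `run_query` and the sample data are hypothetical and not part of DataWalk.

```python
def run_query(rows, access_predicates, user_filter, visible_columns):
    """Evaluate the user's access predicates during the scan,
    then return only the columns the user is allowed to see."""
    result = []
    for row in rows:
        # Rows outside the user's rights are skipped before the
        # user's own filter is evaluated, not removed afterwards.
        if not all(predicate(row) for predicate in access_predicates):
            continue
        if user_filter(row):
            result.append({col: row[col] for col in visible_columns})
    return result


contracts = [
    {"id": 1, "region": "EU", "value": 100},
    {"id": 2, "region": "US", "value": 250},
    {"id": 3, "region": "EU", "value": 900},
]

# The predicate uses "region", a column that is not visible to this user.
analyst_predicates = [lambda row: row["region"] == "EU"]
analyst_columns = ["id", "value"]

visible_rows = run_query(
    contracts, analyst_predicates, lambda row: row["value"] > 50, analyst_columns
)
# visible_rows == [{"id": 1, "value": 100}, {"id": 3, "value": 900}]
```

Because the predicate is evaluated against the current row each time, a change in the underlying data (for example, a contract moving to another region) changes what the user can see without any change to the rights themselves.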
DataWalk LDI – adding data in real time
DataWalk Live Data Insertion (LDI) enables the system to acquire data on the fly, without degrading its analytical capabilities. The system can handle processes that require a constant data upload without interrupting system operations (i.e., there is no need for maintenance windows).
DataWalk Object Search – a search engine
A built-in search engine enables a user to quickly find an object of interest across various data sets, e.g., a customer, a contract, a facility, an insurance policy or an incident. The solution also shows which set each result comes from: search results are presented in a separate tab for each set and sorted by how closely they match the search term. A user can move directly from the object search to the analysis of the chosen object.
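A search that groups results by set and ranks them by degree of match can be sketched as follows. The scoring method here is an assumption chosen for illustration (string similarity from Python's standard `difflib`), not the ranking DataWalk uses; `search_objects`, `min_score` and the sample sets are hypothetical.

```python
from difflib import SequenceMatcher


def search_objects(datasets, term, min_score=0.5):
    """Return {set name: [(score, object), ...]}, one entry per set
    with matches, each list sorted by descending similarity score."""
    term = term.lower()
    results = {}
    for set_name, objects in datasets.items():
        scored = []
        for obj in objects:
            # Score an object by its best-matching attribute value.
            score = max(
                (SequenceMatcher(None, term, str(value).lower()).ratio()
                 for value in obj.values()),
                default=0.0,
            )
            if score >= min_score:
                scored.append((score, obj))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if scored:
            results[set_name] = scored
    return results


datasets = {
    "customers": [{"name": "Joan Smythe"}, {"name": "John Smith"}],
    "contracts": [{"title": "Lease agreement"}],
}
hits = search_objects(datasets, "john smith")
best_customer = hits["customers"][0][1]
```

Each key of `hits` corresponds to one tab in the description above, and the first entry in each list is the closest match, from which a user would continue to the analysis of that object.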