Frequently Asked Questions
Select your job role below to get answers to the questions that matter to you
For Analysts and Analyst Team Leaders
In general, DataWalk is for teams:
- Doing intelligence analysis, fraud detection/investigation, AML alerting, and KYC
- With multiple power-user analysts
- That need to analyze data spanning many systems/sources
- That value a collaborative, enterprise approach to their analysis system
- DataWalk is a server-based system intended to support multiple users and use cases.
- DataWalk can support separate application environments for production and development/test.
- DataWalk can support complex integration of both technology and business processes, and fit into Enterprise environments and workflows.
- DataWalk supports an optional high availability ("HA") configuration.
- Enterprise environments may have more data sources, and those sources may be more complex.
- Enterprise systems often need to scale to large numbers of users and vast amounts of storage.
For Data Science Leaders and Data Scientists
DataWalk is an enterprise-class software platform for graph and advanced analytics. DataWalk enables you to fully understand your data and produce better data science products by rapidly connecting numerous large data sets into a unified knowledge graph for fast data access, analysis and investigation.
DataWalk can be thought of as a feature store that enables you to easily share, understand, and reuse features. Features can be tabular, graph-based, or both, which is a particularly powerful DataWalk capability. Features can be created from data at scale, and you can apply ML to vast amounts of data stored in DataWalk without moving it.
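To make the idea of combining tabular and graph features concrete, here is a minimal, generic Python sketch. It is not DataWalk's API: the `accounts` table, the `transfers` links, and the `build_features` helper are hypothetical, chosen only to show one tabular feature (a balance) sitting next to two graph-derived features (degree and mean neighbor balance) in the same feature row.

```python
from collections import defaultdict

# Hypothetical tabular data: one row per account (illustrative values only).
accounts = {
    "A": {"balance": 1200.0},
    "B": {"balance": 50.0},
    "C": {"balance": 980.0},
    "D": {"balance": 10.0},
}

# Hypothetical links between accounts, e.g. transfers (treated as undirected).
transfers = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]


def build_features(accounts, links):
    """Combine a tabular feature (balance) with graph features
    (degree and mean neighbor balance) into one row per account."""
    # Build an adjacency map from the link list.
    neighbors = defaultdict(set)
    for u, v in links:
        neighbors[u].add(v)
        neighbors[v].add(u)

    features = {}
    for acct, row in accounts.items():
        nbrs = neighbors[acct]
        mean_nbr_balance = (
            sum(accounts[n]["balance"] for n in nbrs) / len(nbrs) if nbrs else 0.0
        )
        features[acct] = {
            "balance": row["balance"],                  # tabular feature
            "degree": len(nbrs),                        # graph feature
            "mean_neighbor_balance": mean_nbr_balance,  # graph feature
        }
    return features


for acct, feats in build_features(accounts, transfers).items():
    print(acct, feats)
```

Here account C, for example, gets a degree of 3 and a mean neighbor balance of 420.0, values that cannot be read from its own table row alone.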
Yes, DataWalk automates the machine learning pipeline at various stages:
- DataWalk ensures data consistency, with no manual actions required when new data is introduced to the system.
- A Kedro starter guides data scientists to follow a predefined project structure for creating both training and inference pipelines.
- Once the best-performing model is selected, it can be quickly deployed in production without re-coding.
- DataWalk ML/AI feedback is instantaneous and based on real data. Once your models are deployed, feedback is captured automatically.
- Data access control is provided continuously.
- Models can be retrained automatically with new data and then deployed in production.
Yes. DataWalk enables you to access any data source with a defined interface. Many connectors are currently available, and new connectors can typically be configured in a couple of hours by DataWalk field engineers. You can then quickly link all of the data into a business representation through the knowledge graph capability. In addition, DataWalk embeds AI technology to automatically complete, clean, overwrite, fix, and enrich data so that it can quickly be used for further analysis.
With DataWalk, you do not need to maintain dozens of ETL tasks built from hundreds of lines of code. DataWalk is designed to support an entire enterprise data processing pipeline and to automate it optimally, eliminating the need for further optimization, whether for feature generation or any other data manipulation.
DataWalk accelerates Machine Learning across the entire lifecycle and delivers better results by:
- Enabling ML for vast amounts of data by eliminating data movement
- Enabling the end-to-end machine learning pipeline in a single platform
- Deriving new insights from relationships (context enabled by graph)
- Unifying data governance, analytics, and AI
- Automating the data engineering pipeline
- Enabling collaboration at the enterprise level
- Providing an instant feedback loop
For IT
DataWalk is proven in highly secure environments. DataWalk has Authority to Operate (ATO) in institutions with a Top Secret data security level, as well as in federal networks of the DoD, DOJ, and others.
Security requirements vary by organization, and the DataWalk team will work to ensure that our solution meets your security requirements.
For Architects and Engineering Managers
DataWalk:
- Enables graph-based data exploration on multi-TB data sets.
- Can effectively and economically analyze huge graphs.
- Is uniquely suited to calculating complex clusters and automatically discovering hidden/distant relationships.
- Enables you to easily generate, understand, and quickly run complex queries that traverse multiple data sets.
- Typically enables you to apply changes to the ontology, and make them actionable, without modifying upstream processes.
- Can rapidly execute flow analytics on vast amounts of data.
- Ensures that any request executed via the DataWalk API is horizontally scalable and highly efficient.
- Delivers reliable results in dynamic environments where new data is continually being added.
- Delivers the above capabilities with a dramatic reduction in hardware requirements compared to solutions that operate using in-memory analytics.
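As a generic illustration of cluster calculation over relationships, the sketch below groups linked entities into connected components using union-find. This is a standard textbook technique chosen for illustration; it is not a description of how DataWalk computes clusters internally, and the `connected_components` function and the sample node names are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into clusters (connected components) using union-find.

    One pass over the edges; with path compression (here, path halving)
    and union by size, the total cost is near-linear in nodes plus edges."""
    parent = {n: n for n in nodes}
    size = {n: 1 for n in nodes}

    def find(x):
        # Path halving: point each visited node at its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra == rb:
            return
        # Attach the smaller tree under the larger one.
        if size[ra] < size[rb]:
            ra, rb = rb, ra
        parent[rb] = ra
        size[ra] += size[rb]

    for u, v in edges:
        union(u, v)

    # Collect members under their root representative.
    clusters = {}
    for n in nodes:
        clusters.setdefault(find(n), []).append(n)
    return list(clusters.values())


people = ["p1", "p2", "p3", "p4", "p5"]
links = [("p1", "p2"), ("p2", "p3"), ("p4", "p5")]
print(connected_components(people, links))  # [['p1', 'p2', 'p3'], ['p4', 'p5']]
```

Note that p1 and p3 land in the same cluster even though they are never linked directly; the connection runs through p2, which is the kind of distant relationship described above.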
- Graph databases require that you write an application (using a graph query language) in order to solve relationship-oriented analytic challenges. In contrast, DataWalk is a full-stack solution where coding is not required.
- Graph databases are excellent for OLTP applications, while DataWalk is far superior for complex analytical workloads. Pivots cannot be executed in graph databases.
- DataWalk is optimized to do batch analysis of nodes and edges with minimal (linear) complexity.
- Graph databases typically run in-memory, but DataWalk does not require this.
- A graph database lacks the semantics to summarize data along attributes, while DataWalk provides many functions for doing so.
- DataWalk offers faster access and analysis when operating on sets of objects and links.
- DataWalk is far superior for graph traversal.
- DataWalk generates connections automatically.
- DataWalk has consistency checks, which graph databases lack.
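The set-based style of analysis described above can be sketched generically. Here we assume a "pivot" means moving from a set of objects to the set of objects linked to them; this reading, the `pivot` and `summarize` helpers, and the sample data are all hypothetical illustrations, not DataWalk's API. The pivot is a single linear pass over the links, and the summary is a simple group-by on an attribute.

```python
from collections import Counter

# Hypothetical link table: (person, account) ownership links.
owns = [
    ("alice", "acc1"), ("alice", "acc2"),
    ("bob", "acc3"), ("carol", "acc4"),
]
# Hypothetical account attribute: the bank holding each account.
account_bank = {"acc1": "BankX", "acc2": "BankY", "acc3": "BankX", "acc4": "BankZ"}


def pivot(source_set, links):
    """Return every target linked to any member of source_set.

    One pass over the links, O(len(links)) on average, because set
    membership checks are O(1) on average."""
    return {dst for src, dst in links if src in source_set}


def summarize(objects, attribute):
    """Count objects per attribute value (a simple group-by)."""
    return Counter(attribute[o] for o in objects)


flagged = {"alice", "bob"}
linked_accounts = pivot(flagged, owns)
print(sorted(linked_accounts))  # ['acc1', 'acc2', 'acc3']
print(dict(sorted(summarize(linked_accounts, account_bank).items())))  # {'BankX': 2, 'BankY': 1}
```

Because the whole set of flagged people is pivoted at once, the cost does not grow with one traversal per starting object.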