Data mining generally refers to using advanced techniques, including machine learning, statistical analysis, and querying, to find patterns and anomalies in large volumes of data. DataWalk is a full-stack solution for data analysis, with a robust set of features for data mining:
Scalability: A baseline capability for data mining is the ability to process large volumes of data, and DataWalk is architected specifically to process vast amounts of structured and unstructured data. DataWalk uses a scale-out architecture, so any additional processing power and capacity that data mining requires can be made available simply by adding more servers to the pool.
Saved queries/workflows: Advanced querying capabilities are a fundamental aspect of data mining, and in DataWalk you can easily create complex visual queries and workflows and save them for re-use. In other data mining systems, there may be a risk of generating queries that simply cannot be executed because they include a large number of database joins. DataWalk’s unique technology enables response time to track linearly with the number of joins, which ensures that complex queries run as part of data mining can always be executed.
Support for Machine Learning models and Neural Networks: DataWalk is an exceptional platform for supporting machine learning and neural networks, which are commonly used in data mining. DataWalk provides an end-to-end solution to serve, maintain, and retrain the models you create, so that you can easily monitor and tune them throughout their life cycle. You can do all data preparation in DataWalk, creating an input vector for ML models, then train the model on data that resides in the DataWalk repository and serve the model with that same data. The results can be stored in DataWalk and used to derive new discoveries.
Statistical Analysis: Everything described above for machine learning also applies to statistical analysis performed in DataWalk as part of data mining activities. DataWalk currently provides Python and R libraries, though any other tool can be used as well.
Graph Algorithms: DataWalk includes facilities for exceptionally fast execution of graph algorithms that are often used in data mining. Compared to the fastest graph databases, DataWalk provides dramatically faster execution across large graphs for algorithms such as “find paths”.
Clustering: One method of data mining is the automatic identification of clusters. In DataWalk you can easily specify cluster characteristics of interest, and the system will scan your data (on a regular schedule, if desired) and automatically identify cluster patterns.
Rules: Identifying patterns is a key element of data mining, and DataWalk lets you quickly generate, save, retrieve, and modify rules via a simple visual interface.
Scoring: DataWalk provides a scoring mechanism where you can quickly generate or modify scores across any number of queries, statistical analyses, and machine learning modules. You can score any object (e.g., people, properties, activities, or locations) to effectively spot patterns across all your data during the data mining process.
Interoperability with other data mining solutions: DataWalk is an open system that is built to integrate and interoperate easily with other data mining tools such as DataRobot, NetOwl, RapidMiner, Rosoka, and many others.
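To make the rules and scoring ideas above more concrete, here is a minimal, generic Python sketch of rule-based scoring. It is not DataWalk's API or implementation: the `Rule` class, the `score` and `rank` functions, the rule names, the weights, and the sample records are all hypothetical, chosen only to illustrate how weighted rules can be combined into a single score per object.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical record type: one object (person, property, ...) as a dict.
Record = Dict[str, Any]


@dataclass
class Rule:
    """A named condition that adds `weight` to the score when it matches."""
    name: str
    predicate: Callable[[Record], bool]
    weight: float


def score(record: Record, rules: List[Rule]) -> float:
    """Sum the weights of all rules whose predicate matches the record."""
    return sum((rule.weight for rule in rules if rule.predicate(record)), 0.0)


def rank(records: List[Record], rules: List[Rule]) -> List[Tuple[Record, float]]:
    """Return (record, score) pairs sorted from highest to lowest score."""
    scored = [(record, score(record, rules)) for record in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative rules and weights (assumptions, not taken from DataWalk).
RULES = [
    Rule("many_transactions", lambda r: r["transactions"] > 100, 2.0),
    Rule("new_account", lambda r: r["account_age_days"] < 30, 1.5),
    Rule("shared_address", lambda r: r["shares_address"], 1.0),
]

# Illustrative sample data.
PEOPLE = [
    {"id": "p2", "transactions": 20, "account_age_days": 400, "shares_address": True},
    {"id": "p1", "transactions": 150, "account_age_days": 10, "shares_address": False},
    {"id": "p3", "transactions": 5, "account_age_days": 900, "shares_address": False},
]

if __name__ == "__main__":
    for record, total in rank(PEOPLE, RULES):
        print(record["id"], total)
```

In this sketch, each rule stays independent of the others, so a rule can be added, removed, or reweighted without touching the scoring logic. This loosely mirrors the idea of saving and modifying rules and scores separately.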