Stephen Arnold of Beyond Search sat down with DataWalk’s Chief Analytics Officer Chris Westphal to get an update on the DataWalk product.

Steve: This is Stephen E. Arnold. DataWalk is a next-generation policeware and intelware system. The company’s approach makes the DataWalk technology platform attractive to fraud investigators as well as commercial firms wanting to extract value from their data. DataWalk is an analytics system that allows a user to fuse data, detect issues, and perform investigations quickly, often in a fraction of the time required by users of Palantir Technologies or IBM i2 Analyst’s Notebook. DataWalk, founded in 2016, has offices in Redwood City, California. Today I am speaking with Chris Westphal, who is the company’s Chief Analytics Officer. Chris previously founded Visual Analytics and sold that company to Raytheon. DataWalk is now a leader in presenting insights from the firm’s platform in a way that eliminates the confusion and delays associated with investigative and intelligence systems. Thank you for agreeing to speak with me.

Chris: Thank you, Steve.

Steve: High-profile policeware and intelware systems like IBM i2 Analyst’s Notebook and Palantir-type systems are criticized because the systems are hard to use. What has DataWalk done to make the investigative and analysis processes less onerous?

Chris: First, we have our icon interface: you drag and drop new data sets and model their content after loading.

Second, we have microservices and a growing App Center of third-party providers’ libraries, content, access points, and advanced functionality; for example, Dark Web content from providers like ShadowDragon.

Third, an investigator can select a data set and perform no-code ad hoc queries, create complex workflows, generate risk scores, and have rapid access to those services or data. That means better and faster results on large amounts of data.

Steve: My understanding is that i2 and Palantir-type systems require the user to go through data-loading hoops, which takes time?

Chris: Correct… these two platforms are built on older frameworks and concepts, and both use proprietary formats. Users have to “map” their data into a pre-defined ontology. This means a user has to transform the raw data to make sure everything fits into the platform’s data structures. This approach locks users into the platform. The time and expense of getting data “out” of these systems can be high. Just ask the New York Police Department and other investigative teams; they’ve found it’s time-consuming, expensive, and inflexible.

Steve: Mapping content, sometimes called tagging or flagging, is a key function. How fine-grained is the DataWalk mapping process?

Chris: Users often think in terms of signals or flags. Some are subtle; for example, whether a person of interest is alive or dead, is associated with a gang, has been arrested for violent crimes, owns a weapon, has been incarcerated, and so on. DataWalk operates at this level of detail. And in DataWalk, we configure graphical glyphs to mark an object for specific content to let the user know “hey, more info available here.” The user simply clicks and sees the deeper data. We like our customers to work smarter.

Steve: Data today arrives in gigabyte and terabyte blocks. What’s DataWalk’s real-world data capacity?

Chris: We are doing proofs of concept (POCs) with 25 terabytes of data, but the system scales far beyond that. We’ve even architected petabyte-scale systems.

Steve: A recent article talked about a data fabric. What’s DataWalk’s approach to big data?

Chris: I saw the article and the phrase keeps turning up. There’s even a Data Fabric for Dummies book. The “fabric” idea is that a framework (an architecture) combines and controls content (data services) to help organizations make use of their information across different application end-points (of which DataWalk is one). That’s a core capability of DataWalk, basically a single pane of glass. We provide user interfaces and APIs to create different data flows and content based on a user’s role. Some users need to do a basic query; others have to do proactive pattern analysis. Users are handled coherently by the DataWalk data fabric across permissible sources.

Steve: A data fabric implies integrating many types of data from different systems. Can you give me an overview of the data intake capabilities of DataWalk?

Chris: Sure. Let me highlight several interfaces. First, we have SQL-compliant database drivers. Second, the system handles flat files like spreadsheets and delimited-value content. Third, we offer a data parsing framework to handle special formats such as XML, JSON, UFED, and others. Fourth, our “App Center” is where DataWalk clients and partners can create specialized interfaces via our APIs; the sky’s the limit.

Steve: What data providers are using your “App Center”?

Chris: Off the top of my head, we are working on building connections to LexisNexis, Thomson Reuters, Whooster, PIPL, machine learning libraries, open-source exploitation vendors, some image and geocoding companies, natural language processing providers, and case management systems. The main point is that DataWalk is engineered to be future-safe.

Steve: I think Palantir-type systems often include a pre-structured organizational setup for content. What’s DataWalk’s approach?

Chris: That’s a good point of comparison: the rigidity of competing systems. DataWalk does not have a fixed ontology. Every source, every format, every layout can be accommodated, and then, using ELT (extract, load, transform), the appropriate analytical models are created post-load. It’s easy to change and try different approaches.
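
To make the ELT idea concrete, here is a minimal sketch of the general pattern, not DataWalk's implementation. It assumes SQLite as a stand-in store; the table names, view name, and sample rows are all hypothetical. Each source is loaded in its own layout, and the analytical model is defined only afterwards as a view that can be dropped and redefined.

```python
import sqlite3

# Hypothetical raw records from two differently shaped sources.
raw_calls = [
    ("555-0101", "555-0199", "2021-03-01"),
    ("555-0101", "555-0142", "2021-03-02"),
]
raw_people = [("555-0101", "J. Doe"), ("555-0199", "A. Smith")]


def build_elt_demo():
    """Load raw data first, then define the model on top of it."""
    conn = sqlite3.connect(":memory:")

    # Extract + Load: land each source as-is, with no shared ontology.
    conn.execute("CREATE TABLE raw_calls (caller TEXT, callee TEXT, call_date TEXT)")
    conn.execute("CREATE TABLE raw_people (phone TEXT, name TEXT)")
    conn.executemany("INSERT INTO raw_calls VALUES (?, ?, ?)", raw_calls)
    conn.executemany("INSERT INTO raw_people VALUES (?, ?)", raw_people)

    # Transform: the model is a view created post-load, so it can be
    # replaced without reloading or reshaping the raw tables.
    conn.execute(
        """
        CREATE VIEW call_links AS
        SELECT c.caller, p.name AS callee_name, c.call_date
        FROM raw_calls AS c
        LEFT JOIN raw_people AS p ON p.phone = c.callee
        """
    )
    return conn


conn = build_elt_demo()
rows = conn.execute(
    "SELECT caller, callee_name, call_date FROM call_links ORDER BY call_date"
).fetchall()
```

Trying a different approach here means dropping `call_links` and defining another view; the loaded tables stay untouched.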

Steve: So DataWalk is a data blender?

Chris: I like to think of DataWalk as a next-generation “data router.” Our platform can access content, add value to that content, and flow the content to other systems as needed.

As discussed earlier, it is an open data fabric framework, an architecture with microservices.

Steve: Can you give me an example?

Chris: A user can load some data from different systems, quickly identify target entities, pull out a remarks or comment field, and send that to a natural language processing function. The NLP (like Rosoka) extracts entity names and brings those back into DataWalk, where we check those names against active watchlists. Once the data router or fabric is set up and configured, it does what it does with the click of an icon.
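
The flow described above (pull a remarks field, extract names, check them against a watchlist) can be sketched roughly as follows. This is an illustration under stated assumptions, not DataWalk or Rosoka code: `extract_names` is a naive regular-expression stand-in for a real NLP entity extractor, and the records and watchlist are hypothetical.

```python
import re

# Hypothetical watchlist, stored lowercased for case-insensitive matching.
WATCHLIST = {"john doe", "maria lopez"}


def extract_names(text):
    """Naive stand-in for an NLP service: return pairs of capitalized words."""
    return re.findall(r"\b[A-Z][a-z]+ [A-Z][a-z]+\b", text)


def screen_records(records, watchlist=WATCHLIST):
    """Return (record id, name) for every extracted name on the watchlist."""
    hits = []
    for record in records:
        for name in extract_names(record["remarks"]):
            if name.lower() in watchlist:
                hits.append((record["id"], name))
    return hits


# Hypothetical records with a free-text remarks field.
records = [
    {"id": 1, "remarks": "Subject was seen with John Doe near the warehouse."},
    {"id": 2, "remarks": "Vehicle registered to Alice Nguyen."},
]
hits = screen_records(records)
```

In a real deployment the regular expression would be replaced by a call to the NLP provider, and the watchlist match would likely need fuzzy matching for spelling variants.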

Steve: What about TikTok and other types of rich media content?

Chris: DataWalk can handle most types of content. TikTok or any other platform can be processed to pull metadata out of the content. By coupling directly with the platform’s APIs or using something like ShadowDragon, DataWalk can access titles, creator names, comments, and dates, plus other information returned from the API call.

For example, DataWalk can extract specific items of data from the source, then make a call to an ML video detection library or an NLP system to detect objects like weapons or vehicles, recognize entities, transcribe audio to text, or perform foreign language translation.

Steve: You mentioned mobile phone data. How can DataWalk contribute to an investigation with the information from these devices?

Chris: One of our clients had boxes of seized mobile phones. The extracted data (phone calls) were processed through DataWalk’s interface and connected to thousands of images, which were sent to a machine learning system to answer requests like “show me only pictures with money” or “flag images with nudity.”

Steve: So you are saying that DataWalk has no investigative glass ceiling.

Chris: Correct. DataWalk can incorporate any needed libraries and logically assimilate their capabilities into our workflows and analytical processes. This is the breakthrough idea behind the data router/fabric functionality.

Steve: So DataWalk is an investigative data orchestra conductor.

Chris: You have the main idea. Plus, if you think about silos of data in some large organizations, maybe a medical research facility or a government agency, then DataWalk acts as the integrating framework where different silos are combined, different federated queries (via the App Center) are initiated, and other (non-siloed) data are integrated, all managed with proper security.

Steve: Can you give me an example?

Chris: A good example of this is the ARCOS data, a DEA data source on controlled substances that reports oxycodone transactions from manufacturers and distributors to doctors and pharmacies. DataWalk made it possible to identify, in minutes, high-volume pill distribution in sparsely populated areas. Why is pharmacy X in a town of 2,500 people ordering in excess of 50,000 pills?

Steve: What hoops did you have to jump through to identify this result?

Chris: We simply used a Census dataset, connected it through the ZIP code from the address of the point-of-sale entity, and we quickly had our answers. We’ve been doing this on a much larger scale across multiple silos for a variety of different agencies. Other models included open-source crime reports, supplier/distributor registrations, and entities with prior actions like prosecutions.
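
The join Chris describes can be sketched in a few lines. This is a simplified illustration, not the actual DataWalk model: the records, ZIP codes, and the per-capita threshold are hypothetical, and each pharmacy is assumed to have already been aggregated to a single pill total.

```python
# Hypothetical aggregated order totals keyed by point-of-sale ZIP code.
pharmacy_orders = [
    {"pharmacy": "Pharmacy X", "zip": "00001", "pills": 50000},
    {"pharmacy": "Pharmacy Y", "zip": "00002", "pills": 60000},
]

# Hypothetical census populations by ZIP code.
census_population = {"00001": 2500, "00002": 120000}


def flag_outliers(orders, population_by_zip, pills_per_capita_threshold=10.0):
    """Join orders to population on ZIP and flag high pills-per-person rates."""
    flagged = []
    for order in orders:
        population = population_by_zip.get(order["zip"])
        if not population:
            continue  # no census match (or zero population): cannot compute a rate
        rate = order["pills"] / population
        if rate > pills_per_capita_threshold:
            flagged.append((order["pharmacy"], rate))
    return flagged


flagged = flag_outliers(pharmacy_orders, census_population)
```

With these sample numbers, the pharmacy in the town of 2,500 comes out at 20 pills per resident and is flagged, while the larger order in the bigger town is not. The threshold of 10 is arbitrary and would need to be set by an analyst.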

Steve: Palantir Technologies’ system pricing can hit seven figures. What is the price of a DataWalk system?

Chris: Our license costs are significantly lower. Our customers don’t have to choose between reducing headcount and having a next-generation system. The savings apply to support, maintenance, and engineering time, plus we can become operational more quickly.

Steve: If a viewer wants to contact DataWalk, what do you suggest?

Chris: Use the form on our Web site at

Steve: OK Chris, thank you very much.

Chris: Thank you, Stephen!


