The DataWalk Universe Viewer (UV) provides a global view of all the data sets available to the user based on their specific access privileges. Data sets come from sources including any standard relational database, Microsoft® Excel® files, CSV files, web pages, Hadoop® HDFS, and any other source with a JDBC, ODBC, WSDL, or RESTful interface. This example demonstrates a combination of data ranging from synthetic and open-source to social media and subscription services.
In this hypothetical scenario, a data set was manually created (shown with a green arrow) using open-source content describing the Jalisco New Generation Cartel (Cártel de Jalisco Nueva Generación, CJNG). CJNG is a criminal group based in Jalisco, Mexico, and headed by Nemesio Oseguera Cervantes ("El Mencho"), one of Mexico's most-wanted drug lords. The primary content of this set was assembled from DOJ poster boards of the key members, leadership, and familial relationships.
In our example, an analyst wants to search for a specific person related to the CJNG investigation. The basic “SEARCH” option in the primary DataWalk menus presents a list of available fields, which can be adjusted and configured to suit any specific content. The user is looking for information on ABIGAEL GONZALEZ VALENCIA. Not knowing the exact spelling of his name, the analyst applies a Soundex transformation to a common spelling of Abigail by simply typing in SX(Abigail) to identify variations such as Abigael, Abigail, Abigale, Abigayle, Abegail, etc. More advanced matching for nicknames such as Gabby, Gail, Abbie, and Gayle could be applied with advanced features [shown later].
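To see why a single Soundex code catches all of these spellings, here is a minimal sketch of the standard American Soundex algorithm in Python. It is an illustration only: the function name is ours, and how DataWalk implements SX() internally is not described in this article.

```python
def soundex(name: str) -> str:
    """Standard American Soundex: first letter plus three digits."""
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = [letters[0]]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        code = codes.get(ch, "")
        # Keep a digit only if it differs from the previous coded letter.
        if code and code != prev:
            result.append(code)
        # H and W do not separate equal codes; vowels do.
        if ch not in "HW":
            prev = code
    return "".join(result)[:4].ljust(4, "0")


variants = ["Abigail", "Abigael", "Abigale", "Abigayle", "Abegail"]
print({v: soundex(v) for v in variants})
```

Every spelling listed above reduces to the same code, because Soundex ignores vowels after the first letter. A nickname such as Gabby starts with a different letter and so gets a different code, which is why nickname matching needs a separate technique.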
DataWalk supports a range of different search-options to help identify variations in the data. The (i) icon at the top of the menu (upper-left) provides a balloon-help cheat-sheet for different functions. These functions include wildcards (*), SX (Soundex), TYPO (letter errors), STEM, AND/OR, and REGEXP (to handle special configurations). Additional functions can be added per user-needs (e.g., Metaphone).
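As an illustration of what a "letter errors" search might do, the sketch below computes the Levenshtein edit distance between two strings, one common way to tolerate typos. This is an assumption for illustration; the article does not say which algorithm the TYPO function uses, and the function name here is hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-letter inserts, deletes, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete from a
                           cur[j - 1] + 1,             # insert into a
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]


print(edit_distance("abigail", "abigael"))
```

A typo-tolerant search could then accept any candidate within a small distance (say 1 or 2) of the typed term.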
The search is conducted on different fields from different loaded sets to identify potential matches. DataWalk identifies results from four (4) sets including CJNG-People, Arrests, FBARs, and MSBs. Each is categorized by the source along with a sample set of fields to help the user identify which records/entities best match their selection. In this case the name in the CJNG set is selected and the user brings the results into a Link Chart for link analysis.
Using visually appealing icons, glyphs, colors, and related components, DataWalk link charts offer a viewpoint into the data, emphasizing key content, important information, and critical connections. In this example, the blue-star (image below, upper-left) indicates that the entity holds a leadership position within the organization and is set according to a value defined in the underlying data. Additionally, the red-circle (upper-right) signifies the status of the entity. In this case the letter “I” indicates he is currently “incarcerated/imprisoned”; other markers show (A) arrest, (F) fugitive, and (D) deceased. These markers are easy to customize to meet the needs of any investigation.
The next step is to see how this entity relates to other members of the CJNG organization. The analyst continues the link analysis by selecting the entity and invoking the “Add Linked Objects” menu on the right-side of the link chart interface. From here there are several options available including 1st and 2nd degree connections and a list of any connected sets. For this example, the CJNG-People set is selected and "Add Objects" is initiated to show the next-level of connections.
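Conceptually, adding 1st and 2nd degree connections is a breadth-first walk over the link graph, stopped at a chosen depth. The sketch below shows that idea in Python under our own assumptions: the function name, the (source, target, label) link format, and the placeholder entity names are all hypothetical and not DataWalk's API.

```python
from collections import deque


def expand(links, start, max_degree):
    """Return {entity: degree} for every entity within max_degree hops of start."""
    adjacency = {}
    for a, b, _label in links:
        # Treat links as undirected for the purpose of expansion.
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)

    degree = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if degree[node] == max_degree:
            continue  # do not walk past the requested degree
        for neighbor in adjacency.get(node, []):
            if neighbor not in degree:
                degree[neighbor] = degree[node] + 1
                queue.append(neighbor)
    return degree


# Placeholder entities, not real data.
links = [("A", "B", "related"), ("B", "C", "related"), ("C", "D", "related")]
print(expand(links, "A", 1))
print(expand(links, "A", 2))
```

Restricting the walk to one connected set, as the analyst does here with CJNG-People, would amount to filtering `links` before the expansion.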
Based on the set-connections defined in the Universe Viewer (UV) the system follows all selected connections and brings back any new entities. The results shown below quickly depict the relationships from/to Abigael to other cartel members and family, and being able to add, visualize, and analyze this data reflects a fundamental value of link analysis. Every connection also displays the type/name of the relationship for these ten (10) new entities. The analyst quickly sees that NEMESIO OSEGUERA CERVANTES is another leader in the CJNG cartel and is currently a wanted (F) fugitive. Additionally, another entity (JENNIFER BEANEY CAMACHO CÁZARES), with an image, shows as his wife. Any desired level of detail can be shown in the labels, comments, or the diagram arranged using different placement techniques.
All the entities are selected and Add Linked Objects is reapplied to show the next level of connections. The results are easily arranged using different placement methods to minimize link cross-over.
At this point the analyst realizes that some information is missing from the diagram, specifically the link between the wife/mother and her children. The analyst decides to create a new link between JENNIFER and NOEMI using the “add connection” feature available in the DataWalk top-level menu. Being able to manually create such links is a basic feature of a link analysis system.
Once this mode is activated, the analyst clicks on one of the entities and holds down the mouse button (the link follows the cursor), then moves to the second entity and releases the button to establish the link. In this case, the “direction” of the link is not important, but in other cases, the order of connection defines the “flow” of the relationship.
Once the mouse is released, a pop-up menu requires the analyst to select what type of link to create. In our CJNG model there is only a single type of connection called “related” used to define the role and connect all cartel members. In other models, there can be multiple types of connections based on different needs and requirements, and it is a simple process to add additional link-types to the model.
In this specific model, the “related” connection allows the analyst to enter additional information and details regarding the linkage. As entities can have different types, roles, and relationships over a period of time, it is important to capture all of the details to ensure the proper fidelity is maintained for the analytics. In Figure 10, the number of attributes is fairly basic and is easily extended to add/change them to meet evolving needs.
The analyst enters the type of relationship (mother/child). Often these values are defined as an enumerated-type (e.g., a predefined list) chosen from a pick-list. Different types of components (e.g., date selectors, selection-boxes, spinners, etc.) are used to simplify data input. Once completed, the analyst “saves” the results and Figure 11 shows the new connection between the selected entities.
In this environment, a special configuration provides a “supervisor” with a notification that new information is added to the system. The system automatically detects this change and then signals an alert to the designated personnel regarding this situation. In the upper-right part of the display, an icon (red bell) visually displays an active alert with a count of the total number of outstanding (unread) alerts. Optionally, the supervisor can receive an email notification (or other notification via an external ticketing system) of this alert.
The supervisor can review the alert by logging into the system and invoking the “Workspace” dashboard to see which alert was triggered. The alert is identified by the same ringing red bell (animated), and the supervisor clicks on the “New Objects” tab showing the one (1) new entry available for review. In this case, the system requires the supervisor to “Approve” the data change before any other users can see this information. Note: the analyst who originally creates the data can always see it in their own sandbox, but others are excluded until it is approved. In this example, the supervisor has three (3) options: approve, deny, or request more information. The select-values are configurable to meet various agency or investigative needs.
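The visibility rule described here can be summarized in a few lines of Python. This is a simplified sketch of the rule as stated above, not DataWalk's implementation; the class, field, and user names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    DENIED = "denied"
    MORE_INFO = "more information requested"


@dataclass
class Change:
    author: str
    description: str
    status: Status = Status.PENDING


def is_visible(change: Change, user: str) -> bool:
    """The author always sees their own change; everyone else only after approval."""
    return user == change.author or change.status is Status.APPROVED


change = Change(author="analyst_1", description="new mother/child link")
print(is_visible(change, "analyst_1"), is_visible(change, "analyst_2"))
change.status = Status.APPROVED
print(is_visible(change, "analyst_2"))
```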
Note: this same process is applied to the creation of new entities (e.g., cartel members). Once the supervisor approves the new data, all other authorized users will see the data next time they query the system.
Expand The Network (Walk Data)
At this point, the analyst continues the link analysis by expanding out the cartel network showing additional levels and relationships among its membership. The highlighted entities in the following diagram show those added entities.
Using the various placement techniques available within DataWalk, the analyst can define the best format to meet their analytical needs. The screenshot below shows the “hierarchy-top” to position each of the three (3) leaders at the top of the diagram and allow their connections to flow downward. This helps the analyst understand the different roles and significance of members in the cartel. Although this example is limited in size, there can be many levels represented.
At this time, the CJNG set is exhausted, as there is no additional data available to expand the network. However, in the Universe Viewer (UV), the CJNG set is connected to the “People” set, which is composed of the names of people derived from many different sets (investigations, corrections, financials, watchlists, arrests, registrations, etc.). The analyst selects the People set to see any new connections and uncovers two (2) matches, Zachary Manning (cousin) and Denise Cook (friend), both stemming from LILIANA ROSA CAMBA, located in the lower-right of the cartel network diagram.
The People set has connections to a wide range of other sets and contains much more robust content. The analyst does a drill-down on both Zachary and Denise to see more specific details about their backgrounds. Then using the “Add Linked Objects” panel, the analyst chooses all of the available sets to see any additional connections. Note: the values for names, social security numbers, addresses, phones, and other personal details used in these examples are "synthetic" and are not intended to reflect any real-world person.
The system accesses each selected set to pull out any connections for either Zachary or Denise, as shown in Figure 18 below:
At this point there are two viable options to pursue to determine additional connections, behaviors, or related activities.
In the expanded view of Zachary, the BSA set shows both SAR (Suspicious Activity Report) and CTR (Currency Transaction Report) transactions and presents them geographically on a map showing their activity relative to their home address. We see a heavy concentration of SARs at two specific banks and wider usage of banks for CTR deposits.
The number of people shown in the network diagram related to this address suggests some type of “safehouse” usage. Right-clicking on the home address and invoking the street-view option provides an automatic link to Google Street View to validate the address. As seen in the screen-capture, this property also has a large number of vehicles present. Drilling down further (expanding the network) on the other people shows they all have additional BSA transactions. The analyst classifies this group as “money mules” and will investigate further.
Switching back to Denise and expanding her BSA shows a similar number of transactions for SARs and CTRs. All the SARs are under $10,000, indicating some type of structuring behavior.
Showing the timelines for both SAR (green sphere) and CTR (orange sphere) transactions indicates there was a mix of both types of filings in the March-August timeframe. The analyst knows that people change their behaviors when their actions are being recorded. Beginning in July, Denise started to structure her cash deposits under the $10k limit (around $8k) to avoid triggering CTR filings. From that point, the bank filed only SARs to document this suspicious behavior.
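The pattern the analyst spots by eye can also be expressed as a simple rule: repeated cash deposits that sit just below the CTR reporting threshold. The Python sketch below is illustrative only. The function name, the floor of $7,500, and the minimum count of three are assumptions chosen for the example, and the sample deposits are invented.

```python
from datetime import date

CTR_THRESHOLD = 10_000  # U.S. CTRs apply to cash transactions over $10,000


def near_threshold_deposits(deposits, floor=7_500, min_count=3):
    """Return deposits between floor and the CTR threshold if there are
    at least min_count of them; otherwise return an empty list."""
    candidates = [(day, amount) for day, amount in deposits
                  if floor <= amount <= CTR_THRESHOLD]
    return candidates if len(candidates) >= min_count else []


# Invented sample data: (deposit date, cash amount in dollars).
sample = [
    (date(2023, 5, 1), 12_000),   # over the threshold, would trigger a CTR
    (date(2023, 7, 3), 8_000),
    (date(2023, 7, 10), 8_200),
    (date(2023, 7, 17), 7_900),
]
print(near_threshold_deposits(sample))
```

A rule like this only surfaces candidates for review. Whether the deposits actually constitute structuring remains an analytical judgment.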
When her activity is presented on a geospatial map, we can see that her transactions are clearly conducted at locations along the US/Mexico border (US side) at various/different banks and institutions. Most SARs are reported from one specific location while the CTRs are reported by a number of different banks. Clearly there is some type of explicit intent for Denise to travel almost 30 miles to make her cash deposits. The analyst will further review this information.
When performing a Google Street View of her address, the results show a high-end estate nearing the end of its construction. The value of this property is $1.1M.
The analyst returns to the link chart to determine if the other connections will return any additional entities. When choosing Add Linked Objects, there is an option to Show Object Counts that provides the total number of entities that will be returned if the query is run. In this case the Intercepts set shows there are 707 records available for the phone. To avoid cluttering the display with this new data, the analyst right-clicks on the phone number entity and copies it into a new Link Chart display.
In this new display, the analyst expands the network using the Intercepts set resulting in a large concentration of connections. This type of data is not often used for “network” analysis but is much better suited for geographical (lat/long) and temporal (date/time) analyses.
These intercepts are the location records tied to Denise’s mobile phone and when displayed using the heatmap option, it shows a heavy concentration of activity in Brooklyn, NY. Each sphere represents a location reference. The darker colored spheres indicate higher concentration of activity (e.g., stay over, lingering, stopped).
When placed on a time chart, the analyst sees the activity occurred over a 5-day period: March 13-18. Each spike in the timeline shows the relative activity for that period and zooming-in provides better resolution (hours/mins).
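A timeline of this kind is essentially a count of records per time bucket, with zooming corresponding to a finer bucket size. The following Python sketch shows the idea with the standard library; the function name is hypothetical and the timestamps are invented for illustration.

```python
from collections import Counter
from datetime import datetime


def bucket_counts(timestamps, fmt="%Y-%m-%d"):
    """Count records per bucket; fmt sets the resolution (day by default)."""
    return Counter(ts.strftime(fmt) for ts in timestamps)


# Invented location-record timestamps.
records = [
    datetime(2023, 3, 13, 16, 35),
    datetime(2023, 3, 13, 16, 50),
    datetime(2023, 3, 13, 17, 40),
    datetime(2023, 3, 14, 9, 5),
]

per_day = bucket_counts(records)                         # coarse view
per_hour = bucket_counts(records, "%Y-%m-%d %H:00")      # zoomed-in view
print(per_day)
print(per_hour)
```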
As the analyst manipulates the timeline and focuses in on the first spike, it becomes clear that Denise flew to New York and landed at John F. Kennedy (JFK) airport around 4:30, taking about an hour to get her bags and hail a taxi (or rideshare). It appears she went directly from the airport through the Bronx (via the Major Deegan Expressway) to the Upper West Side in Manhattan, where she stayed for approximately 1 hour.
Over the next several days, she concentrated her movements around Brooklyn, spent time in Staten Island and Queens, and traveled out to Islip and Medford on Long Island. The analyst can infer the specific locations visited by Denise and cross-reference them with other data sets to determine if they are significant or have any additional intelligence value.
Returning to the original link chart, the analyst further expands the phone number and finds a match against the Federal Firearms License (FFL) set. It connects with Fine Jewelry Inc. with locations in Southern California including San Diego, Solana Beach, Oceanside, Rancho Bernardo, and La Mesa (the location where Denise lives). Based on the connection between the Jeweler and the Cartel, there may be some type of high-end luxury buying, money laundering, or a front business, or some combination thereof. The analyst will continue to do research and find additional content to determine the nature of these relationships.
Fuzzy Matching (Aliases)
As a last step, the analyst checks to see if there are any potential “alias” entities matching Denise. For this configuration, the system is set up to identify entities that match on several conditions including: same gender, same race, same ethnicity, same year-of-birth, similar last names (Soundex), and live within 25 miles of each other (via zip code). In this set, there are three (3) matches generated: Didi Cooke, Densie Cooks, and Deniece Cooks. Any type of condition can be defined for “fuzzy” matching and can vary from set-to-set.
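The conditions listed above combine exact matches, a phonetic match, and a distance test. The Python sketch below shows one way such a rule could be composed. It rests on several assumptions that are ours, not DataWalk's: the record fields, the use of a zip-code centroid (latitude/longitude) per person, the haversine formula for distance, and the invented sample values.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

_CODES = {ch: d for group, d in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                                 ("L", "4"), ("MN", "5"), ("R", "6"))
          for ch in group}


def soundex(name: str) -> str:
    """Standard American Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    out, prev = [letters[0]], _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            out.append(code)
        if ch not in "HW":
            prev = code
    return "".join(out)[:4].ljust(4, "0")


def miles_between(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in miles (haversine formula)."""
    p1, p2 = radians(lat1), radians(lat2)
    a = (sin((p2 - p1) / 2) ** 2
         + cos(p1) * cos(p2) * sin(radians(lon2 - lon1) / 2) ** 2)
    return 2 * 3958.8 * asin(sqrt(a))


@dataclass(frozen=True)
class Person:
    last_name: str
    gender: str
    race: str
    ethnicity: str
    birth_year: int
    lat: float  # assumed zip-code centroid
    lon: float


def is_possible_alias(a: Person, b: Person, max_miles: float = 25.0) -> bool:
    """True when every configured condition holds for the pair."""
    return (a.gender == b.gender and a.race == b.race
            and a.ethnicity == b.ethnicity and a.birth_year == b.birth_year
            and soundex(a.last_name) == soundex(b.last_name)
            and miles_between(a.lat, a.lon, b.lat, b.lon) <= max_miles)


# Invented attribute values for illustration.
cook = Person("Cook", "F", "R1", "E1", 1985, 32.77, -117.02)
cooke = Person("Cooke", "F", "R1", "E1", 1985, 32.80, -117.10)
print(is_possible_alias(cook, cooke))
```

Because Cook, Cooke, and Cooks all share the same Soundex code, the surname condition passes for each of the three matches named above; the remaining conditions narrow the candidates.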
The final diagram shows all the entities and their connections. At the bottom of the display, a series of thumbnails shows the 19 steps used to go from the original entity to the final results. When the chart is saved (or restored), all of these steps are available for review. Thus, if the analyst is asked about the process used to find these results, it can easily be played back using the history. Other analysts who access this chart will also see this history.
Finally, if the analyst needs to send a report (aka targeting package or dossier) to another person without access to the system, they simply create a PDF report with all of the relevant details. These reports are configured to client specifications to include headers/footers, watermarks, disclosure statements, and even agency logos. The analyst generates a report from the Folder associated with Denise.
DataWalk is a trademark of DataWalk S.A. Microsoft and Excel are trademarks of Microsoft Corporation. Hadoop is a registered trademark of the Apache Software Foundation.
This article is derived from an article previously published by the author on LinkedIn.