Graph Analytics:
The New Game-Changer For AML

whitepaper

 

Executive Summary

Whenever anti-money laundering (AML) professionals or those in the fraud community discuss their problems, one of the biggest challenges they cite is the number of false positives generated by existing monitoring and alerting systems. As regulators increasingly push banks to capture any indication that might lead to an escalation, alert rules are tuned to flag anything remotely suspicious, producing many unnecessary alerts and increased costs.

Anyone analyzing financial crimes, whether reviewing alerts or working as an intelligence analyst, seeks to speed up the examination process and increase the number of effective escalations. 

Graph technologies are game-changers in the fight against money laundering. Gartner Group lists “graph” as one of the industry’s key trends and has stated that graph analytics are critical for understanding misbehavior[1]. Graphs fundamentally change how we interact with data, enabling us to understand the context extracted from the relationships between entities, places, accounts, and so forth. Graph is used for:

  • AML investigations
  • Dramatic reduction of AML false positives
  • Fraud detection
  • 360-degree KYC analysis
  • Customer Due Diligence 
  • AI/Machine Learning 
  • Other, non-investigative applications

Graph extends data discovery capabilities, enriching information and going far beyond what is possible with conventional technologies. This can significantly reduce false positives and increase detection of the true suspicious financial patterns.

Conventional analytics technologies utilized by traditional monitoring systems store and display data in tables or charts. Graph technologies complement such systems by identifying and analyzing the relationships and connections among the data elements. 

According to Gartner’s research, only 10% of innovative projects currently use graph technologies, but this is forecast to rise to 80% by 2025[2]. Based on this forecast, most banks and other enterprise organizations will adopt graph in the coming years regardless of their core area of focus. Organizations that adopt graph technology more aggressively can reduce costs, improve compliance, better protect their company, and gain competitive advantage.

 

Graph: A Quick Introduction

While traditional systems focus on storing and analyzing values, graph technology identifies and analyzes the connections and relationships between data objects, providing additional insights for further analysis. 

With traditional analysis, basic aggregated information such as the number of transactions for a specific bank account is commonplace. With a graph, it’s easy to determine whether that account transacted with other accounts of interest, or whether it is directly or indirectly connected to a suspicious person, IP address, phone number, or anything else.
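The difference can be sketched with a toy example. Below, a handful of hypothetical transactions are loaded into an adjacency map, and a breadth-first search answers the connectivity question that a flat table cannot. The account names and the watchlisted account are invented for illustration:

```python
from collections import deque

# Hypothetical transaction pairs (sender, receiver) -- illustrative data only.
transactions = [
    ("acct_A", "acct_B"),
    ("acct_B", "acct_C"),
    ("acct_C", "acct_X"),  # acct_X is on a watchlist
]

# Build an undirected adjacency map (the graph).
graph = {}
for src, dst in transactions:
    graph.setdefault(src, set()).add(dst)
    graph.setdefault(dst, set()).add(src)

def is_connected(start, target):
    """Breadth-first search: is `target` reachable from `start`?"""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return False

# A table shows acct_A never transacted with acct_X directly;
# the graph reveals the indirect three-hop path A -> B -> C -> X.
print(is_connected("acct_A", "acct_X"))  # True
```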

Figure 1. Comparing traditional table-based data to graph analysis


 

To many, “graph” may simply imply visualizations like bar charts or line charts. However, the network graphs referred to in this document are very different. Figure 2 below shows a simple network graph, visualizing data elements and their relationships.

Figure 2. A network graph is very different from traditional charts.


 

Graph naturally leads us to the term graph analytics: a set of techniques and capabilities that help discover new patterns and trends. The table below describes several types of graph analytics:

Figure 3. Types of graph analytics


 

Too many false positives

Let’s answer a critical question: beyond overly stringent regulatory controls, why do financial institutions have so many false positives?

Generally, we can divide this issue into various categories:

  • “Data lens” problems – Several years ago, to build a digital representation of a bank's customer, you'd use perhaps 50 attributes stored in 3-5 data sources, including PII data, a risk score, some KYC data, and perhaps products/services used. Doing the same today might take over 500 attributes stored in dozens of siloed sources, including mobile app logs, new local/global payment systems, leaked databases (e.g., Pandora Papers, Panama Papers), social media, customer segmentation, and so forth. Effectively finding false positives or false negatives hidden deep inside such an intricate network of data sources is time-consuming and often impractical.

    Using conventional technologies, generic filters and static reports are often applied against the data, which limits the ability to understand the intricate relationships among people, accounts, phones, and other entities. Without the full context, monitoring systems may either flag innocent customers as suspicious or miss activities that truly are suspicious. It is easy to miss important insights about a customer or counterparties without using a broader lens, i.e., data from other company systems. For example, a monitoring system won't generate an alert for multiple transactions of $4K made in the same hour from seemingly unrelated business bank accounts. After a detailed examination, however, investigators may find that these businesses have common stakeholders, that all transactions were made from the same IP address, and that other browser fingerprints match. Lack of such context generates false positives and limits the ability to identify misbehavior.

  • Data quality problems – Disparate siloed data sources, data inconsistencies, varying formats, missing values, and duplicate records all negatively impact detection processes and cause a large number of false positives. To borrow a phrase from computer science, "garbage in, garbage out": "garbage" input data produces useless output regardless of the analytics applied.

    A complicating factor is that criminals may intentionally alter their personally identifiable information, changing a single letter or digit in names, addresses, or social security numbers, to outmaneuver investigators. Using all customer data, buried in dozens of data silos, to identify duplicate records poses significant problems for legacy systems and existing analytical infrastructures.

  • Data analytics problems – Although the number of data sources and quantity of data has dramatically increased in recent years, the human capacity to understand and interpret results remains constant. From a technology perspective, a typical set of rules or vectors for detecting suspicious transactions is relatively flat and has been the default approach for years. As data becomes more complicated and intertwined, traditional approaches using SQL queries to analyze data start to break down, can’t scale, or become too complex. In practice, this results in configuring conventional alerting systems to perform basic checks, such as whether an account transacted with any counterparty with previously escalated alerts, without verifying the entire existing context.
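The shared-context scenario above (multiple $4K transactions from the same IP address) can be sketched in a few lines. This is a minimal illustration with hypothetical accounts and a made-up threshold of three accounts per IP, not a production detection rule:

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical transaction records: (account, amount, ip, timestamp).
txns = [
    ("biz_1", 4000, "203.0.113.7", datetime(2024, 5, 1, 10, 5)),
    ("biz_2", 4000, "203.0.113.7", datetime(2024, 5, 1, 10, 20)),
    ("biz_3", 4000, "203.0.113.7", datetime(2024, 5, 1, 10, 40)),
    ("biz_9", 4000, "198.51.100.2", datetime(2024, 5, 1, 10, 45)),
]

# Link accounts through a shared IP node, as a graph system would:
# account -> IP edges, then look for IPs touching many distinct accounts.
accounts_by_ip = defaultdict(set)
for acct, amount, ip, ts in txns:
    accounts_by_ip[ip].add(acct)

# Flag any IP used by three or more "unrelated" business accounts.
suspicious_ips = {ip: accts for ip, accts in accounts_by_ip.items()
                  if len(accts) >= 3}
print(sorted(suspicious_ips))  # ['203.0.113.7']
```

In a real deployment the same grouping would also consider the time window and browser fingerprints mentioned above; the point is that the link exists only when data sources are connected.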

Example 1: A high-risk customer was previously alerted when they transacted with a party on a sanctions list, but that alert was later deemed not to be problematic. The alerting system will nevertheless flag a second, similar transaction as high risk, because the customer has multiple alerts and the counterparty is on the sanctions list. It does not consider that the previous alert was generated for the same counterparty and was closed, indicating that this transaction shouldn’t be alerted, or should be alerted but flagged as low risk. This scenario is visualized in the network graph shown below.

Figure 4. High-risk customer and alert: false positives generated by conventional systems that do not use connection context.


 

Example 2: The conventional alerting system cannot identify indirect, multiple-hop relationships that may indicate a suspicious transaction. This scenario is visualized on the network graph below.

Figure 5. With conventional AML tools, a customer was marked as low risk despite being connected to blacklisted accounts and a blacklisted customer.


 

Graph can significantly change the trajectory of today’s inefficient fight against money laundering.

Graph eliminates the data silo problem

Graph technology is uniquely suited to quickly integrating and linking internal and external data, regardless of its complexity and size, through connections between business entities, processes, and events. A knowledge graph can take this one step further, enabling you to re-organize the data not around data sources but around relevant business objects such as customers, transactions, phone numbers, or anything else. This enables you to derive new insights and identify patterns across vast amounts of complex data. With graph analytics, customers can easily combine AML data, fraud data, and external data such as public registries or offshore leaks databases (e.g., Pandora Papers), and seamlessly uncover tacit knowledge from the connections and relationships. Systems such as DataWalk generate and save all of these connections, ensuring that all customer data is pre-connected and ready to use. With graph, all relevant information can be delivered in a single pane of glass, eliminating data fragmentation problems and dramatically accelerating the triage process.

Graph improves data quality
Graph technology is an excellent fit for entity resolution on key attributes such as names, addresses, phone numbers, bank accounts, and identification numbers. Using techniques such as graph-based expert rules or advanced fuzzy matching, graph technologies support quick identification and merging of possible matches or duplicates at scale. Once matches are found, graph algorithms (e.g., PageRank) can determine which entity should be dominant, considering the number of existing interactions compared to others. In addition, graph is well suited to tracking the lineage of all entities, so data transformations become fully transparent and explainable for non-technical users.

Example 3: Entity resolution visualized on a graph. With graph you can group related alerts so that a single analyst manages them together, in context rather than separately. This accelerates the triage process.

The network graph below presents four entities matched using various fuzzy techniques such as Eudex, Soundex, and Double Metaphone. Graph technology can provide full transparency into, and an explanation of, which techniques were applied to match the records.
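To make the fuzzy-matching idea concrete, here is a minimal sketch of classic American Soundex, one of the techniques named above. It is a simplified illustration of phonetic matching, not the matching engine of any particular product:

```python
def soundex(name):
    """Classic American Soundex: a 4-character phonetic code.

    Similar-sounding names (e.g., Smith / Smyth) map to the same code,
    which lets an entity-resolution step group them as candidate matches.
    """
    codes = {c: d for d, letters in enumerate(
        ["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1) for c in letters}
    name = name.upper()
    first = name[0]
    encoded = []
    prev = codes.get(first)
    for c in name[1:]:
        if c in "HW":
            continue  # H and W are transparent: they keep the previous code
        code = codes.get(c)
        if code is not None and code != prev:
            encoded.append(str(code))
        prev = code  # vowels reset `prev`, so repeats across vowels re-encode
    return (first + "".join(encoded) + "000")[:4]

print(soundex("Robert"))                      # R163
print(soundex("Smith") == soundex("Smyth"))   # True
```

A record pair whose names share a Soundex code would then become an edge in the entity-resolution graph, ready for the merge/dominance step described above.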

Figure 6. Matching entities on a network graph


 

Graph powers data analytics

Graph analytics are uniquely suited to expose anomalies in large and/or complex datasets. Further, graph can power other types of analysis such as AI/machine learning, providing sophisticated contextual insight derived from relationships.

Some graph platforms provide a knowledge graph for interacting with graph data. Knowledge graphs vary in their nature and available facilities, but generally they provide a streamlined view of large amounts of complex data in a simple visual interface oriented around understandable business objects such as people, accounts, SARs, transactions, and anything else. This aligns the analysis with the way people think, instead of requiring them to explore data with complicated technical constructs.

Figure 7. Example of a knowledge graph for a financial institution – AML 360°


 

Graphs have the unique ability to quickly answer questions such as which objects are connected, and how. Graph algorithms, which as the name suggests are algorithms designed specifically to analyze graph data, leverage this connectivity to uncover hidden conditions such as: 

  • one SSN used by multiple people
  • multiple SSNs used by the same person
  • multiple SSNs being used by different people
  • an address (safe-house) used by many people
  • the same IP address used to access multiple accounts by unrelated people
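Conditions like the first two in the list reduce to finding nodes with more than one neighbor in a bipartite person-SSN graph. A minimal sketch, with invented records and deliberately fake SSN values:

```python
from collections import defaultdict

# Hypothetical person -> SSN records (illustrative values, not real SSNs).
records = [
    ("person_1", "111-22-3333"),
    ("person_2", "111-22-3333"),  # same SSN as person_1
    ("person_3", "444-55-6666"),
    ("person_3", "777-88-9999"),  # person_3 uses two SSNs
]

# Treat each record as an edge of a bipartite person-SSN graph, then
# look for nodes on either side with more than one neighbor.
people_by_ssn = defaultdict(set)
ssns_by_person = defaultdict(set)
for person, ssn in records:
    people_by_ssn[ssn].add(person)
    ssns_by_person[person].add(ssn)

shared_ssns = [s for s, p in people_by_ssn.items() if len(p) > 1]
multi_ssn_people = [p for p, s in ssns_by_person.items() if len(s) > 1]
print(shared_ssns, multi_ssn_people)  # ['111-22-3333'] ['person_3']
```

The same neighbor-count pattern detects the shared-address and shared-IP conditions in the list; only the node types change.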

 

Example 4: A graph algorithm called shortest path can automatically determine the minimum number of connections, or hops, between two objects in a graph. For example, we can specify that if there are fewer than seven hops between a customer and an entity on a blacklist, AND there is shared PII data, AND transactions were made within 5 days, then that customer is marked as high risk. 
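Breadth-first search is the textbook way to compute hop counts on an unweighted graph. The sketch below applies only the hop-count part of the rule above (the shared-PII and 5-day conditions are omitted), using invented entities:

```python
from collections import deque

# Hypothetical undirected edges: customer <-> accounts <-> blacklisted entity.
edges = [
    ("customer", "acct_1"), ("acct_1", "acct_2"),
    ("acct_2", "acct_3"), ("acct_3", "blacklisted_co"),
]
graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def hops(start, target):
    """BFS shortest-path length in hops; None if unreachable."""
    dist, queue = {start: 0}, deque([start])
    while queue:
        node = queue.popleft()
        if node == target:
            return dist[node]
        for nbr in graph.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return None

d = hops("customer", "blacklisted_co")
high_risk = d is not None and d < 7  # the hop-count part of the rule
print(d, high_risk)  # 4 True
```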

Figure 8. With graph-based solutions, a customer was marked as high risk in light of the fact that he is connected to blacklisted accounts and a blacklisted customer


 

Example 5: Consider a case where a previous alert (from a conventional alerting system) was generated for the same counterparty but was closed; i.e., the match between the counterparty's name and a sanctions-list name was likely a coincidence, and someone has already verified it. This lowers the risk for the alert/customer from high to low.

Figure 9. When analyzing the graph, John (client) was marked as low risk, taking into account the fact that the previous alert was marked as a false positive


Example 6: A graph algorithm called community detection connects objects into known networks (also called clusters), such as an organized crime network. This is a very powerful capability, potentially enabling identification in minutes of a crime ring that might otherwise take days or weeks to identify manually. In the example below, the customer and counterparties 1 and 2 will be marked as suspicious because they share PII with counterparty 3, which has multiple SARs filed.
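As a simplified stand-in for community detection, connected components already capture the propagation idea in this example: everyone linked, directly or indirectly, to a SAR-flagged party inherits suspicion. Real community detection algorithms (e.g., Louvain) are more nuanced; the entities below are invented:

```python
# Hypothetical shared-PII links among a customer and counterparties.
links = [
    ("customer", "counterparty_1"),
    ("counterparty_1", "counterparty_2"),
    ("counterparty_2", "counterparty_3"),  # counterparty_3 has SARs filed
    ("other_a", "other_b"),                # an unrelated component
]
graph = {}
for a, b in links:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

def components():
    """Connected components via iterative depth-first search."""
    seen, comps = set(), []
    for node in graph:
        if node in seen:
            continue
        comp, stack = set(), [node]
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

sar_parties = {"counterparty_3"}
# Mark every member of any component containing a SAR-flagged party.
suspicious = set().union(*(c for c in components() if c & sar_parties))
print(sorted(suspicious))
```

In a production graph, the cluster's size, ring count, and perimeter would then be computed as descriptive variables, as Figure 10 suggests.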

Figure 10. A graph algorithm (community detection) enables you to capture suspicious graph constructs (e.g., organized crime rings), describing them with variables such as cluster size, number of rings, perimeter, etc.


 

The unique insight from graph algorithms can be converted into rules or machine learning features to enhance predictive capabilities, reducing or eliminating false positives and identifying new suspicious schemes. 

Following is an example of a rule built from such features, which can be applied directly or used as input to machine learning:

Example rule: if the distance between object A and blacklisted objects is fewer than six hops, AND object A is part of a suspicious cluster with at least one ring, then mark this object as high risk.
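The rule above can be expressed directly as code, with the graph metrics stubbed as inputs (in practice they would come from shortest-path and ring-detection algorithms, as in the earlier examples):

```python
def risk_features(hop_distance_to_blacklist, ring_count):
    """Turn raw graph metrics into boolean features.

    These same features could feed a rule engine or a machine
    learning model; the names and thresholds mirror the example rule.
    """
    return {
        "near_blacklist": hop_distance_to_blacklist is not None
                          and hop_distance_to_blacklist < 6,
        "in_ringed_cluster": ring_count >= 1,
    }

def is_high_risk(features):
    # The example rule: both conditions must hold.
    return features["near_blacklist"] and features["in_ringed_cluster"]

f = risk_features(hop_distance_to_blacklist=4, ring_count=2)
print(is_high_risk(f))  # True
```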

 

For today’s financial institutions, the channels of distribution and promotion, interactions with customers, and regulatory requirements can be highly complex. This creates a critical need to ask complex questions of complex structured and unstructured data, which is often impractical using conventional technologies such as SQL.

 

Figure 11. Visualization of a complex question


 

To help answer complex and ad-hoc questions, many use cases require graph as an enabling technology. The ability to run complex queries across all data supports quick verification of hypotheses and comprehensive examination of conventional rules, identifying which should be tuned to reduce false positives.

In systems such as DataWalk, graph-based queries (also treated as analyses or rules) can be weighted and combined to create features for a score or for machine learning coefficients. Powering machine learning with more data and with insight from connections makes this sort of analysis more accurate and helps organizations determine which alerts are false positives and which are true positives.

 

The Bottom Line

Graph technology can provide new capabilities and compelling benefits to help financial institutions reduce false positives, accelerate investigations, and improve compliance. Companies that adopt this emerging technology will enjoy a competitive advantage over their peers.

 

About DataWalk

DataWalk is a scalable, no-code, graph analytics software platform. DataWalk’s graph analysis foundation enables you to connect all your data, understand structures, and identify patterns in large, highly connected datasets through an intuitive knowledge graph. This includes data import, data prep and linking, data exploration, data analysis (including machine learning), and data lineage. DataWalk effectively supplements case management and monitoring systems, weeding out false positives and increasing the number of successful escalations. 

---

To learn more visit: https://datawalk.com/solutions/anti-money-laundering/

[1] “Connecting the Dots: Why Graph Analytics Are Key to Understanding Human and Machine Misbehavior,” by Jim Hare, Gartner Group.

[2] Source: https://www.gartner.com/en/newsroom/press-releases/2021-03-16-gartner-identifies-top-10-data-and-analytics-technologies-trends-for-2021

 
