Connecting the Dots:
Know Your Customer Better With Graph Analytics And Machine Learning



For decades, customer risk scoring models and KYC analyses have relied on flat data, missing the insights that can be gained from connections and relationships. Leveraging AI applications hasn't eliminated these limitations. BSA regulations require financial institutions to use all available information to know who their customers are. How can you know precisely who your customer is, and how to assess their risk, without understanding their relationships with other customers, accounts, employees, and companies?

The same regulations also require banks to continuously monitor customers' behaviors for changes that may impact customer risk. Unfortunately, even though advanced analytics such as machine learning have enhanced traditional risk models, those processes are difficult to maintain and are not optimized for the way data and requirements change over time. This increases exposure to risk and forces banks to repeatedly reinvent how they perform risk analysis.

With the combination of graph analytics and machine learning you can now effectively address these challenges and meet regulatory obligations.

Seamlessly combine all your bank data to get a holistic view of your customers

Expressing customer risk, whether to assess creditworthiness or to meet anti-money laundering regulations, has become an extremely complex task over the years. Modern risk assessment requires a holistic approach that includes a wide variety of data, techniques, and processes. The question arises: how much do we really know about our customers without connecting all the dots?

Today, when trying to determine who our customer is we should use all customer data that has been generated, both when interacting with the bank and also outside the bank. A holistic view may shed a fundamentally different light on the risk of the customer. In practice, this often means needing to fuse and analyze dozens of internal and external data sources.

Unfortunately, disparate siloed data sources, data inconsistencies, varying formats, missing values, and duplicate data all negatively impact the analysis process, consume time, and cause many fundamental problems. The problem is exacerbated by vast amounts of data, which are now commonplace given the dramatic increase in data sources and data volumes in recent years.

To cover the ever-changing AML and KYC scenarios required by regulators, banks need the ability to integrate siloed sources quickly. Enabling AML/KYC and compliance teams to become active model creators who uncover tacit knowledge from relationships is becoming a key trend in the financial crime community.

An approach that enables extreme agility in fusing large, disparate data sources, in a manner that is fast, simple, takes minimal effort, and does not require coding, is therefore highly desirable.

The circumstances in which banks operate require a radical change in how they interact with data. Ideally, data is reorganized not around data sources but around understandable business data elements such as customers, accounts, transactions, SARs, products, alerts, and so forth. This interaction should not require any technical skills, so that the dots can be automatically and permanently connected with the assistance of subject matter experts.

If needed, the entire data picture should be easy to expand with whatever data sources banks want to have available for analysis, regardless of size and without relying on IT staff. In this way, they can easily combine all customer data and continuously expand the perspective by using up-to-date external data such as public registries or offshore leak documents (e.g., the Pandora Papers). This captures the holistic knowledge banks may have about customers so they can assess risk more precisely, reducing the potential number of false positives in monitoring.

Uncover insight from relationships to enhance risk scoring

The typical approach to assessing customer risk is flat and constrained by data sources. This limits the ability to identify and explore the intricate relationships among people, accounts, companies, and other entities, which results in inaccurate risk models and a huge number of false positives.

To formulate the contextual picture of customer risk, banks have a critical need to generate contextual insight from complex structured and unstructured data from multiple sources, which is often impractical or even impossible using conventional technologies such as SQL. Moreover, SQL is the domain of technical staff who are focused on technology, not on business problems.

Thinking about risk models holistically, the ability to connect the dots across vast amounts of complex data from various sources without coding is becoming a panacea for AML/KYC and compliance teams. Analysts should be able to generate the entire context from relationships freely and instantly, and risk models should use that context to surface things like an escalated entity with adverse media that is connected to an organized crime network, or the relationships between transacting parties and what those relationships have looked like over time. Quick answers to such questions convert a flat risk assessment into a multidimensional assessment, increasing the organization's knowledge of the risk.

Examples of contextual insight that can be used in assessing holistic risk:

  • Bank customers with transactions over $5K executed from darknet IPs, with parties having SARs, whose UBOs have been mentioned in the Pandora Papers offshore data leaks in the context of money laundering.
  • Organized networks that consist of multiple low- and high-risk accounts, with a minimum of one ring with funds transfers of over $1M.
  • High-risk customers with multiple alerts that have previously been rejected as false positives for the same reason.
  • Bank customers who performed transactions from an Eastern European location extracted from IPs, who physically visited other banks to set up accounts. A while later, they transferred funds to those new accounts.
  • Bank customers who belong to a suspicious ring network that consists of at least one high-risk account and multiple low-risk accounts, with the value of all transferred funds exceeding $1M within the last 6 months.
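To make the last example concrete, the following is a minimal sketch of how a ring of that kind could be found in a list of transfers. It is an illustration only, not the method of any particular product. The function names, the data layout, and the "multiple means at least two" reading are assumptions, and the transfers are assumed to be pre-filtered to the last 6 months.

```python
from collections import defaultdict


def find_rings(transfers):
    """Return each simple directed cycle once, as a tuple of account IDs.

    Hypothetical helper: every cycle is enumerated from its smallest
    account ID, so the same ring is not reported under several rotations.
    """
    graph = defaultdict(set)
    for src, dst, _amount in transfers:
        graph[src].add(dst)

    rings = set()

    def walk(start, node, path):
        for nxt in graph[node]:
            if nxt == start and len(path) > 1:
                rings.add(tuple(path))          # closed the loop back to start
            elif nxt > start and nxt not in path:
                walk(start, nxt, path + [nxt])  # extend the path

    for start in sorted(graph):
        walk(start, start, [start])
    return rings


def suspicious_rings(transfers, risk, min_total=1_000_000):
    """Flag rings with >= 1 high-risk account, >= 2 low-risk accounts,
    and more than min_total moved along the ring's own edges."""
    amounts = defaultdict(float)
    for src, dst, amount in transfers:
        amounts[(src, dst)] += amount

    flagged = []
    for ring in find_rings(transfers):
        edges = zip(ring, ring[1:] + ring[:1])
        total = sum(amounts[edge] for edge in edges)
        high = sum(1 for account in ring if risk[account] == "high")
        low = sum(1 for account in ring if risk[account] == "low")
        if high >= 1 and low >= 2 and total > min_total:
            flagged.append((ring, total))
    return flagged


# Illustrative data only: (source account, destination account, amount)
transfers = [
    ("A", "B", 400_000),
    ("B", "C", 350_000),
    ("C", "A", 300_000),
    ("C", "D", 50_000),
]
risk = {"A": "high", "B": "low", "C": "low", "D": "low"}
flagged = suspicious_rings(transfers, risk)
```

The exhaustive cycle search is fine for a small example but grows quickly with network size. A production system would more likely rely on a graph engine for this step.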

Convert machine learning into supreme human learning and automate the monitoring cycle

The complexity of an entire customer risk universe forces the need to leverage modern technologies such as machine learning, which is a powerful technique for performing advanced analytics. Banks willingly and increasingly use machine learning for analytical purposes, such as fraud detection, KYC customer profiling or reduction of AML false positives.

Unfortunately, a significant challenge appears at the crossroads of collaboration between data scientists and business users. Those two groups are so different that it is challenging to reach a common understanding and apply machine learning effectively. Of the many critical limitations, explainability and automation seem to be the key missing ingredients.

For customer risk assessment, profiling is a very useful technique enabled by machine learning. Once a machine learning model is delivered, the challenge is to provide the desired results in a digestible form for compliance teams, who typically are not analytically or technically savvy.

As an example, an unsupervised machine learning algorithm can cluster customers into profiles relatively easily, but these profiles then need to be justified and explained. Imagine that the profiles have been generated based on 50 variables, such as the average value of transactions for the last six months or the sum of transactions for the same timeframe. To explain what a profile means, compliance KYC or AML experts have to correlate the values of those 50 variables themselves. Alternatively, they can ask data scientists to write a description of each profile, relying entirely on them. Either way, this is an awkward approach.

In addition, customer behaviors change over time, so customer profiles must be refreshed periodically, and due to recent regulations, the customer’s risk assessment then requires adjustment.

Unfortunately, with the conventional approach, these processes are not optimized and require repeatedly engaging machine learning specialists. This is the reason why using machine learning in the AML space is a painful process and why many banks are desperately looking for improvements. The hope is to automate the profiling process in order to avoid engaging data scientists for continuous monitoring or for updating profiles after every single change. Banks want their models to be fed with new data automatically, so that the models continuously check whether a customer's reference profile must be replaced with a new profile in light of changes in behavior. And finally, the results should be delivered in a form that non-technical users can understand.

Converting machine learning into supreme human learning is the first step. Instead of providing the raw output from machine learning analysis, a business ontology can be applied, so that interpretation for business users becomes extremely simple and automated. Suppose we assemble all these flat and contextual variables into several characteristics that represent customer behavior within a selected timeframe (e.g., the last 6 months) in a specific context, such as cash transactions, credit card transactions, foreign transactions, adverse media, or SARs. If we then assess their values statistically on a scale of low, medium, high, etc., the results become self-interpreting.
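As a rough illustration of this first step, the sketch below converts raw characteristic values into labels by ranking a customer against the rest of the population. The five-step scale, the equal-sized bands, the characteristic names, and the sample figures are all assumptions made for the example. A real deployment would choose its own scale and statistics.

```python
from bisect import bisect_right

# Assumed five-step scale; the article only specifies "low, medium, high, etc."
SCALE = ["very low", "low", "medium", "high", "very high"]


def label_value(value, population):
    """Label a value by the share of the population at or below it."""
    ordered = sorted(population)
    at_or_below = bisect_right(ordered, value)
    # Integer arithmetic avoids floating-point edge cases at band boundaries.
    index = min(at_or_below * len(SCALE) // len(ordered), len(SCALE) - 1)
    return SCALE[index]


def describe(customer, population):
    """Turn raw characteristic values into a self-interpreting profile."""
    return {
        name: label_value(value, population[name])
        for name, value in customer.items()
    }


# Illustrative data only: characteristic values across ten customers.
population = {
    "cash_transactions_6m": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000],
    "foreign_transactions_6m": [0, 10, 20, 50, 50, 80, 100, 200, 400, 5000],
}
customer = {"cash_transactions_6m": 950, "foreign_transactions_6m": 10}
profile = describe(customer, population)
```

A compliance analyst can then read the profile as "very high cash activity, low foreign activity" without inspecting the underlying variables.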

Most importantly, these characteristics should be freely determined or expanded as desired with minimal effort, reflecting data you own and considering changing AML regulations.

The second step is to automate monitoring of the reference profile. Instead of engaging machine learning experts every month or so to recalculate reference profiles for customers, the process should be entirely automated, repeatedly checking reference profiles against new data to see whether the customer's reference characteristic values have changed over time.

The third step is to determine whether the customer's behavior has changed permanently, such that the reference profile should be automatically replaced with a new one. This is possible by capturing a broader perspective: assembling characteristic values from several observation periods and measuring how many changes have been noticed and how significant those changes are (e.g., whether the change was from low to medium or from very low to very high).
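The second and third steps can be sketched together as follows. This is one possible decision rule under stated assumptions, not a prescribed one: a characteristic counts as permanently changed when it differs from the reference in at least `min_periods` observation periods and always in the same direction. The scale, the threshold, and the function names are hypothetical.

```python
# Assumed ordered scale used to measure how far a characteristic has moved.
SCALE = ["very low", "low", "medium", "high", "very high"]


def shift(reference_label, observed_label):
    """Signed number of scale steps from the reference to the observation."""
    return SCALE.index(observed_label) - SCALE.index(reference_label)


def persistent_changes(reference, observations, min_periods=3):
    """Find characteristics that moved away from the reference profile.

    reference:    {characteristic: label}
    observations: list of {characteristic: label}, one per period
    Returns, per changed characteristic, how many periods differed and
    the largest shift seen (e.g. +4 means very low -> very high).
    """
    changes = {}
    for name, ref_label in reference.items():
        shifts = [shift(ref_label, period[name]) for period in observations]
        moved = [s for s in shifts if s != 0]
        same_direction = all(s > 0 for s in moved) or all(s < 0 for s in moved)
        if len(moved) >= min_periods and same_direction:
            changes[name] = {
                "periods": len(moved),
                "largest_shift": max(moved, key=abs),
            }
    return changes


# Illustrative data only: one reference profile and four observation periods.
reference = {"cash": "low", "foreign": "very low"}
observations = [
    {"cash": "low", "foreign": "high"},
    {"cash": "medium", "foreign": "very high"},
    {"cash": "low", "foreign": "very high"},
    {"cash": "low", "foreign": "very high"},
]
changes = persistent_changes(reference, observations)
replace_profile = bool(changes)
```

In this example the single-period bump in cash activity is ignored, while the sustained jump in foreign activity marks the reference profile for replacement.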

Quickly configure alerting rules

The ability to quickly implement new rules for alerting, combined with the ability to seamlessly tune them, is critical today in the face of rapidly increasing costs related to implementing new scenarios and handling false positives.

This capability should be available to non-technical users who are deeply embedded in the business. Quickly testing and implementing new alerting rules, or improving existing ones, shouldn't be an exceptional achievement but a standard and critical capability in a modern financial services organization.

Technically, this should correspond to the ability to seamlessly link new data sources and to combine and convert graph and machine learning output into alerting rules without scripting or expertise in a programming language. When the alerting engine is up and running, an instant feedback loop should be provided to monitor which rules generate false positives for further iterative tuning.
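Although the goal is for users to configure rules without writing code, it can help to see what such a configuration amounts to underneath. The sketch below is a simplified, hypothetical representation: rules are plain data evaluated against a customer's graph and machine learning outputs, and analyst dispositions feed back into a per-rule false positive rate. The rule names, field names, and thresholds are assumptions for illustration.

```python
from collections import Counter

# Hypothetical rules: each pairs a name with a condition on customer features.
RULES = [
    {"name": "large_darknet_tx", "when": lambda c: c["darknet_tx_amount"] > 5_000},
    {"name": "ring_member", "when": lambda c: c["in_suspicious_ring"]},
    {"name": "foreign_activity_spike", "when": lambda c: c["foreign_label"] == "very high"},
]


def evaluate(customer, rules=RULES):
    """Return the names of all rules that fire for this customer."""
    return [rule["name"] for rule in rules if rule["when"](customer)]


def false_positive_rates(dispositions):
    """Feedback loop: share of each rule's alerts closed as false positives.

    dispositions: iterable of (rule_name, was_false_positive) pairs
    recorded by analysts when they close an alert.
    """
    fired, false_positives = Counter(), Counter()
    for rule_name, was_false_positive in dispositions:
        fired[rule_name] += 1
        false_positives[rule_name] += int(was_false_positive)
    return {name: false_positives[name] / fired[name] for name in fired}


# Illustrative data only.
customer = {
    "darknet_tx_amount": 7_000,
    "in_suspicious_ring": False,
    "foreign_label": "very high",
}
alerts = evaluate(customer)

dispositions = [
    ("large_darknet_tx", False),
    ("large_darknet_tx", True),
    ("ring_member", True),
]
rates = false_positive_rates(dispositions)
```

Rules with a persistently high rate are the natural candidates for tuning, for example by raising a threshold or adding a contextual condition.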

