Graph Embeddings:

A Breakthrough For Detecting High-Risk Accounts & Transactions

A superior alternative to conventional machine learning

 

Graph Embedding vs. Conventional Machine Learning

Graph embeddings are algorithms used to represent graphs in more computationally digestible formats. They are very useful in reducing the complexity of computations in machine learning (ML) and other AI tasks while retaining the contextual and structural information inherent in graph representations.

Recently, graph embedding has emerged as a powerful machine-learning technique for analyzing transactional data in the banking industry. The technique has the potential to generate insights from this data faster and more accurately than traditional ML approaches, dramatically accelerating time-to-value for the customer.

In traditional machine learning, relevant business features (e.g., number of transactions) must be manually chosen and aggregated for each subject (e.g., customer, bank account), and the input data is typically represented as a fixed-length vector of features. The problem with this approach is that it’s slow, laborious, and unable to fully capture the complex relationships that require context aggregated across multiple hops of a network graph. The result is lower accuracy on graph problems.
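To make the manual feature-engineering step concrete, here is a minimal sketch in Python. The transaction records and the particular aggregates (counts sent and received, total volume) are hypothetical choices for illustration; real pipelines would compute many more hand-picked features per account.

```python
# Minimal sketch of traditional feature engineering: each account is
# reduced to a fixed-length vector of hand-picked aggregates.
# The transaction records below are hypothetical.
transactions = [
    {"sender": "A1", "receiver": "A2", "amount": 500.0},
    {"sender": "A1", "receiver": "A3", "amount": 120.0},
    {"sender": "A2", "receiver": "A3", "amount": 75.0},
]

def account_features(account, txns):
    """Return a fixed-length feature vector:
    (#transactions sent, #transactions received, total volume)."""
    sent = [t for t in txns if t["sender"] == account]
    received = [t for t in txns if t["receiver"] == account]
    volume = sum(t["amount"] for t in sent + received)
    return (len(sent), len(received), volume)

print(account_features("A1", transactions))  # (2, 0, 620.0)
```

Note that each account is summarized in isolation: nothing in the vector says who the counterparties were, let alone who *their* counterparties were.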

The graph embedding technique addresses these challenges by transforming the nodes of a graph (an otherwise abstract structure) into numerical vectors, a format compatible with machine learning methods in general. Put differently, this transformation maps the transactional data into a low-dimensional vector space in which the relationships between entities are preserved. This enables machine learning models to identify patterns and anomalies that would go undetected by traditional methods, leading to more efficient analyses and more accurate conclusions.
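One common family of embedding methods (in the spirit of DeepWalk/node2vec) samples random walks over the graph and derives each node's vector from which nodes it co-occurs with on those walks. The sketch below is a deliberately crude stand-in for that idea: the toy account graph, walk lengths, and walk counts are all illustrative assumptions, and the normalized co-occurrence profile replaces the skip-gram training step a real implementation would use.

```python
import random

# Toy undirected account graph; the edges are hypothetical transfers.
graph = {
    "A1": ["A2", "A3"],
    "A2": ["A1", "A3"],
    "A3": ["A1", "A2", "A4"],
    "A4": ["A3", "A5"],
    "A5": ["A4"],
}

def random_walks(g, walk_len=5, walks_per_node=200, seed=0):
    """Sample fixed-length random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for start in g:
        for _ in range(walks_per_node):
            walk = [start]
            for _ in range(walk_len - 1):
                walk.append(rng.choice(g[walk[-1]]))
            walks.append(walk)
    return walks

def embed(g, walks):
    """Embed each node as its normalized walk co-occurrence profile --
    a crude stand-in for the skip-gram step in DeepWalk/node2vec."""
    nodes = sorted(g)
    index = {n: i for i, n in enumerate(nodes)}
    counts = {n: [0] * len(nodes) for n in nodes}
    for walk in walks:
        for i, a in enumerate(walk):
            for b in walk[i + 1:]:
                counts[a][index[b]] += 1
                counts[b][index[a]] += 1
    return {n: [c / (sum(counts[n]) or 1) for c in counts[n]] for n in nodes}

vectors = embed(graph, random_walks(graph))
```

Accounts that frequently share walks (such as A1 and A2, which sit in the same tightly connected region) end up with similar vectors, while accounts far apart on the graph (such as A1 and A5) do not.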

Figure 1: Traditional Machine Learning vs Graph Machine Learning

To illustrate this, Figure 1 depicts the difference between traditional and graph machine learning approaches. The image on the left side of the figure depicts traditional machine learning. Because data in traditional machine learning is represented as a table, rather than a graph, we can access only the first hop between data elements (i.e., sender and receiver), as depicted by the darker green nodes (or data points) and edges. This means that transactional data points are considered in isolation, leading to a constrained view of potential risks and relationships, or a narrow receptive field, as depicted by the smaller red-dashed circle. (A receptive field is the set of nodes in a graph from which a particular node can directly or indirectly gather information during the learning process.)

As a result of this constrained receptive field, we lack visibility into the more complex relationships that might exist among the data (as depicted by the light gray nodes and edges just outside the receptive field), underscoring the limitations of traditional methods in analyzing the complex web of financial interactions.

On the right side of Figure 1, the image of graph machine learning illustrates an expansive network approach. Here, the receptive field is broadened, capturing multiple hops and thereby a wider range of transactions and account relationships, as depicted by the now larger red-dashed circle enclosing a broader set of green nodes and edges. This representation aligns with graph embedding techniques, emphasizing their ability to map data into a comprehensive vector space where deeper connections are analytically accessible, leading to more nuanced and predictive insights.
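The widening receptive field can be made concrete with a breadth-first expansion over the graph. The small chain-shaped account graph below is a hypothetical example; the point is the contrast between what one hop reveals and what several hops reveal.

```python
from collections import deque

# Hypothetical account graph; edges denote transactions between accounts.
graph = {
    "A1": ["A2"], "A2": ["A1", "A3"], "A3": ["A2", "A4"], "A4": ["A3"],
}

def receptive_field(g, start, hops):
    """Nodes reachable from `start` within `hops` edges (excluding start)."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == hops:
            continue
        for nb in g[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen - {start}

print(sorted(receptive_field(graph, "A1", 1)))  # ['A2'] -- the one-hop, tabular view
print(sorted(receptive_field(graph, "A1", 3)))  # ['A2', 'A3', 'A4'] -- the graph view
```

The one-hop field corresponds to the small red-dashed circle in Figure 1; widening the hop count corresponds to the larger circle on the right side of the figure.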


Analyzing Banking Data Using Graph Embeddings

To further explain the value of graph embeddings, we’ll use a geospatial analogy. Consider how a physical address translates into a point on a map. As a simplified, two-dimensional representation of our world, maps turn complex landscapes into understandable coordinates.

In a similar vein, graph embeddings can transform vast amounts of banking transaction data into a format that's easily interpreted by machine learning models. Just as a map converts the addresses of physical locations into two-dimensional vectors of longitude and latitude, graph embeddings convert transactions and accounts into numerical vectors in a lower-dimensional space. For anyone new to machine learning, it might help to think of graph embeddings as the GPS for financial data, offering a navigational tool that provides direction through the terrain of transaction networks.

Figure 2 illustrates the concept of mapping a transaction graph onto embeddings. On the left side of the figure is a “Transaction Graph” in which each account (A1 through A5, depicted by the yellow nodes) transfers funds to another account via a transaction (T1 through T5, depicted by the blue nodes). The connections between the transactions and accounts (depicted as light blue arrows) imply transactional relationships, representing a flow of funds between the accounts (whose direction is indicated by the direction of the arrows). For example, the act of Account #1 sending funds to Account #2 is a transaction that connects the two accounts (depicted as T1 in the graph).
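This account/transaction structure is naturally represented as a bipartite directed graph, where account nodes connect to transaction nodes. The sketch below assumes a simple cycle of transfers for the edges the figure does not spell out; only the A1 → T1 → A2 pair is stated explicitly in the text.

```python
# A bipartite view of Figure 2's structure: account nodes (A*) connect to
# transaction nodes (T*). Apart from T1 (A1 sending to A2), the exact
# sender/receiver pairs are assumptions for illustration.
transactions = {
    "T1": ("A1", "A2"),  # A1 sends funds to A2 via transaction T1
    "T2": ("A2", "A3"),
    "T3": ("A3", "A4"),
    "T4": ("A4", "A5"),
    "T5": ("A5", "A1"),
}

def edges(txns):
    """Expand each transaction into its two directed edges:
    sender -> transaction and transaction -> receiver."""
    out = []
    for t, (sender, receiver) in sorted(txns.items()):
        out.append((sender, t))
        out.append((t, receiver))
    return out

for src, dst in edges(transactions):
    print(f"{src} -> {dst}")
```

Keeping transactions as first-class nodes (rather than collapsing them into account-to-account edges) preserves per-transaction attributes such as amount and timestamp for later feature extraction.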

Figure 2: Graph embeddings as a map of a transaction graph

On the right side of the figure is the “Embeddings” space, which depicts the same five accounts, but reduces the otherwise complex graph into a form in which patterns can be analyzed by most machine-learning algorithms. (Because ML algorithms require vectors as inputs, the output of embeddings is also in the form of vectors.) Such mapping of transaction graph data onto embeddings allows the machine learning models to reveal a bigger picture—detecting patterns, identifying anomalies, and understanding the financial ecosystem on a macro scale.
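Because the embedding output is just vectors, any standard vector operation applies to accounts once they are embedded; for instance, comparing two accounts by cosine similarity. The 2-D coordinates below are invented for illustration (a real embedding space would typically have tens or hundreds of dimensions).

```python
import math

# Hypothetical 2-D embedding coordinates for the five accounts.
embeddings = {
    "A1": (0.9, 0.8), "A2": (0.85, 0.75), "A3": (0.2, 0.9),
    "A4": (-0.7, -0.6), "A5": (-0.65, -0.7),
}

def cosine(u, v):
    """Cosine similarity between two 2-D vectors: 1.0 means same
    direction, -1.0 means opposite directions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Accounts placed near each other in the embedding space score near 1.0;
# accounts in opposite regions score near -1.0.
print(round(cosine(embeddings["A1"], embeddings["A2"]), 3))
print(round(cosine(embeddings["A1"], embeddings["A4"]), 3))
```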

To emphasize the limitations of a transaction graph as compared to graph embeddings, Figure 3 illustrates the same network of financial transactions and accounts as Figure 2, but now in a scenario where high-risk factors have been identified. In this scenario, the risk level of each account is predetermined (by the bank’s internal risk-assessment system) and depicted in the figure with color coding—red for high risk, green for low risk, and gray for undefined risk—providing a clear picture of potential vulnerabilities. Once again, the connections between the transactions and accounts (depicted as light blue arrows) imply transactional relationships, representing the interactions between accounts.

Where the transaction graph falls short is its focus on one-hop proximity (i.e., from one node to another): it presents transactions directly linked to an account without revealing the more extensive network of interactions. Taking this narrow view means we might evaluate an account’s risk based only on its immediate connections, overlooking the broader, multi-hop transactional pathways that can expose deeper risk associations (as depicted by the broader receptive field in Figure 1). Thus, the transaction graph is constrained to immediate transactional connections, which is insufficient for a comprehensive and accurate risk assessment of accounts.
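The gap between the one-hop and multi-hop views can be demonstrated directly. In the hypothetical graph below, account A2's only direct counterparty is labeled low-risk, but a known high-risk account sits two hops away, so a one-hop risk check sees nothing alarming.

```python
# Hypothetical graph and risk labels: A2's direct counterparty looks safe,
# but two hops away sits a known high-risk account.
graph = {"A2": ["A3"], "A3": ["A2", "A1"], "A1": ["A3"]}
risk = {"A1": "high", "A2": "undefined", "A3": "low"}

def neighbors_within(g, start, hops):
    """All accounts reachable from `start` within `hops` edges."""
    field = {start}
    frontier = {start}
    for _ in range(hops):
        frontier = {nb for n in frontier for nb in g[n]} - field
        field |= frontier
    return field - {start}

one_hop = {a: risk[a] for a in neighbors_within(graph, "A2", 1)}
two_hop = {a: risk[a] for a in neighbors_within(graph, "A2", 2)}
print(one_hop)  # {'A3': 'low'} -- the one-hop view sees no risk
print(two_hop)  # now includes A1's 'high' label
```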

Figure 3: Workflow of financial risk detection with graph analysis and clustering

Graph embeddings, on the other hand, aim to capture the entire transactional pattern of each account, looking at the broader picture rather than individual transactions in isolation. We can think of the embedding as a way to measure how, on average, one account is similar to other accounts, based on a constellation of transactions.

Returning to Figure 3, the right side (once again, as in Figure 2) illustrates the “Embeddings” space resulting from applying a graph embedding technique to the “Transaction Graph” on the left. This process translates the complex relationships into a format easier for machine learning models to process. Within this new representation of the embeddings, the five accounts are now spatially clustered based on their transactional behavior patterns. By applying K-means clustering (a machine learning algorithm) on top of the embeddings, we can group together the accounts with similar risk profiles, as depicted by the dashed circles—red dashes for high risk, and green for low risk.
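The clustering step can be sketched with a bare-bones K-means pass over hypothetical 2-D embedding coordinates. A production pipeline would use a library implementation such as scikit-learn's KMeans; the coordinates, the deterministic initialization, and the choice of k = 2 here are all assumptions made so the toy example stays small and reproducible.

```python
# Hypothetical 2-D embedding coordinates for the five accounts,
# already separated into two behavioral regions by the embedding step.
points = {
    "A1": (1.0, 1.1), "A2": (0.9, 0.8), "A3": (1.1, 0.9),
    "A4": (-1.0, -0.9), "A5": (-0.9, -1.1),
}

def kmeans(pts, k=2, iters=10):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute centroids, for a fixed number of iterations.
    (Initialization picks two extreme points; works for k=2 only.)"""
    names = sorted(pts)
    centroids = [pts[names[0]], pts[names[-1]]]
    assign = {}
    for _ in range(iters):
        for n in names:
            x, y = pts[n]
            assign[n] = min(range(k), key=lambda c: (x - centroids[c][0]) ** 2
                                                    + (y - centroids[c][1]) ** 2)
        for c in range(k):
            members = [pts[n] for n in names if assign[n] == c]
            if members:
                centroids[c] = (sum(p[0] for p in members) / len(members),
                                sum(p[1] for p in members) / len(members))
    return assign

clusters = kmeans(points)
print(clusters)  # {'A1': 0, 'A2': 0, 'A3': 0, 'A4': 1, 'A5': 1}
```

The cluster assignments play the role of the dashed circles in Figure 3: accounts grouped together share a risk profile, even before any individual label is consulted.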

Unlike the transaction graph, the embeddings show how the connections between accounts affect their risk levels. Embeddings make these connections clear by placing related accounts close together. This allows us to see that Account #2 (initially labeled as an account with undefined risk) might actually be at high risk because of its connections to high-risk accounts. Conversely, Account #5 has been placed into a low-risk area because of its association with Account #4, a low-risk account.


Conclusion

Conventional machine learning falls short in identifying high-risk clusters in transaction and bank account data because it cannot fully capture the complex relationships among the data that require context aggregated across multiple hops of a network graph. The graph embedding technique addresses this limitation by transforming the nodes of any transaction graph into numerical vectors. Doing so enables machine learning models to identify patterns and anomalies that would go undetected by traditional methods.

Graph embedding, therefore, especially when combined with a clustering technique, is a far more effective approach than traditional machine learning in identifying high-risk transactions and bank accounts, providing a robust framework for dissecting the transactional network's complex structure. Any software system capable of deploying this approach can greatly help to recalibrate a financial institution’s risk profiles, increasing the accuracy of its fraud detection mechanisms.
