Entity Resolution: What It Is, Who Needs It, And The DataWalk Solution

What is Entity Resolution?

Data sets may contain multiple records for a single entity (e.g., a person or a company). The process of determining whether different records actually represent the same entity is called entity resolution.

 

Entity Resolution Example

There are various techniques for entity resolution. A simple example is to check whether both a name and a unique ID (e.g., a social security number or driver’s license number) are the same in two records. Entity resolution is of course more difficult when such unique identifiers are not available, in which case various other attributes may need to be analyzed. Complicating the challenge further, data entered manually may appear in different formats and is subject to data entry errors.
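The simple name-plus-ID comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not DataWalk's implementation: the record layout (dictionaries with `name` and `id` keys), the helper names, and the sample ID values are all assumptions made for the example.

```python
def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(name.lower().split())


def normalize_id(identifier: str) -> str:
    """Keep only letters and digits, so '000-00-0001' and '000 00 0001' compare equal."""
    return "".join(ch for ch in identifier if ch.isalnum()).upper()


def same_entity(record_a: dict, record_b: dict) -> bool:
    """Return True when both the normalized name and the normalized ID agree."""
    return (
        normalize_name(record_a["name"]) == normalize_name(record_b["name"])
        and normalize_id(record_a["id"]) == normalize_id(record_b["id"])
    )


# Hypothetical records; the ID values are made up for illustration.
a = {"name": "Robert Smith", "id": "000-00-0001"}
b = {"name": "robert  smith", "id": "000 00 0001"}
c = {"name": "Robert Smith", "id": "000-00-0002"}

print(same_entity(a, b))  # same name and ID after normalization
print(same_entity(a, c))  # same name, different ID
```

Normalizing before comparing addresses the formatting differences mentioned above, but it does not help when an identifier is missing or mistyped; those cases need the attribute-by-attribute analysis the paragraph describes.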

Consider the fictitious data below, where there are multiple records as follows:

Index  Name          Address          Phone         Date of Birth
1      Robert Smith  123 Main Street  555-555-1234  3/12/89
2      Rob Smith     123 Main St.     555-789-3456  March 12, 1989
3      Bob Smith     789 Broad St.    555.555.1234  3/2/89
4      B Smith       456 Church St.   222-333-4444  3/12/89

 

In the above example, records 1 and 2 are likely the same person, as the name, address, and date of birth all match once nicknames, abbreviations, and date formats are accounted for. Record 3 is perhaps the same person as in records 1 and 2, as the name matches and the phone number matches record 1, and it’s possible that there was a data entry error in the date of birth. Record 4 is more likely to be a different person: although its date of birth matches records 1 and 2, the name is only a possible match, and the address and phone number do not match any of the other records.
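The reasoning above can be approximated with a small scoring sketch in Python. Everything here is an assumption made for illustration, not a description of how DataWalk works: the nickname table, the address and phone normalization rules, the two date formats, and the equal weighting of the four attributes are all hypothetical choices.

```python
from datetime import date, datetime
from itertools import combinations

# The fictitious records from the table: (name, address, phone, date of birth).
RECORDS = {
    1: ("Robert Smith", "123 Main Street", "555-555-1234", "3/12/89"),
    2: ("Rob Smith", "123 Main St.", "555-789-3456", "March 12, 1989"),
    3: ("Bob Smith", "789 Broad St.", "555.555.1234", "3/2/89"),
    4: ("B Smith", "456 Church St.", "222-333-4444", "3/12/89"),
}

# Assumed, deliberately tiny nickname table (nickname -> canonical first name).
NICKNAMES = {"rob": "robert", "bob": "robert"}


def canonical_first(first: str) -> str:
    return NICKNAMES.get(first, first)


def name_score(a: str, b: str) -> float:
    """1.0 for the same name, 0.5 for a compatible bare initial, else 0.0.

    Assumes every name is exactly 'First Last'.
    """
    first_a, last_a = a.lower().split()
    first_b, last_b = b.lower().split()
    if last_a != last_b:
        return 0.0
    if canonical_first(first_a) == canonical_first(first_b):
        return 1.0
    # A bare initial is only a possible match: accept it if it starts the
    # other first name or one of that name's known variants.
    for initial, other in ((first_a, first_b), (first_b, first_a)):
        if len(initial) == 1:
            canon = canonical_first(other)
            variants = {other, canon} | {n for n, c in NICKNAMES.items() if c == canon}
            if any(v.startswith(initial) for v in variants):
                return 0.5
    return 0.0


def normalize_address(address: str) -> str:
    """Lowercase, drop periods, and abbreviate 'street' to 'st'."""
    words = address.lower().replace(".", "").split()
    return " ".join("st" if w == "street" else w for w in words)


def normalize_phone(phone: str) -> str:
    """Keep digits only, so dashes and dots do not matter."""
    return "".join(ch for ch in phone if ch.isdigit())


def parse_date(text: str) -> date:
    """Parse the two date formats that appear in the table."""
    for fmt in ("%m/%d/%y", "%B %d, %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def match_score(a: tuple, b: tuple) -> float:
    """Sum of per-attribute agreement, each attribute weighted equally (assumed)."""
    return (
        name_score(a[0], b[0])
        + float(normalize_address(a[1]) == normalize_address(b[1]))
        + float(normalize_phone(a[2]) == normalize_phone(b[2]))
        + float(parse_date(a[3]) == parse_date(b[3]))
    )


for i, j in combinations(RECORDS, 2):
    print(f"records {i} and {j}: score {match_score(RECORDS[i], RECORDS[j])}")
```

With these assumed equal weights, records 1 and 2 score 3.0, records 1 and 3 score 2.0, and record 4 scores no more than 1.5 against any other record, which mirrors the ranking described above. Where to draw the line between "same entity" and "different entity" is a separate decision that depends on the data and on how costly a false match is.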

 

Entity Resolution: Who Needs It

There is a widespread need for entity resolution capabilities across both commercial businesses and government agencies. For commercial businesses, entity resolution is important for truly understanding your customer, as a 360-degree view of a customer is often possible only by correlating data across different data sources. For law enforcement and fraud-fighters, entity resolution helps organizations identify instances of fraud and deception.

 

Entity Resolution: The DataWalk Solution

Entity resolution is a feature of DataWalk, a comprehensive enterprise-class software platform for fusing data across your various sources and then enabling easy access to, and analysis of, that data.

 

DataWalk’s entity resolution facility enables you to:

  • Easily fuse your data
  • Structure, clean, and compare addresses, phone numbers, names, dates, and identifiers
  • Generate combinations of highly flexible rules to evaluate whether records may match
  • Manually or automatically merge records that are believed to be the same entity, while preserving all data and connections
  • Track the lineage of each object throughout the entire process