
Entity Resolution: What It Is, Who Needs It, And The DataWalk Solution
What is Entity Resolution?
A data set may contain multiple records for a single entity (e.g., a person or a company). Entity resolution is the process of determining whether different records actually represent the same entity.
Entity Resolution Example
There are various techniques for entity resolution. A simple one is to check whether two records share both a name and a unique identifier, such as a Social Security number or a driver’s license number. Entity resolution is more difficult when such unique identifiers are not available, because other attributes must then be analyzed instead. A further complication is that manually entered data may be recorded in different formats and is subject to data entry errors.
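As a minimal sketch of the simple name-plus-identifier rule, assuming each record is a Python dict with hypothetical `name` and `id` fields, the check below normalizes case, whitespace, and ID punctuation before comparing. It returns `None` when either record lacks an identifier, to signal that this rule alone cannot decide and other attributes would have to be examined.

```python
import re
from typing import Optional


def normalize_name(name: str) -> str:
    # Lowercase and collapse repeated whitespace: "Robert  SMITH" -> "robert smith"
    return " ".join(name.lower().split())


def normalize_id(identifier: str) -> str:
    # Keep only letters and digits: "123-45-6789" -> "123456789"
    return re.sub(r"[^0-9A-Za-z]", "", identifier).upper()


def same_entity_by_id(a: dict, b: dict) -> Optional[bool]:
    """Match only when both the name and the unique ID agree.

    Returns None when either record lacks an ID, because this rule
    cannot decide the question without an identifier.
    """
    if not a.get("id") or not b.get("id"):
        return None
    return (normalize_name(a["name"]) == normalize_name(b["name"])
            and normalize_id(a["id"]) == normalize_id(b["id"]))


# Fictitious records: the same person, entered in different formats.
r1 = {"name": "Robert Smith", "id": "123-45-6789"}
r2 = {"name": "robert  SMITH", "id": "123 45 6789"}
print(same_entity_by_id(r1, r2))
```

Because the name comparison is exact after normalization, a record for "Bob Smith" with the same ID would not match under this rule; handling nicknames, typos, and missing identifiers is what makes the harder cases harder.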
Consider the following fictitious records:
Index | Name | Address | Phone | Date of Birth |
1 | Robert Smith | 123 Main Street | 555-555-1234 | 3/12/89 |
2 | Rob Smith | 123 Main St. | 555-789-3456 | March 12, 1989 |
3 | Bob Smith | 789 Broad St. | 555.555.1234 | 3/2/89 |
4 | B Smith | 456 Church St. | 222-333-4444 | 3/12/89 |
In the above example, records 1 and 2 are likely the same person: the names are variants of one another, and the addresses and dates of birth match once differences in formatting are ignored. Record 3 may be the same person as in records 1 and 2, as Bob is a common nickname for Robert and the phone number matches record 1; the date of birth differs by one digit, which could be a data entry error. Record 4 is more likely a different person: although the date of birth matches, the name is only a possible match, and neither the address nor the phone number matches the other records.
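The reasoning above can be sketched as a weighted attribute comparison. The Python below is a minimal illustration under stated assumptions, not DataWalk's method: it normalizes each attribute (nicknames, street abbreviations, phone punctuation, date formats), scores every pair of records, and sums the attribute scores using weights. The nickname table, abbreviation table, weights, and the 0.7 / 0.5 thresholds are all hypothetical values chosen for this example.

```python
import re
from datetime import date, datetime
from itertools import combinations
from typing import Dict, Optional, Tuple

# Hypothetical lookup tables; a real system would use much larger curated lists.
NICKNAMES = {"rob": "robert", "bob": "robert", "bobby": "robert"}
STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd"}

RECORDS = {
    1: {"name": "Robert Smith", "address": "123 Main Street", "phone": "555-555-1234", "dob": "3/12/89"},
    2: {"name": "Rob Smith", "address": "123 Main St.", "phone": "555-789-3456", "dob": "March 12, 1989"},
    3: {"name": "Bob Smith", "address": "789 Broad St.", "phone": "555.555.1234", "dob": "3/2/89"},
    4: {"name": "B Smith", "address": "456 Church St.", "phone": "222-333-4444", "dob": "3/12/89"},
}


def split_name(name: str) -> Tuple[str, str]:
    parts = name.lower().replace(".", "").split()
    return parts[0], parts[-1]


def name_score(a: str, b: str) -> float:
    (fa, la), (fb, lb) = split_name(a), split_name(b)
    if la != lb:
        return 0.0
    if len(fa) == 1 or len(fb) == 1:
        # An initial is only a possible match: partial credit if it could
        # abbreviate any known variant of the other first name.
        initial, full = (fa, fb) if len(fa) == 1 else (fb, fa)
        canon = NICKNAMES.get(full, full)
        variants = {canon} | {nick for nick, base in NICKNAMES.items() if base == canon}
        return 0.5 if any(v.startswith(initial) for v in variants) else 0.0
    return 1.0 if NICKNAMES.get(fa, fa) == NICKNAMES.get(fb, fb) else 0.0


def address_score(a: str, b: str) -> float:
    def norm(s: str) -> str:
        # Drop punctuation and abbreviate street types: "123 Main Street" -> "123 main st"
        words = re.sub(r"[^\w\s]", "", s.lower()).split()
        return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in words)
    return 1.0 if norm(a) == norm(b) else 0.0


def phone_score(a: str, b: str) -> float:
    # Compare digits only, so 555-555-1234 and 555.555.1234 agree.
    return 1.0 if re.sub(r"\D", "", a) == re.sub(r"\D", "", b) else 0.0


def parse_date(text: str) -> Optional[date]:
    for fmt in ("%m/%d/%y", "%B %d, %Y"):
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            pass
    return None


def dob_score(a: str, b: str) -> float:
    da, db = parse_date(a), parse_date(b)
    if da is None or db is None:
        return 0.0
    if da == db:
        return 1.0
    # Same month and year but a different day may be a typo (3/2 vs 3/12).
    return 0.5 if (da.year, da.month) == (db.year, db.month) else 0.0


# Hypothetical weights; in practice they would be tuned on labelled data.
SCORERS = {"name": (0.30, name_score), "address": (0.25, address_score),
           "phone": (0.25, phone_score), "dob": (0.20, dob_score)}


def score_pairs(records: Dict[int, dict]) -> Dict[Tuple[int, int], float]:
    return {
        (i, j): sum(w * f(records[i][k], records[j][k]) for k, (w, f) in SCORERS.items())
        for i, j in combinations(sorted(records), 2)
    }


def label(score: float) -> str:
    return "likely" if score >= 0.7 else "possible" if score >= 0.5 else "unlikely"


for pair, score in score_pairs(RECORDS).items():
    print(pair, f"{score:.2f}", label(score))
```

With these made-up weights, records 1 and 2 come out as a likely match, records 1 and 3 as a possible match, and every other pair as unlikely, which mirrors the reasoning above. Scoring pairs rather than whole groups also means records 2 and 3 can end up linked indirectly through record 1 even though they score low against each other.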
Entity Resolution: Who Needs It
There is a widespread need for entity resolution in both commercial businesses and government agencies. For commercial businesses, entity resolution is important for truly understanding your customers, since a 360-degree view of a customer is often possible only by correlating data across different sources. For law enforcement agencies and fraud investigators, entity resolution helps identify instances of fraud and deception.
Entity Resolution: The DataWalk Solution
Entity resolution is a feature of DataWalk, a comprehensive enterprise-class software platform for fusing data from your various sources and then enabling easy access to, and analysis of, that data.
DataWalk’s entity resolution facility enables you to:
- Easily fuse your data
- Structure, clean, and compare addresses, phone numbers, names, dates, and identifiers
- Generate combinations of highly flexible rules to evaluate whether records may match
- Manually or automatically merge records that are believed to represent the same entity, while preserving all data and connections
- Track the lineage of each object throughout the entire process
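The rule-combination and merge capabilities listed above can be illustrated generically. The sketch below is hypothetical and does not reflect DataWalk's API or internals: it combines simple predicate rules with a hypothetical `all_of` helper, links any pair of records that satisfies one of the named rules, groups linked records with a union-find structure, and keeps every source record unchanged alongside a lineage list recording which rule linked which records. It assumes the values were already cleaned (dates in ISO format, addresses normalized).

```python
import re
from itertools import combinations
from typing import Callable, Dict, List, Tuple

Record = Dict[str, str]
Rule = Callable[[Record, Record], bool]

# Fictitious records whose values are assumed to be already cleaned.
RECORDS: Dict[int, Record] = {
    1: {"name": "Robert Smith", "address": "123 main st", "phone": "555-555-1234", "dob": "1989-03-12"},
    2: {"name": "Rob Smith", "address": "123 main st", "phone": "555-789-3456", "dob": "1989-03-12"},
    3: {"name": "Bob Smith", "address": "789 broad st", "phone": "555.555.1234", "dob": "1989-03-02"},
    4: {"name": "B Smith", "address": "456 church st", "phone": "222-333-4444", "dob": "1989-03-12"},
}


def same_last_name(a: Record, b: Record) -> bool:
    return a["name"].lower().split()[-1] == b["name"].lower().split()[-1]


def same_field(key: str) -> Rule:
    # Hypothetical helper: exact comparison of one cleaned attribute.
    return lambda a, b: a[key] == b[key]


def same_phone(a: Record, b: Record) -> bool:
    return re.sub(r"\D", "", a["phone"]) == re.sub(r"\D", "", b["phone"])


def all_of(*rules: Rule) -> Rule:
    # Combine rules: the pair matches only if every rule matches.
    return lambda a, b: all(rule(a, b) for rule in rules)


# Hypothetical named rules; a pair is linked by the first rule that fires.
RULES: Dict[str, Rule] = {
    "last_name+address+dob": all_of(same_last_name, same_field("address"), same_field("dob")),
    "last_name+phone": all_of(same_last_name, same_phone),
}


def resolve(records: Dict[int, Record], rules: Dict[str, Rule]) -> List[dict]:
    parent = {rid: rid for rid in records}

    def find(x: int) -> int:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    lineage: List[Tuple[int, int, str]] = []  # (record, record, rule that linked them)
    for a, b in combinations(sorted(records), 2):
        for rule_name, rule in rules.items():
            if rule(records[a], records[b]):
                lineage.append((a, b, rule_name))
                parent[find(b)] = find(a)
                break

    clusters: Dict[int, List[int]] = {}
    for rid in sorted(records):
        clusters.setdefault(find(rid), []).append(rid)

    # Each merged entity keeps every original record untouched, plus its lineage.
    return [
        {"members": members,
         "records": [records[m] for m in members],
         "lineage": [link for link in lineage if link[0] in members]}
        for members in clusters.values()
    ]


for entity in resolve(RECORDS, RULES):
    print(entity["members"], entity["lineage"])
```

Here records 1, 2, and 3 are grouped into one entity (1 and 2 by the address-and-birth-date rule, 1 and 3 by the phone rule) while record 4 stays separate. A manual workflow could route the matched pairs to a review queue instead of merging them automatically.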