This blog post is part two of three in a mini-series we are calling “The art and science of matching your data.”
Matching data should be simple, right? Well, that depends on your perspective. As much as processes can be automated these days, when it comes to record matching, the results still depend on the context in which you want to view the relationships.
For example, if I’m looking to reduce my mailing costs on catalogs, I may be a little “loose” with my matching criteria. Worst case, I let “Alexander Smith” and “Alexandra Smith” match together and one of them doesn’t get a catalog. The overall effect should be minimal, and I’ll be more likely to achieve my goal of keeping mailing costs down.
A contrasting scenario involves bank account information. Here, the rules for determining a relationship may need to be more stringent. For example, I don’t want customers seeing other people’s account information, so I may not be able to consolidate records for “John Smith” and “John Smyth,” even if there are other similarities. This reduces the amount of duplication that I can eliminate, but it mitigates the risk of joining sensitive information that doesn’t belong together.
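To make the “loose” versus “stringent” contrast concrete, here is a minimal sketch in Python. It assumes a simple character-level similarity score from the standard library's difflib module, and the two threshold values and the is_match helper are hypothetical choices for illustration, not settings from any particular matching product.

```python
from difflib import SequenceMatcher

# Hypothetical thresholds, chosen only to illustrate the two goals.
LOOSE_THRESHOLD = 0.85   # catalog mailing: tolerate a few false matches
STRICT_THRESHOLD = 0.95  # banking: avoid joining records that may differ


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(a: str, b: str, threshold: float) -> bool:
    """Treat two names as the same party if they are similar enough."""
    return similarity(a, b) >= threshold


if __name__ == "__main__":
    pairs = [("Alexander Smith", "Alexandra Smith"), ("John Smith", "John Smyth")]
    for a, b in pairs:
        print(a, "/", b,
              "| loose:", is_match(a, b, LOOSE_THRESHOLD),
              "| strict:", is_match(a, b, STRICT_THRESHOLD))
```

The same pair of names can be a match or a non-match depending only on the threshold, which is the point: the data has not changed, the goal has.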
Both scenarios are examples of the art of data matching: the end goal of the process can have a substantive impact on how the matching rules are defined. Here’s a situation I use all the time:
Record 1 – C. Smith
Record 2 – Chris Smith
Record 3 – Christopher Smith
If all other information is equal, then I would argue that these records could represent one, two, or three different individuals. The argument for one individual would be that they are all abbreviations or nicknames for the same person, Christopher Smith. Therefore, I would want to match all three records together as one individual. In the middle case, “Chris Smith” and “Christopher Smith” are the same person, while “C. Smith” is someone else. At the other extreme, these three records could represent three distinct individuals. Perhaps “C. Smith” is really Charles Smith, “Chris Smith” is really Christine Smith, and the final record is, as stated, Christopher Smith. Again, if all other information is equal, and this is the only information you have available, then there is no right or wrong answer. How you approach the matching methodology is less about the data and more about the goal you are trying to achieve in building these relationships.
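The three readings of these records can be sketched in code. The example below is only an illustration under stated assumptions: the surname is taken as identical, only the first name is compared, and the min_prefix parameter and the helper names are hypothetical. The grouping is a simple single-pass approach, not a full clustering algorithm.

```python
from typing import List, Optional

RECORDS = ["C. Smith", "Chris Smith", "Christopher Smith"]


def first_name(record: str) -> str:
    """Return the lowercased first token with any trailing period removed."""
    return record.split()[0].rstrip(".").lower()


def names_compatible(a: str, b: str, min_prefix: Optional[int]) -> bool:
    """Decide whether two first names may belong to the same person.

    min_prefix=None  -> exact matches only
    min_prefix=n     -> the shorter name must be a prefix of the longer
                        one and be at least n characters long
    """
    if a == b:
        return True
    if min_prefix is None:
        return False
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) >= min_prefix and longer.startswith(shorter)


def count_individuals(records: List[str], min_prefix: Optional[int]) -> int:
    """Group records in a single pass and return the number of groups."""
    groups: List[List[str]] = []
    for record in records:
        name = first_name(record)
        for group in groups:
            if any(names_compatible(name, other, min_prefix) for other in group):
                group.append(name)
                break
        else:
            groups.append([name])  # no compatible group found
    return len(groups)


if __name__ == "__main__":
    print(count_individuals(RECORDS, min_prefix=1))     # initials may match
    print(count_individuals(RECORDS, min_prefix=2))     # initials never match
    print(count_individuals(RECORDS, min_prefix=None))  # exact names only
```

Under these assumptions, the three settings yield one, two, and three individuals from the same three records. None of the rules is more correct than the others; each encodes a different goal.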
After the fundamentals of matching, and the art versus science of matching, come some more advanced topics that we’ll explore in the third and final part of this series. Be sure to check it out!
Could your data use some consolidating?