
Examining data quality errors of all shapes and sizes

Paul Newman

As companies delve deeper into analytics and come to rely on ever-larger troves of customer data, it becomes harder to manage all the information at their fingertips. Errors in data quality start to rear their ugly heads in a variety of ways, and it's not easy to fix them all and keep them from recurring.

Mistakes happen at multiple stages of the data collection process. Sometimes, data is entered incorrectly at the very beginning of its lifecycle. In other instances, transcription errors creep in later. Still other times, the data isn't objectively wrong at all: it just isn't properly managed or standardized.

For companies that rely on data to improve operations, fixing all of these types of mistakes is vital. That means examining the problem from two angles: objective errors and subjective inconsistencies.

Fixing objective errors in data
Sometimes the problem is simple: people's data is flat-out wrong. If a mobile website user filling out a form misspells the name of his own street, his address won't be of much use to a company. Likewise, if he provides an email address he no longer uses, his contact information is worthless. And if he gives a phone number but forgets the area code, his record will be incomplete.

These mistakes are clear and objective, and companies can take tangible steps to fix them. Address management solutions can be used to check people's contact information and make sure it's valid, and likewise, email verification is a great way to sniff out phony online contacts.
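Some of these objective problems can be caught right at the form, before anything reaches an address or email verification service. The Python sketch below is a hypothetical example (the function `check_contact_record` and its messages are illustrative, not part of any product). It flags a malformed email address and a phone number that looks like it is missing its area code, assuming North American 10-digit numbers. A format check like this cannot tell whether a mailbox is still in use or whether a street name is spelled correctly. Those checks need reference data, which is what dedicated address and email verification tools supply.

```python
import re

# Deliberately loose pattern: "something@something.tld" with no spaces.
# It only checks shape; it cannot tell whether the mailbox exists or is still used.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_contact_record(email: str, phone: str) -> list[str]:
    """Return a list of problems found in a hypothetical web-form submission."""
    problems = []

    if not EMAIL_PATTERN.match(email.strip()):
        problems.append("email: malformed address")

    # Keep only the digits, so "(555) 123-4567" and "555.123.4567" compare equally.
    digits = re.sub(r"\D", "", phone)
    # Assumption: North American numbers, optionally prefixed with country code 1.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 7:
        # Seven digits is a local number: the area code was probably left off.
        problems.append("phone: missing area code")
    elif len(digits) != 10:
        problems.append("phone: unexpected number of digits")

    return problems


if __name__ == "__main__":
    print(check_contact_record("jane@example.com", "(555) 123-4567"))  # []
    print(check_contact_record("jane.example.com", "123-4567"))
    # ['email: malformed address', 'phone: missing area code']
```

Rejecting or flagging a record at entry time is usually cheaper than cleaning it later, which is why checks like these tend to sit in the form itself.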

Lending rhyme and reason to the subjective
However, there are also times when data is flawed in a way that's less obvious. Smart Data Collective recently offered up a great example.

Say, for instance, you have a form asking people to name their employers, and many of them say "Walmart." Except they can't all agree on how to spell it: some use the standard spelling, while others enter "WalMart," "Wal-Mart," "Wal-Mart Stores, Inc." or what have you. The news source reported that, according to logistics company US Xpress, there are 178 different ways of writing that one company's name.

Standardizing cluttered data is a more complicated problem, but it's equally important. To get the most out of analytics, companies need to start with data that is well organized and easy to translate into real business results.
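One common way to tame variants like the Walmart spellings is to reduce each entry to a normalized comparison key and map that key to a preferred spelling. The Python sketch below illustrates the idea under stated assumptions: the suffix list `IGNORED_SUFFIXES` and the reference table `CANONICAL_NAMES` are hypothetical and would need to be built for real data, and the functions `normalize_key` and `standardize_employer` are illustrative names, not an existing tool's API.

```python
import re

# Hypothetical list of trailing words that rarely distinguish one employer from another.
IGNORED_SUFFIXES = {"inc", "incorporated", "corp", "corporation", "co",
                    "company", "llc", "ltd", "stores"}

# Hypothetical reference table: normalized key -> preferred spelling.
CANONICAL_NAMES = {"walmart": "Walmart"}


def normalize_key(raw: str) -> str:
    """Reduce a free-text company name to a comparison key."""
    # Lowercase, then keep only ASCII letters, digits, and whitespace
    # (this drops hyphens, commas, and periods).
    cleaned = re.sub(r"[^a-z0-9\s]", "", raw.lower())
    tokens = cleaned.split()
    # Strip ignorable words from the end only: "Wal-Mart Stores, Inc." -> ["walmart"].
    while tokens and tokens[-1] in IGNORED_SUFFIXES:
        tokens.pop()
    # Join without spaces so "Wal Mart" and "WalMart" collapse to the same key.
    return "".join(tokens)


def standardize_employer(raw: str) -> str:
    """Return the preferred spelling if the key is known; otherwise the trimmed input."""
    return CANONICAL_NAMES.get(normalize_key(raw), raw.strip())


if __name__ == "__main__":
    for variant in ["Walmart", "WalMart", "Wal-Mart", "Wal-Mart Stores, Inc.", "wal mart"]:
        print(variant, "->", standardize_employer(variant))  # each maps to "Walmart"
```

Unknown names pass through unchanged so they can be reviewed and added to the reference table. This approach handles case, punctuation, spacing, and corporate suffixes, but not outright typos such as "Walmrat"; catching those would take an additional step such as fuzzy matching.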