Tackling issues of data quality collectively and collaboratively

Paul Newman

We're entering an uncertain time in business with respect to analytics. On one hand, it's exciting that technology keeps improving and companies have more channels for data mining available than ever before; on the other hand, they still have to worry about data quality. When they're collecting information from so many people in so many places, how can they trust that what they're getting is reliable?

Issues with quality can arise in any number of ways. A problem might stem from a single piece of information that's misspelled or outdated, or from something bigger, such as a massive database that was imported in the wrong format and is now unusable. Either way, something needs to be done: mistakes in data quality can lead to embarrassing mishaps and huge financial losses.

The nature of the problem
It's hard to comprehend how big the data quality issue has become. According to Inside Big Data, it's now a massive source of financial hardship for businesses: Forbes has estimated a total cost of $5 million annually for a big business, while Ovum Research says poor data quality can eat up 30 percent of total revenues.

Andrew Hermann, president and director of the CorSource Technology Group, says that ignoring the problem is no longer an option. Companies need to do something about their numerous, disparate, unreliable sources of data if they want to find information they can trust.

"Data is flowing from internal business systems, typically legacy systems, SaaS systems, external online sources and the Internet of Things," Hermann warned. "Many companies collect at least some of this data via warehouses, but collection is about as far as most have made it. The primary reason for gathering the data is to use it for analysis, but most users don't trust their data enough to perform analysis that is actually useful."

Tackling the data beast
So if companies are aware of their data issues, what can they do to combat them? That's a difficult question. Hermann says that maintaining data quality is a matter of constant vigilance.

"For one, addressing data quality does not have to be an 'all at once' approach," he explains. "The most successful initiatives involve starting with a couple of systems, analyzing the data challenges, and setting up rules to fix and ultimately govern the data automatically."
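
The rule-based approach Hermann describes can be sketched in a few lines of plain Python. This is a minimal illustration, not any particular product's implementation; the records, field names, and rules below are hypothetical examples of the kind of automatic fixes such an initiative might start with.

```python
# Hypothetical cleanup rules: each rule pairs a check that detects one
# class of problem with a fix that repairs it automatically.
RULES = [
    # Collapse stray whitespace in free-text fields.
    (lambda v: isinstance(v, str), lambda v: " ".join(v.split())),
    # Standardize common "empty" markers to a single missing value.
    (lambda v: v in ("", "N/A", "-"), lambda v: None),
]

def apply_rules(record):
    """Return a copy of `record` with every rule applied to every field."""
    cleaned = {}
    for key, value in record.items():
        for matches, fix in RULES:
            try:
                if matches(value):
                    value = fix(value)
            except TypeError:
                pass  # rule doesn't apply to this value's type
        cleaned[key] = value
    return cleaned

print(apply_rules({"name": "  Jane   Doe ", "phone": "N/A"}))
# → {'name': 'Jane Doe', 'phone': None}
```

Starting with a couple of systems, as the quote suggests, keeps the rule set small enough to review by hand before the rules are trusted to govern data automatically.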

Businesses first need to take stock of their data and figure out what they're working with. Then they need to cleanse it, remediate their issues and institute solid long-term data governance practices. The ultimate goal is to get their data clean and keep it that way.
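
"Taking stock" usually begins with profiling: counting missing values and distinct values for each field so problems surface before any cleansing starts. A minimal sketch, assuming records are plain Python dicts; the `customers` data and `city` field are invented for illustration.

```python
from collections import Counter

def profile(records, field):
    """Summarize one field across a dataset: how many values are missing
    and which distinct values appear. Near-duplicates such as 'Portland'
    vs 'portland' are often the first concrete data-quality finding."""
    values = [r.get(field) for r in records]
    missing = sum(1 for v in values if v in (None, ""))
    counts = Counter(v for v in values if v not in (None, ""))
    return {"missing": missing, "distinct": len(counts),
            "top": counts.most_common(3)}

customers = [
    {"city": "Portland"}, {"city": "portland"},
    {"city": ""}, {"city": "Seattle"},
]
report = profile(customers, "city")
print(report["missing"], report["distinct"])
# → 1 3
```

A report like this makes the case for the governance rules that follow: three "distinct" cities here are really two, and the profile quantifies exactly how dirty the field is.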