'Dirty data' represents a serious threat to analytics

Rachel Wheeler

Analytics have been a major part of many companies' strategic outlook for years now, as marketing and sales leaders around the world have realized the power of collecting and analyzing data. But a problem has begun to crop up - the more channels consumers have for sharing their information with brands, the more potential for error exists as well.

People today are transmitting data about themselves and their consumption habits with their mobile devices, their social media networks and their interactions in person and over the phone. With data being transmitted every which way, it can be extremely difficult for companies to keep that data accurate across every channel.

Quality is now a major concern, and it's one that businesses need to tackle from a variety of angles. This problem is going to get more complex before it gets simpler.

The dirty data problem
VentureBeat recently reported on the widespread problem of "dirty data" that's begun to emerge for businesses worldwide. The news source drew upon an executive summary report of research from TEKsystems, entitled "Big Data: The Next Frontier." The result was clear - inaccuracies in data are everywhere.

The firm found that among 2,000 IT leaders polled across the U.S. and Canada, 60 percent now believe that their organizations lack accountability for data quality. In addition, over 50 percent are currently questioning the validity of their data. It's believed that some high-level government organizations - such as the U.S. Department of Education - are afflicted by this problem. The DoE is dealing with error-riddled data clusters and missing information.

Concerns about quality affect everyone from large private corporations to important public offices. No one is immune.

The dangerous side effects
If companies allow dirty data to corrupt their operations, the effects can be severe. Stefan Groschupf, a big data veteran and the current CEO of Datameer, has seen these problems firsthand. He noted, according to VentureBeat, that the fallout from poor data quality depends on your line of work.

For instance, for a retailer, bad data can include missing ID numbers within inventories, or inaccurate descriptions of products. Without standard and correct product data, sellers may have trouble with their stocking of products or fulfillment of orders, and workflow overall will likely be disrupted. In education, on the other hand, the effects might include teachers gearing their lesson plans toward the wrong audience.
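For the retail case, a first pass at spotting this kind of problem could be a simple profile of the inventory that counts incomplete records. The sketch below is illustrative only: the field names `product_id` and `description` and the function `profile_inventory` are assumptions made for the example, not part of any tool mentioned in the report. It can flag missing values, but it cannot tell whether a description that is present is actually accurate.

```python
def _is_blank(value):
    """True when a field is absent, None, or only whitespace."""
    return value is None or not str(value).strip()


def profile_inventory(items):
    """Count inventory records with a missing ID or an empty description.

    `items` is a list of dicts; the keys used here are hypothetical.
    """
    report = {"total": 0, "missing_id": 0, "empty_description": 0}
    for item in items:
        report["total"] += 1
        if _is_blank(item.get("product_id")):
            report["missing_id"] += 1
        if _is_blank(item.get("description")):
            report["empty_description"] += 1
    return report


if __name__ == "__main__":
    sample = [
        {"product_id": "A1", "description": "Blue mug"},
        {"product_id": None, "description": "Red mug"},
        {"product_id": "A3", "description": "  "},
    ]
    print(profile_inventory(sample))
```

Counts like these do not fix anything by themselves, but they give a seller a rough measure of how much of the catalog needs attention before stocking or fulfillment is affected.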

Regardless of the specific field, it's become clear that bad data can affect anyone. If IT overseers are unwilling to take action, serious trouble may lie ahead.

What companies can do
Data quality is only going to become a bigger problem with time. The more outlets people have for sharing information, the more slippery the slope will get. That's why it's important for the IT community to act now before things get even more unmanageable.

"While companies have been able to monitor the quality of small data sets for some time now, the increasing size and scope of the data organizations deal with on a daily basis has made this task much more complicated," Groschupf explained. "This is where new big data analytics technologies that enable data profiling during every step of the analytics cycle becomes critical in helping organizations to pick out anomalies from enormous data sets from the get-go."

This might include checking names, addresses, phone numbers, emails or any one of a host of other data elements. It all depends on an organization's specific needs.
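As a rough illustration of what checking these elements can look like, the sketch below flags contact records whose fields are missing or malformed. Everything in it is an assumption made for the example: the field names, the function `check_contact`, the deliberately loose email pattern and the 7-to-15-digit phone rule. Real validation rules depend on the organization's own data and markets, and commercial data quality tools go much further, for instance by verifying addresses against postal reference data.

```python
import re

# Loose, illustrative pattern: something@something.something with no spaces.
# This is an assumption for the sketch, not a complete email validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def check_contact(record):
    """Return the names of fields that look missing or malformed."""
    problems = []

    if not (record.get("name") or "").strip():
        problems.append("name")

    if not EMAIL_RE.match(record.get("email") or ""):
        problems.append("email")

    # Strip everything but digits, then apply a rough length check.
    digits = re.sub(r"\D", "", record.get("phone") or "")
    if not 7 <= len(digits) <= 15:
        problems.append("phone")

    if not (record.get("address") or "").strip():
        problems.append("address")

    return problems


if __name__ == "__main__":
    record = {"name": "Ada Lovelace", "email": "ada@example", "phone": "555-0100"}
    print(check_contact(record))
```

Running a check like this when records are captured, rather than long after they are stored, is one way to catch anomalies early, in line with the profiling-at-every-step approach Groschupf describes.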

Using the right tools for data quality now will help business leaders free up additional resources for later. This could mean saved time, money and headaches.