Bigger isn't always better when it comes to data quality

Paul Newman

For the last year, the enterprise world has been abuzz about big data. Businesses of all sizes have been developing or deploying big data plans intended to give them a leg up on competitors: finally they could harness all of the information about their customers, run it through processes to verify data quality and extract advantageous insights. 

Some data experts suggest that bigger is not always better. For example, Bob Warfield wrote in an article for Enterprise Irregulars that smaller companies may not actually need such complex data plans after all. Warfield points out that big data is like living on the island of Manhattan. It's crowded and complex, but it has allure and seems 'sexy.' That said, not everyone wants to deal with gridlocked commutes and packed trains on a daily basis.

Some people prefer to live outside of urban centers, Warfield writes. The suburbs, which are smaller, decentralized and do not necessarily require complex architecture, may be a better match for smaller businesses' purposes. When that is the case, they may only need "Suburban Data." 

Quality over quantity?
In fact, all companies might be better off if they focused more on the three S's (slow, small and sure) than the three V's (volume, velocity and variety), Stephen Few wrote in a post on his Visual Business Intelligence blog. Few suggests that businesses are in essence running a tortoise-and-hare race. To win, they may need to take a slow, small and sure approach rather than one based on volume, velocity and variety. 

"Data is growing in volume, as it always has, but only a small amount of it is useful. Data is being generated and transmitted at an increasing velocity, but the race is not necessarily for the swift; slow and steady will win the information race," Few explains. "Data is branching out in ever-greater variety, but only a few of these new choices are sure."

Update your filters
Data quality tools can weed out bad information, but businesses can also develop better filters so they're not letting in content that won't help them achieve their goals, according to the Obsessive-Compulsive Data Quality blog. In the post, Jim Harris compares the deluge of data to the infinite inbox, in which you always have an infinite number of unread messages. 

Harris points out that, in theory, there might be an algorithm that could seamlessly separate valuable messages from useless spam offers. But just as no such filter yet exists for email, there isn't yet a formula to separate good information from the hordes of unhelpful petabytes of data being created every day, so it might be more prudent to stream in only the most relevant content.
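The idea of streaming in only the most relevant content can be sketched as a simple upstream filter. The example below is a minimal illustration, not anything from Harris's post: the field names, topics and records are all hypothetical, and a real filter would use far richer criteria than a keyword set.

```python
# Minimal sketch of an upstream relevance filter: admit only records that
# match criteria defined *before* ingestion, rather than cleaning everything
# after the fact. All names and topics here are hypothetical.

RELEVANT_TOPICS = {"order", "refund", "complaint"}  # hypothetical business focus

def is_relevant(record: dict) -> bool:
    """Admit a record only if it touches a topic the business actually uses."""
    topic = record.get("topic", "").lower()
    return topic in RELEVANT_TOPICS

incoming = [
    {"id": 1, "topic": "order"},
    {"id": 2, "topic": "celebrity-gossip"},
    {"id": 3, "topic": "refund"},
]

# Only the relevant records enter the pipeline; the rest are never stored.
accepted = [r for r in incoming if is_relevant(r)]
print([r["id"] for r in accepted])  # -> [1, 3]
```

The design choice mirrors the article's point: instead of buying bigger tools to clean a bigger pile, narrow the intake so the pile stays small and sure.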