I love it when tradeshow season rolls around again. In New England the grass is green, spring is in the air, and people are rejuvenated by the possibility of being outside without being ensconced in swathes of goose down. This year’s #datasummit2019 in Boston was no exception. The freedom of peep-toe shoes intermingled with the excitement of possibility – possibility of what is to come with AI and ML, of how companies can compete with next-level analytics, and of course how data access, control, and quality drive the success of these initiatives.
MIT’s Michael Stonebraker opened the event with a good-natured complaint about his own introduction: he noted two errors in it caused by poor data quality, which got a laugh from all the data dorks in the audience, myself included. To be clear, he is not, and never has been, with IBM! Joking aside, Stonebraker shared some startling facts and anecdotes that reminded me of the foundational importance of good data, and of the negative impact of poor data quality.
For example, he quoted a data scientist at iRobot: “I spend 90% of my time finding and cleaning data and then 90% of the other 10% checking the cleaning.” Even in 2019, data scientists across industries are still spending the majority of their time finding, cleaning, and preparing data rather than focusing on the modeling and analytics work they were hired to do.
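Much of the cleaning burden that quote describes comes down to mundane normalization: the same entity spelled three ways, null values encoded as strings, numbers stored as formatted text. A minimal Python sketch of that kind of cleanup, using entirely hypothetical records, field names, and null markers:

```python
# Hypothetical raw records as they might arrive from several source
# systems; the field names and null markers here are illustrative only.
raw_records = [
    {"name": "  Acme Corp ", "revenue": "1,200"},
    {"name": "ACME CORP", "revenue": "N/A"},
    {"name": "Acme Corp", "revenue": "1200"},
]

# String values that should be treated as "no data".
NULL_MARKERS = {"", "n/a", "null", "unknown"}

def clean(record):
    """Trim whitespace, normalize case, map null markers to None,
    and parse formatted numbers into real integers."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip()
        if value.lower() in NULL_MARKERS:
            cleaned[key] = None
        elif key == "revenue":
            cleaned[key] = int(value.replace(",", ""))
        else:
            cleaned[key] = value.title()
    return cleaned

cleaned = [clean(r) for r in raw_records]
# After cleaning, all three rows agree on the name "Acme Corp",
# and the "N/A" revenue becomes an explicit None.
```

Trivial as it looks, checking that cleaning like this did the right thing on every edge case is exactly the “90% of the other 10%” the iRobot scientist was joking about.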
Stonebraker shared another interesting anecdote, this one about GE. According to him, GE has 75 different procurement systems. If the data within those systems were merged into one, and the procurement officers could access and de-duplicate the terms and conditions, the company would save $100M annually at renewal time, because the officers could negotiate the most favorable terms across all the vendors. It’s a $100M-a-year data quality problem.
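The hard part of that $100M opportunity is entity resolution: the same vendor appears under different spellings in different systems, so the records must be matched before the best terms can be compared. A toy Python sketch of the idea, with hypothetical system names, vendors, and contract fields:

```python
# Hypothetical contract rows from three of the many procurement systems;
# the vendor spellings differ, which is exactly the de-duplication problem.
system_a = [{"vendor": "Initech Ltd.", "discount_pct": 5.0}]
system_b = [{"vendor": "INITECH LTD", "discount_pct": 8.5}]
system_c = [{"vendor": "initech ltd", "discount_pct": 3.0}]

def normalize_vendor(name):
    """Crude matching key: lowercase and strip everything non-alphanumeric.
    Real entity resolution is far fuzzier than this."""
    return "".join(ch for ch in name.lower() if ch.isalnum())

def best_terms(*systems):
    """Merge all systems and keep the most favorable discount per vendor."""
    best = {}
    for system in systems:
        for row in system:
            key = normalize_vendor(row["vendor"])
            if key not in best or row["discount_pct"] > best[key]["discount_pct"]:
                best[key] = row
    return best

merged = best_terms(system_a, system_b, system_c)
# The three spellings collapse into one vendor, carrying the 8.5% discount.
```

Scaled from one toy vendor to every contract across 75 systems, picking the best already-negotiated terms at renewal time is where the savings Stonebraker described would come from.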
While data initiatives themselves have changed considerably over time as data has grown more sophisticated, the data industry, unfortunately, has not advanced as quickly as one would have hoped. According to a Harvard Business Review report cited at the event:
The critical underlying component of all data-driven initiatives is the data itself, and the level of its quality needs to be raised to ensure trusted data for analytics, BI, or any data-driven activity.
Attending this conference made me even more proud to work for Experian, where we aim to challenge that status quo. We are building products and combining them with services and proprietary datasets that enable data practitioners to better access and control data. If you would like to learn more about our latest innovations, please contact us.