This is the question that people are trying to answer when they undertake a data quality assessment. A typical assessment requires something resembling the following events:
The problem is that even with a comprehensive set of data quality rules, the data in your systems is only an approximation of reality. Your data may imply that 'Mrs Henry owns a Toyota Prius and lives at Maple Crescent, Eastbourne', but Mrs Henry may have sold the Prius last month and moved to Glasgow last year! It is this conflict between reality and perceived reality (in our systems) that leads to stranded assets, unbilled revenue, overbilling, customer churn and many other data quality-related impacts.
But of course, we don't just have a conflict between reality and the data in our systems. The typical enterprise data landscape means we often have conflicting sources of information. Billing, accounts, servicing, fulfilment - these could all have conflicting information for Mrs Henry - so which one should we believe?
Telecoms and utility systems, for example, are renowned for duplicating information assets. Following countless acquisitions and mergers, consolidations and silo expansions, there can be untold 'versions of the truth'. This proliferation of data is a particular problem when managing tangible information assets like plant equipment. In one extreme example, I discovered that 27 systems were holding details of the same equipment. However, even if you have only two systems with replicated information, it can still cause significant financial and operational headaches.
To understand the extent of the problem with telecoms and utility data quality, you typically have to go on site to perform a site audit or review. Site reviews allow you to cross-check whether the equipment you can visibly inspect is accurately reflected in the data you hold in operational systems. These audits are incredibly valuable but can be costly, so it pays to focus your efforts on the most valuable information.
A data quality assessment using suitable technology is a good starting point because it will flag up discrepancies between overlapping datasets. Taking a simple telecoms example, you would check to see if all active equipment has an active service by assessing data quality across the relevant systems.
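As a rough illustration of the telecoms check just described, here is a minimal Python sketch. It assumes two hypothetical extracts, one from an inventory system and one from a billing or service system, joined on a shared equipment ID. The record types, field names and status values are invented for the example and will differ in real systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EquipmentRecord:
    equipment_id: str
    status: str  # hypothetical values: "active", "idle"


@dataclass(frozen=True)
class ServiceRecord:
    service_id: str
    equipment_id: str
    status: str  # hypothetical values: "active", "ceased"


def active_equipment_without_service(equipment, services):
    """Return IDs of active equipment that has no active service record."""
    served = {s.equipment_id for s in services if s.status == "active"}
    return sorted(
        e.equipment_id
        for e in equipment
        if e.status == "active" and e.equipment_id not in served
    )


# Hypothetical extracts from two overlapping systems.
inventory = [
    EquipmentRecord("EQ-001", "active"),
    EquipmentRecord("EQ-002", "active"),
    EquipmentRecord("EQ-003", "idle"),
]
billing = [
    ServiceRecord("SV-10", "EQ-001", "active"),
    ServiceRecord("SV-11", "EQ-002", "ceased"),
]

orphans = active_equipment_without_service(inventory, billing)
print(orphans)
```

Each ID returned is a candidate for unbilled revenue or a stranded asset, and a natural entry on the list of things to verify on site.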
Where you start to find issues, you can prioritise, typically by location 'hot-spots' where the likelihood of a change in the data compared to reality is highest. By visiting each hot-spot, you can again leverage modern data quality technology to support your audit.
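One simple way to find hot-spots is to count the discrepancies flagged at each location and visit the worst first. The sketch below assumes the assessment produces a list of (site, equipment ID) pairs; the site names and IDs are made up, and a raw count is only one possible ranking (a rate per site, or a weighting by asset value, may suit you better).

```python
from collections import Counter


def rank_hot_spots(discrepancies, top_n=3):
    """Rank sites by the number of flagged discrepancies, highest first."""
    counts = Counter(site for site, _equipment_id in discrepancies)
    return counts.most_common(top_n)


# Hypothetical output of a cross-system data quality assessment.
discrepancies = [
    ("Eastbourne exchange", "EQ-002"),
    ("Glasgow depot", "EQ-107"),
    ("Eastbourne exchange", "EQ-005"),
    ("Eastbourne exchange", "EQ-009"),
    ("Glasgow depot", "EQ-110"),
    ("Leeds hub", "EQ-300"),
]

for site, count in rank_hot_spots(discrepancies, top_n=2):
    print(f"{site}: {count} discrepancies")
```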
My personal approach to site reviews is to use a decent-spec laptop, a data quality tool that features an inbuilt repository, and full-volume data loads from multiple systems, anonymised or masked for security where required. A modern data quality tool (like Experian Pandora) is capable of handling large volumes of data due to its highly compressed repository architecture. I would then inspect each piece of equipment and cross-check it against all the underlying datasets that are meant to contain the equivalent data. At one site alone, I found an inaccuracy rate of around 37%, just on one floor.
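The cross-check itself can be sketched in a few lines of Python. This is not how any particular data quality tool works internally. It is a simplified stand-in that assumes what you observe on site is recorded as a dictionary of attributes per equipment ID, and that each system's extract has the same shape. The system names, attributes and values are hypothetical.

```python
def audit_site(observed, systems):
    """Compare on-site observations with each system's records.

    Returns, per system, the equipment IDs whose record is missing or
    differs from what was observed, plus the resulting inaccuracy rate.
    """
    report = {}
    for name, records in systems.items():
        mismatched = [
            eq_id for eq_id, seen in observed.items()
            if records.get(eq_id) != seen
        ]
        rate = len(mismatched) / len(observed) if observed else 0.0
        report[name] = {"mismatched": mismatched, "inaccuracy_rate": rate}
    return report


# What the auditor actually saw on one floor (hypothetical data).
observed = {
    "EQ-001": {"model": "RX-200", "rack": "A1"},
    "EQ-002": {"model": "RX-200", "rack": "A2"},
    "EQ-003": {"model": "RX-200", "rack": "B1"},
    "EQ-004": {"model": "TX-50", "rack": "B2"},
}

# What two hypothetical systems claim about the same floor.
systems = {
    "inventory": {
        "EQ-001": {"model": "RX-200", "rack": "A1"},
        "EQ-002": {"model": "RX-200", "rack": "A3"},  # wrong rack
        "EQ-003": {"model": "RX-200", "rack": "B1"},
        "EQ-004": {"model": "TX-50", "rack": "B2"},
    },
    "servicing": {
        "EQ-001": {"model": "RX-200", "rack": "A1"},
        "EQ-002": {"model": "RX-200", "rack": "A2"},
        "EQ-003": {"model": "RX-100", "rack": "B1"},  # wrong model
        # EQ-004 is missing from this system altogether
    },
}

report = audit_site(observed, systems)
for name, result in report.items():
    print(name, result["mismatched"], f"{result['inaccuracy_rate']:.0%}")
```

Reporting the rate per system, rather than one blended figure, shows which application has drifted furthest from reality.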
Examples would include:
The site audit has many uses, but perhaps the most important is to help identify where your field force requires greater training and governance of their installation and servicing procedures. For example, we found multiple instances where different systems would highlight certain equipment for servicing but, on review, the equipment was found to have been recently serviced. Clearly, records were not being updated correctly and re-training was required to coach the workers on the value of data quality for protecting jobs and safety.
But other issues pointed to data management failures. We discovered that the 'electronic bridge' between the provisioning, installation and servicing systems was defective. Many times we saw brand new equipment waiting to be installed while perfectly operational machines stood idle, because of 'the accuracy gap' between the different systems and reality. The provisioning system was failing to spot idle equipment due to poor synchronisation controls between the different applications.
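To make the idle-equipment problem concrete, the sketch below shows the kind of check a working bridge between systems would allow: before new equipment is installed for an order, look for idle units of the same model at the same site. The field names (`order_id`, `site`, `model`, `status`) and the matching rule are assumptions for illustration, not a description of any real provisioning system.

```python
def reusable_idle_equipment(pending_orders, servicing_records):
    """Map each pending order to idle units that could fulfil it instead."""
    idle_by_site_model = {}
    for unit in servicing_records:
        if unit["status"] == "idle":
            key = (unit["site"], unit["model"])
            idle_by_site_model.setdefault(key, []).append(unit["equipment_id"])

    matches = {}
    for order in pending_orders:
        key = (order["site"], order["model"])
        if key in idle_by_site_model:
            matches[order["order_id"]] = idle_by_site_model[key]
    return matches


# Hypothetical extracts from the provisioning and servicing systems.
pending_orders = [
    {"order_id": "PO-1", "site": "Eastbourne exchange", "model": "RX-200"},
    {"order_id": "PO-2", "site": "Glasgow depot", "model": "TX-50"},
]
servicing_records = [
    {"equipment_id": "EQ-003", "site": "Eastbourne exchange",
     "model": "RX-200", "status": "idle"},
    {"equipment_id": "EQ-004", "site": "Eastbourne exchange",
     "model": "TX-50", "status": "active"},
]

matches = reusable_idle_equipment(pending_orders, servicing_records)
print(matches)
```

A check like this is only as good as the status values it reads, which is why the site audit has to confirm that equipment recorded as idle really is idle and serviceable.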
Site audits offer tremendous insights into how time, revenue, costs, efficiency and other bottom line drivers are impacted by poor quality data. The key is to go armed with all of your data, held securely, in a platform that supports the data quality auditing process. Your data quality tool can provide the answer to taking your 'data quality show on the road' by demonstrating the drift between reality and what is held in your systems, as well as the drift between systems themselves.
But where your data quality tool can really shine is in helping you rapidly understand the root cause of these data defects, so that you can implement long-term, cost-effective, preventative measures.