
Dirty data exposed: Why data profiling and data cleansing should be part of your everyday processes

Business leaders today use data to power all sorts of initiatives, from uncovering revenue opportunities to complying with regulations. Increasingly, data is becoming a mission-critical asset, one that many argue should be tracked on the balance sheet. In fact, 95% of the C-level executives surveyed for our 2018 global data management benchmark report believe that data is an integral part of forming their business strategy, a sentiment that has increased by 15 percent over the prior year.

While making better data-driven decisions is the goal of any organization, many run into unforeseen challenges with siloed, unstandardized, and inaccurate data. According to our research, U.S. organizations believe that, on average, 33% of their current customer and prospect data is inaccurate. That raises the question: how do you set yourself up for success in the data-centric era?

A foundation of clean, high-quality data is the natural first step for any data project, which means data cleansing should be built into your ongoing data quality and data management processes. What exactly is data cleansing, and how does it work? Data cleansing, also known as data hygiene, is a multi-stage process for improving the overall quality of your data. It typically involves verifying your data against a definitive source of record to confirm that the information is accurate and up to date, appending missing details to create more complete records, and standardizing and deduplicating records to produce a consolidated view. Data cleansing can be run in bulk on large volumes of records, or each stage can be orchestrated into workflows that run on a recurring schedule.
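As a rough illustration of the stages described above, the Python sketch below standardizes, verifies, appends, and deduplicates a few contact records. The `REFERENCE` dictionary stands in for a definitive source of record; it, the `Contact` fields, and the normalization rules are hypothetical assumptions for illustration, not how any particular product works. Real verification would query an authoritative external source.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Contact:
    name: str
    email: str
    postcode: Optional[str] = None
    verified: bool = False


# Hypothetical "source of record", keyed by normalized email (assumption for this sketch).
REFERENCE = {
    "ann.lee@example.com": {"postcode": "SW1A 1AA"},
}


def standardize(c: Contact) -> Contact:
    """Collapse extra whitespace, title-case names, and lower-case emails."""
    return replace(c, name=" ".join(c.name.split()).title(), email=c.email.strip().lower())


def verify_and_append(c: Contact) -> Contact:
    """Mark records found in the reference source and fill in a missing postcode."""
    ref = REFERENCE.get(c.email)
    if ref is None:
        return c
    return replace(c, verified=True, postcode=c.postcode or ref["postcode"])


def deduplicate(records):
    """Keep the first record seen for each (already standardized) email address."""
    seen = {}
    for r in records:
        seen.setdefault(r.email, r)
    return list(seen.values())


def cleanse(records):
    # Standardize first so differently formatted duplicates compare as equal.
    return deduplicate(verify_and_append(standardize(r)) for r in records)


raw = [
    Contact("  ann   LEE ", "Ann.Lee@Example.com "),
    Contact("Ann Lee", "ann.lee@example.com", "SW1A 1AA"),
    Contact("bob stone", "bob@example.com"),
]
clean = cleanse(raw)
for c in clean:
    print(c)
```

The ordering is the main design choice here: deduplication runs last, after standardization has made the two "Ann Lee" records comparable and verification has appended the missing postcode.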

Business users talk a lot about having trusted data these days, and they often apply the data cleansing tasks described above to fix known errors in their data sets. However, data cleansing only helps with the data you already know is bad. More often than not, bad data lurks in our databases undetected and unaddressed until someone happens to write a SQL query that targets that specific problem. How can you find the unknown bad data hiding in plain sight? Enter: data profiling.
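To make the point concrete, the sketch below uses Python's built-in `sqlite3` module with a hypothetical `customers` table. The check someone thought to write (no NULL emails) passes, while a malformed email and a blank postcode stay hidden until a query targets each problem specifically. The table, its columns, and the crude `LIKE` email pattern are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, postcode TEXT)")
conn.executemany(
    "INSERT INTO customers (email, postcode) VALUES (?, ?)",
    [
        ("ann@example.com", "SW1A 1AA"),
        ("bob-at-example.com", "EC1A 1BB"),  # malformed email: no "@"
        ("cy@example.com", ""),              # blank string rather than NULL
    ],
)

# The check someone happened to write: no NULL emails, so the table "looks clean".
null_emails = conn.execute(
    "SELECT COUNT(*) FROM customers WHERE email IS NULL"
).fetchone()[0]

# Each remaining problem only surfaces when a query is aimed at it directly.
malformed = conn.execute(
    "SELECT id FROM customers WHERE email NOT LIKE '%_@_%'"
).fetchall()
blank_postcodes = conn.execute(
    "SELECT id FROM customers WHERE postcode IS NULL OR TRIM(postcode) = ''"
).fetchall()

print(null_emails, malformed, blank_postcodes)  # 0 [(2,)] [(3,)]
conn.close()
```

Each query finds only what it was written to look for, which is the gap profiling is meant to close.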

Data profiling is a powerful way to systematically analyze large volumes of data. It measures the uniqueness and completeness of fields, counts null values, and surfaces other statistical anomalies: values that deviate from the norm for the column or row in which they appear. By profiling your data, you might discover that a certain number of records are missing critical information, or that several individuals have duplicate records. Data profiling can also determine whether a field's contents are dates, text, or alphanumeric values, helping you spot inconsistent formats within the data.
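The column-level measures described above can be sketched in a few lines of Python. This is a minimal illustration, not a profiling engine. The type inference is deliberately crude and is an assumption of this sketch: only ISO `YYYY-MM-DD` strings count as dates, and anything that is not a date, all digits, or all letters falls into "alphanumeric".

```python
import re
from collections import Counter

# Assumption: only ISO-8601 calendar dates are recognized as dates.
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def classify(value: str) -> str:
    """Crude type inference: 'date', 'numeric', 'text', or 'alphanumeric' (everything else)."""
    if DATE_RE.match(value):
        return "date"
    if value.isdigit():
        return "numeric"
    if value.replace(" ", "").isalpha():
        return "text"
    return "alphanumeric"


def profile_column(values):
    """Summarize completeness, uniqueness, and inferred formats for one column."""
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    total = len(values)
    return {
        "completeness": len(present) / total if total else 0.0,
        "nulls": total - len(present),          # None or blank values
        "distinct": len(set(present)),
        "uniqueness": len(set(present)) / len(present) if present else 0.0,
        "formats": Counter(classify(v) for v in present),
    }


rows = [
    {"id": "1", "name": "Ann Lee", "signup": "2018-03-01"},
    {"id": "2", "name": "Bob Stone", "signup": "03/04/2018"},
    {"id": "3", "name": "Ann Lee", "signup": None},
]
for column in ("id", "name", "signup"):
    print(column, profile_column([r[column] for r in rows]))
```

On this sample, the `name` column's uniqueness below 1.0 hints at a duplicate individual, and the `signup` column shows both a missing value and two different date formats.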

Ideally, organizations should use data profiling in concert with data cleansing. Data profiling identifies the errors and anomalies in their data sets that need further review by the appropriate data owners. Those data sets can then be routed to data cleansing, where the records are verified, matched, deduplicated, and standardized to produce a healthy, clean data set.
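A minimal sketch of that profile-then-cleanse flow follows. The 10% null-rate threshold, the choice of a single key column for duplicate detection, and the "keep the most complete record" merge rule are all hypothetical assumptions. The data-owner review step is not modeled: in this sketch, any flagged issue sends the data set straight to cleansing.

```python
# Hypothetical threshold; real values depend on the data set and its owners.
MAX_NULL_RATE = 0.10


def profile_and_flag(rows, key, columns):
    """Profiling step: flag sparse columns and duplicate keys."""
    if not rows:
        return []
    issues = []
    for col in columns:
        nulls = sum(1 for r in rows if not (r.get(col) or "").strip())
        if nulls / len(rows) > MAX_NULL_RATE:
            issues.append(f"{col}: {nulls}/{len(rows)} missing")
    keys = [r[key].strip().lower() for r in rows]
    dupes = len(keys) - len(set(keys))
    if dupes:
        issues.append(f"{key}: {dupes} duplicate(s)")
    return issues


def cleanse(rows, key):
    """Cleansing step: standardize the key, then keep the most complete record per key."""
    best = {}
    for r in rows:
        k = r[key].strip().lower()
        filled = sum(1 for v in r.values() if v and str(v).strip())
        if k not in best or filled > best[k][0]:
            best[k] = (filled, {**r, key: k})
    return [rec for _, rec in best.values()]


def run(rows, key, columns):
    issues = profile_and_flag(rows, key, columns)
    # Only data sets with flagged issues are routed to cleansing.
    return issues, (cleanse(rows, key) if issues else rows)


rows = [
    {"email": "Ann@Example.com", "phone": None},
    {"email": "ann@example.com", "phone": "555-0100"},
    {"email": "bob@example.com", "phone": "555-0199"},
]
issues, cleaned = run(rows, "email", ["phone"])
print(issues)
print(cleaned)
```

Keeping profiling and cleansing as separate functions mirrors the division of labor described above: profiling decides *whether* a data set needs attention, and cleansing decides *how* to fix it.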

Developing a trusted data source means trusting the tools you’re working with. Learn how Experian Aperture Data Studio helps enable confident, data-driven business.
