Data profiling is the process of examining and analyzing data to identify relationships, recognize outliers, and detect duplicate information in order to prioritize data cleansing and standardization tasks.
Our profiling and discovery solution allows business and IT users alike to instantly browse and interrogate data, as well as view more than 240 metadata attributes as soon as data is loaded into the tool. Users can immediately identify relationships, outliers, or format distributions across disparate systems that need further investigation.
Our proactive data profiling and data discovery capabilities help you understand what problems exist within your data and identify what actions need to be taken in order to remedy the data issues.
Hi, I’m Rishi Patel, Strategic Technical Manager at Experian Data Quality. For this video, I am going to take you through the data profiling and analysis features of Experian Pandora.
We’ve already loaded some data into this repository, which is part of the tool. When that data load occurs, the tool automatically analyzes the data as it’s being loaded, so what we’re able to do is look at the results of this analysis. We’re going to have a look at the column view and instantly get access to a whole bunch of metadata such as uniqueness, completeness, null values, and data types. We can also see how many formats occur for each one of these values. This is a great starting point to drill into more information about this data.
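As a rough illustration of the column-level statistics mentioned above (completeness, uniqueness, null counts, and data types), here is a minimal Python sketch. It is not how Experian Pandora computes its metadata; the function name profile_column and the choice to treat None and empty strings as nulls are assumptions made for this example.

```python
from collections import Counter


def profile_column(values):
    """Summarize one column: row count, nulls, completeness, uniqueness, types.

    Assumption for this sketch: None and "" both count as null.
    """
    total = len(values)
    non_null = [v for v in values if v is not None and v != ""]
    distinct = set(non_null)
    type_counts = Counter(type(v).__name__ for v in non_null)
    return {
        "rows": total,
        "nulls": total - len(non_null),
        # Share of rows that are populated.
        "completeness": len(non_null) / total if total else 0.0,
        # Share of populated rows that hold a distinct value.
        "uniqueness": len(distinct) / len(non_null) if non_null else 0.0,
        "types": dict(type_counts),
    }


if __name__ == "__main__":
    print(profile_column(["C1", "C2", "C2", None]))
```

A uniqueness figure below 1.0 on an identifier column is the kind of signal that prompts the drill-down shown next in the demonstration.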
So just looking at the first column where we have a customer ID, we can see that it’s fully populated and there are no null values, but it’s not entirely unique, as we might expect a customer ID to be. I am able to drill into the actual values underlying this data interactively, and here I am interested in the values that occur more than once. I can easily filter out the ones I don’t want, and here I can see some of the values that occur in two or more rows. At any point, I can select the ones I am interested in and actually locate the underlying data.
At any point in this process, I can save or export this information. So what we’re able to do is go straight from the high-level statistics to the underlying information about each of those rows.
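The step from a high-level statistic to the underlying rows can be sketched in a few lines of Python. This is an illustrative sketch only, not the tool's implementation; the function name find_repeated_values and the representation of rows as dictionaries are assumptions.

```python
from collections import defaultdict


def find_repeated_values(rows, column):
    """Map each value that occurs in two or more rows to its row indexes."""
    positions = defaultdict(list)
    for index, row in enumerate(rows):
        positions[row[column]].append(index)
    # Keep only values seen more than once, i.e. the non-unique ones.
    return {value: idxs for value, idxs in positions.items() if len(idxs) > 1}


if __name__ == "__main__":
    sample = [
        {"customer_id": "C1"},
        {"customer_id": "C2"},
        {"customer_id": "C1"},
    ]
    print(find_repeated_values(sample, "customer_id"))
```

Returning row indexes rather than just counts keeps the link back to the source records, which is what makes the underlying data easy to locate or export.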
As another example, we can have a look at order dates. In this example, we can see there are a number of null values in this column, and we can right-click on it and look at those values themselves, instantly seeing where null values occur.
We can take this a step further and have a look at the data formats. We can see that there are outliers over here, where the date values are numeric, and we can see that there are some outlying values where the format doesn’t conform to the appropriate date format. This format analysis can be performed on any of the data stores.
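One simple way to surface date values that do not conform to an expected format is to try parsing each one and collect the failures. The sketch below assumes an ISO-style expected format (%Y-%m-%d) purely for illustration; the demonstration does not say which format the order date column should follow, and the function name nonconforming_dates is hypothetical.

```python
from datetime import datetime


def nonconforming_dates(values, expected_format="%Y-%m-%d"):
    """Return (row index, value) pairs that fail to parse as expected_format.

    Nulls (None or "") are skipped here; they are a completeness issue,
    not a format issue.
    """
    outliers = []
    for index, value in enumerate(values):
        if value is None or value == "":
            continue
        try:
            datetime.strptime(str(value), expected_format)
        except ValueError:
            outliers.append((index, value))
    return outliers


if __name__ == "__main__":
    print(nonconforming_dates(["2016-03-01", 20160301, "03/01/2016", None]))
```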
If we look at another example, we can analyze the post code field. We can go in here and have a look at the formats. We can see that there are forty-two different groups of formats available for the post code field. If we want to drill into something specific to a different country, we can look at the rows, navigate to the different countries that we have available to us, and profile that information instantly. The most dominant type up here is based on the United Kingdom. If we double-click on that, it will bring us to the values.
Now at this point, we can profile the information, such as the post code. We can see that for the United Kingdom we’ve got seven different groups, some of which look accurate and some of which have data quality issues.
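Grouping values by format, as in the post code example, is commonly done by reducing each value to a pattern and counting the patterns. The sketch below uses one common convention, assumed here rather than taken from the tool: letters become A, digits become 9, and all other characters are kept as they are. The function names are hypothetical.

```python
from collections import Counter


def format_pattern(value):
    """Reduce a value to its format: letters -> 'A', digits -> '9'."""
    pattern = []
    for ch in value:
        if ch.isdigit():
            pattern.append("9")
        elif ch.isalpha():
            pattern.append("A")
        else:
            pattern.append(ch)  # keep spaces and punctuation as-is
    return "".join(pattern)


def format_distribution(values):
    """Count how many non-empty values share each format pattern."""
    return Counter(format_pattern(v) for v in values if v)


if __name__ == "__main__":
    codes = ["SW1A 1AA", "EC1A 1BB", "02110"]
    print(format_distribution(codes).most_common())
```

Rare patterns in the resulting distribution are the natural candidates for closer inspection, since they often correspond to mistyped or foreign-format values.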
That was a brief demonstration but I hope it gives you a good idea of the profiling and analytical capabilities of Experian Pandora.
When your data is accurate, your organization operates more efficiently. Accurate data helps you save money by reducing time spent manually identifying and correcting errors, allowing your staff to focus on more impactful projects.
Our data profiling and discovery software is the only tool that links financial metrics to data quality. This helps you focus your efforts on fixing data errors that impact your business the most.
When customer information is accurate, standardized, and not duplicated in your systems, your customers are more likely to receive your correspondence. Properly formatted data improves your brand’s image and makes a great impression on potential customers.