
The value of artificial intelligence (AI) to data quality

We’re finding great value in AI, as are other businesses. Here is how Experian Data Quality is using AI in our product suite today so you can continue to innovate and automate your business.

AI is a term that has grown in importance since generative AI solutions like ChatGPT, Bard, and others became available earlier this year. Their success in demonstrating the ability of large language models to understand written and spoken prompts, to search and summarize large volumes of content, and to provide well-written, holistic answers, along with a range of novel predictive capabilities, has brought widespread attention. So it’s natural to ask, “How will artificial intelligence impact the field of data quality?”

What's the difference between AI, generative-AI, and machine learning (ML)?

First, some definitions of terms. I asked Bard (Google’s AI chatbot) to answer this question. This is what Bard said: 

“These three terms are all related to the field of computer science, but they have distinct meanings: 

Artificial Intelligence (AI): 

  • A broad term encompassing all efforts to create intelligent machines capable of performing tasks typically requiring human intelligence. 
  • Focus: Replicating human-like intelligence and behavior in machines. 
  • Examples: Robotics, natural language processing, computer vision, decision-making systems. 

Generative AI (Gen-AI): A subset of AI  

  • Focus: Creating AI systems that can generate new content, such as text, images, music, or code. 
  • Methods: Often uses machine learning techniques, especially deep learning, and large language models (LLMs). 
  • Examples: Text generation tools, image editors, music composers, code generators. 

Machine Learning (ML): A subset of AI 

  • Refers to the techniques used to enable machines to learn from data and improve their performance on a specific task without being explicitly programmed. 
  • Methods: Algorithms that identify patterns and trends in data, then make predictions or decisions based on those patterns. 
  • Examples: Image recognition, spam filtering, recommendation systems, self-driving cars.” 

Here's a useful example: 

AI:  

Imagine a solution that can evaluate a dataset and fix any data quality issues for entered values without human intervention.  

Gen-AI:  

Imagine a solution that can analyze the data and predict values for entries that are missing or incomplete. 

Machine Learning:  

Imagine a solution that analyzes the outputs from the above processes, accepts feedback on the validity of the choices made, and learns how to improve on them, making the data ever more accurate and complete over time. 

That’s the kind of prospect that AI holds for data quality tools in the future. Beyond data quality tools, it’s also very important that organizations create a foundation of quality data to inform, train, and feed the wide range of AI tools they will likely use across their businesses going forward. 

How does Experian Data Quality use AI today?

Our AI-powered data quality platform, Aperture Data Studio, automates and operationalizes data quality for businesses.  

Experian's Aperture Data Studio (also known as Data Studio) solution uses AI to provide a self-service data quality and enrichment platform that enables organizations to efficiently manage data quality and create an accurate, trusted, and holistic view of their information. This AI-powered platform provides a range of features such as data profiling, data cleansing, data matching, data enrichment, and data monitoring. The platform also offers real-time data validation and address verification. 

Over the last few years, we have put a great deal of automation into Data Studio and have received positive feedback from the analyst community indicating that Experian has some of the most advanced uses of automation on the market.  

Leveraging AI and ML, automation is being built into Data Studio in nearly every area: data onboarding, data discovery, issue discovery and resolution, rule creation, matching, and data observability. This automation makes Data Studio far easier to use and helps our clients reach value faster with fewer resources. 

Take rule creation, for example. Data analysts need to discover, document, execute, and maintain complex sets of rules across different datasets and domains to be able to keep their data fit for purpose. Data Studio incorporates machine learning algorithms for automatic data tagging that support the easy discovery and deployment of such rules, enabling them to be stored, shared, and executed, all via a business-friendly interface. 
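To make the idea of automatic data tagging concrete, here is a purely illustrative Python sketch; the tag names, patterns, and threshold below are hypothetical, and a production engine like Data Studio’s would use machine learning rather than hard-coded rules:

```python
import re
from typing import List, Optional

# Hypothetical seed patterns; a real tagging engine would learn or
# refine these from data rather than hard-code them.
TAG_PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "us_zip": re.compile(r"^\d{5}(-\d{4})?$"),
}

def tag_column(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Assign the tag whose pattern matches at least `threshold`
    of the sampled column values, or None if no tag qualifies."""
    for tag, pattern in TAG_PATTERNS.items():
        matches = sum(1 for v in values if pattern.match(v))
        if values and matches / len(values) >= threshold:
            return tag
    return None

print(tag_column(["a@b.com", "c@d.org", "e@f.net"]))  # email
print(tag_column(["90210", "10001-0001", "30301"]))   # us_zip
```

Once a column is tagged, validation rules associated with that tag (for example, an email-format check) can be applied automatically, which is the efficiency gain described above.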

Automation is also present in Data Studio’s smart profiling capability, allowing users to automatically find data issues and receive suggestions on how to resolve inaccuracies. Leveraging auto-tagging and smart profiling, the Suggest Transformation option analyzes values in the data and recommends functions to improve data consistency, clearly explaining what each transformation will do to the data to preserve data integrity. Examples include Trim and Compact, which remove unnecessary space characters; converting null to zero for numeric columns that contain both; and Hash, which obfuscates sensitive data so that it can be safely saved and shared. Once accepted, transformations are easily deployed in just a couple of clicks. 
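As a minimal sketch (not Data Studio’s actual implementation), transformations of this kind might look as follows; the function names are hypothetical stand-ins for the product features mentioned above:

```python
import hashlib

def trim_and_compact(value: str) -> str:
    """Remove leading/trailing spaces and collapse internal runs
    of whitespace into single spaces."""
    return " ".join(value.split())

def null_to_zero(value):
    """Convert a missing value to zero in a numeric column."""
    return 0 if value is None else value

def hash_value(value: str) -> str:
    """Obfuscate sensitive data with a one-way SHA-256 hash so it
    can be stored or shared without exposing the original value."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

print(trim_and_compact("  John   Smith "))  # John Smith
print(null_to_zero(None))                   # 0
print(len(hash_value("123-45-6789")))       # 64 (hex digest length)
```

The value of the product feature is less in the transformations themselves, which are simple, and more in automatically recommending the right one for each column.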

Other areas where machine-learning is used within Data Studio include: 

  • Powerful outlier analysis to proactively detect and inform users of unknown and known anomalies within the data. 
  • Observability features that provide automatic data monitoring to detect interesting or unexpected changes to the data. 
  • Tuned matching rules for optimized accuracy when comparing records from different sources. 
  • Smarter merge suggestions when configuring how best to deduplicate records. 
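The outlier analysis in the list above can be illustrated with a basic statistical approach; a product would use more sophisticated methods, but a simple z-score test (an assumption here, not Data Studio’s algorithm) captures the idea:

```python
from statistics import mean, stdev

def find_outliers(values, z_threshold=2.5):
    """Flag values whose distance from the mean exceeds z_threshold
    standard deviations (a simple z-score test)."""
    mu = mean(values)
    sigma = stdev(values)
    return [v for v in values if abs(v - mu) / sigma > z_threshold]

# A column of order quantities with one suspicious entry:
readings = [10, 12, 11, 9, 10, 11, 9, 10, 12, 500]
print(find_outliers(readings))  # [500]
```

Surfacing such anomalies proactively, rather than waiting for a downstream report to fail, is the "observability" benefit described above.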

The Aperture Data Studio Roadmap indicates that further investment in AI is already under investigation. The goal is to determine how Gen-AI natural language processing (NLP) models can be used to increase user efficiency and improve collaboration through personalized experiences and AI-driven intelligent suggestions. 

A robust data governance and data quality strategy is the prerequisite to AI business success

The early adopters of AI, ML, and Gen-AI were primarily organizations with robust data and analytics strategies. Now, as the hype continues, more organizations without that foundation are keen to take advantage of the new innovations. Many analysts advise them that they can’t get started without building a strong data strategy.  

One big challenge is the breadth of information used to train publicly available Gen-AI solutions. For example, today’s open GPT-based solutions such as Bard, Bing Chat, and OpenAI’s ChatGPT are trained on a broad spectrum of internet and social media data. Any frequent user will know that this often results in “hallucinations” where, in its eagerness to simply provide an answer, the solution misinterprets the data and presents a totally incorrect result as the truth. Without human intervention and understanding, such “hallucinations” can cause significant misdirection and even harm. 

The answer for businesses interested in using Gen-AI in their own products and decision-making is to narrow the input data to information that is relevant to the purpose and to make sure that the data is as accurate as possible. Without accuracy, the models can still produce hallucinations. Without trust, the resultant decisions will not be acted upon, or will be acted upon slowly, only after the wisdom of each decision has been thoroughly vetted. The latter course eliminates much of the business value assumed for the AI solution. 

Success is going to require a strong blend of data quality, data governance, and data security.  

  • Data quality ensures that the “training” data is accurate, complete, and comprehensive.  
  • Data governance manages the data quality and accessibility, determines ownership, and carefully catalogs and defines the information available so that decisions can be made about the best data to use.  
  • Information security will be needed to protect the data from being shared inappropriately or being purposely corrupted to impact competitiveness or reputation. 

A key example where governance, quality, and security can make an impact is the call center, where Gen-AI bots use customer data to respond efficiently to personalized customer questions. The benefits of improved customer satisfaction and efficiency should be significant, but if the customer data gets corrupted or is simply wrong, the opposite effects will likely occur: poor satisfaction and less efficiency as the firm tries to do damage control.  

The challenge for many firms is that the traditional, top-down approach to data governance is too expensive and unwieldy. It can take years for a business to mature enough to adopt a data governance program—and data quality often takes a backseat due to lack of ownership. Now, more agile firms are taking a bottom-up approach and seeing success.  

Agile firms taking a bottom-up approach are building their data governance and quality programs one step and one issue at a time. Perhaps leaders bring governance practices to the data analytics department first, then expand involvement to other departments as issues arise and are solved. Over time, those with a vested interest in solving their department’s problems will become involved and take ownership, broadening the organic adoption of governance and quality across the business.  

Aperture Data Studio and data governance

Experian has partnered with leading data governance vendors, such as Alation, to provide bi-directional interfaces to applications. Such integrated interfaces allow the governance solutions to take advantage of profiling, monitoring, and other data quality capabilities while providing Data Studio access to a wide range of metadata to increase its operational effectiveness. The net result for Experian and Alation joint customers is a far more robust data quality and data governance capability.  

Experian continues to invest in data governance for Aperture Data Studio customers by expanding partnerships and integrations with companies like IntoZetta, a UK-based software company that specializes in data governance, quality, and migrations for specific industry sectors.  

By further participating in the data governance market, Experian is focused on providing our customers with a well-rounded tool set that helps businesses innovate their use of artificial intelligence with a strong foundation of data quality, governance, and security.  

 

Have questions? Our team is here to help!