My last blog post was entitled “Why every business needs a single customer view” (SCV). It points out the incredible value that a consolidated and consistent view of your data, organized by customer, can deliver, but also acknowledges some of the challenges that prevent companies from implementing such a view. For a real-time SCV, obtaining technology to link to existing systems and to collect and store data is one of the biggest issues. Before any investments are made, however, it’s important to carefully plan what data will be used, where it will come from, and how you will make sure that it’s fit for purpose. The goal is to prevent, in the words of that oft-quoted adage, “garbage in, garbage out”!
Before purchasing any new system, it’s necessary to define your requirements and work through any associated data quality issues. In other words, be prepared. It’s not just about the size of the storage capacity that you’ll need, the type of existing systems that hold your customer data, or what capabilities the new system will have, although those are all extremely important. Do you have a good understanding of what data is held in each of your existing systems? Do you know whether the metadata for each has unique definitions, and whether the data is complete and accurate? Often there is considerable overlap between databases. Is there a unique identifier for each customer, or do you use their name, email address, or phone number? If there’s overlap, which system will take precedence? If there’s duplication, how will you eliminate it? If there are errors, empty cells, or incomplete information, how will you account for them?
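To make the precedence question concrete, here is a minimal sketch of one possible rule: for each field, take the first non-empty value from the most trusted system. The system names ("crm", "support", "pos"), the field names, and the sample records are all hypothetical, and a real SCV project would likely need per-field rules rather than one global ranking.

```python
# Hypothetical source systems, listed from most to least trusted.
PRECEDENCE = ["crm", "support", "pos"]


def merge_customer(records_by_system):
    """Merge one customer's records field by field.

    For each field, keep the first non-empty value found when walking
    the systems in PRECEDENCE order.
    """
    merged = {}
    for system in PRECEDENCE:
        record = records_by_system.get(system, {})
        for field, value in record.items():
            # Skip empty cells so a lower-ranked system can fill the gap.
            if field not in merged and value not in (None, ""):
                merged[field] = value
    return merged


# Illustrative data only: the same person as seen by two systems.
records = {
    "pos": {"name": "Bob Smith", "email": "", "phone": "555-0100"},
    "crm": {"name": "Robert Smith", "email": "rsmith@example.com", "phone": None},
}
golden = merge_customer(records)
print(golden)
```

In this sketch the CRM wins on name and email, while the phone number falls through to the POS record because the CRM value is empty.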
Here’s a checklist of key questions to answer:
1) Which systems currently hold the data needed to create a single customer view?
2) Can they be consolidated or will they be replaced by a new system when you implement an SCV? Or will the SCV system simply pull data from the existing systems?
3) What database structures are used and what query languages?
4) Are the requisite skills available in your organization to query your legacy systems?
5) What data fields exist in each system containing customer-related data?
6) Is a consistent naming approach used across all databases, or do some fields have different names even though they contain the same data? For example, the POS system that collects data on products sold might use the term “customer”, whereas the customer support database might describe the same data simply as “name”.
7) Is there consistency of data formats? For example, there are a variety of date and time formats that might have been used by implementers at different times, or if data was entered by store associates or consumers, there may be no consistent format at all.
8) Are there gaps in the data that would render the resulting records less valuable? For example, missing dates of birth or missing social security numbers (if applicable).
9) Are there errors, inaccuracies, typos, or out-of-range issues? Data collected manually without validation is prone to relatively high error rates, usually in the 30+% range.
10) Are there likely to be data duplications, e.g. multiple records with the same data but a different customer identifier (Bob Smith and Robert Smith, both the same age and both having the same social security number), or different product codes used for the same product coming from different factories?
11) If there are partial duplications, which system should have preference? Which of the databases or fields is likely to be the most accurate?
12) How will the fields from the different source databases need to be transcribed or transferred? Will data in certain fields need to be combined or separated or altered in any way to fit the needs of the SCV?
13) Will additional data need to be appended to make the SCV more useful or valuable, e.g. by adding demographic or psychographic (attitudinal) attributes?
14) Will additional fields be required to capture group statuses (e.g. family relationship, shared location/phone number, proximity to which store, generation)?
15) What capabilities will be required to allow self-service data interpretation and analysis?
16) When the new SCV is created, what will be the plan to ensure that all of the above considerations can continue to be addressed as new data is added to each of the current systems, i.e. how will new data be monitored for data quality issues?
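Several of the questions above (gaps, inconsistent date formats, duplicated customers) can be explored with a very small profiling pass. The following is only a sketch under stated assumptions: the field names (customer_id, dob, national_id), the three date formats, and the sample rows are hypothetical, and it treats two records sharing a national identifier as a likely duplicate.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical date formats to probe for; real systems may use others.
# Note: a slash-separated date is assumed to be month/day/year here.
DATE_FORMATS = {"iso": "%Y-%m-%d", "us": "%m/%d/%Y", "eu": "%d.%m.%Y"}


def detect_date_format(value):
    """Return the label of the first format that parses value, else None."""
    for label, fmt in DATE_FORMATS.items():
        try:
            datetime.strptime(value, fmt)
            return label
        except ValueError:
            continue
    return None


def profile(rows, key_field, id_field, date_field):
    """Summarize gaps, date formats, and shared identifiers in rows."""
    missing = defaultdict(int)   # field name -> count of empty cells
    formats = defaultdict(int)   # date format label -> count
    by_id = defaultdict(list)    # identifier -> customer keys using it
    for row in rows:
        for field, value in row.items():
            if value in (None, ""):
                missing[field] += 1
        if row.get(date_field):
            label = detect_date_format(row[date_field]) or "unrecognized"
            formats[label] += 1
        if row.get(id_field):
            by_id[row[id_field]].append(row[key_field])
    # An identifier used by more than one customer key suggests a duplicate.
    duplicates = {k: v for k, v in by_id.items() if len(v) > 1}
    return dict(missing), dict(formats), duplicates


# Illustrative data only.
rows = [
    {"customer_id": "C1", "name": "Bob Smith", "dob": "1980-04-12", "national_id": "X123"},
    {"customer_id": "C2", "name": "Robert Smith", "dob": "04/12/1980", "national_id": "X123"},
    {"customer_id": "C3", "name": "Ann Lee", "dob": "", "national_id": "Y456"},
    {"customer_id": "C4", "name": "Raj Patel", "dob": "12.04.1980", "national_id": ""},
]
missing, formats, duplicates = profile(rows, "customer_id", "national_id", "dob")
print(missing, formats, duplicates)
```

Running the same pass on each new batch of data is one simple way to approach the ongoing monitoring raised in question 16.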
This list is not meant to be exhaustive, but rather to indicate the potential breadth of data quality issues and the importance of attending to them before attempting to create an SCV. If the list seems daunting, it’s because achieving an SCV can be! Some of our customers have had multiple databases with millions of records that needed to be consolidated to create the SCV. Often, a lack of skills in legacy query languages hampered discovery. Without a powerful data profiling tool to help, one that’s easy to use and doesn’t require a query language, the consolidation process has sometimes extended over several years! However, with the right tool, discovering the answers to these key questions and implementing the necessary conversions, using a rules-based approach, can normally be reduced to weeks or months. And going forward, the entire process can be automated, including ongoing data monitoring.
What kind of tool can do that?