Deriving a physical data model of existing systems is a pressing requirement among many large IT departments at the moment. A clear and full understanding of enterprise data and how it all relates together is an essential prerequisite for projects concerning data governance, compliance, master data management, process optimisation, and so on.
This article describes how you should carry out the three most important tasks.
1. Discover Relationships
Relationship analysis is the single most important activity when deriving the enterprise model. It allows you to find and identify data items, and find intra and inter system relationships. The most reliable form of relationship discovery is based on finding common values in the actual data, which is exactly what Experian Pandora does, automatically.
- Quickly identify data items by finding overlap with known reference tables; perform relationship searches from the column of reference data to “everywhere else”.
- Identify the same data in multiple locations; possible duplication; look for high domain overlap and/or similar field names (use Edit Distance and phonetic matching functions to compare them)
- Identify the actual intra-system relationships between tables in a database; look for relationships with a domain quality of 100% or close to it between tables in the same schema.
- You cannot rely solely on the information in the database schemas because it is not always complete. For performance reasons the database integrity rules are often disabled, leaving their management and enforcement to various application programmes. Over time, invariably, something goes wrong and the relationships get broken.
- Identify how data flows through your systems – inter-system relationships; look for relationships with a domain quality of 100% or close to it between tables in different schemas.
- Database schemas simply do not have information about relationships with other systems. To analyse relationships across systems you would usually need to load the data from both into a relational database and write some SQL code for each potential/suspected relationship. This would be a long, fastidious task. No-one would want to do it and the risk of error is high. Experian Pandora performs this analysis automatically.
2. Search For Hidden Data
The next most important investigative activity is to search for hidden data; values which match certain patterns or which embed other values. Experian Pandora’s ability to quickly perform a search of all data is a practical necessity for this task to be effective.
- Use parsing functionality, reference/domain tables and regular expression pattern matching to identify values
- Search for particular values explicitly
3. Centrally Manage Your Findings
Document and share what you find directly within Experian Pandora.
- Create Notes about everything, investigations, findings, conclusions. Include example data (attach drilldowns) to illustrate a point.
- Build a dictionary of Business Terms and associate them with each of the data items identified. This is essential for subsequent project scoping and establishing consistent and complete Data Quality monitoring.
Experian Pandora data management software enables organisations to understand, transform and manage the quality of their data more easily and quickly than any other product or manual approach. Experian Pandora is used on projects such as data integration, DWH, fraud detection and data governance. Contact us for more details.