Data Discovery describes a range of techniques for collecting and consolidating information and then analysing it to find relationships between entities (or data items), as well as outliers among them. It can be applied to data from a single database or across multiple, disparate databases.
As more and more companies maintain Big Data drawn from an ever-increasing number of data sources, the ability to drive decisions from this mass of data has become more important than ever. Until raw data has been analysed using techniques such as Data Discovery, it has very little value for a business.
Data Discovery can help by:
An example of Data Discovery is discovering which systems are connected by certain keys or identifiers. Understanding this connectivity between systems is useful for building accurate data models and a true account of which business services depend on certain data sources.
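As a rough illustration of how shared keys between systems might be surfaced, the sketch below compares the distinct values of every column pair across two small tables and reports pairs whose values largely coincide. The table names (`crm`, `billing`), the column names, the `candidate_links` helper, and the overlap threshold are all hypothetical, chosen only for this example; a real tool would also need to handle data types, formatting differences, and scale.

```python
from itertools import product


def candidate_links(table_a, table_b, min_overlap=0.8):
    """Return (col_a, col_b, overlap) for column pairs that may share a key.

    Each table is a dict mapping a column name to a list of values.
    Overlap is the number of shared distinct values divided by the size
    of the smaller distinct-value set (a simple, assumed heuristic).
    """
    links = []
    for (col_a, vals_a), (col_b, vals_b) in product(table_a.items(), table_b.items()):
        set_a, set_b = set(vals_a), set(vals_b)
        if not set_a or not set_b:
            continue  # an empty column cannot link anything
        overlap = len(set_a & set_b) / min(len(set_a), len(set_b))
        if overlap >= min_overlap:
            links.append((col_a, col_b, overlap))
    return links


# Hypothetical extracts from two separate systems.
crm = {"customer_id": [101, 102, 103, 104], "region": ["N", "S", "N", "E"]}
billing = {"cust_ref": [102, 103, 104, 105], "amount": [20.0, 35.5, 12.0, 99.9]}

links = candidate_links(crm, billing, min_overlap=0.7)
print(links)
```

Here the differently named columns `customer_id` and `cust_ref` are flagged as a likely link because most of their values match, which is the kind of connection that would then be confirmed by someone who knows the systems.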
Data Discovery can also refer to the discovery of dependencies between data elements, both within the same table and across disparate tables. We say that attributes are dependent when the value of one attribute may influence the value of one or more other attributes. Dependency analysis is a valuable technique for uncovering hidden data quality rules that require ongoing management and control.
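One simple way to make the idea of dependency concrete is to measure how consistently one attribute determines another. The sketch below is a minimal illustration under assumed data: for each value of a determinant column it finds the most common value of the dependent column and reports the share of rows that agree with it. The `dependency_strength` helper and the sample `postcode`, `city`, and `name` columns are hypothetical, and this scoring is only one of several possible measures.

```python
from collections import Counter, defaultdict


def dependency_strength(rows, determinant, dependent):
    """Share of rows consistent with the rule 'determinant -> dependent'.

    A result of 1.0 means every determinant value always maps to the same
    dependent value; lower results indicate exceptions or no dependency.
    """
    if not rows:
        return 0.0
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[determinant]][row[dependent]] += 1
    # For each determinant value, count rows matching its most common dependent value.
    consistent = sum(c.most_common(1)[0][1] for c in counts.values())
    return consistent / len(rows)


# Hypothetical sample rows; "Leads" is a deliberate misspelling.
rows = [
    {"postcode": "AB1", "city": "Aberdeen", "name": "Ann"},
    {"postcode": "AB1", "city": "Aberdeen", "name": "Bob"},
    {"postcode": "LS1", "city": "Leeds", "name": "Cat"},
    {"postcode": "LS1", "city": "Leads", "name": "Dan"},
    {"postcode": "LS1", "city": "Leeds", "name": "Eve"},
]

strength = dependency_strength(rows, "postcode", "city")
weak = dependency_strength(rows, "postcode", "name")
print(strength, weak)
```

A high but imperfect score, as for `postcode` and `city` here, suggests a hidden rule with a few violating rows, which is the kind of data quality rule dependency analysis is meant to uncover. A low score, as for `postcode` and `name`, suggests no meaningful dependency.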
Data Discovery and Dependency Analysis require complex analytical processing and, ideally, a correlated architecture for performance reasons.