The first version of Experian Pandora has been released. It aims to make it simple for non-technical users to produce 100% correct designs for data integration projects.
This first release leverages the capabilities of the X88 Panorama database engine to provide the most powerful data profiling solution ever. A wide variety of products already exist, but they are either based on 10-year-old technology, unable to support current data volumes, or simply glorified query tools that require the user to know in advance what problems to look for.
With Experian Pandora, you can take data from anywhere, no matter what condition it is in. Experian Pandora simply loads that data and, as part of the load process, profiles it in its entirety, using the full data volume (no sampling) for complete accuracy.
What is remarkable is that Experian Pandora automatically evaluates the relationships between every column across every table, simply by loading the data, even if the columns share only one value. This adds nothing to the load time and is a fantastic by-product of the way Experian Pandora stores data inside the Panorama database. For example, if I want to know how many times "Tony" appears across my entire enterprise, I can see that information instantly. I can navigate to every place "Tony" is stored, regardless of how many tables or columns contain it, and get to the rows, all with zero wait time. Even if "Tony" is embedded within a longer value, I get the same capabilities.
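Pandora's internal storage scheme is not described here. As a rough illustration only, assuming an in-memory inverted index from each value to every (table, column, row) location that holds it, both the "where is Tony?" lookup and the shared-value relationship count fall out of a single pass over the data. The function names and sample tables below are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical in-memory tables: table name -> list of rows (column -> value).
Tables = dict[str, list[dict[str, str]]]


def build_value_index(tables: Tables) -> dict[str, set[tuple[str, str, int]]]:
    """Map every distinct value to the (table, column, row number) locations holding it."""
    index: dict[str, set[tuple[str, str, int]]] = defaultdict(set)
    for table, rows in tables.items():
        for row_no, row in enumerate(rows):
            for column, value in row.items():
                index[value].add((table, column, row_no))
    return index


def shared_value_counts(index: dict[str, set[tuple[str, str, int]]]) -> dict:
    """Count, for each pair of columns, how many distinct values they have in common."""
    overlaps: dict = defaultdict(int)
    for locations in index.values():
        columns = sorted({(table, column) for table, column, _ in locations})
        for pair in combinations(columns, 2):
            overlaps[pair] += 1
    return dict(overlaps)


tables = {
    "customers": [{"id": "1", "name": "Tony"}, {"id": "2", "name": "Ann"}],
    "orders": [{"order_id": "10", "customer": "Tony"}, {"order_id": "11", "customer": "Tony"}],
}
index = build_value_index(tables)
print(sorted(index["Tony"]))
# [('customers', 'name', 0), ('orders', 'customer', 0), ('orders', 'customer', 1)]
print(shared_value_counts(index))
# {(('customers', 'name'), ('orders', 'customer')): 1}
```

Once the index is built, a value lookup is a single dictionary access, and a relationship is reported for any column pair sharing at least one value. This sketch matches whole values only; finding "Tony" embedded inside longer strings would need a substring or token index on top.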
Experian Pandora tackles data profiling and discovery from the data up, reverse engineering everything about the content, structure, quality and relationships inherent in the data from the data itself. Any metadata that is supplied or available is purely documentary, or is used to provide initial syntax constraints (e.g. expected lengths and datatypes). Experian Pandora never rejects data, never samples, and proactively tells you what problems are in the data.
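To make "from the data up" concrete, here is a minimal sketch of deriving a column's syntax profile (lengths, blanks, distinct count, inferred types) purely from its values, so it can later be compared with any declared metadata. The type rules and function names are illustrative assumptions, not Pandora's; note that no value is ever rejected, and unparseable values are simply counted as text.

```python
from datetime import date


def infer_type(value: str) -> str:
    """Classify one raw value; anything unparseable is kept as text, never rejected."""
    try:
        int(value)
        return "integer"
    except ValueError:
        pass
    try:
        float(value)
        return "decimal"
    except ValueError:
        pass
    try:
        date.fromisoformat(value)
        return "date"
    except ValueError:
        return "text"


def profile_column(values: list[str]) -> dict:
    """Derive observed syntax facts for a column from its full contents (no sampling)."""
    non_empty = [v for v in values if v != ""]
    type_counts: dict[str, int] = {}
    for v in non_empty:
        t = infer_type(v)
        type_counts[t] = type_counts.get(t, 0) + 1
    lengths = [len(v) for v in non_empty]
    return {
        "rows": len(values),
        "blanks": len(values) - len(non_empty),
        "distinct": len(set(non_empty)),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "types": type_counts,  # a mostly-"date" column with "text" outliers flags a quality issue
    }


print(profile_column(["2024-01-31", "2024-02-29", "", "unknown"]))
# {'rows': 4, 'blanks': 1, 'distinct': 3, 'min_length': 7, 'max_length': 10, 'types': {'date': 2, 'text': 1}}
```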
Performance is astounding, both in how fast Experian Pandora loads and analyses data and in drill-down speed. For example, it will tell you the distribution of values in every column instantly. Selecting some of those values and drilling down to the rows that contain them is also instant, regardless of the number of rows in the table or the number of values you select. Loading performance is equally impressive: it scales across multiple processors and can utilise 64-bit processor technology. It is also completely linear, keeping a consistent pace across small and large datasets alike.
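The drill-down pattern described above can be sketched with a value frequency count plus a precomputed value-to-rows index per column. This is only an assumed, in-memory illustration with hypothetical function names; it shows why a drill-down can avoid scanning the table, since its cost depends on the number of matching rows rather than the table size.

```python
from collections import Counter, defaultdict


def column_distribution(rows: list[dict[str, str]], column: str) -> Counter:
    """Frequency of every value in one column."""
    return Counter(row[column] for row in rows)


def build_drilldown_index(rows: list[dict[str, str]], column: str) -> dict[str, list[int]]:
    """Precompute value -> row numbers so selecting values needs no table scan."""
    index: dict[str, list[int]] = defaultdict(list)
    for row_no, row in enumerate(rows):
        index[row[column]].append(row_no)
    return index


def drill_down(index: dict[str, list[int]], selected: list[str]) -> list[int]:
    """Return the row numbers holding any of the selected values."""
    return sorted(row_no for value in selected for row_no in index.get(value, []))


rows = [{"country": "UK"}, {"country": "US"}, {"country": "UK"}, {"country": "FR"}]
print(column_distribution(rows, "country").most_common())  # [('UK', 2), ('US', 1), ('FR', 1)]
idx = build_drilldown_index(rows, "country")
print(drill_down(idx, ["UK", "FR"]))  # [0, 2, 3]
```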
As well as unique innovations, such as automatic relationship analysis, the team has also addressed some of the real pain points of existing technologies.
Dependency analysis is generally viewed as useless. Why? Because you can never analyse enough of the data to get an accurate result. Very few products do dependency analysis, and some of those that do limit themselves to discovering dependencies that are very close to 100% correct (with only 1 or 2 errors), using only a very small, unrepresentative random sample of the data. This is because the basic technique requires multiple stages of sorting and comparing data, and the only available optimisation is to use the well-known Armstrong reductions to do fewer comparisons. All of the current implementations also run entirely in memory, and take ages.
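For reference, the standard sort-and-compare check for a single candidate dependency X → A looks roughly like the sketch below. This is a textbook approach, not Pandora's new method: rows are sorted on the left-hand-side columns so equal values become adjacent, and within each group every row that disagrees with the most common right-hand-side value counts as an error. That error count is what lets a tool accept dependencies "with 1 or 2 errors". Function and variable names are illustrative.

```python
from collections import Counter
from itertools import groupby


def fd_violations(rows: list[tuple], lhs: tuple[int, ...], rhs: int) -> int:
    """Minimum number of rows to delete so that lhs -> rhs holds exactly.

    Sort by the left-hand-side columns so equal LHS values are adjacent,
    then in each group keep the most common RHS value and count the rest.
    Assumes values within a column are mutually comparable (e.g. all strings).
    """
    def key(row: tuple) -> tuple:
        return tuple(row[i] for i in lhs)

    violations = 0
    for _, group in groupby(sorted(rows, key=key), key=key):
        rhs_counts = Counter(row[rhs] for row in group)
        violations += sum(rhs_counts.values()) - max(rhs_counts.values())
    return violations


rows = [
    ("SW1A", "London"),
    ("SW1A", "London"),
    ("SW1A", "Londn"),  # a data-entry error breaks postcode -> city once
    ("M1", "Manchester"),
]
print(fd_violations(rows, lhs=(0,), rhs=1))  # 1: postcode -> city holds with one error
print(fd_violations(rows, lhs=(1,), rhs=0))  # 0: city -> postcode holds exactly here
```

Each candidate needs its own sort of the full dataset, which is why this approach becomes slow at full volume.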
Experian Pandora has an entirely new way of discovering functional dependencies of any quality, and multi-column keys, using the entire dataset, and does so in less time than existing products take to analyse a small sample.
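Pandora's own technique is not described in this article. As a naive baseline for comparison, the sketch below does a level-wise search for minimal multi-column keys, assuming the whole dataset fits in memory as tuples. It uses one standard pruning rule: any superset of a key is also a key, so supersets of keys already found can never be minimal and are skipped. Names are hypothetical.

```python
from itertools import combinations


def is_key(rows: list[tuple], columns: tuple[int, ...]) -> bool:
    """A column combination is a key if no two rows share its values."""
    projected = {tuple(row[i] for i in columns) for row in rows}
    return len(projected) == len(rows)


def minimal_keys(rows: list[tuple], n_columns: int, max_size: int) -> list[tuple[int, ...]]:
    """Level-wise search for minimal keys of 1..max_size columns, skipping supersets of found keys."""
    found: list[tuple[int, ...]] = []
    for size in range(1, max_size + 1):
        for combo in combinations(range(n_columns), size):
            if any(set(key) <= set(combo) for key in found):
                continue  # a superset of a known key cannot be minimal
            if is_key(rows, combo):
                found.append(combo)
    return found


rows = [("A", 1, "x"), ("A", 2, "y"), ("B", 1, "y")]
print(minimal_keys(rows, n_columns=3, max_size=3))  # [(0, 1), (0, 2), (1, 2)]
```

Every surviving candidate still costs a full pass over the data, so the search space grows quickly as columns are added.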
It is radically different from anything else available, and exploits the unique way we store the data. It makes dependency analysis truly useful, accurate and performant. Before we came up with this new approach, we took the standard route that many other products employ, though we at least took advantage of the fast processing of our database engine. For an example file of 1 million records with 32 columns at full volume, looking only for dependencies with a single column on the left-hand side, it took several days on a workstation to obtain the results! Using our new technique, with 3 columns on the left-hand side, we had 100% accurate results in several minutes.
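The arithmetic behind the jump in difficulty is easy to check. Counting the non-trivial candidate dependencies X → A for a 32-column table (A not in X), assuming "3 columns on the left-hand side" means left-hand sides of one to three columns, gives the figures below. Under the standard approach, each candidate needs its own pass over all 1 million records.

```python
from math import comb


def candidate_fds(n_columns: int, max_lhs: int) -> int:
    """Count candidate FDs X -> A with 1..max_lhs columns in X and A not in X."""
    return sum(comb(n_columns, k) * (n_columns - k) for k in range(1, max_lhs + 1))


print(candidate_fds(32, 1))  # 992
print(candidate_fds(32, 3))  # 159712 (992 + 14,880 + 143,840)
```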