Was data preparation simply an S.E.P. (Somebody Else’s Problem)?


Derek Munro 5 minute read Data preparation, Data quality

Data preparation is nothing new. We all have experience of it but, to quote the English writer Douglas Adams, it was an S.E.P. (Somebody Else’s Problem) and we didn’t notice it was there. Until now it hasn’t been recognised and named as a business requirement, so of course there have been no recognisable solutions.

"It was an S.E.P. and we didn’t notice it was there" - Douglas Adams

Having acquired a taste for self-service in the world of BI and reporting, business people want more. Now, self-service data preparation offers organisations a chance to be yet more agile while avoiding some infrastructure and technical specialist costs.

In this article I will explain where I think this new market has come from, why it is relevant to all organisations, and where the benefits can be seen. In a subsequent article I will list some of the things to look for in a self-service data preparation solution.

For years many of us have been using the tools at our disposal to transform raw data, possibly from multiple sources, into some new ad-hoc format requested by colleagues in a business area. The data might be needed for a simple report, for use in another tool or process, to provide specific information to an external business partner, to perform some clever analysis, or to identify and quantify particular categories of customer, product or transaction. The uses and formats are varied.

Excel might not cut it

Most business people try to prepare the data themselves using desktop tools such as Excel, and sometimes this works. But increasing volumes, diverse formats and sophisticated needs are becoming more of a barrier to this approach. Once we throw in data quality, governance and compliance concerns, the desktop solution becomes increasingly unworkable.

The innocent and inexperienced ask their IT departments, who hold an emergency meeting about organising a workshop to estimate a plan, then another meeting about justifying how they want to divert resources from something urgent that was requested last year, and so on. “We’ll get right on it, it’s going to take three months, but other projects are going to suffer…”.

A friend of mine, who works in IT, said that data preparation requests would arrive on his desk only once things had become urgent and official channels had been exhausted. He was considered one of those “alternative” routes to a solution: a friendly colleague in the right place who was able to bypass official management processes to get a quick result, something we all deplore in other departments but which we fall upon gratefully “when needs must”.

Business self-service

Now, the marketing folks have turned their attention to what is clearly a very real gap in the market, given it a name, and thrown in some of the most popular buzzwords of our time, ‘analytics’, ‘Hadoop’ and ‘cloud’, to make it sound new and exciting.

The real innovation in data preparation, however, is not about analytics, Hadoop or cloud applications; it is the additional concept of “business self-service”. In my experience the vast majority of uses can be seen in more traditional business areas, mainly finance and sales & marketing, with their traditional data sources, even where the aforementioned new technologies have not yet appeared.

Software vendors are rushing to participate in what I have heard described as already being a $400M software market, with the potential to grow to $2B. There are something like thirty vendors in this market, from garden-shed entrepreneurs to well-funded startups and well-established software giants. Not all the vendors whose products actually provide self-service data preparation currently market them as such. Some use different words, such as wrangling, transformation, conversion, blending, enrichment or data munging. Some position themselves entirely differently, as analytics vendors.

When are we doing data preparation?

Even business people don’t always recognise when they are doing data preparation. Our software product, Experian Pandora, was recently unmasked as a self-service data preparation tool by Michelle Goetz, from the analyst firm Forrester. Indeed we have many customers actively “preparing data”, some of whom are still oblivious to the fact they are doing it!

I would expect and encourage all businesses to look at self-service data preparation software, not just for their exciting Hadoop analytics projects, but above all for their existing business processes. Have a real look at how business people are trying to manipulate and transform data outside of your operational systems and then evaluate the variety of solutions available. If they really are ‘self-service’ it shouldn’t be difficult and it really shouldn’t take too long either.