What is data munging?

In computer jargon, the standard definition of munging is irrevocably changing or damaging data so that its original state cannot be recovered.

The term is often expanded as “Mash Until No Good”, an expansion that is thought to be a backronym.

However, data munging in particular means preparing your data for a dedicated purpose and use case: taking the data from its raw state and turning it into something else, normally for use beyond its original intent.


Examples of data munging

As of 2016, email address munging is common practice. Typically, to prevent spam, a user deliberately breaks the valid format of an email address by writing it in a way that humans can read but automated address harvesters cannot easily parse, such as:

JohnDOTdoeATJohnDoeDOTcom or John(dot)doe(at)JohnDoe(dot)com
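
As a rough illustration, this kind of email munging could be sketched in Python as below. The function name mung_email and its style parameter are hypothetical, chosen only for this example, and the input address is the unmunged form implied by the first example above.

```python
def mung_email(address: str, style: str = "words") -> str:
    """Obfuscate an email address so that simple harvesters do not match it.

    Hypothetical helper for illustration; 'words' and 'parens' are made-up
    style names, not part of any standard.
    """
    if style == "words":
        at, dot = "AT", "DOT"
    elif style == "parens":
        at, dot = "(at)", "(dot)"
    else:
        raise ValueError(f"unknown style: {style!r}")
    # Replace the "@" first, then every "." in the address.
    return address.replace("@", at).replace(".", dot)


print(mung_email("John.doe@JohnDoe.com"))                  # JohnDOTdoeATJohnDoeDOTcom
print(mung_email("John.doe@JohnDoe.com", style="parens"))  # John(dot)doe(at)JohnDoe(dot)com
```

A human reader can reverse the substitution at a glance, while a harvester looking for the usual name@domain pattern will not find one. A determined harvester can still undo such simple substitutions, so this reduces spam rather than preventing it outright.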

Conversely, data munging can refer to organising unorganised data, in other words making the data fit for purpose.

A specific example is machine learning, where data munging is used to restructure data into a form that a learning algorithm can use.
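
To make this concrete, here is a minimal Python sketch of munging raw records into numeric rows that a learning algorithm could consume. The records, the field names (age, city, subscribed) and the function mung_records are all invented for illustration. Filling a missing age with the mean and one-hot encoding the city are just one reasonable set of choices, not a prescribed method.

```python
# Made-up raw records with typical problems: stray whitespace,
# inconsistent capitalisation and a missing value.
raw_records = [
    {"age": "34", "city": "London", "subscribed": "yes"},
    {"age": " 27 ", "city": "paris", "subscribed": "No"},
    {"age": "", "city": "London", "subscribed": "YES"},
]


def mung_records(records):
    """Turn messy string records into numeric feature rows and labels."""
    # Normalise city names and fix a stable column order for one-hot encoding.
    cities = sorted({r["city"].strip().lower() for r in records})

    # Mean of the ages that are present, used to fill missing ages.
    known_ages = [int(r["age"]) for r in records if r["age"].strip()]
    mean_age = sum(known_ages) / len(known_ages)

    features, labels = [], []
    for r in records:
        age_text = r["age"].strip()
        age = float(age_text) if age_text else mean_age
        city = r["city"].strip().lower()
        one_hot = [1.0 if city == c else 0.0 for c in cities]
        features.append([age] + one_hot)
        labels.append(1 if r["subscribed"].strip().lower() == "yes" else 0)
    return cities, features, labels


cities, features, labels = mung_records(raw_records)
print(cities)    # ['london', 'paris']
print(features)  # [[34.0, 1.0, 0.0], [27.0, 0.0, 1.0], [30.5, 1.0, 0.0]]
print(labels)    # [1, 0, 1]
```

Each row now has the same length and contains only numbers, which is the shape most learning algorithms expect as input.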



