Genetics expert explains big data management

Rachel Wheeler Archive
The motive behind data quality programs has always been clear. What is less certain is how to keep information accurate as companies acquire ever-larger data reserves, a pressing problem now that modern technology has both encouraged and enabled large-scale retention efforts.

The ENCODE consortium is dedicated to logging genetic information. Because this data is huge by its nature, the strategies the group uses are relevant to modern corporate interests. ENCODE coordinator Erin Brinney contributed an overview of the consortium's data management principles to Nature.

Brinney stated that the early stages of data collection should have slightly different quality requirements than those for general storage. She noted that correcting errors in a timely manner is more vital than maintaining absolute integrity, and she urged earlier and broader access to data generated by scientific projects. In the big data age, that lesson could translate to the enterprise world.

The rise of big data in corporate settings has led to a number of transfers between different types of systems. According to eWeek, these migrations should come after careful quality checking. The source noted that data transfers from legacy, hardware-based systems are especially prone to quality problems.
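A pre-migration quality check of the kind eWeek describes might look like the sketch below. The schema (`id`, `name`, `created`) and the specific rules are hypothetical, chosen only to illustrate the idea: audit records exported from a legacy system and flag missing fields, malformed dates, and duplicate IDs before anything is transferred.

```python
# Illustrative pre-migration quality audit on a hypothetical legacy CSV export.
# Records that pass every check are queued for migration; the rest are flagged
# with a list of the issues found, so errors can be corrected early.
import csv
import io
from datetime import datetime

def check_record(record, seen_ids):
    """Return a list of quality issues found in one record."""
    issues = []
    record_id = record.get("id", "")
    if not record_id:
        issues.append("missing id")
    elif record_id in seen_ids:
        issues.append("duplicate id")
    else:
        seen_ids.add(record_id)
    if not record.get("name", "").strip():
        issues.append("missing name")
    try:
        datetime.strptime(record.get("created", ""), "%Y-%m-%d")
    except ValueError:
        issues.append("malformed created date")
    return issues

def audit(csv_text):
    """Split legacy export rows into clean records and flagged ones."""
    seen_ids = set()
    clean, flagged = [], []
    for record in csv.DictReader(io.StringIO(csv_text)):
        issues = check_record(record, seen_ids)
        if issues:
            flagged.append((record, issues))
        else:
            clean.append(record)
    return clean, flagged

legacy_export = """id,name,created
1,Alice,2023-01-05
1,Bob,2023-02-10
3,,2023-13-40
"""
clean, flagged = audit(legacy_export)
```

In this sketch, only the first row passes; the second is flagged as a duplicate ID and the third for a missing name and an impossible date. The same pattern scales to a real migration by swapping in the actual schema and validation rules.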