Library of Congress' big data problems similar to those in smaller enterprises

Paul Newman Archive

At the beginning of 2013, the Library of Congress provided an update on its massive Twitter archive, which includes approximately 170 billion tweets to date. Like many other organizations, the Library hopped on the big data bandwagon and encountered technology challenges when it came to organizing the content in a way that upholds data quality for meaningful interpretation.

The Library agreed to collect, preserve and organize the tweets starting in April 2010, including the 21 billion messages that had already been tweeted since 2006. This was meant to serve as a new kind of archive to capture the era of social media communication. The Library's system can now harvest and store the information, but the Library still needs to develop a method for organizing and cataloging it to generate comprehensive insight.

Many companies encounter similar issues when deploying new business intelligence technology. Deploying the infrastructure alone isn't enough to produce the actionable interpretations they want. Companies must include data governance in their plans to parse old and new content and ensure quality measures are implemented across departments, according to IBM Data Management.