Big Data is as important to the U.S. economy as agricultural products, according to the team behind a new report about how the federal government can better use huge data collections. The report, from the TechAmerica Foundation, was released Wednesday.
Entitled Demystifying Big Data: A Practical Guide to Transforming the Business of Government, the report aims to define Big Data and provide policy guidance to the federal government.
The Five V's
In March, the Obama administration announced a new Big Data initiative, with more than $200 million in projects at six agencies designed to advance the technologies and develop the required workforce. Projects include a National Institutes of Health effort to make human genome data more accessible to the public.
Big Data, the report noted, is either structured information in relational databases, or unstructured information, such as e-mail, video, blogs, call-center conversations, or social media. Unstructured data constitutes about 85 percent of the data generated today and, the report noted, poses "challenges in deriving meaning with conventional business intelligence tools."
The characteristics of Big Data are its volume, velocity, variety and veracity. The volume is being driven by the increase in data sources and higher-resolution sensors, and the velocity -- how fast data is being produced, changed, and processed -- is being driven by improved throughput connectivity and enhanced computing power of data-generating devices, as well as more data sources.
Variety, which stems from new data sources both inside and outside the organization, is being pushed by social media, sensors, the rise of mobile and other factors. And veracity, or the quality of the data, is a key requirement for data-based decisions.
Values of Big Data
The report pointed to possible values of Big Data analysis for the federal government, including determining the most effective medical outcomes across large populations, analysis of health anomalies in the hospital or home through sensors, new levels of real-time traffic information, a better understanding of the most effective online learning techniques, better fraud detection, and improved weather predictions.
The report recommended that IT structures at the agencies evolve into massively scalable storage and network infrastructure designs, with planning for data protection, data sharing, data reuse, ongoing analysis, compliance, security and privacy issues, data retention and data availability.
Other recommendations include taking an inventory of data assets, assessing a deployment entry point based on current agency capabilities, and evaluating which data assets can be opened to the public to spur innovation.
Policy considerations include a formal career track for line-of-business and IT managers relating to Big Data management, establishment of a broader coalition with industry and academia, an expansion of national R&D strategy to encourage development of new techniques and tools, and further guidance on applying privacy and data protection practices.
The foundation's Big Data Commission, which generated the report, consisted of appointed commissioners from academia and industry, including representatives from IBM, Western Governors University, Amazon Web Services, SAP, MicroStrategy, EMC, Splunk, Dell and Microsoft. The foundation is the nonprofit arm of TechAmerica, an industry group.
Posted: 2012-10-04 @ 7:14am PT
Enjoyed the piece Barry. Great to see the industry finally adopting the "Vs" of Big Data that Gartner first introduced over 12 years ago. For future attribution, here's a link to the piece I wrote first publicly defining them in 2001: http://blogs.gartner.com/doug-laney/deja-vvvue-others-claiming-gartners-volume-velocity-variety-construct-for-big-data/. Note that you mention 5Vs, but only discuss 4. ??? Anyway, we contend veracity isn't a defining characteristic of Big Data, just an aggregate measure of quality. Note that Gartner has now suggested and published on 12 dimensions of data. --Doug Laney, VP Research, Gartner, @doug_laney