Big data is an IT trend on the fast track. It has pushed past the phase of technical articles and has arrived at the point where entire conferences are dedicated to the topic. But government will have to overcome some barriers to get the most out of big data.
The term "big data" defies precise definition. One useful working definition: the tools and techniques for exploiting the vast quantities of digital data that IT systems create in order to make decisions that improve efficiency and quality.
Today, organizations use large quantities of data and improved processing capabilities to make faster, fact-based decisions that better meet consumer needs, even down to the individual consumer. For example, police departments are using big data to better target deployment of resources to areas with expected criminal activity. And large retailers use big data to improve product selection, supply chain efficiency, pricing and more.
In August 2010, I wrote about a basic public-policy question that big data poses: On the spectrum from reactive government to proactive government, where should America be? (I defined "reactive government" as identifying violations after the fact and seeking redress, which is our current state, and "proactive government" as using technology to monitor activities in real time to prevent violations.)
Government can use big data to gain the same benefits as for-profit firms. Government would be improved by better understanding the discrete needs of its constituents, by improving the efficiencies of its processes, by understanding performance and results, by preventing fraud, by preventing loss — the possibilities are endless.
Big data can improve reactive government: improve the efficiency of government; reduce waste, fraud and abuse; better tailor government benefits; improve the science that underlies regulations; reduce the cycle time of enforcement actions — all of these would improve the performance of government and help manage its cost.
The more interesting question to me is how far the capabilities of big data will eventually push us toward proactive government, and when. If we as a society trust that our government, armed with big data and public support, could have mitigated the worst effects of the real estate collapse and prevented trillions of dollars in losses to our economy, I suspect most people would prefer such a change to happen sooner rather than later. We have, after all, already grudgingly accepted the use of big data, with its invasions of privacy, in the fight against terrorism.
Whatever your vision of the future of government, big data can make government more efficient, effective and even equitable. The move to using big data, however, is not without challenges.
First, to use big data, you need lots of timely data. Some standards and investment will be required to ensure the right data for the specific need is being created and obtained. The data needs to be accessible to those who need it, which will require appropriate policy development. The data also needs to be of known quality and source, and the chain of custody needs to be reliable, requiring additional policy and process development across government.
Second, the IT used to handle big data needs to continue to develop and provide increased capabilities. Innovative software that can process multistructured and multisourced data and perform complex, real-time analytics will be required. Significantly more capable data storage mechanisms and appliances, relying more on solid-state, in-memory storage instead of traditional disk storage, will be needed. Improved sensors and devices to accurately collect data will be required.
Third, big-data projects need a new cadre of professional staff who are proficient in statistics, mathematics and scientific methods. These staff members must ask the right questions and consume the results of analysis effectively. Such personnel needs will require the government to retrain its current staff and attract talented new graduates.
Fourth, the underlying organizational and business models of government will be further stressed. In-place structures and leadership tend to seek and protect the status quo; big data will have a revolutionary impact, with innovators and outside influences pushing the incumbents. As we have observed over time, technology tends to be indifferent to organizational structures. Big data will provide yet another solid reason to rationalize the organization of government and to attempt to eliminate the significant duplication that exists.
Finally, to use big data effectively, especially toward proactive government, we need to make great strides in creating trust between government and its citizens. The current environment of blame and distrust is the antithesis of what is needed.
A key byproduct of advances in IT — such as the Internet, networked sensors, mobile technology and social media — has been the creation of vast quantities of digital data. We all use technology that creates this data, as consumers, in our professional roles, and in our personal lives.
The speed at which data is being created today is staggering. Ninety percent of the world's data has been created in the past two years, and the pace of growth continues to accelerate.
An entire industry has evolved to provide database management systems to organizations that need to process structured data. Much of the contemporary growth in data, however, is unstructured and unsuitable for storage and processing in traditional database management system technologies.
One key source of unstructured data is video. One second of video creates more than 2,000 times as much data as a page of text. A growing number of people around the world have the tools to create and distribute video, mainly via cell phones. And as a source of information, video can be far more valuable than text in promoting understanding.
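A rough back-of-envelope check makes the 2,000-to-1 ratio plausible. The page size and video bitrate below are illustrative assumptions, not figures from the article:

```python
# Back-of-envelope sketch of the video-vs-text data ratio.
# Both constants are assumed, round-number figures for illustration.
PAGE_BYTES = 2_000               # ~2 KB for a page of plain text (assumed)
VIDEO_BYTES_PER_SEC = 4_000_000  # ~4 MB/s, i.e. ~32 Mbit/s video (assumed)

ratio = VIDEO_BYTES_PER_SEC / PAGE_BYTES
print(f"One second of video ~ {ratio:,.0f} pages of text")  # prints ~ 2,000
```

At those assumed rates, every second of video adds as much raw data as a couple thousand pages of text, which is why video is such a large driver of unstructured-data growth.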
Here are a few more statistics I found that underscore the huge flow of data being created:
Do the math — it's mind-boggling, even without knowing the difference between a petabyte and an exabyte.
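For readers who do want to know the difference, a quick sketch of the decimal (SI) byte scale, in which each named unit is a factor of 1,000 larger than the last:

```python
# Decimal (SI) byte units: each step up is a factor of 1,000.
units = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte", "exabyte"]
for power, name in enumerate(units, start=1):
    print(f"1 {name} = 10^{3 * power} bytes")
```

A petabyte is a million gigabytes; an exabyte is a thousand petabytes.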