With all the hype around “Big Data” lately, you may be inclined to shrug it off as a business fad. But there is more to it than a buzzword. Data science is emerging as a new field, changing the ways that companies get to know their customers, governments their citizens, and relief organizations their constituents. It is a field which will demand entirely new skill sets and information professionals trained to collect, curate, combine, and analyze massive amounts of data.
Today, we create data both actively (as we socialize, conduct business, and organize online) and passively (via a host of remote sensing devices). McKinsey projects a 40% growth in global data generated annually. Companies and organizations are racing to find new ways to make sense of this data and use it to drive decision-making. In the health sector, that includes investigating the clinical and cost effectiveness of new drugs using large datasets. (McKinsey estimates that the efficient and effective use of data could provide as much as $300 billion in value to the United States healthcare sector.) In the public sector, it could mean using historical unemployment data to reduce the amount of time it takes unemployed workers to find new employment. And in the retail sector, it leads to tools that help suppliers understand demand in stores so they know when they should restock items.
The information field is the right place for data scientists to learn how to apply statistical methods to real-world social data online. But data science is too often driven only by statisticians, computer scientists, and mathematicians. By bringing social scientists and information professionals into the mix, we hope to shape a more holistic future for data science. Since the data is multidisciplinary, so, too, should be the field dedicated to making sense of it.
To explore such opportunities, the UC Berkeley School of Information’s upcoming DataEDGE conference not only presents the thinking of some of the country’s top statisticians and computer scientists, but brings into the conversation ethnographers, whose perspectives can help us understand what broader research and analytics fields need to consider as they define the roles and responsibilities of data science. For example, Intel anthropologist Genevieve Bell will give a thoughtful portrait of how we might be “fetishizing big data” in the same way that we’ve credited previous technological advances with transformative power.
Instead of PowerPoint presentations and talking heads, the conference will be small and conversational. Some of the highlights:
danah boyd, Senior Researcher at Microsoft Research, examining some of the key challenges for managing privacy in an era of big data.
NPR commentator Geoff Nunberg facilitating a discussion between sociologist Matthew Salganik and linguist Mark Liberman about data and the human sciences.
Factual’s Gil Elbaz talking about tomorrow’s Internet with a thriving data layer that he believes will enable access to clean, structured data at massive scale and at unprecedented speed.
All in all, this promises to be a diverse, thought-provoking and engaging event, set to have a big impact on the direction of data science in this formative stage.
DataEDGE will be held at UC Berkeley May 31 to June 1. If you’re interested in this opportunity to participate in the future of data science, register at http://dataedge.ischool.berkeley.edu.
Heather Ford is a Research Specialist at the UC Berkeley School of Information.