Last month, the Consumer Financial Protection Bureau (CFPB) released two statements summarizing their software policies:

  1. "We use open-source software, and we do so because it helps us fulfill our mission"
  2. "When we build our own software or contract with a third party to build it for us, we will share the code with the public at no charge."

The first sentiment dominates the discussion around open source and the civic sector: How can governments leverage open source tools in their operations? An April 2010 study by the Center for Strategic and International Studies surveyed government open source policies “as reported in the press or other media” and found 354 policies, broken down by region and by status—approved, proposed, or failed.

But it’s the CFPB’s second statement that is truly groundbreaking. I propose three reasons that every country, city, and state should follow the CFPB’s lead and make customized IT solutions publicly available.

First, open models can produce better products. The current products of large IT procurements are too often bespoke, closed, and expensive. With the rise of Code for America and a growing community of civic technologists, governments can engage citizens directly in building and improving that software. As Eric Raymond put it, "Given enough eyeballs, all bugs are shallow." A GitHub account for every city and state agency would create an opportunity for hacker-citizens to participate in improving government at almost no cost.

Second, since taxpayer dollars are used to create source code, taxpayers should have access to it. Just as many governments proactively release data and information to the public, so should they proactively release their code. This may seem scary—governments operate under heightened scrutiny, and critics have ways of repackaging mistakes as scandals. Code, however, should be thought of as just another type of public record that, like all records in the public domain, is open to scrutiny.

Finally, positive network effects occur when one entity’s use of a good increases the value of that good for others. For example, Wisconsin is currently procuring a new student information system. If the state required the developer of that system to release it under an open source license, each subsequent city or state that needed a similar tool could save millions by adapting Wisconsin’s tool for its own use. Better yet, Wisconsin would not have had to spend so much on a new system in the first place if another state had released the bulk of the code for an existing, similar system.

There is no compelling reason for most government software to be closed source. IT contracts are often customized solutions built by large developers or consultancies and could easily include public access provisions. Ottawa-based Getting Open Source Logic Into Governments (GOSLING) argues that because of redundant development, the Canadian government spends “$1.5 billion buying software that could cost only a third of that.” The amount of unnecessary expenditures in larger national governments is likely even higher. Governments could share and adapt solutions, thus driving down the costs for the entire civic sector.

Zac Townsend is co-founder and Executive Director of OpeningData.org

Cross-posted from the Official Google Blog.

At Google, we believe that openness is crucial for the future of the Internet. When something gets in the way of the free flow of information, we believe there should be transparency around what that block might be.

So two years ago we launched the Transparency Report, showing when and what information is accessible on Google services around the world. We started off by sharing data about the government requests we receive to remove content from our services or for information about our users. Then we began showing traffic patterns to our services, highlighting when they’ve been disrupted.

Today we’re expanding the Transparency Report with a new section on copyright. Specifically, we’re disclosing the number of requests we get from copyright owners (and the organizations that represent them) to remove Google Search results because they allegedly link to infringing content. We’re starting with search because we remove more results in response to copyright removal notices than for any other reason. So we’re providing information about who sends us copyright removal notices, how often, on behalf of which copyright owners and for which websites. As policymakers and Internet users around the world consider the pros and cons of different proposals to address the problem of online copyright infringement, we hope this data will contribute to the discussion.

For this launch we’re disclosing data dating from July 2011, and moving forward we plan on updating the numbers each day. As you can see from the report, the number of requests has been increasing rapidly. These days it’s not unusual for us to receive more than 250,000 requests each week, which is more than what copyright owners asked us to remove in all of 2009. In the past month alone, we received about 1.2 million requests made on behalf of more than 1,000 copyright owners to remove search results. These requests targeted some 24,000 different websites.

Fighting online piracy is very important, and we don’t want our search results to direct people to materials that violate copyright laws. So we’ve always responded to copyright removal requests that meet the standards set out in the Digital Millennium Copyright Act (DMCA). At the same time, we want to be transparent about the process so that users and researchers alike understand what kinds of materials have been removed from our search results and why. To promote that transparency, we have long shared copies of copyright removal requests with Chilling Effects, a nonprofit organization that collects these notices from Internet users and companies. We also include a notice in our search results when items have been removed in response to copyright removal requests.

We believe that the time-tested “notice-and-takedown” process for copyright strikes the right balance between the needs of copyright owners, the interests of users, and our efforts to provide a useful Google Search experience. Google continues to put substantial resources into improving and streamlining this process. We already mentioned that we’re processing more copyright removal requests for Search than ever before. And we’re also processing these requests faster than ever before; last week our average turnaround time was less than 11 hours.

At the same time, we try to catch erroneous or abusive removal requests. For example, we recently rejected two requests from an organization representing a major entertainment company, asking us to remove a search result that linked to a major newspaper’s review of a TV show. The requests mistakenly claimed copyright violations of the show, even though there was no infringing content. We’ve also seen baseless copyright removal requests being used for anticompetitive purposes, or to remove content unfavorable to a particular person or company from our search results. We try to catch these ourselves, but we also notify webmasters in our Webmaster Tools when pages on their website have been targeted by a copyright removal request, so that they can submit a counter-notice if they believe the removal request was inaccurate.

Transparency is a crucial element in making this system work well. We look forward to making more improvements to our Transparency Report—offering copyright owners, Internet users, policymakers and website owners the data they need to see and understand how removal requests from both governments and private parties affect our results in Search.

Posted by Fred von Lohmann, Senior Copyright Counsel at Google

At the end of March, we announced a data visualization competition we’re co-sponsoring with the Guardian Datastore.

On May 15, we’ll gather three economic experts—Martha Lane Fox (lastminute.com, Antigone), John Kao (Institute for Large Scale Innovation, Daily Beast), and Larry Elliott (Guardian)—in a Google+ Hangout to discuss ways that the world’s economies might recover from the current recession as well as which countries are creating positive climates for new technologies. The Guardian’s Simon Rogers will chair the Hangout debate.

The Hangout will be broadcast via Hangouts on Air, streaming live to Google+ and YouTube.

We want to know what you want to hear our experts talk about! Use the comment field on the Guardian post to let us know!

In February, the FTC published its yearly Consumer Sentinel Network report, which examines the various kinds of complaints the FTC gets from consumers on the Internet. One way to think about the report is that it provides a key for prioritizing actions—by government, academia, industry, and others—in response to risks on the open internet. The complaints included in the Sentinel Network—7 million of them from 2007 through 2011—give a helpful portrait of the online risk landscape.

First, we learn that the Internet, with its vast offerings, still garners fewer consumer complaints than the old telephone network. Over the same period in which the internet complaints were collected, for example, the Do Not Call Registry registered 9 million complaints.

Second, the composition of the complaints helps us assess the priorities of consumers on the Internet. 55% of the complaints are about fraud. A total of 1.5 billion USD was reported to have been lost to Internet-related fraud, and that figure comes only from the 43% of fraud complainants who reported a loss. If everyone who experienced a loss had reported one, the total could be at least double that figure, as the rough calculation below illustrates. These are not small amounts: the median reported loss was around 500 USD, and the average was above 1,000 USD. (For comparison, it would be interesting to know the average amounts for offline fraud.) Identity theft trails fraud with 15% of complaints. Together, these two concerns make up 70% of the complaints in the CSN, suggesting that government resources should be concentrated on these two issues.
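
As a back-of-the-envelope check on that "at least double" claim, here is a minimal sketch in Python. It assumes (and the report does not confirm this) that the complainants who did not report a dollar amount lost roughly as much, on average, as those who did:

    # Extrapolating total fraud losses from the CSN figures cited above.
    # Assumption: the 57% of fraud complainants who did not report a dollar
    # amount suffered losses of roughly the same size as the 43% who did.
    reported_loss_usd = 1.5e9        # total loss actually reported (1.5 billion USD)
    share_reporting_loss = 0.43      # fraction of fraud complainants reporting an amount

    extrapolated_loss_usd = reported_loss_usd / share_reporting_loss
    print(f"Extrapolated total loss: {extrapolated_loss_usd / 1e9:.1f} billion USD")
    # -> about 3.5 billion USD, i.e. more than double the 1.5 billion reported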

Third, in 2011 the CSN received 1.8 million complaints in the US. The US had 240 million internet users in 2010, so roughly 0.75% of Internet users reported to the CSN. This suggests that for the vast majority of US Internet users, their 2011 internet experience was a safe one. That is good news, but the 1.5 billion USD lost to fraud is nothing to sneeze at and should be taken seriously.

Fourth, the web at large is not the primary channel for online fraud. Instead, it’s email: of the 60% of complainants who reported how they were first contacted, 43% said email. Only 13% had their initial contact through a website. The real dangers, then, are not the websites you visit but the dubious messages in your inbox. Luckily, many sites have sprung up that let you check whether an offer that seems too good to be true really is.

The internet has its share of entrepreneurs and parasites, making trust online extremely important. The reason the Internet can carry e-commerce worth more than 180 billion USD in the US alone is that consumers trust the network. Finding ways to inform and empower people to have trustworthy relationships matters. Given the findings of the CSN report, any agenda for improving trust should start with addressing email fraud and identity theft, ideally through research that includes crowdsourced data gathering.

Eric Davis is a Policy Manager at Google.

The very nature of peace negotiations has fundamentally changed. Once an art involving a select few and practiced behind closed doors, negotiations are now broader social conversations that include the voices of diaspora communities. Except for cases of the most draconian censorship, high-level stakeholders in peace negotiations have very little control over this social hubbub, out of which vital concerns, critiques, ideas, and alternatives can emerge. Indeed, the very agenda of official negotiations can now be contested and debated in real time on the web. Thanks to reams of official texts, a plethora of social media updates, and mainstream media coverage, today’s peacebuilding and peacekeeping efforts generate, respond to, and are informed by massive amounts of data.

We must forgive the negotiator or peacekeeper trained in traditional methods for feeling helpless and confused. This is a complex new world for peacebuilding, and one that is in constant flux. As more data becomes available, new insights threaten and even overturn staid assumptions. For example, an August 2011 Fast Company story reports on the Empirical Studies of Conflict (ESOC) archive, funded by an $8.6 million grant from the US Department of Defense. ESOC’s mission is to make previously hard-to-access data on global conflict available to academics. Fast Company reports, “ESOC discovered a previously unnoticed—and counterintuitive—correlation between unemployment rates and politically motivated violence,” with higher unemployment associated with less politically motivated violence.


Sanjana Hattotuwa's TED talk on citizen journalism.

Contentious as it may seem, using data analysis to challenge assumptions and even predict outcomes is a growing trend. For example, The Grill’s computer modeling has been used globally to predict the outcomes of conflict. The PAX initiative “plans to launch a global digital system to give early warning of wars and genocide.” And the non-profit organization Benetech has been contracted by the likes of Amnesty International and Human Rights Watch to address controversial geopolitical issues via data science. Of note, in an exhaustive analysis of over 80 million documents from the secret files of Guatemala’s National Police, Benetech’s scientists employed random sampling to confirm that genocide was committed against the Mayan population during the country’s civil conflict, which lasted from 1960 to 1996.

Image analysis is another area with potential data-intensive peacebuilding applications. The LRA Crisis Tracker combines images with a number of sources, including on-the-ground and situation reports from the UN system, to present a temporal and geospatial representation of one of the world’s most brutal terrorist groups in one of the world’s most unstable regions. The visual impact of this representation compels the viewer to investigate the conflict and ways to help, which may lead to meaningful and early intervention in crimes against humanity.

All of these early examples hint at data’s potential to meaningfully impact the domain of peacebuilding and peacekeeping, but we cannot simply assume that predictive modeling, random sampling, and other “Big Data” applications are automatic, easy solutions. Figuring out how to effectively leverage massive amounts of data to save lives and build peace is going to be a challenge—but it is one worth taking on.

Sanjana Hattotuwa is a TED Fellow and Special Advisor with the ICT4Peace Foundation.

We’re back with another edition of our Jack and Jill the Innovator video series, which spotlights innovators of all shapes and sizes and gives them an opportunity to tell their stories.

Our next episode features Art Neill, the founder of New Media Rights, a nonprofit that helps artists and entrepreneurs navigate the complex legal thicket around creating art and media online.

To set the context for why this is important, you might also read John Tehranian’s Infringement Nation. Tehranian explores how, without committing anything close to copyright piracy, you might find yourself in a legal grey area when you create content online.

“To illustrate the unwitting infringement that has become quotidian for the average American, take an ordinary day in the life of a hypothetical law professor named John...

“By the end of the day, John has infringed the copyrights of twenty emails, three legal articles, an architectural rendering, a poem, five photographs, an animated character, a musical composition, a painting, and fifty notes and drawings. All told, he has committed at least eighty-three acts of infringement and faces liability in the amount of $12.45 million (to say nothing of potential criminal charges). There is nothing particularly extraordinary about John’s activities. Yet if copyright holders were inclined to enforce their rights to the maximum extent allowed by law, barring last minute salvation from the notoriously ambiguous fair use defense, he would be liable for a mind-boggling $4.544 billion in potential damages each year. And, surprisingly, he has not even committed a single act of infringement through P2P file-sharing. Such an outcome flies in the face of our basic sense of justice. Indeed, one must either irrationally conclude that John is a criminal infringer—a veritable grand larcenist—or blithely surmise that copyright law must not mean what it appears to say. Something is clearly amiss. Moreover, the troublesome gap between copyright law and norms has grown only wider in recent years.”
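
For readers wondering where those figures come from, the arithmetic works out if one assumes each act is assessed at the Copyright Act's statutory maximum of $150,000 for willful infringement (17 U.S.C. § 504(c)(2)). The sketch below reproduces the numbers under that assumption, which the excerpt itself does not spell out:

    # Rough reconstruction of the damages figures quoted above, assuming each of
    # the 83 daily acts is assessed at the $150,000 statutory maximum for willful
    # infringement. Tehranian's own methodology may differ in detail.
    acts_per_day = 83
    statutory_max_per_act = 150_000                          # USD per willful act

    daily_liability = acts_per_day * statutory_max_per_act   # 12,450,000
    annual_liability = daily_liability * 365                  # 4,544,250,000

    print(f"Daily exposure:  ${daily_liability:,}")   # $12,450,000  (~$12.45 million)
    print(f"Annual exposure: ${annual_liability:,}")  # $4,544,250,000 (~$4.544 billion)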

Derek Slater is a Policy Manager at Google.