Policy by the Numbers: The challenges of censorship detection

Monday, April 30, 2012

When I lived in Washington, DC, I was lucky enough to be on the same power grid as the local power company’s headquarters. During an outage, the utility’s site would show me a near-live map displaying how many customers were without power, where they were located and, most importantly, where repair crews were headed next. Similar dashboards for centralized systems are all around us. When my car tires lose air pressure, my car tells me. When I order a package, FedEx tells me where it is. So where is our Internet dashboard? If I can find out that a package is being held in Buffalo, shouldn’t I know why my Internet packets didn’t make it to Beijing?

It turns out that identifying Internet censorship, filtering or other web blockages is much more challenging. As Jonathan Zittrain described in his 2009 TED Talk, the Internet is more akin to a mosh pit than FedEx. There’s no “absolutely positively” on the Internet; instead, there’s a series of interconnected, loosely affiliated servers that voluntarily pass your data in the general direction of its destination.

This decentralized structure is what makes the Internet so robust, but it also makes filtering very hard to pinpoint. For example, when Chinese citizens suddenly inundated President Obama’s Google+ page with comments about Internet freedom and other topics, we could see that Google+ was accessible in China. Although the result was clear, the cause was not.

People often believe that the “Great Firewall” is the extent of China’s censorship, but as Rebecca MacKinnon has explained, that much-scrutinized feature is only the outermost layer of the censorship apparatus. The self-censorship practiced by ISPs and content providers is actually far more effective. Researchers at Carnegie Mellon University have recently documented how this multi-tiered approach can have a disparate impact on Chinese netizens. On Sina Weibo (China’s Twitter equivalent), 53% of messages originating from the politically contentious region of Tibet are deleted, but only 12% and 11.4% of messages from Beijing and Shanghai, respectively, are removed. They also showed that deletion is inconsistent; 17.4% of posts are deleted, while other posts with the same political terms remain.

So when we see something like Google+ comments from China or the sudden accessibility of previously blocked search terms, the distributed nature of both the network and the filtering makes it hard to know exactly what is occurring.

Thus, what we need is a dashboard for Internet health. The Herdict project at Harvard is one piece of that. By aggregating over 200,000 user-submitted reports about accessible and inaccessible websites, we can map a slice of the end-user experience. Constructing a true dashboard, however, will require data as distributed as the network itself. What each piece of the network knows about itself and its neighbors may be inconsequential on its own, but can be powerful when aggregated. Creating a useful measurement tool requires that leaders from browser manufacturers, ISPs, registrars and backbone providers recognize the crucial role they can play in helping identify filtering, censorship and other web blockages.
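To make the idea of aggregation concrete, here is a minimal sketch of how individual reports might be rolled up into a per-country, per-site picture. It is an illustration only, not Herdict's actual data format or method: the `(country, site, accessible)` tuple layout, the function name `inaccessible_share` and the sample reports are all assumptions invented for this example.

```python
from collections import defaultdict


def inaccessible_share(reports):
    """Return {(country, site): fraction of reports marked inaccessible}.

    `reports` is an iterable of (country, site, accessible) tuples.
    This layout is hypothetical, chosen only for illustration.
    """
    counts = defaultdict(lambda: [0, 0])  # [inaccessible, total]
    for country, site, accessible in reports:
        tally = counts[(country, site)]
        tally[1] += 1
        if not accessible:
            tally[0] += 1
    return {key: bad / total for key, (bad, total) in counts.items()}


# Made-up sample reports, for illustration only.
sample_reports = [
    ("CN", "example.org", False),
    ("CN", "example.org", False),
    ("CN", "example.org", True),
    ("US", "example.org", True),
]

shares = inaccessible_share(sample_reports)
for (country, site), share in sorted(shares.items()):
    print(f"{country} {site}: {share:.0%} of reports say inaccessible")
```

No single report says much, but the per-country shares begin to show where a site is consistently unreachable. A number like this still describes only the result, not the cause, which is why data from other parts of the network would be needed as well.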

by Ryan Budish, Project Director of Herdict.org and Berkman Center fellow
