Canada’s Citizen Lab project has compiled an interesting list of resources aggregating a raft of global efforts to determine which search terms are blocked within Chinese search engine results. Since the sources vary in formatting and other significant respects, and since the data, while not necessarily qualifying as ‘big’, is rather unwieldy, CCL has also taken the trouble to compile the various source lists into a public online spreadsheet. The sample Google doc will not be updated, but its source GitHub repository will be.
An initial glance at the sample data inspires memories of that first time you looked up rude words in a dictionary as a kid. Well, they’re all in there, and then some. But once that first prurience has been indulged, there are some fascinating and mystifying entries in this lexicon of the forbidden.
The reactive and ongoing nature of Chinese censorship becomes quite clear from the specificity of many of the translated examples provided, such as the initially bewildering references to ‘Ferrari hits bridge’, ‘Ferrari — 4 am’, ‘Ferrari car accident’, etc. At first glance these particular terms seem to refer to a sad but routine motorway accident in Beijing this year, where one of three passengers involved in the crash died when their Ferrari hit a guardrail.
In fact the cause of China’s ‘Ferrari alert’ seems to be a 2012 sex scandal in which Ling Gu, the son of President Hu Jintao’s chief-of-staff, was killed when he crashed his black Ferrari 458 Spider into a wall, leaving his two female passengers injured, one of them paralysed. All three were reported to have been in a state of undress and were suspected of having been engaged in sexual activity at dangerous speed at the time of the accident. The blacklisting was likely an unwelcome present for the Italian luxury car manufacturer’s 20th anniversary in China.
And so the baby goes out with the bathwater in the Great Blacklist of China (GBC). ‘Ferrari hits bridge’ clearly refers to a traffic incident with no political significance, but because another, politically charged Ferrari hit a wall two years earlier, you apparently can’t search for the later incident either. It’s the Scunthorpe Effect upgraded.
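For readers unfamiliar with the Scunthorpe Effect, the over-blocking can be sketched in a few lines of Python. This is only an illustration: the source lists do not say how any search engine actually matches terms, so the substring matching, the single English-language entry in `BLACKLIST` and the `is_blocked` helper are all assumptions made for the example.

```python
# Hypothetical one-entry blacklist. The real lists hold Chinese-language
# terms, and the matching rules the search engines use are not documented.
BLACKLIST = ["ferrari"]


def is_blocked(query: str, blacklist=BLACKLIST) -> bool:
    """Return True if any blacklisted term appears anywhere in the query."""
    normalized = query.casefold()
    return any(term in normalized for term in blacklist)


queries = [
    "Ferrari hits bridge",       # the non-political accident
    "Ferrari car accident",
    "Lamborghini hits bridge",   # same kind of story, different marque
]

# Map each query to the filter's verdict.
results = {query: is_blocked(query) for query in queries}

for query, blocked in results.items():
    print(f"{query!r}: {'blocked' if blocked else 'allowed'}")
```

Under these assumptions, any query that mentions the marque is refused, whatever it is actually about, while an otherwise identical query about another car passes.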
Amazingly, the Chinese word for ‘system’ seems to be on the blacklist – a considerable hindrance to online researchers in many fields. The explanation given (by Sina Show Censorship Research) is that the exclusion may be related to a 2012 news story reporting that 1 in 3 Chinese shoppers were victims of fraud, at an annual loss of £2.9bn ($4.7bn).
Not all of the Chinese words in the repository have yet been translated, and those that have are divided between human and machine translations (sometimes with the former shedding light on the latter).
A great many of the denied words are predictably concerned with sex, crime, drugs, warez, methods of breaking through the Great Firewall of China (GFC) and the events at Tiananmen Square in 1989. Once you have scrolled past a large tranche of dissidents, references to gathering points, hunger strikes and official figures who are none of your business (if you’re Chinese), obscurities and oddities become more apparent; for instance, the apparent banning of Korean woman golfer Lee Jee-young, who may have become confused in the blacklist with the case of a South Korean woman of the same name kidnapped in 2007.
The appetite for online Asian erotic fiction is extraordinarily well-represented among the data, with a large swathe of online titles excluded from search terms.
The data as collated by CCL is categorised, and ‘Prurient interests’ occupies a considerable number of rows, with much of the content predictable. Where it isn’t, I find myself curiously reluctant to Google the likes of ‘Chrysanthemum flying fish’ (which appears in this category) in a work environment, and likewise ‘Grass pomegranate community’. The simple term ‘human body’ is also apparently excluded, as is, mystifyingly, ‘Hotel-related work experience preferred’ (from Sina UC). Well, better safe than sorry, I suppose.
The ‘Context unclear’ category contains the apparently generic ‘A good reputation in the industry’, ‘Audit’, ‘Public Bus’, ‘Jointly signed’ and ‘Interview’.
‘Peace’ is also a banned term in this category, as are ‘Overseas’, ‘News or information’, ‘Mental illness’, ‘Independence’, ‘Have a conversation’, ‘defeat’, ‘Little-known’ and ‘Negotiation’.