Researchers in China and the U.S. will shortly launch a new platform designed to determine just how inaccurate viral internet news is, and preliminary tests indicate that fact-checking lags behind the false reports it corrects by roughly 13 hours.

Dubbed Hoaxy, the platform is the brainchild of Chengcheng Shao of the National University of Defense Technology in Changsha and a team of Italian scientists led by Giovanni Luca Ciampaglia at Indiana University, and is described in the new paper Hoaxy: A Platform for Tracking Online Misinformation [PDF].

Architecture of the Hoaxy system

Hoaxy employs a custom spider written in Python and built on the Scrapy web crawling framework, and compares the content of the four possible kinds of Twitter output – tweets, retweets, quotes and replies – with corresponding information made available by a number of fact-checking sites.
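To make the four kinds of Twitter output concrete, here is a minimal sketch of how a tweet could be sorted into one of them. It is not taken from Hoaxy: the function name tweet_kind is hypothetical, and it assumes tweets arrive as dictionaries in the shape of Twitter's v1.1 JSON tweet object (fields retweeted_status, is_quote_status and in_reply_to_status_id).

```python
def tweet_kind(tweet):
    """Classify a tweet dict as 'retweet', 'quote', 'reply' or 'tweet'.

    Assumes the Twitter v1.1 tweet object layout; this is an
    illustration, not Hoaxy's own code.
    """
    # A retweet embeds the original tweet under 'retweeted_status'.
    # Checked first, because a retweet of a quote also carries the quote flag.
    if "retweeted_status" in tweet:
        return "retweet"
    # A quote tweet is flagged with 'is_quote_status'.
    if tweet.get("is_quote_status"):
        return "quote"
    # A reply points at the tweet it answers.
    if tweet.get("in_reply_to_status_id") is not None:
        return "reply"
    # Anything else is treated as an original tweet.
    return "tweet"
```

The order of the checks matters: a retweet of a quote tweet carries both markers, so the retweet test comes first.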

The primary source for ‘misinformational’ news was a group of 71 domains obtained from research done for the 2015 paper Exposure to Ideologically Diverse News and Opinion, Future Research [PDF] by Eytan Bakshy and Solomon Messing, with patently satirical ‘news’ sites such as The Onion filtered out.

Because many different URLs are likely to lead to the same page (e.g. because of query parameters), Hoaxy strips out all non-essential URL information, including the http/s scheme, in order to obtain a single canonical URL.
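The canonicalisation step can be sketched in a few lines of Python using the standard library. This is an illustration under assumptions, not Hoaxy's actual rules: the function name canonical_url is hypothetical, and the choices to drop the query string and fragment entirely, lower-case the host and trim a trailing slash are guesses at what counts as 'non-essential'.

```python
from urllib.parse import urlsplit


def canonical_url(url):
    """Reduce a URL to 'host/path', dropping scheme, query and fragment.

    Assumed rules for illustration only; Hoaxy's real rules may differ.
    """
    parts = urlsplit(url.strip())
    # Host names are case-insensitive, so normalise them to lower case.
    host = parts.netloc.lower()
    # Treat '/story' and '/story/' as the same page.
    path = parts.path.rstrip("/")
    # The scheme, query string and fragment are deliberately discarded.
    return host + path


print(canonical_url("https://Example.com/story/?utm_source=twitter#top"))
print(canonical_url("http://example.com/story"))
```

Both calls print example.com/story. One caveat of discarding every query parameter is that some sites identify the article itself in the query string, so a production system would need per-site exceptions.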

For a three-month period between October 2015 and January 2016 the team collected filtered tweet traffic and ran the information derived against the fact-checking sources, and were able to establish a significant imbalance, even taking into account that the volume of unreliable news sources outstrips the number of verifying sources by an order of magnitude. The paper notes ‘The results suggest that, in the limited number of examples at our disposal, there is a characteristic time lag between fake news and fact checking of approximately 13 hours’.

To run the comparison between the trending story and the (eventual) facts behind it, Hoaxy selects both a distinct source URL and a corresponding URL from a fact-checking site.
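Once a story URL and its fact-checking counterpart are paired, a lag between them can be estimated from the timestamps of the tweets that share each one. The sketch below is one simple way to do that and is not claimed to be the paper's method: it buckets tweets by hour and picks the shift at which the two activity curves overlap most. The function names and the timestamps are invented for illustration.

```python
from datetime import datetime, timedelta


def hourly_counts(timestamps, start, hours):
    """Count events per hour over the window [start, start + hours)."""
    counts = [0] * hours
    for ts in timestamps:
        idx = int((ts - start).total_seconds() // 3600)
        if 0 <= idx < hours:
            counts[idx] += 1
    return counts


def best_lag(claim_counts, check_counts, max_lag):
    """Return the lag in hours that maximises the overlap between claim
    activity and fact-check activity shifted back by that lag."""
    best, best_score = 0, -1
    for lag in range(max_lag + 1):
        # Pair claim hour i with fact-check hour i + lag.
        score = sum(c * f for c, f in zip(claim_counts, check_counts[lag:]))
        if score > best_score:
            best, best_score = lag, score
    return best


# Synthetic timestamps, made up purely to exercise the functions.
start = datetime(2016, 1, 1)
claim_times = [start + timedelta(hours=h) for h in (1, 2, 2, 2, 3)]
check_times = [start + timedelta(hours=h) for h in (14, 15, 15, 15, 16)]

claim_counts = hourly_counts(claim_times, start, 24)
check_counts = hourly_counts(check_times, start, 24)
lag = best_lag(claim_counts, check_counts, max_lag=23)
print(lag)
```

With this toy data the fact-check curve is the claim curve moved 13 hours later, so the estimate is 13. Real traffic is far noisier, and ties between lags are resolved here in favour of the smallest one.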

In a second strand to the experiment, the researchers developed these ‘opposing’ URLs for a specific viral event – the death of the actor Alan Rickman in January of this year.


The primary thrust of the inaccurate news about Rickman’s death was the claim that he had not actually died. This was likely fuelled by the popular superstition that famous people die in threes (Rickman’s death followed shortly after that of musician David Bowie), with the corresponding suspicion that hoaxers had opportunistically manufactured the death for this reason, and by previous false reports of the deaths of famous people, including Rowan Atkinson, Madonna, Scott Baio, Sharon Osbourne and Morgan Freeman, among many others.

The researchers note that the spreaders of the misinformation, whether hasty journos racing for early traffic or actual hoaxers, constitute ‘few very active accounts that bear the brunt of the promotion and spreading of misinformation, whereas the propagation of fact checking is a more distributed, grass-roots activity.’

Investigation into possible parameters for Hoaxy to take into consideration reveals, predictably, that while the untruth spreads at great speed across prime channels, the actual facts emerge at leisure in secondary channels such as non-prime Twitter feeds and comment sections of secondary sites.

The team behind Hoaxy plan to investigate whether the active spreaders of fake news are actually social bots, and to examine how the time between misinformation and its correction varies across different categories of news.

Comment

Though Hoaxy addresses the digital phenomenon of journalistic misinformation, and though the researchers take into consideration how much ‘citizen journalism’ and viral channels have exacerbated the problem, the syndrome obviously stretches back at least as far as the Caxton press. In the print age, corrections came far later than they can now, though with similarly diminished prominence. Even now, if a lie hits the front page in 100-point type, its apology inevitably languishes in a 12-point face on page 27, weeks, months or (if lawyers are involved) perhaps years later: long after the rush of interest has benefited the publication, and too small to do it any corresponding harm.

We are certainly not immune to the problem ourselves.