Researchers at Stanford and Princeton have succeeded in identifying roughly 70% of test participants by comparing their web-browsing histories with publicly available information on social networks.
The study, "De-anonymizing Web Browsing Data with Social Networks", found that it was possible to reattach identities to 374 apparently anonymous browsing histories by matching the links in each history against the links shared in the Twitter feeds of candidate users. The approach relies on the likelihood that people favour links recommended by the accounts they follow over pages they reach through general web browsing.
The test subjects were given a Chrome extension that extracted their browsing history; the researchers then picked out t.co links, produced by Twitter's proprietary URL-shortening service. In 81% of cases the correct user appeared among the top 15 candidates returned by the de-anonymisation program, and in 72% of cases the user was ranked first.
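To make the matching idea more concrete, here is a minimal Python sketch of a likelihood-based ranking. It is a simplified illustration, not the researchers' actual model: it assumes that each visited link is drawn either from a candidate's Twitter feed (with a hypothetical probability `r`) or from a background popularity distribution, and it ranks candidates by how well their feed explains the observed history. The function names, the `r` parameter and the floor probability for unseen links are all assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Set


def log_likelihood(history: Sequence[str], feed: Set[str],
                   background: Dict[str, float], r: float = 0.5) -> float:
    """Hypothetical mixture model: with probability r a visited link comes
    uniformly from the candidate's feed, otherwise from general browsing."""
    total = 0.0
    for url in history:
        # Probability of the link under the "followed a feed recommendation" branch.
        p_feed = (1.0 / len(feed)) if url in feed else 0.0
        # Probability under the "ordinary browsing" branch; tiny floor for unseen URLs (assumption).
        p_bg = background.get(url, 1e-9)
        total += math.log(r * p_feed + (1 - r) * p_bg)
    return total


def rank_candidates(history: Sequence[str], feeds: Dict[str, Set[str]],
                    background: Dict[str, float], r: float = 0.5) -> List[str]:
    """Return candidate user IDs ordered from most to least likely owner of the history."""
    scores = {user: log_likelihood(history, feed, background, r)
              for user, feed in feeds.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Toy data: two hypothetical Twitter users and the links their feeds contain.
    feeds = {"alice": {"a", "b", "c"}, "bob": {"x", "y"}}
    background = {u: 0.01 for u in ["a", "b", "c", "x", "y"]}
    history = ["a", "b", "z"]  # an "anonymous" browsing history
    print(rank_candidates(history, feeds, background))
```

Because two of the three visited links appear in the first user's feed, that candidate scores higher. Under this sketch, a "top 15" result would simply be the first 15 entries of the ranked list.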
Browsing history is not directly accessible to websites, but data brokers and advertisers can gather enough of it through tracking cookies, supercookies and Flash cookies to attempt re-identification. The NSA has done advanced work in this field, and earlier researchers have noted that such information is for sale.
Ultimately, the trail leads only as far as a Twitter user ID; if that account is pseudonymous, further work would be needed to establish the person's real identity.
Using HTTPS connections and VPN services can limit exposure to such re-identification attempts, though HTTPS does not hide the domain of the site being visited, and a VPN does not prevent tracking cookies and other tracking methods from building up a continuous browsing history. Additionally, UTM codes in URLs make re-identification possible even where encryption is in place.
Graduate student Ansh Shukla, one of the paper's contributors, believes that using the Tor browser is probably the strongest defence:
“We speculate that this attack can only be carried out against Tor users by well-resourced organizations on high-value targets…Think cyber-espionage, government intelligence, and the like.”
"It is already known that some companies, such as Google and Facebook, track users online and know their identities," commented Arvind Narayanan, assistant professor of computer science at Princeton and another contributor to the research. He emphasised that users who choose to sign up with social networks are likely to be easier to track.
“Users may assume they are anonymous when they are browsing a news or a health website, but our work adds to the list of ways in which tracking companies may be able to learn their identities.”