TCP/IP headers can leak details about what you are watching on Netflix despite HTTPS protection, according to new cybersecurity research.
The study, conducted by Andrew Reed and Michael Kranch of the U.S. Military Academy at West Point, explored Netflix’s implementation of HTTPS and the ability to identify content information in real-time through the passive capture of network traffic.
The paper, titled Identifying HTTPS-Protected Netflix Videos in Real-Time, explains that the TCP/IP headers of a Netflix HTTPS stream provide a 99.5% success rate for identifying video content – with the majority of identifications occurring less than two and a half minutes into the video stream.
The research found that variable bitrate (VBR) encoding leaks the contents of a particular flow despite the use of encryption, notably because the byte-range portions of the HTTP GET requests sent by the browser align perfectly with individual video segment boundaries.
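To illustrate why that alignment matters, here is a minimal sketch of how a passive observer might estimate segment sizes from header information alone. It is not the researchers' code: the `(direction, payload_bytes)` packet tuples and the `estimate_segment_sizes` helper are hypothetical simplifications, and the sketch assumes one request per segment and ignores TLS overhead and retransmissions.

```python
from typing import Iterable, List, Tuple

# Each packet is (direction, tcp_payload_bytes). "up" is client to server,
# "down" is server to client. This layout is an assumption made for the
# example, not the format of any real capture tool.
Packet = Tuple[str, int]


def estimate_segment_sizes(packets: Iterable[Packet]) -> List[int]:
    """Sum downstream payload bytes between consecutive upstream requests."""
    sizes: List[int] = []
    current = 0
    seen_request = False
    for direction, length in packets:
        if direction == "up" and length > 0:
            # A non-empty upstream packet is treated as a new HTTP GET.
            if seen_request and current > 0:
                sizes.append(current)
            current = 0
            seen_request = True
        elif direction == "down" and seen_request:
            current += length
    # Flush the bytes received after the final request.
    if seen_request and current > 0:
        sizes.append(current)
    return sizes


capture = [
    ("up", 600), ("down", 1460), ("down", 1460), ("down", 500),
    ("up", 0),  # bare ACK with no payload, ignored
    ("up", 610), ("down", 1460), ("down", 900),
]
segment_sizes = estimate_segment_sizes(capture)
print(segment_sizes)
```

Because VBR encoding gives each segment a different size, the resulting sequence of sizes tends to be distinctive for a given video even though the payload itself is encrypted.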
From a dataset containing over 42,000 Netflix videos, the team applied an automated crawler to collect video fingerprints and gathered an average of 7.86 fingerprints per video.
‘A researcher can build the fingerprints for a video by capturing the metadata transmitted during the first few seconds of a stream; it is not necessary to watch the entire video,’ the researchers explained.
By matching metadata captured on someone else’s connection against the indexed fingerprints, Reed and Kranch were able to identify the video being streamed.
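The matching step can be pictured with a naive sliding-window search. This is only a sketch under stated assumptions: the article does not detail the researchers' index, so the `find_match` function, the 1% `tolerance`, and the sample `database` of segment sizes are all invented for illustration, and a linear scan stands in for whatever indexing the paper actually uses.

```python
from typing import Dict, List, Optional, Tuple


def find_match(
    observed: List[int],
    database: Dict[str, List[int]],
    tolerance: float = 0.01,
) -> Optional[Tuple[str, int]]:
    """Return (title, start_index) of the first window that fits, else None."""
    n = len(observed)
    if n == 0:
        return None
    for title, sizes in database.items():
        # Slide a window of n segments across this title's known sizes.
        for start in range(len(sizes) - n + 1):
            window = sizes[start:start + n]
            # Accept the window if every observed size is within the
            # relative tolerance of the corresponding known size.
            if all(abs(o - s) <= tolerance * s for o, s in zip(observed, window)):
                return title, start
    return None


# Hypothetical segment sizes in bytes for two made-up titles.
database = {
    "title_a": [500_000, 620_000, 480_000, 710_000, 530_000],
    "title_b": [450_000, 450_000, 900_000, 300_000, 640_000],
}
observed = [902_000, 300_500, 641_000]
match = find_match(observed, database)
print(match)
```

The tolerance allows for small differences between the observed byte counts and the stored sizes, such as protocol overhead; a longer observed window makes an accidental match less likely.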
The matching did not require powerful hardware: on a 10-year-old machine with two quad-core 2GHz Intel Xeon processors running Linux Mint 17.3 MATE, the researchers were able to process around 184 million fingerprints in 15 minutes, or roughly 200,000 per second.
The research found anomalies with the two films 2001: A Space Odyssey and The Gospel Road: A Story of Jesus – both of which have lengthy periods where the screen is completely dark, resulting in ‘“flat” windows that consist of 30 identically-sized segments.’
In testing, the research duo arranged two MacBook Pros to ‘watch’ 20 minutes of each video stream from a randomly-selected list of 100 Netflix titles, before switching to the next. They discovered that on average the algorithm correctly identified the videos within three minutes and 55 seconds.
The paper suggests that the problem can be resolved – ‘the browser could average the size of several consecutive segments and send HTTP GETs for this average size. As an alternative approach, the browser could randomly combine consecutive segments and send HTTP GETs for the combined video data.’
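Both suggestions can be sketched in a few lines. The code below is an illustration only, not something taken from the paper: the function names, the `group` and `max_merge` parameters, and the sample sizes are assumptions, and a real player would also have to handle buffering and playback timing.

```python
import random
from typing import List, Optional


def averaged_requests(sizes: List[int], group: int = 3) -> List[int]:
    """Request the mean size of each run of `group` consecutive segments."""
    requests: List[int] = []
    for i in range(0, len(sizes), group):
        chunk = sizes[i:i + group]
        base, extra = divmod(sum(chunk), len(chunk))
        # Spread the remainder so the total bytes requested is unchanged.
        requests.extend(base + (1 if j < extra else 0) for j in range(len(chunk)))
    return requests


def combined_requests(
    sizes: List[int],
    max_merge: int = 3,
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Merge a random number of consecutive segments into each request."""
    rng = rng or random.Random()
    requests: List[int] = []
    i = 0
    while i < len(sizes):
        k = rng.randint(1, max_merge)  # how many segments to merge this time
        requests.append(sum(sizes[i:i + k]))
        i += k
    return requests


segment_sizes = [100, 200, 300, 400]
print(averaged_requests(segment_sizes))
print(combined_requests(segment_sizes, rng=random.Random(0)))
```

In both cases the same number of bytes is fetched overall, but the request sizes seen on the wire no longer map one-to-one onto individual video segments.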