New research out of the University of Zagreb proposes a new method for protecting the privacy of non-implicated individuals in video footage: analysing the subjects’ motion using the Microsoft Kinect sensor, replacing each person with a rendered CGI animation of an equivalent human model, and then hiding the original information inside the altered frame.
Towards Reversible De-Identification in Video Sequences Using 3D Avatars and Steganography [PDF], by Martin Blažević et al., contends that unrestricted access to footage of innocent individuals from locations such as transit areas, banks and shopping centres could enable them to be tracked in real time. The group’s research therefore aims to obscure the specific individual with a CGI ‘avatar’, while re-encoding the original frame inside the doctored frame using Least Significant Bit (LSB) steganography.
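The LSB idea can be illustrated with a minimal sketch. It assumes one hidden bit per 8-bit colour value; the function names and the sample data are hypothetical, and the paper's actual bit allocation is not described here.

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytearray:
    """Hide payload bits in the least significant bit of each cover byte."""
    # Unpack the payload into individual bits, most significant bit first.
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("payload too large for cover")
    stego = bytearray(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the cover value, then write the payload bit.
        stego[i] = (stego[i] & 0xFE) | bit
    return stego


def extract_lsb(stego: bytes, n_bytes: int) -> bytes:
    """Rebuild n_bytes of payload from the lowest bits of the stego data."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for bit_index in range(8):
            value = (value << 1) | (stego[i * 8 + bit_index] & 1)
        out.append(value)
    return bytes(out)


# Hypothetical data: 64 colour values stand in for the avatar frame,
# and four bytes stand in for pixel data from the original frame.
cover = bytes(range(200, 232)) * 2
secret = b"\x12\xab\xff\x00"

stego = embed_lsb(cover, secret)
recovered = extract_lsb(stego, len(secret))
max_change = max(abs(a - b) for a, b in zip(cover, stego))
```

Each altered colour value differs from the cover by at most one, which is why the change is hard to see. At one bit per value, the cover holds only an eighth of its own size, so a practical scheme would need more bits per value, compression, or a smaller hidden region.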
The process begins with input video (image 1, below), followed by pedestrian detection (image 2) using Histograms of Oriented Gradients (HOG). Segmentation (image 3) involves localising and tracking an individual using GrabCut, a foreground-extraction technique that originated at Microsoft Research and is used for automated background removal.
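To give a sense of what HOG measures, the sketch below computes the orientation histogram for a single image cell. It is a simplified illustration, not the researchers' detector: a full HOG pipeline also interpolates votes between bins, normalises histograms over blocks of cells, and feeds the result to a trained classifier. The function name and the test cell are hypothetical.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Gradient-orientation histogram for one grayscale cell (a list of rows)."""
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins  # unsigned orientations span 0-180 degrees
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            # Central differences approximate the horizontal and vertical gradient.
            gx = cell[y][x + 1] - cell[y][x - 1]
            gy = cell[y + 1][x] - cell[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # Each pixel votes for its orientation bin, weighted by edge strength.
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# Hypothetical 8x8 cell containing a vertical edge: dark left half, bright right half.
cell = [[0, 0, 0, 0, 255, 255, 255, 255] for _ in range(8)]
hist = cell_histogram(cell)
dominant_bin = hist.index(max(hist))
```

A vertical edge produces purely horizontal gradients, so all of the weight falls into the first bin. The upright outline of a pedestrian yields a characteristic pattern of such histograms across neighbouring cells, which is what the detector learns to recognise.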
The next task is pose estimation, which can be approached in many ways, including the use of neural networks. Though this can provide adequate information to proceed, at this point the research group’s process departs from the analysis of flat ‘legacy’ security footage and presumes more intelligent sensors which use spatial mapping, such as Microsoft’s Kinect. The sequencing then becomes similar to that of motion-capture systems which use actors’ movements to animate CGI creations, most famously (in the last 15 years) employed by Weta Digital to create the Gollum character for Peter Jackson’s Lord of the Rings and Hobbit trilogies.
Using the Kinect SDK, the researchers, who presented this work at the Fourth Croatian Computer Vision Workshop (CCVW) 2015 in Croatia, map the Kinect’s joint estimations onto a skeletal animation. This can then be passed to a skinned model processor, after which pipeline-enabled 3D animation and rigging programs such as Maya, MakeHuman and Blender can be used to build up colour-matched human CGI imagery, complete with clothing that mimics the original.
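One small step in turning joint estimations into a skeletal animation is deriving the angle at each joint from 3D joint positions. The sketch below shows that calculation under stated assumptions: the joint names, the coordinates and the `joint_angle` helper are all hypothetical and are not taken from the Kinect SDK or from the paper.

```python
import math


def joint_angle(parent, joint, child):
    """Angle in degrees at `joint` between the bones leading to parent and child."""
    bone_a = [p - j for p, j in zip(parent, joint)]
    bone_b = [c - j for c, j in zip(child, joint)]
    dot = sum(a * b for a, b in zip(bone_a, bone_b))
    norm = math.sqrt(sum(a * a for a in bone_a)) * math.sqrt(sum(b * b for b in bone_b))
    if norm == 0:
        raise ValueError("coincident joints give no defined angle")
    # Clamp to guard against rounding pushing the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cosine))


# Hypothetical joint positions in metres (x, y, z), as a depth sensor might report them.
skeleton = {
    "shoulder": (0.0, 1.4, 2.0),
    "elbow": (0.0, 1.1, 2.0),
    "wrist": (0.3, 1.1, 2.0),
}
elbow_angle = joint_angle(skeleton["shoulder"], skeleton["elbow"], skeleton["wrist"])
```

Here the upper arm hangs straight down and the forearm points sideways, so the elbow is bent at a right angle. An animation rig would generally need full bone rotations rather than a single angle per joint, so this covers only part of the mapping.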
Steganographic encoding is envisioned so that identification information which may later prove useful in an investigation is not permanently lost; part of the encoding process involves replicating pixel colours taken from the original individual depicted in the footage. Future research aims to employ multiple Kinect arrays to capture a wider range of possible joint and limb movements, and considerable work remains on accurately re-mapping sudden movements and extreme poses such as crouching.