Researchers are investigating ways that attackers can fool voice-based security systems by impersonating a person’s voice.
A team at the University of Alabama at Birmingham (UAB) has found that, using readily available voice-morphing software, attackers can launch voice-imitation attacks that breach both automated and human authentication systems.
The research was presented last week at the European Symposium on Research in Computer Security (ESORICS) in Vienna, Austria.
“Because people rely on the use of their voices all the time, it becomes a comfortable practice,” said Nitesh Saxena, Ph.D., director of the Security and Privacy In Emerging computing and networking Systems (SPIES) lab, and associate professor of computer and information sciences at UAB. “What they may not realize is that level of comfort lends itself to making the voice a vulnerable commodity. People often leave traces of their voices in many different scenarios. They may talk out loud while socializing in restaurants, giving public presentations or making phone calls, or leave voice samples online,” he added.
Saxena noted that attackers can easily obtain samples of a target's voice: by recording the person in close proximity, over the phone via a spam call, from audio snippets found online, or even from cloud storage.
The UAB study, a collaboration between the Department of Computer and Information Sciences in the UAB College of Arts and Sciences and the Center for Information Assurance and Joint Forensics Research, shows how such audio samples can be used to compromise a victim's security and privacy.
Advanced voice morphing software can create an extremely close imitation of a person’s voice from a limited number of audio samples, allowing an attacker to speak any message in the victim’s voice.
“As a result, just a few minutes’ worth of audio in a victim’s voice would lead to the cloning of the victim’s voice itself […] The consequences of such a clone can be grave. Because voice is a characteristic unique to each person, it forms the basis of the authentication of the person, giving the attacker the keys to that person’s privacy,” said Saxena.
The researchers evaluated the attacks in two settings. First, they targeted voice biometrics, or speaker verification, which is used to secure systems such as online banking, smartphone PIN locks and government access control. Second, they examined the impact of stolen voices used to imitate people in conversation, for example by morphing celebrity voices and posting snippets online, leaving fake voice messages, or creating false audio evidence in court.
The results showed that the researchers' attacks defeated most of the advanced voice-verification algorithms tested, which rejected only 10–20% of the morphed samples. Human listeners asked to verify voice samples rejected, on average, only about half of the morphed clips.
“Our research showed that voice conversion poses a serious threat, and our attacks can be successful for a majority of cases,” said Saxena. “Worryingly, the attacks against human-based speaker verification may become more effective in the future because voice conversion/synthesis quality will continue to improve, while it can be safely said that human ability will likely not.”
Saxena advised people to be cautious about posting audio of themselves online. He also argued that speaker-verification systems should be designed to resist voice-imitation attacks, for example through research into ways of confirming the live presence of the speaker.