As a lazy person, I am a huge fan of voice control systems such as Siri, Google Now, Alexa, etc. With recent breakthroughs in speech recognition using deep neural networks, voice control is definitely the future. But what if it could be hacked?
Voice controlled systems are always in listening mode. In the case of Siri on the iPhone, you activate the system by saying “Hey Siri”, then give it commands like “Send message to my brother”. Siri’s activation is user dependent, which means my voice won’t activate your phone; it can only be activated by your voice. So your iPhone constantly listens to the surrounding audio and checks whether the pattern matches. Once the device is activated, however, it will take commands from a different user.
In order to sneakily hack into an iPhone, an attacker needs to
1.) Mimic the user’s “Hey Siri” voice pattern
2.) Execute commands without the user’s awareness
Mimicking user’s “Hey Siri” voice pattern
The simplest way to do it is by recording the user saying “Hey Siri”. If that’s not available, we can use concatenative synthesis. Many words contain the same sounds as “hey”: for example, we can combine “He” with the “a” sound from “Cake” to make “Hey”. In the same way, “Siri” can be obtained by slicing “Ci” from “City” and “re” from “Care”. Only “Hey Siri” has to be mimicked; once Siri is activated, it takes commands irrespective of the speaker’s voice.
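The splicing idea above can be sketched in a few lines of Python. This is a minimal illustration, not a working attack: the recordings, segment boundaries, and sample rate below are all hypothetical stand-ins (real audio would be loaded from WAV files and the cut points found by inspecting the waveform).

```python
SAMPLE_RATE = 16_000  # samples per second (assumed)

def cut(recording, start_s, end_s):
    """Return the slice of `recording` between two timestamps in seconds."""
    return recording[int(start_s * SAMPLE_RATE):int(end_s * SAMPLE_RATE)]

def splice(*segments):
    """Concatenate audio segments into one waveform."""
    out = []
    for seg in segments:
        out.extend(seg)
    return out

# Stand-ins for real recordings (lists of samples). Here they are one
# second of silence each; real audio would come from the `wave` module.
rec_he, rec_cake, rec_city, rec_care = ([0.0] * SAMPLE_RATE for _ in range(4))

# Hypothetical cut points for each phoneme-length segment.
hey_siri = splice(
    cut(rec_he,   0.00, 0.20),   # "He"  from "He"
    cut(rec_cake, 0.05, 0.15),   # "a"   sound from "Cake"
    cut(rec_city, 0.00, 0.12),   # "Ci"  from "City"
    cut(rec_care, 0.10, 0.25),   # "re"  from "Care"
)
```

In practice the joins would also need smoothing (crossfades, pitch matching) to sound natural, but the principle is exactly this: cut and paste sounds the user has already made.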
I have also found that LyreBird can generate a user’s voice from text, given enough training recordings of that voice. They are already able to mimic the voice of Donald Trump. I gave it a try by making my digital voice in LyreBird and asking it to say “Hey Siri”. The generated voice was able to activate Siri on my device. You can try it out too.
Execute commands without user’s awareness
We humans can only hear sounds within a certain frequency range. The researchers used ultrasound, frequencies above 20 kHz, which is not audible to us. The voice command is modulated onto an ultrasonic carrier; nonlinearity in the phone’s microphone circuitry demodulates it back into the audible range, so the device hears the command even though a human standing nearby hears nothing.
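The modulation step can be sketched as follows. This is a toy model, assuming a 25 kHz carrier, a 400 Hz tone standing in for a recorded command, and a simple square-law term standing in for the microphone’s nonlinearity; the real attack uses actual voice audio and tuned hardware.

```python
import math

FS = 192_000         # sample rate high enough to represent 25 kHz (assumed)
CARRIER_HZ = 25_000  # above 20 kHz, inaudible to humans

def modulate(baseband, fs=FS, fc=CARRIER_HZ, depth=1.0):
    """Amplitude modulation: s(t) = (1 + depth * m(t)) * cos(2*pi*fc*t)."""
    return [(1.0 + depth * m) * math.cos(2 * math.pi * fc * n / fs)
            for n, m in enumerate(baseband)]

def square_law(signal):
    """Toy model of microphone nonlinearity: y = x + x^2.
    The x^2 term shifts a copy of the command's envelope back to baseband."""
    return [x + x * x for x in signal]

# A 400 Hz test tone (0.1 s) standing in for a recorded voice command.
command = [0.5 * math.sin(2 * math.pi * 400 * n / FS) for n in range(FS // 10)]

transmitted = modulate(command)     # inaudible over the air
received = square_law(transmitted)  # what the mic circuitry produces
# A low-pass filter applied to `received` would now recover `command`.
```

The key point is that the attacker never emits any audible sound: the audible component only reappears inside the victim’s own hardware.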
Here’s the demonstration: in the first half you can see the iPhone making a call from an audible command; in the second half, from an inaudible one.
Now that the hack is proven, let’s look at the possibilities.
It can be used to
1.) Make the device open a malicious link
2.) Spy on the user’s surroundings by initiating a phone call.
3.) Impersonate the user by sending fake messages (A user could be framed for things he didn’t do)
4.) Switch the user’s phone to airplane mode, so that the user stays disconnected.
5.) Since these commands light up the screen and trigger audible responses, the brightness and volume can first be reduced to avoid detection.
The researchers have successfully tested the DolphinAttack on 16 voice controlled systems, including the voice navigation system in an Audi automobile. They have proposed hardware defence mechanisms to counter it. For now, though, any device is vulnerable to an attacker who has access to recordings of its owner’s voice.
If you are anxious about this vulnerability, disable “Hey Siri” on iOS or “OK Google” on Android. Hopefully Google and Apple will release a patch for this soon.
This is a derived work from the research paper DolphinAttack: Inaudible Voice Commands. The credit goes to Guoming Zhang, Chen Yan, Xiaoyu Ji, Tianchen Zhang, Taimin Zhang and Wenyuan Xu from Zhejiang University.
I have a feeling I read this article somewhere else. Deja vu!
Yup, I am the same writer. (http://medium.com/@heyfebin)