Tag: audio

Noise cancelling for cars is a no-brainer

We’re all familiar with noise cancelling headphones. I’ve got some that I use for transatlantic trips, and they’re great for minimising any repeating background noise.

Twenty years ago, when I was studying A-Level Physics, I was also building a new PC. I realised that, if I placed a microphone inside the computer case, and fed that into the audio input on the soundcard, I could use software to invert the sound wave and thus virtually eliminate fan noise. It worked a treat.

It doesn’t surprise me, therefore, to find that Bose, best known for its headphones, is offering car manufacturers something similar with “road noise control”.

With accelerometers, multiple microphones, and algorithms, it’s much more complicated than what I rigged up in my bedroom as a teenager. But the principle remains the same.
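
Strip away the sensors and the signal processing, and the core idea is destructive interference: record the noise, invert it, and play the inverted copy back so the two waveforms cancel. Here is a minimal sketch of that principle in Python, with a toy 120 Hz hum standing in for fan or road noise (an illustration of the physics, not of Bose’s actual system):

```python
# Minimal sketch of phase-inversion noise cancelling.
# A toy 120 Hz hum stands in for steady fan or road noise.
import numpy as np

sample_rate = 44_100                                # samples per second
t = np.linspace(0, 1, sample_rate, endpoint=False)  # one second of audio

noise = 0.5 * np.sin(2 * np.pi * 120 * t)   # the unwanted background hum
anti_noise = -noise                         # the same wave, phase-inverted

residual = noise + anti_noise               # what a listener would hear
print(f"peak noise level:    {np.abs(noise).max():.3f}")
print(f"peak residual level: {np.abs(residual).max():.3e}")
```

The hard part in a real car or headphone is doing this continuously, with near-zero latency, for noise that is not perfectly periodic, which is exactly what the extra sensors and algorithms are for.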

Source: The Next Web

Audio adversarial speech-to-text

I don’t usually go in for detailed technical papers on stuff that’s not directly relevant to what I’m working on, but I made an exception for this. Here’s the abstract:

We construct targeted audio adversarial examples on automatic speech recognition. Given any audio waveform, we can produce another that is over 99.9% similar, but transcribes as any phrase we choose (at a rate of up to 50 characters per second). We apply our white-box iterative optimization-based attack to Mozilla’s implementation DeepSpeech end-to-end, and show it has a 100% success rate. The feasibility of this attack introduces a new domain to study adversarial examples.

In other words, the researchers managed to fool a neural network devoted to speech recognition into transcribing a phrase different from the one that was actually uttered.

So how does it work?

By starting with an arbitrary waveform instead of speech (such as music), we can embed speech into audio that should not be recognized as speech; and by choosing silence as the target, we can hide audio from a speech-to-text system.

The authors state that merely changing words so that something different is transcribed is a standard adversarial attack. But a targeted adversarial attack is different:

Not only are we able to construct adversarial examples converting a person saying one phrase to that of them saying a different phrase, we are also able to begin with arbitrary non-speech audio sample and make that recognize as any target phrase.
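
Mechanically, the attack is iterative optimisation: hold the trained network fixed, add a small perturbation to the waveform, and repeatedly nudge that perturbation so the loss for the chosen target transcription falls while the perturbation itself stays small. The sketch below shows the rough shape of such a white-box attack; ToySpeechModel is a made-up stand-in (the paper attacks Mozilla’s DeepSpeech), and the distortion penalty is heavily simplified:

```python
# Hedged sketch of a targeted, white-box, iterative optimisation attack.
# ToySpeechModel is a hypothetical stand-in for a real speech-to-text network.
import torch
import torch.nn as nn

torch.manual_seed(0)

class ToySpeechModel(nn.Module):
    """Maps audio frames to per-frame character log-probabilities."""
    def __init__(self, n_features: int = 40, n_chars: int = 28):
        super().__init__()                  # 26 letters + space + CTC blank
        self.net = nn.Linear(n_features, n_chars)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames).log_softmax(dim=-1)   # (time, chars)

model = ToySpeechModel()
ctc_loss = nn.CTCLoss(blank=0)

original = torch.randn(100, 40)            # the benign "audio" (100 frames)
target = torch.randint(1, 28, (12,))       # the transcription we want forced
delta = torch.zeros_like(original, requires_grad=True)   # the perturbation
optimiser = torch.optim.Adam([delta], lr=0.01)

for step in range(500):
    log_probs = model(original + delta).unsqueeze(1)   # CTC wants (T, N, C)
    loss = ctc_loss(log_probs,
                    target.unsqueeze(0),
                    torch.tensor([log_probs.shape[0]]),
                    torch.tensor([target.shape[0]]))
    loss = loss + 0.1 * delta.abs().max()   # keep the change small and quiet
    optimiser.zero_grad()
    loss.backward()
    optimiser.step()

# Start from music instead of speech and you get the "embed speech in audio
# that should not be recognized as speech" variant; choose silence as the
# target and you get the "hide audio from a speech-to-text system" variant
# described in the quotes above.
```

The real attack works against DeepSpeech’s full pipeline and is far more careful about keeping the perturbation inaudible, but the optimisation loop is recognisably the same shape.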

This kind of research is possible thanks to open source projects, in particular Mozilla’s DeepSpeech and Common Voice. Great stuff.

Source: arXiv