Yes, Deepfake Audio is Now a Thing

Well, it's official. I am throwing in the towel. My worldview has been shaken, stirred, crumbled, kneaded, blown apart, and reduced to its elemental atoms. Actually, at this point I'm down to the subatomic level. I think I just saw a quark go by.

One of my favorite childhood toys was an analog computer that taught Boolean Algebra. Starting with that toy, I began the heavy lifting of learning to think like an engineer. Then I thought like that for 40 years. Mine is a world of square roots and right angles. Add a pinch of OCD and you know how I butter my bread (precisely, with the appropriate amount of physics). In my mind, the world should be taken at face value. Except when it shouldn't. Which brings us to deepfake audio.

We are familiar with deepfake video. Somehow I accepted that, and considered those images really clever parlor tricks. Now along come the voices of deepfake audio. It has shaken me. Probably because audio is my passion, and in my mind, audio technology is a machine perfectly constructed to always strive toward fidelity. I just can't compute that now it's striving for, and achieving, deception.

For example, this Vocal Synthesis YouTube channel has a recording of six U.S. Presidents introducing the channel. Of course, it's not the Presidents. It's deepfake audio synthesis of them saying things they never said. Need more convincing? The site has lots of examples. Listen to this recitative duet by Donald Trump and Alexandria Ocasio-Cortez.

Now, these examples are purposefully outrageous. No one would believe them. But what if the program was used to produce audio that was just outrageous enough, designed to go viral? Going viral can be a fast and powerful thing. I refer you to the case of Justine Sacco. She tweeted to her 170 followers, then hopped on a plane for an 11-hour flight. Her tweet went viral, trending to #1. By the time her plane landed, unbeknownst to her, the social media world was tracking her flight, and she had been fired from her job. Deepfake audio would only accelerate the trending speed.

With a little luck, one could sow chaos, and maybe even bring down a government. But even if the intentions are lesser, deepfake audio has lots of nefarious potential. Consider musicians. You can copyright recordings of your voice, but I don't think you can copyright your voice. For example, Adele owns her name, likeness and recordings, but does she own her voice? If the synthesis software is trained with hours of copyrighted recordings of her voice, would the resulting product violate that copyright, or would it be a novel work?

Even if voice impersonation of public figures is illegal, crooks could use deepfake audio to construct songs apparently sung by Adele, marketing them as “private” or “lost” recordings. If those songs started trending, the crooks could rake in millions of bucks before anyone was the wiser. Once detected, they could pull down that website and create another trove of famous-name recordings.

Also, voice is now a potential data breach. My bank used a voice authentication feature; I could access my account via telephone by saying the phrase “my voice is my password.” It's easy to get enough samples of Mr. Trump's voice to train a synthesis program, and much more difficult to get samples of my voice, but I'm sure not using a voice log-on anymore. I guess we'll need to develop equally powerful software on the listening end that can differentiate fake voices from real voices. Honestly, I don't know what to believe anymore. Is all content, sound and vision, now suspect?

When you start with Boolean Algebra, and develop it sufficiently, you're eventually able to write AI software that inputs a text and outputs the voice of Barack Obama. There's nothing illogical about that, and in fact the deception is a triumph of logic.

I'll need to think about that. But in my world of right angles, I'm not sure I'll be able to wrap my mind around it.

jeffhenning

If I recall correctly, 60 Minutes ran a segment on this earlier in the year.

Using AI, they can continually improve the algorithm so that you would possibly only need 15-30 minutes of recordings to create something that the ear couldn't distinguish as fake. Using AI analysis, though, you could debunk fakes. They're working on that as well.

As to biometrics, those can easily be bypassed in the right circumstances (seen that in the movies a good bit).

Since all social media seems to have turned into a swirling cesspool of misinformation, stupidity, bigotry and outright lies, only the weak-minded use it for any serious information.

The only way to clean this mess up is to rescind the law that indemnifies them from any liability for the content published on their sites.

People seem to have forgotten, or never knew, that lobbying groups in the US were started as a way for small, grassroots organizations to compete for access to politicians. You can see how that's worked.

Bosshog7_2000

Deepfake audio/video is the beginning of the end of modern civilization. The past few years have already shown that a huge percentage of Americans lack any level of discernment to distinguish fact from fiction, hence the rise of sites like Infowars that prey on people's stupidity. Add in AI technology such as deepfake audio and the possible ramifications are alarming. This technology needs to be regulated and those who abuse it should be harshly punished.

elotes

Really? In the aftermath of the racist murder of George Floyd, with literally millions of people taking to the streets, and with all the examples of weaponized Twitter mobs out for blood at your disposal, you went with the example of Justine Sacco. Justine Sacco, who blithely tweeted some vile and racist trash. Oh, but maybe you conveniently forgot about that little detail?

Thanks, Sound & Vision.

jeffhenning

The Justine Sacco case doesn't really apply well for a single reason: she tweeted the racist crap that cost her her career. No one else was to blame. There was no subterfuge.

The irony about that whole situation, though, was that Sacco was the head of PR at a large media corporation. Her job was to make sure that the company's messaging would put it in the best possible light. Obviously, she managed to wildly surpass her level of competence.

All you have to do to prove your own stupidity is open your mouth.

elotes

Well said. Jeff plainly articulates why this otherwise innocuous post is problematic. I do not ascribe any malice to the author, or to S&V editorial, only thoughtlessness. But as we have all plainly seen in the past few weeks (one hopes), thoughtlessness and silence are integral building blocks of racism. Even in the rarefied air of high-end AV publications.

And with that, I’m done. Everyone may retreat to their 8K OLED, Dolby Atmos-enabled safe spaces in peace.

jeffhenning

Hey, baby, I'd love to be in that Atmos/8K paradise, but my 5.2 rig with the old, but best 46" set ever and 4 servo subs will get me through the lockdown!

Peace out.