A Defensive Computing Checklist    by Michael Horowitz


Artificial Intelligence allows bad guys to learn someone's voice and vocal patterns and then manipulate them to scam people. Thomas Brewster has said "Once a technology confined to the realm of fictional capers like Mission: Impossible, voice cloning is now widely available." The important thing to always be aware of is: if you get a phone call from someone you know who seems to be in an emergency situation and is pleading for money, it might be a scam with a faked voice.

These scams are still too new to have an official name. I have seen them referred to by all these terms:

  1. Voice fraud
  2. Voice phishing or the shortened version: vishing
  3. Voice cloning
  4. Voice swapping
  5. Artificial voice
  6. AI voice cloning and A.I.-generated audio
  7. Synthetic Audio and Deepfake Audio and Audio Deepfakes
  8. Deep Voice, and the generic, DeepFake
  9. Family-emergency schemes

FYI: What To Do if You Were Scammed, from the Federal Trade Commission, July 2022.


April 27, 2023: It's Time to Protect Yourself From AI Voice Scams by Caroline Mimbs Nyce in The Atlantic. Sub-head: Anyone can create a convincing clone of a stranger’s voice. What now? This is a worthwhile read; it takes a measured look at where things currently stand. Quoting: "[voice scams] have existed for some time ... but they’ve gotten better, cheaper, and more accessible in the past several months alongside a generative-AI boom. Now anyone with a dollar, a few minutes, and an Internet connection can synthesize a stranger’s voice."

A five-minute video from CNN: CNN's Donie O'Sullivan tests AI voice-mimicking software, March 2023.

April 28, 2023: I Cloned Myself With AI. She Fooled My Bank and My Family. by Joanna Stern in the Wall Street Journal. Stern replaced herself with an AI voice and video to see how humanlike the tech can be. The results were eerie. She tested Synthesia, a tool that creates artificially intelligent avatars from recorded video and audio (aka deepfakes). She also tested a voice clone generated by ElevenLabs. The ElevenLabs voice fooled her Chase credit card’s voice biometric system. One of the systems also fooled her relatives.

February 23, 2023: How I Broke Into a Bank Account With an AI-Generated Voice by Joseph Cox for Vice. Banks in the U.S. and Europe tout voice ID as a secure way to log into your account. Cox proved it is possible to trick such systems with free or cheap AI-generated voices that are widely available. Cox used a free voice creation service from ElevenLabs, an AI-voice company. TD Bank, Chase and Wells Fargo did not respond to a request for comment. Likewise, ElevenLabs did not respond to multiple requests for comment.

January 30, 2023: AI-Generated Voice Firm Clamps Down After 4chan Makes Celebrity Voices for Abuse by Joseph Cox for Vice. 4chan members have used AI software to generate voices that sound like Joe Rogan, Ben Shapiro, and Emma Watson to spew racist material. For example, a fake Emma Watson reads a section of Mein Kampf. ElevenLabs says it can generate a clone of someone’s voice from a clean sample recording over one minute long. The high quality of the fake voices, and the ease with which people create them, highlight the looming risk of deepfake audio clips.

January 9, 2023: It is bad enough that, when we get a phone call, the caller ID can be faked. Now, faking the voice is getting better and easier. Microsoft's new AI can simulate anyone’s voice with 3 seconds of audio by Benj Edwards for Ars Technica. It is claimed that this new system can preserve the speaker's emotional tone and acoustic environment. Quoting: "Microsoft researchers announced a new text-to-speech AI model called VALL-E that can closely simulate a person's voice when given a three-second audio sample ... VALL-E can synthesize audio of that person saying anything - and do it in a way that attempts to preserve the speaker's emotional tone. Unlike other text-to-speech methods ... VALL-E generates discrete audio codec codes from text and acoustic prompts. It basically analyzes how a person sounds, breaks that information into discrete components (called 'tokens') thanks to EnCodec, and uses training data to match what it 'knows' about how that voice would sound if it spoke other phrases outside of the three-second sample."



The biggest defense is to be aware that this sort of thing exists. Calls to older adults, supposedly from younger relatives asking for money for an emergency, are a known scam. Likewise, calls from your boss asking you to transfer money to a new bank account are suspicious.


 This page: 5 views per day (over 557 days)   Total views: 2,906   Created: January 6, 2023
This page last updated: July 9, 2024
Website by
Michael Horowitz
Copyright 2019 - 2024