Can AI “Deepfake” Software Impersonate a Voice Well Enough to Hack an Account?
Fraudsters and imposters are always looking for new ways to overcome the latest security measures. For credit unions and banks implementing voice authentication as a better way to protect members from account takeovers, this raises an important question: can voice synthesis software "deepfake" a person's voice well enough to fool voice verification software?
How Does Voice Authentication Work?
Voice authentication works by comparing a caller's voice against a previously captured sample and scoring how closely the two match. Active voice authentication matches the voiceprint for a predetermined phrase, such as "My voice is my password," that a caller repeats during an interaction with an application such as an IVR.
Passive voice authentication, on the other hand, compares voiceprints (or AudioPrints) based on the unique characteristics of a person's voice in natural conversation. These more sophisticated voice verification systems use algorithms for conversational voice matching rather than relying on a direct match to a specific predefined phrase.
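To make the score-and-threshold idea concrete, here is a minimal Python sketch. It assumes the system reduces each voice to a fixed-length embedding vector and uses cosine similarity as the match score; the function names and the 0.80 threshold are illustrative assumptions, not Illuma's actual implementation.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Score how closely two voice embeddings match (1.0 = identical direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify_caller(enrolled_print: np.ndarray,
                  live_embedding: np.ndarray,
                  threshold: float = 0.80) -> bool:
    """Accept the caller only if the match score clears the calibrated threshold."""
    return cosine_similarity(enrolled_print, live_embedding) >= threshold
```

Whether the embedding comes from a repeated passphrase (active) or from natural conversation (passive), the decision step is the same: a score compared against a calibrated threshold.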
Can Synthetic Voices Fool Call Center Security Systems?
In a recent article, a banking customer revealed how he had fooled his bank's security protocol using an AI-generated clone of his voice. He used readily available software and was surprised at how easily the security system accepted the fake voice.
In another recent article, the author was able to defeat the voice ID system at her bank using a clone of her voice. Naturally, financial institutions (FIs) relying on voice verification are concerned. Is it really that simple for imposters to get into bank and credit union accounts using these tools?
Understanding the current threat level requires a deeper dive into how these AI technologies work and how they interact with current voice authentication technology. Voice synthesis tools have evolved to the point where creating synthetic voices is within reach of a typical consumer, and that does reveal a potential vulnerability in voice authentication systems.
However, we firmly believe that contact center agents and IVR systems assisted by voice authentication technologies are better positioned to combat this type of AI mimicry compared to those that attempt to fight this battle without the assistance of voice ID. Here is why:
- The quality of synthetic voices generated by today's tools varies from one individual to the next. Some replicas sound fairly realistic, but most sound robotic. The latter are easily caught by a technology like Illuma Shield™ and can also be detected by the agent answering the phone.
- As described earlier in this article, voice ID systems are score-based. In our lab tests, real persons' voices scored much higher than AI-generated clone voices when compared against the real person's AudioPrint enrollment. Careful calibration ensures that AI voices have an extremely low probability of getting past the voice ID system while real voices have a much higher probability of passing through (see the calibration sketch after this list).
- Synthesizing a new sentence takes several seconds to process. The resulting lag makes it very difficult to use these tools in a live, dynamic conversation with a contact center agent. It is easier, however, to defeat an active voice recognition system like "My voice is my password," where the authentication phrase is known and repeatable.
- The quality of the synthetic voice depends on the duration and quality of the original voice recording fed into these tools. High-quality voice recordings of a typical credit union member or community bank customer are difficult to obtain and require additional investment on the part of the fraudster. This significantly reduces the ROI for a fraudster planning to use these tools in an account takeover (ATO) attempt. Voice authentication algorithms that include characteristics of the caller's device as well as their voice add a further layer of protection. It is much easier for the fraudster to go after an FI that is not using a voice ID system.
- A greater near-term threat is continuing to rely on traditional knowledge-based security Q&A alone, without voice ID assistance. Compared to creating replicas of every member's voice, it is much simpler to create a handful of synthetic voices that match target demographics. Using synthetic voice tools to fake a voice that matches a member's stored demographics (e.g., a 40-year-old male), in combination with access to the right answers to security questions, would make it hard for agents to tell that they are being fooled. But these attacks will not defeat a passive voice ID system that is looking for a match to an enrolled member's voice.
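To illustrate the calibration point above (see the second bullet), here is a hypothetical sketch of how a decision threshold can be evaluated against score distributions for genuine voices and synthetic clones. The score values are made-up illustrative numbers, not results from our lab tests.

```python
import numpy as np

def error_rates(genuine_scores, clone_scores, threshold):
    """Return (false reject rate for real callers, false accept rate for clones)."""
    genuine = np.asarray(genuine_scores)
    clones = np.asarray(clone_scores)
    frr = float(np.mean(genuine < threshold))  # real voices wrongly rejected
    far = float(np.mean(clones >= threshold))  # synthetic voices wrongly accepted
    return frr, far

# Hypothetical scores: genuine voices cluster high, clones cluster lower.
genuine = [0.91, 0.88, 0.94, 0.90, 0.86]
clones = [0.55, 0.62, 0.48, 0.60, 0.58]

for t in (0.70, 0.75, 0.80):
    frr, far = error_rates(genuine, clones, t)
    print(f"threshold={t:.2f}  false_reject={frr:.2f}  false_accept={far:.2f}")
```

The goal of calibration is to choose a threshold where the false accept rate for synthetic voices is effectively zero while the false reject rate for real members stays low.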
How Do You Protect Your Financial Institution from Deepfake Attacks?
- Work with your voice ID solution provider to keep your system well calibrated for strong rejection of synthetic voices.
- Use a layered or multi-factor security approach to balance risk and convenience.
At Illuma, we recommend that our clients use Illuma Shield™ as the primary authentication mechanism for low- to medium-risk transactions and layer additional security protocols on high-risk, high-value transactions. Our clients tell us the vast majority of their calls are related to balance inquiries or internal transfers, where a single authentication factor like Illuma Shield™ offers sufficient protection. However, if the caller is trying to make a large external transfer, the agent layers on additional factors such as security questions. Where exactly to draw the line is a risk management decision that our clients make; a sketch of this kind of step-up logic follows below.
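As a rough illustration of this layered approach, the sketch below encodes a step-up policy of the kind described above. The risk tiers, factor names, and policy choices are hypothetical examples, not Illuma's actual product logic.

```python
from enum import Enum

class Risk(Enum):
    LOW = "low"        # e.g., balance inquiry
    MEDIUM = "medium"  # e.g., internal transfer
    HIGH = "high"      # e.g., large external transfer

def required_factors(risk: Risk, voice_id_passed: bool) -> list[str]:
    """Decide which additional authentication factors to require for a call."""
    if not voice_id_passed:
        # Failed voice ID: fall back to stronger, slower verification.
        return ["knowledge_based_questions", "one_time_passcode"]
    if risk is Risk.HIGH:
        # Voice ID passed, but high-value transactions step up anyway.
        return ["knowledge_based_questions"]
    return []  # low/medium risk: voice ID alone is sufficient

# A caller who passed voice ID requests a large external transfer:
print(required_factors(Risk.HIGH, voice_id_passed=True))
# -> ['knowledge_based_questions']
```

Where the step-up boundary sits is exactly the risk management decision mentioned above.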
What We Are Doing to Combat Deepfakes in Voice Authentication
At Illuma, we have explored the capabilities of voice synthesis and ensured that our software is well calibrated to block attacks. As always, we continue investing in our core voice recognition engine and new risk rating tools to stay ahead of attackers.
Go here to learn more about how our state-of-the-art voice verification system works.