Voice Authentication

Risks of the Biometric Approach

by Marc Ruef

on October 12, 2023

Keypoints

How to Evade Voice Authentication

Biometric authentication relies on biometric features instead of passwords and PINs
It promises a high level of security, but in practice this can rarely be maintained
The security of voice authentication can be verified through systematic testing
In the past such systems could be attacked with a lot of effort; today, thanks to AI, a few clicks are enough
Voice authentication is therefore not recommended in environments with high security requirements

Biometric mechanisms are very popular. Besides their science-fiction flair, they promise convenience and a higher degree of security. It is often overlooked that these promises usually do not hold and that there are even far-reaching disadvantages. This article discusses why voice authentication is not a good idea.

Biometric authentication means that a user authenticates with a biometric feature. Instead of tediously entering a complex and regularly changing password, users rely on something they always carry with them anyway: fingerprint, iris, voice.

How Voice Authentication Works

Voice authentication creates a fingerprint of an audio signal, in this case the voice. This pairing (enrollment) must take place at the beginning. The user can trigger it actively by starting the configuration on their device. It can also happen passively, either by analyzing existing voice samples or by using a conversation (at least its first few seconds) to generate the fingerprint.

This fingerprint allows the identification of individual characteristics. These include, for example, pitch, frequencies, modulation, intonation and pauses. When a user needs to authenticate, the new input is compared with the existing fingerprint. If a sufficient match is found, the system assumes that it is the same user and accepts the authentication. The classic movie Sneakers (1992) summarizes this with a striking phrase: My voice is my passport.
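The comparison step can be illustrated with a minimal sketch. It assumes that a fingerprint is a plain vector of numeric features and that the match is measured with cosine similarity against a fixed threshold. Real products use far richer speaker models; the feature values, the function names and `THRESHOLD` are purely illustrative assumptions.

```python
import math

# Hypothetical threshold; real systems tune this value per deployment.
THRESHOLD = 0.95


def cosine_similarity(a, b):
    """Return the cosine similarity of two equally long feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def authenticate(enrolled, sample, threshold=THRESHOLD):
    """Accept the sample if it is similar enough to the enrolled fingerprint."""
    return cosine_similarity(enrolled, sample) >= threshold


# Illustrative vectors (e.g. normalized pitch, modulation, intonation, pauses).
enrolled = [0.62, 0.35, 0.80, 0.12]
same_speaker = [0.60, 0.37, 0.78, 0.14]
other_speaker = [0.20, 0.90, 0.10, 0.75]

print(authenticate(enrolled, same_speaker))   # close to the fingerprint
print(authenticate(enrolled, other_speaker))  # far from the fingerprint
```

The threshold is the decisive parameter: anything that pushes an attacker's sample above it is accepted, regardless of who is actually speaking.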

Attacks on voice authentication

An attack on voice authentication is primarily about getting the system to authenticate successfully even though the requirements (legitimate user and correct voice) are not met. The attacker therefore tries to approximate the fingerprint, again taking into account pitch, frequencies, modulation, intonation and pauses.

Unlike many other attack techniques, such an attack requires a solid understanding of the fundamentals of audio and acoustics. An audio engineer is much more likely to understand what the authentication checks for and how those checks can be satisfied.

However, developments in artificial intelligence (AI) in recent years have made this kind of attack significantly easier. Synthetic voices can be used to generate statements that sound genuine. Online services such as Lyrebird enabled the first attempts of this kind. With iOS 17, such voice synthesis even arrived on iPhones under the name Personal Voice. Voice synthesis is now available to everyone.

Implementation of a security test

In security tests we usually take the opposite approach: we first try to determine how far a legitimate sample may deviate from the fingerprint before it is no longer recognized as legitimate. To do so, the initial pairing is performed and recorded at the same time. This yields the best possible match a speech sample can have: when the recording is played back, it should reach a match of 100%.

In a further step, the speech sample is altered. We distinguish between the following categories:

ID	Category	Description
1	Compression	Increasing the compression eliminates subtleties.
2	Echo and reverb	Additional echo and reverb effects alter the sample.
3	Other effects (e.g. chorus)	Additional effects can alter the sample very strongly.
4	Reverse	Reversing the original sample retains many properties (e.g. frequency), but largely eliminates or replaces others (intonation, pauses).
5	Sample rate	Changing the sample rate affects the recording quality.
6	Tempo	Adjusting the tempo, usually while taking the pitch/frequencies into account, can provide information about the analysis techniques in use.
7	Secret recording (bug)	Secretly recorded conversations can be played back or used in the form of a soundboard.
8	Editing	Splicing recordings together can fabricate statements, although their intonation sounds unnatural.
9	Synthetic voice generation	Generating a synthetic voice shows how easily an individual voice can be reproduced.
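A few of these categories can be sketched in code. The example below assumes that a recording is available as a plain list of amplitude values and builds modified variants for categories 2, 4 and 5. The function names are hypothetical, and the downsampling is deliberately crude (no anti-aliasing filter); an actual test would more likely use an audio editor or a signal-processing library.

```python
def reverse(samples):
    """Category 4: play the sample backwards."""
    return samples[::-1]


def downsample(samples, factor):
    """Category 5: crude sample-rate reduction, keeping every n-th value."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return samples[::factor]


def add_echo(samples, delay, decay):
    """Category 2: mix a delayed, attenuated copy into the signal."""
    out = list(samples) + [0.0] * delay
    for i, value in enumerate(samples):
        out[i + delay] += value * decay
    return out


def build_variants(samples):
    """Return a dict mapping a variant label to the modified sample."""
    return {
        "original": list(samples),
        "reverse": reverse(samples),
        "downsample_x2": downsample(samples, 2),
        "echo": add_echo(samples, delay=2, decay=0.5),
    }


# Tiny illustrative signal; a real recording has thousands of values per second.
recording = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
variants = build_variants(recording)
for label, data in variants.items():
    print(label, data)
```

Keeping the unmodified recording in the set is useful, because it provides the reference against which every modified variant is judged.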

The resulting samples are then used for authentication attempts. The primary interest is whether each attempt succeeds. However, some manufacturers of such authentication systems also report a confidence level. This provides additional information about whether and to what extent an alteration causes a deviation, which in turn allows conclusions to be drawn about the attributes analyzed and their weighting.
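Where a confidence level is reported, the results can be ranked to see which modification the system reacts to most strongly. The following sketch assumes confidence values between 0 and 1; all numbers are invented placeholders, not measurements from any product, and `rank_by_confidence_drop` is a hypothetical helper.

```python
def rank_by_confidence_drop(baseline, results):
    """Sort modification categories by how much they lower the confidence."""
    drops = {name: baseline - conf for name, conf in results.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)


# Placeholder values for illustration only.
baseline = 1.00  # confidence for the unmodified recording
results = {
    "compression": 0.93,
    "echo": 0.71,
    "reverse": 0.12,
    "tempo": 0.55,
}

ranking = rank_by_confidence_drop(baseline, results)
for name, drop in ranking:
    print(f"{name}: confidence drop {drop:.2f}")
```

A large drop suggests that the system weights the attributes destroyed by that modification heavily; a small drop suggests that those attributes matter little for the decision.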

Measures to increase security

Various measures can make attacks on voice authentication more difficult. On the non-technical side, the dialog must be dynamic. This means, for example, that questions are chosen randomly or that authentication takes place in a natural conversation.

An attacker can adapt to this only with difficulty. Prefabricated speech samples can then be used only with great difficulty, if at all. In a conversation with a human, an attack of this kind is therefore already noticed at the interpersonal level. Fully automated voice authentication cannot reliably operate at this level.
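The idea of randomly chosen questions can be sketched as follows. The challenge pool and the function name are hypothetical; a real dialog would draw from a much larger and less predictable set, ideally generated per session.

```python
import secrets

# Hypothetical challenge pool for illustration only.
CHALLENGES = [
    "Please repeat the number seven four one nine.",
    "Please say today's weekday.",
    "Please read out the last two digits of your customer number.",
    "Please repeat the phrase: blue river crossing.",
]


def pick_challenge(previous=None):
    """Pick an unpredictable challenge, avoiding an immediate repeat."""
    candidates = [c for c in CHALLENGES if c != previous]
    return secrets.choice(candidates)


first = pick_challenge()
second = pick_challenge(previous=first)
print(first)
print(second)
```

Because the caller cannot know the challenge in advance, a prefabricated recording of a fixed passphrase no longer fits the dialog. Real-time voice synthesis is not stopped by this measure alone.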

In addition, matching can be made stricter. At the technical level, this enforces a high degree of matching.

This increases security, but only in relative terms, because an absolute match can never be enforced. It is important to remember that this approach introduces new problems, for example when someone is tired or ill (e.g., hoarse), when an unusual device is used for communication (e.g., Skype instead of a landline phone), or when voice quality is poor (e.g., reception or network problems). The convenience attributed to biometric mechanisms may then evaporate very quickly.
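This trade-off can be made concrete with a small calculation of the false rejection rate (legitimate users turned away) and the false acceptance rate (attackers let in) at different thresholds. The match scores below are invented placeholders chosen to illustrate the effect; they are not measurements.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false rejection rate, false acceptance rate) for a threshold."""
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (
        false_rejects / len(genuine_scores),
        false_accepts / len(impostor_scores),
    )


# Placeholder match scores for illustration only.
genuine = [0.97, 0.93, 0.88, 0.79]   # e.g. the last two: hoarse voice, bad line
impostor = [0.91, 0.62, 0.40, 0.35]  # e.g. the first: a good synthetic voice

for threshold in (0.75, 0.90, 0.95):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold {threshold:.2f}: FRR {frr:.2f}, FAR {far:.2f}")
```

With these placeholder scores, only the strictest threshold keeps the synthetic voice out, and it does so at the price of rejecting three out of four legitimate attempts.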

Summary

Voice authentication sounds convenient and secure. However, a closer look reveals that neither is really the case. Technical or health issues can limit the benefits for legitimate users, and improvements in voice replication and generation have made attacks much easier in recent years.

As always, biometric mechanisms are only partially suitable for providing a high level of security. They can serve as an additional factor that makes attacks more difficult. But relying on them alone is neither sensible nor up to date. Using them in environments with high security requirements is accordingly not recommended.

Biometrics should primarily serve as an identification feature. They are not really suitable for authentication, if only because biometric features cannot simply be changed after a “loss” (e.g., fingerprints becoming known). Changing a password is definitely more practical.

About the Author

Marc Ruef has been working in information security since the late 1990s. He is well known for his many publications and books. His latest book, The Art of Penetration Testing, discusses security testing in detail. He is a lecturer at several institutions, including ETH, HWZ, HSLU and IKF. (ORCID 0000-0002-1328-6357)
