Marc Ruef
How to Evade Voice Authentication
One speaks of biometric authentication when a user authenticates with a biometric feature. Instead of tediously entering a complex and regularly changing password, users rely on something they always carry with them anyway: fingerprint, iris, voice.
With voice authentication, a fingerprint of an audio signal, in this case the voice, is created. This pairing must take place before the first authentication. It can happen actively, when the user starts the configuration on their device. It can also happen passively, by analyzing existing voice samples or by using a conversation (at least its first few seconds) to generate the fingerprint.
This fingerprint allows the identification of individual characteristics. These include, for example, pitch, frequencies, modulation, intonation and pauses. When a user needs to authenticate, the new input is compared with the existing fingerprint. If the match is close enough, the system assumes that it is the same user and accepts the authentication. In the classic movie Sneakers (1992), this is summarized with a striking phrase: "My voice is my passport."
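The comparison described above can be sketched as a similarity check against a threshold. This is a minimal illustration, not how any particular product works: the feature vectors, their normalization to the range 0 to 1, the use of cosine similarity, and the threshold of 0.95 are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_match(enrolled, candidate, threshold=0.95):
    """Accept the candidate if it is similar enough to the enrolled fingerprint."""
    return cosine_similarity(enrolled, candidate) >= threshold

# Hypothetical, normalized features (e.g. pitch, modulation, intonation, pauses).
enrolled = [0.62, 0.35, 0.80, 0.41, 0.27]
same_speaker = [0.60, 0.37, 0.78, 0.43, 0.25]
other_speaker = [0.20, 0.90, 0.15, 0.75, 0.60]

print(is_match(enrolled, same_speaker))
print(is_match(enrolled, other_speaker))
```

The threshold is the interesting parameter here: everything an attacker does aims at pushing a foreign sample above it.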
An attack on voice authentication is primarily about getting the system to authenticate successfully even though the requirements (legitimate user and correct voice) are not met. The attacker therefore tries to approximate the fingerprint, again with regard to pitch, frequencies, modulation, intonation and pauses.
Unlike many other attack techniques, such an attack requires a solid understanding of the fundamentals of audio and acoustics. An audio engineer is much more likely to understand what the authentication requires and how those requirements can be met.
However, developments in artificial intelligence (AI) in recent years have made this attack considerably easier. Synthetic voices can be used to generate statements that sound genuine. Online services such as Lyrebird enabled the first attempts of this kind. With iOS 17.0, such voice synthesis was even introduced on iPhones under the name Personal Voice. Voice synthesis is now available to everyone.
In security tests, we usually take the opposite approach: we first try to identify how far a legitimate sample may deviate from the fingerprint before it is no longer recognized as legitimate. To do so, the initial pairing is performed and recorded at the same time. The recording is the best possible match a speech sample can have: when it is played back, it should ideally reach a match of 100%.
In a further step, the speech sample is altered. We distinguish between the following categories of alteration:
ID | Category | Description |
---|---|---|
1 | Compression | Increasing the compression removes subtleties from the signal. |
2 | Echo and reverb | Additional echo and reverb effects distort the sample. |
3 | Other effects (e.g. chorus) | Additional effects can distort the sample very strongly. |
4 | Reverse | Reversing the original sample retains many properties (e.g. frequency), but largely eliminates or replaces others (intonation, pauses). |
5 | Sample rate | Changing the sample rate alters the recording quality. |
6 | Tempo | Adjusting the tempo, usually while taking the pitch/frequencies into account, can reveal which analysis techniques are used. |
7 | Secret recording (bug) | Secretly recorded conversations can be played back or used in the form of a soundboard. |
8 | Editing | Splicing recordings together can fabricate statements, although their intonation sounds unnatural. |
9 | Synthetic voice generation | Generating a synthetic voice can show how easily an individual voice can be reproduced. |
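Some of the signal-level categories in the table (2, 4, 5 and 6) can be sketched on raw samples. The following is a simplified illustration using only the standard library; the function names are hypothetical, a sine wave stands in for a recorded voice, and the tempo change is deliberately naive (it shifts the pitch as well, which a real test would usually compensate for).

```python
import math

def sine_wave(freq_hz, n_samples, sample_rate):
    """Stand-in for a recorded voice sample (assumption for this sketch)."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]

def reverse(samples):
    """Category 4: play the sample backwards."""
    return samples[::-1]

def add_echo(samples, delay, decay=0.5):
    """Category 2: mix in one delayed, attenuated copy of the signal."""
    out = list(samples) + [0.0] * delay
    for i, s in enumerate(samples):
        out[i + delay] += decay * s
    return out

def downsample(samples, factor):
    """Category 5: keep every factor-th sample (no anti-aliasing filter)."""
    return samples[::factor]

def change_tempo_naive(samples, rate):
    """Category 6: resample by linear interpolation; this also shifts the pitch."""
    n_out = int(len(samples) / rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * rate
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out

original = sine_wave(220.0, 800, 8000)
variants = {
    "reverse": reverse(original),
    "echo": add_echo(original, delay=100),
    "sample_rate": downsample(original, 2),
    "tempo": change_tempo_naive(original, 2.0),
}
for name, data in variants.items():
    print(name, len(data))
```

In practice an audio editor or a dedicated audio library would be used for these steps; the sketch only shows that each category is a well-defined, repeatable transformation of the recorded pairing sample.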
The resulting samples are then used for authentication attempts. The primary interest is whether each attempt succeeds. Some vendors of authentication systems also report a confidence level. It provides additional information about whether, and how strongly, an alteration causes a deviation. This allows conclusions to be drawn about the attributes the system evaluates and how it weights them.
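Where a confidence level is reported, the results can be ranked by how far each alteration pushes the value below the baseline. The sketch below assumes the confidence is reported as a percentage; the function name is hypothetical and the numbers are placeholders for illustration, not measurements from any real system.

```python
def rank_by_confidence_drop(baseline, results):
    """Sort alterations by how much confidence they cost, largest drop first."""
    drops = {name: baseline - confidence for name, confidence in results.items()}
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

# Placeholder values only: confidence in percent per altered sample.
baseline_confidence = 100
placeholder_results = {
    "compression": 92,
    "echo": 71,
    "reverse": 18,
    "tempo": 64,
}

for name, drop in rank_by_confidence_drop(baseline_confidence, placeholder_results):
    print(f"{name}: -{drop}")
```

A large drop for one alteration and a small drop for another hints at which attributes the system weights most heavily.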
There are various measures that can be taken to make attacks on voice authentication more difficult. On the one hand, there is the non-technical aspect that a dialog must be dynamic. This means, for example, that the questions are chosen randomly or that the authentication has to take place in a natural conversation.
An attacker can adapt to this only with difficulty, and prefabricated speech samples can be used only with great effort, if at all. An attack of this kind is therefore likely to be recognized as such at the interpersonal level. Fully automated voice authentication cannot reliably operate at this level.
In addition, matching can be made stricter, so that a high degree of matching is enforced at the technical level.
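For automated systems, one way to approximate a dynamic dialog is to ask the user to speak a randomly chosen phrase. The following is a minimal sketch under that assumption; the word list and the function name are hypothetical, and it addresses only prefabricated samples, not voices synthesized on the fly.

```python
import secrets

# Hypothetical vocabulary; a real system would use a much larger list.
WORDS = ["amber", "river", "candle", "orbit", "meadow", "signal", "harbor", "violet"]

def make_challenge(n_words=4, vocabulary=WORDS):
    """Build an unpredictable phrase the user has to speak aloud."""
    return " ".join(secrets.choice(vocabulary) for _ in range(n_words))

print(make_challenge())
```

Because the phrase is not known in advance, a recording of an earlier session is unlikely to contain it.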
This increases security, but only in relative terms, because an absolute match can never be enforced. It is important to remember that this approach introduces new problems, for example when someone is tired or ill (e.g., hoarse), when an unusual device is used for communication (e.g., Skype instead of a landline phone), or when voice quality is poor (e.g., reception or network problems). The convenience attributed to biometric mechanisms may then evaporate very quickly.
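The trade-off can be made concrete by counting wrongly rejected legitimate attempts and wrongly accepted attacks at different thresholds. This is a sketch with placeholder scores in percent (not measurements); the function name and the score lists are assumptions for the example.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold."""
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    return (false_rejects / len(genuine_scores),
            false_accepts / len(impostor_scores))

# Placeholder match scores in percent.
# The low genuine scores stand for a hoarse voice or a poor connection.
genuine = [97, 95, 93, 88, 81]
impostor = [90, 72, 60, 55, 40]

for threshold in (85, 92):
    frr, far = error_rates(genuine, impostor, threshold)
    print(f"threshold {threshold}: false rejects {frr:.0%}, false accepts {far:.0%}")
```

With these placeholder values, raising the threshold removes the false acceptance but locks out a second legitimate attempt, which is exactly the loss of convenience described above.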
Voice authentication sounds convenient and secure. However, a closer look reveals that neither is really the case. Technical or health issues can limit the benefits for legitimate users, and advances in voice replication and generation have made attacks much easier in recent years.
As always with biometric authentication, voice authentication is only partially suitable for providing a high level of security. It can serve as an additional factor to make attacks more difficult. But relying on it alone is neither sensible nor in keeping with the times. Using it in environments with high security requirements is accordingly not recommended.
Biometrics should primarily be an identification feature. They are not really suitable for authentication, if only because, in the event of a “loss” (e.g., fingerprints becoming known), they cannot simply be changed. Changing a password is definitely more practical.