How faces can be digitally swapped in videos
In the past, this type of transformation required highly advanced video editing skills. Today, the technology is available to anyone who has enough motivation, time and processing power.
Deepfakes caught people’s attention and started to spread after a Reddit user known as “deepfakes” showed how the face of a famous person could be manipulated to give them a starring role in a pornographic video clip. GUI-driven applications like FakeApp have since made it possible for less tech-savvy users to produce these “deepfakes” too.
There are various algorithms for creating deepfakes, but all of them employ artificial intelligence or, more specifically, deep learning, a subdiscipline of machine learning based on artificial neural networks. The GUI-based FakeApp software uses an autoencoder, a neural network specially designed to compress its input data into a smaller representation and then reconstruct the original input from that representation as faithfully as possible.
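The compress-then-reconstruct idea can be sketched in a few lines. This is purely illustrative and not FakeApp’s actual network: real deepfake autoencoders are deep convolutional networks trained on thousands of face images, whereas here the “image” is a four-number vector and the weights are hand-picked so the round trip is exact.

```python
def encode(x):
    # Encoder: compress 4 values down to 2 (the "latent" representation).
    return [(x[0] + x[1]) / 2, (x[2] + x[3]) / 2]

def decode(z):
    # Decoder: expand the 2-value representation back to 4 values.
    return [z[0], z[0], z[1], z[1]]

face = [0.8, 0.8, 0.3, 0.3]      # toy "image" with redundant structure
latent = encode(face)            # compressed representation: [0.8, 0.3]
reconstruction = decode(latent)  # reproduces the input: [0.8, 0.8, 0.3, 0.3]
```

Because the toy input is redundant, the 2-value representation loses nothing and the reconstruction is exact; a trained network instead learns which regularities of a face it can safely discard.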
It takes several steps to successfully swap a face. First, two autoencoders are trained to reproduce the faces of person A and person B as precisely as possible. Crucially, the two autoencoders share a single encoder and differ only in their decoders, so the compressed representations of both faces are compatible.
Once the two autoencoders can adequately reproduce faces A and B, respectively, the next step is the actual swap. The decoder for face A is fed the compressed representation of face B. Decoder A converts this representation into the face of person A, but with the facial expressions of person B.
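The swap step can be sketched structurally. The functions below are stand-ins for the learned networks, not real ones: the shared encoder extracts expression-like information, and each decoder always renders its own person’s identity, which is why feeding B’s representation into A’s decoder yields A’s face with B’s expression.

```python
def shared_encoder(image):
    # Stand-in for the learned encoder: keep only the "expression" part
    # and discard identity, mimicking the compressed representation.
    return {"expression": image["expression"]}

def decoder_a(latent):
    # Stand-in for decoder A: always renders person A's identity.
    return {"identity": "person A", "expression": latent["expression"]}

def decoder_b(latent):
    # Stand-in for decoder B: always renders person B's identity.
    return {"identity": "person B", "expression": latent["expression"]}

face_b = {"identity": "person B", "expression": "smiling"}

# The actual swap: encode face B, then decode with A's decoder.
swapped = decoder_a(shared_encoder(face_b))
# swapped -> {"identity": "person A", "expression": "smiling"}
```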
Speculation about potential uses for deepfakes is often negative – pornography, fake news or extortion, for example. But there’s a whole range of useful applications as well. Big opportunities are opening up in the film industry. The technology could bring a deceased actor back to the screen, for instance. It could also make dubbed films more realistic by matching the actors’ mouth movements to the spoken dialog of each language. There is also a whole range of new possibilities in the field of dynamic films, which would allow viewers to pick certain cast members or even cast themselves in certain roles. A similar concept could be applied to the advertising industry. For example, a fashion brand might “lease” the face of a celebrity for a month and use it in a current ad campaign. Photo shoots would no longer require the celebrities themselves to be present, just a stand-in with a similar build.
It’s also important to note that deepfake technology can do more than just swap faces. Essentially, any objects can be swapped as long as their basic features are sufficiently similar. For example, horses can be transformed into zebras, or the artistic style of a Van Gogh could be transferred onto a Picasso. The only limit is a person’s own creativity.
In early August it was reported that the first forensic tools for detecting deepfakes had been developed by the US Defense Advanced Research Projects Agency (DARPA). These tools, which use artificial intelligence to distinguish between deepfake videos and real videos, are a double-edged sword, however. That’s because they can also be tricked by feeding the detector’s verdicts back into the learning algorithm that produces the fakes. If a video is correctly identified as fake, it is improved or changed until the tool can no longer identify it as fake. The result is fakes specifically tuned to circumvent detection by this particular tool.
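This feedback loop can be illustrated with a toy sketch. The detector and the “improve” step below are hypothetical stand-ins: the point is only the control flow, in which a forger keeps refining a fake until this specific detector stops flagging it.

```python
def detector(quality):
    # Stand-in detector: flags any fake below a fixed quality threshold.
    # Returns True when the video is identified as fake.
    return quality < 90

quality = 50   # initial fake quality on an arbitrary 0-100 scale
rounds = 0
while detector(quality):   # detector feedback drives the refinement
    quality += 10          # "improve or change" the fake
    rounds += 1
# After 4 rounds the fake evades this detector - but only this one.
```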
Because this is obviously only a partial solution, a few other ideas for identifying fakes are worth mentioning. One possibility would be to add watermarks to official videos. Alternatively, videos could be officially recognized only when they carry a publicly available signature. The authenticity of a video could then be verified by comparing its signature against the published one. If the signatures differ, it would be fair to say that the video in question has been altered.
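As a minimal sketch of the signature idea, a plain SHA-256 hash can serve as a stand-in for a real signature (a production scheme would use public-key signatures so that only the official source can produce them). If the fingerprint published by the source differs from the fingerprint of the file you received, the video has been altered.

```python
import hashlib

def fingerprint(video_bytes):
    # Hash the raw video data into a short, comparable fingerprint.
    return hashlib.sha256(video_bytes).hexdigest()

original = b"...original video data..."
published = fingerprint(original)   # released by the official source

tampered = b"...original video data with a swapped face..."

assert fingerprint(original) == published   # authentic copy matches
assert fingerprint(tampered) != published   # altered copy does not
```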
Lastly, a word of caution: At some point we expect that the transformation of faces will be carried out dynamically during the actual production of the video. The first studies have already been carried out, such as the one presented in the scientific paper Deep Video Portraits.
There are already plenty of good examples of deepfakes, but there are few, if any, specific parameters for defining what constitutes a successful deepfake. The testing environment we have developed is designed to provide clear information on the current state of deepfake technology and the image material requirements for producing successful fakes. Based on a number of criteria, studies are being carried out to determine the limits of currently available technology. The following criteria have been defined to evaluate these requirements:
A more detailed description, as well as the results from the test cases mentioned here, will be published in a separate article.