Deepfakes Analysis

Amount of Images, Lighting and Angles

by Andrea Hauser

Keypoints

This is how you create the perfect Deepfake

Roughly 500 images of a source are enough to create a solid deepfake
The lighting is very important when selecting source material
The source material must also cover the face angles that appear in the target video

As announced in an earlier article, we are going to determine what image material is required to create a successful deepfake. The source material for these tests consists of YouTube videos at 720p resolution showing George W. Bush and George Clooney. We take the face of Clooney (source) and put it on the face of Bush (destination).

The calculations are based on a model pre-trained with Donald Trump and Nicolas Cage. The tests are divided into these three categories:

Amount of images
Lighting
Angles of source material

Each category comprises multiple test cases, and each test case yields two videos as results. The first video shows the result generated with the default values, and the second one shows a manually tweaked version to ensure the best possible result.

Amount of Images

The goal of this category is to determine the minimum number of images required to create a successful deepfake. The test cases are divided into 500, 2,000, and 5,000 source images. We would also like to determine whether a large number of images of the target video is necessary, or whether multiple target videos are needed. The base target video of George W. Bush is a YouTube video with the title Bush’s Best Speech.

George Clooney is represented by a mix of four different videos.
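
The destination figure used in the test cases (168 images for 7 seconds) corresponds to 24 frames per second. The following minimal sketch works through that arithmetic and estimates how much footage the three source set sizes correspond to. It assumes that the source videos run at the same frame rate and that every frame yields a usable face image, neither of which is stated in the article; the function names are purely illustrative.

```python
from fractions import Fraction


def frames_per_second(frame_count: int, seconds: int) -> Fraction:
    """Frame rate implied by a clip's frame count and duration."""
    return Fraction(frame_count, seconds)


def footage_seconds(frame_count: int, fps: Fraction) -> float:
    """Seconds of footage needed to obtain `frame_count` frames at `fps`."""
    return float(frame_count / fps)


# Destination clip from the test cases: 168 images in 7 seconds.
fps = frames_per_second(168, 7)
print(f"destination frame rate: {fps} fps")

# Assumption: source footage has the same frame rate and every frame is usable.
for source_images in (500, 2000, 5000):
    seconds = footage_seconds(source_images, fps)
    print(f"{source_images} images ~ {seconds:.1f} s of footage")
```

Under these assumptions, 500 source images amount to roughly 21 seconds of footage, which puts the size of the smallest test case into perspective.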

Source 500 images ⇒ Destination 7 seconds (168 images)

After 24 hours of computing, the results are as shown below. You can clearly see that manually tweaking the parameters increases the quality of the result.

Source 2,000 images ⇒ Destination 7 seconds (168 images)

After another 24 hours of computing the result consists of the two videos shown below. Once again the manual tweaking of the merge parameters produces much better results.

Source 5,000 images ⇒ Destination 7 seconds (168 images)

After another 24 hours, the results are as follows. Once again, the video generated with the default values is of lesser quality.

Source 5,000 images ⇒ Destination 5,000 images

After training on 5,000 source and 5,000 destination images for a day, the resulting model was used to convert the 7-second video from the other test cases, which took about one minute. As usual, the default values generate less convincing results. However, it is not possible to determine any difference in quality between the version with 168 images of Bush and the version with 5,000 images of Bush. There is also no difference between 500 and 5,000 images of Clooney. This leads us to the conclusion that just 500 images are sufficient to create a perfect deepfake.

Lighting

We shall now determine how lighting and shadows influence the quality of deepfakes. We have chosen material of George Clooney in which his face is partially in shadow or in which the lighting differs between the source and destination videos. The video of George W. Bush remains the same YouTube video as used before.

Good Lighting of both Faces

Both faces are well lit in their respective videos. The face of George Clooney is a bit reddish, and this difference in color can also be seen in the results. The video with the default values shows the same flickering as before. In this case, however, it was not possible to eliminate this effect by manually tweaking the parameters, because normalizing the colors applies some of Clooney’s reddish color to Bush’s pale skin tone, resulting in an unconvincing image. Still, this test case shows which facial areas are touched by the deepfake algorithm.
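
To illustrate why color normalization can carry a tint from one face to the other, here is a minimal sketch of one common approach: matching the per-channel mean and spread of one set of pixels to a reference set. This is an assumption for illustration only; the article does not state which normalization the tool used in these tests applies. The function name and the pixel values are hypothetical.

```python
from statistics import mean, pstdev


def match_color_stats(pixels, reference):
    """Shift and scale each RGB channel of `pixels` so that its mean and
    standard deviation match those of `reference`."""
    matched_channels = []
    for channel in range(3):
        src = [p[channel] for p in pixels]
        ref = [p[channel] for p in reference]
        src_mean, ref_mean = mean(src), mean(ref)
        src_std, ref_std = pstdev(src), pstdev(ref)
        # Avoid division by zero for a perfectly flat channel.
        scale = ref_std / src_std if src_std > 0 else 1.0
        matched_channels.append(
            [min(255, max(0, round((v - src_mean) * scale + ref_mean))) for v in src]
        )
    # Recombine the three channel lists into (R, G, B) tuples.
    return list(zip(*matched_channels))


# Hypothetical skin pixels: a pale face and a reddish reference face.
pale = [(220, 200, 190), (230, 210, 200), (210, 190, 180)]
reddish = [(200, 140, 130), (210, 150, 140), (190, 130, 120)]

normalized = match_color_stats(pale, reddish)
print(normalized)
```

In this toy example the pale pixels take on the reference statistics entirely, so the gap between the red and green channels grows from 20 to 60. A real merge works on whole face regions, but the same mechanism would explain a visible color shift.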

Source video partially shadowed ⇒ Destination video well lit

In the source material for George Clooney, one side of his face is in shadow and the other side is lit a little too brightly. This test case clearly shows that lighting plays an important role in the selection of the source material. No good result can be achieved when the illumination of source and destination differs distinctly.

Influence of the Background

In this case, the usual video of George W. Bush was used, while for Clooney an interview with a black background was drawn upon. The color of the background does not lead to any significant issues when using default parameters, although the typical flicker is present. In the manually tweaked merge, however, the black background has a clear effect on the result: the produced face is too dark.

Same Background and Lighting in both Videos

This produced a generally good result, although the resulting face looked blurry. More training would likely reduce this blur.

Angle of the Source Material

With this case, we investigated how many side views are necessary in the source material to convincingly fake a target video with side views. The same video from Amount of Images was used as the base video of George W. Bush.

Source Material of both Faces does not contain Side Views

While the result is generally good, this video also clearly shows areas where deepfake technology needs to improve. Focusing on the mouth, it becomes clear that the algorithm cannot handle teeth particularly well. They are either not shown at all or rendered as a single white area, which in most cases even overlaps the lips.

Source Video is only Front View ⇒ Target Video also contains Side View

For this category, a pre-trained model of Nicolas Cage was utilized, which led to an effect where the resulting face became a mixture of George W. Bush, George Clooney and Nicolas Cage in the side views.

Source Video 30% Side View ⇒ Target Video contains Side Views

Here, too, Nicolas Cage’s facial traits show in some side views. We can therefore conclude that more than 30% of the source recording needs to consist of side views to produce convincing side views.
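
To check a source set against this finding before training, the share of side views can be estimated. The sketch below assumes a head yaw angle in degrees has already been estimated for each source frame by some external tool (0 meaning a frontal view), and it treats every frame with an absolute yaw of 40 degrees or more as a side view. Both the input and the 40-degree cut-off are assumptions for illustration; the article does not define how side views were counted.

```python
def side_view_share(yaw_degrees, side_threshold=40.0):
    """Fraction of frames whose absolute yaw is at least `side_threshold`."""
    if not yaw_degrees:
        raise ValueError("at least one frame is required")
    side_frames = sum(1 for yaw in yaw_degrees if abs(yaw) >= side_threshold)
    return side_frames / len(yaw_degrees)


def has_enough_side_views(yaw_degrees, min_share=0.30):
    """True if the share of side views exceeds `min_share` (30% by default)."""
    return side_view_share(yaw_degrees) > min_share


# Hypothetical yaw estimates for ten source frames.
yaws = [0, 5, -10, 50, -60, 15, 45, 2, -3, 8]

print(f"side views: {side_view_share(yaws):.0%}")
print(has_enough_side_views(yaws))
```

With three of ten frames counted as side views, this hypothetical set sits exactly at 30% and is rejected, in line with the conclusion that more than 30% is needed. How much more is required remains open, since no test case with a higher share is reported here.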

Conclusion

The amount of faces plays less of a role than expected. Much more important is the similarity of the material in terms of illumination and angles of the faces, as high quality deepfakes can only be produced with similar material.

About the Author

Andrea Hauser graduated with a Bachelor of Science FHO in information technology at the University of Applied Sciences Rapperswil. She is focusing her offensive work on web application security testing and the realization of social engineering campaigns. Her research focus is creating and analyzing deepfakes. (ORCID 0000-0002-5161-8658)
