LUP Student Papers

LUND UNIVERSITY LIBRARIES

Generative Adversarial Networks in Lip-Synchronized Deepfakes for Personalized Video Messages

Liljegren, Johan LU and Nordqvist, Pontus LU (2021) In Master's Theses in Mathematical Sciences FMAM05 20211
Mathematics (Faculty of Engineering)
Abstract
Recent progress in deep learning has enabled more powerful frameworks for creating good-quality deepfakes. Deepfakes are mostly known for malicious uses, but they have great potential in areas such as the movie industry, education, and personalized messaging. This thesis focuses on lip-synchronization, one part of a broader deepfake pipeline for producing personalized video messages. For this application, we used Generative Adversarial Networks (GANs) adapted to a given audio and video input. The objectives were to implement a model structure for lip-synchronization, to investigate which GAN variants excel at this task, and to examine how different training datasets affect the results.

Three models were investigated. First, the GAN architecture LipGAN was reimplemented in PyTorch. Second, a GAN variant, WGAN-GP, was adapted to the LipGAN architecture. Third, a novel approach drawing on both models, L1WGAN-GP, was developed and implemented. All models were trained on the GRID dataset and benchmarked with the metrics PSNR, SSIM, and FID score. Lastly, the influence of the training dataset was tested by comparing our LipGAN implementation with another LipGAN implementation trained on a different dataset, LRS2.
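For reference, the standard WGAN-GP critic objective is reproduced below. Here $x$ is a real frame, $\tilde{x}$ a generated frame, $D$ the critic (an unbounded score rather than a probability), and $\lambda$ the gradient-penalty weight, set to 10 in the original WGAN-GP paper. The penalty is evaluated at points $\hat{x}$ sampled on straight lines between real and generated frames. The last line is only an assumption about how an L1 reconstruction term, as the name L1WGAN-GP suggests, might be added to the Wasserstein generator loss; the weight $\lambda_{L1}$ is a hypothetical symbol, and the thesis's actual formulation may differ.

```latex
\begin{aligned}
\hat{x} &= \epsilon\, x + (1 - \epsilon)\,\tilde{x}, \qquad \epsilon \sim \mathcal{U}[0, 1] \\
L_D &= \mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big] - \mathbb{E}_{x}\big[D(x)\big]
      + \lambda\,\mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big] \\
L_G &= -\,\mathbb{E}_{\tilde{x}}\big[D(\tilde{x})\big] + \lambda_{L1}\,\mathbb{E}\big[\lVert \tilde{x} - x \rVert_1\big]
\end{aligned}
```

Setting $\lambda_{L1} = 0$ recovers the plain WGAN-GP generator loss; a positive $\lambda_{L1}$ additionally pulls each generated frame toward its ground-truth frame pixel by pixel.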

WGAN-GP did not converge, and we suspect that it suffered from mode collapse. Of the two remaining models, our LipGAN implementation performed best in terms of PSNR and SSIM, whereas L1WGAN-GP outperformed LipGAN according to the FID score. However, L1WGAN-GP produced samples polluted by artifacts. Our models trained on GRID generalized poorly compared with the same model trained on LRS2. Additionally, models trained on less data were outperformed by models trained on the full dataset.
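To make the PSNR and SSIM comparison more concrete, the sketch below computes both metrics for two grayscale images given as nested lists of pixel values. It is a minimal, dependency-free illustration, not the evaluation code used in the thesis: the helper names `psnr` and `global_ssim` are hypothetical, 8-bit images (peak value 255) are assumed, and SSIM is computed over the whole image as a single window, whereas the standard definition averages it over local (typically Gaussian-weighted 11×11) windows. FID is not sketched, since it requires features from a pretrained Inception network.

```python
import math
from typing import List

# A grayscale image as rows of pixel intensities (assumed 8-bit, 0..255).
Image = List[List[float]]


def _flatten(img: Image) -> List[float]:
    """Concatenate the rows of an image into one list of pixels."""
    return [p for row in img for p in row]


def psnr(reference: Image, test: Image, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    ref, tst = _flatten(reference), _flatten(test)
    if not ref or len(ref) != len(tst):
        raise ValueError("images must be non-empty and the same size")
    mse = sum((a - b) ** 2 for a, b in zip(ref, tst)) / len(ref)
    if mse == 0:
        return math.inf  # identical images: no error at all
    return 10 * math.log10(max_val ** 2 / mse)


def global_ssim(reference: Image, test: Image, max_val: float = 255.0) -> float:
    """SSIM evaluated over the whole image as one window (simplified)."""
    x, y = _flatten(reference), _flatten(test)
    if not x or len(x) != len(y):
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x, mu_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants from the SSIM definition (K1 = 0.01, K2 = 0.03).
    c1 = (0.01 * max_val) ** 2
    c2 = (0.03 * max_val) ** 2
    # Luminance term compares means; the second term combines contrast and structure.
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast_structure = (2 * cov_xy + c2) / (var_x + var_y + c2)
    return luminance * contrast_structure
```

For example, `psnr([[0, 255], [255, 0]], [[10, 245], [245, 10]])` gives a mean squared error of 100 and hence roughly 28.13 dB, and both functions reach their maximum (infinity for PSNR, 1 for SSIM) when the two images are identical. Higher values of either metric mean the generated frame is closer to the ground truth, which is why they reward pixel-accurate but possibly blurry outputs, whereas FID compares feature statistics over many samples.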

Finally, our results suggest that LipGAN was the best-performing network, and with it we produced satisfactory lip-synchronization.
Popular Abstract (translated from Swedish)
Using the latest findings in machine learning, we are developing a product that can automatically produce the next generation of video messages, by means of so-called “deepfake” technology.
author: Liljegren, Johan LU and Nordqvist, Pontus LU
supervisor:
organization: Mathematics (Faculty of Engineering)
course: FMAM05 20211
year: 2021
type: H2 - Master's Degree (Two Years)
subject:
keywords: Generative Adversarial Networks, GAN, Lip-Synchronization, Deepfake, Deep Learning, Autoencoder, WGAN, WGAN-GP, L1WGAN-GP, Skip-Connections, FID-Score
publication/series: Master's Theses in Mathematical Sciences
report number: LUTFMA-3450-2021
ISSN: 1404-6342
other publication id: 2021:E33
language: English
id: 9060411
date added to LUP: 2021-07-14 14:41:16
date last changed: 2021-07-14 14:41:16
@misc{9060411,
  abstract     = {{Recent progress in deep learning has enabled more powerful frameworks for creating good-quality deepfakes. Deepfakes are mostly known for malicious uses, but they have great potential in areas such as the movie industry, education, and personalized messaging. This thesis focuses on lip-synchronization, one part of a broader deepfake pipeline for producing personalized video messages. For this application, we used Generative Adversarial Networks (GANs) adapted to a given audio and video input. The objectives were to implement a model structure for lip-synchronization, to investigate which GAN variants excel at this task, and to examine how different training datasets affect the results.

Three models were investigated. First, the GAN architecture LipGAN was reimplemented in PyTorch. Second, a GAN variant, WGAN-GP, was adapted to the LipGAN architecture. Third, a novel approach drawing on both models, L1WGAN-GP, was developed and implemented. All models were trained on the GRID dataset and benchmarked with the metrics PSNR, SSIM, and FID score. Lastly, the influence of the training dataset was tested by comparing our LipGAN implementation with another LipGAN implementation trained on a different dataset, LRS2.

WGAN-GP did not converge, and we suspect that it suffered from mode collapse. Of the two remaining models, our LipGAN implementation performed best in terms of PSNR and SSIM, whereas L1WGAN-GP outperformed LipGAN according to the FID score. However, L1WGAN-GP produced samples polluted by artifacts. Our models trained on GRID generalized poorly compared with the same model trained on LRS2. Additionally, models trained on less data were outperformed by models trained on the full dataset.

Finally, our results suggest that LipGAN was the best-performing network, and with it we produced satisfactory lip-synchronization.}},
  author       = {{Liljegren, Johan and Nordqvist, Pontus}},
  issn         = {{1404-6342}},
  language     = {{eng}},
  note         = {{Student Paper}},
  series       = {{Master's Theses in Mathematical Sciences}},
  title        = {{Generative Adversarial Networks in Lip-Synchronized Deepfakes for Personalized Video Messages}},
  year         = {{2021}},
}