MIT Repository

Title
A Machine Learning-Based Approach To Analyze And Detect DeepFakes
Publication Date
2024-11
Author(s)
Ramanaharan, Ramcharan
Subject
Machine learning
Deepfake
deepfake detection
3D ConvNets (I3D)
Vision Transformers
I3D
ViViT
CNN-based architectures
Hybrid model performance
deepfake detection algorithms
Abstract
In recent years, advancements in deep learning generative models have enabled the creation of hyper-realistic fake videos, images, and audio, undermining the trustworthiness of digital media and public confidence in it. These sophisticated forgeries, known as deepfakes, are powerful and potentially dangerous tools for misinformation, identity theft, and other malicious purposes. In response, there is a growing demand for stable, transferable detection models that can detect deepfakes across different spectrums. An extensive systematic literature review was conducted to analyse the strengths and weaknesses of existing detection models, with the aim of designing a hybrid model best suited to this study. In this work, we propose a novel hybrid detection approach that combines Inflated 3D ConvNets (I3D) with Video Vision Transformers (ViViT) to improve the robustness and adaptability of deepfake detection systems. The I3D model excels at spatiotemporal analysis, making it well suited to detecting frame-level anomalies such as the temporal inconsistencies common to deepfake manipulations; it captures both spatial and temporal distortions across video frames that attention-based models alone may miss. Transformers, in turn, excel at capturing long-range dependencies in data, which complements I3D's strengths. Because each architecture complements the other, the hybrid model can capture both spatial and temporal irregularities, which greatly increases detection accuracy. Most existing deepfake detection models lose predictive power when exposed to new or previously unseen deepfakes.
The hybrid I3D-ViViT model addresses this problem by providing greater flexibility and adaptability, enabling it to generalise detection effectively to subtle changes in deepfake manipulations. In our experiments, the hybrid I3D-ViViT model reached 97.88% detection accuracy on a video dataset, surpassing standalone models such as CNN-based architectures, which typically achieve between 85% and 92% accuracy according to our systematic literature review. This comparison highlights how integrating I3D's spatiotemporal analysis with ViViT's ability to capture long-range dependencies allows the hybrid model to generalise better and to detect both spatial and temporal anomalies across a broader range of manipulations. The data was split 80% for training and 20% for testing. The model's ability to detect sophisticated manipulations woven seamlessly across multiple video frames makes it particularly suitable for real-world applications, where deepfake detection must be both accurate and generalisable across various digital media formats.
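The fusion idea described in the abstract can be sketched in miniature: an I3D-style branch and a ViViT-style branch each reduce a video clip to a feature vector, and a small classifier scores the fused features. This is a minimal, hypothetical sketch of late fusion, not the thesis's actual architecture; the two feature extractors below are random stand-ins for the real I3D and ViViT networks, and all dimensions (512, 256) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def i3d_branch(video):
    # Stand-in for I3D: collapse a (frames, height, width, channels) clip
    # to a per-frame signal, then project to a 512-d feature vector.
    per_frame = video.mean(axis=(1, 2, 3))            # shape: (frames,)
    return per_frame @ rng.standard_normal((video.shape[0], 512))

def vivit_branch(video):
    # Stand-in for ViViT: same pooling, projected to a 256-d vector
    # (the real model would use tubelet embedding and attention).
    per_frame = video.mean(axis=(1, 2, 3))
    return per_frame @ rng.standard_normal((video.shape[0], 256))

def hybrid_score(video, weights, bias=0.0):
    # Late fusion: concatenate both branches, then apply a logistic
    # classifier to produce a fake-probability in (0, 1).
    fused = np.concatenate([i3d_branch(video), vivit_branch(video)])
    return 1.0 / (1.0 + np.exp(-(fused @ weights + bias)))

# A random 16-frame, 8x8 RGB clip stands in for a real video.
clip = rng.standard_normal((16, 8, 8, 3))
weights = rng.standard_normal(512 + 256) * 0.01       # fused dim = 768
score = hybrid_score(clip, weights)
```

The point of the sketch is the complementarity argument from the abstract: the two branches see the clip differently, and the classifier operates on their concatenation, so anomalies visible to either branch can influence the final score.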
Citation
Ramanaharan, R. (2024). A Machine Learning-Based Approach To Analyze And Detect DeepFakes. [Unpublished Masters Thesis]. Melbourne Institute of Technology.

Files:

Name: Ramcharan_A_Machine_Learning_Based_Approach_to_Analyse_and_Detect_DeepFakes_Revised.pdf
Size: 3008.397 KB
Format: application/pdf