Original Article
Unveiling Deepfake Detection Using Vision Transformers: A Survey and Experimental Study
Pritesh Patil¹
Govind Dayma²
Sujay Farkade³
Harshvardhan Pawar⁴
Swayam Pilare⁵
¹Professor, Department of Information Technology, AISSMS Institute of Information Technology, Pune, Maharashtra, India. ²,³,⁴,⁵Department of Information Technology, AISSMS Institute of Information Technology, Pune, Maharashtra, India.
Published Online: January-April 2026
Pages: 29-40
https://www.doi.org/10.59256/indjcst.20260501005
References
1. V.-N. Tran, S. Kwon, S.-H. Lee, H.-S. Le, and K.-R. Kwon, "Generalization of Forgery Detection With Meta Deepfake Detection Model," IEEE Access, vol. 10, pp. 1–1, 2022.
2. Y. Xu, T. Jin, Y. Xu, X. Shi, S. Chen, W. Sun, Y. Xue, and H. Wu, "Transformer image recognition system based on deep learning," in 2019 6th International Conference on Systems and Informatics (ICSAI), pp. 1606–1610, 2019.
3. A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097–1105, 2012.
4. A. Howard et al., "MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications," arXiv preprint, 2017.
5. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, pp. 1–9, 2015.
6. B. Valarmathi, N. S. Gupta, G. Prakash, R. H. Reddy, S. Saravanan, and P. Shanmugasundaram, "Hybrid Deep Learning Algorithms for Dog Breed Identification—A Comparative Analysis," IEEE Access, vol. 11, pp. 77228–77239, 2023.
7. D. Ulyanov, V. Lebedev, A. Vedaldi, and V. Lempitsky, "Texture Networks: Feed-forward Synthesis of Textures and Stylized Images," arXiv preprint arXiv:1603.03417, 2016, doi: 10.48550/arXiv.1603.03417.
8. J. Bao, D. Chen, F. Wen, H. Li, and G. Hua, "CVAE-GAN: Fine-Grained Image Generation through Asymmetric Training," Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017.
9. B. K. Durga, V. Rajesh, S. Jagannadham, P. S. Kumar, A. N. Z. Rashed, and K. Saikumar, "Deep Learning-Based Micro Facial Expression Recognition Using an Adaptive Tiefes FCNN Model," Traitement du Signal, June 2023.
10. I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative Adversarial Networks," Advances in Neural Information Processing Systems, vol. 27, 2014, doi: 10.1145/3422622.
11. J. Hu, L. Shen, and G. Sun, "Squeeze-and-Excitation Networks," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018.
12. J. Donahue, P. Krähenbühl, and T. Darrell, "Adversarial Feature Learning," arXiv preprint, 2016.
13. J. Huang, A. Rathod, C. Sun, M. Zhu, A. Korattikara, A. Fathi, I. Fischer, Z. Wojna, Y. Song, S. Guadarrama, et al., "Speed/Accuracy Trade-Offs for Modern Convolutional Object Detectors," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
14. J. Johnson, A. Alahi, and L. Fei-Fei, "Perceptual Losses for Real-Time Style Transfer and Super-Resolution," European Conference on Computer Vision (ECCV), 2016.
15. J. T. Barron, "A General and Adaptive Robust Loss Function," arXiv preprint, 2019.
16. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, "Attention Is All You Need," arXiv preprint, 2017.
17. M. M. Mohsan, M. U. Akram, G. Rasool, and N. S. Alghamdi, "Vision Transformer and Language Model Based Radiology Report Generation," IEEE Access, vol. 11, pp. 1814–1824, 2022.
18. H.-C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, and D. Mollura, "Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning," IEEE Transactions on Medical Imaging, 2016.
19. L. Bashmal, Y. Bazi, and M. Al Rahhal, "Deep Vision Transformers for Remote Sensing Scene Classification," 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021.
20. N. M. Alnaim, Z. M. Almutairi, M. S. Alsuwat, H. H. Alalawi, A. Alshobaili, and F. S. Alenezi, "DFFMD: A Deepfake Face Mask Dataset for Infectious Disease Era With Deepfake Detection Algorithms," IEEE Access, vol. 11, pp. 16711–16722, 2023.
21. T. Wang, X. Liao, K. P. Chow, X. Lin, and Y. Wang, "Deepfake Detection: A Comprehensive Study from the Reliability Perspective," arXiv preprint, 2022.
22. L. Lu, Y. Yi, F. Huang, K. Wang, and Q. Wang, "Integrating Local CNN and Global CNN for Script Identification in Natural Scene Images," IEEE Access, vol. 7, pp. 52669–52679, 2019.
23. D. Afchar, V. Nozick, J. Yamagishi, and I. Echizen, "MesoNet: A Compact Facial Video Forgery Detection Network," arXiv preprint, 2018.
24. X. Huang, Y. Li, O. Poursaeed, J. Hopcroft, and S. Belongie, "Stacked Generative Adversarial Networks," arXiv preprint, 2017.
25. F. Marra, D. Gragnaniello, D. Cozzolino, and L. Verdoliva, "Detection of GAN-Generated Fake Images over Social Networks," in Proceedings of the IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), 2018.
26. P. Baldi, P. Sadowski, and D. Whiteson, "Searching for Exotic Particles in High-Energy Physics with Deep Learning," arXiv preprint, 2014.
27. A. Doshi, A. Venkatadri, S. Kulkarni, V. Athavale, A. Jagarlapudi, S. Suratkar, and F. Kazi, "Realtime Deepfake Detection using Video Vision Transformer," in Proceedings of the IEEE Bombay Section Signature Conference (IBSSC), 2022.
28. L. Wolf, T. Hassner, and I. Maoz, "Face Recognition in Unconstrained Videos with Matched Background Similarity," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011.
29. R. Wei, C. Garcia, A. El-Sayed, V. Peterson, and A. Mahmood, "Variations in Variational Autoencoders - A Comparative Evaluation," IEEE Access, vol. 8, pp. 153651–153670, 2020.
30. L. M. Dang, S. I. Hassan, S. Im, and H. Moon, "Face Image Manipulation Detection Based on a Convolutional Neural Network," Expert Systems with Applications, 2019.
31. T. F. Cootes and C. J. Taylor, "Statistical Models of Appearance for Computer Vision," Foundation and Trends in Computer Graphics and Vision, 2004.
32. Y. Bazi, L. Bashmal, M. M. Al Rahhal, R. Al Dayil, and N. Al Ajlan, "Vision Transformers for Remote Sensing Image Classification," Remote Sensing, 2021.
33. L. Li, J. Bao, T. Zhang, H. Yang, D. Chen, F. Wen, and B. Guo, "Face X-ray for More General Face Forgery Detection," arXiv preprint, 2019.
34. N. Waqas, S. I. Safie, K. A. Kadir, S. Khan, and M. H. K. Khel, "DEEPFAKE Image Synthesis for Data Augmentation," IEEE Access, vol. 10, pp. 80847-80857, July 25, 2022.
35. Z. Guo, G. Yang, J. Chen, and X. Sun, "Fake face detection via adaptive manipulation traces extraction network," Computer Vision and Image Understanding, vol. 204, p. 103170, Mar. 2021.
36. L. Trinh, M. Tsang, S. Rambhatla, and Y. Liu, "Interpretable and Trustworthy Deepfake Detection via Dynamic Prototypes," arXiv preprint, Jun. 2020.
37. D. Liu, Z. Dang, C. Peng, Y. Zheng, S. Li, N. Wang, and X. Gao, "FedForgery: Generalized Face Forgery Detection With Residual Federated Learning," IEEE Transactions on Information Forensics and Security, vol. 18, pp. 4272-4284, 2023.
38. M. Barni, K. Kallas, E. Nowroozi, and B. Tondi, "CNN Detection of GAN-Generated Face Images based on Cross-Band Co-occurrences Analysis," arXiv preprint, Jul. 2020.
39. Z. Cao, T. Simon, S. Wei, and Y. Sheikh, "Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017.
40. H. A. Khalil and S. A. Maged, "Deepfakes Creation and Detection Using Deep Learning," in 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), 2021.
41. A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner, "FaceForensics++: Learning to Detect Manipulated Facial Images," arXiv preprint, 2019.
42. J. Chen, X. Wang, Z. He, and X. Peng, "A Comprehensive Exploration on Detecting Fake Images Generated by Stable Diffusion," Proc. Int. Conf. Pattern Recognition and Artificial Intelligence, 2024, pp. 1–10, doi: 10.1007/978-981-97-8487-5_32.
43. A. Yermakov, J. Cech, and J. Matas, "Unlocking the Hidden Potential of CLIP in Generalizable Deepfake Detection," arXiv preprint, 2025.
44. A. Xi and E. Chen, "Classifying Deepfakes Using Swin Transformers," arXiv preprint, 2025.
45. S. Chen, S. Ding, R. Ji, H. Liu, K. Sun, X. Sun, and T. Yao, "DiffusionFake: Enhancing Generalization in Deepfake Detection via Guided Stable Diffusion," Advances in Neural Information Processing Systems (NeurIPS), vol. 37, 2024, pp. 101474–101497, doi: 10.52202/079017-3218.