

ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset
The proliferation of deepfakes and AI-generated content has led to a surge in media forgeries and misinformation, necessitating robust detection systems. However, current datasets lack diversity across modalities, languages, and real-world scenarios. To address this gap, we present ILLUSION (Integration of Life-Like Unique Synthetic Identities and Objects from Neural Networks), a large-scale, multi-modal deepfake dataset comprising 1.3 million samples spanning audio-visual forgeries, 26 languages, challenging noisy environments, and various manipulation protocols. Generated using 28 state-of-the-art generative techniques, ILLUSION includes faceswaps, audio spoofing, synchronized audio-video manipulations, and synthetic media while ensuring a balanced representation of gender and skin tone for unbiased evaluation. Using Jaccard Index and UpSet plot analysis, we demonstrate ILLUSION’s distinctiveness and minimal overlap with existing datasets, emphasizing its novel generative coverage. We benchmarked image, audio, video, and multi-modal detection models, revealing key challenges such as performance degradation in multilingual and multi-modal contexts, vulnerability to real-world distortions, and limited generalization to zero-day attacks. By bridging synthetic and real-world complexities, ILLUSION provides a challenging yet essential platform for advancing deepfake detection research.
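As a rough illustration of the overlap analysis mentioned above, the Jaccard index of two sets A and B is |A ∩ B| / |A ∪ B|, so values near 0 indicate little shared coverage. The sketch below is a minimal example, not the authors' implementation: the function name `jaccard_index` and the technique identifiers (`gen_1`, `gen_2`, ...) are hypothetical placeholders, not actual dataset contents.

```python
def jaccard_index(a, b):
    """Return |A ∩ B| / |A ∪ B| for two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets share nothing measurable; define the index as 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical placeholder identifiers for the generative techniques
# covered by two datasets (not real data from ILLUSION or any other dataset).
dataset_a = {"gen_1", "gen_2", "gen_3", "gen_4"}
dataset_b = {"gen_3", "gen_4", "gen_5"}

score = jaccard_index(dataset_a, dataset_b)
print(f"Jaccard index: {score:.2f}")  # 2 shared / 5 total -> 0.40
```

A lower score between two datasets' technique sets suggests that one covers generative methods the other does not.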
License Agreement + Citation
The code and data for this research will be made available soon.
For more info, please refer to the following paper:
Kartik Thakral*, Rishabh Ranjan*, Akanksha Singh, Akshat Jain, Mayank Vatsa, Richa Singh, "ILLUSION: Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset." International Conference on Learning Representations (ICLR) (Accepted)
@inproceedings{DBLP:conf/iclr/ThakralRSJV025,
  author    = {Kartik Thakral and Rishabh Ranjan and Akanksha Singh and Akshat Jain and Mayank Vatsa and Richa Singh},
  title     = {{ILLUSION:} Unveiling Truth with a Comprehensive Multi-Modal, Multi-Lingual Deepfake Dataset},
  booktitle = {The Thirteenth International Conference on Learning Representations, {ICLR} 2025, Singapore, April 24-28, 2025},
  publisher = {OpenReview.net},
  year      = {2025},
  url       = {https://openreview.net/forum?id=qnlG3zPQUy},
  timestamp = {Thu, 15 May 2025 17:19:05 +0200},
  biburl    = {https://dblp.org/rec/conf/iclr/ThakralRSJV025.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}