2025:Song Deepfake Detection
Task Description
The Song Deepfake Detection Challenge 2025 builds upon last year's Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Whereas the previous challenge focused solely on vocal deepfakes, this year's challenge also considers AI-generated background music. We invite participants to develop systems that analyze both the musical accompaniment and the singing voice to detect whether a song contains any AI-generated elements. Submissions that jointly model vocals and music or explore their interactions are especially encouraged.
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.
- Background
- The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.
- Building on the success of our 2024 SingFake [1] and SVDD [2] challenges (featuring the CtrSVDD and WildSVDD tracks), we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. The recently proposed SONICS dataset [4] has further enriched this research direction. With this year's expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.
- [1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. "SingFake: Singing voice deepfake detection." In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184
- [2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. "SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge." In Proc. IEEE Spoken Language Technology (SLT), 2024. https://ieeexplore.ieee.org/document/10832284
- [3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. "CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection." In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242
- [4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. "SONICS: Synthetic Or Not – Identifying Counterfeit Songs." In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8
Contact: Neil Zhang
Dataset
- WildSVDD Description
- The WildSVDD dataset is an extension of the SingFake dataset, expanded to a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during annotation to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the "Download" section for details.
- The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.
- The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.
- SONICS Description
- The SONICS dataset, introduced in the ICLR 2025 paper, is a large-scale collection designed for end-to-end synthetic song detection. It consists of over 97,000 songs, amounting to a total of 4,751 hours of audio. This dataset includes 49,074 synthetic songs generated by AI platforms like Suno and Udio, and 48,090 real songs sourced from YouTube. The synthetic songs cover a wide range of genres, music styles, and song lengths (32 to 240 seconds), while the real songs come from 9,096 different artists.
- The SONICS dataset is divided into training, test, and validation splits. The training set contains 77,409 songs: 66,709 real and 10,700 synthetic, with the synthetic songs further divided into categories such as Full Fake, Mostly Fake, and Half Fake. The test set includes 9,269 songs (3,396 real and 5,873 synthetic), using the same categories as the training set. The validation set consists of 4,486 songs (1,566 real and 2,920 synthetic).
For this year's song deepfake detection challenge, we will use the test sets of both WildSVDD and SONICS and rank systems by pooled EER. Participants will need to submit score files that provide a score for each sample.
Baseline
- Model Architecture
- Participants are referred to the baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques such as graph modeling. Key features of these baselines include robust handling of background music and adaptation to different musical styles. Results of the SingFake baseline systems on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].
- [1] SingFake: https://github.com/yongyizang/SingFake
- [2] SingGraph: https://github.com/xjchenGit/SingGraph
- [3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132
Metrics
The primary evaluation metric is the Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. Unlike accuracy, EER does not depend on a fixed decision threshold, making it a more reliable assessment of how well a system separates bonafide from deepfake singing voices. A lower EER indicates better discrimination between real and AI-generated content.
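For reference, the sketch below shows one way to compute EER from system scores, using scikit-learn's ROC utilities. It assumes higher scores mean "more likely bonafide" and that labels use 1 for bonafide and 0 for deepfake; these conventions are illustrative, and this is not the official scoring script.

```python
# Minimal EER sketch (illustrative, not the official scoring script).
# Assumed conventions: labels are 1 = bonafide, 0 = deepfake, and a
# higher score means the system is more confident the clip is bonafide.
import numpy as np
from sklearn.metrics import roc_curve

def compute_eer(labels: np.ndarray, scores: np.ndarray) -> float:
    """Return the EER: the operating point where the false positive rate
    (deepfakes accepted) equals the false negative rate (bonafide rejected)."""
    fpr, tpr, _ = roc_curve(labels, scores, pos_label=1)
    fnr = 1.0 - tpr
    # The ROC is sampled at a finite set of thresholds, so take the
    # point where FPR and FNR are closest to crossing.
    idx = np.nanargmin(np.abs(fnr - fpr))
    return float((fpr[idx] + fnr[idx]) / 2.0)

if __name__ == "__main__":
    labels = np.array([1, 1, 1, 0, 0, 0])
    scores = np.array([0.9, 0.5, 0.7, 0.6, 0.2, 0.4])
    print(f"EER = {compute_eer(labels, scores):.4f}")  # 0.3333 for this toy data
```

For the pooled ranking described above, the scores and labels from both test sets are concatenated before computing a single EER.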
Download
The dataset and necessary resources can be accessed via the following links:
- Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)
- Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/
- Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)
- SONICS dataset download: [Huggingface SONICS](https://huggingface.co/datasets/awsaf49/sonics)
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to svddchallenge@gmail.com.
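As one example of using the listed tools programmatically, the sketch below calls yt-dlp's Python API to fetch a clip's audio and convert it to WAV. The output template and codec choice are our assumptions; adapt them to your pipeline, and note that ffmpeg must be installed for the conversion step.

```python
# Illustrative download sketch using yt-dlp's Python API (one of the
# tools listed above). Requires `pip install yt-dlp` and ffmpeg on PATH.
from yt_dlp import YoutubeDL

def download_audio(url: str, out_dir: str = "wildsvdd_raw") -> None:
    """Download the best available audio stream and convert it to WAV."""
    opts = {
        "format": "bestaudio/best",
        # Output template is an assumption; name files however your
        # segmentation pipeline expects.
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "wav"},
        ],
    }
    with YoutubeDL(opts) as ydl:
        ydl.download([url])

if __name__ == "__main__":
    # Hypothetical URL; real URLs come from the WildSVDD annotations.
    download_audio("https://www.youtube.com/watch?v=EXAMPLE_ID")
```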
Rules
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.
Submission
- Submission Deadline: Aug 25, 2025, AoE (Anywhere on Earth). The leaderboard will be released shortly after the deadline.
- Results submission
- Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence that each clip is bonafide or deepfake. Submissions will be evaluated by EER and ranked accordingly (a sketch of one possible score-file writer follows this list).
- System description submission
- Participants are required to describe their systems, including data preprocessing, model architecture, training details, post-processing, etc.
- Research paper submission
- Participants are encouraged to submit a research paper to the MIREX track at ISMIR 2025.
- Workshop presentation
- We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.
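To make the score file concrete, here is a minimal sketch of a writer that emits the fields listed under "Results submission" (URL, segment start/end timestamps, and score). The column order and whitespace delimiter shown are illustrative assumptions, not a prescribed format.

```python
# Illustrative score-file writer. The line layout
# (<url> <start_sec> <end_sec> <score>) is an assumption based on the
# fields listed above; higher scores mean "more confident bonafide".
from typing import Iterable, Tuple

def write_scores(path: str,
                 rows: Iterable[Tuple[str, float, float, float]]) -> None:
    """Write one whitespace-separated line per segment."""
    with open(path, "w", encoding="utf-8") as f:
        for url, start, end, score in rows:
            f.write(f"{url} {start:.2f} {end:.2f} {score:.6f}\n")

if __name__ == "__main__":
    write_scores("scores.txt", [
        # Hypothetical segment: a clip spanning 12.0 s to 18.5 s.
        ("https://www.youtube.com/watch?v=EXAMPLE_ID", 12.0, 18.5, 0.873),
    ])
```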
Please send your submission to Neil Zhang, and contact us with any questions about the challenge.