<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/index.php?action=history&amp;feed=atom&amp;title=2025%3ASong_Deepfake_Detection</id>
	<title>2025:Song Deepfake Detection - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/index.php?action=history&amp;feed=atom&amp;title=2025%3ASong_Deepfake_Detection"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;action=history"/>
	<updated>2026-04-30T21:58:54Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14692&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Baseline */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14692&amp;oldid=prev"/>
		<updated>2025-06-30T04:43:03Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Baseline&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 04:43, 30 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l33&quot; &gt;Line 33:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 33:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Model Architecture&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Model Architecture&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;]. A winning solution from the SVDD Challenge 2024 could also be referenced [4&lt;/ins&gt;]. &amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[1] SingFake: https://github.com/yongyizang/SingFake&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[1] SingFake: https://github.com/yongyizang/SingFake&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l40&quot; &gt;Line 40:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 40:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;:[4] XWSB for SVDD 2024: https://github.com/QiShanZhang/XWSB_for_SVDD2024&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Metrics =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Metrics =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14691&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Dataset */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14691&amp;oldid=prev"/>
		<updated>2025-06-30T04:41:08Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Dataset&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 04:41, 30 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l21&quot; &gt;Line 21:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 21:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;WildSVDD Description&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;WildSVDD Description&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;;&lt;/del&gt;Description &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;of Audio Files&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;SONICS &lt;/ins&gt;Description&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;audio files &lt;/del&gt;in the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;WildSVDD &lt;/del&gt;dataset &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;represent &lt;/del&gt;a &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;broad &lt;/del&gt;range of &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;languages &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;singers&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;These clips include strong background music&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;simulating &lt;/del&gt;real&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;-world conditions that challenge the distinction between &lt;/del&gt;real and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;AI-generated voices&lt;/del&gt;. The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;dataset ensures diversity in the source material&lt;/del&gt;, with &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;varying levels of complexity in the musical contexts&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;SONICS dataset, introduced &lt;/ins&gt;in the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;ICLR 2025 paper, is a large-scale collection designed for end-to-end synthetic song detection. 
It consists of over 97,000 songs, amounting to a total of 4,751 hours of audio. This &lt;/ins&gt;dataset &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;includes 49,074 synthetic songs generated by AI platforms like Suno and Udio, and 48,090 real songs sourced from YouTube. The synthetic songs cover &lt;/ins&gt;a &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;wide &lt;/ins&gt;range of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;genres, music styles, and song lengths (32 to 240 seconds), while the real songs come from 9,096 different artists.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;:The SONICS dataset is divided into three parts: training, testing, &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;validation&lt;/ins&gt;. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The training set contains 77,409 songs. Out of these, 66&lt;/ins&gt;,&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;709 are &lt;/ins&gt;real &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;songs, and 10,700 are synthetic songs, which are further divided into categories like Full Fake, Mostly Fake, and Half Fake. The test set includes 9,269 songs. It has 3,396 &lt;/ins&gt;real &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;songs &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;5,873 synthetic songs, also divided into the same categories as the training set&lt;/ins&gt;. The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;validation set consists of 4,486 songs&lt;/ins&gt;, with &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;1,566 real songs and 2,920 synthetic songs&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;;;Description &lt;/del&gt;of &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Split&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;For this year's song deepfake detection challenge, we will be using both the test sets &lt;/ins&gt;of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;WildSVDD &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;SONICS&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;and ranking by &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;pooled EER&lt;/ins&gt;. Participants &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;will need &lt;/ins&gt;to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;submit &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;score files that indicate &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;score for each sample&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;:The dataset is divided into training &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;evaluation subsets. Test Set A includes new samples&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;while Test Set B represents the most challenging subset of &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;SingFake dataset&lt;/del&gt;. Participants &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;are permitted to use the training data to create validation sets but must adhere &lt;/del&gt;to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;restrictions on &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;usage of &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;evaluation data&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Baseline =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Baseline =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14690&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Submission */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14690&amp;oldid=prev"/>
		<updated>2025-06-24T04:06:13Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Submission&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 04:06, 24 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l61&quot; &gt;Line 61:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 61:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Leader board &lt;/del&gt;release will be shortly after that.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Leaderboard &lt;/ins&gt;release will be shortly after that.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Results submission&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Results submission&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14689&amp;oldid=prev</id>
		<title>Yzyouzhang at 04:02, 24 June 2025</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14689&amp;oldid=prev"/>
		<updated>2025-06-24T04:02:48Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 04:02, 24 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l19&quot; &gt;Line 19:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 19:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Dataset =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Dataset =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Description&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;WildSVDD &lt;/ins&gt;Description&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Description of Audio Files&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;;&lt;/ins&gt;;Description of Audio Files&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Description of Split&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;;&lt;/ins&gt;;Description of Split&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l61&quot; &gt;Line 61:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 61:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;The leaderboard will be released shortly after that.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Results submission&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Results submission&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l70&quot; &gt;Line 70:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 71:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Research paper submission&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Research paper submission&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;2024&lt;/del&gt;. &amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;2025&lt;/ins&gt;. &amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Workshop presentation&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Workshop presentation&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang].&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang] &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;or contact him with any questions about the challenge&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14688&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Task Description */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14688&amp;oldid=prev"/>
		<updated>2025-06-24T03:42:06Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Task Description&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 03:42, 24 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l3&quot; &gt;Line 3:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 3:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Background&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Background&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;. The recently proposed SONICS dataset [4] further enriched this research direction&lt;/ins&gt;. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;arxiv&lt;/del&gt;.org/&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;abs&lt;/del&gt;/&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;2408.16132&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;ieeexplore.ieee&lt;/ins&gt;.org/&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;document&lt;/ins&gt;/&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;10832284&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;:[4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. &amp;quot;SONICS: Synthetic Or Not--Identifying Counterfeit Songs.&amp;quot; In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14687&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Task Description */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14687&amp;oldid=prev"/>
		<updated>2025-06-24T03:14:39Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Task Description&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 03:14, 24 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot; &gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Task Description =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Task Description =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;WildSVDD &lt;/del&gt;challenge &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;aims &lt;/del&gt;to detect AI-generated singing voices in real-world scenarios&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level&lt;/del&gt;. Participants &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;are required to identify &lt;/del&gt;whether &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;each segmented &lt;/del&gt;clip &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;contains &lt;/del&gt;a genuine singer or an AI-generated &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;fake singer&lt;/del&gt;. The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;developed systems are expected &lt;/del&gt;to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;account for &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;complexities introduced by background music &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;various musical contexts&lt;/del&gt;. 
For more information about our &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;prior &lt;/del&gt;work, please visit: https://main.singfake.org/&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s &lt;/ins&gt;challenge &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components &lt;/ins&gt;to detect &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;In 2024, we introduced the WildSVDD track, which focused on detecting &lt;/ins&gt;AI-generated singing voices in real-world scenarios. Participants &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;were tasked with identifying &lt;/ins&gt;whether &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;a given song &lt;/ins&gt;clip &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;contained &lt;/ins&gt;a genuine &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;human &lt;/ins&gt;singer or an AI-generated &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;one, often in the presence of complex background music&lt;/ins&gt;. The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;2025 challenge extends this setting &lt;/ins&gt;to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;include potential deepfakes in both the vocals and instrumental parts, increasing &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;difficulty &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;relevance of the task&lt;/ins&gt;. For more information about our &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;previous &lt;/ins&gt;work, please visit: https://main.singfake.org/&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Background&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Background&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;With the &lt;/del&gt;advancement of AI &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;These synthesized voices &lt;/del&gt;can &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;now emulate the &lt;/del&gt;vocal characteristics &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;of any singer &lt;/del&gt;with minimal training data. While this &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;technological advancement is impressive&lt;/del&gt;, it &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;has sparked widespread concerns among artists&lt;/del&gt;, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;record labels&lt;/del&gt;, and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;publishing houses&lt;/del&gt;. 
&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;The potential for unauthorized synthetic reproductions &lt;/del&gt;that mimic well-known &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;singers poses &lt;/del&gt;a &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;real &lt;/del&gt;threat to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;original artists' commercial value and &lt;/del&gt;intellectual property rights&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;, igniting urgent calls for efficient &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;accurate methods to detect these deepfake singing voices&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;The rapid &lt;/ins&gt;advancement of &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;generative &lt;/ins&gt;AI &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;has enabled the creation of highly realistic synthetic songs&lt;/ins&gt;. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Today’s models &lt;/ins&gt;can &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;not only replicate a singer’s &lt;/ins&gt;vocal characteristics with minimal training data &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;but also produce convincing musical accompaniments&lt;/ins&gt;. 
While this &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;technology opens exciting creative possibilities&lt;/ins&gt;, it &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;also raises significant ethical&lt;/ins&gt;, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;legal&lt;/ins&gt;, and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;commercial concerns&lt;/ins&gt;. &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Deepfake songs &lt;/ins&gt;that mimic well-known &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;artists and musical styles pose &lt;/ins&gt;a &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;growing &lt;/ins&gt;threat to intellectual property rights and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the integrity of music distribution platforms&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;This challenge is an extension &lt;/del&gt;of our &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;precious work &lt;/del&gt;SingFake [1] and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) &lt;/del&gt;[2] &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;with &lt;/del&gt;CtrSVDD &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;track &lt;/del&gt;and WildSVDD &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;track&lt;/del&gt;. The CtrSVDD track [3] &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;garnered significant attention &lt;/del&gt;from the speech &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;community&lt;/del&gt;. 
&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;We aim &lt;/del&gt;to &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;raise &lt;/del&gt;more &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;awareness for WildSVDD within &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;ISMIR community &lt;/del&gt;and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;leverage &lt;/del&gt;the &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;expertise of &lt;/del&gt;music &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;experts.&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Building on the success &lt;/ins&gt;of our &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;2024 &lt;/ins&gt;SingFake [1] and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;SVDD &lt;/ins&gt;[2] &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;challenges—featuring the &lt;/ins&gt;CtrSVDD and WildSVDD &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;tracks—we aim to further elevate the visibility of this problem within the broader music research community&lt;/ins&gt;. The CtrSVDD track [3]&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;, focusing on controlled vocal synthesis detection, drew strong engagement &lt;/ins&gt;from the speech &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;research field&lt;/ins&gt;. 
&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;With this year’s expanded challenge, we hope &lt;/ins&gt;to &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;bring &lt;/ins&gt;more &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;attention to &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;complex problem of detecting deepfakes in complete musical compositions &lt;/ins&gt;and &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;to foster interdisciplinary collaboration between &lt;/ins&gt;the &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;audio forensics and &lt;/ins&gt;music &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;information retrieval communities&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee&lt;/del&gt;.&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;org/document/10448184 &lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Contact: [mailto:you.zhang@rochester.edu Neil &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;Zhang] &amp;amp; [mailto:yixiao.zhang@qmul.ac.uk Yixiao &lt;/del&gt;Zhang]&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Dataset =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Dataset =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14686&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Submission */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14686&amp;oldid=prev"/>
		<updated>2025-06-06T02:19:39Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Submission&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 02:19, 6 June 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l59&quot; &gt;Line 59:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 59:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Submission =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Submission =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* '''Submission Deadline: &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;October 20&lt;/del&gt;, AOE'''&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* '''Submission Deadline: &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Aug 25, 2025&lt;/ins&gt;, AOE'''&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Results submission&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;;Results submission&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14641&amp;oldid=prev</id>
		<title>Yzyouzhang: /* Download */</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14641&amp;oldid=prev"/>
		<updated>2025-05-29T02:39:00Z</updated>

		<summary type="html">&lt;p&gt;‎&lt;span dir=&quot;auto&quot;&gt;&lt;span class=&quot;autocomment&quot;&gt;Download&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 02:39, 29 May 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l49&quot; &gt;Line 49:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 49:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;* SONICS dataset download: [Huggingface SONICS](https://huggingface.co/datasets/awsaf49/sonics)&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14639&amp;oldid=prev</id>
		<title>Junyan: Junyan moved page 2025:Singing Voice Deepfake Detection to 2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14639&amp;oldid=prev"/>
		<updated>2025-05-29T02:36:11Z</updated>

		<summary type="html">&lt;p&gt;Junyan moved page &lt;a href=&quot;/mirex/wiki/2025:Singing_Voice_Deepfake_Detection&quot; class=&quot;mw-redirect&quot; title=&quot;2025:Singing Voice Deepfake Detection&quot;&gt;2025:Singing Voice Deepfake Detection&lt;/a&gt; to &lt;a href=&quot;/mirex/wiki/2025:Song_Deepfake_Detection&quot; title=&quot;2025:Song Deepfake Detection&quot;&gt;2025:Song Deepfake Detection&lt;/a&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;1&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;1&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 02:36, 29 May 2025&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-notice&quot; lang=&quot;en&quot;&gt;&lt;div class=&quot;mw-diff-empty&quot;&gt;(No difference)&lt;/div&gt;
&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;</summary>
		<author><name>Junyan</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14638&amp;oldid=prev</id>
		<title>Junyan: Created page with &quot;= Task Description =  The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs fr...&quot;</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14638&amp;oldid=prev"/>
		<updated>2025-05-29T02:33:09Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;= Task Description =  The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs fr...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang] &amp;amp; [mailto:yixiao.zhang@qmul.ac.uk Yixiao Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded into a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during annotation to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &quot;Download&quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques such as graph modeling. Key features of these baselines include robust handling of background music and adaptation to different musical styles. Results showing how the SingFake baseline systems perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary evaluation metric is the Equal Error Rate (EER), the operating point at which the false acceptance rate equals the false rejection rate. EER is preferred over accuracy because it does not depend on a fixed decision threshold, providing a more reliable assessment of how well a system separates bonafide from deepfake singing voices. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
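As a minimal illustrative sketch (not the official scoring script), EER can be computed from per-clip scores as follows; the convention that a higher score indicates a bonafide clip is an assumption here:&lt;br /&gt;

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate: the operating point where the false acceptance
    rate (deepfake clips accepted as bonafide) equals the false rejection
    rate (bonafide clips rejected). Assumes higher score means bonafide."""
    # Candidate thresholds: every observed score
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # Accept a clip when its score is at or above the threshold t
    far = np.array([np.mean(spoof_scores >= t) for t in thresholds])
    frr = np.array([1.0 - np.mean(bonafide_scores >= t) for t in thresholds])
    # EER is where the two error-rate curves cross
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

bonafide = np.array([0.9, 0.8, 0.6])
spoof = np.array([0.1, 0.3, 0.7])
print(compute_eer(bonafide, spoof))
```

This sketch takes the EER at the candidate threshold where the two error rates are closest; evaluation toolkits typically interpolate between thresholds instead, which can give slightly different values on small score sets.&lt;br /&gt;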
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 20, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, the segment start and end timestamps, and the corresponding scores indicating the system's confidence that each clip is bonafide or deepfake. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
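For illustration only, one plausible layout of such a score file is sketched below; the column order, delimiter, timestamp units, and example URLs are assumptions rather than an official specification, so follow the organizers' instructions if they differ:&lt;br /&gt;

```
# url                                start_sec  end_sec  score
https://example.com/watch?v=abc123   12.5       18.0     0.91
https://example.com/watch?v=def456   0.0        6.2      -0.37
```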
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025.&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang].&lt;/div&gt;</summary>
		<author><name>Junyan</name></author>
		
	</entry>
</feed>