<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Yzyouzhang</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Yzyouzhang"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/Yzyouzhang"/>
	<updated>2026-04-30T23:01:04Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14777</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14777"/>
		<updated>2025-09-11T04:29:48Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Results=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- &lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 48.92&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 43.96&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 1.75&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline3&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14776</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14776"/>
		<updated>2025-09-11T04:28:50Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Results=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 48.92&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 43.96&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 1.75&lt;br /&gt;
|-&lt;br /&gt;
| Baseline3&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14775</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14775"/>
		<updated>2025-09-11T04:28:22Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Results=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|-&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 48.92&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 43.96&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 1.75&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline3&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14774</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14774"/>
		<updated>2025-09-11T04:27:56Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Results=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|-&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 48.92&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 43.96&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 1.75&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline3&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14773</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14773"/>
		<updated>2025-09-11T04:27:26Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baselines */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Results=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 48.92&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 43.96&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 1.75&lt;br /&gt;
|-&lt;br /&gt;
| Baseline3&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14772</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14772"/>
		<updated>2025-09-11T04:26:47Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baselines */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Baselines=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 48.92&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 43.96&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 1.75&lt;br /&gt;
|-&lt;br /&gt;
| Baseline3&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14769</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14769"/>
		<updated>2025-09-11T03:06:11Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baselines */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Baselines=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
|-&lt;br /&gt;
| Baseline3&lt;br /&gt;
| SpecTTTra - Mixtures&lt;br /&gt;
| SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | &lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | &lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | &lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14768</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14768"/>
		<updated>2025-09-11T02:05:36Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baselines */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Baselines=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14767</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14767"/>
		<updated>2025-09-11T02:04:49Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baselines */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Baselines=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! Trained on the training set of&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| Baseline1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| WildSVDD&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
|-&lt;br /&gt;
| Baseline2&lt;br /&gt;
| Wav2vec2 + AASIST - Mixtures&lt;br /&gt;
| WildSVDD + SONICS&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 6.14&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 20.82&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 2.05&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14766</id>
		<title>2025:Song Deepfake Detection Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection_Results&amp;diff=14766"/>
		<updated>2025-09-11T01:47:31Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: Created page with &amp;quot;=Baselines=  {| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot; |- style=&amp;quot;font-weight:bold;&amp;quot; ! System ! Methods Used ! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A ! st...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Baselines=&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot; style=&amp;quot;vertical-align:bottom;&amp;quot;&lt;br /&gt;
|- style=&amp;quot;font-weight:bold;&amp;quot;&lt;br /&gt;
! System&lt;br /&gt;
! Methods Used&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_A&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on WildSVDD test_B&lt;br /&gt;
! style=&amp;quot;text-align:right;&amp;quot; | EER on SONICS test&lt;br /&gt;
|- style=&amp;quot;background-color:#FF0;&amp;quot;&lt;br /&gt;
| UNIBS1&lt;br /&gt;
| RawNet - Mixtures&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 9.44&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 30.85&lt;br /&gt;
| style=&amp;quot;text-align:right;&amp;quot; | 24.36&lt;br /&gt;
&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14692</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14692"/>
		<updated>2025-06-30T04:43:03Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baseline */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;br /&gt;
&lt;br /&gt;
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;br /&gt;
&lt;br /&gt;
:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. The SONICS dataset recently proposed [4] further enriched this research direction. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://ieeexplore.ieee.org/document/10832284&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. &amp;quot;CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.&amp;quot; In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
:[4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. &amp;quot;SONICS: Synthetic Or Not--Identifying Counterfeit Songs.&amp;quot; In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;WildSVDD Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
;SONICS Description&lt;br /&gt;
:The SONICS dataset, introduced in the ICLR 2025 paper, is a large-scale collection designed for end-to-end synthetic song detection. It consists of over 97,000 songs, amounting to a total of 4,751 hours of audio. This dataset includes 49,074 synthetic songs generated by AI platforms like Suno and Udio, and 48,090 real songs sourced from YouTube. The synthetic songs cover a wide range of genres, music styles, and song lengths (32 to 240 seconds), while the real songs come from 9,096 different artists.&lt;br /&gt;
:The SONICS dataset is divided into three parts: training, testing, and validation. The training set contains 77,409 songs. Out of these, 66,709 are real songs, and 10,700 are synthetic songs, which are further divided into categories like Full Fake, Mostly Fake, and Half Fake. The test set includes 9,269 songs. It has 3,396 real songs and 5,873 synthetic songs, also divided into the same categories as the training set. The validation set consists of 4,486 songs, with 1,566 real songs and 2,920 synthetic songs.&lt;br /&gt;
&lt;br /&gt;
For this year's song deepfake detection challenge, we will be using both the test sets of WildSVDD and SONICS, and ranking the pooled EER. Participants will need to submit the score files that indicate the scoring for each sample.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Results showing how the SingFake baseline systems perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. A winning solution from the SVDD Challenge 2024 can also be referenced [4].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[4] XWSB for SVDD 2024: https://github.com/QiShanZhang/XWSB_for_SVDD2024&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is the Equal Error Rate (EER): the error rate at the decision threshold where the false acceptance rate (deepfake clips accepted as bonafide) equals the false rejection rate (bonafide clips rejected as deepfake). EER is preferred over accuracy because it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated songs.&lt;br /&gt;
&lt;br /&gt;
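To make the metric concrete, the following is a minimal sketch of how a pooled EER could be computed from per-sample scores. It assumes the convention that a higher score means &amp;quot;more likely bonafide&amp;quot;; this is an illustration only, not the official scoring script.&lt;br /&gt;

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """EER: the error rate at the decision threshold where the false
    acceptance rate (FAR) equals the false rejection rate (FRR)."""
    thresholds = np.sort(np.unique(np.concatenate([bonafide_scores, spoof_scores])))
    # FRR: fraction of bonafide clips scored below the threshold (rejected).
    frr = np.array([np.mean(np.less(bonafide_scores, t)) for t in thresholds])
    # FAR: fraction of deepfake clips scored at or above the threshold (accepted).
    far = np.array([np.mean(np.greater_equal(spoof_scores, t)) for t in thresholds])
    # The EER lies where the two error curves cross.
    idx = np.argmin(np.abs(frr - far))
    return (frr[idx] + far[idx]) / 2.0

# Toy example: well-separated score distributions yield a low EER.
rng = np.random.default_rng(0)
bona = rng.normal(2.0, 1.0, 500)    # hypothetical bonafide scores
spoof = rng.normal(-2.0, 1.0, 500)  # hypothetical deepfake scores
eer = compute_eer(bona, spoof)
print(round(eer, 3))
```

For the challenge ranking described above, the scores from both the WildSVDD and SONICS test sets would be pooled before computing a single EER.&lt;br /&gt;
&lt;br /&gt;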
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
* SONICS dataset download: [https://huggingface.co/datasets/awsaf49/sonics Huggingface SONICS]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
The leaderboard will be released shortly after the deadline.&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
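As an illustration only, such a score file could be assembled as tab-separated lines; the column order, delimiter, and score convention shown here are assumptions, not an official specification.&lt;br /&gt;

```python
# Hypothetical score-file writer: one line per segment with the source URL,
# the segment start/end timestamps in seconds, and the system's score.
# Column order, delimiter, and score convention are illustrative assumptions.
rows = [
    ("https://example.com/watch?v=abc123", 12.0, 18.5, 0.91),
    ("https://example.com/watch?v=def456", 0.0, 6.2, -0.37),
]
lines = ["\t".join([url, f"{start:.2f}", f"{end:.2f}", f"{score:.4f}"])
         for url, start, end, score in rows]
with open("scores.txt", "w") as f:
    f.write("\n".join(lines) + "\n")
print(lines[0])
```
&lt;br /&gt;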
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang], and contact him with any questions about the challenge.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14691</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14691"/>
		<updated>2025-06-30T04:41:08Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Dataset */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;br /&gt;
&lt;br /&gt;
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;br /&gt;
&lt;br /&gt;
:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. The recently proposed SONICS dataset [4] has further enriched this research direction. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://ieeexplore.ieee.org/document/10832284&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
:[4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. &amp;quot;SONICS: Synthetic Or Not--Identifying Counterfeit Songs.&amp;quot; In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;WildSVDD Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
;SONICS Description&lt;br /&gt;
:The SONICS dataset, introduced in the ICLR 2025 paper, is a large-scale collection designed for end-to-end synthetic song detection. It consists of over 97,000 songs, amounting to a total of 4,751 hours of audio. This dataset includes 49,074 synthetic songs generated by AI platforms like Suno and Udio, and 48,090 real songs sourced from YouTube. The synthetic songs cover a wide range of genres, music styles, and song lengths (32 to 240 seconds), while the real songs come from 9,096 different artists.&lt;br /&gt;
:The SONICS dataset is divided into three parts: training, testing, and validation. The training set contains 77,409 songs. Out of these, 66,709 are real songs, and 10,700 are synthetic songs, which are further divided into categories like Full Fake, Mostly Fake, and Half Fake. The test set includes 9,269 songs. It has 3,396 real songs and 5,873 synthetic songs, also divided into the same categories as the training set. The validation set consists of 4,486 songs, with 1,566 real songs and 2,920 synthetic songs.&lt;br /&gt;
&lt;br /&gt;
For this year's song deepfake detection challenge, we will use the test sets of both WildSVDD and SONICS and rank systems by their pooled EER. Participants will need to submit score files that assign a score to each sample.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Results showing how the SingFake baseline systems perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
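For intuition, EER is the operating point at which the false acceptance rate (deepfake clips scored as bonafide) equals the false rejection rate (bonafide clips scored as deepfake). A minimal sketch of the computation, assuming the convention that a higher score means higher confidence that a clip is bonafide (the score convention itself is up to each submission):&lt;br /&gt;

```python
def compute_eer(bonafide_scores, deepfake_scores):
    """Estimate the Equal Error Rate by sweeping thresholds over all scores.

    Assumes higher scores indicate bonafide; that convention is an
    assumption for illustration, not part of the challenge specification.
    """
    candidates = []
    for t in sorted(bonafide_scores + deepfake_scores):
        # False acceptance rate: deepfakes whose score clears the threshold.
        far = sum(s >= t for s in deepfake_scores) / len(deepfake_scores)
        # False rejection rate: bonafide clips that fail to clear it.
        frr = 1.0 - sum(s >= t for s in bonafide_scores) / len(bonafide_scores)
        candidates.append((abs(far - frr), (far + frr) / 2.0))
    # Read off the error rate where FAR and FRR are (nearly) equal.
    return min(candidates)[1]
```

A perfectly separating system attains an EER of 0, while chance-level scoring sits near 0.5 (50%).&lt;br /&gt;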
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
* SONICS dataset download: [https://huggingface.co/datasets/awsaf49/sonics Huggingface SONICS]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants may use any publicly available datasets for training, excluding any data that appears in the test sets. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
The leaderboard will be released shortly after the deadline.&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
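The fields above might be serialized one clip per line. A hypothetical sketch (the column order and whitespace delimiter here are assumptions for illustration, not the official challenge format; please confirm the exact layout with the organizers):&lt;br /&gt;

```python
def write_score_file(path, entries):
    """Write one scored clip per line: URL, start (s), end (s), score.

    The whitespace-delimited column layout is an assumption for
    illustration only; it is not the official challenge format.
    """
    with open(path, "w") as f:
        for url, start, end, score in entries:
            f.write(f"{url} {start:.2f} {end:.2f} {score:.4f}\n")
```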
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang] or contact him with any questions about the challenge.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14690</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14690"/>
		<updated>2025-06-24T04:06:13Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;br /&gt;
&lt;br /&gt;
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;br /&gt;
&lt;br /&gt;
:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. The recently proposed SONICS dataset [4] has further enriched this research direction. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://ieeexplore.ieee.org/document/10832284&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
:[4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. &amp;quot;SONICS: Synthetic Or Not--Identifying Counterfeit Songs.&amp;quot; In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;WildSVDD Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Results showing how the SingFake baseline systems perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
* SONICS dataset download: [https://huggingface.co/datasets/awsaf49/sonics Huggingface SONICS]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants may use any publicly available datasets for training, excluding any data that appears in the test sets. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
The leaderboard will be released shortly after the deadline.&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang] or contact him with any questions about the challenge.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14689</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14689"/>
		<updated>2025-06-24T04:02:48Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;br /&gt;
&lt;br /&gt;
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;br /&gt;
&lt;br /&gt;
:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. The recently proposed SONICS dataset [4] has further enriched this research direction. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://ieeexplore.ieee.org/document/10832284&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
:[4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. &amp;quot;SONICS: Synthetic Or Not--Identifying Counterfeit Songs.&amp;quot; In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;WildSVDD Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Results showing how the SingFake baseline systems perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
* SONICS dataset download: [https://huggingface.co/datasets/awsaf49/sonics Huggingface SONICS]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants may use any publicly available datasets for training, excluding any data that appears in the test sets. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
The leaderboard will be released shortly after the deadline.&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang] or contact him with any questions about the challenge.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14688</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14688"/>
		<updated>2025-06-24T03:42:06Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;br /&gt;
&lt;br /&gt;
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/ or check out the previous year's results: https://www.music-ir.org/mirex/wiki/2024:MIREX2024_Results.&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;br /&gt;
&lt;br /&gt;
:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. The SONICS dataset recently proposed [4] further enriched this research direction. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://ieeexplore.ieee.org/document/10832284&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
:[4] Rahman, Md Awsafur, Zaber Ibn Abdul Hakim, Najibul Haque Sarker, Bishmoy Paul, and Shaikh Anowarul Fattah. &amp;quot;SONICS: Synthetic Or Not--Identifying Counterfeit Songs.&amp;quot; In Proc. International Conference on Learning Representations (ICLR), 2025. https://openreview.net/forum?id=PY7KSh29Z8&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Results of the SingFake baseline systems on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
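As a concrete illustration of the metric described above, the following sketch computes EER by sweeping every observed score as a candidate threshold. This is a minimal reference computation, not the official scoring script, and it assumes that higher scores indicate bonafide audio.

```python
import numpy as np

def compute_eer(bonafide_scores, deepfake_scores):
    """Equal Error Rate: the operating point where the false acceptance
    rate (deepfakes accepted as bonafide) equals the false rejection
    rate (bonafide rejected as deepfake). Assumes higher score = more
    likely bonafide."""
    bonafide = np.asarray(bonafide_scores, dtype=float)
    deepfake = np.asarray(deepfake_scores, dtype=float)
    # Sweep every observed score as a candidate decision threshold.
    thresholds = np.sort(np.concatenate([bonafide, deepfake]))
    # FAR: fraction of deepfakes scoring at or above the threshold.
    far = np.array([(deepfake >= t).mean() for t in thresholds])
    # FRR: fraction of bonafide clips scoring below the threshold.
    frr = np.array([1.0 - (bonafide >= t).mean() for t in thresholds])
    # The EER sits where the two error curves cross.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2
```

Because the two error rates are traded off over all thresholds, the resulting number does not depend on any single operating point, which is exactly why EER is preferred over accuracy here.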
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
* SONICS dataset download: [https://huggingface.co/datasets/awsaf49/sonics Huggingface SONICS]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
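For illustration only, the fields listed above (URL, segment start and end timestamps, and a confidence score) could be serialized one segment per line as in the sketch below. The delimiter, field order, precision, and score polarity here are assumptions for the sake of a concrete example; the page does not pin down an exact file format.

```python
def format_score_line(url, start_s, end_s, score):
    # Hypothetical layout (not an official spec): one segment per line,
    # whitespace-separated: URL, start time (s), end time (s), score.
    # Score polarity (higher = more confidently bonafide) is assumed.
    return f"{url} {start_s:.2f} {end_s:.2f} {score:.4f}"

# Writing a score file for a list of (url, start, end, score) tuples:
def write_score_file(path, segments):
    with open(path, "w") as f:
        for url, start_s, end_s, score in segments:
            f.write(format_score_line(url, start_s, end_s, score) + "\n")
```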
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025.&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang].&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14687</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14687"/>
		<updated>2025-06-24T03:14:39Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The Song Deepfake Detection Challenge 2025 builds upon last year’s Singing Voice Deepfake Detection Challenge by expanding the task to a broader context: detecting AI-generated content in full songs. Unlike the previous focus solely on vocal deepfakes, this year’s challenge also considers AI-generated background music. We invite participants to develop systems that analyze both musical accompaniment and singing voice components to detect whether a song contains any AI-generated elements. Submissions that incorporate joint modeling of vocals and music or explore their interactions are especially encouraged.&lt;br /&gt;
&lt;br /&gt;
In 2024, we introduced the WildSVDD track, which focused on detecting AI-generated singing voices in real-world scenarios. Participants were tasked with identifying whether a given song clip contained a genuine human singer or an AI-generated one, often in the presence of complex background music. The 2025 challenge extends this setting to include potential deepfakes in both the vocals and instrumental parts, increasing the difficulty and relevance of the task. For more information about our previous work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:The rapid advancement of generative AI has enabled the creation of highly realistic synthetic songs. Today’s models can not only replicate a singer’s vocal characteristics with minimal training data but also produce convincing musical accompaniments. While this technology opens exciting creative possibilities, it also raises significant ethical, legal, and commercial concerns. Deepfake songs that mimic well-known artists and musical styles pose a growing threat to intellectual property rights and the integrity of music distribution platforms.&lt;br /&gt;
&lt;br /&gt;
:Building on the success of our 2024 SingFake [1] and SVDD [2] challenges—featuring the CtrSVDD and WildSVDD tracks—we aim to further elevate the visibility of this problem within the broader music research community. The CtrSVDD track [3], focusing on controlled vocal synthesis detection, drew strong engagement from the speech research field. With this year’s expanded challenge, we hope to bring more attention to the complex problem of detecting deepfakes in complete musical compositions and to foster interdisciplinary collaboration between the audio forensics and music information retrieval communities.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
* SONICS dataset download: [Huggingface SONICS](https://huggingface.co/datasets/awsaf49/sonics)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025.&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang].&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14686</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14686"/>
		<updated>2025-06-06T02:19:39Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise more awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang] &amp;amp; [mailto:yixiao.zhang@qmul.ac.uk Yixiao Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
* SONICS dataset download: [Huggingface SONICS](https://huggingface.co/datasets/awsaf49/sonics)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: Aug 25, 2025, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2025.&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang].&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14641</id>
		<title>2025:Song Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Song_Deepfake_Detection&amp;diff=14641"/>
		<updated>2025-05-29T02:39:00Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise more awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang] &amp;amp; [mailto:yixiao.zhang@qmul.ac.uk Yixiao Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
* SONICS dataset download: [https://huggingface.co/datasets/awsaf49/sonics Huggingface SONICS]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 20, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
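As an illustration of such a score file, the snippet below writes one whitespace-separated record per segment. The exact field order, delimiter, and URLs shown here are placeholder assumptions, not the organizers' specification; participants should follow the official format distributed with the challenge materials.&lt;br /&gt;

```python
# Hypothetical example rows: (URL, segment start in seconds,
# segment end in seconds, confidence score where higher = more bonafide).
rows = [
    ("https://example.com/watch?v=abc123", 12.0, 18.5, 0.91),
    ("https://example.com/watch?v=def456", 40.0, 47.2, -0.33),
]

with open("scores.txt", "w") as f:
    for url, start, end, score in rows:
        # One whitespace-separated record per segmented clip.
        f.write(f"{url} {start:.2f} {end:.2f} {score:.4f}\n")
```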
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;br /&gt;
&lt;br /&gt;
Please send your submission to [mailto:you.zhang@rochester.edu Neil Zhang].&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13903</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13903"/>
		<updated>2024-09-27T12:25:23Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Dataset */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with two tracks: CtrSVDD and WildSVDD. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang] &amp;amp; [mailto:yixiao.zhang@qmul.ac.uk Yixiao Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification. See the &amp;quot;Download&amp;quot; section for details.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph incorporates state-of-the-art components for detecting AI-generated singing voices, including advanced techniques such as graph modeling. Key features of these baselines include robust handling of background music and adaptation to different musical styles. Results of the SingFake baseline systems on the WildSVDD test data are reported in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 15, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13902</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13902"/>
		<updated>2024-09-27T12:24:06Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with two tracks: CtrSVDD and WildSVDD. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
Contact: [mailto:you.zhang@rochester.edu Neil Zhang] &amp;amp; [mailto:yixiao.zhang@qmul.ac.uk Yixiao Zhang]&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph incorporates state-of-the-art components for detecting AI-generated singing voices, including advanced techniques such as graph modeling. Key features of these baselines include robust handling of background music and adaptation to different musical styles. Results of the SingFake baseline systems on the WildSVDD test data are reported in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 15, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13901</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13901"/>
		<updated>2024-09-26T14:39:50Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with two tracks: CtrSVDD and WildSVDD. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph incorporates state-of-the-art components for detecting AI-generated singing voices, including advanced techniques such as graph modeling. Key features of these baselines include robust handling of background music and adaptation to different musical styles. Results of the SingFake baseline systems on the WildSVDD test data are reported in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to [mailto:svddchallenge@gmail.com svddchallenge@gmail.com].&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 15, AOE'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13900</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13900"/>
		<updated>2024-09-26T14:38:37Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with two tracks: CtrSVDD and WildSVDD. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
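As a concrete illustration of the metric, the following is a minimal EER computation via a threshold sweep. This is an illustrative NumPy sketch, not the official scoring script; it assumes higher scores indicate bonafide, and the toy score arrays are invented for the example.

```python
import numpy as np

def compute_eer(bonafide_scores, spoof_scores):
    """Equal Error Rate via a sweep over all observed score thresholds.
    Convention assumed here: higher score means more likely bonafide."""
    thresholds = np.sort(np.concatenate([bonafide_scores, spoof_scores]))
    # False acceptance rate: spoof clips scored at or above the threshold.
    far = np.array([np.mean(spoof_scores >= t) for t in thresholds])
    # False rejection rate: bonafide clips falling short of the threshold.
    frr = np.array([1.0 - np.mean(bonafide_scores >= t) for t in thresholds])
    # EER is where the two error rates cross; take the closest sweep point.
    idx = np.argmin(np.abs(far - frr))
    return float((far[idx] + frr[idx]) / 2.0)

bona = np.array([0.9, 0.8, 0.7, 0.6])    # toy bonafide scores
spoof = np.array([0.4, 0.3, 0.65, 0.2])  # toy deepfake scores
print(compute_eer(bona, spoof))  # prints 0.25
```

Because EER is read off at the operating point where false acceptances equal false rejections, it is invariant to any fixed decision threshold a system might choose, which is why it is preferred over accuracy here.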
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation. If you have concerns about downloading data, please reach out to svddchallenge@gmail.com&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 15, AoE (Anywhere on Earth)'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
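A writer for such a score file could be sketched as follows. The exact field order and delimiter are assumptions for illustration (the task description names only the fields: URL, segment timestamps, and score), and the URLs and values shown are hypothetical.

```python
# Hypothetical score-file writer: one line per segment with URL,
# start/end timestamps in seconds, and the system's confidence score.
# Space-separated fields are an assumption, not an official spec.
def write_score_file(path, rows):
    with open(path, "w") as f:
        for url, start, end, score in rows:
            f.write(f"{url} {start:.2f} {end:.2f} {score:.4f}\n")

rows = [
    ("https://example.com/watch?v=abc123", 12.0, 18.5, 0.9731),
    ("https://example.com/watch?v=abc123", 18.5, 25.0, 0.1204),
]
write_score_file("scores.txt", rows)
```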
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13899</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13899"/>
		<updated>2024-09-26T14:37:04Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. For more information about our prior work, please visit: https://main.singfake.org/&lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. &amp;quot;CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.&amp;quot; In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 15, AoE (Anywhere on Earth)'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13819</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13819"/>
		<updated>2024-09-15T02:20:41Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. &amp;quot;CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.&amp;quot; In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/bFeruNA0, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
* '''Submission Deadline: October 15, AoE (Anywhere on Earth)'''&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the '''MIREX track''' at ISMIR 2024. &lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13698</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13698"/>
		<updated>2024-09-07T23:01:49Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baseline */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. &amp;quot;CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.&amp;quot; In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Some results of how baseline systems in SingFake perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3]. &lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] SVDD 2024@SLT: https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13697</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13697"/>
		<updated>2024-09-07T23:01:20Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baseline */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge is an extension of our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. &amp;quot;CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.&amp;quot; In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake [1] and SingGraph [2] projects. SingGraph includes state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling. The key features of these baselines include robust handling of background music and adaptation to different musical styles. Results showing how the SingFake baseline systems perform on the WildSVDD test data can be found in our SVDD@SLT challenge overview paper [3].&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
:[3] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
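To make the metric concrete, here is a minimal, illustrative sketch of an EER computation in plain Python (the helper name compute_eer is ours, and we assume higher scores indicate bonafide); the official evaluation will use the organizers' scoring procedure:&lt;br /&gt;

```python
def compute_eer(bonafide_scores, deepfake_scores):
    """Equal Error Rate: the operating point where the false rejection
    rate (bonafide flagged as fake) meets the false acceptance rate
    (deepfakes passed as bonafide). Assumes higher score = more bonafide."""
    # Pair every score with its label (1 = bonafide, 0 = deepfake),
    # then sweep the decision threshold upward through every observed score.
    pairs = sorted([(s, 1) for s in bonafide_scores] +
                   [(s, 0) for s in deepfake_scores])
    n_bona, n_fake = len(bonafide_scores), len(deepfake_scores)
    bona_below = fake_below = 0
    best_gap, eer = float("inf"), 1.0
    for _, label in pairs:
        bona_below += label
        fake_below += 1 - label
        frr = bona_below / n_bona              # bonafide wrongly rejected
        far = (n_fake - fake_below) / n_fake   # deepfakes wrongly accepted
        if abs(frr - far) < best_gap:          # keep the point where FRR ≈ FAR
            best_gap = abs(frr - far)
            eer = (frr + far) / 2
    return eer
```

For perfectly separated scores the function returns 0.0; as the score distributions overlap, the EER rises toward 0.5 (chance level).&lt;br /&gt;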
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
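As a hypothetical sketch of such a score file, each whitespace-separated line could carry a URL, segment start/end timestamps, and a confidence score; the helper names parse_score_line and write_score_line and the exact column layout are illustrative, not the official specification:&lt;br /&gt;

```python
def parse_score_line(line):
    # One record per segment: URL, start time (s), end time (s), score.
    # Higher score = more confident the segment is bonafide (assumed).
    url, start, end, score = line.split()
    return {"url": url, "start": float(start),
            "end": float(end), "score": float(score)}

def write_score_line(url, start, end, score):
    # Emit a line in the same whitespace-separated layout.
    return f"{url} {start:.2f} {end:.2f} {score:.4f}"
```

Keeping the score a real-valued confidence (rather than a hard 0/1 label) is what allows threshold-free EER evaluation.&lt;br /&gt;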
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13696</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13696"/>
		<updated>2024-09-07T22:57:34Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Dataset */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge extends our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. We gathered data annotations from social media platforms. The annotators, who were familiar with the singers they covered, manually verified the user-specified labels during the annotation process to ensure accuracy, especially in cases where the singer(s) did not actually perform certain songs. We cross-checked the annotations against song titles and descriptions and manually reviewed any discrepancies for further verification.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset of the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13695</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13695"/>
		<updated>2024-09-07T22:48:48Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge extends our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13694</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13694"/>
		<updated>2024-09-07T22:47:54Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge extends our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
:[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
:[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13693</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13693"/>
		<updated>2024-09-07T22:47:31Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Task Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge aims to detect AI-generated singing voices in real-world scenarios. The task involves distinguishing authentic human-sung songs from AI-generated deepfake songs at the clip level. Participants are required to identify whether each segmented clip contains a genuine singer or an AI-generated fake singer. The developed systems are expected to account for the complexities introduced by background music and various musical contexts. &lt;br /&gt;
&lt;br /&gt;
;Background&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
:With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. These synthesized voices can now emulate the vocal characteristics of any singer with minimal training data. While this technological advancement is impressive, it has sparked widespread concerns among artists, record labels, and publishing houses. The potential for unauthorized synthetic reproductions that mimic well-known singers poses a real threat to original artists' commercial value and intellectual property rights, igniting urgent calls for efficient and accurate methods to detect these deepfake singing voices.&lt;br /&gt;
&lt;br /&gt;
:This challenge extends our previous work SingFake [1] and was initially introduced at the 2024 IEEE Spoken Language Technology Workshop (SLT 2024) [2] with the CtrSVDD and WildSVDD tracks. The CtrSVDD track [3] garnered significant attention from the speech community. We aim to raise awareness of WildSVDD within the ISMIR community and leverage the expertise of music experts.&lt;br /&gt;
&lt;br /&gt;
:[1] Zang, Yongyi, You Zhang, Mojtaba Heydari, and Zhiyao Duan. &amp;quot;SingFake: Singing voice deepfake detection.&amp;quot; In Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 12156-12160. IEEE, 2024. https://ieeexplore.ieee.org/document/10448184 &lt;br /&gt;
&lt;br /&gt;
:[2] Zhang, You, Yongyi Zang, Jiatong Shi, Ryuichi Yamamoto, Tomoki Toda, and Zhiyao Duan. &amp;quot;SVDD 2024: The Inaugural Singing Voice Deepfake Detection Challenge.&amp;quot; In Proc. IEEE Spoken Language Technology (SLT), 2024. https://arxiv.org/abs/2408.16132&lt;br /&gt;
&lt;br /&gt;
:[3] Zang, Yongyi, Jiatong Shi, You Zhang, Ryuichi Yamamoto, Jionghao Han, Yuxun Tang, Shengyuan Xu et al. “CtrSVDD: A Benchmark Dataset and Baseline Analysis for Controlled Singing Voice Deepfake Detection.” In Proc. Interspeech, pp. 4783-4787, 2024. https://doi.org/10.21437/Interspeech.2024-2242&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13692</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13692"/>
		<updated>2024-09-07T21:27:26Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024. https://ismir2024.ismir.net/call-for-late-breaking-demos&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13691</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13691"/>
		<updated>2024-09-07T06:15:53Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024.&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13690</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13690"/>
		<updated>2024-09-07T06:15:40Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
;Results submission&lt;br /&gt;
&lt;br /&gt;
:Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
;System description submission&lt;br /&gt;
:Participants are required to describe their system, including the data preprocessing, model architecture, training details, post-processing, etc.&lt;br /&gt;
&lt;br /&gt;
;Research paper submission&lt;br /&gt;
:Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024.&lt;br /&gt;
&lt;br /&gt;
;Workshop presentation&lt;br /&gt;
:We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13689</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13689"/>
		<updated>2024-09-07T06:11:32Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Submission */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
## Results submission&lt;br /&gt;
&lt;br /&gt;
Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;br /&gt;
&lt;br /&gt;
## Paper submission&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13688</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13688"/>
		<updated>2024-09-07T06:09:35Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary metric for evaluation is Equal Error Rate (EER), which reflects the system's ability to distinguish between bonafide and deepfake singing voices regardless of the threshold set. EER is preferred over accuracy as it does not depend on a fixed threshold, providing a more reliable assessment of system performance. A lower EER indicates a better distinction between real and AI-generated voices.&lt;br /&gt;
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [Zenodo WildSVDD](https://zenodo.org/records/10893604)&lt;br /&gt;
* Download tools: https://pastebin.com/YhpYXT9z, https://cobalt.tools/, https://github.com/ytdl-org/youtube-dl, https://github.com/yt-dlp/yt-dlp, https://www.locoloader.com/bilibili-video-downloader/&lt;br /&gt;
* Segmentation tool: [SingFake GitHub](https://github.com/yongyizang/SingFake/tree/main/dataset)&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
Participants should submit a score TXT file that includes the URLs, segment start and end timestamps, and the corresponding scores indicating the system's confidence in identifying bonafide or deepfake clips. Submissions will be evaluated based on EER, and the results will be ranked accordingly.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13687</id>
		<title>2024:Singing Voice Deepfake Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Singing_Voice_Deepfake_Detection&amp;diff=13687"/>
		<updated>2024-09-07T06:05:15Z</updated>

		<summary type="html">&lt;p&gt;Yzyouzhang: /* Baseline */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Task Description =&lt;br /&gt;
&lt;br /&gt;
The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.&lt;br /&gt;
&lt;br /&gt;
= Dataset =&lt;br /&gt;
&lt;br /&gt;
;Description&lt;br /&gt;
:The WildSVDD dataset is an extension of the SingFake dataset, now expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, annotated for accuracy.&lt;br /&gt;
&lt;br /&gt;
;Description of Audio Files&lt;br /&gt;
:The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.&lt;br /&gt;
&lt;br /&gt;
;Description of Split&lt;br /&gt;
:The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.&lt;br /&gt;
&lt;br /&gt;
= Baseline =&lt;br /&gt;
&lt;br /&gt;
;Model Architecture&lt;br /&gt;
:Participants are referred to baseline systems from the SingFake and SingGraph projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating advanced techniques like graph modeling and controlled SVDD analysis. The key features of these baselines include robust handling of background music and adaptation to different musical styles.&lt;br /&gt;
&lt;br /&gt;
[1] SingFake: https://github.com/yongyizang/SingFake&lt;br /&gt;
[2] SingGraph: https://github.com/xjchenGit/SingGraph&lt;br /&gt;
&lt;br /&gt;
= Metrics =&lt;br /&gt;
&lt;br /&gt;
The primary evaluation metric is the Equal Error Rate (EER): the operating point at which the false acceptance rate (deepfake accepted as bonafide) equals the false rejection rate (bonafide rejected as deepfake). Unlike accuracy, EER does not depend on a fixed decision threshold, so it provides a more reliable assessment of how well a system separates bonafide from deepfake singing voices. A lower EER indicates better discrimination.&lt;br /&gt;
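As a rough illustration (not part of the official evaluation tooling), EER can be estimated from segment scores by sweeping a threshold and finding the point where the two error rates cross. A minimal sketch in plain Python, assuming the convention that higher scores mean "more likely bonafide":&lt;br /&gt;

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the Equal Error Rate from two lists of detection scores.

    Convention (assumed): higher score = more likely bonafide.
    Sweeps every observed score as a candidate threshold and returns
    the mean of FAR and FRR at the point where they are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = 2.0, None
    for t in thresholds:
        # False acceptance: deepfake segments scored at or above threshold
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        # False rejection: bonafide segments scored below threshold
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

In practice, published EER numbers are usually computed with interpolation over the full ROC curve (e.g. via scikit-learn's `roc_curve`); the discrete sweep above is only a conceptual approximation.&lt;br /&gt;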
&lt;br /&gt;
= Download =&lt;br /&gt;
&lt;br /&gt;
The dataset and necessary resources can be accessed via the following links:&lt;br /&gt;
&lt;br /&gt;
* Dataset download: [https://zenodo.org/records/10893604 Zenodo WildSVDD]&lt;br /&gt;
* Segmentation tool: [https://github.com/yongyizang/SingFake/tree/main/dataset SingFake GitHub]&lt;br /&gt;
&lt;br /&gt;
Participants are encouraged to use the provided tools for segmenting song clips to ensure consistency in evaluation.&lt;br /&gt;
&lt;br /&gt;
= Rules =&lt;br /&gt;
&lt;br /&gt;
Participants may use any publicly available datasets for training, excluding any data that appears in the evaluation sets. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. Evaluation is performed at the segment level, with results presented in a score file format.&lt;br /&gt;
&lt;br /&gt;
= Submission =&lt;br /&gt;
&lt;br /&gt;
Participants should submit a score TXT file listing, for each segment, its source URL, start and end timestamps, and a score indicating the system's confidence that the segment is bonafide or deepfake. Submissions will be evaluated by EER and ranked accordingly.&lt;/div&gt;</summary>
		<author><name>Yzyouzhang</name></author>
		
	</entry>
</feed>