2024:Singing Voice Deepfake Detection

Task Description

The WildSVDD challenge focuses on the detection of AI-generated singing voices in the wild. With the advancement of AI technology, singing voices generated by AI are becoming increasingly indistinguishable from human performances. This task challenges participants to develop systems capable of accurately distinguishing real singing voices from AI-generated ones, especially within the complex context of background music and diverse musical environments. Participants will leverage the WildSVDD dataset, which includes a wide variety of song clips, both bonafide and deepfake, to develop and evaluate their systems.

Dataset

Description
The WildSVDD dataset is an extension of the SingFake dataset, expanded to include a more diverse and comprehensive collection of real and AI-generated singing voice clips. It comprises 97 singers with 2,007 deepfake and 1,216 bonafide song clips, with annotations checked for accuracy.
Description of Audio Files
The audio files in the WildSVDD dataset represent a broad range of languages and singers. These clips include strong background music, simulating real-world conditions that challenge the distinction between real and AI-generated voices. The dataset ensures diversity in the source material, with varying levels of complexity in the musical contexts.
Description of Split
The dataset is divided into training and evaluation subsets. Test Set A includes new samples, while Test Set B represents the most challenging subset from the SingFake dataset. Participants are permitted to use the training data to create validation sets but must adhere to restrictions on the usage of the evaluation data.
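
Since the protocol does not prescribe a validation split, one reasonable approach is to hold out entire singers from the training data, so that validation conditions resemble the unseen-singer test conditions. The sketch below is a minimal illustration, not an official recipe; the item structure and the "singer" field name are hypothetical.

 # One possible singer-disjoint validation split (not an official protocol).
 # Assumes each training item is a dict with a "singer" field; the field
 # name and item structure are hypothetical.
 import random
 
 def split_by_singer(items, val_fraction=0.2, seed=0):
     """Hold out a fraction of singers entirely for validation."""
     singers = sorted({item["singer"] for item in items})
     random.Random(seed).shuffle(singers)
     n_val = max(1, int(len(singers) * val_fraction))
     val_singers = set(singers[:n_val])
     train = [item for item in items if item["singer"] not in val_singers]
     val = [item for item in items if item["singer"] in val_singers]
     return train, val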

Baseline

Model Architecture
Participants are referred to the baseline systems from the SingFake [1] and SingGraph [2] projects. These baselines include state-of-the-art components for detecting AI-generated singing voices, incorporating techniques such as graph-based modeling and methods developed for controlled SVDD analysis. Key features include robust handling of background music and adaptation to different musical styles.

[1] SingFake: https://github.com/yongyizang/SingFake
[2] SingGraph: https://github.com/xjchenGit/SingGraph

Metrics

The primary metric for evaluation is the Equal Error Rate (EER): the operating point at which the false acceptance rate (deepfake segments accepted as bonafide) equals the false rejection rate (bonafide segments rejected as deepfake). EER is preferred over accuracy because it does not depend on a fixed decision threshold, providing a more reliable assessment of system performance. A lower EER indicates better discrimination between real and AI-generated voices.
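
For concreteness, the following is a minimal sketch of how segment-level EER can be computed from two arrays of detection scores, assuming higher scores indicate bonafide; the official scoring tool may differ in details such as tie handling.

 # Minimal EER computation, assuming higher scores mean "more likely bonafide".
 # Uses only numpy; the official scoring tool may differ in details.
 import numpy as np
 
 def compute_eer(bonafide_scores, deepfake_scores):
     """Return the Equal Error Rate for two arrays of detection scores."""
     scores = np.concatenate([bonafide_scores, deepfake_scores])
     labels = np.concatenate([np.ones(len(bonafide_scores)),
                              np.zeros(len(deepfake_scores))])
     order = np.argsort(scores)          # sweep thresholds in score order
     labels = labels[order]
     n_bona = labels.sum()
     n_fake = len(labels) - n_bona
     frr = np.cumsum(labels) / n_bona            # bonafide rejected below threshold
     far = 1.0 - np.cumsum(1 - labels) / n_fake  # deepfakes accepted above threshold
     idx = np.argmin(np.abs(far - frr))          # crossing point of the two rates
     return float((far[idx] + frr[idx]) / 2.0)
 
 # Toy usage with synthetic, well-separated score distributions.
 rng = np.random.default_rng(0)
 print(compute_eer(rng.normal(1.0, 1.0, 1000), rng.normal(-1.0, 1.0, 1000)))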

Download

The dataset and necessary resources can be accessed via the following links:

Participants are encouraged to use the provided tools to download and segment song clips to ensure consistency in evaluation.
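
As an illustration of the download-and-segment step, the sketch below fetches a track with yt-dlp and trims one annotated segment with ffmpeg, assuming annotations supply a URL plus start and end timestamps in seconds. File names and the commented example call are hypothetical; prefer the official tools where they are provided.

 # Sketch of downloading a song clip and cutting one annotated segment.
 # Assumes annotations provide a URL plus start/end timestamps in seconds;
 # file names and the commented example call are hypothetical.
 import subprocess
 
 def fetch_segment(url, start_sec, end_sec, out_path):
     """Download a track with yt-dlp, then trim one segment with ffmpeg."""
     subprocess.run(["yt-dlp", "-x", "--audio-format", "wav",
                     "-o", "full_track.%(ext)s", url], check=True)
     subprocess.run(["ffmpeg", "-y", "-i", "full_track.wav",
                     "-ss", str(start_sec), "-to", str(end_sec),
                     out_path], check=True)
 
 # Example (placeholder URL):
 # fetch_segment("https://example.com/watch?v=...", 12.5, 20.0, "segment_0001.wav")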

Rules

Participants are allowed to use any publicly available datasets for training, excluding those used in the test set. Any additional data sources or pre-trained models must be clearly documented in the system descriptions. Private data or models are strictly prohibited to maintain fairness. All submissions should focus on segment-level evaluation, with results presented in a score file format.

Submission

Results submission
Participants should submit a score file in TXT format that lists, for each segment, the URL, the segment's start and end timestamps, and a score reflecting the system's confidence that the clip is bonafide or deepfake; a sketch of this format appears at the end of this section. Submissions will be evaluated by EER and ranked accordingly.
System description submission
Participants are required to describe their systems, including data preprocessing, model architecture, training details, and any post-processing.
Research paper submission
Participants are encouraged to submit a research paper to the late-breaking demo session at ISMIR 2024 (https://ismir2024.ismir.net/call-for-late-breaking-demos).
Workshop presentation
We will invite top-ranked participants to present their work during the workshop session. The format will be hybrid to accommodate remote participation.
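
The following minimal sketch writes a score file matching the description above; the column order (URL, start, end, score) and the space delimiter are assumptions and should be confirmed against the official submission instructions.

 # Sketch of writing a segment-level score file. The column order
 # (URL, start, end, score) and the space delimiter are assumptions;
 # confirm them against the official submission instructions.
 segments = [
     # (url, start_sec, end_sec, score); higher score = more likely bonafide
     ("https://example.com/watch?v=abc", 0.0, 8.3, 0.97),
     ("https://example.com/watch?v=xyz", 14.2, 21.0, -1.42),
 ]
 
 with open("scores.txt", "w") as f:
     for url, start, end, score in segments:
         f.write(f"{url} {start:.2f} {end:.2f} {score:.4f}\n")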