1 Task Description
2 Dataset
- 2.1 Hidden Test Dataset
- 2.2 Training Dataset
3 Input and Output Format
- 3.1 Input
- 3.2 Output
4 Baselines
5 Metrics
- 5.1 Primary Metric
- 5.2 Secondary Metrics
6 Download
7 Rules
8 Submission
- 8.1 Output CSV Format
9 Paper
10 Task Captains
11 Future Iterations
12 Bibliography

Task Description

The MIREX 2026 AI-Generated Music Detection Task invites participants to develop systems that can detect whether a music recording is fully AI-generated or human-made.

In the first iteration of this task, we focus on a simple and accessible binary setting:

Positive: the recording is fully AI-generated music.
Negative: the recording is real human-made music.

Participants are asked to submit systems that take a music audio recording as input and output a probability score between 0 and 1. A higher score indicates that the system believes the recording is more likely to be AI-generated.

This first-year task is intentionally limited to full-song AI-generated music detection. More complex cases, such as AI-generated stems, AI-assisted remixing, localized AI insertions, neural codec reconstruction, or partially AI-involved music, may be considered in future iterations.

Dataset

Hidden Test Dataset

The official evaluation set will be hidden. Participants will not receive the raw evaluation audio or item-level labels.

The hidden test set will contain both fully AI-generated music and real human-made music. The AI-generated portion will be constructed from multiple music generation systems, where licensing and evaluation conditions permit.

Candidate AI-generated sources may include:

Suno
Udio
Mureka
MiniMax
YuE
ACE-Step

The real-music negative examples will be selected from CC0 or otherwise evaluation-compatible human-made music sources.

The hidden test set will be designed to evaluate whether systems can generalize across multiple AI music generators rather than overfitting to a single source. The exact composition of the hidden test set will not be disclosed before evaluation.

Training Dataset

We plan to provide a training dataset to make the task easier to enter, especially for participants who do not have access to large-scale AI-generated music data.

Possible training sources include:

SONICS: approximately 96k tracks generated by Suno v3.5 and Udio.
Muse: approximately 116k tracks generated by Suno v5.
Other AI-generated or human-made music sources, to be determined.

The final training dataset will be announced before the submission phase.

Participants may also use their own public, private, synthetic, or self-constructed training data, provided that no part of the hidden evaluation set is used directly or indirectly for training, validation, model selection, prompt tuning, or threshold tuning.

Input and Output Format

Input

Submitted systems will receive:

A directory of audio files
A metadata CSV file

Audio files will be provided as WAV files. They may be:

44.1 kHz or 48 kHz
Mono or stereo
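
As a rough illustration of the input constraints above, the following sketch checks a file against the stated sample rates and channel layouts using Python's standard `wave` module. The helper names `describe_wav`, `matches_task_format`, and `make_silent_wav` are hypothetical and not part of any official tooling. The standard `wave` module only reads uncompressed PCM data, so a real system would probably rely on a fuller audio library.

```python
import io
import wave

# Formats stated in the task description.
ALLOWED_RATES = {44100, 48000}
ALLOWED_CHANNELS = {1, 2}


def describe_wav(source):
    """Return (sample_rate, channels, duration_seconds).

    `source` is a path string or a binary file object.
    """
    with wave.open(source, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.getnframes()
    return rate, channels, frames / rate


def matches_task_format(source):
    """True if the file uses one of the announced rates and channel layouts."""
    rate, channels, _ = describe_wav(source)
    return rate in ALLOWED_RATES and channels in ALLOWED_CHANNELS


def make_silent_wav(rate, channels, seconds):
    """Build an in-memory 16-bit PCM WAV of silence, for demonstration only."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf
```

A system that expects a single input format would typically resample and downmix after a check like this; how to do so is left to participants.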

Output

Each system must output one AI-generated music score for each audio file.

The score must be a scalar value in the range [0, 1], where:

0 means the system believes the recording is very unlikely to be AI-generated.
1 means the system believes the recording is very likely to be AI-generated.
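
Many classifiers produce an unbounded logit rather than a value in [0, 1]. The sketch below shows one common way to map such a logit to the required range and to check that a score is valid. The task does not prescribe this mapping, and the names `logit_to_score` and `is_valid_score` are assumptions made for illustration.

```python
import math


def logit_to_score(logit):
    """Map a raw logit to [0, 1] with a numerically stable sigmoid."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)  # safe: logit < 0, so exp cannot overflow
    return z / (1.0 + z)


def is_valid_score(score):
    """True if the score is a finite float inside [0, 1]."""
    return isinstance(score, float) and math.isfinite(score) and 0.0 <= score <= 1.0
```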

Baselines

We plan to provide a baseline model and checkpoint to help participants get started.

The baseline system may include:

A standard audio classifier trained on the provided training dataset
A music or audio foundation model with a binary classification head
A reproducible inference pipeline
A released checkpoint
Example scripts for running inference and producing the required submission file

The baseline is intended as a starting point rather than a competitive upper bound. Participants are encouraged to improve upon it using better architectures, training strategies, data construction, calibration, and robustness methods.

Metrics

The official evaluation is binary. Each system outputs a continuous AI-generated music score for each test item.

Primary Metric

Macro-averaged AUROC across hidden evaluation strata: The primary ranking metric is AUROC computed separately within each hidden evaluation stratum and then averaged with equal weight per stratum. AUROC is used because different real-world applications may require different operating thresholds. Macro-averaging prevents the final ranking from being dominated by easy subsets or by one particular generator family.
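
A minimal sketch of macro-averaged AUROC follows, assuming each stratum is supplied as a pair of label and score lists and contains at least one positive and one negative item. How the organizers define the strata is not disclosed, so the dictionary layout and the function names `auroc` and `macro_auroc` are hypothetical.

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation.

    labels: 1 for AI-generated, 0 for human-made. Tied scores share
    their average rank, so a tie between a positive and a negative
    counts as half a correct ordering.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC needs at least one positive and one negative")

    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def macro_auroc(strata):
    """Unweighted mean of per-stratum AUROC.

    strata: dict mapping a stratum name to (labels, scores).
    """
    values = [auroc(labels, scores) for labels, scores in strata.values()]
    return sum(values) / len(values)
```

Because every stratum contributes equally regardless of its size, a system that performs well on a large, easy stratum but poorly on a small, hard one is penalized more than it would be under pooled AUROC.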

Secondary Metrics

Secondary metrics will be reported for diagnostic analysis. These may include:

Pooled AUROC: Measures overall ranking performance across all test items.

AUPRC: Measures precision-recall performance, especially under class imbalance.

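Equal Error Rate: Reports the error rate at the operating threshold where the false positive rate and the false negative rate are equal.

Balanced Accuracy: Measures the mean of the true positive rate and the true negative rate at a selected threshold, so that neither class dominates the result.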

F1 Score: Measures the harmonic mean of precision and recall at a selected threshold.

False Positive Rate on Real Human Music: Measures how often human-made music is incorrectly classified as AI-generated.

False Negative Rate by Generator Source: Measures how often AI-generated music from different generator families is missed.

Additional diagnostic results may include performance by vocal/instrumental category, generator-held-out condition, compression condition, excerpt length, and difficulty level.

Internal metadata will be used only for aggregate diagnostic reporting and will not be released at the item level.
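
The threshold-based secondary metrics listed above can be sketched from a confusion matrix as follows. The default threshold of 0.5 and the function name `threshold_metrics` are assumptions for illustration; the task description does not state which threshold the organizers will use.

```python
def threshold_metrics(labels, scores, threshold=0.5):
    """Compute FPR, FNR, balanced accuracy, and F1 at one threshold.

    labels: 1 for AI-generated, 0 for human-made.
    An item is predicted AI-generated when score >= threshold.
    """
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_ai = score >= threshold
        if label == 1 and predicted_ai:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted_ai:
            fp += 1
        else:
            tn += 1

    def ratio(numerator, denominator):
        # Return 0.0 for an empty denominator instead of raising.
        return numerator / denominator if denominator else 0.0

    tpr = ratio(tp, tp + fn)
    tnr = ratio(tn, tn + fp)
    return {
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
        "balanced_accuracy": (tpr + tnr) / 2,
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }
```

Restricting the positive items to one generator source before calling the function gives a per-source false negative rate of the kind described above.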

Download

The hidden evaluation set will not be publicly released.

The held-out audio will remain private to the organizers throughout and after the evaluation. All evaluation items will be created, licensed, commissioned, or selected under conditions that permit private evaluation by the organizers.

The training dataset, baseline model, checkpoint, and example submission scripts will be released before the submission phase, subject to licensing and infrastructure constraints.

Rules

Participants may use the provided training dataset and baseline model.
Participants may use external datasets and pre-trained models.
Participants may use public, private, synthetic, or self-constructed data for training.
Participants must not use any part of the hidden evaluation set for training, validation, model selection, prompt tuning, or threshold tuning.
Participants must describe all training data, pre-trained models, external APIs, watermark detectors, and major preprocessing steps in the technical report.
External API calls are discouraged and may be prohibited depending on MIREX execution policy, privacy requirements, and reproducibility constraints.
The full hidden test set must be processed within a 24-hour wall-clock budget on a single GPU.
Submissions that exceed the time budget, fail on more than 5% of test items, or output invalid scores for more than 5% of test items will be reported but excluded from the primary ranking.
Participants must respect all relevant licenses for the data and models used in their systems.
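
Participants may want to check their own runs against the ranking-eligibility conditions in the rules above. The sketch below is one possible self-check. It assumes that a "failed" item is one with no score in the output and that an "invalid" score is one that is non-numeric, non-finite, or outside [0, 1]. The organizers' exact accounting is not specified, and `check_submission` is a hypothetical name.

```python
import math

TIME_BUDGET_HOURS = 24.0  # wall-clock budget stated in the rules


def _is_valid(score):
    """True for a finite real number in [0, 1] (booleans are rejected)."""
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        return False
    return math.isfinite(score) and 0.0 <= score <= 1.0


def check_submission(expected_ids, scores_by_id, runtime_hours):
    """Report which of the three eligibility conditions a run satisfies."""
    n = len(expected_ids)
    missing = sum(1 for t in expected_ids if t not in scores_by_id)
    invalid = sum(
        1 for t in expected_ids
        if t in scores_by_id and not _is_valid(scores_by_id[t])
    )
    return {
        "within_time_budget": runtime_hours <= TIME_BUDGET_HOURS,
        # "At most 5%" written in integer arithmetic: count * 20 <= n.
        "failure_rate_ok": missing * 20 <= n,
        "invalid_rate_ok": invalid * 20 <= n,
    }
```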

Submission

Participants are required to submit the following:

Docker container: A Docker container with a standardized inference interface. The system should take a directory of WAV files and a metadata CSV file as input, and produce a CSV file containing one AI-generated music score per item.

Technical report: A 2-4 page technical report in ISMIR LBD format. The report should describe the system architecture, training data, preprocessing, inference-time input duration, use of external APIs if any, use of watermark detection if any, thresholding strategy, known limitations, and compute requirements.

Compute declaration: A compute declaration reporting training data size, model size, GPU memory footprint, average inference time per track, total expected runtime, and other computational resources used in model development.
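
The standardized inference interface has not been published yet. Purely as an illustration of the inputs and output named above, a container entry point might parse its arguments along these lines. The flag names `--audio-dir`, `--metadata`, and `--output`, and the helpers `build_parser` and `list_wav_files`, are hypothetical and should be replaced by whatever the organizers announce.

```python
import argparse
from pathlib import Path


def build_parser():
    """Command-line interface with hypothetical flag names."""
    parser = argparse.ArgumentParser(
        description="Score WAV files for AI-generated music detection."
    )
    parser.add_argument("--audio-dir", type=Path, required=True,
                        help="directory containing the input WAV files")
    parser.add_argument("--metadata", type=Path, required=True,
                        help="metadata CSV supplied with the audio")
    parser.add_argument("--output", type=Path, required=True,
                        help="path of the score CSV to write")
    return parser


def list_wav_files(audio_dir):
    """Return the WAV files in a directory in a stable, sorted order."""
    return sorted(
        p for p in Path(audio_dir).iterdir() if p.suffix.lower() == ".wav"
    )
```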

Output CSV Format

The output CSV should contain one row per audio file.

Example:

track_id,ai_generated_score
000001,0.972
000002,0.084
000003,0.611
000004,0.238

The exact metadata format and required file naming convention will be announced before the submission phase.
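
A file in the shape of the example above can be produced with Python's standard `csv` module, as in the sketch below. Keeping track IDs as strings preserves their leading zeros. The three-decimal formatting simply mirrors the example and is an assumption, as is the function name `write_scores`.

```python
import csv
import io


def write_scores(rows, out_file):
    """Write (track_id, score) pairs in the example CSV layout.

    rows: iterable of (str, float); out_file: a text file object.
    """
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["track_id", "ai_generated_score"])
    for track_id, score in rows:
        writer.writerow([track_id, f"{score:.3f}"])


# Usage with an in-memory buffer; a real run would open a file instead.
buffer = io.StringIO()
write_scores([("000001", 0.972), ("000002", 0.084)], buffer)
csv_text = buffer.getvalue()
```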

Each participant or team may submit up to four versions of their system. The final ranking will be based on the official evaluation metrics described above.

Submission Deadline: TBD

Submission Platform: TBD

Paper

Participants are encouraged to submit a short technical report describing their system, training data, and analysis of results.

Top-ranked participants may be invited to present their systems in a MIREX or ISMIR-related session, depending on the final organization of the task.

Task Captains

Yixiao Zhang & You Zhang

Additional task captains may be added for dataset licensing, evaluation infrastructure, or conflict-of-interest management.

Future Iterations

The first iteration focuses on full-song AI-generated music detection.

Future iterations may extend the task to broader AI music provenance detection, including:

AI-generated vocals or instrumental stems
Human music with AI-generated stem replacement
AI-assisted remixing or arrangement
Localized AI-generated insertions or continuations
Human-created music reconstructed through neural codecs or vocoders
Segment-level localization
Generator-held-out evaluation
Open-set detection
Robustness to adversarial post-processing
Calibration under deployment-like class imbalance

Bibliography

[1] MIREX 2026 Call for Challenges. Music Information Retrieval Evaluation eXchange.

[2] MIREX 2025 Song Deepfake Detection Challenge. Music Information Retrieval Evaluation eXchange.

[3] SONICS: large-scale AI-generated music dataset including Suno v3.5 and Udio. Full citation to be added.

[4] Muse: large-scale AI-generated music dataset including Suno v5. Full citation to be added.

2026:AI-Generated Music Detection
