<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=A43992899</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=A43992899"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/A43992899"/>
	<updated>2026-04-29T23:08:56Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Lyrics_Transcription&amp;diff=14646</id>
		<title>2025:Lyrics Transcription</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Lyrics_Transcription&amp;diff=14646"/>
		<updated>2025-05-29T05:36:33Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Description =&lt;br /&gt;
&lt;br /&gt;
This pages describes the '''MIREX2025: Automatic Lyrics Transcription''' challenge. For evaluation procedure and the submission format please scroll down the page. &lt;br /&gt;
&lt;br /&gt;
The task of Lyrics Transcription aims to identify the words from sung utterances, in the same way as in automatic speech recognition. This can be mathematically expressed as follows:&lt;br /&gt;
&lt;br /&gt;
  Prediction('''w''') = argmax P('''w'''|'''X''')&lt;br /&gt;
&lt;br /&gt;
where '''w''' and '''X''' are the word and acoustic features respectively.&lt;br /&gt;
&lt;br /&gt;
Ideally, the lyrics transcriber should return meaningful word sequences:&lt;br /&gt;
&lt;br /&gt;
  Prediction('''w''')  = [ &amp;lt;w_1&amp;gt;, &amp;lt;w_2&amp;gt;, ..., &amp;lt;w_N&amp;gt; ]&lt;br /&gt;
&lt;br /&gt;
The algorithm receives either monophonic singing performances or a polyphonic mix (singing voice + musical accompaniment). Both cases are evaluated separately in this challenge.&lt;br /&gt;
&lt;br /&gt;
= Evaluation =&lt;br /&gt;
&lt;br /&gt;
'''Word Error Rate''' (WER) : the standard metric use in Automatic Speech Recognition.&lt;br /&gt;
&lt;br /&gt;
  WER = (S + I + D) / (C + S + D)&lt;br /&gt;
&lt;br /&gt;
where;&lt;br /&gt;
 C : correctly predicted words&lt;br /&gt;
 S : substitution errors&lt;br /&gt;
 I : insertion errors&lt;br /&gt;
 D : deletion errors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Character Error Rate''' (CER) : the above computation can also be done on the character level. This metric penalises the partially correctly predicted / incorrectly spelled words less than WER.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: The evaluation samples have few minutes of audio length. The submission is expected to be able to transcribe the entire recording. If your submission requires segmentation as a preprocessing step, this should already be implemented in your pipeline.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Submission Format =&lt;br /&gt;
&lt;br /&gt;
Submissions must be done through the MIREX system (info available [https://www.music-ir.org/mirex/wiki/2021:Main_Page#MIREX_2021_Submission_Instructions here]) and should be packaged in a compressed file (.zip or .rar, etc.) which contains at least two files:&lt;br /&gt;
&lt;br /&gt;
=== A) The main transcription script ===&lt;br /&gt;
&lt;br /&gt;
The main transcription script to execute. This should be a '''one-line executable''' in one of the following formats: a bash (.sh) a python (.py) script, or a binary file.&lt;br /&gt;
&lt;br /&gt;
===  I / O ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments an audio file and the full output path to save the transcriptions. The ability to specify the output path and file name is essential.&lt;br /&gt;
&lt;br /&gt;
Denoting the input audio filename path as $[input_audio_path} and the output file path and name as ${output}, a program called `foobar' will be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar ${input_audio_path}  ${output}&lt;br /&gt;
&lt;br /&gt;
OR with flags:&lt;br /&gt;
&lt;br /&gt;
 foobar -i ${input_audio_path}  -o ${output}&lt;br /&gt;
&lt;br /&gt;
==== Input Audio ====&lt;br /&gt;
&lt;br /&gt;
Participating algorithms will have to receive the following input format:&lt;br /&gt;
&lt;br /&gt;
* Audio format : WAV / MP3&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono) for a cappella (Hansen) and two channels for original&lt;br /&gt;
&lt;br /&gt;
==== Output File Format ====&lt;br /&gt;
&lt;br /&gt;
A text file (per song) containing list of words separated by white space:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;word_1&amp;gt; &amp;lt;word_2&amp;gt; ... &amp;lt;word_N&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Any non-word items (e.g. silence, music, noise or end of the sentence tokens) should be removed from the final output.&lt;br /&gt;
&lt;br /&gt;
Ideally, the output transcriptions will be saved as:&lt;br /&gt;
 &lt;br /&gt;
  ${output}/${input_song_id}.txt&lt;br /&gt;
&lt;br /&gt;
=== B) The README file ===&lt;br /&gt;
&lt;br /&gt;
This file must contain detailed installation instructions, the use of the main script and contact information.&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
&lt;br /&gt;
Any submission that is failed to meet above requirements will not be considered in evaluation!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Training Datasets =&lt;br /&gt;
&lt;br /&gt;
Datasets within automatic lyrics transcription research can be categorised under two domains in regards to the presence of music instruments accompanying the singer: Monophonic and polyphonic datasets. &lt;br /&gt;
&lt;br /&gt;
The former is considered to have only one singer singing the lyrics, and the latter is when there is music accompaniment. &lt;br /&gt;
&lt;br /&gt;
In this challenge, the participants are encouraged but '''not obliged''' to use the open source datasets below, which are also commonly used in the literature for benchmarking ALT results:&lt;br /&gt;
&lt;br /&gt;
=== DAMP dataset ===&lt;br /&gt;
The [https://zenodo.org/record/2747436#.Xyge4xMzZ0s DAMP - Sing!300x30x2 dataset] consists of solo singing recordings (monophonic) performed by amateur singers, collected via a mobile Karaoke application. &lt;br /&gt;
&lt;br /&gt;
The data is curated to be gender-wise balanced and contains performers from 30 different countries, which provides a good amount of variability in terms of accents and pronunciation.  &lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1YwhPhXU6t-BMZfdEODS_pNW_umFIsciYL62kh-fiBWI/edit?usp=sharing list of recordings]. For more details see the paper. &lt;br /&gt;
&lt;br /&gt;
* The audio can be downloaded from the [https://ccrma.stanford.edu/damp/ Smule web site]&lt;br /&gt;
* Lyrics boundary annotations can be generated from raw annotations using [https://github.com/groadabike/Kaldi-Dsing-task this repository]. Paper [https://isca-speech.org/archive/Interspeech_2019/pdfs/2378.pdf here (1)].&lt;br /&gt;
* Or annotations can be directly retrieved in the Kaldi form [https://github.com/emirdemirel/ALTA/s5/data here] Paper [https://arxiv.org/pdf/2007.06486.pdf here (2)].&lt;br /&gt;
&lt;br /&gt;
=== DALI Dataset ===&lt;br /&gt;
&lt;br /&gt;
DALI (a large '''D'''ataset of synchronised '''A'''udio, '''L'''yr'''I'''cs and notes) (3) is the benchmark dataset for building an acoustic model on polyphonic recordings (4,5,6) and it contains over 5000 songs with semi-automatically aligned lyrics annotations.&lt;br /&gt;
&lt;br /&gt;
The songs are commercial recordings in full-duration, whereas the lyrics are described according to different levels of granularity including words and notes (and syllables underlying a given note).&lt;br /&gt;
&lt;br /&gt;
For each song DALI provides a link to a matched youtube video for the audio retrieval.&lt;br /&gt;
&lt;br /&gt;
* For more details how, see its full description [https://github.com/gabolsgabs/DALI here]. Paper [https://arxiv.org/pdf/1906.10606.pdf here].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Evaluation Datasets =&lt;br /&gt;
&lt;br /&gt;
The following datasets are used for evaluation and so '''cannot''' be used by participants to train their models under any circumstance. &lt;br /&gt;
&lt;br /&gt;
Note that the evaluation sets listed below consist of popular songs in English language, and have overlapping samples with DALI. &lt;br /&gt;
&lt;br /&gt;
'''*** IMPORTANT ***'''    In case using DALI for training, you '''MUST''' exclude [https://www.music-ir.org/mirex/wiki/2020:Lyrics_Transcription_Results the songs used for MIREX evaluation] during training your model in order to make a scientific evaluation possible. &lt;br /&gt;
&lt;br /&gt;
=== Hansen's Dataset ===&lt;br /&gt;
The dataset contains 9 pop music songs released in early 2010s.&lt;br /&gt;
&lt;br /&gt;
The audio has two versions: the original mix with instrumental accompaniment and a cappella singing voice only one. An example song can be seen [https://www.dropbox.com/sh/wm6k4dqrww0fket/AAC1o1uRFxBPg9iAeSAd1Wxta?dl=0 here].&lt;br /&gt;
&lt;br /&gt;
You can read in detail about how the dataset was made here: [http://publica.fraunhofer.de/documents/N-345612.html (7)]. The recordings have been provided by Jens Kofod Hansen for public evaluation.&lt;br /&gt;
&lt;br /&gt;
* file duration up to 4:40 minutes (total time: 35:33 minutes)&lt;br /&gt;
* 3590 words annotated in total&lt;br /&gt;
&lt;br /&gt;
=== Mauch's Dataset ===&lt;br /&gt;
&lt;br /&gt;
The dataset contains 20 pop music songs with annotations of beginning-timestamps of each word.&lt;br /&gt;
The audio has instrumental accompaniment. An example song can be seen [https://www.dropbox.com/sh/8pp4u2xg93z36d4/AAAsCE2eYW68gxRhKiPH_VvFa?dl=0 here].&lt;br /&gt;
&lt;br /&gt;
You can read in detail about how the dataset was used for the first time here: [https://pdfs.semanticscholar.org/547d/7a5d105380562ca3543bf05b4d5f7a8bee66.pdf (8)] . The dataset has been provided by Sungkyun Chang.&lt;br /&gt;
&lt;br /&gt;
* file duration up to 5:40 minutes (total time: 1h 19m)&lt;br /&gt;
* 5050 words annotated in total&lt;br /&gt;
&lt;br /&gt;
=== Jamendo Dataset ===&lt;br /&gt;
&lt;br /&gt;
This dataset contains 20 recordings with varying Western music genres, annotated with start-of-word timestamps. All songs have instrumental accompaniment.&lt;br /&gt;
&lt;br /&gt;
It is available online on [https://github.com/f90/jamendolyrics Github], although note that we do not allow tuning model parameters using this data, it can only be used to gain insight into the general structure of the test data. For more information also refer to [https://arxiv.org/abs/1902.06797 this paper (9)].&lt;br /&gt;
&lt;br /&gt;
* file duration up to 4:43 (total time: 1h 12m)&lt;br /&gt;
* 5677 words annotated in total&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Time and hardware limits =&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
A hard limit of 24 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result. In addition, submission that are not able to run with the provided RAM and CPU instructions provided by you may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Questions? =&lt;br /&gt;
&lt;br /&gt;
* send us an email - ruibiny@alumni.cmu.edu (Ruibin Yuan), at2jjy@gmail.com (Junyan Jiang)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Bibliography =&lt;br /&gt;
&lt;br /&gt;
1 - G.R., Barker, J. (2019) Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System. Proc. Interspeech 2019, 579-583, doi: 10.21437/Interspeech.2019-2378&lt;br /&gt;
&lt;br /&gt;
2 - Demirel, E., Ahlbäck, S., &amp;amp; Dixon, S. (2020). Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention. In IJCNN 2020, 1-8. IEEE.&lt;br /&gt;
&lt;br /&gt;
3 - Meseguer-Brocal, G., Cohen-Hadria, A., &amp;amp; Peeters, G. (2019). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In ISMIR 2018.&lt;br /&gt;
&lt;br /&gt;
4 - Gupta, C., Yılmaz, E., &amp;amp; Li, H. (2020). Automatic lyrics alignment and transcription in polyphonic music: Does background music help?. In ICASSP 2020, 496-500. IEEE.&lt;br /&gt;
&lt;br /&gt;
5 - Basak, S., Agarwal, S., Ganapathy, S., &amp;amp; Takahashi, N. (2021, June). End-to-End Lyrics Recognition with Voice to Singing Style Transfer. In ICASSP 2021, 266-270. IEEE.&lt;br /&gt;
&lt;br /&gt;
6- Demirel, E., Ahlbäck, S., &amp;amp; Dixon, S. (2021). MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription. Proc. ISMIR 2021.&lt;br /&gt;
&lt;br /&gt;
7 - Hansen, J. K., &amp;amp; Fraunhofer, I. D. M. T. (2012). Recognition of phonemes in a-cappella recordings using temporal patterns and mel frequency cepstral coefficients. In 9th Sound and Music Computing Conference (SMC), 494-499.&lt;br /&gt;
&lt;br /&gt;
8 - Mauch, M., Fujihara, H., &amp;amp; Goto, M. (2012). Integrating additional chord information into HMM-based lyrics-to-audio alignment. ICASSP 2012, 200-210, IEEE.&lt;br /&gt;
&lt;br /&gt;
9 - Stoller, D. and Durand, S. and Ewert, S. (2019) End-to-end Lyrics Alignment for Polyphonic Music Using An Audio-to-Character Recognition Model. In ICASSP 2019, IEEE.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2025:Lyrics_Transcription&amp;diff=14645</id>
		<title>2025:Lyrics Transcription</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2025:Lyrics_Transcription&amp;diff=14645"/>
		<updated>2025-05-29T05:34:42Z</updated>

		<summary type="html">&lt;p&gt;A43992899: Created page with &amp;quot;= Description =  This pages describes the '''MIREX2021: Automatic Lyrics Transcription''' challenge. For evaluation procedure and the submission format please scroll down the...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Description =&lt;br /&gt;
&lt;br /&gt;
This pages describes the '''MIREX2021: Automatic Lyrics Transcription''' challenge. For evaluation procedure and the submission format please scroll down the page. &lt;br /&gt;
&lt;br /&gt;
The task of Lyrics Transcription aims to identify the words from sung utterances, in the same way as in automatic speech recognition. This can be mathematically expressed as follows:&lt;br /&gt;
&lt;br /&gt;
  Prediction('''w''') = argmax P('''w'''|'''X''')&lt;br /&gt;
&lt;br /&gt;
where '''w''' and '''X''' are the word and acoustic features respectively.&lt;br /&gt;
&lt;br /&gt;
Ideally, the lyrics transcriber should return meaningful word sequences:&lt;br /&gt;
&lt;br /&gt;
  Prediction('''w''')  = [ &amp;lt;w_1&amp;gt;, &amp;lt;w_2&amp;gt;, ..., &amp;lt;w_N&amp;gt; ]&lt;br /&gt;
&lt;br /&gt;
The algorithm receives either monophonic singing performances or a polyphonic mix (singing voice + musical accompaniment). Both cases are evaluated separately in this challenge.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Evaluation =&lt;br /&gt;
&lt;br /&gt;
'''Word Error Rate''' (WER) : the standard metric use in Automatic Speech Recognition.&lt;br /&gt;
&lt;br /&gt;
  WER = (S + I + D) / (C + S + D)&lt;br /&gt;
&lt;br /&gt;
where;&lt;br /&gt;
 C : correctly predicted words&lt;br /&gt;
 S : substitution errors&lt;br /&gt;
 I : insertion errors&lt;br /&gt;
 D : deletion errors&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
'''Character Error Rate''' (CER) : the above computation can also be done on the character level. This metric penalises the partially correctly predicted / incorrectly spelled words less than WER.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
IMPORTANT: The evaluation samples have few minutes of audio length. The submission is expected to be able to transcribe the entire recording. If your submission requires segmentation as a preprocessing step, this should already be implemented in your pipeline.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Submission Format =&lt;br /&gt;
&lt;br /&gt;
Submissions must be done through the MIREX system (info available [https://www.music-ir.org/mirex/wiki/2021:Main_Page#MIREX_2021_Submission_Instructions here]) and should be packaged in a compressed file (.zip or .rar, etc.) which contains at least two files:&lt;br /&gt;
&lt;br /&gt;
=== A) The main transcription script ===&lt;br /&gt;
&lt;br /&gt;
The main transcription script to execute. This should be a '''one-line executable''' in one of the following formats: a bash (.sh) a python (.py) script, or a binary file.&lt;br /&gt;
&lt;br /&gt;
===  I / O ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments an audio file and the full output path to save the transcriptions. The ability to specify the output path and file name is essential.&lt;br /&gt;
&lt;br /&gt;
Denoting the input audio filename path as $[input_audio_path} and the output file path and name as ${output}, a program called `foobar' will be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar ${input_audio_path}  ${output}&lt;br /&gt;
&lt;br /&gt;
OR with flags:&lt;br /&gt;
&lt;br /&gt;
 foobar -i ${input_audio_path}  -o ${output}&lt;br /&gt;
&lt;br /&gt;
==== Input Audio ====&lt;br /&gt;
&lt;br /&gt;
Participating algorithms will have to receive the following input format:&lt;br /&gt;
&lt;br /&gt;
* Audio format : WAV / MP3&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono) for a cappella (Hansen) and two channels for original&lt;br /&gt;
&lt;br /&gt;
==== Output File Format ====&lt;br /&gt;
&lt;br /&gt;
A text file (per song) containing list of words separated by white space:&lt;br /&gt;
&lt;br /&gt;
  &amp;lt;word_1&amp;gt; &amp;lt;word_2&amp;gt; ... &amp;lt;word_N&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Any non-word items (e.g. silence, music, noise or end of the sentence tokens) should be removed from the final output.&lt;br /&gt;
&lt;br /&gt;
Ideally, the output transcriptions will be saved as:&lt;br /&gt;
 &lt;br /&gt;
  ${output}/${input_song_id}.txt&lt;br /&gt;
&lt;br /&gt;
=== B) The README file ===&lt;br /&gt;
&lt;br /&gt;
This file must contain detailed installation instructions, the use of the main script and contact information.&lt;br /&gt;
&lt;br /&gt;
---- &lt;br /&gt;
&lt;br /&gt;
Any submission that is failed to meet above requirements will not be considered in evaluation!&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Training Datasets =&lt;br /&gt;
&lt;br /&gt;
Datasets within automatic lyrics transcription research can be categorised under two domains in regards to the presence of music instruments accompanying the singer: Monophonic and polyphonic datasets. &lt;br /&gt;
&lt;br /&gt;
The former is considered to have only one singer singing the lyrics, and the latter is when there is music accompaniment. &lt;br /&gt;
&lt;br /&gt;
In this challenge, the participants are encouraged but '''not obliged''' to use the open source datasets below, which are also commonly used in the literature for benchmarking ALT results:&lt;br /&gt;
&lt;br /&gt;
=== DAMP dataset ===&lt;br /&gt;
The [https://zenodo.org/record/2747436#.Xyge4xMzZ0s DAMP - Sing!300x30x2 dataset] consists of solo singing recordings (monophonic) performed by amateur singers, collected via a mobile Karaoke application. &lt;br /&gt;
&lt;br /&gt;
The data is curated to be gender-wise balanced and contains performers from 30 different countries, which provides a good amount of variability in terms of accents and pronunciation.  &lt;br /&gt;
[https://docs.google.com/spreadsheets/d/1YwhPhXU6t-BMZfdEODS_pNW_umFIsciYL62kh-fiBWI/edit?usp=sharing list of recordings]. For more details see the paper. &lt;br /&gt;
&lt;br /&gt;
* The audio can be downloaded from the [https://ccrma.stanford.edu/damp/ Smule web site]&lt;br /&gt;
* Lyrics boundary annotations can be generated from raw annotations using [https://github.com/groadabike/Kaldi-Dsing-task this repository]. Paper [https://isca-speech.org/archive/Interspeech_2019/pdfs/2378.pdf here (1)].&lt;br /&gt;
* Or annotations can be directly retrieved in the Kaldi form [https://github.com/emirdemirel/ALTA/s5/data here] Paper [https://arxiv.org/pdf/2007.06486.pdf here (2)].&lt;br /&gt;
&lt;br /&gt;
=== DALI Dataset ===&lt;br /&gt;
&lt;br /&gt;
DALI (a large '''D'''ataset of synchronised '''A'''udio, '''L'''yr'''I'''cs and notes) (3) is the benchmark dataset for building an acoustic model on polyphonic recordings (4,5,6) and it contains over 5000 songs with semi-automatically aligned lyrics annotations.&lt;br /&gt;
&lt;br /&gt;
The songs are commercial recordings in full-duration, whereas the lyrics are described according to different levels of granularity including words and notes (and syllables underlying a given note).&lt;br /&gt;
&lt;br /&gt;
For each song DALI provides a link to a matched youtube video for the audio retrieval.&lt;br /&gt;
&lt;br /&gt;
* For more details how, see its full description [https://github.com/gabolsgabs/DALI here]. Paper [https://arxiv.org/pdf/1906.10606.pdf here].&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Evaluation Datasets =&lt;br /&gt;
&lt;br /&gt;
The following datasets are used for evaluation and so '''cannot''' be used by participants to train their models under any circumstance. &lt;br /&gt;
&lt;br /&gt;
Note that the evaluation sets listed below consist of popular songs in English language, and have overlapping samples with DALI. &lt;br /&gt;
&lt;br /&gt;
'''*** IMPORTANT ***'''    In case using DALI for training, you '''MUST''' exclude [https://www.music-ir.org/mirex/wiki/2020:Lyrics_Transcription_Results the songs used for MIREX evaluation] during training your model in order to make a scientific evaluation possible. &lt;br /&gt;
&lt;br /&gt;
=== Hansen's Dataset ===&lt;br /&gt;
The dataset contains 9 pop music songs released in early 2010s.&lt;br /&gt;
&lt;br /&gt;
The audio has two versions: the original mix with instrumental accompaniment and a cappella singing voice only one. An example song can be seen [https://www.dropbox.com/sh/wm6k4dqrww0fket/AAC1o1uRFxBPg9iAeSAd1Wxta?dl=0 here].&lt;br /&gt;
&lt;br /&gt;
You can read in detail about how the dataset was made here: [http://publica.fraunhofer.de/documents/N-345612.html (7)]. The recordings have been provided by Jens Kofod Hansen for public evaluation.&lt;br /&gt;
&lt;br /&gt;
* file duration up to 4:40 minutes (total time: 35:33 minutes)&lt;br /&gt;
* 3590 words annotated in total&lt;br /&gt;
&lt;br /&gt;
=== Mauch's Dataset ===&lt;br /&gt;
&lt;br /&gt;
The dataset contains 20 pop music songs with annotations of beginning-timestamps of each word.&lt;br /&gt;
The audio has instrumental accompaniment. An example song can be seen [https://www.dropbox.com/sh/8pp4u2xg93z36d4/AAAsCE2eYW68gxRhKiPH_VvFa?dl=0 here].&lt;br /&gt;
&lt;br /&gt;
You can read in detail about how the dataset was used for the first time here: [https://pdfs.semanticscholar.org/547d/7a5d105380562ca3543bf05b4d5f7a8bee66.pdf (8)] . The dataset has been provided by Sungkyun Chang.&lt;br /&gt;
&lt;br /&gt;
* file duration up to 5:40 minutes (total time: 1h 19m)&lt;br /&gt;
* 5050 words annotated in total&lt;br /&gt;
&lt;br /&gt;
=== Jamendo Dataset ===&lt;br /&gt;
&lt;br /&gt;
This dataset contains 20 recordings with varying Western music genres, annotated with start-of-word timestamps. All songs have instrumental accompaniment.&lt;br /&gt;
&lt;br /&gt;
It is available online on [https://github.com/f90/jamendolyrics Github], although note that we do not allow tuning model parameters using this data, it can only be used to gain insight into the general structure of the test data. For more information also refer to [https://arxiv.org/abs/1902.06797 this paper (9)].&lt;br /&gt;
&lt;br /&gt;
* file duration up to 4:43 (total time: 1h 12m)&lt;br /&gt;
* 5677 words annotated in total&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Time and hardware limits =&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
A hard limit of 24 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result. In addition, submission that are not able to run with the provided RAM and CPU instructions provided by you may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Questions? =&lt;br /&gt;
&lt;br /&gt;
* send us an email - ruibiny@alumni.cmu.edu (Ruibin Yuan), at2jjy@gmail.com (Junyan Jiang)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
= Bibliography =&lt;br /&gt;
&lt;br /&gt;
1 - G.R., Barker, J. (2019) Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System. Proc. Interspeech 2019, 579-583, doi: 10.21437/Interspeech.2019-2378&lt;br /&gt;
&lt;br /&gt;
2 - Demirel, E., Ahlbäck, S., &amp;amp; Dixon, S. (2020). Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention. In IJCNN 2020, 1-8. IEEE.&lt;br /&gt;
&lt;br /&gt;
3 - Meseguer-Brocal, G., Cohen-Hadria, A., &amp;amp; Peeters, G. (2019). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In ISMIR 2018.&lt;br /&gt;
&lt;br /&gt;
4 - Gupta, C., Yılmaz, E., &amp;amp; Li, H. (2020). Automatic lyrics alignment and transcription in polyphonic music: Does background music help?. In ICASSP 2020, 496-500. IEEE.&lt;br /&gt;
&lt;br /&gt;
5 - Basak, S., Agarwal, S., Ganapathy, S., &amp;amp; Takahashi, N. (2021, June). End-to-End Lyrics Recognition with Voice to Singing Style Transfer. In ICASSP 2021, 266-270. IEEE.&lt;br /&gt;
&lt;br /&gt;
6- Demirel, E., Ahlbäck, S., &amp;amp; Dixon, S. (2021). MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription. Proc. ISMIR 2021.&lt;br /&gt;
&lt;br /&gt;
7 - Hansen, J. K., &amp;amp; Fraunhofer, I. D. M. T. (2012). Recognition of phonemes in a-cappella recordings using temporal patterns and mel frequency cepstral coefficients. In 9th Sound and Music Computing Conference (SMC), 494-499.&lt;br /&gt;
&lt;br /&gt;
8 - Mauch, M., Fujihara, H., &amp;amp; Goto, M. (2012). Integrating additional chord information into HMM-based lyrics-to-audio alignment. ICASSP 2012, 200-210, IEEE.&lt;br /&gt;
&lt;br /&gt;
9 - Stoller, D. and Durand, S. and Ewert, S. (2019) End-to-end Lyrics Alignment for Polyphonic Music Using An Audio-to-Character Recognition Model. In ICASSP 2019, IEEE.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13929</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13929"/>
		<updated>2024-10-12T03:31:50Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Rules */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024, will serve as this task's evaluation benchmark. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''CLAP-Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
We do not provide the download of the dataset for now.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, Participants should not use any test-split data from any open-source dataset for training or validation.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13928</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13928"/>
		<updated>2024-10-12T03:29:02Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Metrics */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024, will serve as this task's evaluation benchmark. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''CLAP-Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
We do not provide the download of the dataset for now.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13927</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13927"/>
		<updated>2024-10-12T03:27:32Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Download */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024, will serve as this task's evaluation benchmark. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
We do not provide the download of the dataset for now.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13926</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13926"/>
		<updated>2024-10-12T03:26:23Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description of Split */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024, will serve as this task's evaluation benchmark. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13925</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13925"/>
		<updated>2024-10-12T03:25:11Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024, will serve as this task's evaluation benchmark. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13924</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13924"/>
		<updated>2024-10-12T03:25:01Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Dataset */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
For training, any non-test-set data from the open-source world can be used.&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024, will serve as this task's evaluation benchmark. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13923</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13923"/>
		<updated>2024-10-12T03:20:17Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Baseline */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
(MusicGen-large)[https://huggingface.co/facebook/musicgen-large] and (MusicGen-medium)[https://huggingface.co/facebook/musicgen-medium] will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13922</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13922"/>
		<updated>2024-10-12T03:19:44Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description of Split */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MirexGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
MusicGen-large and MusicGen-medium will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13921</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13921"/>
		<updated>2024-10-12T03:19:19Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description of Audio Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
MusicGen-large and MusicGen-medium will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13920</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13920"/>
		<updated>2024-10-12T03:19:04Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexMusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
MusicGen-large and MusicGen-medium will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13919</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13919"/>
		<updated>2024-10-12T03:17:42Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Baseline */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexMusicGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexMusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''MusicGen'''&lt;br /&gt;
&lt;br /&gt;
MusicGen, developed by Meta, is a single-stage transformer-based Language Model (LM) designed for conditional music generation. It operates over multiple streams of compressed discrete music tokens, eliminating the need for multi-stage models like hierarchical or upsampling methods. MusicGen efficiently generates high-quality mono and stereo music samples conditioned on text descriptions or melodic features, providing enhanced control over the output. Extensive evaluations, including both automatic and human studies, demonstrate that MusicGen outperforms baseline models in text-to-music generation benchmarks. Ablation studies further highlight the significance of its key components.&lt;br /&gt;
&lt;br /&gt;
MusicGen-large and MusicGen-medium will be used as baselines.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13918</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13918"/>
		<updated>2024-10-12T03:10:38Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description of Audio Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexMusicGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MirexMusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''Gen-MusicTransformer: Model Architecture'''&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer employs a transformer-based architecture tailored for music generation tasks. The model is designed to handle sequential data, making it well-suited for generating coherent and contextually rich music clips.&lt;br /&gt;
&lt;br /&gt;
* '''Encoder''': The encoder processes the input textual prompt, transforming it into a series of embeddings that capture the key aspects of the prompt, such as mood, genre, and instrumentation.&lt;br /&gt;
* '''Decoder''': The decoder is responsible for generating the music audio. It utilizes a series of transformer blocks to predict the next audio feature based on the previous context, producing a continuous stream of audio data. The model generates log-mel spectrograms, which are subsequently converted into audio waveforms using a vocoder.&lt;br /&gt;
* '''Conditioning''': The model can be conditioned on additional inputs, such as specific musical motifs or rhythms, allowing for more controlled generation outputs.&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer is pre-trained on a large corpus of music data and fine-tuned on the MusicGen2024 dataset to optimize its performance on the specific task of prompt-based music generation.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13917</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13917"/>
		<updated>2024-10-12T03:10:26Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MirexMusicGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''Gen-MusicTransformer: Model Architecture'''&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer employs a transformer-based architecture tailored for music generation tasks. The model is designed to handle sequential data, making it well-suited for generating coherent and contextually rich music clips.&lt;br /&gt;
&lt;br /&gt;
* '''Encoder''': The encoder processes the input textual prompt, transforming it into a series of embeddings that capture the key aspects of the prompt, such as mood, genre, and instrumentation.&lt;br /&gt;
* '''Decoder''': The decoder is responsible for generating the music audio. It utilizes a series of transformer blocks to predict the next audio feature based on the previous context, producing a continuous stream of audio data. The model generates log-mel spectrograms, which are subsequently converted into audio waveforms using a vocoder.&lt;br /&gt;
* '''Conditioning''': The model can be conditioned on additional inputs, such as specific musical motifs or rhythms, allowing for more controlled generation outputs.&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer is pre-trained on a large corpus of music data and fine-tuned on the MusicGen2024 dataset to optimize its performance on the specific task of prompt-based music generation.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13916</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13916"/>
		<updated>2024-10-12T03:10:02Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description of Split */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MusicGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is only used for testing.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''Gen-MusicTransformer: Model Architecture'''&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer employs a transformer-based architecture tailored for music generation tasks. The model is designed to handle sequential data, making it well-suited for generating coherent and contextually rich music clips.&lt;br /&gt;
&lt;br /&gt;
* '''Encoder''': The encoder processes the input textual prompt, transforming it into a series of embeddings that capture the key aspects of the prompt, such as mood, genre, and instrumentation.&lt;br /&gt;
* '''Decoder''': The decoder is responsible for generating the music audio. It utilizes a series of transformer blocks to predict the next audio feature based on the previous context, producing a continuous stream of audio data. The model generates log-mel spectrograms, which are subsequently converted into audio waveforms using a vocoder.&lt;br /&gt;
* '''Conditioning''': The model can be conditioned on additional inputs, such as specific musical motifs or rhythms, allowing for more controlled generation outputs.&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer is pre-trained on a large corpus of music data and fine-tuned on the MusicGen2024 dataset to optimize its performance on the specific task of prompt-based music generation.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13915</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13915"/>
		<updated>2024-10-12T03:09:17Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description of Audio Files */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MusicGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds).&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is divided into training, validation, and evaluation subsets. Participants must not use the evaluation subset for training or validation purposes to ensure a fair and unbiased assessment of model performance. The dataset split ensures a diverse representation of musical styles and genres in both the training and evaluation phases.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''Gen-MusicTransformer: Model Architecture'''&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer employs a transformer-based architecture tailored for music generation tasks. The model is designed to handle sequential data, making it well-suited for generating coherent and contextually rich music clips.&lt;br /&gt;
&lt;br /&gt;
* '''Encoder''': The encoder processes the input textual prompt, transforming it into a series of embeddings that capture the key aspects of the prompt, such as mood, genre, and instrumentation.&lt;br /&gt;
* '''Decoder''': The decoder is responsible for generating the music audio. It utilizes a series of transformer blocks to predict the next audio feature based on the previous context, producing a continuous stream of audio data. The model generates log-mel spectrograms, which are subsequently converted into audio waveforms using a vocoder.&lt;br /&gt;
* '''Conditioning''': The model can be conditioned on additional inputs, such as specific musical motifs or rhythms, allowing for more controlled generation outputs.&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer is pre-trained on a large corpus of music data and fine-tuned on the MusicGen2024 dataset to optimize its performance on the specific task of prompt-based music generation.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13914</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13914"/>
		<updated>2024-10-12T03:07:45Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Task Description ==&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
== Dataset ==&lt;br /&gt;
&lt;br /&gt;
=== Description ===&lt;br /&gt;
&lt;br /&gt;
An in-house music generation dataset, MusicGen2024 dataset will serve as the evaluation benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Clips''': A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
* '''Textual Prompts''': Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
For training, any data can be used.&lt;br /&gt;
&lt;br /&gt;
=== Description of Audio Files ===&lt;br /&gt;
&lt;br /&gt;
The audio files in the MusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds) to facilitate manageable training and evaluation cycles.&lt;br /&gt;
&lt;br /&gt;
=== Description of Text ===&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
=== Description of Split ===&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is divided into training, validation, and evaluation subsets. Participants must not use the evaluation subset for training or validation purposes to ensure a fair and unbiased assessment of model performance. The dataset split ensures a diverse representation of musical styles and genres in both the training and evaluation phases.&lt;br /&gt;
&lt;br /&gt;
== Baseline ==&lt;br /&gt;
&lt;br /&gt;
'''Gen-MusicTransformer: Model Architecture'''&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer employs a transformer-based architecture tailored for music generation tasks. The model is designed to handle sequential data, making it well-suited for generating coherent and contextually rich music clips.&lt;br /&gt;
&lt;br /&gt;
* '''Encoder''': The encoder processes the input textual prompt, transforming it into a series of embeddings that capture the key aspects of the prompt, such as mood, genre, and instrumentation.&lt;br /&gt;
* '''Decoder''': The decoder is responsible for generating the music audio. It utilizes a series of transformer blocks to predict the next audio feature based on the previous context, producing a continuous stream of audio data. The model generates log-mel spectrograms, which are subsequently converted into audio waveforms using a vocoder.&lt;br /&gt;
* '''Conditioning''': The model can be conditioned on additional inputs, such as specific musical motifs or rhythms, allowing for more controlled generation outputs.&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer is pre-trained on a large corpus of music data and fine-tuned on the MusicGen2024 dataset to optimize its performance on the specific task of prompt-based music generation.&lt;br /&gt;
&lt;br /&gt;
== Metrics ==&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
* '''MOS (Mean Opinion Score)''': A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
* '''Inception Score (IS)''': An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
* '''FAD (Fréchet Audio Distance)''': Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
* '''Prompt Adherence Score''': A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
== Download ==&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
== Rules ==&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
== Submission ==&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using [https://www.codabench.org/ CodaBench] for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
* '''Audio Files''': A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
* '''PDF File''': A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13676</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13676"/>
		<updated>2024-08-26T06:05:24Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;### Task Description&lt;br /&gt;
&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
&lt;br /&gt;
### Dataset&lt;br /&gt;
&lt;br /&gt;
#### Description&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset will serve as the benchmark for this task. This dataset is specially curated to facilitate the generation of music in response to specific prompts. It includes:&lt;br /&gt;
&lt;br /&gt;
- **Audio Clips**: A collection of diverse music clips across various genres, ranging from classical to electronic music, to help in training and evaluation.&lt;br /&gt;
- **Textual Prompts**: Detailed prompts associated with each music clip, describing the desired musical characteristics such as mood, genre, instrumentation, and tempo.&lt;br /&gt;
&lt;br /&gt;
The dataset is designed to support both the training of generative models and the evaluation of their outputs.&lt;br /&gt;
&lt;br /&gt;
#### Description of Audio Files&lt;br /&gt;
&lt;br /&gt;
The audio files in the MusicGen2024 dataset are selected to represent a broad spectrum of musical genres and styles. Each clip is provided in a high-quality format, ensuring that the nuances of musical elements are preserved. The dataset includes clips of varying lengths, with a focus on short to medium-length excerpts (10 to 30 seconds) to facilitate manageable training and evaluation cycles.&lt;br /&gt;
&lt;br /&gt;
#### Description of Text&lt;br /&gt;
&lt;br /&gt;
The textual prompts provided in the dataset are carefully crafted to guide the generation process. These prompts include specific instructions regarding the desired genre, mood, instrumentation, and other musical characteristics. They are designed to challenge the generative models to produce music that is not only coherent but also closely aligned with the given descriptions.&lt;br /&gt;
&lt;br /&gt;
#### Description of Split&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset is divided into training, validation, and evaluation subsets. Participants must not use the evaluation subset for training or validation purposes to ensure a fair and unbiased assessment of model performance. The dataset split ensures a diverse representation of musical styles and genres in both the training and evaluation phases.&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
&lt;br /&gt;
### Baseline&lt;br /&gt;
&lt;br /&gt;
**Gen-MusicTransformer: Model Architecture**&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer employs a transformer-based architecture tailored for music generation tasks. The model is designed to handle sequential data, making it well-suited for generating coherent and contextually rich music clips.&lt;br /&gt;
&lt;br /&gt;
- **Encoder**: The encoder processes the input textual prompt, transforming it into a series of embeddings that capture the key aspects of the prompt, such as mood, genre, and instrumentation.&lt;br /&gt;
- **Decoder**: The decoder is responsible for generating the music audio. It utilizes a series of transformer blocks to predict the next audio feature based on the previous context, producing a continuous stream of audio data. The model generates log-mel spectrograms, which are subsequently converted into audio waveforms using a vocoder.&lt;br /&gt;
- **Conditioning**: The model can be conditioned on additional inputs, such as specific musical motifs or rhythms, allowing for more controlled generation outputs.&lt;br /&gt;
&lt;br /&gt;
Gen-MusicTransformer is pre-trained on a large corpus of music data and fine-tuned on the MusicGen2024 dataset to optimize its performance on the specific task of prompt-based music generation.&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
&lt;br /&gt;
### Metrics&lt;br /&gt;
&lt;br /&gt;
The evaluation of the generated music will be based on a combination of objective and subjective metrics:&lt;br /&gt;
&lt;br /&gt;
- **MOS (Mean Opinion Score)**: A subjective evaluation metric where human listeners rate the overall quality and aesthetic appeal of the generated music.&lt;br /&gt;
- **Inception Score (IS)**: An objective metric that evaluates the diversity and quality of the generated music, based on a pre-trained music classification model.&lt;br /&gt;
- **FAD (Fréchet Audio Distance)**: Measures the similarity between the distribution of generated music and real music, capturing both quality and diversity.&lt;br /&gt;
- **Prompt Adherence Score**: A metric designed to assess how well the generated music aligns with the provided textual prompts.&lt;br /&gt;
&lt;br /&gt;
Each metric will contribute to the final ranking, with MOS and Prompt Adherence Score being given the highest weight.&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
&lt;br /&gt;
### Download&lt;br /&gt;
&lt;br /&gt;
The MusicGen2024 dataset, including both the audio clips and corresponding textual prompts, will be made available for download. Participants can access the dataset via a link that will be posted here.&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
&lt;br /&gt;
### Rules&lt;br /&gt;
&lt;br /&gt;
Participants are allowed to utilize external datasets and pre-trained models to develop their systems. However, the use of the MusicGen2024 evaluation split for training or validation is strictly prohibited. Participants must ensure that their submissions are original and do not overlap with the evaluation data.&lt;br /&gt;
&lt;br /&gt;
---&lt;br /&gt;
&lt;br /&gt;
### Submission&lt;br /&gt;
&lt;br /&gt;
Submissions will be evaluated using CodaBench (https://www.codabench.org/) for automated assessment.&lt;br /&gt;
&lt;br /&gt;
Participants are required to submit the following:&lt;br /&gt;
&lt;br /&gt;
- **Audio Files**: A set of generated music clips corresponding to the prompts in the evaluation dataset.&lt;br /&gt;
- **PDF File**: A detailed report describing the system architecture, training process, and any external data or models used.&lt;br /&gt;
&lt;br /&gt;
Each participant or team may submit up to three versions of their system. The final ranking will be based on the metrics outlined above.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13675</id>
		<title>2024:Music Audio Generation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Music_Audio_Generation&amp;diff=13675"/>
		<updated>2024-08-26T04:31:29Z</updated>

		<summary type="html">&lt;p&gt;A43992899: Created page with &amp;quot;=Description= The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task ai...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;=Description=&lt;br /&gt;
The MIREX 2024 Music Audio Generation Task challenges participants to develop models capable of generating high-quality, original music audio clips. This task aims to advance the state-of-the-art in music generation by encouraging the creation of systems that can produce coherent, aesthetically pleasing, and musically diverse outputs across various genres and styles.&lt;br /&gt;
Participants will be required to generate music clips based on textual prompts or other conditioning information provided in the dataset. The generated audio will be evaluated based on its musical quality, creativity, adherence to the provided prompt, and overall listenability.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13670</id>
		<title>2024:Cover Song Identification</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13670"/>
		<updated>2024-08-26T03:48:26Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Packaging submissions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Description==&lt;br /&gt;
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or &amp;quot;cover songs&amp;quot;.&lt;br /&gt;
 &lt;br /&gt;
Within the a collection of pieces in the cover song datasets, there are embedded a number of different &amp;quot;original songs&amp;quot; or compositions each represented by a number of different &amp;quot;versions&amp;quot;. The &amp;quot;cover songs&amp;quot; or &amp;quot;versions&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
&lt;br /&gt;
Using each of these version files in turn as as the &amp;quot;seed/query&amp;quot; file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Two datasets are used in this task, the MIREX 2006 US Pop Music Cover Song dataset Audio Cover Song dataset the [http://www.mazurka.org.uk/ Mazurka dataset]. &lt;br /&gt;
&lt;br /&gt;
=== Task specific mailing list ===&lt;br /&gt;
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX  [https://mail.lis.illinois.edu/mailman/listinfo/evalfest &amp;quot;EvalFest&amp;quot; list]. If you have an question or comment, simply include the task name in the subject heading.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
Two datasets will be used to evaluate cover song identification:&lt;br /&gt;
&lt;br /&gt;
===US Pop Music Collection Cover Song (aka Mixed Collection)===&lt;br /&gt;
This is the &amp;quot;original&amp;quot; ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different &amp;quot;cover songs&amp;quot; each represented by 11 different &amp;quot;versions&amp;quot; for a total of 330 audio files. &lt;br /&gt;
&lt;br /&gt;
Using each of these cover song files in turn as as the &amp;quot;seed/query&amp;quot; file, we will examine the returned lists of items for the presence of the other 10 versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* The &amp;quot;cover songs&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
* Size: 1000 tracks&lt;br /&gt;
* Queries: 330 tracks&lt;br /&gt;
&lt;br /&gt;
=== Sapp's Mazurka Collection Information ===&lt;br /&gt;
In addition to our original ACS dataset, we used the  [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions from 49 mazurkas and ran it as a separate ACS subtask. Systems should return a distance matrix of 539x539 from which we located the ranks of each of the associated cover versions.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* Size: 539 tracks&lt;br /&gt;
* Queries: 539 tracks&lt;br /&gt;
&lt;br /&gt;
== Evaluation ==&lt;br /&gt;
The following evaluation metrics will be computed for each submission:&lt;br /&gt;
* Total number of covers identified in top 10&lt;br /&gt;
* Mean number of covers identified in top 10 (average performance)&lt;br /&gt;
* Mean (arithmetic) of Avg. Precisions&lt;br /&gt;
* Mean rank of first correctly identified cover&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Ranking and significance testing ===&lt;br /&gt;
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.&lt;br /&gt;
&lt;br /&gt;
For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:&lt;br /&gt;
 @InProceedings{jones2007hsj,&lt;br /&gt;
   title={&amp;quot;Human Similarity Judgements: Implications for the Design of Formal Evaluations&amp;quot;},&lt;br /&gt;
   author=&amp;quot;M.C. Jones and J.S. Downie and A.F. Ehmann&amp;quot;,&lt;br /&gt;
   BOOKTITLE =&amp;quot;Proceedings of ISMIR  2007 International Society of Music Information Retrieval&amp;quot;, &lt;br /&gt;
   year=&amp;quot;2007&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Runtime performance ===&lt;br /&gt;
In addition computation times for feature extraction and training/classification will be measured.&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submission to this task will have to conform to a specified format detailed below.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Implementation details ===&lt;br /&gt;
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.&lt;br /&gt;
&lt;br /&gt;
The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.&lt;br /&gt;
&lt;br /&gt;
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parrallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== I/O formats ===&lt;br /&gt;
=== Input Files ===&lt;br /&gt;
&lt;br /&gt;
The feature extraction list file format will be of the form: &lt;br /&gt;
&lt;br /&gt;
 /path/to/audio/file/000.wav\n&lt;br /&gt;
 /path/to/audio/file/001.wav\n&lt;br /&gt;
 /path/to/audio/file/002.wav\n&lt;br /&gt;
 ... &lt;br /&gt;
&lt;br /&gt;
The query list file format will be very similar, taking the form, and listing a subset of files from the feature extraction list file: &lt;br /&gt;
&lt;br /&gt;
 /path/to/audio/file/182.wav\n&lt;br /&gt;
 /path/to/audio/file/245.wav\n&lt;br /&gt;
 /path/to/audio/file/432.wav\n&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For a total of ''&amp;lt;number of queries&amp;gt;'' rows -- query ids are assigned from the pool of ''&amp;lt;number of candidates&amp;gt;'' collection ids and should match the ids within the candidate collection.&lt;br /&gt;
&lt;br /&gt;
Lines will be terminated by a '\n' character.&lt;br /&gt;
&lt;br /&gt;
=== Output File ===&lt;br /&gt;
The only output will be a '''distance''' matrix file that is ''&amp;lt;number of queries&amp;gt;'' rows by ''&amp;lt;number of candidates&amp;gt;'' columns in the following format: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Distance matrix header text with system name&lt;br /&gt;
1\t&amp;lt;/path/to/audio/file/track1.wav&amp;gt;&lt;br /&gt;
2\t&amp;lt;/path/to/audio/file/track2.wav&amp;gt;&lt;br /&gt;
3\t&amp;lt;/path/to/audio/file/track3.wav&amp;gt;&lt;br /&gt;
4\t&amp;lt;/path/to/audio/file/track4.wav&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
N\t&amp;lt;/path/to/audio/file/trackN.wav&amp;gt;&lt;br /&gt;
Q/R\t1\t2\t3\t4\t...\tN&lt;br /&gt;
1\t&amp;lt;dist 1 to 1&amp;gt;\t&amp;lt;dist 1 to 2&amp;gt;\t&amp;lt;dist 1 to 3&amp;gt;\t&amp;lt;dist 1 to 4&amp;gt;\t...\t&amp;lt;dist 1 to N&amp;gt;&lt;br /&gt;
3\t&amp;lt;dist 3 to 2&amp;gt;\t&amp;lt;dist 3 to 2&amp;gt;\t&amp;lt;dist 3 to 3&amp;gt;\t&amp;lt;dist 3 to 4&amp;gt;\t...\t&amp;lt;dist 3 to N&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where N is &amp;lt;number of candidates&amp;gt; and the queries are drawn from this set (and bear the same track indexes if possible).&lt;br /&gt;
&lt;br /&gt;
which might look like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Example distance matrix 0.1&lt;br /&gt;
1    /path/to/audio/file/track1.wav&lt;br /&gt;
2    /path/to/audio/file/track2.wav&lt;br /&gt;
3    /path/to/audio/file/track3.wav&lt;br /&gt;
4    /path/to/audio/file/track4.wav&lt;br /&gt;
5    /path/to/audio/file/track5.wav&lt;br /&gt;
Q/R   1        2        3        4        5&lt;br /&gt;
1     0.00000  1.24100  0.2e-4   0.42559  0.21313&lt;br /&gt;
3     50.2e-4  0.62640  0.00000  0.38000  0.15152&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed we will be able to interpret the data.&lt;br /&gt;
&lt;br /&gt;
All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.&lt;br /&gt;
&lt;br /&gt;
To summarize, the distance matrix should be preceded by a system name, ''&amp;lt;number of candidates&amp;gt;'' rows of file paths and should be composed of ''&amp;lt;number of candidates&amp;gt;'' columns of distance (separated by tab characters) and ''&amp;lt;number of queries&amp;gt;'' rows (one for each original track query). Each row corresponds to a particular query song (the track to find covers of).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Command Line Calling Format ===&lt;br /&gt;
&lt;br /&gt;
 /path/to/submission &amp;lt;collection_list_file&amp;gt; &amp;lt;query_list_file&amp;gt; &amp;lt;working_directory&amp;gt; &amp;lt;output_file&amp;gt;&lt;br /&gt;
     '''&amp;lt;collection_list_file&amp;gt;''': Text file containing ''&amp;lt;number of candidates&amp;gt;'' full path file names for the&lt;br /&gt;
                             ''&amp;lt;number of candidates&amp;gt;'' audio files in the collection (including the ''&amp;lt;number of queries&amp;gt;'' &lt;br /&gt;
                             query documents).&lt;br /&gt;
                             '''Example: /path/to/coversong/collection.txt'''&lt;br /&gt;
     '''&amp;lt;query_list_file&amp;gt;'''     : Text file containing the ''&amp;lt;number of queries&amp;gt;'' full path file names for the &lt;br /&gt;
                             ''&amp;lt;number of queries&amp;gt;'' query documents.&lt;br /&gt;
                             '''Example: /path/to/coversong/queries.txt'''&lt;br /&gt;
     '''&amp;lt;working_directory&amp;gt;'''   : Full path to a temporary directory where submission will &lt;br /&gt;
                             have write access for caching features or calculations.&lt;br /&gt;
                             '''Example: /tmp/submission_id/'''&lt;br /&gt;
     '''&amp;lt;output_file&amp;gt;'''         : Full path to file where submission should output the similarity &lt;br /&gt;
                             matrix (''&amp;lt;number of candidates&amp;gt;'' header rows + ''&amp;lt;number of queries&amp;gt;'' x ''&amp;lt;number of candidates&amp;gt;'' data matrix).&lt;br /&gt;
                             '''Example: /path/to/coversong/results/submission_id.txt'''&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 /path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Packaging submissions ===&lt;br /&gt;
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guarenteed).&lt;br /&gt;
&lt;br /&gt;
All submissions should include a README file including the following the information:&lt;br /&gt;
&lt;br /&gt;
* Command line calling format for all executables and an example formatted set of commands&lt;br /&gt;
* Number of threads/cores used or whether this should be specified on the command line&lt;br /&gt;
* Expected memory footprint&lt;br /&gt;
* Expected runtime&lt;br /&gt;
* Any required environments (and versions), e.g. python, java, bash, matlab.&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of particpants in this and other audio tasks, hard limits on the runtime of submissions are specified. &lt;br /&gt;
 &lt;br /&gt;
A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13669</id>
		<title>2024:Cover Song Identification</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13669"/>
		<updated>2024-08-26T03:47:38Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Evaluation */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Description==&lt;br /&gt;
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or &amp;quot;cover songs&amp;quot;.&lt;br /&gt;
 &lt;br /&gt;
Within the a collection of pieces in the cover song datasets, there are embedded a number of different &amp;quot;original songs&amp;quot; or compositions each represented by a number of different &amp;quot;versions&amp;quot;. The &amp;quot;cover songs&amp;quot; or &amp;quot;versions&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
&lt;br /&gt;
Using each of these version files in turn as as the &amp;quot;seed/query&amp;quot; file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Two datasets are used in this task, the MIREX 2006 US Pop Music Cover Song dataset Audio Cover Song dataset the [http://www.mazurka.org.uk/ Mazurka dataset]. &lt;br /&gt;
&lt;br /&gt;
=== Task specific mailing list ===&lt;br /&gt;
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX  [https://mail.lis.illinois.edu/mailman/listinfo/evalfest &amp;quot;EvalFest&amp;quot; list]. If you have an question or comment, simply include the task name in the subject heading.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
Two datasets will be used to evaluate cover song identification:&lt;br /&gt;
&lt;br /&gt;
===US Pop Music Collection Cover Song (aka Mixed Collection)===&lt;br /&gt;
This is the &amp;quot;original&amp;quot; ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different &amp;quot;cover songs&amp;quot; each represented by 11 different &amp;quot;versions&amp;quot; for a total of 330 audio files. &lt;br /&gt;
&lt;br /&gt;
Using each of these cover song files in turn as as the &amp;quot;seed/query&amp;quot; file, we will examine the returned lists of items for the presence of the other 10 versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* The &amp;quot;cover songs&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
* Size: 1000 tracks&lt;br /&gt;
* Queries: 330 tracks&lt;br /&gt;
&lt;br /&gt;
=== Sapp's Mazurka Collection Information ===&lt;br /&gt;
In addition to our original ACS dataset, we used the  [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions from 49 mazurkas and ran it as a separate ACS subtask. Systems should return a distance matrix of 539x539 from which we located the ranks of each of the associated cover versions.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* Size: 539 tracks&lt;br /&gt;
* Queries: 539 tracks&lt;br /&gt;
&lt;br /&gt;
== Evaluation ==&lt;br /&gt;
The following evaluation metrics will be computed for each submission:&lt;br /&gt;
* Total number of covers identified in top 10&lt;br /&gt;
* Mean number of covers identified in top 10 (average performance)&lt;br /&gt;
* Mean (arithmetic) of Avg. Precisions&lt;br /&gt;
* Mean rank of first correctly identified cover&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Ranking and significance testing ===&lt;br /&gt;
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.&lt;br /&gt;
&lt;br /&gt;
For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:&lt;br /&gt;
 @InProceedings{jones2007hsj,&lt;br /&gt;
   title={&amp;quot;Human Similarity Judgements: Implications for the Design of Formal Evaluations&amp;quot;},&lt;br /&gt;
   author=&amp;quot;M.C. Jones and J.S. Downie and A.F. Ehmann&amp;quot;,&lt;br /&gt;
   BOOKTITLE =&amp;quot;Proceedings of ISMIR  2007 International Society of Music Information Retrieval&amp;quot;, &lt;br /&gt;
   year=&amp;quot;2007&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Runtime performance ===&lt;br /&gt;
In addition computation times for feature extraction and training/classification will be measured.&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submission to this task will have to conform to a specified format detailed below.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Implementation details ===&lt;br /&gt;
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.&lt;br /&gt;
&lt;br /&gt;
The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.&lt;br /&gt;
&lt;br /&gt;
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parrallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== I/O formats ===&lt;br /&gt;
=== Input Files ===&lt;br /&gt;
&lt;br /&gt;
The feature extraction list file format will be of the form: &lt;br /&gt;
&lt;br /&gt;
 /path/to/audio/file/000.wav\n&lt;br /&gt;
 /path/to/audio/file/001.wav\n&lt;br /&gt;
 /path/to/audio/file/002.wav\n&lt;br /&gt;
 ... &lt;br /&gt;
&lt;br /&gt;
The query list file format will be very similar, taking the form, and listing a subset of files from the feature extraction list file: &lt;br /&gt;
&lt;br /&gt;
 /path/to/audio/file/182.wav\n&lt;br /&gt;
 /path/to/audio/file/245.wav\n&lt;br /&gt;
 /path/to/audio/file/432.wav\n&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For a total of ''&amp;lt;number of queries&amp;gt;'' rows -- query ids are assigned from the pool of ''&amp;lt;number of candidates&amp;gt;'' collection ids and should match the ids within the candidate collection.&lt;br /&gt;
&lt;br /&gt;
Lines will be terminated by a '\n' character.&lt;br /&gt;
&lt;br /&gt;
=== Output File ===&lt;br /&gt;
The only output will be a '''distance''' matrix file that is ''&amp;lt;number of queries&amp;gt;'' rows by ''&amp;lt;number of candidates&amp;gt;'' columns in the following format: &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Distance matrix header text with system name&lt;br /&gt;
1\t&amp;lt;/path/to/audio/file/track1.wav&amp;gt;&lt;br /&gt;
2\t&amp;lt;/path/to/audio/file/track2.wav&amp;gt;&lt;br /&gt;
3\t&amp;lt;/path/to/audio/file/track3.wav&amp;gt;&lt;br /&gt;
4\t&amp;lt;/path/to/audio/file/track4.wav&amp;gt;&lt;br /&gt;
...&lt;br /&gt;
N\t&amp;lt;/path/to/audio/file/trackN.wav&amp;gt;&lt;br /&gt;
Q/R\t1\t2\t3\t4\t...\tN&lt;br /&gt;
1\t&amp;lt;dist 1 to 1&amp;gt;\t&amp;lt;dist 1 to 2&amp;gt;\t&amp;lt;dist 1 to 3&amp;gt;\t&amp;lt;dist 1 to 4&amp;gt;\t...\t&amp;lt;dist 1 to N&amp;gt;&lt;br /&gt;
3\t&amp;lt;dist 3 to 2&amp;gt;\t&amp;lt;dist 3 to 2&amp;gt;\t&amp;lt;dist 3 to 3&amp;gt;\t&amp;lt;dist 3 to 4&amp;gt;\t...\t&amp;lt;dist 3 to N&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where N is &amp;lt;number of candidates&amp;gt; and the queries are drawn from this set (and bear the same track indexes if possible).&lt;br /&gt;
&lt;br /&gt;
which might look like:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Example distance matrix 0.1&lt;br /&gt;
1    /path/to/audio/file/track1.wav&lt;br /&gt;
2    /path/to/audio/file/track2.wav&lt;br /&gt;
3    /path/to/audio/file/track3.wav&lt;br /&gt;
4    /path/to/audio/file/track4.wav&lt;br /&gt;
5    /path/to/audio/file/track5.wav&lt;br /&gt;
Q/R   1        2        3        4        5&lt;br /&gt;
1     0.00000  1.24100  0.2e-4   0.42559  0.21313&lt;br /&gt;
3     50.2e-4  0.62640  0.00000  0.38000  0.15152&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Note that indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed we will be able to interpret the data.&lt;br /&gt;
&lt;br /&gt;
All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.&lt;br /&gt;
&lt;br /&gt;
To summarize, the distance matrix should be preceded by a system name, ''&amp;lt;number of candidates&amp;gt;'' rows of file paths and should be composed of ''&amp;lt;number of candidates&amp;gt;'' columns of distance (separated by tab characters) and ''&amp;lt;number of queries&amp;gt;'' rows (one for each original track query). Each row corresponds to a particular query song (the track to find covers of).&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Command Line Calling Format ===&lt;br /&gt;
&lt;br /&gt;
 /path/to/submission &amp;lt;collection_list_file&amp;gt; &amp;lt;query_list_file&amp;gt; &amp;lt;working_directory&amp;gt; &amp;lt;output_file&amp;gt;&lt;br /&gt;
     '''&amp;lt;collection_list_file&amp;gt;''': Text file containing ''&amp;lt;number of candidates&amp;gt;'' full path file names for the&lt;br /&gt;
                             ''&amp;lt;number of candidates&amp;gt;'' audio files in the collection (including the ''&amp;lt;number of queries&amp;gt;'' &lt;br /&gt;
                             query documents).&lt;br /&gt;
                             '''Example: /path/to/coversong/collection.txt'''&lt;br /&gt;
     '''&amp;lt;query_list_file&amp;gt;'''     : Text file containing the ''&amp;lt;number of queries&amp;gt;'' full path file names for the &lt;br /&gt;
                             ''&amp;lt;number of queries&amp;gt;'' query documents.&lt;br /&gt;
                             '''Example: /path/to/coversong/queries.txt'''&lt;br /&gt;
     '''&amp;lt;working_directory&amp;gt;'''   : Full path to a temporary directory where submission will &lt;br /&gt;
                             have write access for caching features or calculations.&lt;br /&gt;
                             '''Example: /tmp/submission_id/'''&lt;br /&gt;
     '''&amp;lt;output_file&amp;gt;'''         : Full path to file where submission should output the similarity &lt;br /&gt;
                             matrix (''&amp;lt;number of candidates&amp;gt;'' header rows + ''&amp;lt;number of queries&amp;gt;'' x ''&amp;lt;number of candidates&amp;gt;'' data matrix).&lt;br /&gt;
                             '''Example: /path/to/coversong/results/submission_id.txt'''&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 /path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Packaging submissions ===&lt;br /&gt;
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guarenteed).&lt;br /&gt;
&lt;br /&gt;
All submissions should include a README file including the following the information:&lt;br /&gt;
&lt;br /&gt;
* Command line calling format for all executables and an example formatted set of commands&lt;br /&gt;
* Number of threads/cores used or whether this should be specified on the command line&lt;br /&gt;
* Expected memory footprint&lt;br /&gt;
* Expected runtime&lt;br /&gt;
* Any required environments (and versions), e.g. python, java, bash, matlab.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13668</id>
		<title>2024:Cover Song Identification</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13668"/>
		<updated>2024-08-26T03:47:01Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Description==&lt;br /&gt;
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or &amp;quot;cover songs&amp;quot;.&lt;br /&gt;
 &lt;br /&gt;
Within the a collection of pieces in the cover song datasets, there are embedded a number of different &amp;quot;original songs&amp;quot; or compositions each represented by a number of different &amp;quot;versions&amp;quot;. The &amp;quot;cover songs&amp;quot; or &amp;quot;versions&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
&lt;br /&gt;
Using each of these version files in turn as as the &amp;quot;seed/query&amp;quot; file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Two datasets are used in this task, the MIREX 2006 US Pop Music Cover Song dataset Audio Cover Song dataset the [http://www.mazurka.org.uk/ Mazurka dataset]. &lt;br /&gt;
&lt;br /&gt;
=== Task specific mailing list ===&lt;br /&gt;
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX  [https://mail.lis.illinois.edu/mailman/listinfo/evalfest &amp;quot;EvalFest&amp;quot; list]. If you have an question or comment, simply include the task name in the subject heading.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
Two datasets will be used to evaluate cover song identification:&lt;br /&gt;
&lt;br /&gt;
===US Pop Music Collection Cover Song (aka Mixed Collection)===&lt;br /&gt;
This is the &amp;quot;original&amp;quot; ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different &amp;quot;cover songs&amp;quot; each represented by 11 different &amp;quot;versions&amp;quot; for a total of 330 audio files. &lt;br /&gt;
&lt;br /&gt;
Using each of these cover song files in turn as as the &amp;quot;seed/query&amp;quot; file, we will examine the returned lists of items for the presence of the other 10 versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* The &amp;quot;cover songs&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
* Size: 1000 tracks&lt;br /&gt;
* Queries: 330 tracks&lt;br /&gt;
&lt;br /&gt;
=== Sapp's Mazurka Collection Information ===&lt;br /&gt;
In addition to our original ACS dataset, we used the  [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions from 49 mazurkas and ran it as a separate ACS subtask. Systems should return a distance matrix of 539x539 from which we located the ranks of each of the associated cover versions.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* Size: 539 tracks&lt;br /&gt;
* Queries: 539 tracks&lt;br /&gt;
&lt;br /&gt;
== Evaluation ==&lt;br /&gt;
The following evaluation metrics will be computed for each submission:&lt;br /&gt;
* Total number of covers identified in top 10&lt;br /&gt;
* Mean number of covers identified in top 10 (average performance)&lt;br /&gt;
* Mean (arithmetic) of Avg. Precisions&lt;br /&gt;
* Mean rank of first correctly identified cover&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Ranking and significance testing ===&lt;br /&gt;
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.&lt;br /&gt;
&lt;br /&gt;
For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:&lt;br /&gt;
 @InProceedings{jones2007hsj,&lt;br /&gt;
   title={&amp;quot;Human Similarity Judgements: Implications for the Design of Formal Evaluations&amp;quot;},&lt;br /&gt;
   author=&amp;quot;M.C. Jones and J.S. Downie and A.F. Ehmann&amp;quot;,&lt;br /&gt;
   BOOKTITLE =&amp;quot;Proceedings of ISMIR  2007 International Society of Music Information Retrieval&amp;quot;, &lt;br /&gt;
   year=&amp;quot;2007&amp;quot;&lt;br /&gt;
 }&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Runtime performance ===&lt;br /&gt;
In addition computation times for feature extraction and training/classification will be measured.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13667</id>
		<title>2024:Cover Song Identification</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13667"/>
		<updated>2024-08-26T03:46:23Z</updated>

		<summary type="html">&lt;p&gt;A43992899: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Description==&lt;br /&gt;
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or &amp;quot;cover songs&amp;quot;.&lt;br /&gt;
 &lt;br /&gt;
Within the a collection of pieces in the cover song datasets, there are embedded a number of different &amp;quot;original songs&amp;quot; or compositions each represented by a number of different &amp;quot;versions&amp;quot;. The &amp;quot;cover songs&amp;quot; or &amp;quot;versions&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
&lt;br /&gt;
Using each of these version files in turn as as the &amp;quot;seed/query&amp;quot; file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Two datasets are used in this task, the MIREX 2006 US Pop Music Cover Song dataset Audio Cover Song dataset the [http://www.mazurka.org.uk/ Mazurka dataset]. &lt;br /&gt;
&lt;br /&gt;
=== Task specific mailing list ===&lt;br /&gt;
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX  [https://mail.lis.illinois.edu/mailman/listinfo/evalfest &amp;quot;EvalFest&amp;quot; list]. If you have an question or comment, simply include the task name in the subject heading.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
Two datasets will be used to evaluate cover song identification:&lt;br /&gt;
&lt;br /&gt;
===US Pop Music Collection Cover Song (aka Mixed Collection)===&lt;br /&gt;
This is the &amp;quot;original&amp;quot; ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different &amp;quot;cover songs&amp;quot; each represented by 11 different &amp;quot;versions&amp;quot; for a total of 330 audio files. &lt;br /&gt;
&lt;br /&gt;
Using each of these cover song files in turn as as the &amp;quot;seed/query&amp;quot; file, we will examine the returned lists of items for the presence of the other 10 versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* The &amp;quot;cover songs&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
* Size: 1000 tracks&lt;br /&gt;
* Queries: 330 tracks&lt;br /&gt;
&lt;br /&gt;
=== Sapp's Mazurka Collection Information ===&lt;br /&gt;
In addition to our original ACS dataset, we used the  [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions from 49 mazurkas and ran it as a separate ACS subtask. Systems should return a distance matrix of 539x539 from which we located the ranks of each of the associated cover versions.&lt;br /&gt;
&lt;br /&gt;
Collection statistics:&lt;br /&gt;
* 16bit, monophonic, 22.05khz, wav&lt;br /&gt;
* Size: 539 tracks&lt;br /&gt;
* Queries: 539 tracks&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13666</id>
		<title>2024:Cover Song Identification</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2024:Cover_Song_Identification&amp;diff=13666"/>
		<updated>2024-08-26T03:43:47Z</updated>

		<summary type="html">&lt;p&gt;A43992899: Created page with &amp;quot;==Description== This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or &amp;quot;cover songs&amp;quot;.   Within the a collection of...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Description==&lt;br /&gt;
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or &amp;quot;cover songs&amp;quot;.&lt;br /&gt;
 &lt;br /&gt;
Within the a collection of pieces in the cover song datasets, there are embedded a number of different &amp;quot;original songs&amp;quot; or compositions each represented by a number of different &amp;quot;versions&amp;quot;. The &amp;quot;cover songs&amp;quot; or &amp;quot;versions&amp;quot; represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations. &lt;br /&gt;
&lt;br /&gt;
Using each of these version files in turn as as the &amp;quot;seed/query&amp;quot; file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the &amp;quot;seed/query&amp;quot; file.&lt;br /&gt;
&lt;br /&gt;
Two datasets are used in this task, the MIREX 2006 US Pop Music Cover Song dataset Audio Cover Song dataset the [http://www.mazurka.org.uk/ Mazurka dataset]. &lt;br /&gt;
&lt;br /&gt;
=== Task specific mailing list ===&lt;br /&gt;
In the past we have use a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX  [https://mail.lis.illinois.edu/mailman/listinfo/evalfest &amp;quot;EvalFest&amp;quot; list]. If you have an question or comment, simply include the task name in the subject heading.&lt;/div&gt;</summary>
		<author><name>A43992899</name></author>
		
	</entry>
</feed>