MIREX Wiki - User contributions [en]

2010:MIREX2010 Results

2010-09-08T23:54:20Z

Kriswest: Updating test train reports

==OVERALL RESULTS POSTERS (First Version: Will need updating as last runs are completed)==
[https://www.music-ir.org/mirex/results/2010/mirex_2010_poster.pdf MIREX 2010 Overall Results Posters (PDF)]

==Results by Task ==
This year we ran many MIREX 2010 Tasks using the new [http://nema.lis.uiuc.edu/drupal/?q=nema/architecture NEMA MIREX DIY] infrastructure. Task results with "(DIY)" appended are those generated using the NEMA MIREX DIY system. Where appropriate, do explore the various new outputs that help visualize both individual and task-wide comparative performances. A demonstration [https://www.music-ir.org/diy-demo/ video] of the NEMA MIREX DIY system is also available.

===Train-Test Task Set===
* [https://nema.lis.illinois.edu/nema_out/4ffcb482-b83c-4ba6-bc42-9b538b31143c/results/evaluation/ Audio Classical Composer Identification Results ]   (DIY)
* [https://nema.lis.illinois.edu/nema_out/6731c97a-240c-4d3d-8be9-90d715ea04e1/results/evaluation/ Audio Latin Genre Classification Results ]   (DIY)
* [https://nema.lis.illinois.edu/nema_out/9b11a5c8-9fcf-4029-95eb-51ed561cfb5f/results/evaluation/ Audio Music Mood Classification Results ]   (DIY)
* [https://nema.lis.illinois.edu/nema_out/2b5839b3-3012-4f76-8807-31823588ae25/results/evaluation/ Audio Mixed Popular Genre Classification Results ]   (DIY)

===Other Tasks===

* Audio Beat Tracking Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/ MCK Dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/maz/ MAZ Dataset]  (DIY)
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ace/ Audio Chord Detection]  (DIY)
* [[2010:Audio_Cover_Song_Identification_Results | Audio Cover Song Identification Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/akd/ Audio Key Detection Results]  (DIY)
* Audio Melody Extraction Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/adc04/ ADC04 Dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex05/ MIREX05 Dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/indian08/ INDIAN08 Dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_0dB/ MIREX09 0dB Dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_m5dB/ MIREX09 -5dB Dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_p5dB/ MIREX09 +5dB Dataset]  (DIY)
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/ Audio Onset Detection Results]  (DIY)
* Audio Tag Classification Results
** Major Miner Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)]  (DIY)
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/aff/ Affinity estimation evaluation]  (DIY)
** Mood Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)]  (DIY)
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/aff/ Affinity estimation evaluation]  (DIY)
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ate/ Audio Tempo Estimation Results]  (DIY)
* [[2010:Multiple_Fundamental_Frequency_Estimation_&_Tracking_Results | Multiple Fundamental Frequency Estimation & Tracking Results]]
* Music Structure Segmentation Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/mirex09/ MIREX09 dataset]  (DIY)
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/mirex10/ MIREX10 dataset]  (DIY)
* [[2010:Query-by-Singing/Humming_Results | Query-by-Singing/Humming Results]]
* [[2010:Query-by-Tapping_Results | Query-by-Tapping Results]]
*[[2010:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]
* [[2010:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]

== Machine Specifications ==

== Runtime for Submissions Run by NEMA DIY ==

* [[2010:Runtime | Runtime]]

[[Category:Results]]

2010:MIREX2010 Results

2010-08-09T11:07:11Z

Kriswest: /* Train-Test Task Set */

==OVERALL RESULTS POSTERS (NOT READY YET)==
[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster1.pdf MIREX 2010 Overall Results Poster #1 (PDF)]

[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster2.pdf MIREX 2010 Overall Results Poster #2 (PDF)]

==Results by Task ==

===Train-Test Task Set===
* [http://nema.lis.uiuc.edu/nema_out/664ccbda-d5b6-48ae-8c47-c27e7c2372fe/results/evaluation/ Audio Classical Composer Identification Results ]
* [http://nema.lis.uiuc.edu/nema_out/d97c8282-883e-4e71-93b7-55283829ad21/results/evaluation/ Audio Latin Genre Classification Results ]
* [http://nema.lis.uiuc.edu/nema_out/0e2212ca-2c1a-4c4e-b164-de74974afe43/results/evaluation/ Audio Music Mood Classification Results ]
* [[2010:Audio_Mixed_Popular_Genre_Classification_Results | Audio Mixed Popular Genre Classification Results]]

===Other Tasks===

* Audio Beat Tracking Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/ MCK Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/maz/ MAZ Dataset]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ace/ Audio Chord Detection]
* [[2010:Audio_Cover_Song_Identification_Results | Audio Cover Song Identification Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/akd/ Audio Key Detection Results]
* Audio Melody Extraction Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/adc04/ ADC04 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex05/ MIREX05 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/indian08/ INDIAN08 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_0dB/ MIREX09 0dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_m5dB/ MIREX09 -5dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_p5dB/ MIREX09 +5dB Dataset]
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/ Audio Onset Detection Results]
* Audio Tag Classification Results
** Major Miner Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)]
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/aff/ Affinity estimation evaluation]
** Mood Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)]
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/aff/ Affinity estimation evaluation]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ate/ Audio Tempo Estimation Results]
* [[2010:Multiple_Fundamental_Frequency_Estimation_&_Tracking_Results | Multiple Fundamental Frequency Estimation & Tracking Results]]
* Music Structure Segmentation Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/mirex09/ MIREX09 dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/mirex10/ MIREX10 dataset]
* [[2010:Query-by-Singing/Humming_Results | Query-by-Singing/Humming Results]]
* [[2010:Query-by-Tapping_Results | Query-by-Tapping Results]]
*[[2010:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]
* [[2010:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]

== Machine Specifications ==

== Runtime for Submissions Run by NEMA DIY ==

* [[2010:Runtime | Runtime]]

[[Category:Results]]

2010:MIREX2010 Results

2010-08-06T15:28:16Z

Kriswest: /* Other Tasks */

==OVERALL RESULTS POSTERS (NOT READY YET)==
[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster1.pdf MIREX 2010 Overall Results Poster #1 (PDF)]

[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster2.pdf MIREX 2010 Overall Results Poster #2 (PDF)]

==Results by Task ==

===Train-Test Task Set===
* [[2010:Audio_Classical_Composer_Identification_Results | Audio Classical Composer Identification Results ]]
* [[2010:Audio_Mixed_Popular_Genre_Classification_Results | Audio Mixed Popular Genre Classification Results]]
* [[2010:Audio_Latin_Genre_Classification_Results | Audio Latin Genre Classification Results ]]
* [[2010:Audio_Music_Mood_Classification_Results | Audio Music Mood Classification Results ]]

===Other Tasks===

* Audio Beat Tracking Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/ MCK Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/maz/ MAZ Dataset]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ace/ Audio Chord Detection]
* [[2010:Audio_Cover_Song_Identification_Results | Audio Cover Song Identification Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/akd/ Audio Key Detection Results]
* Audio Melody Extraction Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/adc04/ ADC04 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex05/ MIREX05 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/indian08/ INDIAN08 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_0dB/ MIREX09 0dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_m5dB/ MIREX09 -5dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_p5dB/ MIREX09 +5dB Dataset]
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/ Audio Onset Detection Results]
* Audio Tag Classification Results
** Major Miner Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)]
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/aff/ Affinity estimation evaluation]
** Mood Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)]
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/aff/ Affinity estimation evaluation]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ate/ Audio Tempo Estimation Results]
* [[2010:Multiple_Fundamental_Frequency_Estimation_&_Tracking_Results | Multiple Fundamental Frequency Estimation & Tracking Results]]
* Music Structure Segmentation Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/mirex09/ MIREX09 dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/mirex10/ MIREX10 dataset]
* [[2010:Query-by-Singing/Humming_Results | Query-by-Singing/Humming Results]]
* [[2010:Query-by-Tapping_Results | Query-by-Tapping Results]]
*[[2010:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]
* [[2010:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]

== Machine Specifications ==

[[Category:Results]]

2010:MIREX2010 Results

2010-08-06T11:20:35Z

Kriswest: Posting tag results

==OVERALL RESULTS POSTERS (NOT READY YET)==
[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster1.pdf MIREX 2010 Overall Results Poster #1 (PDF)]

[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster2.pdf MIREX 2010 Overall Results Poster #2 (PDF)]

==Results by Task ==

===Train-Test Task Set===
* [[2010:Audio_Classical_Composer_Identification_Results | Audio Classical Composer Identification Results ]]
* [[2010:Audio_Mixed_Popular_Genre_Classification_Results | Audio Mixed Popular Genre Classification Results]]
* [[2010:Audio_Latin_Genre_Classification_Results | Audio Latin Genre Classification Results ]]
* [[2010:Audio_Music_Mood_Classification_Results | Audio Music Mood Classification Results ]]

===Other Tasks===

* Audio Beat Tracking Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/ MCK Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/maz/ MAZ Dataset]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ace/ Audio Chord Detection]
* [[2010:Audio_Cover_Song_Identification_Results | Audio Cover Song Identification Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/akd/ Audio Key Detection Results]
* Audio Melody Extraction Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/adc04/ ADC04 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex05/ MIREX05 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/indian08/ INDIAN08 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_0dB/ MIREX09 0dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_m5dB/ MIREX09 -5dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_p5dB/ MIREX09 +5dB Dataset]
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/ Audio Onset Detection Results]
* Audio Tag Classification Results
** Major Miner Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)]
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask1_report/aff/ Affinity estimation evaluation]
** Mood Tag dataset
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)]
*** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/atg/subtask2_report/aff/ Affinity estimation evaluation]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ate/ Audio Tempo Estimation Results]
* [[2010:Multiple_Fundamental_Frequency_Estimation_&_Tracking_Results | Multiple Fundamental Frequency Estimation & Tracking Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/ Music Structure Segmentation Results]
* [[2010:Query-by-Singing/Humming_Results | Query-by-Singing/Humming Results]]
* [[2010:Query-by-Tapping_Results | Query-by-Tapping Results]]
*[[2010:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]
* [[2010:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]

== Machine Specifications ==

[[Category:Results]]

2010:MIREX2010 Results

2010-08-04T00:46:55Z

Kriswest: Posting chord results

==OVERALL RESULTS POSTERS (NOT READY YET)==
[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster1.pdf MIREX 2010 Overall Results Poster #1 (PDF)]

[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster2.pdf MIREX 2010 Overall Results Poster #2 (PDF)]

==Results by Task ==

===Train-Test Task Set===
* [[2010:Audio_Classical_Composer_Identification_Results | Audio Classical Composer Identification Results ]]
* [[2010:Audio_Mixed_Popular_Genre_Classification_Results | Audio Mixed Popular Genre Classification Results]]
* [[2010:Audio_Latin_Genre_Classification_Results | Audio Latin Genre Classification Results ]]
* [[2010:Audio_Music_Mood_Classification_Results | Audio Music Mood Classification Results ]]

===Other Tasks===

* Audio Beat Tracking Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/ MCK Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/maz/ MAZ Dataset]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ace/ Audio Chord Detection]
* [[2010:Audio_Cover_Song_Identification_Results | Audio Cover Song Identification Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/akd/ Audio Key Detection Results]
* Audio Melody Extraction Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/adc04/ ADC04 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex05/ MIREX05 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/indian08/ INDIAN08 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_0dB/ MIREX09 0dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_m5dB/ MIREX09 -5dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_p5dB/ MIREX09 +5dB Dataset]
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/ Audio Onset Detection Results]
* Audio Tag Classification Results
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ate/ Audio Tempo Estimation Results]
* [[2010:Multiple_Fundamental_Frequency_Estimation_&_Tracking_Results | Multiple Fundamental Frequency Estimation & Tracking Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/ Music Structure Segmentation Results]
* [[2010:Query-by-Singing/Humming_Results | Query-by-Singing/Humming Results]]
* [[2010:Query-by-Tapping_Results | Query-by-Tapping Results]]
*[[2010:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]
* [[2010:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]

== Machine Specifications ==

[[Category:Results]]

2010:Audio Music Similarity and Retrieval Results

2010-08-04T00:24:21Z

Kriswest: /* BROAD Scores */

== Introduction ==
These are the results for the 2010 running of the Audio Music Similarity and Retrieval task set. For background information about this task set please refer to the Audio Music Similarity and Retrieval page.

Each system was given 7000 songs chosen from IMIRSEL's "uspop", "uscrap", "american", "classical" and "sundry" collections. Each system then returned a 7000x7000 distance matrix. 100 songs were randomly selected from the 10 genre groups (10 per genre) as queries, and the 5 most highly ranked songs out of the 7000 were extracted for each query (after filtering out the query itself; returned results from the same artist were also omitted). Then, for each query, the returned results (candidates) from all participants were grouped and evaluated by human graders using the Evalutron 6000 grading system. Each individual query/candidate set was evaluated by a single grader. For each query/candidate pair, graders provided two scores: one categorical '''BROAD''' score with 3 categories (NS, SS, VS, as explained below) and one '''FINE''' score (in the range from 0 to 100). A description and analysis are provided below.

The systems read in 30-second audio clips as their raw data. The same 30-second clips were used in the grading stage.
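
The candidate extraction described above can be illustrated with a minimal Python sketch. It is not the code used for the evaluation: the function name <code>top_candidates</code>, the toy 6-track distance matrix and the artist labels are all hypothetical, and the real run used a 7000x7000 matrix with 5 candidates per query.

```python
def top_candidates(distances, artists, query, k=5):
    """Return the indices of the k nearest tracks to `query`,
    skipping the query itself and any track by the same artist."""
    # Rank every track by its distance to the query (smallest first).
    ranked = sorted(range(len(distances[query])),
                    key=lambda j: distances[query][j])
    picked = []
    for j in ranked:
        if j == query or artists[j] == artists[query]:
            continue  # filter out the query and same-artist results
        picked.append(j)
        if len(picked) == k:
            break
    return picked


# Hypothetical toy collection: 6 tracks, tracks 0 and 1 share an artist.
artists = ["a", "a", "b", "c", "d", "e"]
distances = [
    [0.0, 0.1, 0.4, 0.2, 0.9, 0.3],
    [0.1, 0.0, 0.5, 0.6, 0.7, 0.8],
    [0.4, 0.5, 0.0, 0.3, 0.2, 0.6],
    [0.2, 0.6, 0.3, 0.0, 0.5, 0.4],
    [0.9, 0.7, 0.2, 0.5, 0.0, 0.1],
    [0.3, 0.8, 0.6, 0.4, 0.1, 0.0],
]

print(top_candidates(distances, artists, query=0, k=3))  # [3, 5, 2]
```

Track 1 is the closest to the query in this toy matrix, but it is dropped because it shares the query's artist.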

=== General Legend ===

==== Team ID ====
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="200" | Submission name
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! BWL1
| MTG-AMS || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/BWL1.pdf PDF] || [http://mtg.upf.edu Dmitry Bogdanov], [http://mtg.upf.edu Nicolas Wack], [http://mtg.upf.edu Cyril Laurier]
|-
! PS1
| PS09 || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/PS1.pdf PDF] || [http://www.cp.jku.at/ Tim Pohle], [http://www.cp.jku.at/ Dominik Schnitzer]
|-
! PSS1
| PSS10 || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/PSS1.pdf PDF] || [http://www.cp.jku.at/ Tim Pohle], [http://www.cp.jku.at Klaus Seyerlehner], [http://www.cp.jku.at/ Dominik Schnitzer]
|-
! RZ1
| RND || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/RZ1.pdf PDF] || [http://www.cp.jku.at Rainer Zufall]
|-
! SSPK2
| cbmr_sim || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/SSPK2.pdf PDF] || [http://www.cp.jku.at Klaus Seyerlehner], [http://www.cp.jku.at Markus Schedl], [http://www.cp.jku.at Tim Pohle], [http://www.cp.jku.at Peter Knees]
|-
! TLN1
| MarsyasSimilarity || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TNL1.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://sness.net Steven Ness], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange]
|-
! TLN2
| Post-Processing 1 of Marsyas similarity results || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TLN1.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange], [http://sness.net Steven Ness]
|-
! TLN3
| Post-Processing 2 of Marsyas similarity results || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TLN2.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange], [http://sness.net Steven Ness]
|}

====Broad Categories====
'''NS''' = Not Similar 
'''SS''' = Somewhat Similar 
'''VS''' = Very Similar 

=====Understanding Summary Measures=====
'''Fine''' = Has a range from 0 (failure) to 100 (perfection). 
'''Broad''' = Has a range from 0 (failure) to 2 (perfection) as each query/candidate pair is scored with either NS=0, SS=1 or VS=2. 
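
As a small worked example of the two summary measures, the Python sketch below averages the grades for one query's 5 candidates. The grade values are invented for illustration, and the helper names are assumptions rather than part of the Evalutron 6000 system.

```python
# Point values for the BROAD categories, as stated in the legend.
BROAD_POINTS = {"NS": 0, "SS": 1, "VS": 2}


def mean_broad(labels):
    """Mean BROAD score (0 to 2) for a list of NS/SS/VS labels."""
    return sum(BROAD_POINTS[label] for label in labels) / len(labels)


def mean_fine(scores):
    """Mean FINE score (0 to 100) for a list of numeric grades."""
    return sum(scores) / len(scores)


# Hypothetical grades for the 5 candidates of a single query.
broad_labels = ["VS", "SS", "SS", "NS", "VS"]
fine_scores = [80, 55, 60, 10, 95]

print(mean_broad(broad_labels))  # 1.2
print(mean_fine(fine_scores))    # 60.0
```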

==Human Evaluation==
===Overall Summary Results===

<csv p=3>2010/ams/AMS2010summary_evalutron.csv</csv>
 
'''Note:''' '''RZ1''' is the random baseline, included for comparison purposes.

===Friedman's Tests===
====Friedman's Test (FINE Scores)====
The Friedman test was run in MATLAB against the '''Fine''' summary data over the 100 queries. 
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

<csv p=3>2010/ams/evalutron.fine.friedman.tukeyKramerHSD.csv</csv>

[[File:2010AMS.evalutron.fine.friedman.tukeyKramerHSD.png|500px]]
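
For readers unfamiliar with the test, the following Python sketch shows how a Friedman statistic is obtained from a queries-by-systems score table. It only computes the chi-square statistic (without tie correction) and the mean rank per system; it does not reproduce MATLAB's <code>multcompare</code> Tukey-Kramer comparison, and the 4x3 score table is invented for illustration.

```python
def average_ranks(row):
    """Rank values in ascending order (1 = smallest); ties share the mean rank."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = shared
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[q][s] is the score of system s on query q.
    Returns (Friedman chi-square statistic, mean rank per system)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for s, r in enumerate(average_ranks(row)):
            rank_sums[s] += r
    q = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    return q, [r / n for r in rank_sums]


# Hypothetical mean FINE scores: 4 queries (rows) x 3 systems (columns).
scores = [
    [70, 50, 20],
    [65, 60, 30],
    [80, 40, 35],
    [55, 58, 10],
]
q, mean_ranks = friedman_statistic(scores)
print(q)           # 6.5
print(mean_ranks)  # [2.75, 2.25, 1.0]
```

A higher mean rank here means a system tended to score higher within each query; the multiple-comparison step then decides which differences in mean rank are significant.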

====Friedman's Test (BROAD Scores)====
The Friedman test was run in MATLAB against the '''BROAD''' summary data over the 100 queries. 
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

<csv p=3>2010/ams/evalutron.cat.friedman.tukeyKramerHSD.csv</csv>

[[File:2010AMS.evalutron.cat.friedman.tukeyKramerHSD.png|500px]]

===Summary Results by Query===
====FINE Scores====
These are the mean FINE scores per query assigned by Evalutron graders. The FINE scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 and 100. A perfect score would be 100. Genre labels have been included for reference.

<csv p=1>2010/ams/fine_scores.csv</csv>

====BROAD Scores====
These are the mean BROAD scores per query assigned by Evalutron graders. The BROAD scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 (not similar) and 2 (very similar). A perfect score would be 2. Genre labels have been included for reference.

<csv p=1>2010/ams/cat_scores.csv</csv>

===Raw Scores===
The raw data derived from the Evalutron 6000 human evaluations are located on the [[2010:Audio Music Similarity and Retrieval Raw Data]] page.

==Metadata and Distance Space Evaluation==
The following reports provide evaluation statistics based on analysis of the distance space and metadata matches and include:
* Neighbourhood clustering by artist, album and genre
* Artist-filtered genre clustering
* How often the triangle inequality holds
* Statistics on 'hubs' (tracks similar to many tracks) and orphans (tracks that are not similar to any other tracks at N results).
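
Two of the statistics listed above can be sketched in Python as follows. This is an illustration under assumptions, not the code that produced the reports: the exact definitions used there are not given on this page, and the function names and the 4-track distance matrix are hypothetical.

```python
from itertools import permutations


def triangle_inequality_rate(d):
    """Fraction of ordered triples (i, j, k) of distinct tracks
    for which d[i][k] <= d[i][j] + d[j][k]."""
    total = held = 0
    for i, j, k in permutations(range(len(d)), 3):
        total += 1
        if d[i][k] <= d[i][j] + d[j][k]:
            held += 1
    return held / total


def neighbour_counts(d, n_results):
    """For each track, count how many other tracks list it among
    their n_results nearest neighbours."""
    n = len(d)
    counts = [0] * n
    for q in range(n):
        ranked = sorted((j for j in range(n) if j != q),
                        key=lambda j: d[q][j])
        for j in ranked[:n_results]:
            counts[j] += 1
    return counts


# Hypothetical symmetric distance matrix for 4 tracks.
d = [
    [0, 1, 5, 4],
    [1, 0, 1, 4],
    [5, 1, 0, 4],
    [4, 4, 4, 0],
]

rate = triangle_inequality_rate(d)     # 22 of 24 triples hold
counts = neighbour_counts(d, 1)        # [2, 2, 0, 0]
orphans = [t for t, c in enumerate(counts) if c == 0]
print(rate, counts, orphans)
```

Tracks with a high count behave like hubs, and tracks with a count of zero are orphans at that number of results. The brute-force triple loop is only practical for small matrices; a 7000x7000 matrix would need sampling or a vectorised approach.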

=== Reports ===

'''BWL1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/BWL1/report.txt Dmitry Bogdanov, Nicolas Wack, Cyril Laurier] 
'''PS1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/PS1/report.txt Tim Pohle, Dominik Schnitzer] 
'''PSS1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/PSS1/report.txt Tim Pohle, Klaus Seyerlehner, Dominik Schnitzer] 
'''RZ1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/RZ1/report.txt Rainer Zufall] 
'''SSPK2''' = [https://music-ir.org/mirex/results/2010/ams/statistics/SSPK2/report.txt Klaus Seyerlehner, Markus Schedl, Tim Pohle, Peter Knees] 
'''TLN1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN1/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
'''TLN2''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN2/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
'''TLN3''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN3/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
== Run Times ==
<csv>2010/ams/audiosim.runtime.csv</csv>

2010:Audio Music Similarity and Retrieval Results

2010-08-04T00:24:05Z

Kriswest: /* FINE Scores */

== Introduction ==
These are the results for the 2010 running of the Audio Music Similarity and Retrieval task set. For background information about this task set please refer to the Audio Music Similarity and Retrieval page.

Each system was given 7000 songs chosen from IMIRSEL's "uspop", "uscrap", "american", "classical" and "sundry" collections. Each system then returned a 7000x7000 distance matrix. 100 songs were randomly selected from the 10 genre groups (10 per genre) as queries, and the 5 most highly ranked songs out of the 7000 were extracted for each query (after filtering out the query itself; returned results from the same artist were also omitted). Then, for each query, the returned results (candidates) from all participants were grouped and evaluated by human graders using the Evalutron 6000 grading system. Each individual query/candidate set was evaluated by a single grader. For each query/candidate pair, graders provided two scores: one categorical '''BROAD''' score with 3 categories (NS, SS, VS, as explained below) and one '''FINE''' score (in the range from 0 to 100). A description and analysis are provided below.

The systems read in 30-second audio clips as their raw data. The same 30-second clips were used in the grading stage.

=== General Legend ===

==== Team ID ====
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="200" | Submission name
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! BWL1
| MTG-AMS || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/BWL1.pdf PDF] || [http://mtg.upf.edu Dmitry Bogdanov], [http://mtg.upf.edu Nicolas Wack], [http://mtg.upf.edu Cyril Laurier]
|-
! PS1
| PS09 || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/PS1.pdf PDF] || [http://www.cp.jku.at/ Tim Pohle], [http://www.cp.jku.at/ Dominik Schnitzer]
|-
! PSS1
| PSS10 || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/PSS1.pdf PDF] || [http://www.cp.jku.at/ Tim Pohle], [http://www.cp.jku.at Klaus Seyerlehner], [http://www.cp.jku.at/ Dominik Schnitzer]
|-
! RZ1
| RND || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/RZ1.pdf PDF] || [http://www.cp.jku.at Rainer Zufall]
|-
! SSPK2
| cbmr_sim || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/SSPK2.pdf PDF] || [http://www.cp.jku.at Klaus Seyerlehner], [http://www.cp.jku.at Markus Schedl], [http://www.cp.jku.at Tim Pohle], [http://www.cp.jku.at Peter Knees]
|-
! TLN1
| MarsyasSimilarity || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TNL1.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://sness.net Steven Ness], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange]
|-
! TLN2
| Post-Processing 1 of Marsyas similarity results || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TLN1.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange], [http://sness.net Steven Ness]
|-
! TLN3
| Post-Processing 2 of Marsyas similarity results || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TLN2.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange], [http://sness.net Steven Ness]
|}

====Broad Categories====
'''NS''' = Not Similar 
'''SS''' = Somewhat Similar 
'''VS''' = Very Similar 

=====Understanding Summary Measures=====
'''Fine''' = Has a range from 0 (failure) to 100 (perfection). 
'''Broad''' = Has a range from 0 (failure) to 2 (perfection) as each query/candidate pair is scored with either NS=0, SS=1 or VS=2. 

==Human Evaluation==
===Overall Summary Results===

<csv p=3>2010/ams/AMS2010summary_evalutron.csv</csv>
 
'''Note:''' '''RZ1''' is the random baseline, included for comparison purposes.

===Friedman's Tests===
====Friedman's Test (FINE Scores)====
The Friedman test was run in MATLAB against the '''Fine''' summary data over the 100 queries. 
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

<csv p=3>2010/ams/evalutron.fine.friedman.tukeyKramerHSD.csv</csv>

[[File:2010AMS.evalutron.fine.friedman.tukeyKramerHSD.png|500px]]

====Friedman's Test (BROAD Scores)====
The Friedman test was run in MATLAB against the '''BROAD''' summary data over the 100 queries. 
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

<csv p=3>2010/ams/evalutron.cat.friedman.tukeyKramerHSD.csv</csv>

[[File:2010AMS.evalutron.cat.friedman.tukeyKramerHSD.png|500px]]

===Summary Results by Query===
====FINE Scores====
These are the mean FINE scores per query assigned by Evalutron graders. The FINE scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 and 100. A perfect score would be 100. Genre labels have been included for reference.

<csv p=1>2010/ams/fine_scores.csv</csv>

====BROAD Scores====
These are the mean BROAD scores per query assigned by Evalutron graders. The BROAD scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 (not similar) and 2 (very similar). A perfect score would be 2. Genre labels have been included for reference.

<csv p=3>2010/ams/cat_scores.csv</csv>

===Raw Scores===
The raw data derived from the Evalutron 6000 human evaluations are located on the [[2010:Audio Music Similarity and Retrieval Raw Data]] page.

==Metadata and Distance Space Evaluation==
The following reports provide evaluation statistics based on analysis of the distance space and metadata matches and include:
* Neighbourhood clustering by artist, album and genre
* Artist-filtered genre clustering
* How often the triangle inequality holds
* Statistics on 'hubs' (tracks similar to many tracks) and orphans (tracks that are not similar to any other tracks at N results).

=== Reports ===

'''BWL1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/BWL1/report.txt Dmitry Bogdanov, Nicolas Wack, Cyril Laurier] 
'''PS1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/PS1/report.txt Tim Pohle, Dominik Schnitzer] 
'''PSS1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/PSS1/report.txt Tim Pohle, Klaus Seyerlehner, Dominik Schnitzer] 
'''RZ1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/RZ1/report.txt Rainer Zufall] 
'''SSPK2''' = [https://music-ir.org/mirex/results/2010/ams/statistics/SSPK2/report.txt Klaus Seyerlehner, Markus Schedl, Tim Pohle, Peter Knees] 
'''TLN1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN1/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
'''TLN2''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN2/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
'''TLN3''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN3/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
== Run Times ==
<csv>2010/ams/audiosim.runtime.csv</csv>

2010:Audio Music Similarity and Retrieval Results

2010-08-04T00:22:43Z

Kriswest: /* Introduction */

== Introduction ==
These are the results for the 2010 running of the Audio Music Similarity and Retrieval task set. For background information about this task set please refer to the Audio Music Similarity and Retrieval page.

Each system was given 7000 songs chosen from IMIRSEL's "uspop", "uscrap", "american", "classical" and "sundry" collections. Each system then returned a 7000x7000 distance matrix. 100 songs were randomly selected from the 10 genre groups (10 per genre) as queries, and the 5 most highly ranked songs out of the 7000 were extracted for each query (after filtering out the query itself; returned results from the same artist were also omitted). Then, for each query, the returned results (candidates) from all participants were grouped and evaluated by human graders using the Evalutron 6000 grading system. Each individual query/candidate set was evaluated by a single grader. For each query/candidate pair, graders provided two scores: one categorical '''BROAD''' score with 3 categories (NS, SS, VS, as explained below) and one '''FINE''' score (in the range from 0 to 100). A description and analysis is provided below.
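
The candidate extraction described above can be sketched as follows. This is only an illustration under stated assumptions, not the code used to produce these results: the function name <code>top_candidates</code>, the dictionary layout and the toy track names are hypothetical.

```python
def top_candidates(query, distances, artist_of, n=5):
    """Return the n nearest tracks to `query`, skipping the query itself
    and any track by the same artist (artist filtering)."""
    # Rank all tracks by increasing distance from the query.
    ranked = sorted(distances[query].items(), key=lambda item: item[1])
    picked = []
    for track, _distance in ranked:
        if track == query or artist_of[track] == artist_of[query]:
            continue
        picked.append(track)
        if len(picked) == n:
            break
    return picked


# Toy data (hypothetical): one row of a distance matrix plus artist labels.
distances = {"q.wav": {"q.wav": 0.0, "a.wav": 0.1, "b.wav": 0.2,
                       "c.wav": 0.3, "d.wav": 0.4}}
artist_of = {"q.wav": "X", "a.wav": "X", "b.wav": "Y",
             "c.wav": "Z", "d.wav": "Y"}

print(top_candidates("q.wav", distances, artist_of, n=2))
```

In the toy data, <code>a.wav</code> is the nearest track but shares the query's artist, so it is skipped and the next two tracks are returned.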

The systems read in 30 second audio clips as their raw data. The same 30 second clips were used in the grading stage.

=== General Legend ===

==== Team ID ====
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="200" | Submission name
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! BWL1
| MTG-AMS || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/BWL1.pdf PDF] || [http://mtg.upf.edu Dmitry Bogdanov], [http://mtg.upf.edu Nicolas Wack], [http://mtg.upf.edu Cyril Laurier]
|-
! PS1
| PS09 || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/PS1.pdf PDF] || [http://www.cp.jku.at/ Tim Pohle], [http://www.cp.jku.at/ Dominik Schnitzer]
|-
! PSS1
| PSS10 || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/PSS1.pdf PDF] || [http://www.cp.jku.at/ Tim Pohle], [http://www.cp.jku.at Klaus Seyerlehner], [http://www.cp.jku.at/ Dominik Schnitzer]
|-
! RZ1
| RND || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/RZ1.pdf PDF] || [http://www.cp.jku.at Rainer Zufall]
|-
! SSPK2
| cbmr_sim || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/SSPK2.pdf PDF] || [http://www.cp.jku.at Klaus Seyerlehner], [http://www.cp.jku.at Markus Schedl], [http://www.cp.jku.at Tim Pohle], [http://www.cp.jku.at Peter Knees]
|-
! TLN1
| MarsyasSimilarity || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TNL1.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://sness.net Steven Ness], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange]
|-
! TLN2
| Post-Processing 1 of Marsyas similarity results || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TLN1.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange], [http://sness.net Steven Ness]
|-
! TLN3
| Post-Processing 2 of Marsyas similarity results || style="text-align: center;" | [https://www.music-ir.org/mirex/abstracts/2010/TLN2.pdf PDF] || [http://www.cs.uvic.ca/~gtzan George Tzanetakis], [http://recherche.ircam.fr/equipes/analyse-synthese/home.html Mathieu Lagrange], [http://sness.net Steven Ness]
|}

====Broad Categories====
'''NS''' = Not Similar 
'''SS''' = Somewhat Similar 
'''VS''' = Very Similar 

=====Understanding Summary Measures=====
'''Fine''' = Has a range from 0 (failure) to 100 (perfection). 
'''Broad''' = Has a range from 0 (failure) to 2 (perfection) as each query/candidate pair is scored with either NS=0, SS=1 or VS=2. 
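
As a small worked example of the two summary measures, the sketch below maps the BROAD categories to the numeric values given above and averages the scores of the candidates returned for one query. The function names and the sample scores are hypothetical and are not taken from the evaluation data.

```python
# Numeric values for the BROAD categories, as stated in the legend above.
BROAD_VALUES = {"NS": 0, "SS": 1, "VS": 2}


def mean_broad(labels):
    """Average BROAD score (0 to 2) over a list of category labels."""
    return sum(BROAD_VALUES[label] for label in labels) / len(labels)


def mean_fine(scores):
    """Average FINE score (0 to 100) over a list of numeric scores."""
    return sum(scores) / len(scores)


# Hypothetical grades for the 5 candidates of one query.
print(mean_broad(["VS", "SS", "SS", "NS", "VS"]))
print(mean_fine([90, 55, 60, 10, 85]))
```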

==Human Evaluation==
===Overall Summary Results===

<csv p=3>2010/ams/AMS2010summary_evalutron.csv</csv>
 
'''Note:''' '''RZ1''' is the random result, included for comparison purposes.

===Friedman's Tests===
====Friedman's Test (FINE Scores)====
The Friedman test was run in MATLAB against the '''Fine''' summary data over the 100 queries. 
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);
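
Friedman's test works on ranks assigned within each query rather than on the raw scores. The sketch below illustrates only that rank transform (with tied scores sharing their average rank) and the resulting mean rank per system. It is not the MATLAB procedure quoted above, and the function names and toy scores are hypothetical.

```python
def average_ranks(values):
    """Rank values in ascending order (1 = lowest); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mean_ranks(scores_by_system):
    """Mean within-query rank of each system over all queries."""
    systems = list(scores_by_system)
    n_queries = len(scores_by_system[systems[0]])
    totals = dict.fromkeys(systems, 0.0)
    for q in range(n_queries):
        ranks = average_ranks([scores_by_system[s][q] for s in systems])
        for system, rank in zip(systems, ranks):
            totals[system] += rank
    return {s: totals[s] / n_queries for s in systems}


# Hypothetical mean FINE scores of three systems on two queries.
print(mean_ranks({"A": [80, 60], "B": [50, 60], "C": [20, 10]}))
```

A higher mean rank corresponds to a system that tends to score higher within each query.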

<csv p=3>2010/ams/evalutron.fine.friedman.tukeyKramerHSD.csv</csv>

[[File:2010AMS.evalutron.fine.friedman.tukeyKramerHSD.png|500px]]

====Friedman's Test (BROAD Scores)====
The Friedman test was run in MATLAB against the '''BROAD''' summary data over the 100 queries. 
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

<csv p=3>2010/ams/evalutron.cat.friedman.tukeyKramerHSD.csv</csv>

[[File:2010AMS.evalutron.cat.friedman.tukeyKramerHSD.png|500px]]

===Summary Results by Query===
====FINE Scores====
These are the mean FINE scores per query assigned by Evalutron graders. The FINE scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 and 100. A perfect score would be 100. Genre labels have been included for reference.

<csv p=3>2010/ams/fine_scores.csv</csv>

====BROAD Scores====
These are the mean BROAD scores per query assigned by Evalutron graders. The BROAD scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 (not similar) and 2 (very similar). A perfect score would be 2. Genre labels have been included for reference.

<csv p=3>2010/ams/cat_scores.csv</csv>

===Raw Scores===
The raw data derived from the Evalutron 6000 human evaluations are located on the [[2010:Audio Music Similarity and Retrieval Raw Data]] page.

==Metadata and Distance Space Evaluation==
The following reports provide evaluation statistics based on analysis of the distance space and metadata matches and include:
* Neighbourhood clustering by artist, album and genre
* Artist-filtered genre clustering
* How often the triangular inequality holds
* Statistics on 'hubs' (tracks similar to many tracks) and orphans (tracks that are not similar to any other tracks at N results).

=== Reports ===

'''BWL1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/BWL1/report.txt Dmitry Bogdanov, Nicolas Wack, Cyril Laurier] 
'''PS1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/PS1/report.txt Tim Pohle, Dominik Schnitzer] 
'''PSS1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/PSS1/report.txt Tim Pohle, Klaus Seyerlehner, Dominik Schnitzer] 
'''RZ1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/RZ1/report.txt Rainer Zufall] 
'''SSPK2''' = [https://music-ir.org/mirex/results/2010/ams/statistics/SSPK2/report.txt Klaus Seyerlehner, Markus Schedl, Tim Pohle, Peter Knees] 
'''TLN1''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN1/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
'''TLN2''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN2/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
'''TLN3''' = [https://music-ir.org/mirex/results/2010/ams/statistics/TLN3/report.txt George Tzanetakis, Mathieu Lagrange, Steven Ness] 
== Run Times ==
<csv>2010/ams/audiosim.runtime.csv</csv>

2010:MIREX2010 Results

2010-08-03T00:47:58Z

Kriswest: Adding results for 6 NEMA DIY based tasks

==OVERALL RESULTS POSTERS (NOT READY YET)==
[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster1.pdf MIREX 2010 Overall Results Poster #1 (PDF)]

[https://www.music-ir.org/mirex/results/2010/MIREX2010ResultsPoster2.pdf MIREX 2010 Overall Results Poster #2 (PDF)]

==Results by Task ==

===Train-Test Task Set===
* [[2010:Audio_Artist_Identification_Results | Audio Artist Identification Results ]]
* [[2010:Audio_Classical_Composer_Identification_Results | Audio Classical Composer Identification Results ]]
* [[2010:Audio_US_Pop_Genre_Classification_Results | Audio US Pop Genre Classification Results]]
* [[2010:Audio_Latin_Genre_Classification_Results | Audio Latin Genre Classification Results ]]
* [[2010:Audio_Music_Mood_Classification_Results | Audio Music Mood Classification Results ]]

===Other Tasks===

* Audio Beat Tracking Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/mck/ MCK Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/abt/maz/ MAZ Dataset]
* [[2010:Audio_Chord_Detection_Results | Audio Chord Detection]]
* [[2010:Audio_Cover_Song_Identification_Results | Audio Cover Song Identification Results]]
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [[2010:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/akd/ Audio Key Detection Results]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/aod/ Audio Onset Detection Results]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ate/ Audio Tempo Estimation Results]
* Audio Melody Extraction Results
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/adc04/ ADC04 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex05/ MIREX05 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/indian08/ INDIAN08 Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_0dB/ MIREX09 0dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_m5dB/ MIREX09 -5dB Dataset]
** [https://nema.lis.illinois.edu/nema_out/mirex2010/results/ame/mirex09_p5dB/ MIREX09 +5dB Dataset]
* [[2010:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]]
* [[2010:Multiple_Fundamental_Frequency_Estimation_&_Tracking_Results | Multiple Fundamental Frequency Estimation & Tracking Results]]
* [https://nema.lis.illinois.edu/nema_out/mirex2010/results/struct/ Music Structure Segmentation Results]
* [[2010:Query-by-Singing/Humming_Results | Query-by-Singing/Humming Results]]
* [[2010:Query-by-Tapping_Results | Query-by-Tapping Results]]

== Machine Specifications ==

[[Category:Results]]

2010:Audio Music Similarity and Retrieval

2010-06-19T01:17:07Z

Kriswest:

== Description ==
As the size of digital music collections grows, music similarity has an increasingly important role as an aid to music discovery. A music similarity system can help a music consumer find new music by finding the music that is most musically similar to specific query songs (or is nearest to songs that the consumer already likes).

This page presents the Audio Music Similarity Evaluation, including the submission rules and formats. It also provides background information that should help explain some of the reasoning behind the approach taken in the evaluation. The intention of the Music Audio Search track is to evaluate music similarity searches (a music search engine that takes a single song as a query, also known as query-by-example), not playlist generation or music recommendation.

The Audio Music Similarity and Retrieval task has been run in MIREX 2009, 2007, and 2006.

[[2009:Audio_Music_Similarity_and_Retrieval|Audio Music Similarity and Retrieval task in MIREX 2009]] || [[2009:Audio_Music_Similarity_and_Retrieval_Results|Results]]

[[2007:Audio_Music_Similarity_and_Retrieval|Audio Music Similarity and Retrieval task in MIREX 2007]] || [[2007:Audio_Music_Similarity_and_Retrieval_Results|Results]]

[[2006:Audio_Music_Similarity_and_Retrieval|Audio Music Similarity and Retrieval task in MIREX 2006]] || [[2006:Audio_Music_Similarity_and_Retrieval_Results|Results]]

=== Task specific mailing list ===
In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., [[2010:Audio Classification (Train/Test) Tasks]], [[2010:Audio Cover Song Identification]], [[2010:Audio Tag Classification]], [[2010:Audio Music Similarity and Retrieval]]). This year, however, we are asking that all discussions take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

== Data ==
Collection statistics: 7000 30-second audio clips drawn from 10 genres (700 clips from each genre).

The genres from which the data was drawn are:
*Blues
*Jazz
*Country/Western
*Baroque
*Classical
*Romantic
*Electronica
*Hip-Hop
*Rock
*HardRock/Metal

=== Audio formats ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 22 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV
* Clip length: 30 seconds from the middle of each file

== Evaluation ==
Two distinct evaluations will be performed
* Human Evaluation
* Objective statistics derived from the results lists

Note that at MIREX 2006 participating algorithms were required to return full distance matrices showing the distance between all tracks. In subsequent years we have also supported a sparse distance matrix format (detailed below), in which only the distances of the top 100 results for each query in the collection are returned.

=== Human Evaluation ===
The primary evaluation will involve subjective judgments by human evaluators of the retrieved sets using IMIRSEL's Evalutron 6000 system. This year algorithms will be presented with the same 30 second preview clip that will be reviewed by the human evaluators.

* Evaluator question: Given a search based on track A, the following set of results was returned by all systems. Please place each returned track into one of three classes (not similar, somewhat similar, very similar) and provide an indication on a continuous scale of 0 - 10 of how similar the track is to the query.
* ~120 randomly selected queries, 5 results per query, 1 set of eyes, ~10 participating labs
* Higher number of queries preferred as IR research indicates variance is in queries
* The songs by the same artist as the query will be filtered out of each result list (artist-filtering) to avoid colouring an evaluator's judgement (a cover song or song by the same artist in a result list is likely to reduce the relative ranking of other similar but independent songs; use of songs by the same artist may also allow over-fitting to affect the results)
* It will be possible for researchers to use this data for other types of system comparisons after MIREX 2010 results have been finalized.
* Human evaluation to be designed and led by IMIRSEL following a similar format to that used at MIREX 2006 (see: [[2006:Evalutron6000_Issues|Evalutron Issues in MIREX 2006]]).
* Human evaluators will be drawn from the participating labs (and any volunteers from IMIRSEL or on the MIREX lists)

=== Objective Statistics derived from the distance matrix ===
Statistics of each distance matrix will be calculated including:

* Average % of Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Precision at 5, 10, 20 & 50
* Average % of Genre matches in the top 5, 10, 20 & 50 results after artist filtering of results
* Average % of available Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Recall at 5, 10, 20 & 50 (just normalising scores when fewer than 20 matches for an artist, album or genre are available in the database)
* Always similar - Maximum # times a file was in the top 5, 10, 20 & 50 results
* % File never similar (never in a top 5, 10, 20 & 50 result list)
* % of 'test-able' song triplets where triangular inequality holds
** Note that as we are not requiring full distance matrices this year we will only be testing triangles that are found in the sparse distance matrix.
* Plot of the "number of times similar curve" - plot of song number vs. number of times it appeared in a top 20 list, with songs sorted according to the number of times they appeared in a top 20 list (to produce the curve). Systems with a sharp rise at the end of this plot have "hubs", while a long 'zero' tail shows many never-similar results.
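
The "always similar", "never similar" and "number of times similar curve" statistics listed above can all be derived from one count of how often each track appears in a top-N list. The sketch below is a minimal illustration, assuming the results are held as a dictionary from each query track to its ordered result list; the function names and toy data are hypothetical and this is not the evaluation code itself.

```python
from collections import Counter


def times_similar(results, n):
    """Count how often each track appears in the top-n list of any query."""
    counts = Counter()
    for ranked in results.values():
        counts.update(ranked[:n])
    return counts


def hub_and_orphan_stats(results, n):
    """Return (largest hub count, % of tracks never in a top-n list)."""
    counts = times_similar(results, n)
    largest_hub = max(counts.values(), default=0)
    never = [track for track in results if counts[track] == 0]
    return largest_hub, 100.0 * len(never) / len(results)


def times_similar_curve(results, n):
    """Sorted appearance counts: a long run of zeros means many
    never-similar tracks, a sharp rise at the end indicates hubs."""
    counts = times_similar(results, n)
    return sorted(counts[track] for track in results)


# Toy collection of four tracks (hypothetical), each with a ranked result list.
results = {"a": ["b", "c"], "b": ["a", "c"], "c": ["b", "a"], "d": ["b", "c"]}
print(hub_and_orphan_stats(results, n=1))
print(times_similar_curve(results, n=1))
```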

=== Runtimes ===
In addition, computation times for feature extraction/index-building and querying will be measured.

== Submission format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

Submissions will have to output either a full distance matrix or a search results file with the top 100 search results for each track in the collection. This list of results will be used to extract the artist-filtered results to present to the human evaluators and will facilitate the computation of the objective statistics.

=== I/O formats ===
In this section the input and output files used in this task are described as
are the command line calling format requirements for submissions.

==== Audio collection list file (input)====
The list file passed for feature extraction and indexing will be a simple ASCII list file. This file will contain one path per line with no header line, all paths will be absolute (full paths).

e.g.

/aDirectory/collectionFolder/b002342.wav
/aDirectory/collectionFolder/a005921.wav
...

==== Distance matrix output files ====
Participants should return one of two available output file formats, a full distance matrix or a sparse distance matrix. The sparse distance matrix format is preferred (as the dense distance matrices can be very large).

===== Sparse Distance Matrix =====
If the cost of computation or exhaustive search is a concern, or if a full distance matrix is not a normal output of the indexing algorithm employed, the sparse distance matrix format detailed below may be used:

A simple ASCII file listing a name for the algorithm and the top 100 search results for every track in the collection.

This file should start with a header line with a name for the algorithm and should be followed by the results for one query per line, prefixed by the filename portion of the query path. This should be followed by a tab character and a tab separated, ordered list of the top 100 search results. Each result should include the result filename (e.g. a034728.wav) and the distance (e.g. 17.1 or 0.23) separated by a comma.

<pre>
MyAlgorithm (my.email@address.com)
<example 1 filename>\t<result 1 name>,<result 1 distance>,\t<result 2 name>,<result 2 distance>, ... \t<result 100 name>,<result 100 distance>
<example 2 filename>\t<result 1 name>,<result 1 distance>,\t<result 2 name>,<result 2 distance>, ... \t<result 100 name>,<result 100 distance>
...
</pre>

which might look like:

<pre>
MyAlgorithm (my.email@address.com)
a009342.wav b229311.wav,0.16 a023821.wav,0.19 a001329,0.24 ... etc.
a009343.wav a661931.wav,0.12 a043322.wav,0.17 c002346,0.21 ... etc.
a009347.wav a671239.wav,0.13 c112393.wav,0.20 b083293,0.25 ... etc.
...
</pre>

The path to which this list file should be written must be accepted as a parameter on the command line.
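
A reader for the sparse format described above might look like the following sketch. It assumes tab-separated results as specified in the text and tolerates the trailing comma that appears after each distance in the template; the function name <code>parse_sparse_results</code> is hypothetical and this is not the official AMS tool.

```python
def parse_sparse_results(text):
    """Parse a sparse distance matrix file.

    Returns (algorithm name, {query filename: [(result filename, distance), ...]}).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    name = lines[0].strip()
    results = {}
    for line in lines[1:]:
        fields = line.split("\t")
        query = fields[0].strip()
        ranked = []
        for field in fields[1:]:
            # Tolerate a trailing comma after the distance, as in the template.
            field = field.strip().rstrip(",")
            if not field:
                continue
            filename, distance = field.rsplit(",", 1)
            ranked.append((filename, float(distance)))
        results[query] = ranked
    return name, results


sample = (
    "MyAlgorithm (my.email@address.com)\n"
    "a009342.wav\tb229311.wav,0.16\ta023821.wav,0.19\n"
    "a009343.wav\ta661931.wav,0.12,\ta043322.wav,0.17\n"
)
name, results = parse_sparse_results(sample)
print(name)
print(results["a009342.wav"])
```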

===== Full Distance Matrix =====
Full distance matrix files should be generated in the following format:

* A simple ASCII file listing a name for the algorithm on the first line,
* Numbered paths for each file appearing in the matrix. These can be in any order (i.e. the files don't have to be in the same order as they appeared in the list file) but should index into the columns/rows of the distance matrix.
* A line beginning with 'Q/R' followed by a tab and a tab separated list of the numbers 1 to N, where N is the number of files covered by the matrix.
* One line per file in the matrix, giving the distances from that file to each other file in the matrix. All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a single tab character. Obviously the diagonal of the matrix (distance of a track to itself) should be zero.

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/1.wav>
2\t</path/to/audio/file/2.wav>
3\t</path/to/audio/file/3.wav>
...
N\t</path/to/audio/file/N.wav>
Q/R\t1\t2\t3\t...\tN
1\t0.0\t<dist 1 to 2>\t<dist 1 to 3>\t...\t<dist 1 to N>
2\t<dist 2 to 1>\t0.0\t<dist 2 to 3>\t...\t<dist 2 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t0.0\t...\t<dist 3 to N>
...\t...\t...\t...\t...\t...
N\t<dist N to 1>\t<dist N to 2>\t<dist N to 3>\t...\t0.0
</pre>

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/1.wav
2 /path/to/audio/file/2.wav
3 /path/to/audio/file/3.wav
4 /path/to/audio/file/4.wav
Q/R 1 2 3 4
1 0.00000 1.24100 0.2e-4 0.42559
2 1.24100 0.00000 0.62640 0.23564
3 50.2e-4 0.62640 0.00000 0.38000
4 0.42559 0.23567 0.38000 0.00000
</pre>
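
The full matrix layout can be read and checked with a short parser such as the sketch below. It assumes tab-separated fields as specified above and enforces only the stated constraints on the values (non-negative, finite, not NaN); the function name <code>parse_full_matrix</code> is hypothetical and this is not the official format checker.

```python
import math


def parse_full_matrix(text):
    """Parse a full distance matrix file.

    Returns (header, {index: path}, {row index: {column index: distance}}).
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0]
    paths = {}
    pos = 1
    # Numbered path lines run until the 'Q/R' column header line.
    while not lines[pos].startswith("Q/R"):
        index, path = lines[pos].split("\t", 1)
        paths[int(index)] = path
        pos += 1
    columns = [int(c) for c in lines[pos].split("\t")[1:]]
    matrix = {}
    for line in lines[pos + 1:]:
        fields = line.split("\t")
        values = [float(v) for v in fields[1:]]
        if len(values) != len(columns):
            raise ValueError("row %s has the wrong number of values" % fields[0])
        for value in values:
            if value < 0 or math.isinf(value) or math.isnan(value):
                raise ValueError("invalid distance in row %s" % fields[0])
        matrix[int(fields[0])] = dict(zip(columns, values))
    return header, paths, matrix


sample = "\n".join([
    "Example distance matrix 0.1",
    "1\t/path/to/audio/file/1.wav",
    "2\t/path/to/audio/file/2.wav",
    "Q/R\t1\t2",
    "1\t0.00000\t1.24100",
    "2\t1.24100\t0.00000",
])
header, paths, matrix = parse_full_matrix(sample)
print(paths[2], matrix[1][2])
```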

==== Example submission calling formats ====
extractFeatures.sh /path/to/scratch/folder /path/to/collectionListFile.txt
Query.sh /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

or

doAudioSim.sh -numThreads 8 /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt
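
For submissions whose wrapper script hands off to a Python program, the single-command calling format above could be handled along the lines of the sketch below. The argument names (<code>scratch_dir</code>, <code>collection_list</code>, <code>output_file</code>) and the default of one thread are assumptions made for illustration; only the <code>-numThreads</code> flag and the order of the three paths come from the example above.

```python
import argparse


def build_parser():
    """Command line parser mirroring the doAudioSim.sh calling format."""
    parser = argparse.ArgumentParser(prog="doAudioSim")
    parser.add_argument("-numThreads", dest="num_threads", type=int, default=1,
                        help="number of threads to use (assumed default: 1)")
    parser.add_argument("scratch_dir", help="path to the scratch folder")
    parser.add_argument("collection_list", help="path to the collection list file")
    parser.add_argument("output_file", help="path for the output results file")
    return parser


def read_collection_list(text):
    """One absolute path per line, no header line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


args = build_parser().parse_args([
    "-numThreads", "8",
    "/path/to/scratch/folder",
    "/path/to/collectionListFile.txt",
    "/path/to/outputResultsFile.txt",
])
print(args.num_threads, args.scratch_dir, args.output_file)
```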

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of
dynamically linked libraries cannot be guaranteed).

All submissions should include a README file including the following
information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks,
hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== AMS evaluation software ==
The legacy software for performing various AMS-related functions is available [https://www.music-ir.org/mirex/results/2010/AMS_tools.zip here] ([https://www.music-ir.org/mirex/results/2010/AMS_TOOLS_README%20.txt README file]). It may be used to benchmark systems prior to submission and to check distance matrix file formats.

This tool set supports the following functions:
* the import of collection metadata from a delimited text file (e.g. TAB or CSV)
* the selection of a stratified random list of queries from the collection (i.e. an equal number of queries are chosen for each class of a particular metadata field, such as genre).
* the generation of results from distance matrices based on a list of pre-chosen queries.
* (pseudo-)objective statistical evaluation of distance matrices by comparing query metadata to the metadata of the top N results retrieved. Supports artist, album, genre and artist-filtered genre (where results from the same artist as the query are skipped). Additionally, the number of tracks never returned as results for all possible queries (orphans) and the largest hub (the track similar to the most other tracks) are measured. Finally, the number of cases where the triangular inequality holds is counted.
* preparation and post processing of results for the IMIRSEL Evalutron 6k human evaluation interface.
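
The triangular inequality count mentioned in the list above can be illustrated with the brute-force sketch below. It assumes distances are stored as a dictionary of dictionaries and, for sparse data, only tests triplets whose three distances are all present. The function name and toy distances are hypothetical; this is not the tool set's own implementation and is only practical for small examples.

```python
from itertools import permutations


def triangle_inequality_rate(dist):
    """Return (held, tested) over all ordered triplets (a, b, c) for which
    d(a, b), d(b, c) and d(a, c) are all available."""
    held = tested = 0
    for a, b, c in permutations(dist, 3):
        try:
            ab, bc, ac = dist[a][b], dist[b][c], dist[a][c]
        except KeyError:
            continue  # not a test-able triplet in a sparse matrix
        tested += 1
        if ac <= ab + bc:
            held += 1
    return held, tested


# Toy distances (hypothetical): going from a to c directly is "longer"
# than going via b, so two of the six ordered triplets fail.
dist = {
    "a": {"b": 1.0, "c": 5.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 5.0, "b": 1.0},
}
print(triangle_inequality_rate(dist))
```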

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Music Similarity and Retrieval

2010-06-19T01:15:45Z

Kriswest: Linking AMS eval software

2010:Score File Format

2010-06-09T21:17:53Z

Kriswest: Created page with 'This page describes the score file format proposed for the MIREX 2007 2007:Real-time Audio to Score Alignment (a.k.a Score Following) task and likely to be used for the MIREX…'

This page describes the score file format proposed for the MIREX 2007 [[2007:Real-time Audio to Score Alignment (a.k.a Score Following)]] task and likely to be used for the MIREX 2010 [[2010:Real-time_Audio_to_Score_Alignment_(a.k.a_Score_Following)]] task.

= Description =

The proposed MIREX score file format is in ASCII text, with one line per note or other event such as trill, tempo change, or signature change.
Lines are separated by newline characters and contain 10 fields, separated by whitespace.

Notes are time-ordered; tempo or signature changes can come at any place.
(This could be restricted to "must come at the beginning of the file", if this helps someone.)

The MIREX text score file format will also be used as the alignment reference format, where the clock time is no longer the score time but the aligned time in the performance, and possibly as the alignment result file format.

== Columns ==

Note that we introduced a field ''event ID'' that unambiguously links events across all three types of files.

The last field ''stream-id'' can serve later to separate different channels, voices, or streams of events.

The 9th field ''cue'' serves to mark events that are musically important, e.g. because they synchronise accompaniment with the performance. This could later be used for a more detailed evaluation.

== Example Score Files ==

Here is a zip with a few examples. Please have a look at them and use them to test your parsers. The corresponding midi files are in there, also, except for Anthemes 2, but that one has a trill...

The metric positions are not very pretty, but it's the best a simple algorithm can do. The absolute time positions are good, anyway.

[[Image:2007_mirex2007-score-examples.zip]]

== Note on higher-level formats ==

It would of course be great if there was an even better format, maybe XML-based, such as the [http://www.mx.dico.unimi.it MX format] that was mentioned on the list. From this format, the text-based format could be easily generated, and even MIDI for those participants who couldn't otherwise implement a parser.
We'd be glad to hear a concrete proposal.

= Event Types=

== NOTE EVENTS ==

=== Template ===

''event-id onset-position onset-ms type pitch interval duration-beat duration-ms cue-num stream-id''

=== Example ===
<pre>
1 1 0 note 72 0 2 4000 1 0
2 3+1/4 4500 note 60 0 0+1/4 500 2 0
3 3+3/4 5000 note 58 0 0+1/2 1000 3 0
4 3+3/4 5000 note 48 0 0+1/2 1000 0 0
</pre>

=== Columns for note events ===

# event ID [int > 0]
# event onset position [measure+rational] (ex. 42+1/4, 28+3/7)
# event onset clock time [ms] (must be consistent with the onset position)
# event type [symbol: note, trill, tremolo, ...] (for tempo and signature, see below)
# event pitch [float MIDI note number]
# event interval [float halftones] (0 for non-trill)
# event duration [measure+rational]
# event duration [ms]
# cue number [int > 0, 0 = no cue]
# stream ID [int]
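
A parser for note events following the ten columns above could be sketched as follows. The class and function names are hypothetical, and the sketch assumes that a position written without a fraction (e.g. <code>1</code>) means a zero offset within the measure. Tempo and meter lines are rejected here because their unused fields contain <code>-</code>.

```python
from fractions import Fraction
from typing import NamedTuple, Tuple


class NoteEvent(NamedTuple):
    event_id: int
    onset_position: Tuple[int, Fraction]  # (measure, fraction)
    onset_ms: float
    event_type: str
    pitch: float
    interval: float
    duration: Tuple[int, Fraction]        # (measure, fraction)
    duration_ms: float
    cue: int
    stream_id: int


def parse_position(token):
    """Split 'measure+rational' (e.g. '3+1/4') into (measure, Fraction)."""
    measure, _, fraction = token.partition("+")
    return int(measure), Fraction(fraction) if fraction else Fraction(0)


def parse_note_event(line):
    """Parse one whitespace-separated, 10-field note event line."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError("expected 10 fields, got %d" % len(fields))
    if fields[3] in ("tempo", "meter"):
        raise ValueError("not a note-type event: %s" % fields[3])
    return NoteEvent(
        int(fields[0]), parse_position(fields[1]), float(fields[2]), fields[3],
        float(fields[4]), float(fields[5]), parse_position(fields[6]),
        float(fields[7]), int(fields[8]), int(fields[9]),
    )


event = parse_note_event("2 3+1/4 4500 note 60 0 0+1/4 500 2 0")
print(event.onset_position, event.duration_ms, event.cue)
```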

== TEMPO CHANGES ==

=== Template ===

'''0''' ''onset-position onset-ms'' '''tempo''' ''tempo-bpm - - - - stream-id ''

=== Example ===
<pre>
0 1 0 tempo 120 - - - - 0
</pre>

This signifies a tempo of 120 beats per minute.

=== Columns for tempo changes ===

# event ID [constant int = 0]
# event onset position [measure+rational] (ex. 42+1/4, 28+3/7)
# event onset clock time [ms] (must be consistent with the onset position)
# event type [constant symbol = '''tempo''']
# tempo [float bpm]
# unused
# unused
# unused
# unused
# stream ID [int]

== METER CHANGES ==

=== Template ===

'''0''' ''onset-position onset-ms'' '''meter''' ''numerator denominator - - - stream-id ''

=== Example ===
<pre>
0 1 0 meter 4 4 - - - 0
</pre>

This signifies a time signature of 4/4.

=== Columns for signature changes ===

# event ID [constant int = 0]
# event onset position [measure+rational] (ex. 42+1/4, 28+3/7)
# event onset clock time [ms] (must be consistent with the onset position)
# event type [constant symbol = '''meter''']
# meter numerator [int]
# meter denominator [int]
# unused
# unused
# unused
# stream ID [int]

= Evaluation Metrics =

times:

* t_a reference alignment time
* t_r reporting time
* t_e estimated time

measures:

* d_l = t_r - t_e system latency (between reporting and estimation)
* d_o = t_a - t_r offset or lag between reporting time and reference
* d_e = t_a - t_e error between estimation time and reference
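
The three measures follow directly from the three times, and from the definitions d_e = d_o + d_l. The sketch below computes them for a single note; the function name and the sample times are hypothetical.

```python
def alignment_measures(t_a, t_r, t_e):
    """Compute the per-note measures from the reference alignment time t_a,
    the reporting time t_r and the estimated time t_e (all in ms)."""
    return {
        "d_l": t_r - t_e,  # system latency
        "d_o": t_a - t_r,  # offset between reporting time and reference
        "d_e": t_a - t_e,  # error between estimated time and reference
    }


# Hypothetical note: reference at 2000 ms, estimated at 2021 ms, reported at 2022 ms.
measures = alignment_measures(t_a=2000, t_r=2022, t_e=2021)
print(measures)
```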

2010:Real-time Audio to Score Alignment (a.k.a Score Following)

2010-06-09T21:15:01Z

Kriswest: /* Evolution */

''Real-time Audio to Score Alignment'', also known as ''Score Following''

== Description ==
Score Following is the real-time alignment of an incoming music signal to the music score. The music signal can be symbolic (MIDI) or audio, but we will concentrate here on audio following, unless there are some candidates who'd want their symbolic followers to be evaluated and can propose reference data.

This page describes a proposal for evaluation of score following systems. Discussion of the evaluation procedures on the [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com01 Score Following contest planning list] will be documented on the [[Score Following]] page. A full digest of the discussions is available to subscribers from the [https://mail.lis.uiuc.edu/mailman/private/mrx-com01/ Score Following contest planning list archives].

Submissions will be required to estimate alignment precision according to the indexed times. In order for your system to participate, please specify the type of alignment (monophonic, polyphonic), type of training and realtime performance, also separated into two domains (upon enough submissions) for symbolic and audio systems. Note that we also accept systems that don't run in real time in practice, as long as their algorithm is on-line, i.e. does not make use of global knowledge of the input.

== Data ==
46 recordings and their corresponding MIDI representations of the score will be used in the evaluation. These 46 excerpts were extracted from 4 distinct musical pieces.
Recordings are in 44.1 kHz, 16-bit WAV format. The reference scores are in MIDI format.

== Evolution ==
This year's changes are proposed here and on the list, and are currently under discussion. Proposed changes are mainly about the score and reference file formats and the evaluation metrics:

* the proposed new score and reference file format is described here: [[2010:Score File Format]]
* evaluation metrics will more closely reflect the different approaches and applications of score following

See the details of last year's proposal on the [[2006:Score_Following_Proposal|MIREX 2006 Wiki]]

== Evaluation procedures ==

The evaluation procedure consists of running score followers on a database of audio aligned to scores. The database contains the score and the performance audio (passed to the system call) and a reference alignment (used for evaluation). See below for details.

=== I/O Format ===
Each system should conform to the following format:

''doScofo.sh "/path/to/audiofile.wav" "/path/to/midi_score_file.mid" "/path/to/result/filename.txt"''

The stdout and stderr will be logged.

"/path/to/result/filename.txt" should have one line per detected note, with the following 4 columns:

# estimated note onset time in performance audio file (ms)
# detection time relative to performance audio file (ms)
# note start time in score (ms)
# MIDI note number in score (int)

Example :
''1800 1800 0 75''
''2021 2022 187.5 73''
''... ... ... ...''

Remarks: The third column with the detected note's start time in score serves as the unique identifier of a note (or chord for polyphonic scores) that links it to the ground truth onset of that note within the reference alignment files. The fourth column of MIDI note number is there only for your convenience, to know your way around in the result files, if you know the melody in MIDI.
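As a minimal sketch of how a result file in this format could be read back, the following Python snippet parses the four columns into named fields. The names <code>NoteResult</code> and <code>parse_result_lines</code> are hypothetical and not part of any MIREX tooling; the sketch assumes whitespace-separated columns, as in the example above.

```python
from typing import List, NamedTuple


class NoteResult(NamedTuple):
    onset_ms: float      # column 1: estimated note onset time in the audio
    detection_ms: float  # column 2: time at which the note was reported
    score_ms: float      # column 3: note start time in the score (note identifier)
    midi_note: int       # column 4: MIDI note number in the score


def parse_result_lines(lines: List[str]) -> List[NoteResult]:
    """Parse whitespace-separated result lines, skipping blank ones."""
    results = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
        onset, detection, score = (float(x) for x in fields[:3])
        results.append(NoteResult(onset, detection, score, int(fields[3])))
    return results


rows = parse_result_lines(["1800 1800 0 75", "2021 2022 187.5 73"])
for row in rows:
    # Difference between the detection time and the estimated onset time.
    print(row.score_ms, row.midi_note, row.detection_ms - row.onset_ms)
```

The third column is kept as a float because score times such as 187.5 are not integers, and it is the value that would be matched against the reference alignment.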

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of
dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following
information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks,
hard limits on the runtime of submissions are specified.

A hard limit of 12 hours will be imposed on the total runtime of algorithms. Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Tempo Estimation

2010-06-07T13:41:12Z

Kriswest:

== Description ==
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo.

We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no "official" tempo annotation, we can also annotate the ''perceived'' tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.

The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can "feel" faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.

There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the "feel", or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.

A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.

== Data ==
=== Collections ===
MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.

=== Audio Formats ===
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* 30 second clips

== Submission Format ==
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input data ===
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.

=== Output Data ===
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (0-1). The relative strength ST2 (not output) is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:

T1<tab>T2<tab>ST1

E.g.
60 180 0.7
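A minimal sketch of writing this output line in Python follows. The function names <code>format_tempo_line</code> and <code>write_tempo_file</code> are hypothetical, and the swap that enforces T1 as the slower tempo is an assumption about how a submission might guard its own output, not a stated requirement of the task.

```python
def format_tempo_line(t1: float, t2: float, st1: float) -> str:
    """Return one 'T1<tab>T2<tab>ST1' line, with T1 as the slower tempo."""
    if not 0.0 <= st1 <= 1.0:
        raise ValueError("ST1 must lie between 0 and 1")
    if t1 > t2:
        # Swap so that T1 is the slower tempo. The strength then refers to
        # the other tempo, so it becomes 1 - ST1.
        t1, t2, st1 = t2, t1, 1.0 - st1
    return f"{t1:g}\t{t2:g}\t{st1:g}\n"


def write_tempo_file(path: str, t1: float, t2: float, st1: float) -> None:
    """Write the single result line to the output path given on the command line."""
    with open(path, "w", encoding="ascii") as handle:
        handle.write(format_tempo_line(t1, t2, st1))


line = format_tempo_line(60, 180, 0.7)
```

Here <code>line</code> holds the three values from the example above, separated by tab characters and terminated by a newline.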

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file on which to perform the tempo estimation, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:

foobar %input %output
or
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output
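For submissions written in Python, the flag-based calling form could be handled along the following lines. This is only a sketch: <code>build_parser</code> is a hypothetical helper, <code>-param1</code> stands in for whatever parameters a submission actually takes, and the file names in the example call are placeholders.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for: foobar -param1 .1 -i %input -o %output"""
    parser = argparse.ArgumentParser(prog="foobar")
    # Hypothetical tuning parameter; real submissions define their own.
    parser.add_argument("-param1", type=float, default=0.1)
    parser.add_argument("-i", dest="input", required=True,
                        help="full path to the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path to the output text file")
    return parser


# Parse an explicit argument list here; a real program would call
# parse_args() with no arguments to read sys.argv instead.
args = build_parser().parse_args(["-param1", ".1", "-i", "clip.wav", "-o", "clip.txt"])
print(args.param1, args.input, args.output)
```

Making both paths required arguments means a missing output path fails immediately with a usage message rather than partway through a long run.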

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

== Evaluation Procedures ==

This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.

=== Perceptual tempo data collection ===

The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo.
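The peak-picking step can be illustrated with a simplified Python sketch. It assumes each listener's taps have already been reduced to a single tempo in BPM, and it treats the two most populated histogram bins as the two peaks, which is cruder than the analysis in the cited papers. The function name <code>two_tempo_candidates</code>, the 5 BPM bin width, and the tap values are all made up for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def two_tempo_candidates(tapped_bpm: Iterable[float],
                         bin_width: float = 5.0) -> Tuple[float, float, float]:
    """Return (T1, T2, ST1) from per-listener tapped tempi."""
    # Build a histogram by snapping each tempo to the nearest bin centre.
    histogram = Counter(round(bpm / bin_width) * bin_width for bpm in tapped_bpm)
    if len(histogram) < 2:
        raise ValueError("need at least two distinct tempo bins")
    (tempo_a, height_a), (tempo_b, height_b) = histogram.most_common(2)
    total = height_a + height_b
    # Order so that T1 is the slower tempo; heights are normalised to sum to 1.
    if tempo_a < tempo_b:
        return tempo_a, tempo_b, height_a / total
    return tempo_b, tempo_a, height_b / total


# Illustrative values only: five listeners near 60 BPM, three near 120, one at 180.
taps = [60, 61, 59, 120, 121, 60, 119, 60, 180]
t1, t2, st1 = two_tempo_candidates(taps)
```

With these values, ST1 is the share of the two-peak total that falls on the slower tempo, and the listener at 180 BPM contributes to neither candidate.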

==== References ====
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception & Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF

=== Evaluation of tempo extraction algorithms ===
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi). For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:

P = ST1 * TT1 + (1 - ST1) * TT2

where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to within 8%, and TT2 is the ability of the algorithm to identify T2 to within 8%. No credit will be given for tempi other than T1 and T2.

The algorithm with the best average P-score will achieve the highest rank in the task.
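The scoring rule can be sketched in Python as follows. Two points are assumptions rather than statements from the task definition: TT1 and TT2 are treated as binary (1 if the tempo is identified within 8%, otherwise 0), and either of the two estimated tempi is allowed to match either ground-truth tempo. The function names are hypothetical.

```python
def within_tolerance(estimate: float, reference: float, tolerance: float = 0.08) -> bool:
    """True if the estimate lies within the given fraction of the reference tempo."""
    return abs(estimate - reference) <= tolerance * reference


def p_score(est_t1: float, est_t2: float,
            ref_t1: float, ref_t2: float, ref_st1: float,
            tolerance: float = 0.08) -> float:
    """Compute P = ST1 * TT1 + (1 - ST1) * TT2 for one excerpt."""
    estimates = (est_t1, est_t2)
    # Assumption: TT1 and TT2 are 1.0 when any estimate matches, else 0.0.
    tt1 = 1.0 if any(within_tolerance(e, ref_t1, tolerance) for e in estimates) else 0.0
    tt2 = 1.0 if any(within_tolerance(e, ref_t2, tolerance) for e in estimates) else 0.0
    return ref_st1 * tt1 + (1.0 - ref_st1) * tt2


# Ground truth: T1 = 60 BPM, T2 = 120 BPM, ST1 = 0.7 (illustrative values).
only_slow_found = p_score(60, 180, 60, 120, 0.7)
both_found = p_score(61, 118, 60, 120, 0.7)
```

In the first call only the slower ground-truth tempo is matched, so the score equals ST1; in the second both are matched and the score is 1. Averaging <code>p_score</code> over all excerpts gives the average P-score used for ranking.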

== Relevant Test Collections ==
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).

Excerpts were selected to provide:

* stable tempo within each excerpt
* a good distribution of tempi across excerpts
* a large variety of instrumentation and beat strengths (with and without percussion)
* a variation of musical styles, including many non-western styles
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter).

We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.

===Practice Data===
You can find it here:

https://www.music-ir.org/evaluation/MIREX/data/2006/beat/

User: beattrack Password: b34trx

https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/

User: tempo Password: t3mp0

Data has been uploaded in both .tgz and .zip format.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

== Submission opening date ==
Tuesday 9th June 2010

== Submission closing date ==
TBA

2010:Audio Tempo Estimation

2010-06-07T13:33:15Z

Kriswest: Created page with '== Description == This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for t…'

== Description ==
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo.

We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no "official" tempo annotation, we can also annotate the ''perceived'' tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.

The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can "feel" faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.

There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the "feel", or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.

A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.

== Data ==
=== Collections ===
MIREX 2006 Beat tracking dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.

=== Audio Formats ===
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* 30 second clips

== Submission Format ==
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input data ===
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.

=== Output Data ===
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (0-1). The relative strength ST2 (not output) is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:

T1<tab>T2<tab>ST1

E.g.
60 180 0.7

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file on which to perform the tempo estimation, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:

foobar %input %output
or
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

== Evaluation Procedures ==

This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.

=== Perceptual tempo data collection ===

The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo.

==== References ====
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception & Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF

=== Evaluation of tempo extraction algorithms ===
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi). For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:

P = ST1 * TT1 + (1 - ST1) * TT2

where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to within 8%, and TT2 is the ability of the algorithm to identify T2 to within 8%. No credit will be given for tempi other than T1 and T2.

The algorithm with the best average P-score will achieve the highest rank in the task.

== Relevant Test Collections ==
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).

Excerpts were selected to provide:

* stable tempo within each excerpt
* a good distribution of tempi across excerpts
* a large variety of instrumentation and beat strengths (with and without percussion)
* a variation of musical styles, including many non-western styles
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter).

We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.

===Practice Data===
You can find it here:

https://www.music-ir.org/evaluation/MIREX/data/2006/beat/

User: beattrack Password: b34trx

https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/

User: tempo Password: t3mp0

Data has been uploaded in both .tgz and .zip format.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

== Submission opening date ==
Tuesday 9th June 2010

== Submission closing date ==
TBA

2010:MIREX HOME

2010-06-07T13:15:39Z

Kriswest: /* MIREX 2010 Evaluation Tasks */

==Welcome to MIREX 2010==
This is the main page for the sixth running of the Music Information Retrieval Evaluation eXchange (MIREX 2010). The International Music Information Retrieval Systems Evaluation Laboratory ([https://music-ir.org/evaluation IMIRSEL]) at the Graduate School of Library and Information Science ([http://www.lis.illinois.edu GSLIS]), University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2010.

The MIREX 2010 community will hold its annual meeting as part of [http://ismir2010.ismir.net/ The 11th International Conference on Music Information Retrieval], ISMIR 2010, which will be held in Utrecht, Netherlands, from August 9th to 13th, 2010. The MIREX plenary (working lunch) and poster sessions will be held Wednesday, 11 August 2010.

J. Stephen Downie 
Director, IMIRSEL 

==MIREX 2010 Submission Instructions==
* Be sure to read through the rest of this page
* Be sure to read through the task pages for which you are submitting
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[2010:MIREX 2010 Submission Instructions | MIREX 2010 Submission Instructions ]] including both the tutorial video and the text

===MIREX 2010 Evaluation Tasks===

The IMIRSEL team at UIUC solicited proposals for evaluation tasks to be performed at the Music Information Retrieval Evaluation eXchange 2010 (MIREX 2010) and polled the community on their likelihood of participation in each task. A summary of the responses from the community is given below:

Results as of Monday 24th May 2010:

Total individual responses = 74

<csv p=0>2010/poll/MIREX_Task_Participation_Poll.csv</csv>

Hence, the IMIRSEL team has decided to attempt the running of the following tasks at MIREX 2010:
* [[2010:Audio Classification (Train/Test) Tasks]], incorporating:
** Audio Artist Identification
** Audio US Pop Genre Classification
** Audio Latin Genre Classification
** Audio Music Mood Classification
** Audio Classical Composer Identification
* [[2010:Audio Cover Song Identification]]
* [[2010:Audio Tag Classification]]
* [[2010:Audio Music Similarity and Retrieval]]
* [[2010:Symbolic Melodic Similarity]]
* [[2010:Audio Onset Detection]]
* [[2010:Audio Key Detection]]
* [[2010:Real-time Audio to Score Alignment (a.k.a Score Following)]]
* [[2010:Query by Singing/Humming]]
* [[2010:Audio Melody Extraction]]
* [[2010:Multiple Fundamental Frequency Estimation & Tracking]]
* [[2010:Audio Chord Estimation]]
* <strike>[[2010:Query by Tapping]]</strike>
* [[2010:Audio Beat Tracking]]
* [[2010:Structural Segmentation]]
* [[2010:Audio Tempo Estimation]]

==== New 2010 Proposals ====
* <strike>[[2010:Harmonic Analysis]]</strike>

===Projected dates===
* 1st June 2010: MIREX submission system opens (target date)
* 22nd June - 1st July 2010: Rolling MIREX submission system closures (dates to be announced)
* 15th July 2010: MIREX results posting begins
* 1st August 2010: All MIREX results posted (somewhat hopeful target date)
* 2-6th August 2010: USMIR Summer School
* 9-13th August 2010: ISMIR conference

===Note to New Participants===
Please take the time to read the following review article that explains the history and structure of MIREX.

Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007): 
A window into music information retrieval research. ''Acoustical Science and Technology 29'' (4): 247-255. 
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]

===Note to All Participants===
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:

# submit a DRAFT 2-3 page extended abstract PDF in the ISMIR format about the submitted programme(s) to help us and the community better understand how the algorithm works when submitting their programme(s).
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2010 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)
# present a poster at the MIREX 2010 poster session at ISMIR 2010 (Wednesday, 11 August 2010)

===Software Dependency Requests===
If you have not submitted to MIREX before, or are unsure whether IMIRSEL/NEMA currently supports some of the software/architecture dependencies for your submission, a [https://spreadsheets.google.com/embeddedform?formkey=dDltRjc4NDBDdkZiaF9qZXV0bU5ScUE6MA dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you.

Due to the high volume of submissions expected at MIREX 2010, a submission may be rejected if it has difficult-to-satisfy dependencies of which the team has not been given sufficient notice.

Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.

==Getting Involved in MIREX 2010==
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2010 the best yet.

===Mailing List Participation===
If you are interested in formal MIR evaluation, you should also subscribe to the "MIREX" (aka "EvalFest") mail list and participate in the community discussions about defining and running MIREX 2010 tasks. Subscription information at:
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central].

If you are participating in MIREX 2010, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2010 wiki) will be used to embody and disseminate task proposals. Task-related discussions should be conducted on the MIREX organization mailing list (EvalFest) rather than on this wiki, but should be summarized here.

Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team who will embody them in software as part of the NEMA analytics framework, which will be released to the community at or before ISMIR 2010 - providing a standardised set of interfaces and output to disciplined evaluation procedures for a great many MIR tasks.

===Wiki Participation===
'''''Please note that you may need to create a NEW login for this wiki even if you have a login that you previously used for editing the MIREX 2005, 2006, 2007, 2008 or 2009 wikis.'''''

However, starting in 2010 the MIREX wikis have been merged so that logins will persist for future iterations of MIREX.

Please create an account via: [[Special:Userlogin]].

Please note that because of "spam-bots", MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).

==MIREX 2005 - 2009 Wikis==
This is the new wiki for MIREX 2010. The wikis for MIREX 2005 - 2009 are available at:

'''[[2009:Main_Page|MIREX 2009]]'''
https://www.music-ir.org/mirex/2009/

'''[[2008:Main_Page|MIREX 2008]]'''
https://www.music-ir.org/mirex/2008/

'''[[2007:Main_Page|MIREX 2007]]'''
https://www.music-ir.org/mirex/2007/

'''[[2006:Main_Page|MIREX 2006]]'''
https://www.music-ir.org/mirex/2006/

'''[[2005:Main_Page|MIREX 2005]]'''
https://www.music-ir.org/mirex/2005/

You can interlink between this wiki and the previous wikis using '''2005:''' prefix on links to connect to pages in MIREX 2005 and '''2006:''' for MIREX 2006 and '''2007:''' for MIREX 2007 and '''2008:''' for MIREX 2008 and '''2009:''' for MIREX 2009.

===ISMIR 2004 Audio Description Contest===
The Audio Description Contest held at ISMIR 2004 is a precursor to MIREX. Details of the ISMIR 2004 Audio Description Contest can be found at:

'''[http://ismir2004.ismir.net/ISMIR_Contest.html ISMIR 2004 Audio Description Contest]'''
http://ismir2004.ismir.net/ISMIR_Contest.html

2010:MIREX HOME

2010-06-07T13:15:25Z

Kriswest: /* MIREX 2010 Evaluation Tasks */

==Welcome to MIREX 2010==
This is the main page for the sixth running of the Music Information Retrieval Evaluation eXchange (MIREX 2010). The International Music Information Retrieval Systems Evaluation Laboratory ([https://music-ir.org/evaluation IMIRSEL]) at the Graduate School of Library and Information Science ([http://www.lis.illinois.edu GSLIS]), University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2010.

The MIREX 2010 community will hold its annual meeting as part of [http://ismir2010.ismir.net/ The 11th International Conference on Music Information Retrieval], ISMIR 2010, which will be held in Utrecht, Netherlands, from August 9th to 13th, 2010. The MIREX plenary (working lunch) and poster sessions will be held Wednesday, 11 August 2010.

J. Stephen Downie 
Director, IMIRSEL 

==MIREX 2010 Submission Instructions==
* Be sure to read through the rest of this page
* Be sure to read through the task pages for which you are submitting
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[2010:MIREX 2010 Submission Instructions | MIREX 2010 Submission Instructions ]] including both the tutorial video and the text

===MIREX 2010 Evaluation Tasks===

The IMIRSEL team at UIUC solicited proposals for evaluation tasks to be performed at the Music Information Retrieval Evaluation eXchange 2010 (MIREX 2010) and polled the community on their likelihood of participation in each task. A summary of the responses from the community is given below:

Results as of Monday 24th May 2010:

Total individual responses = 74

<csv p=0>2010/poll/MIREX_Task_Participation_Poll.csv</csv>

Hence, the IMIRSEL team has decided to attempt the running of the following tasks at MIREX 2010:
* [[2010:Audio Classification (Train/Test) Tasks]], incorporating:
** Audio Artist Identification
** Audio US Pop Genre Classification
** Audio Latin Genre Classification
** Audio Music Mood Classification
** Audio Classical Composer Identification
* [[2010:Audio Cover Song Identification]]
* [[2010:Audio Tag Classification]]
* [[2010:Audio Music Similarity and Retrieval]]
* [[2010:Symbolic Melodic Similarity]]
* [[2010:Audio Onset Detection]]
* [[2010:Audio Key Detection]]
* [[2010:Real-time Audio to Score Alignment (a.k.a Score Following)]]
* [[2010:Query by Singing/Humming]]
* [[2010:Audio Melody Extraction]]
* [[2010:Multiple Fundamental Frequency Estimation & Tracking]]
* [[2010:Audio Chord Estimation]]
* <strike>[[2010:Query by Tapping]]</strike>
* [[2010:Audio Beat Tracking]]
* [[2010:Structural Segmentation]]
* [[2010: Audio Tempo Estimation]]

==== New 2010 Proposals ====
* <strike>[[2010:Harmonic Analysis]]</strike>

===Projected dates===
* 1st June 2010: MIREX submission system opens (target date)
* 22nd June - 1st July 2010: Rolling MIREX submission system closures (dates to be announced)
* 15th July 2010: MIREX results posting begins
* 1st August 2010: All MIREX results posted (somewhat hopeful target date)
* 2-6th August 2010: USMIR Summer School
* 9-13th August 2010: ISMIR conference

===Note to New Participants===
Please take the time to read the following review article that explains the history and structure of MIREX.

Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007): 
A window into music information retrieval research.''Acoustical Science and Technology 29'' (4): 247-255. 
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]

===Note to All Participants===
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:

# submit, along with their programme(s), a DRAFT 2-3 page extended abstract PDF in the ISMIR format describing the submitted programme(s), to help us and the community better understand how the algorithm works.
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2010 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)
# present a poster at the MIREX 2010 poster session at ISMIR 2010 (Wednesday, 11 August 2010)

===Software Dependency Requests===
If you have not submitted to MIREX before, or are unsure whether IMIRSEL/NEMA currently supports some of the software/architecture dependencies for your submission, a [https://spreadsheets.google.com/embeddedform?formkey=dDltRjc4NDBDdkZiaF9qZXV0bU5ScUE6MA dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you.

Due to the high volume of submissions expected at MIREX 2010, submissions with difficult-to-satisfy dependencies may be rejected if the team has not been given sufficient notice of them.

Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.

==Getting Involved in MIREX 2010==
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2010 the best yet.

===Mailing List Participation===
If you are interested in formal MIR evaluation, you should also subscribe to the "MIREX" (aka "EvalFest") mail list and participate in the community discussions about defining and running MIREX 2010 tasks. Subscription information at:
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central].

If you are participating in MIREX 2010, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (MIREX 2010 wiki) will be used to embody and disseminate task proposals. Task-related discussions, however, should be conducted on the MIREX organization mailing list (EvalFest) rather than on this wiki, and then summarized here.

Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team who will embody them in software as part of the NEMA analytics framework, which will be released to the community at or before ISMIR 2010 - providing a standardised set of interfaces and output to disciplined evaluation procedures for a great many MIR tasks.

===Wiki Participation===
'''''Please note that you may need to create a NEW login for this wiki even if you have a login that you previously used for editing the MIREX 2005, 2006, 2007, 2008 or 2009 wikis.'''''

However, starting in 2010 the MIREX wikis have been merged so that logins will persist for future iterations of MIREX.

Please create an account via: [[Special:Userlogin]].

Please note that because of "spam-bots", MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).

==MIREX 2005 - 2009 Wikis==
This is the new wiki for MIREX 2010. The wikis for MIREX 2005 - 2009 are available at:

'''[[2009:Main_Page|MIREX 2009]]'''
https://www.music-ir.org/mirex/2009/

'''[[2008:Main_Page|MIREX 2008]]'''
https://www.music-ir.org/mirex/2008/

'''[[2007:Main_Page|MIREX 2007]]'''
https://www.music-ir.org/mirex/2007/

'''[[2006:Main_Page|MIREX 2006]]'''
https://www.music-ir.org/mirex/2006/

'''[[2005:Main_Page|MIREX 2005]]'''
https://www.music-ir.org/mirex/2005/

You can interlink between this wiki and the previous wikis by prefixing links with '''2005:''' for pages in MIREX 2005, '''2006:''' for MIREX 2006, '''2007:''' for MIREX 2007, '''2008:''' for MIREX 2008 and '''2009:''' for MIREX 2009.

===ISMIR 2004 Audio Description Contest===
The Audio Description Contest held at ISMIR 2004 is a precursor to MIREX. Details of the ISMIR 2004 Audio Description Contest can be found at:

'''[http://ismir2004.ismir.net/ISMIR_Contest.html ISMIR 2004 Audio Description Contest]'''
http://ismir2004.ismir.net/ISMIR_Contest.html

2010:Audio Beat Tracking

2010-06-07T11:13:20Z

Kriswest: /* Collections */

== Description ==

The aim of the automatic beat tracking task is to track each beat location in a collection of sound files. Unlike the Audio Tempo Extraction task, whose aim is to detect tempi for each file, the beat tracking task aims at detecting all beat locations in recordings. The algorithms will be evaluated in terms of their accuracy in predicting beat locations annotated by a group of listeners.

== Data ==
=== Collections ===
The original 2006 dataset contains 160 30-second excerpts (WAV format) used for the Audio Tempo and Beat contests in 2006. Beat locations have been annotated in each excerpt by 40 different listeners (39 listeners for a few excerpts). These audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters. One disadvantage of using this set for beat tracking is that the tempi are rather stable, so this set will not test the ability of beat-tracking algorithms to track tempo changes.

The second collection is comprised of 367 Chopin Mazurkas, represented as full audio tracks (WAV format). The Mazurka dataset contains tempo changes so it will evaluate the ability of algorithms to track these.

=== Audio Formats ===

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* file length between 2 and 36 seconds (total time: 14 minutes)

== Submission Format ==
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The beat tracking algorithms will return beat-times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

=== Output File Format (Audio Beat tracking) ===

The Beat Tracking output file format is an ASCII text format. Each beat time is specified, in seconds, on its own line. Specifically,

<beat time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
0.486
0.729
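
A minimal Python sketch of a writer and reader for this format is given here for illustration. The function names and the choice of three decimal places are assumptions made for the example; the specification only requires one beat time, in seconds, per line.

```python
def format_beats(beat_times):
    """Return the file content: one beat time (seconds) per line."""
    # Three decimals (millisecond resolution) is an illustrative choice.
    return "".join(f"{t:.3f}\n" for t in beat_times)


def write_beats(beat_times, output_path):
    """Write beat times to the output path given on the command line."""
    with open(output_path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(format_beats(beat_times))


def read_beats(path):
    """Read a beat file back into a list of floats, skipping blank lines."""
    with open(path, encoding="ascii") as handle:
        return [float(line) for line in handle if line.strip()]
```

Calling `write_beats([0.243, 0.486, 0.729], path)` would produce the three-line example file shown above.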

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the beat tracking on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;"
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.
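
How such command lines might be generated can be sketched in Python as follows. This is not the IMIRSEL tooling: the function name `expand_readme`, the `runN` sub-directory layout and the `.txt` output naming are all assumptions made for illustration. Only the rule that a README line is used when it contains both `%input` and `%output` comes from the text above.

```python
def expand_readme(readme_text, wav_paths, output_dir):
    """Build one command per (README command line, audio file) pair."""
    # Keep only the README lines that contain both placeholders.
    templates = [line.strip() for line in readme_text.splitlines()
                 if "%input" in line and "%output" in line]
    commands = []
    for run, template in enumerate(templates, start=1):
        for wav in wav_paths:
            # Hypothetical naming: <output_dir>/run<N>/<file stem>.txt
            stem = wav.rsplit("/", 1)[-1].rsplit(".", 1)[0]
            out = f"{output_dir}/run{run}/{stem}.txt"
            commands.append(
                template.replace("%input", wav).replace("%output", out))
    return commands
```

Lines such as contact details or the literal `...` are ignored because they do not contain both placeholders.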

== Evaluation Procedures ==

The evaluation methods are taken from the beat evaluation toolbox and
are described in the following technical report:

M. E. P. Davies, N. Degara and M. D. Plumbley. "Evaluation methods for musical audio beat tracking algorithms". [https://music-ir.org/mirex/results/2009/beat/techreport_beateval.pdf ''Technical Report C4DM-TR-09-06''].

For further details on the specifics of the methods please refer to the
paper. However, here is a brief summary with appropriate references:

*'''F-measure''' - the standard calculation as used in onset evaluation but
with a 70ms window.

S. Dixon, "Onset detection revisited," in ''Proceedings of 9th
International Conference on Digital Audio Effects (DAFx)'', Montreal,
Canada, pp. 133-137, 2006.

S. Dixon, "Evaluation of audio beat tracking system beatroot," ''Journal
of New Music Research'', vol. 36, no. 1, pp. 39-51, 2007.

*'''Cemgil''' - beat accuracy is calculated using a Gaussian error function
with 40ms standard deviation.

A. T. Cemgil, B. Kappen, P. Desain, and H. Honing, "On tempo tracking:
Tempogram representation and Kalman filtering," ''Journal Of New Music
Research'', vol. 28, no. 4, pp. 259-273, 2001

*'''Goto''' - binary decision of correct or incorrect tracking based on
statistical properties of a beat error sequence.

M. Goto and Y. Muraoka, "Issues in evaluating beat tracking systems," in
''Working Notes of the IJCAI-97 Workshop on Issues in AI and Music -
Evaluation and Assessment'', 1997, pp. 9-16.

*'''PScore''' - McKinney's impulse train cross-correlation method as used in
2006.

M. F. McKinney, D. Moelants, M. E. P. Davies, and A. Klapuri,
"Evaluation of audio beat tracking and music tempo extraction
algorithms," ''Journal of New Music Research'', vol. 36, no. 1, pp. 1-16,
2007.

*'''CMLc''', '''CMLt''', '''AMLc''', '''AMLt''' - continuity-based evaluation methods based on
the longest continuously correctly tracked section.

S. Hainsworth, "Techniques for the automated analysis of musical audio,"
Ph.D. dissertation, Department of Engineering, Cambridge University,
2004.

A. P. Klapuri, A. Eronen, and J. Astola, "Analysis of the meter of
acoustic musical signals," IEEE Transactions on Audio, Speech and
Language Processing, vol. 14, no. 1, pp. 342-355, 2006.

*'''D''', '''Dg''' - information based criteria based on analysis of a beat error
histogram (note the results are measured in 'bits' and not percentages),
see the technical report for a description.
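
As an illustration of the first measure in the list above, the following Python sketch computes an F-measure with a 70 ms tolerance window. It is one plausible reading of the summary, not the beat evaluation toolbox code: the one-to-one matching of each annotation to its closest unused detection is an assumption, and the technical report remains the authoritative definition.

```python
def beat_f_measure(detections, annotations, window=0.07):
    """F-measure of detected beats against annotated beats (seconds)."""
    detections = sorted(detections)
    used = [False] * len(detections)
    hits = 0
    for ann in sorted(annotations):
        # Find the closest detection not yet matched, within the window.
        best, best_err = None, window
        for k, det in enumerate(detections):
            err = abs(det - ann)
            if not used[k] and err <= best_err:
                best, best_err = k, err
        if best is not None:
            used[best] = True
            hits += 1
    if hits == 0:
        return 0.0
    precision = hits / len(detections)
    recall = hits / len(annotations)
    return 2 * precision * recall / (precision + recall)
```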

== Relevant Development Collections ==
The development data can be found here:

https://www.music-ir.org/evaluation/MIREX/data/2006/beat/

User: beattrack Password: b34trx

https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/

User: tempo Password: t3mp0

Data has been uploaded in both .tgz and .zip format.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Structural Segmentation

2010-06-05T09:27:44Z

Kriswest:

== Description ==

The aim of the MIREX structural segmentation evaluation is to identify the key structural sections in musical audio. The segment structure (or form) is one of the most important musical parameters. It is furthermore special because musical structure -- especially in popular music genres (e.g. verse, chorus, etc.) -- is accessible to everybody: it needs no particular musical knowledge. This task was first run in 2009.

== Data ==

=== Collections ===
The final MIREX data set for structural segmentation is comprised of 297 songs. The majority come from the Beatles collection. Works from other artists round out the evaluation dataset.

There is a good chance a second dataset donated by the QUERO project will be included. This data includes segment boundaries for 100 songs from its "popular music" section of the RWC dataset. Since there is no grouping information of the segments, only boundary retrieval metrics will be calculated. More info about these annotations can be found at http://hal.inria.fr/docs/00/47/34/79/PDF/PI-1948.pdf .

=== Audio Formats ===

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)

== Submission Format ==

Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The structural segmentation algorithms will return the segmentation in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

=== Output File Format (Structural Segmentation) ===

The Structural Segmentation output file format is a tab-delimited ASCII text format. This is the same as Chris Harte's chord labelling files (.lab), and so is the same format as the ground truth as well. Onset and offset times are given in seconds, and the labels are simply letters: 'A', 'B', ... with segments referring to the same structural element having the same label.

Three column text file of the format

<onset_time(sec)>\t<offset_time(sec)>\t<label>\n
<onset_time(sec)>\t<offset_time(sec)>\t<label>\n
...

where \t denotes a tab, \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.000 5.223 A
5.223 15.101 B
15.101 20.334 A
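
For illustration, a small Python sketch of writing and reading this three-column format. The function names and the three-decimal formatting are assumptions; the reader splits on any whitespace so that it also accepts files in which tabs have been replaced by spaces.

```python
def format_segments(segments):
    """Serialise (onset, offset, label) tuples as tab-delimited lines."""
    return "".join(f"{on:.3f}\t{off:.3f}\t{label}\n"
                   for on, off, label in segments)


def parse_segments(text):
    """Parse the text of a segmentation file into (onset, offset, label)."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        onset, offset, label = line.split(None, 2)
        segments.append((float(onset), float(offset), label.strip()))
    return segments
```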

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the structural segmentation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"

== Evaluation Procedures ==
At the last ISMIR conference [http://ismir2008.ismir.net/papers/ISMIR2008_219.pdf Lukashevich] proposed a measure for segmentation evaluation. Because of the complexity of the structural segmentation task definition, several different evaluation measures will be employed to address different aspects. It should be noted that none of the evaluation measures cares about the true labels of the sections: they only denote the clustering. This means that it does not matter if the systems produce true labels such as "chorus" and "verse", or arbitrary labels such as "A" and "B".

=== Boundary retrieval ===
'''Hit rate''' Found segment boundaries are accepted to be correct if they are within 0.5s ([http://ismir2007.ismir.net/proceedings/ISMIR2007_p051_turnbull.pdf Turnbull et al. ISMIR2007]) or 3s ([http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008]) from a border in the ground truth. Based on the matched hits, ''boundary retrieval recall rate'', ''boundary retrieval precision rate'', and ''boundary retrieval F-measure'' are calculated.

'''Median deviation''' Two median deviation measures between boundaries in the result and ground truth are calculated: ''median true-to-guess'' is the median time from boundaries in ground truth to the closest boundaries in the result, and ''median guess-to-true'' is similarly the median time from boundaries in the result to boundaries in ground truth. ([http://ismir2007.ismir.net/proceedings/ISMIR2007_p051_turnbull.pdf Turnbull et al. ISMIR2007])
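
The two boundary retrieval measures can be sketched in Python as follows. This is an illustrative reading rather than the evaluation code actually used: the function names are hypothetical, and the rule that each ground-truth boundary can be matched by at most one found boundary is an assumption not stated above.

```python
from statistics import median


def boundary_scores(result, truth, tolerance=0.5):
    """Return (precision, recall, F-measure) for boundary times in seconds."""
    unmatched = sorted(truth)
    hits = 0
    for b in sorted(result):
        # Assumption: one-to-one matching, earliest unmatched truth first.
        match = next((t for t in unmatched if abs(t - b) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            hits += 1
    precision = hits / len(result) if result else 0.0
    recall = hits / len(truth) if truth else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


def median_deviations(result, truth):
    """Return (median true-to-guess, median guess-to-true) in seconds."""
    true_to_guess = median(min(abs(t - r) for r in result) for t in truth)
    guess_to_true = median(min(abs(r - t) for t in truth) for r in result)
    return true_to_guess, guess_to_true
```

Passing `tolerance=3.0` gives the coarser 3 s variant of the hit rate.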

=== Frame clustering ===
Both the result and the ground truth are handled in short frames (e.g., beat or fixed 100ms). All frame pairs in a structure description are handled. The pairs in which both frames are assigned to the same cluster (i.e., have the same label) form the sets <math>P_E</math> (for the system result) and <math>P_A</math> (for the ground truth). The ''pairwise precision rate'' can be calculated by <math>P = \frac{|P_E \cap P_A|}{|P_E|}</math>, ''pairwise recall rate'' by <math>R = \frac{|P_E \cap P_A|}{|P_A|}</math>, and ''pairwise F-measure'' by <math>F=\frac{2 P R}{P + R}</math>. ([http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008])
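
A direct Python sketch of these pairwise formulas, assuming both descriptions have already been sampled into equal-length lists of per-frame labels (the function names are illustrative):

```python
from itertools import combinations


def same_label_pairs(frame_labels):
    """Set of frame index pairs (i, j), i < j, that share a label."""
    return {(i, j)
            for i, j in combinations(range(len(frame_labels)), 2)
            if frame_labels[i] == frame_labels[j]}


def pairwise_scores(result_frames, truth_frames):
    """Return (P, R, F) from the sets P_E (result) and P_A (ground truth)."""
    p_e = same_label_pairs(result_frames)
    p_a = same_label_pairs(truth_frames)
    common = len(p_e & p_a)
    precision = common / len(p_e) if p_e else 0.0
    recall = common / len(p_a) if p_a else 0.0
    f = 2 * precision * recall / (precision + recall) if common else 0.0
    return precision, recall, f
```

Enumerating every pair explicitly keeps the sketch close to the definition, but it needs memory quadratic in the number of frames; a practical implementation would count pairs from label co-occurrence totals instead.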

=== Normalised conditional entropies ===
Over- and under-segmentation-based evaluation measures were proposed in [http://ismir2008.ismir.net/papers/ISMIR2008_219.pdf Lukashevich ISMIR2008].
Structure descriptions are represented as frame sequences with the associated cluster information (similar to the Frame clustering measure). A confusion matrix between the labels in the ground truth and the result is calculated. The matrix C is of size |L_A| * |L_E|, i.e., the number of unique labels in the ground truth times the number of unique labels in the result. From the confusion matrix, the joint distribution is calculated by normalising the values with the total number of frames F:

<math>p_{i,j} = C_{i,j} / F</math>

Similarly, the two marginals are calculated:

<math>p_i^a = \sum_{j=1}^{|L_E|} C_{i,j}/F</math>, and

<math>p_j^e = \sum_{i=1}^{|L_A|} C_{i,j}/F</math>

Conditional distributions:

<math>p_{i,j}^{a|e} = C_{i,j} / \sum_{i=1}^{|L_A|} C_{i,j}</math>, and

<math>p_{i,j}^{e|a} = C_{i,j} / \sum_{j=1}^{|L_E|} C_{i,j}</math>

The conditional entropies will then be

<math>H(E|A) = - \sum_{i=1}^{|L_A|} p_i^a \sum_{j=1}^{|L_E|} p_{i,j}^{e|a} \log_2(p_{i,j}^{e|a})</math>, and

<math>H(A|E) = - \sum_{j=1}^{|L_E|} p_j^e \sum_{i=1}^{|L_A|} p_{i,j}^{a|e} \log_2(p_{i,j}^{a|e})</math>

The final evaluation measures will then be the oversegmentation score

<math>S_O = 1 - \frac{H(E|A)}{\log_2(|L_E|)}</math> , and the undersegmentation score

<math>S_U = 1 - \frac{H(A|E)}{\log_2(|L_A|)}</math>
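
The formulas above can be sketched in Python as follows, again assuming equal-length lists of per-frame labels. Two points are assumptions of this sketch rather than part of the definition: empty cells of C are skipped (taking 0 log 0 as 0), and a score of 1 is returned when the normalising label set has fewer than two labels, where the formula would otherwise divide by zero.

```python
from collections import Counter
from math import log2


def segmentation_scores(truth_frames, result_frames):
    """Return (S_O, S_U) for two equal-length sequences of frame labels."""
    total = len(truth_frames)                          # F, number of frames
    joint = Counter(zip(truth_frames, result_frames))  # non-zero cells of C
    truth_counts = Counter(truth_frames)               # row sums of C
    result_counts = Counter(result_frames)             # column sums of C

    # p_i^a * p_{i,j}^{e|a} simplifies to C_{i,j} / F (likewise for a|e).
    h_e_given_a = -sum(c / total * log2(c / truth_counts[a])
                       for (a, _), c in joint.items())
    h_a_given_e = -sum(c / total * log2(c / result_counts[e])
                       for (_, e), c in joint.items())

    def normalise(entropy, n_labels):
        # Assumption: with fewer than two labels the score is defined as 1.
        return 1.0 if n_labels < 2 else 1.0 - entropy / log2(n_labels)

    s_o = normalise(h_e_given_a, len(result_counts))
    s_u = normalise(h_a_given_e, len(truth_counts))
    return s_o, s_u
```

A result that matches the ground truth up to a renaming of labels scores 1 on both measures, which agrees with the remark that the true label names do not matter.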

== Relevant Development Collections ==
*Jouni Paulus's [http://www.cs.tut.fi/sgn/arg/paulus/structure.html structure analysis page] links to a corpus of 177 Beatles songs ([http://www.cs.tut.fi/sgn/arg/paulus/beatles_sections_TUT.zip zip file]). The Beatles annotations are not a part of the TUTstructure07 dataset. That dataset contains 557 songs, a list of which is available [http://www.cs.tut.fi/sgn/arg/paulus/TUTstructure07_files.html here].

*Ewald Peiszer's [http://www.ifs.tuwien.ac.at/mir/audiosegmentation.html thesis page] links to a portion of the corpus he used: 43 non-Beatles pop songs (including 10 J-pop songs) ([http://www.ifs.tuwien.ac.at/mir/audiosegmentation/dl/ep_groundtruth_excl_Paulus.zip zip file]).

These public corpora give a combined 220 songs.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Beat Tracking

2010-06-05T09:27:19Z

Kriswest:

== Description ==

The aim of the automatic beat tracking task is to track each beat location in a collection of sound files. Unlike the Audio Tempo Extraction task, whose aim is to detect tempi for each file, the beat tracking task aims at detecting all beat locations in recordings. The algorithms will be evaluated in terms of their accuracy in predicting beat locations annotated by a group of listeners.

== Data ==
=== Collections ===
The original 2006 dataset contains 160 30-second excerpts (WAV format) used for the Audio Tempo and Beat contests in 2006. Beat locations have been annotated in each excerpt by 40 different listeners (39 listeners for a few excerpts). These audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters. One disadvantage of using this set for beat tracking is that the tempi are rather stable, so this set will not test the ability of beat-tracking algorithms to track tempo changes.

The second collection is comprised of 367 Chopin Mazurkas. The Mazurka dataset contains tempo changes so it will evaluate the ability of algorithms to track these.

=== Audio Formats ===

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* file length between 2 and 36 seconds (total time: 14 minutes)

== Submission Format ==
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The beat tracking algorithms will return beat-times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

=== Output File Format (Audio Beat tracking) ===

The Beat Tracking output file format is an ASCII text format. Each beat time is specified, in seconds, on its own line. Specifically,

<beat time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
0.486
0.729

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the beat tracking on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')
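
For submissions written in Python, the flag-style call shown earlier in this section could be handled with the standard `argparse` module. The sketch below is only an illustration: `foobar`, `-param1` and its default of 0.1 are the placeholder names from the examples above, not a required interface.

```python
import argparse


def build_parser():
    """Parser for calls such as: foobar -param1 .1 -i %input -o %output"""
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-param1", type=float, default=0.1,
                        help="hypothetical tuning parameter")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # A real submission would run its algorithm on args.input here
    # and write the result to args.output.
    print(args.param1, args.input, args.output)
```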

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;"
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.

== Evaluation Procedures ==

The evaluation methods are taken from the beat evaluation toolbox and
are described in the following technical report:

M. E. P. Davies, N. Degara and M. D. Plumbley. "Evaluation methods for musical audio beat tracking algorithms". [https://music-ir.org/mirex/results/2009/beat/techreport_beateval.pdf ''Technical Report C4DM-TR-09-06''].

For further details on the specifics of the methods please refer to the
paper. However, here is a brief summary with appropriate references:

*'''F-measure''' - the standard calculation as used in onset evaluation but
with a 70ms window.

S. Dixon, "Onset detection revisited," in ''Proceedings of 9th
International Conference on Digital Audio Effects (DAFx)'', Montreal,
Canada, pp. 133-137, 2006.

S. Dixon, "Evaluation of audio beat tracking system beatroot," ''Journal
of New Music Research'', vol. 36, no. 1, pp. 39-51, 2007.

*'''Cemgil''' - beat accuracy is calculated using a Gaussian error function
with 40ms standard deviation.

A. T. Cemgil, B. Kappen, P. Desain, and H. Honing, "On tempo tracking:
Tempogram representation and Kalman filtering," ''Journal Of New Music
Research'', vol. 28, no. 4, pp. 259-273, 2001

*'''Goto''' - binary decision of correct or incorrect tracking based on
statistical properties of a beat error sequence.

M. Goto and Y. Muraoka, "Issues in evaluating beat tracking systems," in
''Working Notes of the IJCAI-97 Workshop on Issues in AI and Music -
Evaluation and Assessment'', 1997, pp. 9-16.

*'''PScore''' - McKinney's impulse train cross-correlation method as used in
2006.

M. F. McKinney, D. Moelants, M. E. P. Davies, and A. Klapuri,
"Evaluation of audio beat tracking and music tempo extraction
algorithms," ''Journal of New Music Research'', vol. 36, no. 1, pp. 1-16,
2007.

*'''CMLc''', '''CMLt''', '''AMLc''', '''AMLt''' - continuity-based evaluation methods based on
the longest continuously correctly tracked section.

S. Hainsworth, "Techniques for the automated analysis of musical audio,"
Ph.D. dissertation, Department of Engineering, Cambridge University,
2004.

A. P. Klapuri, A. Eronen, and J. Astola, "Analysis of the meter of
acoustic musical signals," IEEE Transactions on Audio, Speech and
Language Processing, vol. 14, no. 1, pp. 342-355, 2006.

*'''D''', '''Dg''' - information based criteria based on analysis of a beat error
histogram (note the results are measured in 'bits' and not percentages),
see the technical report for a description.
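
As an illustration of the Cemgil measure in the list above, the following Python sketch scores each annotated beat by a Gaussian function (40 ms standard deviation) of its distance to the nearest detected beat. Normalising by the mean of the two sequence lengths is an assumption of this sketch; the technical report should be consulted for the exact definition used in the evaluation.

```python
from math import exp


def cemgil_accuracy(detections, annotations, sigma=0.04):
    """Gaussian-weighted beat accuracy between 0 and 1 (times in seconds)."""
    if not detections or not annotations:
        return 0.0
    total = 0.0
    for ann in annotations:
        # Timing error to the nearest detected beat.
        err = min(abs(det - ann) for det in detections)
        total += exp(-(err ** 2) / (2 * sigma ** 2))
    # Assumed normalisation: mean of the two sequence lengths, so that
    # extra detections and missed beats both lower the score.
    return total / ((len(detections) + len(annotations)) / 2)
```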

== Relevant Development Collections ==
The development data can be found here:

https://www.music-ir.org/evaluation/MIREX/data/2006/beat/

User: beattrack Password: b34trx

https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/

User: tempo Password: t3mp0

Data has been uploaded in both .tgz and .zip format.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Chord Estimation

2010-06-05T09:26:23Z

Kriswest: /* Submission opening date */

== Description ==
This task requires participants to extract or transcribe a sequence of chords from an audio music recording. For many applications in music information retrieval, extracting the harmonic structure of an audio track is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music.

The extraction of the harmonic structure requires the detection of as many chords as possible in a piece. That includes the characterisation of chords with a key and type as well as a chronological order with onset and duration of the chords.

Although some publications are available on this topic [1,2,3,4,5], comparison of the results is difficult, because different measures are used to assess the performance. To overcome this problem an accurately defined methodology is needed. This includes a repertory of the findable chords, a defined test set along with ground truth and unambiguous calculation rules to measure the performance.

== Data ==
Two datasets are used to evaluate chord transcription accuracy:

=== Beatles dataset ===
Christopher Harte's Beatles dataset consisting of annotations of 12 Beatles albums.

The text annotation procedure of musical chords that was used to produce this dataset is presented in [6].

=== Queen and Zweieck dataset ===
Matthias Mauch's Queen and Zweieck dataset consisting of 38 songs from Queen and Zweieck.

===Example ground-truth file ===
The ground-truth files take the form:

...
41.2631021 44.2456460 B
44.2456460 45.7201130 E
45.7201130 47.2061900 E:7/3
47.2061900 48.6922670 A
48.6922670 50.1551240 A:min/b3
...
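
A minimal Python sketch of reading such a file, assuming whitespace-separated fields as in the excerpt above. The function name <code>parse_chord_lines</code> is hypothetical and not part of any MIREX tool.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_time, end_time, chord_label)


def parse_chord_lines(text: str) -> List[Segment]:
    """Parse 'start end label' lines into (start, end, label) tuples."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank lines and anything that is not a segment
        start, end, label = float(fields[0]), float(fields[1]), fields[2]
        segments.append((start, end, label))
    return segments


example = """41.2631021 44.2456460 B
44.2456460 45.7201130 E
45.7201130 47.2061900 E:7/3"""
segments = parse_chord_lines(example)
```

Times are kept as floats in seconds and labels are left as raw strings, so any chord-syntax handling can be layered on afterwards.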

== Evaluation ==

=== Segmentation Score ===

The segmentation score will be calculated using the directional Hamming distance as described in [8]. An over-segmentation value (m) and an under-segmentation value (f) will be calculated, and the final segmentation score will be taken from the worse of the two, i.e.:

segmentation score = 1 - max(m,f)

m and f are not independent of each other so combining them this way ensures that a good score in one does not hide a bad score in the other. The combined segmentation score can take values between 0 and 1 with 0 being the worst and 1 being the best result.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)
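
The combination rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the directional Hamming distance is taken here as the duration of each segment not covered by its best-overlapping segment in the other segmentation, normalised by total duration, and both function names are hypothetical. Which direction corresponds to m and which to f follows [8] and is not asserted here.

```python
def directional_hamming(ref, other):
    """Fraction of `ref` duration not covered by the best-overlapping
    segment of `other` (assumed normalisation: total duration of `ref`)."""
    total = sum(end - start for start, end in ref)
    uncovered = 0.0
    for start, end in ref:
        best = max(
            (min(end, o_end) - max(start, o_start) for o_start, o_end in other),
            default=0.0,
        )
        uncovered += (end - start) - max(best, 0.0)
    return uncovered / total


def segmentation_score(annotation, estimate):
    """Worst case of the two directional distances, as 1 - max(...)."""
    d_ae = directional_hamming(annotation, estimate)
    d_ea = directional_hamming(estimate, annotation)
    return 1.0 - max(d_ae, d_ea)


annotation = [(0.0, 2.0), (2.0, 4.0)]
estimate = [(0.0, 4.0)]  # one boundary missed
score = segmentation_score(annotation, estimate)
```

In this toy case one direction is perfect (distance 0) while the other is 0.5, so taking the maximum keeps the good direction from hiding the bad one.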

=== Frame-based recall ===

For recall evaluation, we may define a different chord dictionary for each level of evaluation (dyads, triads, tetrads etc). Each dictionary is a text file containing chord shorthands / interval lists of the chords that will be considered in that evaluation. The following dictionaries are proposed:

For dyad comparison of major/minor chords only:

N 
X:maj 
X:min 

For comparison of standard triad chords:

N 
X:maj 
X:min 
X:aug 
X:dim 
X:sus2 
X:sus4 

For comparison of tetrad (quad) chords:

N 
X:maj 
X:min 
X:aug 
X:dim 
X:sus2 
X:sus4 
X:maj7 
X:7 
X:maj(9) 
X:aug(7) 
X:min(7) 
X:min7 
X:min(9) 
X:dim(7) 
X:hdim7 
X:sus4(7) 
X:sus4(b7) 
X:dim7 

For each evaluation level, the ground truth annotation is compared against the dictionary. Any chord label not belonging to the current dictionary will be replaced with an "X" in a local copy of the annotation and will not be included in the recall calculation.

Note that the level of comparison in terms of intervals can be varied. For example, in a triad evaluation we can consider the first three component intervals in the chord so that a major (1,3,5) and a major7 (1,3,5,7) will be considered the same chord. For a tetrad (quad) evaluation, we would consider the first 4 intervals so major and major7 would then be considered to be different chords.

For the maj/min evaluation (using the first example dictionary), using an interval comparison of 2 (dyad) will compare only the first two intervals of each chord label. This would map augmented and diminished chords to major and minor respectively (and any other symbols that had a major 3rd or minor 3rd as their first interval). Using an interval comparison of 3 with the same dictionary would keep only those chords that have major and minor triads as their first 3 intervals so augmented and diminished chords would be removed from the evaluation.

After the annotation has been "filtered" using a given dictionary, it can be compared against the machine generated estimates output by the algorithm under test. The chord sequences described in the annotation and estimate text files are sampled at a given frame rate (in this case 10ms per frame) to give two sequences of chord frames which may be compared directly with each other. For calculating a hit or a miss, the chord labels from the current frame in each sequence will be compared. Chord comparison is done by converting each chord label into an ordered list of pitch classes then comparing the two lists element by element. If the lists match to the required number of intervals then a hit is recorded, otherwise the estimate is considered a miss. It should be noted that, by converting to pitch classes in the comparison, this evaluation ignores enharmonic pitch and interval spellings so the following chords (slightly silly example just for illustration) will all evaluate as identical:

C:maj = Dbb:maj = C#:(b1,b3,#4)
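
A short Python sketch of the pitch-class conversion shows why the three labels above compare equal. It handles only a subset of the syntax (four triad shorthands, explicit interval lists, no bass notes), and all names in it are hypothetical rather than taken from the evaluation code.

```python
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
DEGREES = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11}  # major-scale semitones
SHORTHANDS = {"maj": "1,3,5", "min": "1,b3,5", "dim": "1,b3,b5", "aug": "1,3,#5"}


def note_to_pc(note: str) -> int:
    """Natural letter plus sharp/flat modifiers -> pitch class 0..11."""
    modifiers = note[1:]
    return (NATURALS[note[0]] + modifiers.count("#") - modifiers.count("b")) % 12


def interval_to_semitones(interval: str) -> int:
    """Interval such as 'b3' or '#4' -> semitones above the root."""
    modifiers = interval.rstrip("0123456789")
    degree = int(interval[len(modifiers):])
    octaves, step = divmod(degree - 1, 7)
    shift = modifiers.count("#") - modifiers.count("b")
    return 12 * octaves + DEGREES[step + 1] + shift


def chord_to_pcs(label: str) -> list:
    """Chord label -> ordered list of pitch classes ([] for no-chord 'N')."""
    if label == "N":
        return []
    root, _, quality = label.partition(":")
    quality = quality or "maj"  # a bare root means a major triad
    if quality.startswith("("):
        intervals = quality.strip("()")
    else:
        intervals = SHORTHANDS[quality]
    root_pc = note_to_pc(root)
    return [(root_pc + interval_to_semitones(i.strip())) % 12
            for i in intervals.split(",")]
```

Comparing only the first ''n'' elements of two such lists gives the dyad, triad or tetrad comparison described above.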

Basic recall calculation algorithm:

1) filter annotated transcription using chord dictionary for a defined number of intervals

2) sample annotated transcription and machine estimated transcription at 10ms intervals to create a sequence of annotation frames and estimate frames

3) start at the first frame

4) get chord label for current annotation frame and estimate frame

5) check annotation label: 

IF symbol is 'X' (i.e. non-dictionary) 

THEN ignore frame (record number of ignored frames) 

ELSE compare annotated/estimated chords for the predefined number of intervals 
increment hit count if chords match 

ENDIF

6) increment frame count

7) go back to 4 until final chord frame
--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)
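
The steps above can be sketched in Python roughly as follows. This is a simplified illustration, not the evaluation code: the dictionary is assumed to be a set of concrete labels (rather than <code>X:maj</code> patterns with an interval depth), chords are compared by plain string equality unless a <code>same_chord</code> function is supplied, and recall is assumed to be hits divided by the number of non-ignored frames.

```python
def label_at(segments, t):
    """Return the label of the segment covering time t, or 'N' if none does."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return "N"


def frame_recall(annotation, estimate, dictionary, hop=0.01, same_chord=None):
    """Return (recall, hits, ignored) for (start, end, label) segment lists."""
    same_chord = same_chord or (lambda a, b: a == b)
    # Step 1: replace non-dictionary annotation labels with 'X'.
    filtered = [(s, e, lab if lab in dictionary else "X")
                for s, e, lab in annotation]
    # Step 2: sample both transcriptions every `hop` seconds.
    n_frames = int(round(filtered[-1][1] / hop))
    hits = ignored = 0
    for i in range(n_frames):  # steps 3 to 7
        t = i * hop
        ann = label_at(filtered, t)
        if ann == "X":
            ignored += 1  # frame excluded from the recall calculation
        elif same_chord(ann, label_at(estimate, t)):
            hits += 1
    counted = n_frames - ignored
    recall = hits / counted if counted else 0.0
    return recall, hits, ignored


annotation = [(0.0, 0.025, "C:maj"), (0.025, 0.05, "G:7")]
estimate = [(0.0, 0.015, "C:maj"), (0.015, 0.05, "G:maj")]
result = frame_recall(annotation, estimate, {"N", "C:maj", "G:maj"})
```

In this toy example the <code>G:7</code> frames fall outside the dictionary and are ignored, so recall is computed over the three <code>C:maj</code> frames only.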

== Submission Format ==

=== Audio Format ===
Audio tracks will be encoded as 44.1 kHz 16bit mono WAV files.

=== I/O Format ===
The expected output chord transcription file for participating algorithms is that proposed by Christopher Harte [6].

Hence, algorithms should output text files with a similar format to that used in the ground truth transcriptions. That is to say, they should be flat text files with chord segment labels and times arranged thus:

start_time end_time chord_label

with elements separated by white spaces, times given in seconds, chord labels corresponding to the syntax described in [6] and one chord segment per line.

The chord root is given as a natural (A|B|C|D|E|F|G) followed by optional sharp or flat modifiers (#|b). For the evaluation process we may assume enharmonic equivalence for chord roots. For a given chord type on root X, the chord labels can be given as a list of intervals or as a shorthand notation as shown in the following table:

{|border="1" cellpadding="5" cellspacing="0" align="center"
|-
!NAME
!INTERVALS
!SHORTHAND
|-
|*Triads:
|
|
|-
|-
|major
|X:(1,3,5)
|X or X:maj
|-
|-
|minor
|X:(1,b3,5)
|X:min
|-
|-
|diminished
|X:(1,b3,b5)
|X:dim
|-
|-
|augmented
|X:(1,3,#5)
|X:aug
|-
|-
|suspended4
|X:(1,4,5)
|X:sus4
|-
|-
|possible 6th triad:
|
|
|-
|-
|suspended2
|X:(1,2,5)
|X:sus2
|-
|-
|*Quads:
|
|
|-
|-
|major-major7
|X:(1,3,5,7)
|X:maj7
|-
|-
|major-minor7
|X:(1,3,5,b7)
|X:7
|-
|-
|major-add9
|X:(1,3,5,9)
|X:maj(9)
|-
|-
|major-major7-#5
|X:(1,3,#5,7)
|X:aug(7)
|-
|-
|minor-major7
|X:(1,b3,5,7)
|X:min(7)
|-
|-
|minor-minor7
|X:(1,b3,5,b7)
|X:min7
|-
|-
|minor-add9
|X:(1,b3,5,9)
|X:min(9)
|-
|-
|minor 7/b5 (ambiguous - could be either of the following)
|
|
|-
|-
|minor-major7-b5
|X:(1,b3,b5,7)
|X:dim(7)
|-
|-
|minor-minor7-b5 (a half diminished-7th)
|X:(1,b3,b5,b7)
|X:hdim7
|-
|-
|sus4-major7
|X:(1,4,5,7)
|X:sus4(7)
|-
|-
|sus4-minor7
|X:(1,4,5,b7)
|X:sus4(b7)
|-
|-
|omitted from list on wiki:
|
|
|-
|-
|diminished7
|X:(1,b3,b5,bb7)
|X:dim7
|-
|-
|No Chord
|N
|
|}

Please note that two things have changed in the syntax since it was originally described in [6]. The first change is that the root is no longer implied as a voiced element of a chord so a C major chord (notes C, E and G) should be written C:(1,3,5) instead of just C:(3,5) if using the interval list representation. As before, the labels C and C:maj are equivalent to C:(1,3,5). The second change is that the shorthand label "sus2" (intervals 1,2,5) has been added to the available shorthand list.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)

However, we still accept participants who would only like to be evaluated on major/minor chords and want to use the MIREX 2008 format. This is an integer chord id in the range 0-24, where values 0-11 denote C major, C# major, ..., B major, values 12-23 denote C minor, C# minor, ..., B minor, and 24 denotes silence or no-chord segments.
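
As an illustration of the MIREX 2008 numbering, the following Python sketch maps a chord id to a label in the syntax used above. The function name and the sharp spelling of the black-key roots are assumptions made for this example.

```python
# Root names in semitone order starting from C (sharp spellings assumed).
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_id_to_label(chord_id: int) -> str:
    """Map a MIREX 2008 chord id (0-24) to a 'root:quality' label or 'N'."""
    if not 0 <= chord_id <= 24:
        raise ValueError(f"chord id out of range 0-24: {chord_id}")
    if chord_id == 24:
        return "N"  # silence or no-chord segment
    quality = "maj" if chord_id < 12 else "min"
    return f"{ROOTS[chord_id % 12]}:{quality}"
```

For example, id 0 maps to <code>C:maj</code>, id 12 to <code>C:min</code> and id 24 to <code>N</code>.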

=== Command line calling format ===

Submissions have to conform to the specified format below:

''extractFeaturesAndTrain "/path/to/trainFileList.txt" "/path/to/scratch/dir" ''

Where trainFileList.txt has the paths to each WAV file. The features extracted at this stage can be stored under "/path/to/scratch/dir".
The ground truth files for the supervised learning will be in the same path with a ".txt" extension at the end. For example, for "/path/to/trainFile1.wav", there will be a corresponding ground truth file called "/path/to/trainFile1.wav.txt".

For testing:

''doChordID.sh "/path/to/testFileList.txt" "/path/to/scratch/dir" "/path/to/results/dir" ''

If there is no training, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus ".txt".

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file including the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 24 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
Friday 4th June 2010

== Submission closing date ==
TBA

== Bibliography ==

1. Harte,C.A. and Sandler,M.B.(2005). '''Automatic chord identification using a quantised chromagram.''' Proceedings of 118th Audio Engineering Society's Convention.

2. Sailer,C. and Rosenbauer K.(2006). '''A bottom-up approach to chord detection.''' Proceedings of International Computer Music Conference 2006.

3. Shenoy,A. and Wang,Y.(2005). '''Key, chord, and rhythm tracking of popular music recordings.''' Computer Music Journal 29(3), 75-86.

4. Sheh,A. and Ellis,D.P.W.(2003). '''Chord segmentation and recognition using em-trained hidden markov models.''' Proceedings of 4th International Conference on Music Information Retrieval.

5. Yoshioka,T. et al.(2004). '''Automatic Chord Transcription with concurrent recognition of chord symbols and boundaries.''' Proceedings of 5th International Conference on Music Information Retrieval.

6. Harte,C. and Sandler,M. and Abdallah,S. and Gómez,E.(2005). '''Symbolic representation of musical chords: a proposed syntax for text annotations.''' Proceedings of 6th International Conference on Music Information Retrieval.

7. Papadopoulos,H. and Peeters,G.(2007). '''Large-scale study of chord estimation algorithms based on chroma representation and HMM.''' Proceedings of 5th International Conference on Content-Based Multimedia Indexing.

8. Samer Abdallah, Katy Noland, Mark Sandler, Michael Casey & Christophe Rhodes: '''Theory and Evaluation of a Bayesian Music Structure Extractor''' (pp. 420-425) Proc. 6th International Conference on Music Information Retrieval, ISMIR 2005.

2010:Multiple Fundamental Frequency Estimation & Tracking

2010-06-05T09:25:58Z

Kriswest: /* Submission opening date */

==Description==

That a complex music signal can be represented by the F0 contours of its constituent sources is a very useful concept for most music information retrieval systems. There have been many attempts at multiple (aka polyphonic) F0 estimation and melody extraction, a related area. The goal of multiple F0 estimation and tracking is to identify the active F0s in each time frame and to track notes and timbres continuously in a complex music signal. In this task, we would like to evaluate state-of-the-art multiple-F0 estimation and tracking algorithms. Since F0 tracking of all sources in a complex audio mixture can be very hard, we are restricting the problem to 3 cases:

# Estimate active fundamental frequencies on a frame-by-frame basis.
# Track note contours on a continuous time basis. (as in audio-to-midi). This task will also include a piano transcription sub task.
# Track timbre on a continuous time basis.

=== Task Specific Mailing List ===
Please add your name and email address here and also sign up for the Multi-F0 mail list:
[https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com03 Multi-F0 Estimation Tracking email list]

==Data==
The 2009 Multi-F0 dataset will be reused, which is composed of:
* A woodwind quintet transcription of the fifth variation from L. van Beethoven's Variations for String Quartet Op.18 No. 5. Each part (flute, oboe, clarinet, horn, or bassoon) was recorded separately while the performer listened to the other parts (recorded previously) through headphones. Later the parts were mixed to a monaural 44.1kHz/16bits file.
* Synthesized pieces using RWC MIDI and RWC samples. Includes pieces from Classical and Jazz collections. Polyphony changes from 1 to 4 sources.
* Polyphonic piano recordings generated using a disklavier playback piano.

There are:
* 6, 30-sec clips for each polyphony (2-3-4-5) for a total of 30 examples,
* 10 30-sec polyphonic piano clips.

=== Development Dataset ===
A development dataset can be found at:
[https://www.music-ir.org/evaluation/MIREX/data/2007/multiF0/index.htm Development Set for MIREX 2007 MultiF0 Estimation Tracking Task].

Send an email to [mailto:mertbay@uiuc.edu mertbay@uiuc.edu] for the username and password.

==Evaluation==

This year, we would like to discuss different evaluation methods. Last year's results show that, on note tracking, algorithms performed poorly when evaluated using note offsets. The evaluation methods used last year are described below:

For Task 1 (frame level evaluation), systems will report the number of active pitches every 10ms. Precision (the proportion of retrieved pitches that are correct, for each frame) and Recall (the ratio of correct pitches to all ground-truth pitches for each frame) will be reported. A returned pitch is assumed to be correct if it is within a half semitone (±3%) of a ground-truth pitch for that frame. Only one ground-truth pitch can be associated with each returned pitch.
Also as suggested, an error score as described in [http://www.hindawi.com/GetArticle.aspx?doi=10.1155/2007/48317 Poliner and Ellis, p. 5] will be calculated.
The frame level ground truth will be calculated by [http://www.ircam.fr/pcm/cheveign/sw/yin.zip YIN] and hand corrected.
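
A rough Python sketch of the frame-level scoring follows. It assumes a greedy nearest-pitch matching within each frame and pools the counts over all frames before dividing. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
def match_frame(returned, truth, tol=0.03):
    """Count returned pitches within +/-tol of a distinct ground-truth pitch."""
    unmatched = list(truth)
    correct = 0
    for f0 in returned:
        # Ground-truth pitches still free and within the relative tolerance.
        candidates = [g for g in unmatched if abs(f0 - g) <= tol * g]
        if candidates:
            best = min(candidates, key=lambda g: abs(f0 - g))
            unmatched.remove(best)  # each ground-truth pitch is used once
            correct += 1
    return correct


def frame_precision_recall(returned_frames, truth_frames):
    """Pooled precision and recall over aligned lists of per-frame pitches."""
    correct = sum(match_frame(r, t)
                  for r, t in zip(returned_frames, truth_frames))
    n_returned = sum(len(r) for r in returned_frames)
    n_truth = sum(len(t) for t in truth_frames)
    precision = correct / n_returned if n_returned else 0.0
    recall = correct / n_truth if n_truth else 0.0
    return precision, recall


truth = [[146.83, 220.00, 349.23], [146.83, 220.00, 349.23]]
returned = [[146.83, 220.00, 349.23], [349.23, 146.83, 369.99, 220.00]]
precision, recall = frame_precision_recall(returned, truth)
```

In the second frame the extra pitch at 369.99 Hz is more than 3% away from every ground-truth pitch, so it lowers precision but leaves recall untouched.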

For Task 2 (note tracking), Precision (the ratio of correctly transcribed ground-truth notes to the number of transcribed notes) and Recall (the ratio of correctly transcribed ground-truth notes to the number of ground-truth notes for that input clip) will again be reported. A ground-truth note is assumed to be correctly transcribed if the system returns a note that is within a half semitone (±3%) of that note AND the returned note's onset is within a 100 ms range (±50 ms) of the onset of the ground-truth note, and its offset is within a 20% range of the ground-truth note's offset. Again, one ground-truth note can only be associated with one transcribed note.
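
The note-matching rule can be sketched in Python as below, using the usual definitions (precision over transcribed notes, recall over ground-truth notes). One loud assumption: the "20% range" for offsets is interpreted here as 20% of the ground-truth note's duration, which the text above does not state explicitly. The function names and the greedy first-match strategy are also illustrative only.

```python
def notes_match(ref, est, pitch_tol=0.03, onset_tol=0.05, offset_ratio=0.2):
    """Check one (onset, offset, f0) estimate against one reference note."""
    ref_on, ref_off, ref_f0 = ref
    est_on, est_off, est_f0 = est
    pitch_ok = abs(est_f0 - ref_f0) <= pitch_tol * ref_f0
    onset_ok = abs(est_on - ref_on) <= onset_tol
    # Assumption: offset tolerance is a fraction of the reference duration.
    offset_ok = abs(est_off - ref_off) <= offset_ratio * (ref_off - ref_on)
    return pitch_ok and onset_ok and offset_ok


def note_precision_recall(reference, estimated):
    """Greedy one-to-one matching of reference notes to estimated notes."""
    unmatched = list(estimated)
    correct = 0
    for ref in reference:
        for est in unmatched:
            if notes_match(ref, est):
                unmatched.remove(est)  # an estimate matches one note only
                correct += 1
                break
    precision = correct / len(estimated) if estimated else 0.0
    recall = correct / len(reference) if reference else 0.0
    return precision, recall


reference = [(0.68, 1.20, 349.23), (0.72, 1.02, 220.00)]
estimated = [(0.70, 1.25, 350.00), (0.90, 1.02, 220.00), (1.50, 1.80, 440.00)]
precision, recall = note_precision_recall(reference, estimated)
```

Here only the first reference note is matched: the second estimate has the right pitch but its onset is 180 ms late.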

The ground truth for this task will be annotated by hand. An amplitude threshold relative to the file/instrument will be determined. The note onset will be set to the time where its amplitude rises above the threshold, and the offset to the time where the note's amplitude decays below the threshold. The ground-truth pitch will be set to the average F0 between the onset and the offset of the note.
In the case of legato, the onset/offset will be set to the time where the F0 deviates by more than 3% from the average F0 throughout the note up to that point. There will not be any vibrato larger than a half semitone in the test data.

Different statistics can also be reported if agreed by the participants.

== Submission Format ==

=== Audio Format ===
The audio files are encoded as 44.1kHz / 16 bit WAV files.

=== Command line calling format ===
Submissions have to conform to the specified format below:

''doMultiF0 "path/to/file.wav" "path/to/output/file.F0" ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.F0: The output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

=== I/O format ===
For each task, the format of the output file is going to be different:

For the first task, F0 estimation on a frame basis, the output will be a file where each row has a time stamp followed by the active F0s in that frame, separated by tabs, at 10 ms increments.

Example :
''time F01 F02 F03 ''
''time F01 F02 F03 F04''
''time ... ... ... ...''

which might look like:

''0.78 146.83 220.00 349.23''
''0.79 349.23 146.83 369.99 220.00 ''
''0.80 ... ... ... ...''

For the second task, for each row, the file should contain the onset, offset and the F0 of each note event separated by a tab, ordered in terms of onset times:

onset offset F01
onset offset F02
... ... ...

which might look like:

0.68 1.20 349.23
0.72 1.02 220.00
... ... ...
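
A small Python sketch of writing rows in both formats is given below. The two-decimal formatting is an assumption copied from the examples above, not a stated requirement, and the function names are hypothetical.

```python
def format_frame_row(time, f0s):
    """Task 1 row: time stamp followed by the active F0s, tab-separated."""
    return "\t".join([f"{time:.2f}"] + [f"{f0:.2f}" for f0 in f0s])


def format_note_row(onset, offset, f0):
    """Task 2 row: onset, offset and F0 of one note event, tab-separated."""
    return "\t".join(f"{value:.2f}" for value in (onset, offset, f0))


def format_notes(notes):
    """Task 2 file body: one row per note, ordered by onset time."""
    return "\n".join(format_note_row(*note) for note in sorted(notes))


frame_row = format_frame_row(0.78, [146.83, 220.00, 349.23])
note_rows = format_notes([(0.72, 1.02, 220.00), (0.68, 1.20, 349.23)])
```

Sorting the note tuples orders them by onset first, which matches the ordering requested for the second task.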

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file including the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks,
hard limits on the runtime of submissions are specified.

A hard limit of 24 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Melody Extraction

2010-06-05T09:25:30Z

Kriswest:

== Description ==

The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio. Pitch is expressed as the fundamental frequency of the main melodic voice, and is reported in a frame-based manner on an evenly-spaced time-grid.

The task consists of two parts:
* Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not),
* pitch detection (deciding the most likely melody pitch for each time frame).

We structure the submission to allow these parts to be done independently within a single output file. That is, it is possible (via a negative pitch value) to guess a pitch even for frames that were being judged unvoiced. Algorithms which don't perform a discrimination between melodic and non-melodic parts are also welcome!

== Data ==

=== Collections ===
* MIREX09 database : 374 Karaoke recordings of Chinese songs. Each recording is mixed at three different levels of Signal-to-Accompaniment Ratio {-5dB, 0dB, +5 dB} for a total of 1122 audio clips. Instruments: singing voice (male, female), synthetic accompaniment.
* MIREX08 database : 4 excerpts of 1 min. from "north Indian classical vocal performances", instruments: singing voice (male, female), tanpura (Indian instrument, perpetual background drone), harmonium (secondary melodic instrument) and tablas (pitched percussions). There are two different mixtures of each of the 4 excerpts with differing amounts of accompaniment for a total of 8 audio clips.
* MIREX05 database : 25 phrase excerpts of 10-40 sec from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano.
* ADC04 database : Dataset from the 2004 Audio Description Contest. 20 excerpts of about 20s each.
* manually annotated reference data (10 ms time grid)

=== Audio Formats ===

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)

== Submission Format ==

Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The melody extraction algorithms will return the melody contour in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

=== Output File Format (Audio Melody Extraction) ===

The Audio Melody Extraction output file format is a tab-delimited ASCII text format. Fundamental frequencies (in Hz) of the main melody are reported on a 10ms time-grid. If an algorithm estimates that there is no melody present within a given time frame it is to report a NEGATIVE frequency estimate. This allows the algorithm to still output a pitch estimate even if its voiced/unvoiced detection mechanism is incorrect. Therefore, pitch accuracy and segmentation performance can be evaluated separately. Estimating ZERO frequency is also acceptable. However, Pitch Accuracy performance will go down if the voiced/unvoiced detection of the algorithm is incorrect. If the algorithm performs no segmentation, it can report all positive fundamental frequencies (and the segmentation aspects of the evaluation ignored). If the time-stamp in the algorithm output is not on a 10ms time-grid, it will be resampled using 0th-order interpolation during evaluation. Therefore, we encourage the use of a 10ms frame hop-size. Each line of the output file should look like:

<timestamp (seconds)>\t<frequency (Hz)>\n

where \t denotes a tab, \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.00 -439.3
0.01 -439.4
0.02 440.2
0.03 440.3
0.04 440.2
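
The resampling onto a 10ms grid can be pictured with the following Python sketch. It assumes that "0th-order interpolation" means holding the most recent reported value (a nearest-neighbour reading is also possible), and the function name is hypothetical.

```python
from bisect import bisect_right


def resample_zero_order(times, freqs, hop=0.01, eps=1e-9):
    """Resample (time, frequency) pairs onto a regular grid by holding the
    most recent value. `times` must be sorted in ascending order."""
    n_frames = int(round(times[-1] / hop)) + 1
    out = []
    for i in range(n_frames):
        t = i * hop
        # Index of the last sample at or before t (eps absorbs float error).
        idx = bisect_right(times, t + eps) - 1
        out.append((t, freqs[max(idx, 0)]))
    return out


# Output reported on a 20 ms grid, resampled to the 10 ms evaluation grid.
resampled = resample_zero_order([0.00, 0.02, 0.04], [-439.3, 440.2, 440.3])
```

Negative values pass through unchanged, so the "unvoiced but here is my pitch guess" convention survives the resampling.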

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the melody extraction on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')
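
For submissions written in Python, the flag-based calling style shown above could be handled along these lines. This is only a sketch: <code>foobar</code>, <code>-param1</code>, <code>-i</code> and <code>-o</code> are the placeholder names from the examples, and the attribute names <code>input</code> and <code>output</code> are choices made for this illustration.

```python
import argparse


def parse_args(argv):
    """Parse 'foobar -param1 .1 -i %input -o %output' style arguments."""
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-param1", type=float, default=None,
                        help="optional algorithm parameter (placeholder)")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    return parser.parse_args(argv)


args = parse_args(["-param1", ".1", "-i", "/path/in.wav", "-o", "/path/out.txt"])
```

A real entry point would pass <code>sys.argv[1:]</code> instead of the literal list used here.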

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"

== Evaluation Procedures ==

The task consists of two parts: Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not), and pitch detection (deciding the most likely melody pitch for each time frame). We structured the submission to allow these parts to be done independently, i.e. it was possible (via a negative pitch value) to guess a pitch even for frames that were being judged unvoiced.
So consider a matrix of the per-frame voiced (Ground Truth or Detected values != 0) and unvoiced (GT, Det == 0) results, where the counts are:
{|border="1" cellpadding="5" cellspacing="0" align="center"
|-
!
!Detected unvoiced
!Detected voiced
!Sum
|-
!Ground truth unvoiced
|TN
|FP
|GU
|-
!Ground truth voiced
|FN
|TP
|GV
|-
!Sum
|DU
|DV
|TO
|}

TP ("true positives", frames where the voicing was correctly detected) further breaks down into pitch correct and pitch incorrect, say TP = TPC + TPI

Similarly, the ability to record pitch guesses even for frames judged unvoiced breaks down FN ("false negatives", frames which were actually pitched but detected as unpitched) into pitch correct and pitch incorrect, say FN = FNC + FNI
In both these cases, we can also count the number of times the chroma was correct, i.e. ignoring octave errors, say TP = TPCch + TPIch and FN = FNCch + FNIch.

To assess the voicing detection portion, we use the standard tools of detection theory.

*'''Voicing Detection''' is the probability that a frame which is truly voiced is labeled as voiced i.e. TP/GV (also known as "hit rate").
*'''Voicing False Alarm''' is the probability that a frame which is not actually voiced is none the less labeled as voiced i.e. FP/GU.
*'''Voicing d-prime''' is a measure of the sensitivity of the detector that attempts to factor out the overall bias towards labeling any frame as voiced (which can move both hit rate and false alarm rate up and down in tandem). It converts the hit rate and false alarm into standard deviations away from the mean of an equivalent Gaussian distribution, and reports the difference between them. A larger value indicates a detection scheme with better discrimination between the two classes.

For the voicing detection, we pool the frames from all excerpts in a dataset to get an overall frame-level voicing detection performance. Because some excerpts had no unvoiced frames, averaging over the excerpts can give some misleading results.
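
The three voicing measures can be computed from the pooled counts roughly as follows. This Python sketch uses <code>statistics.NormalDist</code> (Python 3.8 or later) for the Gaussian conversion. The function name is hypothetical, and rates of exactly 0 or 1 are not handled because the inverse CDF is undefined there.

```python
from statistics import NormalDist


def voicing_metrics(tp, fn, fp, tn):
    """Return (voicing detection, voicing false alarm, d-prime)."""
    gv = tp + fn  # frames that are truly voiced
    gu = fp + tn  # frames that are truly unvoiced
    hit_rate = tp / gv
    false_alarm = fp / gu
    # Convert both rates to standard deviations of a unit Gaussian.
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(false_alarm)
    return hit_rate, false_alarm, d_prime


hit_rate, false_alarm, d_prime = voicing_metrics(tp=80, fn=20, fp=10, tn=90)
```

A detector that labels frames at random has equal hit and false-alarm rates and therefore a d-prime of zero, whatever its bias towards "voiced".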

Now we move on to the actual pitch detection.
*'''Raw Pitch Accuracy''' is the probability of a correct pitch value (to within ± ¼ tone) given that the frame is indeed pitched. This includes the pitch guesses for frames that were judged unvoiced i.e. (TPC + FNC)/GV.
*'''Raw Chroma Accuracy''' is the probability that the chroma (i.e. the note name) is correct over the voiced frames. This ignores errors where the pitch is wrong by an exact multiple of an octave (octave errors). It is (TPCch + FNCch)/GV.
*'''Overall Accuracy''' combines both the voicing detection and the pitch estimation to give the proportion of frames that were correctly labeled with both pitch and voicing, i.e. (TPC + TN)/TO.

When averaging the pitch statistics, we calculate the performance for each of the excerpts individually, then report the average of these measures. This helps increase the effective weight of some of the minority genres, which had shorter excerpts.
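
For a single excerpt, the three pitch measures can be sketched in Python as below. The sketch assumes that ¼ tone means 50 cents, that ground-truth frames with a value of 0 are unvoiced, and that a negative estimate is an unvoiced frame carrying a pitch guess. The function names are hypothetical.

```python
import math


def cents(est_hz, ref_hz):
    """Signed distance from ref_hz to est_hz in cents."""
    return 1200.0 * math.log2(est_hz / ref_hz)


def pitch_metrics(ref, est, tol_cents=50.0):
    """Return (raw pitch accuracy, raw chroma accuracy, overall accuracy)."""
    gv = tpc = fnc = chroma_ok = tn = 0
    for r, e in zip(ref, est):
        if r <= 0:                      # ground truth unvoiced
            tn += 1 if e <= 0 else 0
            continue
        gv += 1
        if e == 0:                      # no pitch guess for this frame
            continue
        diff = cents(abs(e), r)
        folded = diff - 1200.0 * round(diff / 1200.0)  # ignore octave errors
        if abs(diff) <= tol_cents:
            if e > 0:
                tpc += 1                # voiced and pitch correct
            else:
                fnc += 1                # judged unvoiced, guess correct
        if abs(folded) <= tol_cents:
            chroma_ok += 1
    raw_pitch = (tpc + fnc) / gv
    raw_chroma = chroma_ok / gv
    overall = (tpc + tn) / len(ref)
    return raw_pitch, raw_chroma, overall


ref = [0.0, 440.0, 440.0, 440.0]
est = [0.0, 441.0, -880.0, 300.0]
raw_pitch, raw_chroma, overall = pitch_metrics(ref, est)
```

The octave-too-high guess in the third frame counts towards chroma accuracy but not towards raw pitch accuracy. Averaging these per-excerpt values over a dataset then gives the reported figures.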

== Relevant Development Collections ==
* [http://unvoicedsoundseparation.googlepages.com/mir-1k MIR-1K]: [http://mirlab.org/dataset/public/MIR-1K_for_MIREX.rar MIR-1K for MIREX] (Note that this is not the one used for evaluation. The MIREX 2009 dataset used for evaluation last year was created in the same way but has different content and singers).

* Graham's collection: the test set and further explanations are available on the pages http://www.ee.columbia.edu/~graham/mirex_melody/ and http://labrosa.ee.columbia.edu/projects/melody/

* For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. http://ismir2004.ismir.net/melody_contest/results.html (full test set with the reference transcriptions (28.6 MB))

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Real-time Audio to Score Alignment (a.k.a Score Following)

2010-06-05T09:21:16Z

Kriswest:

''Real-time Audio to Score Alignment'', also known as ''Score Following''

== Description ==
Score Following is the real-time alignment of an incoming music signal to the music score. The music signal can be symbolic (MIDI) or audio, but we will concentrate here on audio following, unless there are some candidates who'd want their symbolic followers to be evaluated and can propose reference data.

This page describes a proposal for evaluation of score following systems. Discussion of the evaluation procedures on the [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com01 Score Following contest planning list] will be documented on the [[Score Following]] page. A full digest of the discussions is available to subscribers from the [https://mail.lis.uiuc.edu/mailman/private/mrx-com01/ Score Following contest planning list archives].

Submissions will be required to estimate alignment precision according to the indexed times. In order for your system to participate, please specify the type of alignment (monophonic, polyphonic), type of training and realtime performance, also separated into two domains (upon enough submissions) for symbolic and audio systems. Note that we also accept systems that don't run in real-time in practice, as long as their algorithm is on-line, i.e. it does not make use of global knowledge of the input.

== Data ==
46 recordings and their corresponding MIDI representations of the score will be used in the evaluation. These 46 excerpts were extracted from 4 distinct musical pieces.
Recordings are in 44.1 kHz 16-bit WAV format. The reference scores are in MIDI format.

== Evolution ==
This year's changes are proposed here and on the list, and are currently under discussion. Proposed changes are mainly about the score and reference file formats and the evaluation metrics:

* the proposed new score and reference file format is described here: [[Score File Format]]
* evaluation metrics will more closely reflect the different approaches and applications of score following

See the details of last year's proposal on the [[2006:Score_Following_Proposal|MIREX 2006 Wiki]]

== Evaluation procedures ==

The evaluation procedure consists of running score followers on a database of audio aligned to scores, where the database contains the score and performance audio (for the system call) and a reference alignment (for evaluation). See below for details.

=== I/O Format ===
Each system should conform to the following format:

''doScofo.sh "/path/to/audiofile.wav" "/path/to/midi_score_file.wav" "/path/to/result/filename.txt"''

The stdout and stderr will be logged.

"/path/to/result/filename.txt" should have one line per detected note with the following 4 columns:

1. estimated note onset time in performance audio file (ms)
2. detection time relative to performance audio file (ms)
3. note start time in score (ms)
4. MIDI note number in score (int)

Example :
''1800 1800 0 75''
''2021 2022 187.5 73''
''... ... ... ...''

Remarks: The third column with the detected note's start time in score serves as the unique identifier of a note (or chord for polyphonic scores) that links it to the ground truth onset of that note within the reference alignment files. The fourth column of MIDI note number is there only for your convenience, to know your way around in the result files, if you know the melody in MIDI.
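
Reading such a result file could look like the following Python sketch. The class and function names are hypothetical, and lines that do not contain four numeric columns are simply skipped.

```python
from typing import List, NamedTuple


class AlignedNote(NamedTuple):
    onset_ms: float      # estimated note onset time in the performance audio
    detection_ms: float  # detection time relative to the performance audio
    score_ms: float      # note start time in the score (note identifier)
    midi_note: int       # MIDI note number in the score


def parse_alignment(text: str) -> List[AlignedNote]:
    """Parse whitespace-separated 4-column lines into AlignedNote records."""
    notes = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 4:
            continue
        try:
            notes.append(AlignedNote(float(fields[0]), float(fields[1]),
                                     float(fields[2]), int(fields[3])))
        except ValueError:
            continue  # non-numeric line, e.g. a placeholder row
    return notes


notes = parse_alignment("1800 1800 0 75\n2021 2022 187.5 73\n... ... ... ...")
```

Keeping <code>score_ms</code> as the third field mirrors its role as the key that links each detected note to the reference alignment.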

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of
dynamically linked libraries cannot be guaranteed).

All submissions should include a README file including the following
information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks,
hard limits on the runtime of submissions are specified.

A hard limit of 12 hours will be imposed on the total runtime of algorithms. Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Onset Detection

2010-06-05T09:20:01Z

Kriswest:

== Description ==

Audio Onset Detection concerns itself with finding the time-locations of all sonic events in a piece of audio. This task was originally proposed in 2005 by Paul Brossier and Pierre Leveau. It has subsequently been run in 2005, 2006, 2007 and 2009.

== Data ==
=== Collections ===
The dataset will be the same as in 2005/2006/2007/2009 unless new or updated datasets are made available. The current dataset is subdivided into classes, because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g. segmentation of a single track in a mix, drum transcription, segmentation of databases of complex mixes...). The performance of each algorithm will be assessed on the whole dataset and also on each class separately.

The dataset contains 85 files from 5 classes annotated as follows:

* 30 solo drum excerpts cross-annotated by 3 people
* 30 solo monophonic pitched instruments excerpts cross-annotated by 3 people
* 10 solo polyphonic pitched instruments excerpts cross-annotated by 3 people
* 15 complex mixes cross-annotated by 5 people

Moreover the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).

=== Audio Formats ===

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* file length between 2 and 36 seconds (total time: 14 minutes)

== Evaluation Procedures ==

The detected onset times will be compared with the ground-truth ones. For a given ground-truth onset time, if there is a detection in a tolerance time-window around it, it is considered as a correct detection (CD). If not, there is a false negative (FN). The detections outside all the tolerance windows are counted as false positives (FP). Doubled onsets (two detections for one ground-truth onset) and merged onsets (one detection for two ground-truth onsets) will be taken into account in the evaluation. Doubled onsets are a subset of the FP onsets, and merged onsets a subset of FN onsets.

We define:

*'''Precision''' P = Ocd / (Ocd + Ofp)
*'''Recall''' R = Ocd / (Ocd + Ofn)
*'''F-measure''' F = 2*P*R/(P+R)

with these notations:

*'''Ocd''' number of correctly detected onsets (CD)
*'''Ofn''' number of missed onsets (FN)
*'''Om''' number of merged onsets
*'''Ofp''' number of false positive onsets (FP)
*'''Od''' number of double onsets

Other indicative measurements:

*'''FP rate''' FP = 100. * (Ofp) / (Ocd+Ofp)
*'''Doubled Onset rate in FP''' D = 100 * Od / Ofp
*'''Merged Onset rate in FN''' M = 100 * Om / Ofn

Because files are cross-annotated, the mean Precision and Recall rates are defined by averaging Precision and Recall rates computed for each annotation.

To establish a ranking, we will use the F-measure, widely used in string comparisons. This criterion is arbitrary, but gives an indication of performance. It must be remembered that onset detection is a preprocessing step, so the real cost of an error of each type (false positive or false negative) depends on the application following this task.
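
The counting and the three measures defined above can be sketched in Python as follows. This is an illustration only, not the evaluator used by MIREX: the function names are hypothetical, the matching is a simple greedy one-to-one pass over sorted times, the tolerance defaults to the 50 ms mentioned on this page, and doubled and merged onsets are not tracked separately.

```python
from typing import List, Tuple


def count_onsets(reference: List[float], detected: List[float],
                 tolerance: float = 0.05) -> Tuple[int, int, int]:
    """Return (Ocd, Ofp, Ofn) using a greedy one-to-one match on sorted times."""
    ref = sorted(reference)
    det = sorted(detected)
    i = j = 0
    ocd = 0
    while i < len(ref) and j < len(det):
        if abs(det[j] - ref[i]) <= tolerance:
            ocd += 1   # detection j lies inside the window of onset i
            i += 1
            j += 1
        elif det[j] < ref[i]:
            j += 1     # detection precedes the window: false positive
        else:
            i += 1     # no detection reached this onset: false negative
    return ocd, len(det) - ocd, len(ref) - ocd


def precision_recall_f(ocd: int, ofp: int, ofn: int) -> Tuple[float, float, float]:
    """P = Ocd/(Ocd+Ofp), R = Ocd/(Ocd+Ofn), F = 2PR/(P+R), with 0 for empty cases."""
    p = ocd / (ocd + ofp) if ocd + ofp else 0.0
    r = ocd / (ocd + ofn) if ocd + ofn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Made-up times in seconds, for illustration only.
reference = [0.243, 1.476, 1.987, 2.449, 3.224]
detected = [0.250, 1.000, 1.480, 2.440, 3.300]
ocd, ofp, ofn = count_onsets(reference, detected)
p, r, f = precision_recall_f(ocd, ofp, ofn)
```

In this made-up example three detections fall inside a window, two do not, and two ground-truth onsets are missed, so Ocd = 3, Ofp = 2 and Ofn = 2.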

=== Evaluation measures ===

* percentage of correct detections / false positives (can also be expressed as precision/recall)
* time precision (tolerance from +/- 50 ms to less). For certain files, we cannot be much more accurate than 50 ms because of the weak annotation precision. This must be taken into account.
* separate scoring for different instrument types (percussive, strings, winds, etc)
* percentage of doubled detections
* speed measurements of the algorithms

== Submission Format ==
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The onset detection algorithms will return onset times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

==== Output File Format (Audio Onset Detection) ====

The Audio Onset Detection output file format is an ASCII text format. Each onset time is specified, in seconds, on its own line. Specifically,

<onset time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
1.476
1.987
2.449
3.224
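
A submission written in Python could produce such a file along the following lines. This is a sketch under assumptions: the <code>write_onsets</code> helper is hypothetical, and the choice of three decimal places and ascending order simply mirrors the example above rather than a stated requirement.

```python
import os
import tempfile
from typing import Iterable


def write_onsets(path: str, onsets: Iterable[float]) -> None:
    """Write one onset time (in seconds) per line, in ascending order."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for t in sorted(onsets):
            handle.write(f"{t:.3f}\n")


# Write to a temporary location and read the file back to show its contents.
out_path = os.path.join(tempfile.mkdtemp(), "example.txt")
write_onsets(out_path, [1.476, 0.243, 1.987])
with open(out_path, encoding="ascii") as handle:
    written = handle.read()
```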

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the onset detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== Parameter Sweeps ===
In past iterations of MIREX, submitters have been allowed to specify a parameter sweep so as to generate a precision-recall operating characteristic to better evaluate and understand the algorithm. If you wish to do so, please specify TEN different settings for your sweepable parameter. There are no guarantees that all ten will be tested and evaluated, however, as the time constraints for MIREX are getting ever tighter as the number of submissions grows. Therefore, please also specify in the README the ONE single parameterization you feel is best. If the whole parameter sweep cannot be evaluated, this single parameterization will be used.

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

==== README File ====

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for the parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;"
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.
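
The placeholder substitution described above might work roughly as follows. This is a hypothetical sketch, not the actual MIREX tooling: <code>expand_commands</code> and the example paths are invented for illustration.

```python
from typing import List


def expand_commands(readme_lines: List[str], input_path: str,
                    output_path: str) -> List[str]:
    """Keep lines containing both placeholders and substitute real paths."""
    commands = []
    for line in readme_lines:
        if "%input" in line and "%output" in line:
            commands.append(
                line.strip()
                    .replace("%input", input_path)
                    .replace("%output", output_path)
            )
    return commands


readme = [
    "Contact: someone@example.org",             # ignored: no placeholders
    "foobar -param1 .1 -i %input -o %output",
    "foobar -param1 .15 -i %input -o %output",
]
commands = expand_commands(readme, "/data/a.wav", "/out/a.txt")
```

Because any line containing both strings is treated as a command, explanatory prose in the README should avoid mentioning both placeholders on the same line.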

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 6 hours will be imposed on analysis times.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Key Detection

2010-06-05T09:18:35Z

Kriswest:

==Description==

Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. The goal of this task is the identification of the key from music in audio format.

== Data ==
=== Collections ===
The collection used for this year's evaluation is the same as the one used in 2005. It consists of 1252 classical music audio pieces rendered from MIDI using the timidity MIDI synthesizer. The ground-truth key is drawn from the title of the piece. The entire piece is not used, but rather the first 30 seconds. This is done because usually the beginnings of pieces are in the labeled key before they possibly deviate due to key modulation.

=== Audio Formats ===

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)

== Evaluation Procedures ==
The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how "close" each identified key is to the corresponding correct key. Keys will be considered as "close" if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:

{|border="1"
|'''Relation to Correct Key''' ||'''Points'''
|-
|Same||1.0
|-
|Perfect fifth||0.5
|-
|Relative major/minor||0.3
|-
|Parallel major/minor||0.2
|-
|Other||0.0
|}

The points are counted over all files and averaged. The number of correctly identified keys as well as the distribution of the errors is also reported.
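
The scoring table can be sketched in Python as shown below. Several details are assumptions made for illustration: the <code>key_score</code> name, the representation of a key as a (tonic, mode) pair, and in particular the reading of "perfect fifth" as a tonic a fifth above or below the reference in the same mode, which this page does not spell out.

```python
PITCH_CLASSES = {
    "C": 0, "C#": 1, "Db": 1, "D": 2, "D#": 3, "Eb": 3, "E": 4, "F": 5,
    "F#": 6, "Gb": 6, "G": 7, "G#": 8, "Ab": 8, "A": 9, "A#": 10, "Bb": 10,
    "B": 11,
}


def key_score(reference, estimate):
    """Score an estimated (tonic, mode) pair against the reference key."""
    ref_tonic, ref_mode = PITCH_CLASSES[reference[0]], reference[1]
    est_tonic, est_mode = PITCH_CLASSES[estimate[0]], estimate[1]
    interval = (est_tonic - ref_tonic) % 12  # semitones above the reference tonic

    if interval == 0 and est_mode == ref_mode:
        return 1.0  # same key
    if est_mode == ref_mode and interval in (5, 7):
        return 0.5  # perfect fifth in either direction (assumption)
    if ref_mode == "major" and est_mode == "minor" and interval == 9:
        return 0.3  # relative minor of a major reference
    if ref_mode == "minor" and est_mode == "major" and interval == 3:
        return 0.3  # relative major of a minor reference
    if interval == 0:
        return 0.2  # parallel major/minor
    return 0.0


pairs = [
    (("C", "major"), ("C", "major")),   # same
    (("C", "major"), ("G", "major")),   # perfect fifth
    (("C", "major"), ("A", "minor")),   # relative minor
    (("C", "major"), ("C", "minor")),   # parallel minor
    (("C", "major"), ("F#", "minor")),  # unrelated
]
scores = [key_score(ref, est) for ref, est in pairs]
average = sum(scores) / len(scores)
```

The five example pairs cover each row of the table once, so their scores are 1.0, 0.5, 0.3, 0.2 and 0.0.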

== Submission Format ==

Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The audio key detection algorithms will return the estimated key in an individual ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

=== Output File Format (Audio Key Detection) ===

The Audio Key Detection output file format is a single-line tab-delimited ASCII text format. The tonic is reported, followed by a TAB and the mode. For sharps, the "#" symbol is used (e.g. A# for A sharp); for flats, a lowercase "b" is used (e.g. Bb for B flat). Therefore, the output file should be of the form:

<tonic {A, A#, Bb, ...}>\t<mode {major, minor}>\n

where \t denotes a tab, \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

C major

or

G# minor
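
As a minimal sketch of this format, the following Python helpers build and parse the single output line. The helper names are hypothetical, and the validation of the mode is an added convenience rather than part of the specification.

```python
VALID_MODES = ("major", "minor")


def format_key_line(tonic: str, mode: str) -> str:
    """Return '<tonic>TAB<mode>' followed by a newline."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return f"{tonic}\t{mode}\n"


def parse_key_line(line: str):
    """Split a key line back into its (tonic, mode) pair."""
    tonic, mode = line.rstrip("\n").split("\t")
    return tonic, mode


line = format_key_line("G#", "minor")
parsed = parse_key_line("C\tmajor\n")
```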

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the key detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

==== README File ====

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 6 hours will be imposed on analysis times.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Onset Detection

2010-06-05T09:16:24Z

Kriswest:

== Description ==

Audio Onset Detection concerns itself with finding the time-locations of all sonic events in a piece of audio. This task was originally proposed in 2005 by Paul Brossier and Pierre Leveau. It has subsequently been run in 2005, 2006, 2007 and 2009.

== Data ==
=== Collections ===
The dataset will be the same as in 2005/2006/2007/2009 unless new or updated datasets are made available. The current dataset is subdivided into classes, because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g. segmentation of a single track in a mix, drum transcription, segmentation of databases of complex mixes...). The performance of each algorithm will be assessed on the whole dataset and also on each class separately.

The dataset contains 85 files from 5 classes annotated as follows:

* 30 solo drum excerpts cross-annotated by 3 people
* 30 solo monophonic pitched instruments excerpts cross-annotated by 3 people
* 10 solo polyphonic pitched instruments excerpts cross-annotated by 3 people
* 15 complex mixes cross-annotated by 5 people

Moreover the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).

=== Audio Formats ===

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

* CD-quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* file length between 2 and 36 seconds (total time: 14 minutes)

== Submission Format ==
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

=== Input Data ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44.1 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

=== Output Data ===

The onset detection algorithms will return onset times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

==== Output File Format (Audio Onset Detection) ====

The Audio Onset Detection output file format is an ASCII text format. Each onset time is specified, in seconds, on its own line. Specifically,

<onset time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
1.476
1.987
2.449
3.224

=== Algorithm Calling Format ===

The submitted algorithm must take as arguments a SINGLE .wav file to perform the onset detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

=== Parameter Sweeps ===
In past iterations of MIREX, submitters have been allowed to specify a parameter sweep so as to generate a precision-recall operating characteristic to better evaluate and understand the algorithm. If you wish to do so, please specify TEN different settings for your sweepable parameter. There are no guarantees that all ten will be tested and evaluated, however, as the time constraints for MIREX are getting ever tighter as the number of submissions grows. Therefore, please also specify in the README the ONE single parameterization you feel is best. If the whole parameter sweep cannot be evaluated, this single parameterization will be used.

== Evaluation Procedures ==

The detected onset times will be compared with the ground-truth ones. For a given ground-truth onset time, if there is a detection in a tolerance time-window around it, it is considered as a correct detection (CD). If not, there is a false negative (FN). The detections outside all the tolerance windows are counted as false positives (FP). Doubled onsets (two detections for one ground-truth onset) and merged onsets (one detection for two ground-truth onsets) will be taken into account in the evaluation. Doubled onsets are a subset of the FP onsets, and merged onsets a subset of FN onsets.

We define:

*'''Precision''' P = Ocd / (Ocd + Ofp)
*'''Recall''' R = Ocd / (Ocd + Ofn)
*'''F-measure''' F = 2*P*R/(P+R)

with these notations:

*'''Ocd''' number of correctly detected onsets (CD)
*'''Ofn''' number of missed onsets (FN)
*'''Om''' number of merged onsets
*'''Ofp''' number of false positive onsets (FP)
*'''Od''' number of double onsets

Other indicative measurements:

*'''FP rate''' FP = 100. * (Ofp) / (Ocd+Ofp)
*'''Doubled Onset rate in FP''' D = 100 * Od / Ofp
*'''Merged Onset rate in FN''' M = 100 * Om / Ofn

Because files are cross-annotated, the mean Precision and Recall rates are defined by averaging Precision and Recall rates computed for each annotation.
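
The averaging over annotations could look like the following Python sketch. The function name and the per-annotator counts are hypothetical and serve only to illustrate the arithmetic.

```python
from statistics import mean


def mean_precision_recall(per_annotation):
    """Average P and R over annotators.

    per_annotation is a list of (Ocd, Ofp, Ofn) counts, one tuple per
    annotation of the same file.
    """
    precisions, recalls = [], []
    for ocd, ofp, ofn in per_annotation:
        precisions.append(ocd / (ocd + ofp) if ocd + ofp else 0.0)
        recalls.append(ocd / (ocd + ofn) if ocd + ofn else 0.0)
    return mean(precisions), mean(recalls)


# Made-up counts for one file annotated by three people.
counts = [(8, 2, 2), (9, 1, 3), (6, 4, 2)]
mean_p, mean_r = mean_precision_recall(counts)
```

Averaging the per-annotation rates, rather than pooling the raw counts, gives each annotator equal weight regardless of how many onsets they marked.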

To establish a ranking, we will use the F-measure, widely used in string comparisons. This criterion is arbitrary, but gives an indication of performance. It must be remembered that onset detection is a preprocessing step, so the real cost of an error of each type (false positive or false negative) depends on the application following this task.

=== Evaluation measures ===

* percentage of correct detections / false positives (can also be expressed as precision/recall)
* time precision (tolerance from +/- 50 ms to less). For certain files, we cannot be much more accurate than 50 ms because of the weak annotation precision. This must be taken into account.
* separate scoring for different instrument types (percussive, strings, winds, etc)
* percentage of doubled detections
* speed measurements of the algorithms

== Packaging submissions ==

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

=== README File ===

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for the parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;"
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 6 hours will be imposed on analysis times.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Symbolic Melodic Similarity

2010-06-05T09:13:24Z

Kriswest: /* Submission opening date */

== Description ==
The goal of SMS is to retrieve the most similar items from a collection of symbolic pieces, given a symbolic query, and rank them by melodic similarity. There will be only 1 task this year which comprises a set of six "base" monophonic MIDI queries to be matched against a monophonic MIDI collection.

Each system will be given a query and is asked to return the 10 most melodically similar songs from those taken from the Essen Collection (5274 pieces in the MIDI format; see [http://www.esac-data.org/ ESAC Data Homepage] for more information). For each of the six "base" queries, we have created four classes of error-mutations, thus the query set comprises the following query classes:

# No errors (i.e., "base")
# One note deleted
# One note inserted
# One interval enlarged
# One interval compressed

Each system will be asked to return the top ten items for each of the 30 total queries. That is to say, 6(base queries) X 5(versions) = 30 query/candidate lists to be returned.

== Data ==
* 5,274 tunes belonging to the Essen folksong collection. The tunes are in standard MIDI file format. [http://www.ldc.usb.ve/~cgomez/essen.tar.gz Download] (< 1 MB)

==Evaluation ==

The 2010 SMS task replicates the 2007 task. After the algorithms have been submitted, their results will be pooled for every query, and human evaluators, using the Evalutron 6000 system, will be asked to judge the relevance of the matches to the queries.

For each query (and its four mutations), the returned results (candidates) from all systems will be anonymously grouped together (query set) for evaluation by the human graders. The graders will be provided with only the "base" perfect version against which to evaluate the candidates and thus will not know whether the candidates came from a perfect or mutated query. We expect that each query/candidate set will be evaluated by one individual grader. Using the Evalutron 6000 system, the graders will give each query/candidate pair two types of scores. Graders will be asked to provide one "BROAD" categorical score with three categories (NS, SS, VS, as explained below) and one "FINE" score (in the range from 0 to 10).

For more information, do take a look at the [[2007:Symbolic_Melodic_Similarity_Results |2007 SMS Results Page]].

== Submission Format ==

=== Input ===

Parameters: 
- the name of a directory containing about 5,000 MIDI files containing monophonic folk songs and 
- the name of one MIDI file containing a monophonic query.

E.g.
myAlgo.sh /path/to/folder/withMIDIfile/ /path/to/query.mid

The program will be called once for each query.

=== Output ===

A list of the names of the 10 most similar matching MIDI files, ordered by melodic similarity. Write the file name in separate lines, without empty lines in between.

E.g.
query1.mid song242.mid song213.mid song1242.mid ...
query2.mid song5454.mid song423.mid song454.mid ...
...

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Music Similarity and Retrieval

2010-06-05T09:13:01Z

Kriswest: /* Submission opening date */

== Description ==
As the size of digital music collections grows, music similarity has an increasingly important role as an aid to music discovery. A music similarity system can help a music consumer find new music by finding the music that is most musically similar to specific query songs (or is nearest to songs that the consumer already likes).

This page presents the Audio Music Similarity Evaluation, including the submission rules and formats. Additional background information can be found here that should help explain some of the reasoning behind the approach taken in the evaluation. The intention of the Music Audio Search track is to evaluate music similarity searches (a music search engine that takes a single song as a query, also known as query-by-example), not playlist generation or music recommendation.

The Audio Music Similarity and Retrieval task has been run in MIREX 2007 and 2006.

[[2007:Audio_Music_Similarity_and_Retrieval|Audio Music Similarity and Retrieval task in MIREX 2007]] || [[2007:Audio_Music_Similarity_and_Retrieval_Results|Results]]

[[2006:Audio_Music_Similarity_and_Retrieval|Audio Music Similarity and Retrieval task in MIREX 2006]] || [[2006:Audio_Music_Similarity_and_Retrieval_Results|Results]]

=== Task specific mailing list ===
In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., [[2010:Audio Classification (Train/Test) Tasks]], [[2010:Audio Cover Song Identification]], [[2010:Audio Tag Classification]], [[2010:Audio Music Similarity and Retrieval]]). This year, however, we are asking that all discussions take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

== Data ==
Collection statistics: 7000 30-second audio clips drawn from 10 genres (700 clips from each genre).

The genres from which the data was drawn are:
*Blues
*Jazz
*Country/Western
*Baroque
*Classical
*Romantic
*Electronica
*Hip-Hop
*Rock
*HardRock/Metal

=== Audio formats ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 22 KHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV
* clip length: 30 secs from the middle of each file

== Evaluation ==
Two distinct evaluations will be performed
* Human Evaluation
* Objective statistics derived from the results lists

Note that at MIREX 2006 participating algorithms were required to return full distance matrices showing the distance between all tracks. In subsequent years we have also supported a sparse distance matrix format (detailed below), in which only the distances of the top 100 results for each query in the collection are returned.

=== Human Evaluation ===
The primary evaluation will involve subjective judgments by human evaluators of the retrieved sets using IMIRSEL's Evalutron 6000 system. This year algorithms will be presented with the same 30 second preview clip that will be reviewed by the human evaluators.

* Evaluator question: Given a search based on track A, the following set of results was returned by all systems. Please place each returned track into one of three classes (not similar, somewhat similar, very similar) and provide an indication on a continuous scale of 0 - 10 of how similar the track is to the query.
* ~120 randomly selected queries, 5 results per query, 1 set of eyes, ~10 participating labs
* Higher number of queries preferred as IR research indicates variance is in queries
* Songs by the same artist as the query will be filtered out of each result list (artist-filtering) to avoid colouring an evaluator's judgement (a cover song or song by the same artist in a result list is likely to reduce the relative ranking of other similar but independent songs, and the use of songs by the same artist may allow over-fitting to affect the results)
* It will be possible for researchers to use this data for other types of system comparisons after MIREX 2010 results have been finalized.
* Human evaluation to be designed and led by IMIRSEL following a similar format to that used at MIREX 2006 (see: [[2006:Evalutron6000_Issues|Evalutron Issues in MIREX 2006]]).
* Human evaluators will be drawn from the participating labs (and any volunteers from IMIRSEL or on the MIREX lists)
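
The artist-filtering step described in the list above can be sketched as follows. This is only an illustration under stated assumptions, not the code used by IMIRSEL: the function name <code>artist_filtered_top_n</code> and the <code>artist_of</code> lookup table are hypothetical, and the source does not say how artist metadata is stored.

```python
def artist_filtered_top_n(query, ranked_results, artist_of, n=5):
    """Return the first n results whose artist differs from the query's artist.

    ranked_results: list of (filename, distance) pairs, nearest first.
    artist_of: hypothetical dict mapping filename -> artist name.
    """
    query_artist = artist_of[query]
    kept = []
    for filename, distance in ranked_results:
        # Drop the query itself and any track by the same artist.
        if filename == query or artist_of[filename] == query_artist:
            continue
        kept.append((filename, distance))
        if len(kept) == n:
            break
    return kept


# Toy data: r1.wav shares an artist with the query and is filtered out.
artist_of = {"q.wav": "A", "r1.wav": "A", "r2.wav": "B", "r3.wav": "C"}
results = [("r1.wav", 0.1), ("r2.wav", 0.2), ("r3.wav", 0.3)]
top = artist_filtered_top_n("q.wav", results, artist_of, n=2)
```

Because filtering removes entries, a system has to return more candidates than are finally shown, which is one reason the top 100 results are requested while only 5 per query are judged.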

=== Objective Statistics derived from the distance matrix ===
Statistics of each distance matrix will be calculated including:

* Average % of Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Precision at 5, 10, 20 & 50
* Average % of Genre matches in the top 5, 10, 20 & 50 results after artist filtering of results
* Average % of available Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Recall at 5, 10, 20 & 50 (just normalising scores when less than 20 matches for an artist, album or genre are available in the database)
* Always similar - Maximum # times a file was in the top 5, 10, 20 & 50 results
* % File never similar (never in a top 5, 10, 20 & 50 result list)
* % of 'test-able' song triplets where triangular inequality holds
** Note that as we are not requiring full distance matrices this year we will only be testing triangles that are found in the sparse distance matrix.
* Plot of the "number of times similar curve" - plot of song number vs. number of times it appeared in a top 20 list with songs sorted according to number times it appeared in a top 20 list (to produce the curve). Systems with a sharp rise at the end of this plot have "hubs", while a long 'zero' tail shows many never similar results.
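
The "always similar", "never similar" and "number of times similar" statistics above could be computed along the following lines. This is a minimal sketch, assuming the result lists have already been parsed into a dictionary; the function names and the toy data are hypothetical.

```python
from collections import Counter


def times_similar(top_lists, k=20):
    """Count how often each file appears in the top-k list of any query.

    top_lists: dict mapping query filename -> ranked list of result filenames.
    """
    counts = Counter({name: 0 for name in top_lists})
    for results in top_lists.values():
        for name in results[:k]:
            counts[name] += 1
    return counts


def hub_summary(top_lists, k=20):
    counts = times_similar(top_lists, k)
    never = sum(1 for c in counts.values() if c == 0)
    return {
        "max_times_similar": max(counts.values()),
        "percent_never_similar": 100.0 * never / len(counts),
        # Sorted counts give the "number of times similar" curve.
        "curve": sorted(counts.values()),
    }


# Toy collection of four tracks, evaluated at k=2.
top_lists = {
    "a.wav": ["b.wav", "c.wav"],
    "b.wav": ["a.wav", "c.wav"],
    "c.wav": ["b.wav", "a.wav"],
    "d.wav": ["b.wav", "c.wav"],
}
summary = hub_summary(top_lists, k=2)
```

In this toy case <code>d.wav</code> is never returned, so the curve starts with a zero, while <code>b.wav</code> and <code>c.wav</code> act as small "hubs" at the other end.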

=== Runtimes ===
In addition, computation times for feature extraction/index-building and querying will be measured.

== Submission format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

Submissions will have to output either a full distance matrix or a search results file with the top 100 search results for each track in the collection. This list of results will be used to extract the artist-filtered results to present to the human evaluators and will facilitate the computation of the objective statistics.

=== I/O formats ===
In this section the input and output files used in this task are described as
are the command line calling format requirements for submissions.

==== Audio collection list file (input)====
The list file passed for feature extraction and indexing will be a simple ASCII list file. This file will contain one path per line with no header line. All paths will be absolute (full paths).

e.g.

/aDirectory/collectionFolder/b002342.wav
/aDirectory/collectionFolder/a005921.wav
...

==== Distance matrix output files ====
Participants should return one of two available output file formats, a full distance matrix or a sparse distance matrix. The sparse distance matrix format is preferred (as the dense distance matrices can be very large).

===== Sparse Distance Matrix =====
If the cost of exhaustive search is a concern, or a full distance matrix is not a normal output of the indexing algorithm employed, the sparse distance matrix format detailed below may be used:

A simple ASCII file listing a name for the algorithm and the top 100 search results for every track in the collection.

This file should start with a header line with a name for the algorithm and should be followed by the results for one query per line, prefixed by the filename portion of the query path. This should be followed by a tab character and a tab separated, ordered list of the top 100 search results. Each result should include the result filename (e.g. a034728.wav) and the distance (e.g. 17.1 or 0.23) separated by a comma.

<pre>
MyAlgorithm (my.email@address.com)
<example 1 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance> ... \t<result 100 name>,<result 100 distance>
<example 2 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance> ... \t<result 100 name>,<result 100 distance>
...
</pre>

which might look like:

<pre>
MyAlgorithm (my.email@address.com)
a009342.wav b229311.wav,0.16 a023821.wav,0.19 a001329.wav,0.24 ... etc.
a009343.wav a661931.wav,0.12 a043322.wav,0.17 c002346.wav,0.21 ... etc.
a009347.wav a671239.wav,0.13 c112393.wav,0.20 b083293.wav,0.25 ... etc.
...
</pre>

The path to which this list file should be written must be accepted as a parameter on the command line.
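
A minimal sketch of producing the sparse format described above is given here. The function names are hypothetical, and the use of <code>%g</code> (six significant digits) to print distances is an assumption; the specification only asks for a filename and a distance separated by a comma.

```python
import os


def sparse_matrix_lines(algorithm_name, results, top_n=100):
    """Build the lines of a sparse distance matrix file.

    results: dict mapping query path -> list of (result path, distance),
             already sorted nearest first.
    """
    lines = [algorithm_name]  # header line naming the algorithm
    for query, ranked in results.items():
        fields = [os.path.basename(query)]  # filename portion of the query path
        for result, distance in ranked[:top_n]:
            fields.append("%s,%g" % (os.path.basename(result), distance))
        lines.append("\t".join(fields))
    return lines


def write_sparse_matrix(out_path, lines):
    """Write the lines to the output path given on the command line."""
    with open(out_path, "w", encoding="ascii") as out:
        out.write("\n".join(lines) + "\n")


results = {
    "/aDirectory/collectionFolder/a009342.wav": [
        ("/aDirectory/collectionFolder/b229311.wav", 0.16),
        ("/aDirectory/collectionFolder/a023821.wav", 0.19),
    ],
}
lines = sparse_matrix_lines("MyAlgorithm (my.email@address.com)", results)
```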

===== Full Distance Matrix =====
Full distance matrix files should be generated in the following format:

* A simple ASCII file listing a name for the algorithm on the first line,
* Numbered paths for each file appearing in the matrix. These can be in any order (i.e. the files don't have to be in the same order as they appeared in the list file) but should index into the columns/rows of the distance matrix.
* A line beginning with 'Q/R' followed by a tab and a tab separated list of the numbers 1 to N, where N is the number of files covered by the matrix.
* One line per file in the matrix giving the distances from that file to each other file in the matrix. All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a single tab character. The diagonal of the matrix (the distance of a track to itself) should be zero.

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/1.wav>
2\t</path/to/audio/file/2.wav>
3\t</path/to/audio/file/3.wav>
...
N\t</path/to/audio/file/N.wav>
Q/R\t1\t2\t3\t...\tN
1\t0.0\t<dist 1 to 2>\t<dist 1 to 3>\t...\t<dist 1 to N>
2\t<dist 2 to 1>\t0.0\t<dist 2 to 3>\t...\t<dist 2 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t0.0\t...\t<dist 3 to N>
...\t...\t...\t...\t...\t...
N\t<dist N to 1>\t<dist N to 2>\t<dist N to 3>\t...\t0.0
</pre>

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/1.wav
2 /path/to/audio/file/2.wav
3 /path/to/audio/file/3.wav
4 /path/to/audio/file/4.wav
Q/R 1 2 3 4
1 0.00000 1.24100 0.2e-4 0.42559
2 1.24100 0.00000 0.62640 0.23564
3 50.2e-4 0.62640 0.00000 0.38000
4 0.42559 0.23567 0.38000 0.00000
</pre>
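
The full matrix layout above can be generated with a short routine such as the one below. This is a sketch under assumptions: the function name is hypothetical, and the five-decimal formatting simply mirrors the example rather than any stated requirement.

```python
import math


def full_matrix_lines(system_name, paths, dist):
    """Format a full distance matrix; dist[i][j] is the distance from file i to j."""
    n = len(paths)
    if len(dist) != n:
        raise ValueError("matrix must be N x N")
    for i, row in enumerate(dist):
        if len(row) != n:
            raise ValueError("matrix must be N x N")
        for d in row:
            if d < 0 or math.isinf(d) or math.isnan(d):
                raise ValueError("distances must be finite and non-negative")
        if row[i] != 0:
            raise ValueError("diagonal must be zero")
    lines = [system_name]
    # Numbered paths, indexing the rows and columns of the matrix.
    lines += ["%d\t%s" % (i + 1, p) for i, p in enumerate(paths)]
    lines.append("Q/R\t" + "\t".join(str(i + 1) for i in range(n)))
    for i, row in enumerate(dist):
        lines.append("%d\t" % (i + 1) + "\t".join("%.5f" % d for d in row))
    return lines


paths = ["/path/to/audio/file/1.wav", "/path/to/audio/file/2.wav"]
matrix_lines = full_matrix_lines(
    "Example distance matrix 0.1", paths, [[0.0, 1.241], [1.241, 0.0]]
)
```

Validating the matrix before writing it catches NaN or infinite distances early, which the format explicitly disallows.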

==== Example submission calling formats ====
extractFeatures.sh /path/to/scratch/folder /path/to/collectionListFile.txt
Query.sh /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

or

doAudioSim.sh -numThreads 8 /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt
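
For submissions written in Python, the single-script calling format above might be handled as sketched below. The argument names are illustrative assumptions; only the <code>-numThreads</code> flag and the order of the three paths come from the example call.

```python
import argparse


def parse_args(argv):
    parser = argparse.ArgumentParser(prog="doAudioSim")
    # Optional thread count, defaulting to a single thread.
    parser.add_argument("-numThreads", type=int, default=1,
                        help="number of worker threads to use")
    parser.add_argument("scratch_folder")
    parser.add_argument("collection_list_file")
    parser.add_argument("output_results_file")
    return parser.parse_args(argv)


args = parse_args(["-numThreads", "8", "/path/to/scratch/folder",
                   "/path/to/collectionListFile.txt",
                   "/path/to/outputResultsFile.txt"])
```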

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Tag Classification

2010-06-05T09:12:38Z

Kriswest: /* Submission opening date */

__TOC__

== Description ==
This task will compare various algorithms' abilities to associate descriptive tags with 10-second audio clips of songs. Two datasets are used to implement a pair of sub tasks, based on the MajorMiner and Mood tag datasets. This task is very much related to the other audio classification tasks, however, multiple tags may be applied to each example rather than single-label classification.

Algorithms will be evaluated both on their ability to apply binary classifications of tags to examples and on their ability to rank tags for a track, by asking them to return an affinity score for each tag/track pair.

Audio tag classification was first run at MIREX 2008 [https://www.music-ir.org/mirex/2008/index.php/Audio_Tag_Classification https://www.music-ir.org/mirex/2008/index.php/Audio_Tag_Classification] and as a special MIREX task in 2009
[https://www.music-ir.org/mirex/2009/index.php/SpecialTagatuneEvaluation https://www.music-ir.org/mirex/2009/index.php/SpecialTagatuneEvaluation].

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ( [[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks please sign up to this mailing list as discussion of the task format and evaluation should be conducted there.

== Data ==
Two datasets will be used to evaluate tagging algorithms: The MajorMiner and Mood tag datasets.

=== MajorMiner Tag Dataset ===
The tags come from the [http://majorminer.org MajorMiner game].
All of the data is browseable via the [http://majorminer.org/search MajorMiner search] page.

The music consists of 2300 clips selected at random from 3900 tracks. Each clip is 10 seconds long. The 2300 clips represent a total of 1400 different tracks on 800 different albums by 500 different artists. To give a sense for the music collection, the following genre tags have been applied to these artists, albums, and tracks on Last.fm: electronica, rock, indie, alternative, pop, britpop, idm, new wave, hip-hop, singer-songwriter, trip-hop, post-punk, ambient, jazz.

The MajorMiner game has collected a total of about 73000 taggings, 12000 of which have been verified by at least two users. In these verified taggings, there are 43 tags that have been verified at least 35 times, for a total of about 9000 verified uses. These are the tags we will be using in this task.

Note that these data do not include strict negative labels. While many clips are tagged ''rock'', none are tagged ''not rock''. Frequently, however, a clip will be tagged many times without being tagged ''rock''. We take this as an indication that ''rock'' does not apply to that clip. More specifically, a negative example of a particular tag is a clip on which another tag has been verified, but the tag in question has not.
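
The rule for deriving negative examples can be expressed in a few lines. This is a sketch only; the function name and the in-memory representation of the verified taggings are assumptions.

```python
def split_examples(verified, tag):
    """Split clips into positive and negative examples for one tag.

    verified: dict mapping clip id -> set of tags verified on that clip.
    Clips with no verified tags at all are treated as unlabeled and skipped.
    """
    positives = {clip for clip, tags in verified.items() if tag in tags}
    # Negative: some other tag has been verified, but not the tag in question.
    negatives = {clip for clip, tags in verified.items()
                 if tags and tag not in tags}
    return positives, negatives


verified = {"clip1": {"rock", "guitar"}, "clip2": {"jazz"}, "clip3": set()}
pos, neg = split_examples(verified, "rock")
```

Here <code>clip2</code> is a negative example for ''rock'', while <code>clip3</code> has no verified tags and so is neither positive nor negative.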

Here is a list of the top 50 tags along with an approximate number of times each has been verified, how many times it's been used in total, and how many different users have ever used it:

{| class="wikitable" style="margin: 1em auto 1em auto"
! Tag || Verified || Total || Users
|-
| drums || 962 || 3223 || 127
|-
| guitar || 845 || 3204 || 181
|-
| male || 724 || 2452 || 95
|-
| rock || 658 || 2619 || 198
|-
| synth || 498 || 1889 || 105
|-
| electronic || 490 || 1878 || 131
|-
| pop || 479 || 1761 || 151
|-
| bass || 417 || 1632 || 99
|-
| vocal || 355 || 1378 || 99
|-
| female || 342 || 1387 || 100
|-
| dance || 322 || 1244 || 115
|-
| techno || 246 || 943 || 104
|-
| piano || 179 || 826 || 120
|-
| electronica || 168 || 686 || 67
|-
| hip hop || 166 || 701 || 126
|-
| voice || 160 || 790 || 55
|-
| slow || 157 || 727 || 90
|-
| beat || 154 || 708 || 90
|-
| rap || 151 || 723 || 129
|-
| jazz || 136 || 735 || 154
|-
| 80s || 130 || 601 || 94
|-
| fast || 109 || 494 || 70
|-
| instrumental || 103 || 539 || 62
|-
| drum machine || 89 || 427 || 35
|-
| british || 81 || 383 || 60
|-
| country || 74 || 360 || 105
|-
| distortion || 73 || 366 || 55
|-
| saxophone || 70 || 316 || 86
|-
| house || 65 || 298 || 66
|-
| ambient || 61 || 335 || 78
|-
| soft || 61 || 351 || 58
|-
| silence || 57 || 200 || 35
|-
| r&b || 57 || 242 || 59
|-
| strings || 55 || 252 || 62
|-
| quiet || 54 || 261 || 57
|-
| solo || 53 || 268 || 56
|-
| keyboard || 53 || 424 || 41
|-
| punk || 51 || 242 || 76
|-
| horns || 48 || 204 || 38
|-
| drum and bass || 48 || 191 || 50
|-
| noise || 46 || 249 || 61
|-
| funk || 46 || 266 || 90
|-
| acoustic || 40 || 193 || 58
|-
| trumpet || 39 || 174 || 68
|-
| end || 38 || 178 || 36
|-
| loud || 37 || 218 || 62
|-
| organ || 35 || 169 || 46
|-
| metal || 35 || 178 || 64
|-
| folk || 33 || 195 || 58
|-
| trance || 33 || 226 || 49
|}

=== Mood Tag Dataset ===
The Mood tag dataset is derived from mood related tags on last.fm. All tags in this set are identified by a general affect lexicon (WordNet-Affect) and by human experts. Similar tags are grouped together to define a mood tag group and each song may belong to multiple mood tag groups.

There are 18 mood tag groups containing 135 unique tags. The dataset contains 3,469 unique songs. The following table lists the tag groups, their member tags and number of songs in each group:

{| class="wikitable" style="margin: 1em auto 1em auto"
! Group id || Tags || num. of tags || num. of songs
|-
| G12 || calm, comfort, quiet, serene, mellow, chill out, calm down, calming, chillout, comforting, content, cool down, mellow music, mellow rock, peace of mind, quietness, relaxation, serenity, solace, soothe, soothing, still, tranquil, tranquility, tranquility || 25 || 1,680
|-
| G15 || sad, sadness, unhappy, melancholic, melancholy, feeling sad, mood: sad - slightly, sad song || 8 || 1,178
|-
| G5 || happy, happiness, happy songs, happy music, glad, mood: happy || 6 || 749
|-
| G32 || romantic, romantic music || 2 || 619
|-
| G2 || upbeat, gleeful, high spirits, zest, enthusiastic, buoyancy, elation, mood: upbeat|| 8 || 543
|-
| G16 || depressed, blue, dark, depressive, dreary, gloom, darkness, depress, depression, depressing, gloomy || 11 || 471
|-
| G28 || anger, angry, choleric, fury, outraged, rage, angry music || 7 || 254
|-
| G17 || grief, heartbreak, mournful, sorrow, sorry, doleful, heartache, heartbreaking, heartsick, lachrymose, mourning, plaintive, regret, sorrowful || 14 || 183
|-
| G14 || dreamy || 1 || 146
|-
| G6 || cheerful, cheer up, festive, jolly, jovial, merry, cheer, cheering, cheery, get happy, rejoice, songs that are cheerful, sunny || 13 || 142
|-
| G8 || brooding, contemplative, meditative, reflective, broody, pensive, pondering, wistful || 8 || 116
|-
| G29 || aggression, aggressive || 2 || 115
|-
| G25 || angst, anxiety, anxious, jumpy, nervous, angsty || 6 || 80
|-
| G9 || confident, encouraging, encouragement, optimism, optimistic || 5 || 61
|-
| G7 || desire, hope, hopeful, mood: hopeful || 4 || 45
|-
| G11 || earnest, heartfelt || 2 || 40
|-
| G31 || pessimism, cynical, pessimistic, weltschmerz, cynical/sarcastic || 5 || 38
|-
| G1 || excitement, exciting, exhilarating, thrill, ardor, stimulating, thrilling, titillating || 8 || 30
|-
| TOTAL || || 135 || 6,490
|}

The songs are mostly from the USPOP collection; a detailed breakdown is given in the following table:

{| class="wikitable" style="margin: 1em auto 1em auto"
! Collection || num. of songs in the dataset || percentage of songs in the dataset
|-
| USPOP || 2764 || 80%
|-
| Assorted pop || 366 || 10%
|-
| American music || 145 || 4%
|-
| Beatles || 128 || 4%
|-
| USCRAP || 40 || 1%
|-
| Metal music || 25 || 1%
|-
| Magnatune || 1 || 0%
|-
| TOTAL || 3469 || 100%
|}

Details on how the mood tag groups were derived are described in [https://www.music-ir.org/archive/papers/ISMIR2009_MoodClassification.pdf X. Hu, J. S. Downie, A. Ehmann, Lyric Text Mining in Music Mood Classification, In Proceedings of the 10th International Symposium on Music Information Retrieval (ISMIR), Oct. 2009, Kobe, Japan].

Details on how the songs were selected are available in the [https://www.music-ir.org/archive/papers/Mood_Multi_Tag_Data_Description.pdf description].

== Evaluation ==
Participating algorithms will be evaluated with 3-fold artist-filtered cross-validation. An introduction to the evaluation statistics computed is given in the following subsections.
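
The exact fold assignment procedure is not given here. The sketch below shows one plausible reading of "artist-filtered" cross-validation, in which all clips by a given artist are placed in the same fold so that no artist appears in both training and test data. The greedy balancing strategy and all names are assumptions made for illustration.

```python
from collections import defaultdict


def artist_filtered_folds(artist_of, n_folds=3):
    """Assign clips to folds so that each artist lands in exactly one fold.

    artist_of: hypothetical dict mapping clip id -> artist name.
    """
    by_artist = defaultdict(list)
    for clip, artist in artist_of.items():
        by_artist[artist].append(clip)
    folds = [[] for _ in range(n_folds)]
    # Place the largest artists first, each into the currently smallest fold.
    for artist in sorted(by_artist, key=lambda a: (-len(by_artist[a]), a)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(sorted(by_artist[artist]))
    return folds


artist_of = {"c1": "A", "c2": "A", "c3": "B", "c4": "C", "c5": "C", "c6": "D"}
folds = artist_filtered_folds(artist_of)
```

Each fold then serves once as the test set while the other two are used for training.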

=== Binary (Classification) Evaluation ===
Algorithms are evaluated on their performance at tag classification using F-measure. Results are also reported for simple accuracy; however, as this statistic is dominated by the negative example accuracy, it is not a reliable indicator of performance (a system that returns no tags for any example will achieve a high score on this statistic). The accuracies are also reported for positive and negative examples separately, as these can help elucidate the behaviour of an algorithm (for example, demonstrating whether the system is under- or over-predicting).
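
The following sketch shows how these statistics relate for a single tag, and why simple accuracy can mislead. It is an illustration with hypothetical names, not the evaluation code used for the task; it assumes the truth and predicted sets are both subsets of the evaluated clips.

```python
def binary_tag_metrics(truth, predicted, clips):
    """Score one tag. truth / predicted: sets of clips carrying the tag."""
    tp = len(truth & predicted)
    fp = len(predicted - truth)
    fn = len(truth - predicted)
    tn = len(clips) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {
        "f_measure": f_measure,
        "accuracy": (tp + tn) / len(clips),
        "positive_accuracy": recall,
        "negative_accuracy": tn / (tn + fp) if tn + fp else 0.0,
    }


# Ten clips, one of which truly carries the tag; the system returns no tags.
clips = {"c%d" % i for i in range(10)}
truth = {"c0"}
no_tags = binary_tag_metrics(truth, set(), clips)
```

A system that returns nothing scores 90% accuracy in this toy case, yet its F-measure and positive example accuracy are both zero.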

=== Affinity (Ranking) Evaluation ===
Algorithms are evaluated on their performance at tag ranking using the Area Under the Receiver Operating Characteristic Curve (AUC-ROC). The affinity scores for each tag to be applied to a track are sorted prior to the computation of the AUC-ROC statistic, which gives higher scores to ranked tag sets where the correct tags appear towards the top of the set.
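
As a hedged illustration, AUC-ROC for one ranked tag set can be computed as the fraction of (correct, incorrect) tag pairs in which the correct tag has the higher affinity, with ties counted as half. The function below is a simple quadratic-time sketch with hypothetical names; how scores are averaged across tracks or tags is not specified here and is left out.

```python
def auc_roc(affinities, relevant):
    """AUC-ROC for one ranked set.

    affinities: dict mapping item (e.g. tag) -> affinity score.
    relevant: set of items that are correct according to the ground truth.
    """
    positives = [s for item, s in affinities.items() if item in relevant]
    negatives = [s for item, s in affinities.items() if item not in relevant]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative item")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


affinities = {"rock": 0.9, "guitar": 0.7, "vocal": 0.3, "jazz": 0.1}
auc = auc_roc(affinities, relevant={"rock", "vocal"})
```

A value of 1.0 means every correct tag is ranked above every incorrect one, while 0.5 corresponds to a random ordering.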

=== Ranking and significance testing ===
Additionally, more standard tests could be performed on the average classification accuracy, although the cross-tag variance tends to increase each algorithm's variance, interfering with significance tests without further handling. One test that can help resolve these issues is Friedman's ANOVA with Tukey-Kramer HSD.

We wish to compare a number of treatments/systems (the submissions) over a number of blocks/rows. One option is to compute average classification accuracy and/or precision metrics over all the tags and use the cross-validation folds as the blocks/rows, which will handle variance between different folds. However, we are more interested in treating each tag (averaged over all folds) or, perhaps better, each tag on each fold as a separate block.

The Friedman test should handle the variance between tags (caused by different difficulties of modeling each tag and different numbers of positive and negative examples per tag) by replacing the actual scores achieved by each system on each block (tag) with the rank achieved by that system on that tag amongst all the systems. Hence, we make the assumption that each tag (or combination of tag and fold) is of equal importance in the evaluation. This is an often used approach at TREC (Text Retrieval Conference) when considering retrieval results (where each query is of equal importance, but unequal variance/difficulty).
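
The rank transformation at the heart of the Friedman test can be sketched as below. This is an illustration only: it computes within-block ranks, mean ranks per system and the basic Friedman chi-square statistic without a tie correction, and it does not perform the Tukey-Kramer comparisons. All names and the toy scores are hypothetical.

```python
def rank_block(scores):
    """Rank one block's scores ascending (1 = lowest), averaging tied ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        average = (pos + end) / 2.0 + 1.0
        for i in order[pos:end + 1]:
            ranks[i] = average
        pos = end + 1
    return ranks


def friedman_statistic(blocks):
    """blocks: one list of scores per block (tag), one score per system."""
    n, k = len(blocks), len(blocks[0])
    rank_sums = [0.0] * k
    for scores in blocks:
        for j, r in enumerate(rank_block(scores)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return [r / n for r in rank_sums], q


# Three tags (blocks) scored for three systems (columns).
blocks = [[0.9, 0.8, 0.7], [0.6, 0.5, 0.4], [0.8, 0.9, 0.1]]
mean_ranks, q = friedman_statistic(blocks)
```

Because only the ranks enter the statistic, a tag on which all systems score poorly carries the same weight as an easy tag, which is the equal-importance assumption described above.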

Tukey-Kramer Honestly Significant Difference multiple comparisons are made over the results of Friedman's ANOVA as this (and other tests, such as multiply applied Student's T-tests) can only safely tell you if one system is statistically significantly different from the rest. If you try to do the full NxN comparisons with such tests then the experiment wide alpha value is cumulative over all the tests. E.g. if we compared 12 systems at an alpha level of 0.05, a total of 66 pairwise comparisons are made and the chance of incorrectly rejecting the hypothesis of no difference in error rates is: 1 - (0.95^66) = 0.97 = 97%. This explanation is lifted from a paper by Tague-Sutcliffe and Blustein:

@article{taguesutcliffe1995sat,
title={A Statistical Analysis of the TREC-3 Data},
author={Tague-Sutcliffe, J. and Blustein, J.},
journal={Overview of the Third Text Retrieval Conference (Trec-3)},
year={1995},
publisher={DIANE Publishing}
}
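
The experiment-wide error rate quoted above can be reproduced with a few lines of arithmetic. The sketch assumes, as the quoted figure does, that the pairwise comparisons are independent; the function name is hypothetical.

```python
from math import comb


def familywise_error(n_systems, alpha=0.05):
    """Chance of at least one false rejection across all pairwise tests."""
    comparisons = comb(n_systems, 2)  # number of distinct system pairs
    return comparisons, 1.0 - (1.0 - alpha) ** comparisons


comparisons, error = familywise_error(12)
```

For 12 systems this gives 66 comparisons and an error rate of roughly 0.97, matching the figures in the text.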

For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:

@InProceedings{jones2007hsj,
title={"Human Similarity Judgments: Implications for the Design of Formal Evaluations"},
author="M.C. Jones and J.S. Downie and A.F. Ehmann",
BOOKTITLE ="Proceedings of ISMIR 2007 International Society of Music Information Retrieval",
year="2007"
}

=== Runtime performance ===
In addition, computation times for feature extraction and training/classification will be measured.

== Submission format ==
Submission to this task will have to conform to a specified format detailed below, which is very similar to the audio genre classification task, among others.

=== Audio formats ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 44 KHz
* Sample size: 16 bit
* Number of channels: 2 (stereo)
* Encoding: WAV (decoded from MP3 files by IMIRSEL)
* Duration: 10 second clips

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. For feature extraction and classification this file will contain one path per line with no header line. For model training this file will contain one path per line, followed by a tab character and the tag label, again with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Algorithms should divide their feature extraction and training/classification into separate executables/scripts. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Multi-processor compute nodes (8 cores) will be used to run this task. Hence, participants should attempt to use parallelism wherever possible. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

=== I/O formats ===
In this section the input and output files used in this task are described as are the command line calling format requirements for submissions.

==== Feature extraction list file ====
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.

I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Training list file ====
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and a tag label, again with no header line.

I.e.

<example path and filename>\t<tag classification>\n

E.g.
/path/to/track1.wav drum
/path/to/track1.wav silence
...

In this way, the input file will represent the sparse ground truth matrix. While no line will be duplicated, multiple lines may contain the same path, one for each tag associated with that clip. Any tag that is not specified as applying to a clip does not apply to that clip. The ordering of the lines is arbitrary and should not be depended upon.
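
A training list in this format can be read into a mapping from each path to its set of tags, as sketched below. The function name is hypothetical, and the sketch assumes every non-empty line contains a tab.

```python
from collections import defaultdict


def parse_training_list(lines):
    """Map each audio path to the set of tags listed for it."""
    tags_of = defaultdict(set)
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        path, tag = line.split("\t", 1)
        tags_of[path].add(tag)
    return dict(tags_of)


example = [
    "/path/to/track1.wav\tdrum\n",
    "/path/to/track1.wav\tsilence\n",
    "/path/to/track2.wav\tdrum\n",
]
tags_of = parse_training_list(example)
```

Using a set per path means the result does not depend on the order of the lines, in keeping with the note that the ordering is arbitrary.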

==== Test (classification) list file ====
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Classification output files ====
Participating algorithms should produce '''two''' simple ASCII list files similar in format to the Training list file. The path to which each list file should be written must be accepted as a parameter on the command line.

===== Tag Affinity file =====
The first file will contain one path per line, followed by a tab character and the tag label, followed by another tab character and the affinity of that tag for that file, again with no header line.

I.e.:

<example path and filename>\t<tag classification>\t<affinity>\n

E.g.:

/data/file1.wav rock 0.9
/data/file1.wav guitar 0.7
/data/file1.wav vocal 0.3
/data/file2.wav rock 0.5
...

In this way, the output file will represent the sparse classification matrix. A path should be repeated on a separate line for each tag that the submission deems applies to it. If a (path, tag) pair is not specified, it will be assumed to have an affinity of 0. The ordering of the lines is not important and can be arbitrary.

The affinity will be used for retrieval evaluation metrics, and its only specification is that for a given tag, larger (closer to +infinity) numbers indicate that the tag is more appropriate to a clip than smaller (closer to -infinity) numbers. As submissions are asked to also return a binary relevance listing, submissions that do not compute an affinity should provide only the binary relevance listing file.
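
A minimal sketch of formatting the tag affinity file follows. The function name and the choice to sort the lines are assumptions; the format itself places no requirement on line order.

```python
def affinity_lines(affinities):
    """Format tag affinities, one (path, tag, affinity) triple per line.

    affinities: dict mapping (path, tag) -> affinity score.
    """
    return ["%s\t%s\t%r" % (path, tag, float(score))
            for (path, tag), score in sorted(affinities.items())]


affinities = {
    ("/data/file1.wav", "rock"): 0.9,
    ("/data/file1.wav", "guitar"): 0.7,
}
out_lines = affinity_lines(affinities)
```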

===== Binary relevance file =====
The second file to be produced is a binary version of the tag classifications, where a tag must be marked as relevant or not relevant to a track. This file will contain one path per line, followed by a tab character and the tag label, followed by another tab character and either a 1 or a 0 indicating the relevance of that tag for that file, again with no header line.

I.e.:

<example path and filename>\t<tag classification>\t<relevant? [0 | 1]>\n

E.g.:

/data/file1.wav rock 1
/data/file1.wav guitar 1
/data/file1.wav vocal 0
/data/file2.wav rock 1
...

If a (path, tag) pair is not specified, it will be assumed to be non-relevant (0). Any line with a path and tag but no numerical value will be assumed to be relevant (1).

Hence, the following is equivalent to the example above:

/data/file1.wav rock
/data/file1.wav guitar
/data/file2.wav rock

The ordering of the lines is not important and can be arbitrary.
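
The defaulting rules for the binary relevance file can be captured in a small reader, sketched here with a hypothetical function name. Pairs that are absent or marked 0 are simply left out of the returned set.

```python
def parse_binary_relevance(lines):
    """Return the set of (path, tag) pairs marked as relevant."""
    relevant = set()
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        path, tag = fields[0], fields[1]
        flag = fields[2].strip() if len(fields) > 2 else ""
        # A missing value is treated as relevant (1).
        if flag == "" or flag == "1":
            relevant.add((path, tag))
    return relevant


explicit = [
    "/data/file1.wav\trock\t1",
    "/data/file1.wav\tguitar\t1",
    "/data/file1.wav\tvocal\t0",
    "/data/file2.wav\trock\t1",
]
implicit = [
    "/data/file1.wav\trock",
    "/data/file1.wav\tguitar",
    "/data/file2.wav\trock",
]
same = parse_binary_relevance(explicit) == parse_binary_relevance(implicit)
```

Both example listings from this section parse to the same set of relevant pairs, which is the equivalence stated above.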

=== Example submission calling formats ===
extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputAffinityFile.txt /path/to/outputBinaryRelevanceFile.txt

extractFeatures.sh -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputAffinityFile.txt /path/to/outputBinaryRelevanceFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputAffinityFile.txt /path/to/outputBinaryRelevanceFile.txt

myAlgo.sh -extract -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputAffinityFile.txt /path/to/outputBinaryRelevanceFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputAffinityFile.txt /path/to/outputBinaryRelevanceFile.txt

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments libraries and architectures (including version information) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.

=== Time and hardware limits ===
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be specified.

A hard limit of 72 hours will be imposed on the full execution of a submission on each dataset (to include feature extraction time and the 3 training/testing cycles required for the 3-fold cross-validated experiment).

These limits will likely be strictly imposed at MIREX 2010 (due to the very high level of participation that is expected).

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Cover Song Identification

2010-06-05T09:12:27Z

Kriswest: /* Submission opening date */

__TOC__

==Description==
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or "cover songs".

Within the collection of pieces in each cover song dataset, there are embedded a number of different "original songs" or compositions, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.

Using each of these version files in turn as the "seed/query" file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the "seed/query" file.

Two datasets are used in this task: the MIREX 2006 US Pop Music Cover Song dataset (the original Audio Cover Song dataset) and the [http://www.mazurka.org.uk/ Mazurka dataset].

=== Task specific mailing list ===
In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., [[2010:Audio Classification (Train/Test) Tasks]], [[2010:Audio Cover Song Identification]], [[2010:Audio Tag Classification]], [[2010:Audio Music Similarity and Retrieval]]). This year, however, we are asking that all discussions take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

== Data ==
Two datasets will be used to evaluate cover song identification:

===US Pop Music Collection Cover Song (aka Mixed Collection)===
This is the "original" ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different "cover songs" each represented by 11 different "versions" for a total of 330 audio files.

Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.

Collection statistics:
* 16-bit, monophonic, 22.05 kHz, WAV
* The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
* Size: 1000 tracks
* Queries: 330 tracks

=== Sapp's Mazurka Collection Information ===
In addition to our original ACS dataset, we used the [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions from 49 mazurkas and ran it as a separate ACS subtask. Systems should return a distance matrix of 539x539 from which we located the ranks of each of the associated cover versions.

Collection statistics:
* 16-bit, monophonic, 22.05 kHz, WAV
* Size: 539 tracks
* Queries: 539 tracks

== Evaluation ==
The following evaluation metrics will be computed for each submission:
* Total number of covers identified in top 10
* Mean number of covers identified in top 10 (average performance)
* Mean (arithmetic) of Avg. Precisions
* Mean rank of first correctly identified cover
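
As a rough illustration of the metrics listed above, the following Python sketch scores a single query from one row of a distance matrix. The function name <code>score_query</code>, its arguments, and the decision to exclude the query itself from the ranked list are assumptions made for illustration; they are not taken from the MIREX evaluation code.

```python
def score_query(distances, query_idx, cover_idxs, top_n=10):
    """Score one query row of a distance matrix (hypothetical helper).

    distances  -- list of distances from the query to every candidate
    query_idx  -- index of the query itself (assumed excluded from ranking)
    cover_idxs -- set of candidate indexes that are other versions of the query
    """
    # Rank candidates by ascending distance, skipping the query itself.
    ranked = sorted(
        (i for i in range(len(distances)) if i != query_idx),
        key=lambda i: distances[i],
    )
    hits_in_top_n = sum(1 for i in ranked[:top_n] if i in cover_idxs)

    # Average precision: mean of the precision at each rank where a cover appears.
    found = 0
    precisions = []
    first_rank = None
    for rank, i in enumerate(ranked, start=1):
        if i in cover_idxs:
            found += 1
            precisions.append(found / rank)
            if first_rank is None:
                first_rank = rank
    avg_precision = sum(precisions) / len(cover_idxs) if cover_idxs else 0.0
    return hits_in_top_n, avg_precision, first_rank
```

Summing the first value over all queries would give the total number of covers identified in the top 10, and averaging the second and third values would give the mean of average precisions and the mean rank of the first correctly identified cover.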

=== Ranking and significance testing ===
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.

For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:
@InProceedings{jones2007hsj,
title={"Human Similarity Judgements: Implications for the Design of Formal Evaluations"},
author="M.C. Jones and J.S. Downie and A.F. Ehmann",
BOOKTITLE ="Proceedings of ISMIR 2007 International Society of Music Information Retrieval",
year="2007"
}

=== Runtime performance ===
In addition, computation times for feature extraction and training/classification will be measured.

== Submission Format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

=== I/O formats ===
=== Input Files ===

The feature extraction list file format will be of the form:

/path/to/audio/file/000.wav\n
/path/to/audio/file/001.wav\n
/path/to/audio/file/002.wav\n
...

The query list file format will be very similar, listing a subset of the files from the feature extraction list file:

/path/to/audio/file/182.wav\n
/path/to/audio/file/245.wav\n
/path/to/audio/file/432.wav\n
...

For a total of ''<number of queries>'' rows -- query ids are assigned from the pool of ''<number of candidates>'' collection ids and should match the ids within the candidate collection.

Lines will be terminated by a '\n' character.

=== Output File ===
The only output will be a '''distance''' matrix file that is ''<number of queries>'' rows by ''<number of candidates>'' columns in the following format:

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/track1.wav>
2\t</path/to/audio/file/track2.wav>
3\t</path/to/audio/file/track3.wav>
4\t</path/to/audio/file/track4.wav>
...
N\t</path/to/audio/file/trackN.wav>
Q/R\t1\t2\t3\t4\t...\tN
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>
</pre>

where N is <number of candidates> and the queries are drawn from this set (and bear the same track indexes if possible).

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/track1.wav
2 /path/to/audio/file/track2.wav
3 /path/to/audio/file/track3.wav
4 /path/to/audio/file/track4.wav
5 /path/to/audio/file/track5.wav
Q/R 1 2 3 4 5
1 0.00000 1.24100 0.2e-4 0.42559 0.21313
3 50.2e-4 0.62640 0.00000 0.38000 0.15152
</pre>

Note that indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed we will be able to interpret the data.

All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.

To summarize, the distance matrix should be preceded by a system name, ''<number of candidates>'' rows of file paths and should be composed of ''<number of candidates>'' columns of distance (separated by tab characters) and ''<number of queries>'' rows (one for each original track query). Each row corresponds to a particular query song (the track to find covers of).
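
The layout summarized above can be sketched in Python as follows. This is only an illustrative writer, not a reference implementation: the function name <code>write_distance_matrix</code>, its arguments, and the five-decimal number formatting are assumptions.

```python
import math


def write_distance_matrix(path, system_name, candidates, query_idxs, dist_rows):
    """Write a distance matrix file in the layout described above.

    candidates -- list of candidate file paths (track i is candidates[i - 1])
    query_idxs -- 1-based candidate indexes of the queries, in query-list order
    dist_rows  -- one list of distances per query, len(candidates) values each
    """
    with open(path, "w", newline="\n") as out:
        # Header line with the system name, then the numbered track list.
        out.write(system_name + "\n")
        for i, track in enumerate(candidates, start=1):
            out.write(f"{i}\t{track}\n")
        # Column header row, then one tab-separated row per query.
        header = ["Q/R"] + [str(i) for i in range(1, len(candidates) + 1)]
        out.write("\t".join(header) + "\n")
        for q, row in zip(query_idxs, dist_rows):
            if len(row) != len(candidates):
                raise ValueError("each row needs one distance per candidate")
            # Distances must be zero or positive, and never infinite or NaN.
            if any(d < 0 or not math.isfinite(d) for d in row):
                raise ValueError("distances must be finite and >= 0")
            out.write("\t".join([str(q)] + [f"{d:.5f}" for d in row]) + "\n")
```

Validating each row before writing it means a NaN or negative distance surfaces as an error in the submission rather than as an unreadable results file.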

=== Command Line Calling Format ===

/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>
'''<collection_list_file>''': Text file containing ''<number of candidates>'' full path file names for the
''<number of candidates>'' audio files in the collection (including the ''<number of queries>''
query documents).
'''Example: /path/to/coversong/collection.txt'''
'''<query_list_file>''' : Text file containing the ''<number of queries>'' full path file names for the
''<number of queries>'' query documents.
'''Example: /path/to/coversong/queries.txt'''
'''<working_directory>''' : Full path to a temporary directory where submission will
have write access for caching features or calculations.
'''Example: /tmp/submission_id/'''
'''<output_file>''' : Full path to file where submission should output the distance
matrix (''<number of candidates>'' header rows + ''<number of queries>'' x ''<number of candidates>'' data matrix).
'''Example: /path/to/coversong/results/submission_id.txt'''

E.g.
/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==

Friday 4th June 2010

== Submission closing date ==
TBA

2010:Audio Classification (Train/Test) Tasks

2010-06-05T09:11:51Z

Kriswest: /* Submission opening date */

== Description ==
Many tasks in music classification can be characterized into a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2010, five classification sub-tasks are included:

*Audio Artist Identification
*Audio Classical Composer Identification
*Audio US Pop Music Genre Classification
*Audio Latin Music Genre Classification
*Audio Mood Classification

All five classification tasks were conducted in previous MIREX runs (please see [[#Links to Previous MIREX Runs of These Classification Tasks]]). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.

=== Task specific mailing list ===
In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., [[2010:Audio Classification (Train/Test) Tasks]], [[2010:Audio Cover Song Identification]], [[2010:Audio Tag Classification]], [[2010:Audio Music Similarity and Retrieval]]). This year, however, we are asking that all discussions take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

== Data ==

=== Audio Artist Identification ===
This dataset requires algorithms to classify music audio according to the performing artist. The collection used at MIREX 2009 will be re-used.

Collection statistics:
* 3150 30-second 22.05kHz mono wav audio clips drawn from a collection of US Pop music.
* 105 artists (30 clips per artist drawn from 3 albums).

=== Audio Classical Composer Identification ===
This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.

Collection statistics:
* 2772 30-second 22.05 kHz mono wav clips
* 11 "classical" composers (252 clips per composer), including:
** Bach
** Beethoven
** Brahms
** Chopin
** Dvorak
** Handel
** Haydn
** Mendelssohn
** Mozart
** Schubert
** Vivaldi

=== Audio US Pop Music Genre Classification ===
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.

Collection statistics:
* 7000 30-second audio clips in 22.05kHz mono WAV format
* 10 genres (700 clips from each genre), including:
** Blues
** Jazz
** Country/Western
** Baroque
** Classical
** Romantic
** Electronica
** Hip-Hop
** Rock
** HardRock/Metal

=== Audio Latin Music Genre Classification ===
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [http://ismir2008.ismir.net/papers/ISMIR2008_106.pdf] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.

Collection statistics:
* 3,227 audio files in 22.05kHz mono WAV format
* 10 Latin music genres, including:
** Axe
** Bachata
** Bolero
** Forro
** Gaucha
** Merengue
** Pagode
** Sertaneja
** Tango

=== Audio Mood Classification ===
This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production music sourced from the APM collection [http://www.apmmusic.com]). The MIREX 2007 Mood Classification dataset [http://ismir2008.ismir.net/papers/ISMIR2008_263.pdf] will be re-used.

Collection statistics:
* 600 30-second audio clips in 22.05kHz mono WAV format selected from the APM collection [http://www.apmmusic.com], and labeled by human judges using the Evalutron6000 system.
* 5 mood categories [http://ismir2007.ismir.net/proceedings/ISMIR2007_p067_hu.pdf] each of which contains 120 clips:
**Cluster_1: passionate, rousing, confident, boisterous, rowdy
**Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
**Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
**Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
**Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

== Audio Formats ==
For all datasets, participating algorithms will have to read audio in the following format:

* Sample rate: 22.05 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

== Evaluation ==
This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.

Participating algorithms will be evaluated with 3-fold cross validation. For '''Artist Identification''' and '''Classical Composer Classification''', album filtering will be applied to the test and training splits, i.e. training and test sets will contain tracks from different albums; for '''US Pop Genre Classification''' and '''Latin Genre Classification''', artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.
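
To make the idea of album or artist filtering concrete, the sketch below assigns tracks to folds so that no artist (or album) is split across folds. It is not the procedure used to build the MIREX splits: the function name <code>filtered_folds</code> and the greedy balancing rule are assumptions, and the sketch does not attempt to balance class labels across folds.

```python
from collections import defaultdict


def filtered_folds(tracks, n_folds=3):
    """Assign tracks to folds so that no group spans two folds.

    tracks -- list of (track_path, group) pairs, where group is the artist
              or album used for filtering (assumed known to the caller)
    Returns a list of n_folds lists of track paths.
    """
    by_group = defaultdict(list)
    for path, group in tracks:
        by_group[group].append(path)

    folds = [[] for _ in range(n_folds)]
    # Greedy balancing: place the largest groups first, each into the
    # currently smallest fold. The group name acts as a tie-break so the
    # assignment is deterministic.
    for group in sorted(by_group, key=lambda g: (-len(by_group[g]), g)):
        smallest = min(folds, key=len)
        smallest.extend(by_group[group])
    return folds
```

Each fold then serves once as the test set while the remaining two form the training set, so a test track never shares its artist (or album) with any training track.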

The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
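
A minimal sketch of the accuracy and confusion matrix computation for one fold is given below. The function name <code>accuracy_and_confusion</code> and the dictionary-based inputs are assumptions for illustration only; the actual evaluation code may organize its data differently.

```python
from collections import Counter


def accuracy_and_confusion(truth, predicted):
    """Compare two {path: label} dicts covering the same tracks.

    Returns (accuracy, confusion), where confusion counts
    (true_label, predicted_label) pairs.
    """
    confusion = Counter()
    for path, true_label in truth.items():
        confusion[(true_label, predicted[path])] += 1
    # Correct classifications lie on the diagonal of the confusion matrix.
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    return correct / len(truth), confusion
```

Running this once per cross-validation fold yields the per-fold accuracies from which a mean and standard deviation can be derived.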

Classification accuracies will be tested for statistically significant differences using Friedman's ANOVA with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.

In addition, computation times for feature extraction and training/classification will be measured.

== Submission Format ==
=== File I/O Format ===
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:

==== Feature extraction list file ====
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Training list file ====
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.

I.e.
<example path and filename>\t<class label>

E.g.
/path/to/track1.wav rock
/path/to/track2.wav blues
...
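
As an illustration of the training list format described above, the following Python sketch reads such a file into (path, label) pairs. The function name <code>read_train_list</code> is hypothetical, and skipping blank lines is an assumption rather than part of the specification.

```python
def read_train_list(path):
    """Parse a training list file into (audio_path, class_label) pairs."""
    pairs = []
    with open(path, encoding="ascii") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate a trailing blank line (an assumption)
            # Split on the first tab: path on the left, class label on the right.
            audio_path, sep, label = line.partition("\t")
            if not sep or not label:
                raise ValueError(f"line {line_no}: expected <path>\\t<label>")
            pairs.append((audio_path, label))
    return pairs
```

Splitting on the first tab only keeps the parser tolerant of spaces inside file paths or class labels.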

==== Test (classification) list file ====
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Classification output file ====
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.

I.e.
<example path and filename>\t<class label>

E.g.
/path/to/track1.wav classical
/path/to/track2.wav blues
...

=== Submission calling formats ===
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.

Executables will have to accept the paths to the aforementioned list files as command line parameters.

Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.

==== Example submission calling formats ====

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt

Multi-processor compute nodes will be used to run this task; however, we ask that submissions use no more than 4 cores (as we will be running a lot of submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.

extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
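
For submissions written in Python, the single-executable calling format with a thread count could be parsed along the following lines. This is a sketch under stated assumptions: the program name <code>myAlgo</code> mirrors the example above, while the argument names <code>scratch_dir</code> and <code>list_files</code>, the default of one thread, and the restriction to 1, 2 or 4 threads are choices made here for illustration.

```python
import argparse


def build_parser():
    """Command line parser for a hypothetical single-executable submission."""
    parser = argparse.ArgumentParser(prog="myAlgo")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-extract", action="store_true")
    mode.add_argument("-TrainAndClassify", action="store_true")
    parser.add_argument("-numThreads", type=int, default=1, choices=[1, 2, 4])
    parser.add_argument("scratch_dir")
    # -extract takes one list file; -TrainAndClassify takes train, test, output.
    parser.add_argument("list_files", nargs="+")
    return parser


def parse(argv):
    """Parse argv and check the number of list files for the chosen mode."""
    parser = build_parser()
    args = parser.parse_args(argv)
    expected = 1 if args.extract else 3
    if len(args.list_files) != expected:
        parser.error(f"expected {expected} list file path(s)")
    return args
```

For example, <code>parse(["-extract", "-numThreads", "4", "/path/to/scratch/folder", "/path/to/featureExtractionListFile.txt"])</code> yields a namespace with <code>extract</code> set and <code>numThreads</code> equal to 4.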

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2006:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

=== Time and hardware limits ===
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.

=== Submission opening date ===

Friday 4th June 2010

=== Submission closing date ===

TBA

== Links to Previous MIREX Runs of These Classification Tasks ==

=== Audio Artist Identification ===
[[2009:Audio Artist Identification|Artist Identification in MIREX 2009]] || [[2009:Audio Classical Composer Identification Results|Results(Classical Composer)]]

[[2008:Audio Artist Identification|Artist Identification in MIREX 2008]] || [[2008:Audio Classical Composer Identification Results|Results(Classical Composer)]] || [[2008:Audio_Artist_Identification_Results|Results(Artist Identification)]]

[[2007:Audio_Artist_Identification|Artist Identification in MIREX 2007]] || [[2007:Audio_Artist_Identification_Results|Results]]

[[2007:Audio_Classical_Composer_Identification|Classical Composer Identification in MIREX 2007]] || [[2007:Audio_Classical_Composer_Identification_Results|Results]]

[[2005:Audio_Artist_Identification|Artist Identification in MIREX 2005]] || [https://www.music-ir.org/evaluation/mirex-results/audio-artist/index.html Results]

[http://ismir2004.ismir.net/genre_contest/index.htm Audio Artist Identification in ISMIR2004 Audio Description Contest]

=== Audio Genre Classification ===
[[2009:Audio_Genre_Classification|Audio Genre Classification in MIREX 2009]] || [[2009:Audio_Genre_Classification_(Latin_Set)_Results|Results(Latin Set)]] || [[2009:Audio_Genre_Classification_(Mixed_Set)_Results|Results(Mixed Set)]]

[[2008:Audio_Genre_Classification|Audio Genre Classification in MIREX 2008]] || [[2008:Audio_Genre_Classification_Results|Results]]

[[2007:Audio_Genre_Classification|Audio Genre Classification in MIREX 2007]] || [[2007:Audio_Genre_Classification_Results|Results]]

[[2005:Audio_Genre_Classification|Audio Genre Classification in MIREX 2005]] || [https://www.music-ir.org/evaluation/mirex-results/audio-genre/index.html Results]

[http://ismir2004.ismir.net/genre_contest/index.htm Audio Genre Classification in ISMIR2004 Audio Description Contest]

=== Audio Mood Classification ===
[[2009:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2009]] || [[2009:Audio_Music_Mood_Classification_Results|Results]]

[[2008:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2008]] || [[2008:Audio_Music_Mood_Classification_Results|Results]]

[[2007:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2007]] || [[2007:Audio_Music_Mood_Classification_Results|Results]]

2010:Audio Classification (Train/Test) Tasks

2010-06-03T15:46:54Z

Kriswest:

== Description ==
Many tasks in music classification can be characterized into a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2010, five classification sub-tasks are included:

*Audio Artist Identification
*Audio Classical Composer Identification
*Audio US Pop Music Genre Classification
*Audio Latin Music Genre Classification
*Audio Mood Classification

All five classification tasks were conducted in previous MIREX runs (please see [[#Links to Previous MIREX Runs of These Classification Tasks]]). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ([[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==

=== Audio Artist Identification ===
This dataset requires algorithms to classify music audio according to the performing artist. The collection used at MIREX 2009 will be re-used.

Collection statistics:
* 3150 30-second 22.05kHz mono wav audio clips drawn from a collection of US Pop music.
* 105 artists (30 clips per artist drawn from 3 albums).

=== Audio Classical Composer Identification ===
This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.

Collection statistics:
* 2772 30-second 22.05 kHz mono wav clips
* 11 "classical" composers (252 clips per composer), including:
** Bach
** Beethoven
** Brahms
** Chopin
** Dvorak
** Handel
** Haydn
** Mendelssohn
** Mozart
** Schubert
** Vivaldi

=== Audio US Pop Music Genre Classification ===
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.

Collection statistics:
* 7000 30-second audio clips in 22.05kHz mono WAV format
* 10 genres (700 clips from each genre), including:
** Blues
** Jazz
** Country/Western
** Baroque
** Classical
** Romantic
** Electronica
** Hip-Hop
** Rock
** HardRock/Metal

=== Audio Latin Music Genre Classification ===
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [http://ismir2008.ismir.net/papers/ISMIR2008_106.pdf] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.

Collection statistics:
* 3,227 audio files in 22.05kHz mono WAV format
* 10 Latin music genres, including:
** Axe
** Bachata
** Bolero
** Forro
** Gaucha
** Merengue
** Pagode
** Sertaneja
** Tango

=== Audio Mood Classification ===
This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production music sourced from the APM collection [http://www.apmmusic.com]). The MIREX 2007 Mood Classification dataset [http://ismir2008.ismir.net/papers/ISMIR2008_263.pdf] will be re-used.

Collection statistics:
* 600 30-second audio clips in 22.05kHz mono WAV format selected from the APM collection [http://www.apmmusic.com], and labeled by human judges using the Evalutron6000 system.
* 5 mood categories [http://ismir2007.ismir.net/proceedings/ISMIR2007_p067_hu.pdf] each of which contains 120 clips:
**Cluster_1: passionate, rousing, confident, boisterous, rowdy
**Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
**Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
**Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
**Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

== Audio Formats ==
For all datasets, participating algorithms will have to read audio in the following format:

* Sample rate: 22.05 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

== Evaluation ==
This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.

Participating algorithms will be evaluated with 3-fold cross validation. For '''Artist Identification''' and '''Classical Composer Classification''', album filtering will be applied to the test and training splits, i.e. training and test sets will contain tracks from different albums; for '''US Pop Genre Classification''' and '''Latin Genre Classification''', artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.

The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.

Classification accuracies will be tested for statistically significant differences using Friedman's ANOVA with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.

In addition, computation times for feature extraction and training/classification will be measured.

== Submission Format ==
=== File I/O Format ===
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:

==== Feature extraction list file ====
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Training list file ====
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.

I.e.
<example path and filename>\t<class label>

E.g.
/path/to/track1.wav rock
/path/to/track2.wav blues
...

==== Test (classification) list file ====
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Classification output file ====
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.

I.e.
<example path and filename>\t<class label>

E.g.
/path/to/track1.wav classical
/path/to/track2.wav blues
...

=== Submission calling formats ===
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.

Executables will have to accept the paths to the aforementioned list files as command line parameters.

Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.

==== Example submission calling formats ====

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt

Multi-processor compute nodes will be used to run this task; however, we ask that submissions use no more than 4 cores (as we will be running a lot of submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.

extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
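
For a single-executable submission, the calling formats above could be handled along the following lines. This is only a sketch in Python using the standard `argparse` module: the flag names `-extract`, `-TrainAndClassify` and `-numThreads` come from the examples above, while the function names, the default of one thread and the error messages are assumptions made for illustration.

```python
import argparse

# Number of list-file arguments each mode expects after the scratch folder.
EXPECTED_LIST_FILES = {"extract": 1, "train_and_classify": 3}


def build_parser():
    parser = argparse.ArgumentParser(prog="myAlgo.sh")
    # Exactly one mode flag must be given; both store into the same attribute.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-extract", dest="mode", action="store_const",
                      const="extract")
    mode.add_argument("-TrainAndClassify", dest="mode", action="store_const",
                      const="train_and_classify")
    # Assumption: default to a single thread when the flag is omitted.
    parser.add_argument("-numThreads", dest="num_threads", type=int, default=1)
    parser.add_argument("scratch_folder")
    parser.add_argument("list_files", nargs="+")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    # The task asks submissions to use no more than 4 cores.
    if not 1 <= args.num_threads <= 4:
        parser.error("-numThreads must be between 1 and 4")
    expected = EXPECTED_LIST_FILES[args.mode]
    if len(args.list_files) != expected:
        parser.error(f"mode {args.mode} expects {expected} list file path(s)")
    return args
```

In extract mode the single list file is the feature extraction list; in train-and-classify mode the three paths are the training list, the test list and the output list, in that order.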

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2006:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

=== Time and hardware limits ===
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.

=== Submission opening date ===

TBA

=== Submission closing date ===

TBA

== Links to Previous MIREX Runs of These Classification Tasks ==

=== Audio Artist Identification ===
[[2009:Audio Artist Identification|Artist Identification in MIREX 2009]] || [[2009:Audio Classical Composer Identification Results|Results(Classical Composer)]]

[[2008:Audio Artist Identification|Artist Identification in MIREX 2008]] || [[2008:Audio Classical Composer Identification Results|Results(Classical Composer)]] || [[2008:Audio_Artist_Identification_Results|Results(Artist Identification)]]

[[2007:Audio_Artist_Identification|Artist Identification in MIREX 2007]] || [[2007:Audio_Artist_Identification_Results|Results]]

[[2007:Audio_Classical_Composer_Identification|Classical Composer Identification in MIREX 2007]] || [[2007:Audio_Classical_Composer_Identification_Results|Results]]

[[2005:Audio_Artist_Identification|Artist Identification in MIREX 2005]] || [https://www.music-ir.org/evaluation/mirex-results/audio-artist/index.html Results]

[http://ismir2004.ismir.net/genre_contest/index.htm Audio Artist Identification in ISMIR2004 Audio Description Contest]

=== Audio Genre Classification ===
[[2009:Audio_Genre_Classification|Audio Genre Classification in MIREX 2009]] || [[2009:Audio_Genre_Classification_(Latin_Set)_Results|Results(Latin Set)]] || [[2009:Audio_Genre_Classification_(Mixed_Set)_Results|Results(Mixed Set)]]

[[2008:Audio_Genre_Classification|Audio Genre Classification in MIREX 2008]] || [[2008:Audio_Genre_Classification_Results|Results]]

[[2007:Audio_Genre_Classification|Audio Genre Classification in MIREX 2007]] || [[2007:Audio_Genre_Classification_Results|Results]]

[[2005:Audio_Genre_Classification|Audio Genre Classification in MIREX 2005]] || [https://www.music-ir.org/evaluation/mirex-results/audio-genre/index.html Results]

[http://ismir2004.ismir.net/genre_contest/index.htm Audio Genre Classification in ISMIR2004 Audio Description Contest]

=== Audio Mood Classification ===
[[2009:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2009]] || [[2009:Audio_Music_Mood_Classification_Results|Results]]

[[2008:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2008]] || [[2008:Audio_Music_Mood_Classification_Results|Results]]

[[2007:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2007]] || [[2007:Audio_Music_Mood_Classification_Results|Results]]

2010:Audio Music Similarity and Retrieval

2010-06-02T14:39:18Z

Kriswest: /* Distance matrix output files */

== Description ==
As the size of digital music collections grows, music similarity has an increasingly important role as an aid to music discovery. A music similarity system can help a music consumer find new music by finding the music that is most musically similar to specific query songs (or is nearest to songs that the consumer already likes).

This page presents the Audio Music Similarity Evaluation, including the submission rules and formats. Additional background information that should help explain some of the reasoning behind the approach taken in the evaluation can also be found here. The intention of the Music Audio Search track is to evaluate music similarity searches (a music search engine that takes a single song as a query, also known as query-by-example), not playlist generation or music recommendation.

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ([[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==
Collection statistics: 7000 30-second audio clips drawn from 10 genres (700 clips from each genre).

The Genres that data was drawn from are:
*Blues
*Jazz
*Country/Western
*Baroque
*Classical
*Romantic
*Electronica
*Hip-Hop
*Rock
*HardRock/Metal

=== Audio formats ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 22 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV
* clip length: 30 secs from the middle of each file

== Evaluation ==
Two distinct evaluations will be performed:
* Human Evaluation
* Objective statistics derived from the results lists

Note that at MIREX 2006 participating algorithms were required to return full distance matrices showing the distance between all tracks. In subsequent years we have also supported a sparse distance matrix format (detailed below), where only the distances of the top 100 results for each query in the collection are returned.

=== Human Evaluation ===
The primary evaluation will involve subjective judgments by human evaluators of the retrieved sets using IMIRSEL's Evalutron 6000 system. This year algorithms will be presented with the same 30 second preview clip that will be reviewed by the human evaluators.

* Evaluator question: Given a search based on track A, the following set of results was returned by all systems. Please place each returned track into one of three classes (not similar, somewhat similar, very similar) and provide an indication on a continuous scale of 0 - 10 of how similar the track is to the query.
* ~120 randomly selected queries, 5 results per query, 1 set of eyes, ~10 participating labs
* Higher number of queries preferred as IR research indicates variance is in queries
* Songs by the same artist as the query will be filtered out of each result list (artist-filtering) to avoid colouring an evaluator's judgement (a cover song or song by the same artist in a result list is likely to reduce the relative ranking of other similar but independent songs - use of songs by the same artist may allow over-fitting to affect the results)
* It will be possible for researchers to use this data for other types of system comparisons after MIREX 2007 results have been finalized.
* Human evaluation to be designed and led by IMIRSEL following a similar format to that used at MIREX 2006
* Human evaluators will be drawn from the participating labs (and any volunteers from IMIRSEL or on the MIREX lists)

=== Objective Statistics derived from the distance matrix ===
Statistics of each distance matrix will be calculated including:

* Average % of Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Precision at 5, 10, 20 & 50
* Average % of Genre matches in the top 5, 10, 20 & 50 results after artist filtering of results
* Average % of available Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Recall at 5, 10, 20 & 50 (just normalising scores when less than 20 matches for an artist, album or genre are available in the database)
* Always similar - Maximum # times a file was in the top 5, 10, 20 & 50 results
* % File never similar (never in a top 5, 10, 20 & 50 result list)
* % of 'test-able' song triplets where triangular inequality holds
** Note that as we are not requiring full distance matrices this year we will only be testing triangles that are found in the sparse distance matrix.
* Plot of the "number of times similar curve" - plot of song number vs. number of times it appeared in a top 20 list with songs sorted according to number times it appeared in a top 20 list (to produce the curve). Systems with a sharp rise at the end of this plot have "hubs", while a long 'zero' tail shows many never similar results.
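
Two of the statistics listed above can be sketched as follows. This is an illustrative Python outline, not the evaluation code used by IMIRSEL: the function names, the dictionary-based inputs (`results` mapping each query to its ranked result list, `genre` and `artist` mapping each track to its metadata) and the choice to divide by the number of results actually available are all assumptions.

```python
from collections import Counter


def genre_precision_at_k(results, genre, artist, k, artist_filter=True):
    """Mean fraction of the top-k results that share the query's genre."""
    scores = []
    for query, ranked in results.items():
        candidates = [r for r in ranked if r != query]
        if artist_filter:
            # Artist filtering: drop results by the same artist as the query.
            candidates = [r for r in candidates if artist[r] != artist[query]]
        top = candidates[:k]
        if not top:
            continue  # assumption: queries with no remaining results are skipped
        matches = sum(1 for r in top if genre[r] == genre[query])
        scores.append(matches / len(top))
    return sum(scores) / len(scores) if scores else 0.0


def times_similar(results, k):
    """Count how often each track appears in another track's top-k list."""
    counts = Counter({track: 0 for track in results})
    for query, ranked in results.items():
        for result in ranked[:k]:
            if result != query:
                counts[result] += 1
    return counts
```

With these counts, the "always similar" figure is `max(counts.values())`, the "never similar" percentage is the share of tracks whose count is zero, and sorting the counts gives the "number of times similar" curve.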

=== Runtimes ===
In addition, computation times for feature extraction/index-building and querying will be measured.

== Submission format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants may attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

Submissions will have to output either a full distance matrix or a search results file with the top 100 search results for each track in the collection. This list of results will be used to extract the artist-filtered results to present to the human evaluators and will facilitate the computation of the objective statistics.

=== I/O formats ===
This section describes the input and output files used in this task, as well as the command line calling format requirements for submissions.

==== Audio collection list file (input)====
The list file passed for feature extraction and indexing will be a simple ASCII list file. This file will contain one path per line with no header line. All paths will be absolute (full paths).

e.g.

/aDirectory/collectionFolder/b002342.wav
/aDirectory/collectionFolder/a005921.wav
...

==== Distance matrix output files ====
Participants should return one of two available output file formats, a full distance matrix or a sparse distance matrix. The sparse distance matrix format is preferred (as the dense distance matrices can be very large).

===== Sparse Distance Matrix =====
If computation or exhaustive search is a concern, or a full distance matrix is not a normal output of the indexing algorithm employed, the sparse distance matrix format detailed below may be used:

A simple ASCII file listing a name for the algorithm and the top 100 search results for every track in the collection.

This file should start with a header line with a name for the algorithm and should be followed by the results for one query per line, prefixed by the filename portion of the query path. This should be followed by a tab character and a tab separated, ordered list of the top 100 search results. Each result should include the result filename (e.g. a034728.wav) and the distance (e.g. 17.1 or 0.23) separated by a comma.

<pre>
MyAlgorithm (my.email@address.com)
<example 1 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance>\t ... \t<result 100 name>,<result 100 distance>
<example 2 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance>\t ... \t<result 100 name>,<result 100 distance>
...
</pre>

which might look like:

<pre>
MyAlgorithm (my.email@address.com)
a009342.wav b229311.wav,0.16 a023821.wav,0.19 a001329.wav,0.24 ... etc.
a009343.wav a661931.wav,0.12 a043322.wav,0.17 c002346.wav,0.21 ... etc.
a009347.wav a671239.wav,0.13 c112393.wav,0.20 b083293.wav,0.25 ... etc.
...
</pre>

The path to which this list file should be written must be accepted as a parameter on the command line.
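
A sparse distance matrix file in the layout of the example above could be produced roughly as follows. This Python sketch is an illustration only: `write_sparse_matrix` is a hypothetical name, the `results` argument (a mapping from query path to a list of `(result_path, distance)` pairs) is an assumed in-memory representation, and the `g` number format is one arbitrary choice among many.

```python
import os


def write_sparse_matrix(output_path, algorithm_name, results, top_n=100):
    """Write a header line, then one tab-separated result line per query."""
    with open(output_path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(algorithm_name + "\n")
        for query_path, neighbours in results.items():
            # Order results by increasing distance and keep only the top N.
            ranked = sorted(neighbours, key=lambda pair: pair[1])[:top_n]
            # Only the filename portion of each path is written.
            fields = [os.path.basename(query_path)]
            fields += [f"{os.path.basename(path)},{distance:g}"
                       for path, distance in ranked]
            handle.write("\t".join(fields) + "\n")
```

The output path would be taken from the command line, as required above.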

===== Full Distance Matrix =====
Full distance matrix files should be generated in the following format:

* A simple ASCII file listing a name for the algorithm on the first line.
* Numbered paths for each file appearing in the matrix. These can be in any order (i.e. the files don't have to be in the same order as they appeared in the list file) but should index into the columns/rows of the distance matrix.
* A line beginning with 'Q/R' followed by a tab and a tab separated list of the numbers 1 to N, where N is the number of files covered by the matrix.
* One line per file in the matrix, giving the distances of that file to each other file in the matrix. All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a single tab character. The diagonal of the matrix (the distance of a track to itself) should be zero.

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/1.wav>
2\t</path/to/audio/file/2.wav>
3\t</path/to/audio/file/3.wav>
...
N\t</path/to/audio/file/N.wav>
Q/R\t1\t2\t3\t...\tN
1\t0.0\t<dist 1 to 2>\t<dist 1 to 3>\t...\t<dist 1 to N>
2\t<dist 2 to 1>\t0.0\t<dist 2 to 3>\t...\t<dist 2 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t0.0\t...\t<dist 3 to N>
...\t...\t...\t...\t...\t...
N\t<dist N to 1>\t<dist N to 2>\t<dist N to 3>\t...\t0.0
</pre>

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/1.wav
2 /path/to/audio/file/2.wav
3 /path/to/audio/file/3.wav
4 /path/to/audio/file/4.wav
Q/R 1 2 3 4
1 0.00000 1.24100 0.2e-4 0.42559
2 1.24100 0.00000 0.62640 0.23564
3 50.2e-4 0.62640 0.00000 0.38000
4 0.42559 0.23567 0.38000 0.00000
</pre>
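
The full distance matrix layout, together with the constraints on its values, can be sketched in Python as below. The function name `write_full_matrix`, the list-of-lists representation of the matrix and the five-decimal number format are assumptions for illustration; only the file layout and the value constraints are taken from the description above.

```python
import math


def write_full_matrix(output_path, header, paths, distances):
    """Write a header, numbered paths, a Q/R line and an N x N matrix."""
    n = len(paths)
    if len(distances) != n or any(len(row) != n for row in distances):
        raise ValueError("distance matrix must be N x N")
    for i, row in enumerate(distances):
        for value in row:
            # Distances must be finite and non-negative.
            if not math.isfinite(value) or value < 0:
                raise ValueError(f"invalid distance {value!r} in row {i + 1}")
        if row[i] != 0:
            raise ValueError(f"diagonal entry in row {i + 1} must be zero")
    with open(output_path, "w", encoding="ascii", newline="\n") as handle:
        handle.write(header + "\n")
        for index, path in enumerate(paths, start=1):
            handle.write(f"{index}\t{path}\n")
        numbers = "\t".join(str(i) for i in range(1, n + 1))
        handle.write("Q/R\t" + numbers + "\n")
        for index, row in enumerate(distances, start=1):
            values = "\t".join(f"{value:.5f}" for value in row)
            handle.write(f"{index}\t{values}\n")
```

Validating before writing means a matrix containing NaN or infinite values is rejected rather than written to a file that would fail evaluation.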

==== Example submission calling formats ====
extractFeatures.sh /path/to/scratch/folder /path/to/collectionListFile.txt
Query.sh /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

or

doAudioSim.sh -numThreads 8 /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

2009:Music Structure Segmentation Results

2010-06-01T20:41:01Z

Kriswest: /* Individual Participant Results */

==Introduction==
This task concerns itself with analyzing the structure of music audio files, and labeling the corresponding segments, e.g. {verse, chorus, bridge, etc}, {A, B, C, etc.}. A more detailed description can be found at the task page [[2009:Structural_Segmentation]]. The dataset consists of 297 popular music songs.

===General Legend===
====Team ID====

'''ANO1''' = [https://www.music-ir.org/mirex/abstracts/2009/Ano.pdf Anonymous] 
'''ANO2''' = [https://www.music-ir.org/mirex/abstracts/2009/Ano.pdf Anonymous] 
'''MND''' = [https://www.music-ir.org/mirex/abstracts/2009/ACD_SS_mauch.pdf Matthias Mauch, Katy Noland, Simon Dixon] 
'''PK''' = [https://www.music-ir.org/mirex/abstracts/2009/PK.pdf Jouni Paulus, Anssi Klapuri] 
'''GP''' = [https://www.music-ir.org/mirex/abstracts/2009/Peeters_2009_MIREX_structure.pdf Geoffroy Peeters ] 

====Evaluation Measures====
'''overSegScore''' - normalised conditional entropy based over-segmentation score, S_o from [http://ismir2008.ismir.net/papers/ISMIR2008_219.pdf Lukashevich ISMIR2008] 
'''underSegScore''' - normalised conditional entropy based under-segmentation score, S_u from [http://ismir2008.ismir.net/papers/ISMIR2008_219.pdf Lukashevich ISMIR2008] 
'''pwF''' - frame pair clustering F-measure from [http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008] 
'''pwPrecision''' - frame pair clustering precision rate from [http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008] 
'''pwRecall''' - frame pair clustering recall rate from [http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008] 
'''R''' - Rand clustering index from [http://www.springerlink.com/content/x64124718341j1j0/fulltext.pdf Hubert & Arabie, "Comparing partitions", Journal of Classification, 1985] 
'''Fmeasure@[0.5, 3]s''' - segment boundary recovery evaluation measure. claimed boundary is accepted if it is within the specified window length from a true boundary, overall F-measure for boundary recovery 
'''precRate@[0.5, 3]s''' - segment boundary recovery precision rate 
'''recRate@[0.5, 3]s''' - segment boundary recovery recall rate 
'''medianTrue2claim''' - median distance from an annotated segment boundary to the closest found boundary, seconds 
'''medianClaim2true''' - median distance from a found segment boundary to the closest annotated one, seconds 

The calculation of the measures is described in [[2009:Structural_Segmentation#Evaluation_Measures]].

===MIREX 2009 Music Structure Summary Results - Mean of all Measures===

<csv>2009/structure/structure.summary.csv</csv>

===MIREX 2009 Music Structure Summary Runtime Data===
<csv>2009/structure/structure.runtime.csv</csv>

===Individual Participant Results===
*[[2009:Music_Structure_Segmentation_Results:_AN01]]
*[[2009:Music_Structure_Segmentation_Results:_AN02]]
*[[2009:Music_Structure_Segmentation_Results:_GP]]
*[[2009:Music_Structure_Segmentation_Results:_MND]]
*[[2009:Music_Structure_Segmentation_Results:_PK]]

2009:Music Structure Segmentation Results

2010-06-01T20:40:39Z

Kriswest: /* Individual Participant Results */

==Introduction==
This task concerns itself with analyzing the structure of music audio files, and labeling the corresponding segments, e.g. {verse, chorus, bridge, etc}, {A, B, C, etc.}. A more detailed description can be found at the task page [[2009:Structural_Segmentation]]. The dataset consists of 297 popular music songs.

===General Legend===
====Team ID====

'''ANO1''' = [https://www.music-ir.org/mirex/abstracts/2009/Ano.pdf Anonymous] 
'''ANO2''' = [https://www.music-ir.org/mirex/abstracts/2009/Ano.pdf Anonymous] 
'''MND''' = [https://www.music-ir.org/mirex/abstracts/2009/ACD_SS_mauch.pdf Matthias Mauch, Katy Noland, Simon Dixon] 
'''PK''' = [https://www.music-ir.org/mirex/abstracts/2009/PK.pdf Jouni Paulus, Anssi Klapuri] 
'''GP''' = [https://www.music-ir.org/mirex/abstracts/2009/Peeters_2009_MIREX_structure.pdf Geoffroy Peeters ] 

====Evaluation Measures====
'''overSegScore''' - normalised conditional entropy based over-segmentation score, S_o from [http://ismir2008.ismir.net/papers/ISMIR2008_219.pdf Lukashevich ISMIR2008] 
'''underSegScore''' - normalised conditional entropy based under-segmentation score, S_u from [http://ismir2008.ismir.net/papers/ISMIR2008_219.pdf Lukashevich ISMIR2008] 
'''pwF''' - frame pair clustering F-measure from [http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008] 
'''pwPrecision''' - frame pair clustering precision rate from [http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008] 
'''pwRecall''' - frame pair clustering recall rate from [http://dx.doi.org/10.1109/TASL.2007.910781 Levy & Sandler TASLP2008] 
'''R''' - Rand clustering index from [http://www.springerlink.com/content/x64124718341j1j0/fulltext.pdf Hubert & Arabie, "Comparing partitions", Journal of Classification, 1985] 
'''Fmeasure@[0.5, 3]s''' - segment boundary recovery evaluation measure. claimed boundary is accepted if it is within the specified window length from a true boundary, overall F-measure for boundary recovery 
'''precRate@[0.5, 3]s''' - segment boundary recovery precision rate 
'''recRate@[0.5, 3]s''' - segment boundary recovery recall rate 
'''medianTrue2claim''' - median distance from an annotated segment boundary to the closest found boundary, seconds 
'''medianClaim2true''' - median distance from a found segment boundary to the closest annotated one, seconds 

The calculation of the measures is described in [[2009:Structural_Segmentation#Evaluation_Measures]].

===MIREX 2009 Music Structure Summary Results - Mean of all Measures===

<csv>2009/structure/structure.summary.csv</csv>

===MIREX 2009 Music Structure Summary Runtime Data===
<csv>2009/structure/structure.runtime.csv</csv>

===Individual Participant Results===
*[[2009:Music_Structure_Segmentation_Results:_AN01]]
*[[2009:Music_Structure_Segmentation_Results:_AN02]]
*[[2009:Music_Structure_Segmentation_Results:_GP]]
*[[2009:Music_Structure_Segmentation_Results:_MND]]
*[[2009:Music_Structure_Segmentation_Results:_PK]]

2010:Audio Music Similarity and Retrieval

2010-05-28T22:10:28Z

Kriswest: /* Sparse Distance Matrix */

== Description ==
As the size of digital music collections grows, music similarity has an increasingly important role as an aid to music discovery. A music similarity system can help a music consumer find new music by finding the music that is most musically similar to specific query songs (or is nearest to songs that the consumer already likes).

This page presents the Audio Music Similarity Evaluation, including the submission rules and formats. Additional background information that should help explain some of the reasoning behind the approach taken in the evaluation can also be found here. The intention of the Music Audio Search track is to evaluate music similarity searches (a music search engine that takes a single song as a query, also known as query-by-example), not playlist generation or music recommendation.

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ([[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==
Collection statistics: 7000 30-second audio clips drawn from 10 genres (700 clips from each genre).

The Genres that data was drawn from are:
*Blues
*Jazz
*Country/Western
*Baroque
*Classical
*Romantic
*Electronica
*Hip-Hop
*Rock
*HardRock/Metal

=== Audio formats ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 22 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV
* clip length: 30 secs from the middle of each file

== Evaluation ==
Two distinct evaluations will be performed:
* Human Evaluation
* Objective statistics derived from the results lists

Note that at MIREX 2006 participating algorithms were required to return full distance matrices showing the distance between all tracks. In subsequent years we have also supported a sparse distance matrix format (detailed below), where only the distances of the top 100 results for each query in the collection are returned.

=== Human Evaluation ===
The primary evaluation will involve subjective judgments by human evaluators of the retrieved sets using IMIRSEL's Evalutron 6000 system. This year algorithms will be presented with the same 30 second preview clip that will be reviewed by the human evaluators.

* Evaluator question: Given a search based on track A, the following set of results was returned by all systems. Please place each returned track into one of three classes (not similar, somewhat similar, very similar) and provide an indication on a continuous scale of 0 - 10 of how similar the track is to the query.
* ~120 randomly selected queries, 5 results per query, 1 set of eyes, ~10 participating labs
* Higher number of queries preferred as IR research indicates variance is in queries
* Songs by the same artist as the query will be filtered out of each result list (artist-filtering) to avoid colouring an evaluator's judgement (a cover song or song by the same artist in a result list is likely to reduce the relative ranking of other similar but independent songs - use of songs by the same artist may allow over-fitting to affect the results)
* It will be possible for researchers to use this data for other types of system comparisons after MIREX 2007 results have been finalized.
* Human evaluation to be designed and led by IMIRSEL following a similar format to that used at MIREX 2006
* Human evaluators will be drawn from the participating labs (and any volunteers from IMIRSEL or on the MIREX lists)

=== Objective Statistics derived from the distance matrix ===
Statistics of each distance matrix will be calculated including:

* Average % of Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Precision at 5, 10, 20 & 50
* Average % of Genre matches in the top 5, 10, 20 & 50 results after artist filtering of results
* Average % of available Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Recall at 5, 10, 20 & 50 (just normalising scores when less than 20 matches for an artist, album or genre are available in the database)
* Always similar - Maximum # times a file was in the top 5, 10, 20 & 50 results
* % File never similar (never in a top 5, 10, 20 & 50 result list)
* % of 'test-able' song triplets where triangular inequality holds
** Note that as we are not requiring full distance matrices this year we will only be testing triangles that are found in the sparse distance matrix.
* Plot of the "number of times similar curve" - plot of song number vs. number of times it appeared in a top 20 list with songs sorted according to number times it appeared in a top 20 list (to produce the curve). Systems with a sharp rise at the end of this plot have "hubs", while a long 'zero' tail shows many never similar results.

=== Runtimes ===
In addition, computation times for feature extraction/index-building and querying will be measured.

== Submission format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants may attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

Submissions will have to output either a full distance matrix or a search results file with the top 100 search results for each track in the collection. This list of results will be used to extract the artist-filtered results to present to the human evaluators and will facilitate the computation of the objective statistics.

=== I/O formats ===
This section describes the input and output files used in this task, as well as the command line calling format requirements for submissions.

==== Audio collection list file (input)====
The list file passed for feature extraction and indexing will be a simple ASCII list file. This file will contain one path per line with no header line. All paths will be absolute (full paths).

e.g.

/aDirectory/collectionFolder/b002342.wav
/aDirectory/collectionFolder/a005921.wav
...

==== Distance matrix output files ====
Participants should return one of two available output file formats, a full distance matrix or a sparse distance matrix.

===== Full Distance Matrix =====
Full distance matrix files should be generated in the following format:

* A simple ASCII file listing a name for the algorithm on the first line.
* Numbered paths for each file appearing in the matrix. These can be in any order (i.e. the files don't have to be in the same order as they appeared in the list file) but should index into the columns/rows of the distance matrix.
* A line beginning with 'Q/R' followed by a tab and a tab separated list of the numbers 1 to N, where N is the number of files covered by the matrix.
* One line per file in the matrix, giving the distances of that file to each other file in the matrix. All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a single tab character. The diagonal of the matrix (the distance of a track to itself) should be zero.

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/1.wav>
2\t</path/to/audio/file/2.wav>
3\t</path/to/audio/file/3.wav>
...
N\t</path/to/audio/file/N.wav>
Q/R\t1\t2\t3\t...\tN
1\t0.0\t<dist 1 to 2>\t<dist 1 to 3>\t...\t<dist 1 to N>
2\t<dist 2 to 1>\t0.0\t<dist 2 to 3>\t...\t<dist 2 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t0.0\t...\t<dist 3 to N>
...\t...\t...\t...\t...\t...
N\t<dist N to 1>\t<dist N to 2>\t<dist N to 3>\t...\t0.0
</pre>

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/1.wav
2 /path/to/audio/file/2.wav
3 /path/to/audio/file/3.wav
4 /path/to/audio/file/4.wav
Q/R 1 2 3 4
1 0.00000 1.24100 0.2e-4 0.42559
2 1.24100 0.00000 0.62640 0.23564
3 50.2e-4 0.62640 0.00000 0.38000
4 0.42559 0.23567 0.38000 0.00000
</pre>
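
As an illustration only, the layout above could be produced along the following lines. This is a minimal sketch, not part of the task specification: the function name <code>format_full_matrix</code>, its arguments and the five-decimal number formatting are assumptions made for the example.

```python
import math


def format_full_matrix(name, paths, dist):
    """Return the text of a full distance matrix file (hypothetical helper).

    name  -- header text naming the algorithm
    paths -- list of N audio file paths
    dist  -- N x N list of lists of distances, indexed like paths
    """
    n = len(paths)
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError("dist must be an N x N matrix")
    lines = [name]
    # Numbered paths: number i refers to row/column i of the matrix.
    for i, path in enumerate(paths, start=1):
        lines.append(f"{i}\t{path}")
    # Header row of column indices.
    lines.append("Q/R\t" + "\t".join(str(i) for i in range(1, n + 1)))
    for i, row in enumerate(dist, start=1):
        for d in row:
            # The format forbids negative, infinite and NaN distances.
            if not math.isfinite(d) or d < 0:
                raise ValueError("distances must be finite and non-negative")
        lines.append(f"{i}\t" + "\t".join(f"{d:.5f}" for d in row))
    return "\n".join(lines) + "\n"
```

The returned string can then be written to the output path given on the command line.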

===== Sparse Distance Matrix =====
If computation or exhaustive search is a concern, or a full matrix is not a normal output of the indexing algorithm employed, the sparse distance matrix format detailed below may be used:

A simple ASCII file listing a name for the algorithm and the top 100 search results for every track in the collection.

This file should start with a header line with a name for the algorithm and should be followed by the results for one query per line, prefixed by the filename portion of the query path. This should be followed by a tab character and a tab-separated, ordered list of the top 100 search results. Each result should include the result filename (e.g. a034728.wav) and the distance (e.g. 17.1 or 0.23) separated by a comma.

<pre>
MyAlgorithm (my.email@address.com)
<example 1 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance>\t ... \t<result 100 name>,<result 100 distance>
<example 2 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance>\t ... \t<result 100 name>,<result 100 distance>
...
</pre>

which might look like:

<pre>
MyAlgorithm (my.email@address.com)
a009342.wav b229311.wav,0.16 a023821.wav,0.19 a001329,0.24 ... etc.
a009343.wav a661931.wav,0.12 a043322.wav,0.17 c002346,0.21 ... etc.
a009347.wav a671239.wav,0.13 c112393.wav,0.20 b083293,0.25 ... etc.
...
</pre>

The path to which this list file should be written must be accepted as a parameter on the command line.
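
For illustration, a sparse results file of this shape might be assembled as sketched below. The function name <code>format_sparse_results</code> and the dictionary-based input are assumptions made for the example, as is the choice to sort results by ascending distance before truncating to the top 100.

```python
def format_sparse_results(name, results, top_n=100):
    """Return the text of a sparse results file (hypothetical helper).

    name    -- header line naming the algorithm
    results -- dict mapping query filename to a list of
               (result filename, distance) pairs
    top_n   -- number of results to keep per query
    """
    lines = [name]
    for query, hits in results.items():
        # Closest results first, keeping only the top_n entries.
        ranked = sorted(hits, key=lambda hit: hit[1])[:top_n]
        fields = [f"{fname},{dist}" for fname, dist in ranked]
        # Query filename, then tab-separated "name,distance" pairs.
        lines.append("\t".join([query] + fields))
    return "\n".join(lines) + "\n"
```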

==== Example submission calling formats ====
extractFeatures.sh /path/to/scratch/folder /path/to/collectionListFile.txt
Query.sh /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

or

doAudioSim.sh -numThreads 8 /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of
dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following
information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks,
hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

2010:Symbolic Melodic Similarity

2010-05-28T17:13:26Z

Kriswest: Light reorganisation and copy changes

== Description ==
The goal of SMS is to retrieve the most similar items from a collection of symbolic pieces, given a symbolic query, and rank them by melodic similarity. There will be only 1 task this year which comprises a set of six "base" monophonic MIDI queries to be matched against a monophonic MIDI collection.

Each system will be given a query and is asked to return the 10 most melodically similar songs from those taken from the Essen Collection (5274 pieces in the MIDI format; see [http://www.esac-data.org/ ESAC Data Homepage] for more information). For each of the six "base" queries, we have created four classes of error-mutations, thus the query set comprises the following query classes:

# No errors (i.e., "base")
# One note deleted
# One note inserted
# One interval enlarged
# One interval compressed

Each system will be asked to return the top ten items for each of the 30 total queries. That is to say, 6(base queries) X 5(versions) = 30 query/candidate lists to be returned.

== Data ==
* 5,274 tunes belonging to the Essen folksong collection. The tunes are in standard MIDI file format. [http://www.ldc.usb.ve/~cgomez/essen.tar.gz Download] (< 1 MB)

==Evaluation ==

The 2010 SMS task replicates the 2007 task. After the algorithms have been submitted, their results will be pooled for every query, and human evaluators, using the Evalutron 6000 system, will be asked to judge the relevance of the matches to the queries.

For each query (and its four mutations), the returned results (candidates) from all systems will be anonymously grouped together (query set) for evaluation by the human graders. The graders will be provided with only the "base" perfect version against which to evaluate the candidates and thus will not know whether the candidates came from a perfect or mutated query. We expect that each query/candidate set will be evaluated by one individual grader. Using the Evalutron 6000 system, the graders will give each query/candidate pair two types of scores. Graders will be asked to provide one "BROAD" categorical score with three categories (NS, SS, VS, as explained below) and one "FINE" score (in the range from 0 to 10).

For more information, do take a look at the [[2007:Symbolic_Melodic_Similarity_Results |2007 SMS Results Page]].

== Submission Format ==

=== Input ===

Parameters: 
- the name of a directory containing about 5,000 MIDI files containing monophonic folk songs and 
- the name of one MIDI file containing a monophonic query.

E.g.
myAlgo.sh /path/to/folder/withMIDIfile/ /path/to/query.mid

The program will be called once for each query.

=== Output ===

A list of the names of the 10 most similar matching MIDI files, ordered by melodic similarity. Write the file name in separate lines, without empty lines in between.

E.g.
query1.mid song242.mid song213.mid song1242.mid ...
query2.mid song5454.mid song423.mid song454.mid ...
...

E.g.
query1.mid,song242.mid,song213.mid,song1242.mid ...
query2.mid,song5454.mid,song423.mid,song454.mid ...
...

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2006:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding the running of your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

2010:Multiple Fundamental Frequency Estimation & Tracking

2010-05-28T17:05:42Z

Kriswest: teaking layout and text for consistency with other proposals

2010:Audio Chord Estimation

2010-05-28T16:30:59Z

Kriswest: Proofing and reordering full proposal text

== Description ==
This task requires participants to extract or transcribe a sequence of chords from an audio music recording. For many applications in music information retrieval, extracting the harmonic structure of an audio track is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music.

The extraction of the harmonic structure requires the detection of as many chords as possible in a piece. That includes the characterisation of chords with a key and type as well as a chronological order with onset and duration of the chords.

Although some publications are available on this topic [1,2,3,4,5], comparison of the results is difficult, because different measures are used to assess the performance. To overcome this problem an accurately defined methodology is needed. This includes a repertory of the findable chords, a defined test set along with ground truth and unambiguous calculation rules to measure the performance.

== Data ==
Two datasets are used to evaluate chord transcription accuracy:

=== Beatles dataset ===
Christopher Harte's Beatles dataset consisting of annotations of 12 Beatles albums.

The text annotation procedure of musical chords that was used to produce this dataset is presented in [6].

=== Queen and Zweieck dataset ===
Matthias Mauch's Queen and Zweieck dataset consisting of 38 songs from Queen and Zweieck.

===Example ground-truth file ===
The ground-truth files take the form:

...
41.2631021 44.2456460 B
44.2456460 45.7201130 E
45.7201130 47.2061900 E:7/3
47.2061900 48.6922670 A
48.6922670 50.1551240 A:min/b3
...
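
A minimal sketch of reading such a file is given below. The function name <code>parse_chord_file</code> is an assumption made for the example; it simply splits each non-empty line into a start time, an end time and a chord label.

```python
def parse_chord_file(text):
    """Parse 'start_time end_time chord_label' lines (hypothetical helper).

    Returns a list of (start, end, label) tuples with times in seconds.
    """
    segments = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on any whitespace; the label is the third field.
        start, end, label = line.split(None, 2)
        segments.append((float(start), float(end), label))
    return segments
```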

== Evaluation ==

=== Segmentation Score ===

The segmentation score will be calculated using the directional Hamming distance as described in [8]. An over-segmentation value (m) and an under-segmentation value (f) will be calculated and the final segmentation score will be calculated using the worst case of these two, i.e.:

segmentation score = 1 - max(m,f)

m and f are not independent of each other so combining them this way ensures that a good score in one does not hide a bad score in the other. The combined segmentation score can take values between 0 and 1 with 0 being the worst and 1 being the best result.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)
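
The combination rule can be sketched as follows. This is an illustration under stated assumptions, not the evaluation code: it uses one common formulation of the directional Hamming distance (for each segment of one segmentation, the duration not covered by its best-overlapping segment in the other, normalised by total duration), and the exact definitions of m and f should be taken from [8]. Because the score takes the maximum of the two directions, it does not depend on which direction is labelled m and which f. The function names are hypothetical.

```python
def directional_hamming(segs_a, segs_b):
    """Normalised directional Hamming distance from segs_a to segs_b.

    Each argument is a list of (start, end) pairs in seconds. For every
    segment in segs_a, the duration not covered by its best-overlapping
    segment in segs_b is accumulated, then divided by the total duration.
    """
    total = sum(end - start for start, end in segs_a)
    if total <= 0:
        raise ValueError("segs_a must have positive total duration")
    uncovered = 0.0
    for a_start, a_end in segs_a:
        best = max(
            (max(0.0, min(a_end, b_end) - max(a_start, b_start))
             for b_start, b_end in segs_b),
            default=0.0,
        )
        uncovered += (a_end - a_start) - best
    return uncovered / total


def segmentation_score(reference, estimate):
    """Return 1 - max of the two directional distances (worst case)."""
    d_ref_to_est = directional_hamming(reference, estimate)
    d_est_to_ref = directional_hamming(estimate, reference)
    return 1.0 - max(d_ref_to_est, d_est_to_ref)
```

For example, an estimate that splits a single 10-second reference segment into two 5-second halves is penalised in one direction only, and that direction determines the score.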

=== Frame-based recall ===

For recall evaluation, we may define a different chord dictionary for each level of evaluation (dyads, triads, tetrads etc). Each dictionary is a text file containing chord shorthands / interval lists of the chords that will be considered in that evaluation. The following dictionaries are proposed:

For dyad comparison of major/minor chords only:

N 
X:maj 
X:min 

For comparison of standard triad chords:

N 
X:maj 
X:min 
X:aug 
X:dim 
X:sus2 
X:sus4 

For comparison of tetrad (quad) chords:

N 
X:maj 
X:min 
X:aug 
X:dim 
X:sus2 
X:sus4 
X:maj7 
X:7 
X:maj(9) 
X:aug(7) 
X:min(7) 
X:min7 
X:min(9) 
X:dim(7) 
X:hdim7 
X:sus4(7) 
X:sus4(b7) 
X:dim7 

For each evaluation level, the ground truth annotation is compared against the dictionary. Any chord label not belonging to the current dictionary will be replaced with an "X" in a local copy of the annotation and will not be included in the recall calculation.

Note that the level of comparison in terms of intervals can be varied. For example, in a triad evaluation we can consider the first three component intervals in the chord so that a major (1,3,5) and a major7 (1,3,5,7) will be considered the same chord. For a tetrad (quad) evaluation, we would consider the first 4 intervals so major and major7 would then be considered to be different chords.

For the maj/min evaluation (using the first example dictionary), using an interval comparison of 2 (dyad) will compare only the first two intervals of each chord label. This would map augmented and diminished chords to major and minor respectively (and any other symbols that had a major 3rd or minor 3rd as their first interval). Using an interval comparison of 3 with the same dictionary would keep only those chords that have major and minor triads as their first 3 intervals so augmented and diminished chords would be removed from the evaluation.

After the annotation has been "filtered" using a given dictionary, it can be compared against the machine generated estimates output by the algorithm under test. The chord sequences described in the annotation and estimate text files are sampled at a given frame rate (in this case 10ms per frame) to give two sequences of chord frames which may be compared directly with each other. For calculating a hit or a miss, the chord labels from the current frame in each sequence will be compared. Chord comparison is done by converting each chord label into an ordered list of pitch classes then comparing the two lists element by element. If the lists match to the required number of intervals then a hit is recorded, otherwise the estimate is considered a miss. It should be noted that, by converting to pitch classes in the comparison, this evaluation ignores enharmonic pitch and interval spellings so the following chords (slightly silly example just for illustration) will all evaluate as identical:

C:maj = Dbb:maj = C#:(b1,b3,#4)
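
The equivalence above can be checked with a small sketch that converts a label to pitch classes. This is a simplified illustration, not the evaluation code: the names are hypothetical, and it handles only triad shorthands and explicit interval lists (no "N" label, no bass notes such as "/3", no extended shorthands).

```python
import re

NATURAL_PC = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
# Semitones above the root for the unaltered scale degrees 1..7.
DEGREE_SEMITONES = [0, 2, 4, 5, 7, 9, 11]
# Triad shorthands only (assumption: enough for this illustration).
SHORTHAND = {
    "maj": "1,3,5", "min": "1,b3,5", "dim": "1,b3,b5",
    "aug": "1,3,#5", "sus2": "1,2,5", "sus4": "1,4,5",
}


def accidental_shift(accidentals):
    """Each '#' raises by one semitone, each 'b' lowers by one."""
    return accidentals.count("#") - accidentals.count("b")


def interval_semitones(interval):
    """Convert an interval such as 'b3' or '#4' to semitones above the root."""
    match = re.fullmatch(r"([#b]*)(\d+)", interval)
    if match is None or int(match.group(2)) < 1:
        raise ValueError(f"bad interval: {interval!r}")
    octaves, step = divmod(int(match.group(2)) - 1, 7)
    return 12 * octaves + DEGREE_SEMITONES[step] + accidental_shift(match.group(1))


def chord_pitch_classes(label):
    """Return the ordered pitch classes (0-11) of a chord label."""
    root, _, quality = label.partition(":")
    quality = quality or "maj"  # a bare root means a major chord
    if quality.startswith("("):
        intervals = quality.strip("()")
    else:
        intervals = SHORTHAND[quality]
    base = (NATURAL_PC[root[0]] + accidental_shift(root[1:])) % 12
    return [(base + interval_semitones(i)) % 12 for i in intervals.split(",")]
```

All three labels in the example yield the pitch classes C, E and G.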

Basic recall calculation algorithm:

1) filter annotated transcription using chord dictionary for a defined number of intervals

2) sample annotated transcription and machine estimated transcription at 10ms intervals to create a sequence of annotation frames and estimate frames

3) start at the first frame

4) get chord label for current annotation frame and estimate frame

5) check annotation label: 

IF symbol is 'X' (i.e. non-dictionary) 

THEN ignore frame (record number of ignored frames) 

ELSE compare annotated/estimated chords for the predefined number of intervals 
increment hit count if chords match 

ENDIF

6) increment frame count

7) go back to 4 until final chord frame
--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)
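
The steps above can be sketched as follows. This is an illustration only, with several assumptions: the annotation is taken to be already filtered (non-dictionary labels replaced by "X"), frames are sampled at their centres, recall is taken as hits divided by the number of non-ignored frames, and chords are compared by plain string equality unless a different comparison is passed in. The function names are hypothetical.

```python
def label_at(segments, t):
    """Return the label of the segment covering time t, or None."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None


def frame_recall(annotation, estimate, frame=0.01,
                 same_chord=lambda a, b: a == b):
    """Frame-based recall over (start, end, label) segment lists.

    Returns (recall, ignored_frames). Frames whose annotation label is
    'X', or that fall outside the annotation, are ignored.
    """
    end_time = max(end for _, end, _ in annotation)
    n_frames = int(round(end_time / frame))
    hits = ignored = 0
    for i in range(n_frames):
        t = (i + 0.5) * frame  # sample at the frame centre (assumption)
        ann = label_at(annotation, t)
        est = label_at(estimate, t)
        if ann is None or ann == "X":
            ignored += 1
            continue
        if est is not None and same_chord(ann, est):
            hits += 1
    counted = n_frames - ignored
    return (hits / counted if counted else 0.0), ignored
```

A comparison function that truncates both chords to the required number of intervals could be supplied through the <code>same_chord</code> argument.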

== Submission Format ==

=== Audio Format ===
Audio tracks will be encoded as 44.1 kHz 16bit mono WAV files.

=== I/O Format ===
The expected output chord transcription file for participating algorithms is that proposed by Christopher Harte [6].

Hence, algorithms should output text files with a similar format to that used in the ground truth transcriptions. That is to say, they should be flat text files with chord segment labels and times arranged thus:

start_time end_time chord_label

with elements separated by white spaces, times given in seconds, chord labels corresponding to the syntax described in [6] and one chord segment per line.

The chord root is given as a natural (A|B|C|D|E|F|G) followed by optional sharp or flat modifiers (#|b). For the evaluation process we may assume enharmonic equivalence for chord roots. For a given chord type on root X, the chord labels can be given as a list of intervals or as a shorthand notation as shown in the following table:

{|border="1" cellpadding="5" cellspacing="0" align="center"
|-
!NAME
!INTERVALS
!SHORTHAND
|-
|*Triads:
|
|
|-
|major
|X:(1,3,5)
|X or X:maj
|-
|-
|minor
|X:(1,b3,5)
|X:min
|-
|-
|diminished
|X:(1,b3,b5)
|X:dim
|-
|-
|augmented
|X:(1,3,#5)
|X:aug
|-
|-
|suspended4
|X:(1,4,5)
|X:sus4
|-
|-
|possible 6th triad:
|
|
|-
|-
|suspended2
|X:(1,2,5)
|X:sus2
|-
|-
|*Quads:
|
|
|-
|-
|major-major7
|X:(1,3,5,7)
|X:maj7
|-
|-
|major-minor7
|X:(1,3,5,b7)
|X:7
|-
|-
|major-add9
|X:(1,3,5,9)
|X:maj(9)
|-
|-
|major-major7-#5
|X:(1,3,#5,7)
|X:aug(7)
|-
|-
|minor-major7
|X:(1,b3,5,7)
|X:min(7)
|-
|-
|minor-minor7
|X:(1,b3,5,b7)
|X:min7
|-
|-
|minor-add9
|X:(1,b3,5,9)
|X:min(9)
|-
|-
|minor 7/b5 (ambiguous - could be either of the following)
|
|
|-
|-
|minor-major7-b5
|X:(1,b3,b5,7)
|X:dim(7)
|-
|-
|minor-minor7-b5 (a half diminished-7th)
|X:(1,b3,b5,b7)
|X:hdim7
|-
|-
|sus4-major7
|X:(1,4,5,7)
|X:sus4(7)
|-
|-
|sus4-minor7
|X:(1,4,5,b7)
|X:sus4(b7)
|-
|-
|omitted from list on wiki:
|
|
|-
|-
|diminished7
|X:(1,b3,b5,bb7)
|X:dim7
|-
|-
|No Chord
|N
|
|}

Please note that two things have changed in the syntax since it was originally described in [6]. The first change is that the root is no longer implied as a voiced element of a chord so a C major chord (notes C, E and G) should be written C:(1,3,5) instead of just C:(3,5) if using the interval list representation. As before, the labels C and C:maj are equivalent to C:(1,3,5). The second change is that the shorthand label "sus2" (intervals 1,2,5) has been added to the available shorthand list.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)

However, we still accept participants who would only like to be evaluated on major/minor chords and want to use the MIREX 2008 format. This is an integer chord id in the range 0-24, where values 0-11 denote C major, C# major, ..., B major, values 12-23 denote C minor, C# minor, ..., B minor, and 24 denotes silence or no-chord segments.
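
As an illustration of how the integer ids relate to the label syntax, the following sketch maps an id to a shorthand label. The function name and the use of sharp spellings for the roots are assumptions made for the example.

```python
# Root names in chromatic order, spelled with sharps (assumption).
ROOTS = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def chord_id_to_label(chord_id):
    """Map an integer chord id (0-24) to a shorthand chord label."""
    if not 0 <= chord_id <= 24:
        raise ValueError("chord id must be in the range 0-24")
    if chord_id == 24:
        return "N"  # silence or no-chord
    quality = "maj" if chord_id < 12 else "min"
    return f"{ROOTS[chord_id % 12]}:{quality}"
```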

=== Command line calling format ===

Submissions have to conform to the specified format below:

''extractFeaturesAndTrain "/path/to/trainFileList.txt" "/path/to/scratch/dir" ''

where trainFileList.txt contains the path to each wav file. The features extracted at this stage can be stored under "/path/to/scratch/dir".
The ground truth files for the supervised learning will be in the same path with a ".txt" extension at the end. For example for "/path/to/trainFile1.wav", there will be a corresponding ground truth file called "/path/to/trainFile1.wav.txt" .

For testing:

''doChordID.sh "/path/to/testFileList.txt" "/path/to/scratch/dir" "/path/to/results/dir" ''

If there is no training, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus ".txt".

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file including the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 24 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

== Bibliography ==

1. Harte, C.A. and Sandler, M.B. (2005). '''Automatic chord identification using a quantised chromagram.''' Proceedings of 118th Audio Engineering Society's Convention.

2. Sailer, C. and Rosenbauer, K. (2006). '''A bottom-up approach to chord detection.''' Proceedings of International Computer Music Conference 2006.

3. Shenoy, A. and Wang, Y. (2005). '''Key, chord, and rhythm tracking of popular music recordings.''' Computer Music Journal 29(3), 75-86.

4. Sheh, A. and Ellis, D.P.W. (2003). '''Chord segmentation and recognition using EM-trained hidden Markov models.''' Proceedings of 4th International Conference on Music Information Retrieval.

5. Yoshioka, T. et al. (2004). '''Automatic Chord Transcription with concurrent recognition of chord symbols and boundaries.''' Proceedings of 5th International Conference on Music Information Retrieval.

6. Harte, C., Sandler, M., Abdallah, S. and Gómez, E. (2005). '''Symbolic representation of musical chords: a proposed syntax for text annotations.''' Proceedings of 6th International Conference on Music Information Retrieval.

7. Papadopoulos, H. and Peeters, G. (2007). '''Large-scale study of chord estimation algorithms based on chroma representation and HMM.''' Proceedings of 5th International Conference on Content-Based Multimedia Indexing.

8. Abdallah, S., Noland, K., Sandler, M., Casey, M. and Rhodes, C. (2005). '''Theory and Evaluation of a Bayesian Music Structure Extractor.''' Proceedings of 6th International Conference on Music Information Retrieval (ISMIR 2005), pp. 420-425.

2010:Audio Chord Estimation

2010-05-28T16:15:39Z

Kriswest: /* Data */

== Description ==

The text of this section is copied from the 2009 page. This task was first run in 2008. Please add your comments and discussions for 2010.

For many applications in music information retrieval, extracting the harmonic structure is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music.

The extraction of the harmonic structure requires the detection of as many chords as possible in a piece. That includes the characterisation of chords with a key and type as well as a chronological order with onset and duration of the chords.

Although some publications are available on this topic [1,2,3,4,5], comparison of the results is difficult, because different measures are used to assess the performance. To overcome this problem an accurately defined methodology is needed. This includes a repertory of the findable chords, a defined test set along with ground truth and unambiguous calculation rules to measure the performance.

For this reason, we suggest introducing the new evaluation task ''Audio Chord Detection''.

== Data ==
Two datasets are used to evaluate chord transcription accuracy:

=== Beatles dataset ===
Christopher Harte's Beatles dataset consisting of annotations of 12 Beatles albums.

The text annotation procedure of musical chords that was used to produce this dataset is presented in [6].

=== Queen and Zweieck dataset ===
Matthias Mauch's Queen and Zweieck dataset consisting of 38 songs from Queen and Zweieck.

== I/O Format ==

This year the I/O format needs to be changed to allow evaluation on all triads and quads.
We are planning to use the format suggested by Christopher Harte [6].
The chord root is given as a natural (A|B|C|D|E|F|G) followed by optional sharp or flat modifiers (#|b). For the evaluation process we may assume enharmonic equivalence for chord roots. For a given chord type on root X, the chord labels can be given as a list of intervals or as a shorthand notation as shown in the following table:

{|border="1" cellpadding="5" cellspacing="0" align="center"
|-
!NAME
!INTERVALS
!SHORTHAND
|-
|*Triads:
|
|
|-
|major
|X:(1,3,5)
|X or X:maj
|-
|-
|minor
|X:(1,b3,5)
|X:min
|-
|-
|diminished
|X:(1,b3,b5)
|X:dim
|-
|-
|augmented
|X:(1,3,#5)
|X:aug
|-
|-
|suspended4
|X:(1,4,5)
|X:sus4
|-
|-
|possible 6th triad:
|
|
|-
|-
|suspended2
|X:(1,2,5)
|X:sus2
|-
|-
|*Quads:
|
|
|-
|-
|major-major7
|X:(1,3,5,7)
|X:maj7
|-
|-
|major-minor7
|X:(1,3,5,b7)
|X:7
|-
|-
|major-add9
|X:(1,3,5,9)
|X:maj(9)
|-
|-
|major-major7-#5
|X:(1,3,#5,7)
|X:aug(7)
|-
|-
|minor-major7
|X:(1,b3,5,7)
|X:min(7)
|-
|-
|minor-minor7
|X:(1,b3,5,b7)
|X:min7
|-
|-
|minor-add9
|X:(1,b3,5,9)
|X:min(9)
|-
|-
|minor 7/b5 (ambiguous - could be either of the following)
|
|
|-
|-
|minor-major7-b5
|X:(1,b3,b5,7)
|X:dim(7)
|-
|-
|minor-minor7-b5 (a half diminished-7th)
|X:(1,b3,b5,b7)
|X:hdim7
|-
|-
|sus4-major7
|X:(1,4,5,7)
|X:sus4(7)
|-
|-
|sus4-minor7
|X:(1,4,5,b7)
|X:sus4(b7)
|-
|-
|omitted from list on wiki:
|
|
|-
|-
|diminished7
|X:(1,b3,b5,bb7)
|X:dim7
|-
|-
|No Chord
|N
|
|}

However, we still accept participants who would only like to be evaluated on major/minor chords and want to use last year's format. This is an integer chord id in the range 0-24, where values 0-11 denote C major, C# major, ..., B major, values 12-23 denote C minor, C# minor, ..., B minor, and 24 denotes silence or no-chord segments.

== Evaluation ==

Algorithms should output text files with a similar format to that used in the ground truth transcriptions. That is to say, they should be flat text files with chord segment labels and times arranged thus:

start_time end_time chord_label

with elements separated by white spaces, times given in seconds, chord labels corresponding to the syntax described in [6] and one chord segment per line.

Please note that two things have changed in the syntax since it was originally described in [6]. The first change is that the root is no longer implied as a voiced element of a chord so a C major chord (notes C, E and G) should be written C:(1,3,5) instead of just C:(3,5) if using the interval list representation. As before, the labels C and C:maj are equivalent to C:(1,3,5). The second change is that the shorthand label "sus2" (intervals 1,2,5) has been added to the available shorthand list.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)

=== Segmentation Score ===

The segmentation score will be calculated using the directional Hamming distance as described in [8]. An over-segmentation value (m) and an under-segmentation value (f) will be calculated and the final segmentation score will be calculated using the worst case of these two, i.e.:

segmentation score = 1 - max(m,f)

m and f are not independent of each other so combining them this way ensures that a good score in one does not hide a bad score in the other. The combined segmentation score can take values between 0 and 1 with 0 being the worst and 1 being the best result.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)

=== Frame-based recall ===

For recall evaluation, we may define a different chord dictionary for each level of evaluation (dyads, triads, tetrads etc). Each dictionary is a text file containing chord shorthands / interval lists of the chords that will be considered in that evaluation. The following dictionaries are proposed:

For dyad comparison of major/minor chords only:

N 
X:maj 
X:min 

For comparison of standard triad chords:

N 
X:maj 
X:min 
X:aug 
X:dim 
X:sus2 
X:sus4 

For comparison of tetrad (quad) chords:

N 
X:maj 
X:min 
X:aug 
X:dim 
X:sus2 
X:sus4 
X:maj7 
X:7 
X:maj(9) 
X:aug(7) 
X:min(7) 
X:min7 
X:min(9) 
X:dim(7) 
X:hdim7 
X:sus4(7) 
X:sus4(b7) 
X:dim7 

For each evaluation level, the ground truth annotation is compared against the dictionary. Any chord label not belonging to the current dictionary will be replaced with an "X" in a local copy of the annotation and will not be included in the recall calculation.

Note that the level of comparison in terms of intervals can be varied. For example, in a triad evaluation we can consider the first three component intervals in the chord so that a major (1,3,5) and a major7 (1,3,5,7) will be considered the same chord. For a tetrad (quad) evaluation, we would consider the first 4 intervals so major and major7 would then be considered to be different chords.

For the maj/min evaluation (using the first example dictionary), using an interval comparison of 2 (dyad) will compare only the first two intervals of each chord label. This would map augmented and diminished chords to major and minor respectively (and any other symbols that had a major 3rd or minor 3rd as their first interval). Using an interval comparison of 3 with the same dictionary would keep only those chords that have major and minor triads as their first 3 intervals so augmented and diminished chords would be removed from the evaluation.

After the annotation has been "filtered" using a given dictionary, it can be compared against the machine generated estimates output by the algorithm under test. The chord sequences described in the annotation and estimate text files are sampled at a given frame rate (in this case 10ms per frame) to give two sequences of chord frames which may be compared directly with each other. For calculating a hit or a miss, the chord labels from the current frame in each sequence will be compared. Chord comparison is done by converting each chord label into an ordered list of pitch classes then comparing the two lists element by element. If the lists match to the required number of intervals then a hit is recorded, otherwise the estimate is considered a miss. It should be noted that, by converting to pitch classes in the comparison, this evaluation ignores enharmonic pitch and interval spellings so the following chords (slightly silly example just for illustration) will all evaluate as identical:

C:maj = Dbb:maj = C#:(b1,b3,#4)

Basic recall calculation algorithm:

1) filter annotated transcription using chord dictionary for a defined number of intervals

2) sample annotated transcription and machine estimated transcription at 10ms intervals to create a sequence of annotation frames and estimate frames

3) start at the first frame

4) get chord label for current annotation frame and estimate frame

5) check annotation label: 

IF symbol is 'X' (i.e. non-dictionary) 

THEN ignore frame (record number of ignored frames) 

ELSE compare annotated/estimated chords for the predefined number of intervals 
increment hit count if chords match 

ENDIF

6) increment frame count

7) go back to 4 until final chord frame
--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)

== Submission Format ==

Submissions have to conform to the specified format below:

''extractFeaturesAndTrain "/path/to/trainFileList.txt" "/path/to/scratch/dir" ''

where trainFileList.txt contains the path to each wav file. The features extracted at this stage can be stored under "/path/to/scratch/dir".
The ground truth files for the supervised learning will be in the same path with a ".txt" extension at the end. For example for "/path/to/trainFile1.wav", there will be a corresponding ground truth file called "/path/to/trainFile1.wav.txt" .

For testing:

''doChordID.sh "/path/to/testFileList.txt" "/path/to/scratch/dir" "/path/to/results/dir" ''

If there is no training, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus ".txt".

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of
dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following
information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks,
hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

== Bibliography ==

1.Harte,C.A. and Sandler,M.B.(2005). '''Automatic chord identification using a quantised chromagram.''' Proceedings of 118th Audio Engineering Society's Convention.

2.Sailer,C. and Rosenbauer K.(2006). '''A bottom-up approach to chord detection.''' Proceedings of International Computer Music Conference 2006.

3.Shenoy,A. and Wang,Y.(2005). '''Key, chord, and rhythm tracking of popular music recordings.''' Computer Music Journal 29(3), 75-86.

4.Sheh,A. and Ellis,D.P.W.(2003). '''Chord segmentation and recognition using em-trained hidden markov models.''' Proceedings of 4th International Conference on Music Information Retrieval.

5.Yoshioka,T. et al.(2004). '''Automatic Chord Transcription with concurrent recognition of chord symbols and boundaries.''' Proceedings of 5th International Conference on Music Information Retrieval.

6.Harte,C. and Sandler,M. and Abdallah,S. and Gómez,E.(2005). '''Symbolic representation of musical chords: a proposed syntax for text annotations.''' Proceedings of 6th International Conference on Music Information Retrieval.

7.Papadopoulos,H. and Peeters,G.(2007). '''Large-scale study of chord estimation algorithms based on chroma representation and HMM.''' Proceedings of 5th International Conference on Content-Based Multimedia Indexing.

8.Samer Abdallah, Katy Noland, Mark Sandler, Michael Casey & Christophe Rhodes: '''Theory and Evaluation of a Bayesian Music Structure Extractor''' (pp. 420-425) Proc. 6th International Conference on Music Information Retrieval, ISMIR 2005.

2010:Audio Classification (Train/Test) Tasks

2010-05-25T08:58:12Z

Kriswest: /* Audio US Pop Music Genre Classification */

== Description ==
Many tasks in music classification can be characterized as a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2010, five classification sub-tasks are included:

*Audio Artist Identification
*Audio Classical Composer Identification
*Audio US Pop Music Genre Classification
*Audio Latin Music Genre Classification
*Audio Mood Classification

All five classification tasks were conducted in previous MIREX runs (please see [[#Links to Previous MIREX Runs of These Classification Tasks]]). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ( [[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==

=== Audio Artist Identification ===
This dataset requires algorithms to classify music audio according to the performing artist. The collection used at MIREX 2009 will be re-used.

Collection statistics:
* 3150 30-second 22.05kHz mono wav audio clips drawn from a collection of US Pop music.
* 105 artists (30 clips per artist drawn from 3 albums).

=== Audio Classical Composer Identification ===
This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.

Collection statistics:
* 2772 30-second 22.05 kHz mono wav clips
* 11 "classical" composers (252 clips per composer), including:
** Bach
** Beethoven
** Brahms
** Chopin
** Dvorak
** Handel
** Haydn
** Mendelssohn
** Mozart
** Schubert
** Vivaldi

=== Audio US Pop Music Genre Classification ===
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.

Collection statistics:
* 7000 30-second audio clips in 22.05kHz mono WAV format
* 10 genres (700 clips from each genre), including:
** Blues
** Jazz
** Country/Western
** Baroque
** Classical
** Romantic
** Electronica
** Hip-Hop
** Rock
** HardRock/Metal

=== Latin Music Genre Classification ===
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [http://ismir2008.ismir.net/papers/ISMIR2008_106.pdf] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.

Collection statistics:
* 3,227 audio files in 22.05kHz mono WAV format
* 10 Latin music genres, including:
** Axe
** Bachata
** Bolero
** Forro
** Gaucha
** Merengue
** Pagode
** Sertaneja
** Tango

=== Audio Mood Classification ===
This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production music sourced from the APM collection [http://www.apmmusic.com]). The MIREX 2007 Mood Classification dataset will be re-used.

Collection statistics:
* 600 30-second audio clips in 22.05kHz mono WAV format selected from the APM collection [http://www.apmmusic.com], and labeled by human judges using the Evalutron6000 system.
* 5 mood categories each of which contains 120 clips:
**Cluster_1: passionate, rousing, confident, boisterous, rowdy
**Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
**Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
**Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
**Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

== Audio Formats ==
For all datasets, participating algorithms will have to read audio in the following format:

* Sample rate: 22.05 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV

== Evaluation ==
This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.

Participating algorithms will be evaluated with 3-fold cross validation. For '''Artist Identification''', album filtering will be applied to the test and training splits, i.e. training and test sets will contain tracks from different albums; for '''Genre Classification''', artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.
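
As a rough illustration of artist filtering, the sketch below assigns whole artists to folds so that no artist appears in both the training and test data of any split. The function names, the greedy balancing strategy and the (path, artist) input layout are assumptions made for this example; they are not a description of how IMIRSEL builds its splits. Album filtering would work the same way with album names as the grouping key.

```python
from collections import defaultdict

def artist_filtered_folds(tracks, n_folds=3):
    """Split (path, artist) pairs into folds with no artist shared between folds.

    Hypothetical helper: artists are assigned greedily, largest first, to the
    currently smallest fold so that fold sizes stay roughly balanced.
    """
    by_artist = defaultdict(list)
    for path, artist in tracks:
        by_artist[artist].append(path)

    folds = [[] for _ in range(n_folds)]
    # Sort by descending clip count, then by name, so the result is deterministic.
    for artist in sorted(by_artist, key=lambda a: (-len(by_artist[a]), a)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend((p, artist) for p in by_artist[artist])
    return folds

def train_test_splits(folds):
    """Yield (train, test) lists, using each fold once as the test set."""
    for i, test in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test
```

Each of the three (train, test) pairs produced this way can then be written out as a training list file and a test list file.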

The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
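
The raw accuracy and confusion matrix can be sketched as follows. This is a minimal illustration that assumes the ground truth and predicted labels are available as two parallel lists; the function name and the dictionary-based matrix layout are choices made for this example only.

```python
from collections import Counter

def accuracy_and_confusion(truth, predicted):
    """Return (accuracy, confusion) for two parallel lists of class labels.

    confusion[(true_label, predicted_label)] counts the clips with that pairing.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    confusion = Counter(zip(truth, predicted))
    correct = sum(n for (t, p), n in confusion.items() if t == p)
    accuracy = correct / len(truth) if truth else 0.0
    return accuracy, confusion
```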

Classification accuracies will be tested for statistically significant differences using Friedman's Anova with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.
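
To make the ranking step more concrete, the sketch below computes the Friedman chi-square statistic and the mean rank of each algorithm from a table of per-block accuracies. It is a simplified illustration under stated assumptions: blocks are taken to be rows (for example folds), no tie correction is applied, and the Tukey-Kramer HSD comparison that follows it in the actual evaluation is not implemented here. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest), giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def friedman_statistic(scores):
    """scores[b][a] is the accuracy of algorithm a on block b (e.g. a fold).

    Returns (chi_square, mean_ranks). No tie correction is applied.
    """
    n = len(scores)       # number of blocks
    k = len(scores[0])    # number of algorithms
    rank_sums = [0.0] * k
    for row in scores:
        for a, r in enumerate(average_ranks(row)):
            rank_sums[a] += r
    chi_square = (12.0 / (n * k * (k + 1))) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return chi_square, [r / n for r in rank_sums]
```

A higher mean rank here means higher accuracy, because ranks are assigned in ascending order of score.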

In addition, computation times for feature extraction and training/classification will be measured.

== Submission Format ==
=== File I/O Format ===
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:

==== Feature extraction list file ====
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Training list file ====
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.

I.e.
<example path and filename>\t<class label>

E.g.
/path/to/track1.wav rock
/path/to/track2.wav blues
...

==== Test (classification) list file ====
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

I.e.
<example path and filename>

E.g.
/path/to/track1.wav
/path/to/track2.wav
...

==== Classification output file ====
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the predicted class (artist, genre or mood) label, again with no header line.

I.e.
<example path and filename>\t<class label>

E.g.
/path/to/track1.wav classical
/path/to/track2.wav blues
...
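
The tab-separated list formats above can be read and written with a few lines of code. The following is a minimal sketch, assuming plain ASCII files with one path<TAB>label pair per line; the function names are hypothetical and not part of any MIREX tooling.

```python
def read_labeled_list(path):
    """Read a training list file into a list of (audio_path, label) pairs."""
    pairs = []
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate a trailing blank line
            audio_path, label = line.split("\t", 1)
            pairs.append((audio_path, label))
    return pairs

def write_labeled_list(path, pairs):
    """Write (audio_path, label) pairs as a classification output file."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for audio_path, label in pairs:
            handle.write(f"{audio_path}\t{label}\n")
```

Because the output file shares the training list format, the same reader can be used to load predictions back in for scoring.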

=== Submission calling formats ===
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.

Executables will have to accept the paths to the aforementioned list files as command line parameters.

Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.

==== Example submission calling formats ====

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt

Multi-processor compute nodes will be used to run this task; however, we ask that submissions use no more than 4 cores (as we will be running many submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.

extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
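
For submissions written in Python, the single-executable calling format above could be parsed roughly as follows. This is only a sketch: the option names mirror the myAlgo.sh examples, while the parser layout, the default of one thread and the restriction to 1, 2 or 4 threads are assumptions made for illustration.

```python
import argparse

def build_parser():
    """Build a parser for the hypothetical myAlgo command line."""
    parser = argparse.ArgumentParser(prog="myAlgo")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-extract", action="store_true",
                      help="run feature extraction")
    mode.add_argument("-TrainAndClassify", action="store_true",
                      help="train on one list and classify another")
    parser.add_argument("-numThreads", type=int, default=1, choices=[1, 2, 4],
                        help="number of threads/cores to use")
    parser.add_argument("scratch", help="path to the scratch folder")
    parser.add_argument("lists", nargs="+",
                        help="list file paths, in the order shown in the examples")
    return parser
```

For example, build_parser().parse_args(["-extract", "-numThreads", "4", "/path/to/scratch/folder", "/path/to/featureExtractionListFile.txt"]) yields a namespace with extract set, numThreads equal to 4, and the list file path in lists.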

=== Packaging submissions ===

* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[2006:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2010 Submission Instructions]]

All submissions should include a README file containing the following information:

* Command line calling format for all executables including examples
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
* Any special notes regarding running your algorithm

Note that the information that you place in the README file is '''extremely''' important in ensuring that your submission is evaluated properly.

=== Time and hardware limits ===
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.

=== Submission opening date ===

TBA

=== Submission closing date ===

TBA

== Links to Previous MIREX Runs of These Classification Tasks ==

=== Audio Artist Identification ===
[[2009:Audio Artist Identification|Artist Identification in MIREX 2009]] || [[2009:Audio Classical Composer Identification Results|Results(Classical Composer)]]

[[2008:Audio Artist Identification|Artist Identification in MIREX 2008]] || [[2008:Audio Classical Composer Identification Results|Results(Classical Composer)]] || [[2008:Audio_Artist_Identification_Results|Results(Artist Identification)]]

[[2007:Audio_Artist_Identification|Artist Identification in MIREX 2007]] || [[2007:Audio_Artist_Identification_Results|Results]]

[[2007:Audio_Classical_Composer_Identification|Classical Composer Identification in MIREX 2007]] || [[2007:Audio_Classical_Composer_Identification_Results|Results]]

[[2005:Audio_Artist_Identification|Artist Identification in MIREX 2005]] || [https://www.music-ir.org/evaluation/mirex-results/audio-artist/index.html Results]

[http://ismir2004.ismir.net/genre_contest/index.htm Audio Artist Identification in ISMIR2004 Audio Description Contest]

=== Audio Genre Classification ===
[[2009:Audio_Genre_Classification|Audio Genre Classification in MIREX 2009]] || [[2009:Audio_Genre_Classification_(Latin_Set)_Results|Results(Latin Set)]] || [[2009:Audio_Genre_Classification_(Mixed_Set)_Results|Results(Mixed Set)]]

[[2008:Audio_Genre_Classification|Audio Genre Classification in MIREX 2008]] || [[2008:Audio_Genre_Classification_Results|Results]]

[[2007:Audio_Genre_Classification|Audio Genre Classification in MIREX 2007]] || [[2007:Audio_Genre_Classification_Results|Results]]

[[2005:Audio_Genre_Classification|Audio Genre Classification in MIREX 2005]] || [https://www.music-ir.org/evaluation/mirex-results/audio-genre/index.html Results]

[http://ismir2004.ismir.net/genre_contest/index.htm Audio Genre Classification in ISMIR2004 Audio Description Contest]

=== Audio Mood Classification ===
[[2009:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2009]] || [[2009:Audio_Music_Mood_Classification_Results|Results]]

[[2008:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2008]] || [[2008:Audio_Music_Mood_Classification_Results|Results]]

[[2007:Audio_Music_Mood_Classification|Audio Mood Classification in MIREX 2007]] || [[2007:Audio_Music_Mood_Classification_Results|Results]]

MIREX HOME

2010-05-24T21:06:45Z

Kriswest: /* Introduction */

__TOC__

== Introduction ==
The Music Information Retrieval Evaluation eXchange (MIREX) is an annual evaluation campaign for Music Information Retrieval (MIR) algorithms, coupled to the [http://www.ismir.net International Society (and Conference) for Music Information Retrieval (ISMIR)]. MIREX is hosted by the [https://www.music-ir.org/evaluation/ International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL)] at the [http://www.lis.illinois.edu/ Graduate School of Library and Information Science (GSLIS)], which is part of the [http://www.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC)].

Current and future MIREXs are in part facilitated by the work of the [https://nema.lis.illinois.edu/drupal/ Networked Environment for Music Analysis (NEMA) project]. The NEMA project aims to automate and expose the workings of MIREX and MIR experimentation/evaluation to the MIR community. It helps with collection, data and code sharing within the community by handling copyright and IP restrictions: MIR researchers can work (remotely) with resources without having to obtain licenses to the content or code.

MIR tasks evaluated at past MIREXs include:
* Audio Test/Train tasks
** Audio Artist Identification
** Audio Genre Classification
** Audio Music Mood Classification
** Audio Classical Composer Identification
* Symbolic Genre Classification
* Audio Onset detection
* Audio Key detection
* Symbolic key detection
* Audio Tag Classification
* Audio Cover Song Identification
* Real-time Audio to Score Alignment (a.k.a Score Following)
* Query by Singing/Humming
* Multiple Fundamental Frequency Estimation & Tracking
* Audio Chord Estimation
* Audio Melody Extraction
* Query by Tapping
* Audio Beat Tracking
* Audio Music Similarity and Retrieval
* Symbolic Music Similarity and Retrieval
* Structural Segmentation
* Audio Drum Detection
* Audio Tempo Extraction

== Current MIREX Wiki (2010) ==
You can view the current 2010 content here: [[2010:Main_Page]]

== Recent Changes ==
We have recently merged all current and previous iterations of the MIREX wiki into a single wiki installation to make it easier to manage. All the pages, images and abstracts have been migrated, but some links and images may still be broken. We're currently manually inspecting all pages, but would appreciate your help in correcting any errors you see.

Content on the wiki is now organized into mediawiki namespaces, one for each year. You can view the current 2010 content here: [[2010:Main_Page]]

Similarly for previous content.
* [[2009:Main_Page]]
* [[2008:Main_Page]]
* [[2007:Main_Page]]
* [[2006:Main_Page]]
* [[2005:Main_Page]]

All links to older wiki content will be redirected to this new wiki, and should take you to the correct page on the new installation, but please update any bookmarks or links you may have which point into current or old wiki content.

MIREX HOME

2010-05-24T21:06:20Z

Kriswest: Changing submissions list

__TOC__

== Introduction ==
The Music Information Retrieval Evaluation eXchange (MIREX) is an annual evaluation campaign for Music Information Retrieval (MIR) algorithms, coupled to the [http://www.ismir.net International Society (and Conference) for Music Information Retrieval (ISMIR)]. MIREX is hosted by the [https://www.music-ir.org/evaluation/ International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL)] at the [http://www.lis.illinois.edu/ Graduate School of Library and Information Science (GSLIS)], which is part of the [http://www.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC)].

Current and future MIREXs are in part facilitated by the work of the [https://nema.lis.illinois.edu/drupal/ Networked Environment for Music Analysis (NEMA) project]. The NEMA project aims to automate and expose the workings of MIREX and MIR experimentation/evaluation to the MIR community. It helps with collection, data and code sharing within the community by handling copyright and IP restrictions: MIR researchers can work (remotely) with resources without having to obtain licenses to the content or code.

MIR tasks evaluated at past MIREXs include:
* Audio Test/Train tasks
** Audio Artist Identification
** Audio Genre Classification
** Audio Music Mood Classification
** Audio Classical Composer Identification
* Symbolic Genre Classification
* Audio Onset detection
* Audio Key detection
* Symbolic key detection
* Audio Tag Classification
* Audio Cover Song Identification
* Real-time Audio to Score Alignment (a.k.a Score Following)
* Query by Singing/Humming
* Multiple Fundamental Frequency Estimation & Tracking
* Audio Chord Estimation
* Audio Melody Extraction
* Query by Tapping
* Audio Beat Tracking
* Audio Music Similarity and Retrieval
* Symbolic Music Similarity and Retrieval
* Structural Segmentation
* Audio Drum Detection
* Audio Tempo Extraction

== Current MIREX Wiki (2010) ==
You can view the current 2010 content here: [[2010:Main_Page]]

== Recent Changes ==
We have recently merged all current and previous iterations of the MIREX wiki into a single wiki installation to make it easier to manage. All the pages, images and abstracts have been migrated, but some links and images may still be broken. We're currently manually inspecting all pages, but would appreciate your help in correcting any errors you see.

Content on the wiki is now organized into mediawiki namespaces, one for each year. You can view the current 2010 content here: [[2010:Main_Page]]

Similarly for previous content.
* [[2009:Main_Page]]
* [[2008:Main_Page]]
* [[2007:Main_Page]]
* [[2006:Main_Page]]
* [[2005:Main_Page]]

All links to older wiki content will be redirected to this new wiki, and should take you to the correct page on the new installation, but please update any bookmarks or links you may have which point into current or old wiki content.

2010:Audio Cover Song Identification

2010-05-24T20:16:06Z

Kriswest:

__TOC__

==Description==
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or "cover songs".

Within the collection of pieces in the cover song datasets, there are embedded a number of different "original songs" or compositions, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.

Using each of these version files in turn as the "seed/query" file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the "seed/query" file.

Two datasets are used in this task: the MIREX 2006 US Pop Music Cover Song dataset and the [http://www.mazurka.org.uk/ Mazurka dataset].

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ( [[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==
Two datasets will be used to evaluate cover song identification:

===US Pop Music Collection Cover Song (aka Mixed Collection)===
This is the "original" ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different "cover songs" each represented by 11 different "versions" for a total of 330 audio files.

Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.

Collection statistics:
* 16-bit, monophonic, 22.05 kHz, wav
* The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
* Size: 1000 tracks
* Queries: 330 tracks

=== Sapp's Mazurka Collection Information ===
In addition to our original ACS dataset, we use the [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions of each of 49 mazurkas (539 tracks in total) and run them as a separate ACS subtask. Systems should return a 539x539 distance matrix, from which we locate the ranks of each of the associated cover versions.

Collection statistics:
* 16-bit, monophonic, 22.05 kHz, wav
* Size: 539 tracks
* Queries: 539 tracks

== Evaluation ==
The following evaluation metrics will be computed for each submission:
* Total number of covers identified in top 10
* Mean number of covers identified in top 10 (average performance)
* Mean (arithmetic) of Avg. Precisions
* Mean rank of first correctly identified cover
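
The per-query quantities behind these metrics can be sketched as below. This is an illustrative reading of the list above, not the official scoring code: it assumes the query itself has already been removed from the candidate list, that ties are broken by candidate order, and that average precision is normalized by the number of covers found in the full ranking. The function name is hypothetical.

```python
def cover_song_metrics(distances, is_cover, top_n=10):
    """Score a single query.

    distances[i] is the distance from the query to candidate i and
    is_cover[i] is True when candidate i is a version of the query.
    Returns (covers_in_top_n, average_precision, rank_of_first_cover).
    """
    # Smaller distance means a better match, so sort ascending.
    ranked = sorted(range(len(distances)), key=lambda i: distances[i])
    hits = 0
    precision_sum = 0.0
    first_rank = None
    in_top_n = 0
    for rank, idx in enumerate(ranked, start=1):
        if is_cover[idx]:
            hits += 1
            precision_sum += hits / rank
            if first_rank is None:
                first_rank = rank
            if rank <= top_n:
                in_top_n += 1
    average_precision = precision_sum / hits if hits else 0.0
    return in_top_n, average_precision, first_rank
```

Summing the first value over all queries gives the total number of covers identified in the top 10, and averaging the other two over queries gives the mean of average precisions and the mean rank of the first correctly identified cover.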

=== Ranking and significance testing ===
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.

For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:
@InProceedings{jones2007hsj,
title={"Human Similarity Judgements: Implications for the Design of Formal Evaluations"},
author="M.C. Jones and J.S. Downie and A.F. Ehmann",
BOOKTITLE ="Proceedings of ISMIR 2007 International Society of Music Information Retrieval",
year="2007"
}

=== Runtime performance ===
In addition, computation times for feature extraction and training/classification will be measured.

== Submission Format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants may wish to make use of parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single-threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

=== I/O formats ===
==== Input Files ====

The feature extraction list file format will be of the form:

/path/to/audio/file/000.wav\n
/path/to/audio/file/001.wav\n
/path/to/audio/file/002.wav\n
...

The query list file format will be very similar, listing a subset of the files from the feature extraction list file:

/path/to/audio/file/182.wav\n
/path/to/audio/file/245.wav\n
/path/to/audio/file/432.wav\n
...

For a total of ''<number of queries>'' rows -- query ids are assigned from the pool of ''<number of candidates>'' collection ids and should match the ids within the candidate collection.

Lines will be terminated by a '\n' character.

==== Output File ====
The only output will be a '''distance''' matrix file that is ''<number of queries>'' rows by ''<number of candidates>'' columns in the following format:

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/track1.wav>
2\t</path/to/audio/file/track2.wav>
3\t</path/to/audio/file/track3.wav>
4\t</path/to/audio/file/track4.wav>
...
N\t</path/to/audio/file/trackN.wav>
Q/R\t1\t2\t3\t4\t...\tN
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>
</pre>

where N is <number of candidates> and the queries are drawn from this set (and bear the same track indexes if possible).

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/track1.wav
2 /path/to/audio/file/track2.wav
3 /path/to/audio/file/track3.wav
4 /path/to/audio/file/track4.wav
5 /path/to/audio/file/track5.wav
Q/R 1 2 3 4 5
1 0.00000 1.24100 0.2e-4 0.42559 0.21313
3 50.2e-4 0.62640 0.00000 0.38000 0.15152
</pre>

Note that indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed we will be able to interpret the data.

All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.

To summarize, the distance matrix should be preceded by a system name, ''<number of candidates>'' rows of file paths and should be composed of ''<number of candidates>'' columns of distance (separated by tab characters) and ''<number of queries>'' rows (one for each original track query). Each row corresponds to a particular query song (the track to find covers of).
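
A writer for this layout might look like the sketch below. It assumes the candidate paths, the 1-based query indexes and the distance rows are already in memory; the function name, the argument layout and the five-decimal number format are choices made for this example rather than requirements of the task.

```python
import math

def write_distance_matrix(path, system_name, candidates, query_indexes, rows):
    """Write a distance matrix file.

    candidates:    list of audio file paths (numbered from 1 in the file)
    query_indexes: 1-based index into candidates for each query, in query order
    rows:          one list of distances per query, one value per candidate
    """
    with open(path, "w", encoding="ascii", newline="\n") as out:
        out.write(system_name + "\n")
        for number, audio_path in enumerate(candidates, start=1):
            out.write(f"{number}\t{audio_path}\n")
        columns = "\t".join(str(n) for n in range(1, len(candidates) + 1))
        out.write("Q/R\t" + columns + "\n")
        for query, row in zip(query_indexes, rows):
            if len(row) != len(candidates):
                raise ValueError("each row needs one distance per candidate")
            # Distances must be finite and zero or positive.
            if any(not math.isfinite(d) or d < 0 for d in row):
                raise ValueError("distances must be finite and non-negative")
            out.write(str(query) + "\t" + "\t".join(f"{d:.5f}" for d in row) + "\n")
```

Validating each row before writing it catches NaN, infinite or negative distances early, before the file is handed over for evaluation.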

=== Command Line Calling Format ===

/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>
'''<collection_list_file>''': Text file containing ''<number of candidates>'' full path file names for the
''<number of candidates>'' audio files in the collection (including the ''<number of queries>''
query documents).
'''Example: /path/to/coversong/collection.txt'''
'''<query_list_file>''' : Text file containing the ''<number of queries>'' full path file names for the
''<number of queries>'' query documents.
'''Example: /path/to/coversong/queries.txt'''
'''<working_directory>''' : Full path to a temporary directory where submission will
have write access for caching features or calculations.
'''Example: /tmp/submission_id/'''
'''<output_file>''' : Full path to file where submission should output the distance
matrix (''<number of candidates>'' header rows + ''<number of queries>'' x ''<number of candidates>'' data matrix).
'''Example: /path/to/coversong/results/submission_id.txt'''

E.g.
/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

2010:Audio Cover Song Identification

2010-05-24T15:39:23Z

Kriswest: tweaking data section and a few other bits

=2010 AUDIO COVER SONG IDENTIFICATION TASK OVERVIEW=

The text of this section is copied from the 2009 page. Please add your comments and discussions for 2010.

The Audio Cover Song task was a new task for MIREX 2006 and was last run in 2008. It was closely related to the [[2010:Audio Music Similarity and Retrieval]] (AMS) task as the cover songs were embedded in the Audio Music Similarity and Retrieval test collection.

==Description==
Within the a collection of pieces in the cover song datasets, there are embedded a number of different "original songs" or compositions each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.

Using each of these version files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other versions of the "seed/query" file.

In addition to the previous Audio Cover Song dataset, we are going to use the [http://www.mazurka.org.uk/ Mazurka dataset]. We are going to randomly choose 11 versions of each of 49 mazurkas (539 tracks in total) and run this as a separate subtask. The I/O format will be the same as in previous years. Systems will return a 539x539 distance matrix.

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ([[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==
Two datasets will be used to evaluate cover song identification:

===US Pop Music Collection (aka Mixed Collection)===
This is the "original" ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different "cover songs" each represented by 11 different "versions" for a total of 330 audio files.

Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.

Collection statistics:
* 16-bit, monophonic, 22.05 kHz, WAV
* The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
* Size: 1000 tracks
* Queries: 330 tracks

=== Sapp's Mazurka Collection Information ===
In addition to our original ACS dataset, we used the [http://www.mazurka.org.uk/ Mazurka.org dataset] put together by Craig Sapp. We randomly chose 11 versions of each of 49 mazurkas and ran this as a separate ACS subtask. Systems should return a 539x539 distance matrix, from which we locate the ranks of each of the associated cover versions.

Collection statistics:
* 16-bit, monophonic, 22.05 kHz, WAV
* Size: 539 tracks
* Queries: 539 tracks

== Evaluation ==
The following evaluation metrics will be computed for each submission:
* Total number of covers identified in top 10
* Mean number of covers identified in top 10 (average performance)
* Mean (arithmetic) of Avg. Precisions
* Mean rank of first correctly identified cover
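
As a rough illustration of how the per-query quantities behind these metrics could be derived from one row of a distance matrix, consider the following Python sketch. It is not the official MIREX evaluation code: the names <code>rank_candidates</code>, <code>query_metrics</code> and <code>covers</code> are hypothetical, and excluding the query itself from the ranked list is an assumption.

```python
def rank_candidates(distances, query_id):
    """Return candidate ids sorted by increasing distance.

    Assumption: the query itself is removed from the ranked list.
    """
    return sorted((c for c in distances if c != query_id), key=lambda c: distances[c])


def query_metrics(distances, query_id, covers, top_n=10):
    """Compute per-query metrics.

    distances: dict mapping candidate id -> distance from the query
    covers:    set of candidate ids that are other versions of the query
    """
    ranking = rank_candidates(distances, query_id)
    hits_in_top_n = sum(1 for c in ranking[:top_n] if c in covers)

    # Average precision: mean of the precision values at each rank
    # where a cover version appears.
    hits = 0
    precision_sum = 0.0
    first_rank = None
    for rank, cand in enumerate(ranking, start=1):
        if cand in covers:
            hits += 1
            precision_sum += hits / rank
            if first_rank is None:
                first_rank = rank
    average_precision = precision_sum / len(covers) if covers else 0.0
    return {
        "top_n_hits": hits_in_top_n,
        "average_precision": average_precision,
        "first_rank": first_rank,
    }


# Toy example: query 1 has two other versions (2 and 4) among five candidates.
example = {1: 0.0, 2: 0.1, 3: 0.2, 4: 0.3, 5: 0.9}
result = query_metrics(example, query_id=1, covers={2, 4})
print(result)
```

Summing <code>top_n_hits</code> over all queries gives the total number of covers identified in the top 10, and averaging the other fields over queries gives the remaining summary values.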

=== Ranking and significance testing ===
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.

For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:
@InProceedings{jones2007hsj,
title={"Human Similarity Judgements: Implications for the Design of Formal Evaluations"},
author="M.C. Jones and J.S. Downie and A.F. Ehmann",
BOOKTITLE ="Proceedings of ISMIR 2007 International Society of Music Information Retrieval",
year="2007"
}
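
To make the ranking step more concrete, the sketch below computes the Friedman chi-square statistic and the mean rank of each system from a table of average precision scores (one row per song group, one column per system). It is a simplified illustration only: the function names are hypothetical, the tie correction is omitted, and the Tukey-Kramer HSD comparison of mean ranks is not implemented.

```python
def average_ranks(values):
    """Rank values from 1 (smallest) upward; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """scores[g][s] is the average precision of system s on song group g.

    Returns (Q, mean_ranks), where Q is the Friedman statistic without
    tie correction.
    """
    n = len(scores)     # number of song groups (blocks)
    k = len(scores[0])  # number of systems (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for s, r in enumerate(average_ranks(row)):
            rank_sums[s] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, [r / n for r in rank_sums]


# Toy data: three systems evaluated on four song groups (values are made up).
toy_scores = [
    [0.90, 0.50, 0.10],
    [0.80, 0.60, 0.20],
    [0.70, 0.40, 0.30],
    [0.95, 0.55, 0.15],
]
q, mean_ranks = friedman_statistic(toy_scores)
print(q, mean_ranks)
```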

=== Runtime performance ===
In addition, computation times for feature extraction and training/classification will be measured.

== Submission Format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.

=== I/O formats ===
==== Input Files ====

The feature extraction list file format will be of the form:

/path/to/audio/file/000.wav\n
/path/to/audio/file/001.wav\n
/path/to/audio/file/002.wav\n
...

The query list file will use the same format and will list a subset of the files in the feature extraction list file:

/path/to/audio/file/182.wav\n
/path/to/audio/file/245.wav\n
/path/to/audio/file/432.wav\n
...

The query list file contains ''<number of queries>'' rows. Query ids are drawn from the pool of ''<number of candidates>'' collection ids and should match the ids used in the candidate collection.

Lines will be terminated by a '\n' character.
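
A minimal sketch of how a submission might read these two list files is given below. The function names <code>read_list_file</code> and <code>check_queries</code> are hypothetical, and the temporary files merely stand in for the list files that would be passed on the command line.

```python
import os
import tempfile


def read_list_file(path):
    """Return the audio paths in a list file: one path per line, no header line."""
    with open(path, "r", encoding="ascii") as handle:
        return [line.rstrip("\n") for line in handle if line.strip()]


def check_queries(collection_paths, query_paths):
    """Return the query paths that do not appear in the collection list."""
    known = set(collection_paths)
    return [q for q in query_paths if q not in known]


# Toy demonstration using temporary files in place of the real list files.
with tempfile.TemporaryDirectory() as tmp:
    collection_file = os.path.join(tmp, "collection.txt")
    query_file = os.path.join(tmp, "queries.txt")
    with open(collection_file, "w", encoding="ascii", newline="\n") as f:
        f.write("/path/to/audio/file/000.wav\n"
                "/path/to/audio/file/001.wav\n"
                "/path/to/audio/file/002.wav\n")
    with open(query_file, "w", encoding="ascii", newline="\n") as f:
        f.write("/path/to/audio/file/001.wav\n")
    collection = read_list_file(collection_file)
    queries = read_list_file(query_file)
    missing = check_queries(collection, queries)

print(len(collection), queries, missing)
```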

==== Output File ====
The only output will be a '''distance''' matrix file that is ''<number of queries>'' rows by ''<number of candidates>'' columns in the following format:

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/track1.wav>
2\t</path/to/audio/file/track2.wav>
3\t</path/to/audio/file/track3.wav>
4\t</path/to/audio/file/track4.wav>
...
N\t</path/to/audio/file/trackN.wav>
Q/R\t1\t2\t3\t4\t...\tN
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>
</pre>

where N is <number of candidates> and the queries are drawn from this set (and bear the same track indexes if possible).

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/track1.wav
2 /path/to/audio/file/track2.wav
3 /path/to/audio/file/track3.wav
4 /path/to/audio/file/track4.wav
5 /path/to/audio/file/track5.wav
Q/R 1 2 3 4 5
1 0.00000 1.24100 0.2e-4 0.42559 0.21313
3 50.2e-4 0.62640 0.00000 0.38000 0.15152
</pre>

Note that the query indexes refer back to the track list at the top of the distance matrix file, which identifies each query track. However, as long as the query songs are listed in exactly the same order as in the query list file passed to your submission, we will be able to interpret the data.

All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.

To summarize, the distance matrix should be preceded by a system name and ''<number of candidates>'' rows of file paths, and should be composed of ''<number of candidates>'' columns of distances (separated by tab characters) and ''<number of queries>'' rows (one for each original track query). Each row corresponds to a particular query song (the track to find covers of).
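
The layout described above could be produced along the following lines. This is a sketch under stated assumptions rather than a reference implementation: <code>write_distance_matrix</code> and the toy distance function are hypothetical, candidate paths are assumed to be unique, and the five-decimal formatting is an arbitrary choice.

```python
import io
import math


def write_distance_matrix(out, system_name, candidates, queries, distance):
    """Write a query-by-candidate distance matrix, tab separated.

    candidates: list of unique file paths; track i (1-based) is candidates[i - 1]
    queries:    list of file paths, each also present in candidates
    distance:   callable (query_path, candidate_path) -> float
    """
    index = {path: i for i, path in enumerate(candidates, start=1)}
    out.write(system_name + "\n")
    for i, path in enumerate(candidates, start=1):
        out.write(f"{i}\t{path}\n")
    out.write("Q/R\t" + "\t".join(str(i) for i in range(1, len(candidates) + 1)) + "\n")
    for q in queries:
        row = []
        for c in candidates:
            d = float(distance(q, c))
            # Distances must be zero or positive, and neither infinite nor NaN.
            if math.isnan(d) or math.isinf(d) or d < 0:
                raise ValueError(f"invalid distance {d!r} for {q} -> {c}")
            row.append(f"{d:.5f}")
        out.write(str(index[q]) + "\t" + "\t".join(row) + "\n")


# Toy example: three candidates, two of which are queries.
candidates = [f"/path/to/audio/file/track{i}.wav" for i in (1, 2, 3)]
queries = [candidates[0], candidates[2]]


def toy_distance(q, c):
    """Made-up distance for illustration only."""
    return 0.5 * abs(candidates.index(q) - candidates.index(c))


buffer = io.StringIO()
write_distance_matrix(buffer, "Example distance matrix 0.1", candidates, queries, toy_distance)
lines = buffer.getvalue().splitlines()
print("\n".join(lines))
```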

=== Command Line Calling Format ===

/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>
'''<collection_list_file>''': Text file containing ''<number of candidates>'' full path file names for the
''<number of candidates>'' audio files in the collection (including the ''<number of queries>''
query documents).
'''Example: /path/to/coversong/collection.txt'''
'''<query_list_file>''' : Text file containing the ''<number of queries>'' full path file names for the
''<number of queries>'' query documents.
'''Example: /path/to/coversong/queries.txt'''
'''<working_directory>''' : Full path to a temporary directory where submission will
have write access for caching features or calculations.
'''Example: /tmp/submission_id/'''
'''<output_file>''' : Full path to file where submission should output the distance
matrix (''<number of candidates>'' header rows + ''<number of queries>'' x ''<number of candidates>'' data matrix).
'''Example: /path/to/coversong/results/submission_id.txt'''

E.g.
/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt
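
For submissions written in Python, the four positional arguments of this calling format could be handled as sketched below. The parser and its help strings are illustrative assumptions; only the argument order follows the calling format given above.

```python
import argparse


def build_parser():
    """Build a parser for the four positional arguments of the calling format."""
    parser = argparse.ArgumentParser(
        description="Hypothetical cover song submission entry point")
    parser.add_argument("collection_list_file",
                        help="text file listing every audio file in the collection")
    parser.add_argument("query_list_file",
                        help="text file listing the query audio files")
    parser.add_argument("working_directory",
                        help="scratch directory for cached features")
    parser.add_argument("output_file",
                        help="file to which the distance matrix is written")
    return parser


# An explicit argument list is parsed here for demonstration;
# a real entry point would call parse_args() without arguments.
args = build_parser().parse_args([
    "/path/to/feat_extract_file.txt",
    "/path/to/query_file.txt",
    "/path/to/scratch/dir",
    "/path/to/output_file.txt",
])
print(args.collection_list_file, args.output_file)
```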

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA

MIREX HOME

2010-05-24T15:17:03Z

Kriswest: better link to 2010 wiki

__TOC__

== Introduction ==
The Music Information Retrieval Evaluation eXchange (MIREX) is an annual evaluation campaign for Music Information Retrieval (MIR) algorithms, coupled to the [http://www.ismir.net International Society (and Conference) for Music Information Retrieval (ISMIR)]. MIREX is hosted by the [https://www.music-ir.org/evaluation/ International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL)] at the [http://www.lis.illinois.edu/ Graduate School of Library and Information Science (GSLIS)], which is part of the [http://www.illinois.edu/ University of Illinois at Urbana-Champaign (UIUC)].

Current and future MIREXs are facilitated in part by the work of the [https://nema.lis.illinois.edu/drupal/ Networked Environment for Music Analysis (NEMA) project]. The NEMA project aims to automate and expose the workings of MIREX and MIR experimentation/evaluation to the MIR community. It helps with collection, data and code sharing within the community by allowing MIR researchers to work (remotely) with resources without having to obtain licenses to the content/code, thereby handling issues relating to copyright and IP restrictions.

MIR tasks evaluated at past MIREXs include:
* Audio Test/Train tasks
** Audio Artist Identification
** Audio Genre Classification
** Audio Music Mood Classification
** Audio Classical Composer Identification
* Audio Onset Detection
* Audio Key Detection
* Audio Tag Classification
* Audio Cover Song Identification
* Real-time Audio to Score Alignment (a.k.a. Score Following)
* Query by Singing/Humming
* Multiple Fundamental Frequency Estimation & Tracking
* Audio Chord Estimation
* Audio Melody Extraction
* Query by Tapping
* Audio Beat Tracking
* Audio Music Similarity and Retrieval
* Structural Segmentation

== Current MIREX Wiki (2010) ==
You can view the current 2010 content here: [[2010:Main_Page]]

== Recent Changes ==
We have recently merged all current and previous iterations of the MIREX wiki into a single wiki installation to make it easier to manage. All the pages, images, and abstracts have been migrated, but some links and images may still be broken. We are currently inspecting all pages manually, but would appreciate your help in correcting any errors you see.

Content on the wiki is now organized into mediawiki namespaces, one for each year. You can view the current 2010 content here: [[2010:Main_Page]]

Content from previous years can be viewed in the same way:
* [[2009:Main_Page]]
* [[2008:Main_Page]]
* [[2007:Main_Page]]
* [[2006:Main_Page]]
* [[2005:Main_Page]]

All links to older wiki content will be redirected to this new wiki, and should take you to the correct page on the new installation, but please update any bookmarks or links you may have which point into current or old wiki content.

2010:Audio Cover Song Identification

2010-05-24T14:32:17Z

Kriswest: /* Output File */

2010:Audio Cover Song Identification

2010-05-24T13:11:02Z

Kriswest: better dist mat definition

2010:Audio Music Similarity and Retrieval

2010-05-24T13:06:22Z

Kriswest:

== Description ==
As the size of digital music collections grows, music similarity has an increasingly important role as an aid to music discovery. A music similarity system can help a music consumer find new music by finding the music that is most musically similar to specific query songs (or is nearest to songs that the consumer already likes).

This page presents the Audio Music Similarity Evaluation, including the submission rules and formats. Additional background information is also provided here, which should help explain some of the reasoning behind the approach taken in the evaluation. The intention of the Music Audio Search track is to evaluate music similarity searches (a music search engine that takes a single song as a query, a.k.a. query-by-example), not playlist generation or music recommendation.

=== Task specific mailing list ===
A specific mailing list is provided for the discussion of this task and related tasks ([[2010:Audio Classification (Test/Train) tasks]], [[2010:Audio_Cover_Song_Identification]], [[2010:Audio_Tag_Classification]], [[2010:Audio_Music_Similarity_and_Retrieval]]) at: [https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00]. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.

== Data ==
Collection statistics: 7000 30-second audio clips drawn from 10 genres (700 clips from each genre).

The genres from which the data was drawn are:
*Blues
*Jazz
*Country/Western
*Baroque
*Classical
*Romantic
*Electronica
*Hip-Hop
*Rock
*HardRock/Metal

=== Audio formats ===
Participating algorithms will have to read audio in the following format:

* Sample rate: 22 kHz
* Sample size: 16 bit
* Number of channels: 1 (mono)
* Encoding: WAV
* Clip length: 30 secs from the middle of each file

== Evaluation ==
Two distinct evaluations will be performed:
* Human Evaluation
* Objective statistics derived from the results lists

Note that at MIREX 2006 participating algorithms were required to return full distance matrices showing the distance between all tracks. In subsequent years, however, we have also supported a sparse distance matrix format (detailed below), in which only the distances of the top 100 results for each query in the collection are returned.

=== Human Evaluation ===
The primary evaluation will involve subjective judgments by human evaluators of the retrieved sets using IMIRSEL's Evalutron 6000 system. This year algorithms will be presented with the same 30 second preview clip that will be reviewed by the human evaluators.

* Evaluator question: Given a search based on track A, the following set of results was returned by all systems. Please place each returned track into one of three classes (not similar, somewhat similar, very similar) and provide an indication on a continuous scale of 0 - 10 of how similar the track is to the query.
* ~120 randomly selected queries, 5 results per query, 1 set of eyes, ~10 participating labs
* Higher number of queries preferred as IR research indicates variance is in queries
* Songs by the same artist as the query will be filtered out of each result list (artist-filtering) to avoid colouring an evaluator's judgement (a cover song or song by the same artist in a result list is likely to reduce the relative ranking of other similar but independent songs; use of songs by the same artist may also allow over-fitting to affect the results)
* It will be possible for researchers to use this data for other types of system comparisons after MIREX 2007 results have been finalized.
* Human evaluation to be designed and led by IMIRSEL following a similar format to that used at MIREX 2006
* Human evaluators will be drawn from the participating labs (and any volunteers from IMIRSEL or on the MIREX lists)
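
The artist-filtering step described in the list above can be illustrated with a short Python sketch. The function <code>artist_filter</code> and the <code>artist_of</code> metadata lookup are hypothetical; the actual filtering is carried out by IMIRSEL and may differ in detail.

```python
def artist_filter(query, ranked_results, artist_of, keep=5):
    """Drop the query itself and any result by the query's artist,
    then keep the first `keep` remaining results."""
    query_artist = artist_of[query]
    filtered = [r for r in ranked_results
                if r != query and artist_of[r] != query_artist]
    return filtered[:keep]


# Toy metadata and a toy ranked result list (nearest first); values are made up.
artist_of = {"a1.wav": "A", "a2.wav": "A", "b1.wav": "B", "c1.wav": "C", "c2.wav": "C"}
ranked = ["a1.wav", "a2.wav", "b1.wav", "c1.wav", "c2.wav"]
shown_to_evaluators = artist_filter("a1.wav", ranked, artist_of, keep=2)
print(shown_to_evaluators)
```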

=== Objective Statistics derived from the distance matrix ===
Statistics of each distance matrix will be calculated including:

* Average % of Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Precision at 5, 10, 20 & 50
* Average % of Genre matches in the top 5, 10, 20 & 50 results after artist filtering of results
* Average % of available Genre, Artist and Album matches in the top 5, 10, 20 & 50 results - Recall at 5, 10, 20 & 50 (just normalising scores when less than 20 matches for an artist, album or genre are available in the database)
* Always similar - Maximum number of times a file was in the top 5, 10, 20 & 50 results
* % of files never similar (never in a top 5, 10, 20 & 50 result list)
* % of 'test-able' song triplets where the triangle inequality holds
** Note that as we are not requiring full distance matrices this year we will only be testing triangles that are found in the sparse distance matrix.
* Plot of the "number of times similar curve" - plot of song number vs. number of times it appeared in a top 20 list, with songs sorted according to the number of times they appeared in a top 20 list (to produce the curve). Systems with a sharp rise at the end of this plot have "hubs", while a long 'zero' tail shows many never-similar results.
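
Some of the statistics listed above can be sketched in a few lines of Python. The example below covers only genre precision at k, the "always similar" maximum and the percentage of never-similar files; the function names and toy data are hypothetical, and the result lists are assumed to have been artist-filtered already where that is required.

```python
from collections import Counter


def genre_precision_at_k(results, genre_of, k):
    """Average fraction of each query's top-k results that share its genre."""
    per_query = []
    for query, ranked in results.items():
        top = ranked[:k]
        if top:
            matches = sum(1 for r in top if genre_of[r] == genre_of[query])
            per_query.append(matches / len(top))
    return sum(per_query) / len(per_query) if per_query else 0.0


def times_similar(results, k):
    """Count how often each file appears in any query's top-k list."""
    counts = Counter({track: 0 for track in results})
    for ranked in results.values():
        counts.update(ranked[:k])
    return counts


# Toy result lists (nearest first) and genre labels; values are made up.
results = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["b", "a"],
    "d": ["b", "c"],
}
genre_of = {"a": "Jazz", "b": "Jazz", "c": "Rock", "d": "Rock"}

precision = genre_precision_at_k(results, genre_of, k=2)
counts = times_similar(results, k=2)
always_similar = max(counts.values())
never_similar_pct = 100.0 * sum(1 for n in counts.values() if n == 0) / len(counts)
print(precision, always_similar, never_similar_pct)
```

Sorting the values of <code>counts</code> in increasing order yields the data for the "number of times similar" curve.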

=== Runtimes ===
In addition, computation times for feature extraction/index-building and querying will be measured.

== Submission format ==
Submission to this task will have to conform to a specified format detailed below.

=== Implementation details ===
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.

The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
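
One possible way of exposing the thread count as a parameter is sketched below in Python. The extractor <code>extract_one</code> is a placeholder, not a real feature extractor, and in CPython a <code>ProcessPoolExecutor</code> may be a better fit for CPU-bound work than the thread pool used here.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_one(audio_path):
    """Placeholder feature extractor (hypothetical): returns the path and a dummy feature."""
    return audio_path, len(audio_path)


def extract_all(audio_paths, num_threads):
    """Run extract_one over all paths with num_threads workers, preserving input order."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(extract_one, audio_paths))


# num_threads would normally come from a command line parameter.
features = extract_all(["/a/1.wav", "/a/22.wav", "/a/333.wav"], num_threads=2)
print(features)
```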

Submissions will have to output either a full distance matrix or a search results file with the top 100 search results for each track in the collection. This list of results will be used to extract the artist-filtered results to present to the human evaluators and will facilitate the computation of the objective statistics.

=== I/O formats ===
In this section the input and output files used in this task are described, as are the command line calling format requirements for submissions.

==== Audio collection list file (input)====
The list file passed for feature extraction and indexing will be a simple ASCII list file. This file will contain one path per line with no header line; all paths will be absolute (full paths).

e.g.

/aDirectory/collectionFolder/b002342.wav
/aDirectory/collectionFolder/a005921.wav
...

==== Distance matrix output files ====
Participants should return one of two available output file formats, a full distance matrix or a sparse distance matrix.

===== Full Distance Matrix =====
Full distance matrix files should be generated in the following format:

* A simple ASCII file listing a name for the algorithm on the first line,
* Numbered paths for each file appearing in the matrix. These can be in any order (i.e. the files don't have to be in the same order as they appeared in the list file) but should index into the columns/rows of the distance matrix.
* A line beginning with 'Q/R' followed by a tab and a tab-separated list of the numbers 1 to N, where N is the number of files covered by the matrix.
* One line per file in the matrix giving the distances of that file to each other file in the matrix. All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a single tab character. The diagonal of the matrix (the distance of a track to itself) should be zero.

<pre>
Distance matrix header text with system name
1\t</path/to/audio/file/1.wav>
2\t</path/to/audio/file/2.wav>
3\t</path/to/audio/file/3.wav>
...
N\t</path/to/audio/file/N.wav>
Q/R\t1\t2\t3\t...\tN
1\t0.0\t<dist 1 to 2>\t<dist 1 to 3>\t...\t<dist 1 to N>
2\t<dist 2 to 1>\t0.0\t<dist 2 to 3>\t...\t<dist 2 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t0.0\t...\t<dist 3 to N>
...\t...\t...\t...\t...\t...
N\t<dist N to 1>\t<dist N to 2>\t<dist N to 3>\t...\t0.0
</pre>

which might look like:

<pre>
Example distance matrix 0.1
1 /path/to/audio/file/1.wav
2 /path/to/audio/file/2.wav
3 /path/to/audio/file/3.wav
4 /path/to/audio/file/4.wav
Q/R 1 2 3 4
1 0.00000 1.24100 0.2e-4 0.42559
2 1.24100 0.00000 0.62640 0.23564
3 50.2e-4 0.62640 0.00000 0.38000
4 0.42559 0.23567 0.38000 0.00000
</pre>

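A file in the full distance matrix format could be read back and checked against the stated constraints roughly as follows. This is a sketch only: <code>parse_full_matrix</code> and <code>validate</code> are hypothetical names, fields are split on any whitespace (real files use tabs), and the example text reuses the four-track example shown above.

```python
import math


def parse_full_matrix(text):
    """Parse the full distance matrix layout into (name, paths, matrix).

    paths maps track number -> path; matrix maps (row, col) -> distance.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    name = lines[0]
    header_at = next(i for i, ln in enumerate(lines) if ln.split()[0] == "Q/R")
    paths = {}
    for ln in lines[1:header_at]:
        number, path = ln.split(None, 1)
        paths[int(number)] = path
    columns = [int(c) for c in lines[header_at].split()[1:]]
    matrix = {}
    for ln in lines[header_at + 1:]:
        fields = ln.split()
        row = int(fields[0])
        for col, value in zip(columns, fields[1:]):
            matrix[(row, col)] = float(value)
    return name, paths, matrix


def validate(matrix):
    """List problems: negative, infinite or NaN values, or a non-zero diagonal."""
    problems = []
    for (row, col), d in matrix.items():
        if math.isnan(d) or math.isinf(d) or d < 0:
            problems.append(f"invalid distance at ({row}, {col}): {d}")
        elif row == col and d != 0.0:
            problems.append(f"non-zero diagonal at ({row}, {col}): {d}")
    return problems


example = (
    "Example distance matrix 0.1\n"
    "1\t/path/to/audio/file/1.wav\n"
    "2\t/path/to/audio/file/2.wav\n"
    "3\t/path/to/audio/file/3.wav\n"
    "4\t/path/to/audio/file/4.wav\n"
    "Q/R\t1\t2\t3\t4\n"
    "1\t0.00000\t1.24100\t0.2e-4\t0.42559\n"
    "2\t1.24100\t0.00000\t0.62640\t0.23564\n"
    "3\t50.2e-4\t0.62640\t0.00000\t0.38000\n"
    "4\t0.42559\t0.23567\t0.38000\t0.00000\n"
)
name, paths, matrix = parse_full_matrix(example)
problems = validate(matrix)
print(name, len(paths), len(matrix), problems)
```
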
===== Sparse Distance Matrix =====
If computation or exhaustive search is a concern, or a full distance matrix is not a normal output of the indexing algorithm employed, the sparse distance matrix format detailed below may be used:

A simple ASCII file listing a name for the algorithm and the top 100 search results for every track in the collection.

This file should start with a header line with a name for the algorithm and should be followed by the results for one query per line, prefixed by the filename portion of the query path. This should be followed by a tab character and a tab-separated, ordered list of the top 100 search results. Each result should include the result filename (e.g. a034728.wav) and the distance (e.g. 17.1 or 0.23) separated by a comma.

<pre>
MyAlgorithm (my.email@address.com)
<example 1 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance>\t ... \t<result 100 name>,<result 100 distance>
<example 2 filename>\t<result 1 name>,<result 1 distance>\t<result 2 name>,<result 2 distance>\t ... \t<result 100 name>,<result 100 distance>
...
</pre>

which might look like:

<pre>
MyAlgorithm (my.email@address.com)
a009342.wav b229311.wav,0.16 a023821.wav,0.19 a001329.wav,0.24 ... etc.
a009343.wav a661931.wav,0.12 a043322.wav,0.17 c002346.wav,0.21 ... etc.
a009347.wav a671239.wav,0.13 c112393.wav,0.20 b083293.wav,0.25 ... etc.
...
</pre>

The path to which this list file should be written must be accepted as a parameter on the command line.
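
The sparse format could be written along the lines of the following sketch. The function <code>write_sparse_results</code> and the input structure are hypothetical, the <code>%g</code>-style number formatting is an arbitrary choice, and whether the query itself appears in its own result list is left to the caller.

```python
import io
import os


def write_sparse_results(out, system_name, results, top_n=100):
    """Write one line per query: the query filename followed by
    tab-separated 'name,distance' pairs, nearest first.

    results: dict mapping query path -> list of (result path, distance) pairs
    """
    out.write(system_name + "\n")
    for query, neighbours in results.items():
        ordered = sorted(neighbours, key=lambda pair: pair[1])[:top_n]
        fields = [os.path.basename(query)]
        fields += [f"{os.path.basename(path)},{dist:g}" for path, dist in ordered]
        out.write("\t".join(fields) + "\n")


# Toy example with a single query and two results (values are made up).
toy_results = {
    "/collection/a009342.wav": [
        ("/collection/a023821.wav", 0.19),
        ("/collection/b229311.wav", 0.16),
    ],
}
buffer = io.StringIO()
write_sparse_results(buffer, "MyAlgorithm (my.email@address.com)", toy_results)
sparse_text = buffer.getvalue()
print(sparse_text)
```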

==== Example submission calling formats ====
extractFeatures.sh /path/to/scratch/folder /path/to/collectionListFile.txt
Query.sh /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

or

doAudioSim.sh -numThreads 8 /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/outputResultsFile.txt

=== Packaging submissions ===
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Time and hardware limits ==
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

== Submission opening date ==
TBA

== Submission closing date ==
TBA