<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J.+Ashley+Burgoyne</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=J.+Ashley+Burgoyne"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/J._Ashley_Burgoyne"/>
	<updated>2026-04-29T17:05:53Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9890</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9890"/>
		<updated>2013-11-30T18:54:58Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX &amp;amp;rsquo;09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard &amp;amp;rsquo;12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_MIREX_2009 | MIREX &amp;amp;rsquo;09 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard_2012 | Billboard &amp;amp;rsquo;12 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard_2013 | Billboard &amp;amp;rsquo;13 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_Billboard_2013&amp;diff=9889</id>
		<title>2013:Audio Chord Estimation Results Billboard 2013</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_Billboard_2013&amp;diff=9889"/>
		<updated>2013-11-30T18:37:46Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: Created page with &amp;quot;==Introduction==  This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for a special subset of ...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for a special subset of the ''Billboard'' dataset from McGill University that has never been made available to the public. Further subsets have been withheld to support the ACE task through MIREX 2015.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
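The CSR and WCSR computations described above can be sketched in a few lines of Python. This is an illustrative sketch, not the actual MIREX evaluation code: it assumes each annotation is a list of non-overlapping (start, end, label) segments in seconds, and all function names are hypothetical.

```python
def matched_duration(ground_truth, estimate):
    """Total duration over which the two annotations carry the same label."""
    overlap = 0.0
    for g_start, g_end, g_label in ground_truth:
        for e_start, e_end, e_label in estimate:
            if e_label == g_label:
                overlap += max(0.0, min(g_end, e_end) - max(g_start, e_start))
    return overlap

def csr(ground_truth, estimate):
    """Chord symbol recall: matched duration over total annotated duration."""
    total = sum(end - start for start, end, _ in ground_truth)
    return matched_duration(ground_truth, estimate) / total

def wcsr(corpus):
    """Weighted CSR over a corpus of (ground_truth, estimate) pairs:
    pooling matched durations weights each song by its length."""
    matched = sum(matched_duration(gt, est) for gt, est in corpus)
    total = sum(end - start for gt, _ in corpus for start, end, _ in gt)
    return matched / total
```

For example, if the estimate labels the first of four seconds correctly as G:maj but shifts the boundary to C:maj one second early, three of four seconds match and the CSR is 0.75.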
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- &lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
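The mapping rule above, choosing the vocabulary quality whose interval set is the largest subset of the input label's intervals, can be sketched as follows. The interval sets are hard-coded toy data covering only a few qualities, and the names are illustrative; this is not the full MIREX chord-syntax parser.

```python
# Interval sets for two hypothetical vocabularies (subset of the five used).
VOCABULARIES = {
    "majmin": {"maj": {"1", "3", "5"}, "min": {"1", "b3", "5"}},
    "sevenths": {"maj": {"1", "3", "5"}, "min": {"1", "b3", "5"},
                 "7": {"1", "3", "5", "b7"}, "maj7": {"1", "3", "5", "7"},
                 "min7": {"1", "b3", "5", "b7"}},
}

def map_quality(intervals, vocabulary):
    """Return the vocabulary quality whose interval set is the largest
    subset of the given intervals, or None if no quality matches."""
    candidates = [(len(ivs), quality)
                  for quality, ivs in VOCABULARIES[vocabulary].items()
                  if ivs <= intervals]
    return max(candidates)[1] if candidates else None
```

With the G:7(#9) intervals {1,3,5,b7,#9}, the major-minor vocabulary yields "maj" and the seventh-chord vocabulary yields "7", matching the worked example above.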
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
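The directional Hamming distance and the derived segmentation scores can be sketched as below. Segmentations are assumed to be lists of (start, end) pairs covering the same duration; the assignment of the two directions to over- and under-segmentation follows one common convention and may be stated the other way round in parts of the literature, so treat it as an assumption.

```python
def directional_hamming(seg_a, seg_b):
    """For each segment in seg_a, find the maximally overlapping segment in
    seg_b and sum the parts of the seg_a segment that it does not cover."""
    distance = 0.0
    for a_start, a_end in seg_a:
        best = max(max(0.0, min(a_end, b_end) - max(a_start, b_start))
                   for b_start, b_end in seg_b)
        distance += (a_end - a_start) - best
    return distance

def segmentation_scores(truth, estimate):
    """Return (1 - over-segmentation, 1 - under-segmentation, harmonic mean),
    each scaled so that 1.0 is best and 0.0 is worst."""
    duration = truth[-1][1] - truth[0][0]
    # Splitting true segments inflates the truth-to-estimate distance
    # (over-segmentation); merging them inflates the reverse direction.
    over = directional_hamming(truth, estimate) / duration
    under = directional_hamming(estimate, truth) / duration
    a, b = 1.0 - over, 1.0 - under
    harmonic = 2 * a * b / (a + b) if a + b else 0.0
    return a, b, harmonic
```

For instance, an estimate that needlessly splits one true segment in half scores below 1.0 on the over-segmentation measure while keeping a perfect under-segmentation score.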
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt Mcvicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt Mcvicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms that performed training during the evaluation are marked with an asterisk; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/billboard13.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
More detailed information about the performance of the algorithms, including per-song performance and the breakdown of the WCSR calculations, is available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/BillboardTest2013.zip BillboardTest2013.zip]&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
The recognition output and the ground-truth files are available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/BillboardTest2013Output.zip BillboardTest2013Output.zip]&lt;br /&gt;
&lt;br /&gt;
We hope to generate a graphical comparison of all algorithms against the ground truth early in 2014.&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_Billboard_2012&amp;diff=9888</id>
		<title>2013:Audio Chord Estimation Results Billboard 2012</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_Billboard_2012&amp;diff=9888"/>
		<updated>2013-11-30T18:33:45Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: Created page with &amp;quot;==Introduction==  This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for an abridged version ...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for an abridged version of the ''Billboard'' dataset from McGill University, including a representative sample of American popular music from the 1950s through the 1990s, as used for MIREX 2012.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- &lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt Mcvicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt Mcvicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms that performed training during the evaluation are marked with an asterisk; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/billboard12.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
More detailed information about the performance of the algorithms, including per-song performance and the breakdown of the WCSR calculations, is available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/BillboardTest2012.zip BillboardTest2012.zip]&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
The recognition output and the ground-truth files are available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/BillboardTest2012Output.zip BillboardTest2012Output.zip]&lt;br /&gt;
&lt;br /&gt;
We hope to generate a graphical comparison of all algorithms against the ground truth early in 2014.&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9887</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9887"/>
		<updated>2013-11-30T18:29:38Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Algorithmic Output */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- &lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the durations that remain unmatched (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
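The directional computation can be sketched as follows, a minimal illustration in which annotations are bare `(start, end)` boundary pairs and the names are our own; which direction counts as over- versus under-segmentation follows the convention of the cited literature.

```python
def directional_hamming(segs_a, segs_b):
    """For each segment in segs_a, find the single maximally overlapping
    segment in segs_b and sum the duration left unmatched, normalised by
    the total duration (0.0 means the segmentations align perfectly).
    Segments are (start, end) pairs spanning the same time range."""
    unmatched = 0.0
    for a_start, a_end in segs_a:
        best = max(min(a_end, b_end) - max(a_start, b_start)
                   for b_start, b_end in segs_b)
        unmatched += (a_end - a_start) - max(best, 0.0)
    return unmatched / (segs_a[-1][1] - segs_a[0][0])

reference = [(0, 4), (4, 8)]
estimate = [(0, 2), (2, 4), (4, 8)]   # splits the first reference segment
# One direction penalises the extra boundary, the other does not; both are
# reported as 1 minus the distance so that 1.0 is best.
print(1 - directional_hamming(reference, estimate))   # 0.75
print(1 - directional_hamming(estimate, reference))   # 1.0
```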
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms marked with an asterisk were trained as part of the evaluation; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
More detailed information about the performance of the algorithms, including per-song performance and the breakdown of the WCSR calculations, is available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/MirexChord2009.zip MirexChord2009.zip]&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
The recognition output and the ground-truth files are available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/MirexChord2009Output.zip MirexChord2009Output.zip]&lt;br /&gt;
&lt;br /&gt;
We hope to generate a graphical comparison of all algorithms against the ground truth early in 2014.&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9886</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9886"/>
		<updated>2013-11-30T18:25:40Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Complete Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead treat the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- &lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the durations that remain unmatched (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms marked with an asterisk were trained as part of the evaluation; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
More detailed information about the performance of the algorithms, including per-song performance and the breakdown of the WCSR calculations, is available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [https://music-ir.org/mirex/results/2013/ace/MirexChord2009.zip MirexChord2009.zip]&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9885</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9885"/>
		<updated>2013-11-30T18:24:21Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Complete Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead treat the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- &lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the durations that remain unmatched (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms marked with an asterisk were trained as part of the evaluation; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
More detailed information about the performance of the algorithms, including per-song performance and the breakdown of the WCSR calculations, is available from this archive:&lt;br /&gt;
&lt;br /&gt;
* [[http://&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9884</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9884"/>
		<updated>2013-11-30T18:08:02Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Advanced chord vocabularies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead treat the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- &lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the durations that remain unmatched (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms that were trained as part of the evaluation are marked with an asterisk; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9883</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9883"/>
		<updated>2013-11-30T18:07:35Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Submissions */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
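The segment-based CSR and its duration-weighted average can be sketched as follows. This is an illustrative Python sketch, not the MIREX evaluation code; the function names and the representation of annotations as lists of (start, end, label) tuples are assumptions.&lt;br /&gt;

```python
def csr(reference, estimate):
    """Chord symbol recall: duration where the labels agree, divided by
    the total duration of the reference annotation."""
    total = sum(end - start for start, end, _ in reference)
    matched = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            overlap = min(r_end, e_end) - max(r_start, e_start)
            if overlap > 0 and r_label == e_label:
                matched += overlap
    return matched / total

def wcsr(songs):
    """Weight each song's CSR by its duration; songs is a list of
    (reference, estimate) annotation pairs."""
    durations = [sum(e - s for s, e, _ in ref) for ref, _ in songs]
    weighted = sum(csr(ref, est) * dur
                   for (ref, est), dur in zip(songs, durations))
    return weighted / sum(durations)
```

Because both annotations are segmentations, the double loop only ever finds one matching segment pair per region of the timeline, so summing overlaps gives the total matching duration directly.&lt;br /&gt;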
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) and is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
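The largest-subset mapping rule described above can be sketched in a few lines of Python. This is only an illustration: the vocabulary dictionaries, the string encoding of intervals, and the function name are all hypothetical, and only a handful of qualities are shown (root and bass are assumed to have matched already).&lt;br /&gt;

```python
# Interval sets for a few chord qualities, keyed by quality name.
MAJMIN = {"maj": {"1", "3", "5"}, "min": {"1", "b3", "5"}}
SEVENTHS = dict(MAJMIN, **{
    "7":    {"1", "3", "5", "b7"},
    "maj7": {"1", "3", "5", "7"},
    "min7": {"1", "b3", "5", "b7"},
})

def map_quality(intervals, vocabulary):
    """Return the vocabulary quality whose interval set is the largest
    subset of the chord's intervals, or None if no quality applies."""
    candidates = [(len(ivs), q) for q, ivs in vocabulary.items()
                  if ivs <= intervals]
    return max(candidates)[1] if candidates else None

# G:7(#9) has the interval set {1, 3, 5, b7, #9}:
g7sharp9 = {"1", "3", "5", "b7", "#9"}
```

With the major-minor vocabulary, `map_quality(g7sharp9, MAJMIN)` selects maj; with the seventh-chord vocabulary it selects 7, matching the G:7(#9) example above.&lt;br /&gt;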
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
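A minimal sketch of the directional Hamming distance and the derived scores, assuming each segmentation is a list of (start, end) pairs covering the song; the function names are illustrative and the assignment of the two directions to over- and under-segmentation follows the description above.&lt;br /&gt;

```python
def directional_hamming(seg_a, seg_b):
    """For each segment of seg_a, find the maximally overlapping segment
    of seg_b and sum the part of seg_a's segment left uncovered."""
    dist = 0.0
    for a_start, a_end in seg_a:
        max_overlap = max(min(a_end, b_end) - max(a_start, b_start)
                          for b_start, b_end in seg_b)
        dist += (a_end - a_start) - max(0.0, max_overlap)
    return dist

def segmentation_scores(reference, estimate):
    """Return (1 - over-segmentation, 1 - under-segmentation, harmonic mean),
    each scaled so that 1.0 is best and 0.0 is worst."""
    total = reference[-1][1] - reference[0][0]
    # Reference segments fragmented by the estimate -> over-segmentation.
    over = directional_hamming(reference, estimate) / total
    # Estimated segments merging reference segments -> under-segmentation.
    under = directional_hamming(estimate, reference) / total
    a, b = 1.0 - over, 1.0 - under
    return a, b, (2 * a * b / (a + b) if a + b > 0 else 0.0)
```

For example, an estimate that splits a single reference segment in half scores 0.5 on the over-segmentation measure and 1.0 on the under-segmentation measure.&lt;br /&gt;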
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
!&lt;br /&gt;
! Abstract&lt;br /&gt;
! Contributors&lt;br /&gt;
|-&lt;br /&gt;
| CB3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB3.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CB4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CB4.pdf PDF]&lt;br /&gt;
| Taemin Cho &amp;amp; Juan P. Bello&lt;br /&gt;
|-&lt;br /&gt;
| CF2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/CF2.pdf PDF]&lt;br /&gt;
| Chris Cannam, Matthias Mauch, Matthew E. P. Davies, Simon Dixon, Christian Landone, Katy Noland, Mark Levy, Massimiliano Zanoni, Dan Stowell &amp;amp; Luís A. Figueira&lt;br /&gt;
|-&lt;br /&gt;
| KO1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO1.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| KO2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/KO2.pdf PDF]&lt;br /&gt;
| Maksim Khadkevich &amp;amp; Maurizio Omologo&lt;br /&gt;
|-&lt;br /&gt;
| NG1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG1.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NG2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NG2.pdf PDF]&lt;br /&gt;
| Nikolay Glazyrin&lt;br /&gt;
|-&lt;br /&gt;
| NMSD1&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD1.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| NMSD2&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/NMSD2.pdf PDF]&lt;br /&gt;
| Yizhao Ni, Matt McVicar, Raul Santos-Rodriguez &amp;amp; Tijl De Bie&lt;br /&gt;
|-&lt;br /&gt;
| PP3&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP3.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| PP4&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/PP4.pdf PDF] &lt;br /&gt;
| Johan Pauwels &amp;amp; Geoffroy Peeters&lt;br /&gt;
|-&lt;br /&gt;
| SB8&lt;br /&gt;
| style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2013/SB8.pdf PDF] &lt;br /&gt;
| Nikolaas Steenbergen &amp;amp; John Ashley Burgoyne&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms that were trained as part of the evaluation are marked with an asterisk; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9882</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9882"/>
		<updated>2013-11-30T17:42:53Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) and is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary. Algorithms that were trained as part of the evaluation are marked with an asterisk; all others were submitted pre-trained.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9881</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9881"/>
		<updated>2013-11-30T17:42:12Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) and is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
All figures can be interpreted as percentages and range from 0 (worst) to 100 (best). The table is sorted on WCSR for the major-minor vocabulary.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9880</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9880"/>
		<updated>2013-11-30T16:28:04Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1, 3, 5}, is a subset of the interval set of G:7(#9), {1, 3, 5, b7, #9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1, 3, 5, b7}, is also a subset of that of G:7(#9) and is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9879</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9879"/>
		<updated>2013-11-30T16:27:53Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX approximated the CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
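As an informal sketch of the segment-based recall described above, the following reimplements CSR and its duration weighting over lists of (start, end, label) tuples. The tuple representation and function names are our own illustration, not the actual NEMA evaluation code:

```python
def segment_csr(ref, est):
    """Chord symbol recall for one song, computed on continuous segments.

    ref, est: lists of (start, end, label) tuples covering the song.
    Returns the duration where labels agree divided by total duration.
    """
    matched = 0.0
    for rs, re, rlab in ref:
        for es, ee, elab in est:
            if rlab == elab:
                # Overlap of the two intervals, zero when disjoint.
                matched += max(0.0, min(re, ee) - max(rs, es))
    total = ref[-1][1] - ref[0][0]
    return matched / total

def wcsr(songs):
    """Duration-weighted CSR over (ref, est) pairs for several songs,
    so that longer songs contribute proportionally more."""
    num = 0.0
    den = 0.0
    for ref, est in songs:
        dur = ref[-1][1] - ref[0][0]
        num += segment_csr(ref, est) * dur
        den += dur
    return num / den
```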
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the interval structure of the output label is the largest subset of the input label's interval structure that the vocabulary allows. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1,3,5,b7}, is also a subset of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
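The largest-subset rule in the G:7(#9) example can be sketched as follows; the interval spellings, dictionaries, and function name are illustrative assumptions, not MIREX code:

```python
# Two hypothetical vocabularies, each quality given as its interval set.
MAJMIN = {'maj': {'1', '3', '5'}, 'min': {'1', 'b3', '5'}}
SEVENTHS = {'maj': {'1', '3', '5'}, 'min': {'1', 'b3', '5'},
            '7': {'1', '3', '5', 'b7'}, 'min7': {'1', 'b3', '5', 'b7'},
            'maj7': {'1', '3', '5', '7'}}

def map_quality(intervals, vocab):
    """Return the vocabulary quality whose interval set is the largest
    subset of the input label's intervals, or None if nothing matches."""
    best = None
    for name, ivs in vocab.items():
        if ivs.issubset(intervals) and (best is None or len(ivs) > len(vocab[best])):
            best = name
    return best

# G:7(#9) has intervals {1, 3, 5, b7, #9}: against MAJMIN it maps to
# 'maj', while against SEVENTHS the larger subset '7' wins.
G7_SHARP9 = {'1', '3', '5', 'b7', '#9'}
```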
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the portions of each segment that fall outside its best match (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
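A rough sketch of the directional Hamming distance and the derived scores follows. The (start, end) list representation and, in particular, which direction is labelled over- versus under-segmentation are our assumptions here, not the official implementation:

```python
def directional_hamming(a, b):
    """Directional Hamming distance between two segmentations.

    a, b: lists of (start, end) pairs covering the same time span.
    Each segment of a keeps only its single maximally overlapping
    segment in b; everything left over counts toward the distance.
    """
    dist = 0.0
    for s, e in a:
        best = 0.0
        for s2, e2 in b:
            best = max(best, min(e, e2) - max(s, s2))
        dist += (e - s) - best
    return dist

def segmentation_scores(ref, est):
    """Return (1 - over-segmentation, 1 - under-segmentation, harmonic mean).

    Assumed orientation: an over-segmented estimate splinters each
    reference segment, which inflates the distance taken from the
    reference side.
    """
    total = ref[-1][1] - ref[0][0]
    over = 1.0 - directional_hamming(ref, est) / total
    under = 1.0 - directional_hamming(est, ref) / total
    return over, under, 2.0 * over * under / (over + under)
```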
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.cv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9878</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9878"/>
		<updated>2013-11-30T16:25:26Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX approximated the CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the interval structure of the output label is the largest subset of the input label's interval structure that the vocabulary allows. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1,3,5,b7}, is also a subset of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the portions of each segment that fall outside its best match (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9877</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9877"/>
		<updated>2013-11-30T16:25:03Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX approximated the CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the interval structure of the output label is the largest subset of the input label's interval structure that the vocabulary allows. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1,3,5,b7}, is also a subset of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the portions of each segment that fall outside its best match (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;results/2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9876</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9876"/>
		<updated>2013-11-30T11:30:42Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX approximated the CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the interval structure of the output label is the largest subset of the input label's interval structure that the vocabulary allows. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1,3,5,b7}, is also a subset of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the portions of each segment that fall outside its best match (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;https://www.sugarsync.com/pf/D7802793_74307221_203340&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9875</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9875"/>
		<updated>2013-11-29T23:48:50Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Summary */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX approximated the CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, because this is (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight the CSR by the length of the song. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the interval structure of the output label is the largest subset of the input label's interval structure that the vocabulary allows. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead, because the interval set of G:7, {1,3,5,b7}, is also a subset of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the portions of each segment that fall outside its best match (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
&amp;lt;csv&amp;gt;2013/ace/mirex09.csv&amp;lt;/csv&amp;gt;&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9874</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9874"/>
		<updated>2013-11-29T23:28:25Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Advanced chord vocabularies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight each song&amp;rsquo;s CSR by the length of the song when averaging over the test set. The resulting figure is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
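As a minimal sketch of the two measures above (assuming annotations are plain lists of (start, end, label) tuples; all names here are illustrative, not part of any MIREX tool):

```python
# Hypothetical sketch of segment-based CSR and length-weighted WCSR.
# Annotations are lists of (start, end, label) tuples in seconds.

def csr(reference, estimate):
    """Chord symbol recall: duration where labels match, divided by
    the total duration of the reference annotation."""
    total = sum(end - start for start, end, _ in reference)
    matched = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            overlap = min(r_end, e_end) - max(r_start, e_start)
            if overlap > 0 and r_label == e_label:
                matched += overlap
    return matched / total

def wcsr(pieces):
    """Average CSR over a corpus of (reference, estimate) pairs,
    weighting each piece by its length."""
    lengths = [sum(e - s for s, e, _ in ref) for ref, _ in pieces]
    weighted = sum(csr(ref, est) * n for (ref, est), n in zip(pieces, lengths))
    return weighted / sum(lengths)
```

Intersecting the two segmentations directly, as here, avoids the 10 ms sampling grid entirely.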
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor;&lt;br /&gt;
# Seventh chords;&lt;br /&gt;
# Major and minor with inversions; and&lt;br /&gt;
# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
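The subset-based mapping just described can be sketched as follows (a hypothetical illustration: chord qualities are modelled as sets of interval names relative to the root, and the two vocabularies shown are deliberately small, not exhaustive):

```python
# Hypothetical sketch of the largest-subset vocabulary mapping.
MAJMIN = {"maj": {"1", "3", "5"}, "min": {"1", "b3", "5"}}
SEVENTHS = dict(MAJMIN,
                **{"7": {"1", "3", "5", "b7"},
                   "min7": {"1", "b3", "5", "b7"},
                   "maj7": {"1", "3", "5", "7"}})

def map_quality(intervals, vocabulary):
    """Return the vocabulary quality whose interval set is the largest
    subset of the input intervals, or None if no quality fits."""
    candidates = [(len(ivs), name) for name, ivs in vocabulary.items()
                  if ivs.issubset(intervals)]
    if not candidates:
        return None
    return max(candidates)[1]
```

With the G:7(#9) intervals {1,3,5,b7,#9}, this maps the quality to maj under the major/minor vocabulary and to 7 under the seventh-chord vocabulary, matching the example above.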
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
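A minimal sketch of these segmentation measures (names are illustrative; segmentations are lists of (start, end) pairs covering the piece, and our reading of which direction corresponds to over- vs. under-segmentation is an assumption based on the cited papers):

```python
# Hypothetical sketch of the directional Hamming distance and the
# derived segmentation scores, scaled so that 1.0 is best.

def directional_hamming(seg_a, seg_b):
    """For each segment of seg_a, find the maximally overlapping
    segment of seg_b and sum the duration it fails to cover."""
    distance = 0.0
    for a_start, a_end in seg_a:
        best = max(min(a_end, b_end) - max(a_start, b_start)
                   for b_start, b_end in seg_b)
        distance += (a_end - a_start) - max(best, 0.0)
    return distance

def segmentation_scores(reference, estimate):
    """Return (1 - over-segmentation, 1 - under-segmentation) and
    their harmonic mean."""
    total = sum(end - start for start, end in reference)
    d_over = directional_hamming(reference, estimate) / total
    d_under = directional_hamming(estimate, reference) / total
    s_over, s_under = 1.0 - d_over, 1.0 - d_under
    harmonic = 2.0 * s_over * s_under / (s_over + s_under)
    return s_over, s_under, harmonic
```

For example, if the estimate splits a single reference segment in two, the reference-to-estimate direction registers the fragmentation (over-segmentation) while the other direction stays at 1.0.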
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9873</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9873"/>
		<updated>2013-11-29T23:27:33Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Advanced chord vocabularies */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight each song&amp;rsquo;s CSR by the length of the song when averaging over the test set. The resulting figure is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
&lt;br /&gt;
## Chord root note only;&lt;br /&gt;
## Major and minor;&lt;br /&gt;
## Seventh chords;&lt;br /&gt;
## Major and minor with inversions; and&lt;br /&gt;
## Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9872</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9872"/>
		<updated>2013-11-29T23:26:58Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results (will add more results soon)&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX &amp;amp;rsquo;09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard &amp;amp;rsquo;12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_MIREX_2009 | MIREX &amp;amp;rsquo;09 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard_2012 | Billboard &amp;amp;rsquo;12 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard_2013 | Billboard &amp;amp;rsquo;13 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
*[[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9870</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9870"/>
		<updated>2013-11-29T23:26:21Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: moved 2013:Audio Chord Estimation Results MIREX2009 to 2013:Audio Chord Estimation Results MIREX 2009: Missing space&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight each song&amp;rsquo;s CSR by the length of the song when averaging over the test set. The resulting figure is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
&lt;br /&gt;
*# Chord root note only;&lt;br /&gt;
*# Major and minor;&lt;br /&gt;
*# Seventh chords;&lt;br /&gt;
*# Major and minor with inversions; and&lt;br /&gt;
*# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX2009&amp;diff=9871</id>
		<title>2013:Audio Chord Estimation Results MIREX2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX2009&amp;diff=9871"/>
		<updated>2013-11-29T23:26:21Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: moved 2013:Audio Chord Estimation Results MIREX2009 to 2013:Audio Chord Estimation Results MIREX 2009: Missing space&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;#REDIRECT [[2013:Audio Chord Estimation Results MIREX 2009]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9869</id>
		<title>2013:Audio Chord Estimation Results MIREX 2009</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation_Results_MIREX_2009&amp;diff=9869"/>
		<updated>2013-11-29T23:25:22Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: Created page with &amp;quot;==Introduction==  This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics datas...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Introduction==&lt;br /&gt;
&lt;br /&gt;
This year, we have started a new evaluation battery for audio chord estimation. This page contains the results of these new evaluations for the Isophonics dataset, a.k.a. the MIREX 2009 dataset. It comprises the collected Beatles, Queen, and Zweieck datasets from Queen Mary, University of London, and has been used for audio chord estimation in MIREX for many years.&lt;br /&gt;
&lt;br /&gt;
==Why evaluate differently?==&lt;br /&gt;
&lt;br /&gt;
* Researchers interested in automatic chord estimation have been dissatisfied with the traditional evaluation techniques used for this task at MIREX.&lt;br /&gt;
&lt;br /&gt;
* Numerous alternatives have been proposed in the literature (Harte, 2010; Mauch, 2010; Pauwels &amp;amp; Peeters, 2013). &lt;br /&gt;
&lt;br /&gt;
* At ISMIR 2010 in Utrecht, a group discussed alternatives and developed the [[The_Utrecht_Agreement_on_Chord_Evaluation | Utrecht Agreement]] for updating the task, but until this year, nobody had implemented any of the suggestions.&lt;br /&gt;
&lt;br /&gt;
==What’s new?==&lt;br /&gt;
&lt;br /&gt;
===More precise recall estimation===&lt;br /&gt;
&lt;br /&gt;
* MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth: the total duration of segments where the predictions match the ground truth divided by the total duration of the song. &lt;br /&gt;
&lt;br /&gt;
* In previous years, MIREX has used an approximate CSR by sampling both the ground-truth and the automatic annotations every 10 ms.&lt;br /&gt;
&lt;br /&gt;
* Following Harte (2010), we instead view the ground-truth and estimated annotations as continuous segmentations of the audio, which is both (1) more precise and (2) more computationally efficient. &lt;br /&gt;
&lt;br /&gt;
* Moreover, because pieces of music come in a wide variety of lengths, we believe it is better to weight each song&amp;rsquo;s CSR by the length of the song when averaging over the test set. The resulting figure is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
===Advanced chord vocabularies===&lt;br /&gt;
&lt;br /&gt;
* We computed WCSR with five different chord vocabulary mappings: &lt;br /&gt;
&lt;br /&gt;
*# Chord root note only;&lt;br /&gt;
*# Major and minor;&lt;br /&gt;
*# Seventh chords;&lt;br /&gt;
*# Major and minor with inversions; and&lt;br /&gt;
*# Seventh chords with inversions. &lt;br /&gt;
&lt;br /&gt;
* With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. &lt;br /&gt;
&lt;br /&gt;
* A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. &lt;br /&gt;
&lt;br /&gt;
* For instance, in the major and minor case, G:7(#9) is mapped to G:maj because the interval set of G:maj, {1,3,5}, is a subset of the interval set of G:7(#9), {1,3,5,b7,#9}. In the seventh-chord case, G:7(#9) is mapped to G:7 instead because the interval set of G:7, {1,3,5,b7}, is also a subset of that of G:7(#9) but is larger than that of G:maj.&lt;br /&gt;
&lt;br /&gt;
* Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus of American popular music (Burgoyne et al., 2011).&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|+ Most Frequent Chord Qualities in the ''Billboard'' Corpus&lt;br /&gt;
|- style=&amp;quot;background: yellow&amp;quot;&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq.&lt;br /&gt;
! Cum. Freq.&lt;br /&gt;
|-&lt;br /&gt;
| maj&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 52&lt;br /&gt;
|-&lt;br /&gt;
| min&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 13&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 65&lt;br /&gt;
|-&lt;br /&gt;
| 7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 10&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 75&lt;br /&gt;
|-&lt;br /&gt;
| min7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 8&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 83&lt;br /&gt;
|-&lt;br /&gt;
| maj7&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 3&lt;br /&gt;
| align=&amp;quot;right&amp;quot;| 86&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
===Evaluation of segmentation===&lt;br /&gt;
&lt;br /&gt;
* The chord transcription literature includes several other evaluation metrics, which mainly focus on the segmentation of the transcription.&lt;br /&gt;
&lt;br /&gt;
* We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding for each annotated segment the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al., 2005; Mauch, 2010). &lt;br /&gt;
&lt;br /&gt;
* Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. To keep the scaling consistent with WCSR values (1.0 is best and 0.0 is worst), we report 1 – over-segmentation and 1 – under-segmentation, as well as the harmonic mean of these values (cf. Harte, 2010).&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Submissions==&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
==Results==&lt;br /&gt;
&lt;br /&gt;
===Summary===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Comparative Statistics===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Complete Results===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;br /&gt;
&lt;br /&gt;
===Algorithmic Output===&lt;br /&gt;
&lt;br /&gt;
* ''coming soon...''&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9868</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9868"/>
		<updated>2013-11-29T20:41:15Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results (will add more results soon)&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX &amp;amp;rsquo;09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard &amp;amp;rsquo;12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_MIREX2009 | MIREX &amp;amp;rsquo;09 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard2012 | Billboard &amp;amp;rsquo;12 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard2013 | Billboard &amp;amp;rsquo;13 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9867</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9867"/>
		<updated>2013-11-29T20:41:00Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results (will add more results soon)&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX &amp;amp;rsquo;09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard &amp;amp;rsquo;12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_MIREX2009 | MIREX &amp;amp;rsquo;09 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard2012 | Billboard &amp;amp;rsquo;12 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard2013 | Billboard &amp;amp;rsquo;13 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9866</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9866"/>
		<updated>2013-11-29T20:40:39Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results (will add more results soon)&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX &amp;amp;rsquo;09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard &amp;amp;rsquo;12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_MIREX2009 | MIREX &amp;amp;rsquo;09 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard2012 | Billboard &amp;amp;rsquo;12 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2013:Audio_Chord_Estimation_Results_Billboard2013 | Billboard &amp;amp;rsquo;13 Dataset]] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9865</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9865"/>
		<updated>2013-11-06T23:46:19Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results (will add more results soon)&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX &amp;amp;rsquo;09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard &amp;amp;rsquo;12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results ]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9864</id>
		<title>2013:MIREX2013 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:MIREX2013_Results&amp;diff=9864"/>
		<updated>2013-11-06T23:45:21Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Other Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==OVERALL RESULTS POSTERS &amp;lt;!--(First Version: Will need updating as last runs are completed)--&amp;gt;==&lt;br /&gt;
&lt;br /&gt;
This page is under construction. &lt;br /&gt;
&lt;br /&gt;
[https://www.music-ir.org/mirex/results/2013/mirex_2013_poster.pdf MIREX 2013 Overall Results Posters (PDF)]&lt;br /&gt;
&lt;br /&gt;
==Results by Task ==&lt;br /&gt;
&lt;br /&gt;
===Train-Test Task Set===&lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/composer_report/ Audio Classical Composer Identification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/latin_report/ Audio Latin Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
* [https://www.music-ir.org/nema_out/mirex2013/results/act/mixed_report/ Audio Mixed Popular Genre Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
===Other Tasks===&lt;br /&gt;
&lt;br /&gt;
* Audio Beat Tracking Results &lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/dav/ DAV Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/maz/ MAZ Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/abt/mck/ MCK Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Audio Chord Detection Results (will add more results soon)&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/mrx09/index.html MIREX '09 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ace/bill/index.html Billboard '12 Dataset (old style)]  &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/akd/ Audio Key Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Audio_Music_Similarity_and_Retrieval_Results | Audio Music Similarity and Retrieval Results]] &lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* Audio Tag Classification Results&lt;br /&gt;
** Major Miner Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask1_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
** Mood Tag dataset&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/bin/ Binary relevance (classification evaluation)] &amp;amp;nbsp;&lt;br /&gt;
*** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/atg/subtask2_report/aff/ Affinity estimation evaluation] &amp;amp;nbsp;&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2013/results/ate/ Audio Tempo Estimation Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Multiple_Fundamental_Frequency_Estimation_&amp;amp;_Tracking_Results | Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results]]&lt;br /&gt;
* Music Structure Segmentation Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx09/ MIREX09 dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_1/ RWC dataset - Quaero (MIREX10) Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/mrx10_2/ RWC dataset - Original RWC Ground-truth] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/struct/sal/ SALAMI dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Singing/Humming Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1_hidden/  Hidden Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1a_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1b_thinkit/ ThinkIt Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task1c_ioacas/ IOACAS Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbsh/qbsh_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* Query-by-Tapping Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_jang/  Jang Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task1_hsiao/ HSIAO Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2013/results/qbt/qbt_task2_jang/ Subtask2 Dataset] &amp;amp;nbsp;&lt;br /&gt;
* [[2013:Real-time_Audio_to_Score_Alignment_(a.k.a._Score_Following)_Results | Real-time Audio to Score Alignment (a.k.a. Score Following) Results]]&lt;br /&gt;
* [[2013:Symbolic_Melodic_Similarity_Results | Symbolic Melodic Similarity Results]]&lt;br /&gt;
* [[2013:Discovery of Repeated Themes &amp;amp; Sections Results | Discovery of Repeated Themes &amp;amp; Sections Results]]&lt;br /&gt;
* [[2013:Audio Cover Song Identification Results]]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation&amp;diff=9557</id>
		<title>2013:Audio Chord Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation&amp;diff=9557"/>
		<updated>2013-09-09T13:29:07Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: /* Bibliography */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Description =&lt;br /&gt;
&lt;br /&gt;
This task requires participants to extract or transcribe a sequence of chords from an audio music recording. For many applications in music information retrieval, extracting the harmonic structure of an audio track is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music. The extraction of the harmonic structure requires the estimation of a sequence of chords that is as precise as possible. This includes the full characterisation of chords – root, quality, and bass note – as well as their chronological order, including specific onset times and durations. Audio chord estimation has a long history in MIREX, and readers interested in this history, especially with respect to evaluation methodology, should review the work of Christopher Harte (2010), Pauwels and Peeters (2013), and the [https://www.music-ir.org/mirex/wiki/The_Utrecht_Agreement_on_Chord_Evaluation “Utrecht Agreement”] on evaluation metrics.&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
&lt;br /&gt;
Two datasets are used to evaluate chord transcription accuracy.&lt;br /&gt;
&lt;br /&gt;
; Isophonics&lt;br /&gt;
: The collected Beatles, Queen, and Zweieck datasets from the Centre for Digital Music at Queen Mary, University of London (http://www.isophonics.net/), as used for Audio Chord Estimation in MIREX for many years. Available from http://www.isophonics.net/. See also Matthias Mauch’s dissertation (2010) and Harte et al.’s introductory paper (2005).&lt;br /&gt;
; Billboard&lt;br /&gt;
: An abridged version of the ''Billboard'' dataset from McGill University, including a representative sample of American popular music from the 1950s through the 1990s. Available from http://billboard.music.mcgill.ca. See also Ashley Burgoyne’s dissertation (2012) and Burgoyne et al.’s introductory paper (2011). Parsing tools for the data are available from http://hackage.haskell.org/package/billboard-parser/ and documented by De Haas and Burgoyne (2012).&lt;br /&gt;
&lt;br /&gt;
== Training and Testing ==&lt;br /&gt;
&lt;br /&gt;
The training and testing divisions differ for the two data sets. The Isophonics data have been available publicly for so long that it no longer makes sense to offer a separate training phase; as such, the entire data set will be used for testing, as in previous years. In contrast, in order to support MIREX, a portion of the ''Billboard'' ground truth has been withheld from the public. Submissions may train on all of the songs that have been publicly released so far: the MIREX servers have access to the ground-truth annotations and the original audio. Whether trained or not, all submissions will be tested against a fresh set of 200 songs that have never been released publicly.&lt;br /&gt;
&lt;br /&gt;
The ground-truth files contain one line per unique chord, in the form &amp;lt;code&amp;gt;{start_time end_time chord}&amp;lt;/code&amp;gt;, e.g.,&lt;br /&gt;
&amp;lt;pre&amp;gt;...&lt;br /&gt;
41.2631021 44.2456460 B&lt;br /&gt;
44.2456460 45.7201230 E&lt;br /&gt;
45.7201230 47.2061900 E:7/3&lt;br /&gt;
47.2061900 48.6922670 A&lt;br /&gt;
48.6922670 50.1551240 A:min/b3&lt;br /&gt;
...&amp;lt;/pre&amp;gt;&lt;br /&gt;
Start and end times are in seconds from the start of the file. Chord labels follow the syntax proposed by C. Harte et al. (2005). Please note that the syntax has changed slightly since it was originally described; in particular, the root is no longer implied as a voiced element of a chord, so a C major chord (notes C, E, and G) should be written C:(1,3,5) instead of just C:(3,5) if using the interval-list representation. As before, the labels C and C:maj are equivalent to C:(1,3,5).&lt;br /&gt;
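As a minimal sketch, this three-column format can be read by splitting each line on whitespace. The function name parse_lab and the inline example string are our own illustrations, not part of any official MIREX tool:

```python
# Parse MIREX-style chord annotations: "start_time end_time chord_label",
# one segment per line, times in seconds from the start of the file.
# (Illustrative sketch; parse_lab is our own name, not a MIREX utility.)

def parse_lab(text):
    segments = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            start, end, label = parts
            segments.append((float(start), float(end), label))
    return segments

example = """41.2631021 44.2456460 B
44.2456460 45.7201230 E
45.7201230 47.2061900 E:7/3"""

for segment in parse_lab(example):
    print(segment)
```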
&lt;br /&gt;
= Evaluation =&lt;br /&gt;
&lt;br /&gt;
To evaluate the quality of an automatic transcription, a transcription is compared to ground truth created by one or more human annotators. MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\textrm{CSR} =   \frac{\textrm{total duration of segments where annotation equals estimation}}  {\textrm{total duration of annotated segments}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In previous years, MIREX has used an approximate CSR calculated by sampling both the ground-truth and the automatic annotations every 10 ms and dividing the number of correctly annotated samples by the total number of samples. Following Christopher Harte (2010, §8.1.2), however, we can view the ground-truth and estimated annotations as continuous segmentations of the audio and calculate the CSR by considering the cumulative length of the correctly overlapping segments. This way of calculating the CSR is more precise, as the precision of the frame-based method is limited by the frame length, and computationally more efficient, as it reduces the number of segment comparisons. Because pieces of music come in a wide variety of lengths, we will weight the CSR by the length of the song when computing an average for a given corpus. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
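The segment-based calculation can be sketched as follows, assuming each annotation is a list of (start, end, label) tuples as in the ground-truth format above. This is an illustrative implementation, not the official MIREX evaluation code:

```python
# Segment-based chord symbol recall (CSR): the total duration over which
# the estimated label equals the ground-truth label, divided by the total
# annotated duration. Illustrative sketch only.

def csr(reference, estimate):
    """Each argument is a list of (start, end, label) tuples."""
    correct = 0.0
    for r_start, r_end, r_label in reference:
        for e_start, e_end, e_label in estimate:
            if e_label == r_label:
                # duration of the overlap between the two segments
                overlap = min(r_end, e_end) - max(r_start, e_start)
                correct += max(0.0, overlap)
    total = sum(end - start for start, end, _ in reference)
    return correct / total if total > 0 else 0.0
```

Weighting each song's CSR by its annotated duration before averaging over a corpus then gives the WCSR.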
&lt;br /&gt;
== Chord Vocabularies ==&lt;br /&gt;
&lt;br /&gt;
[chord-eval]&lt;br /&gt;
&lt;br /&gt;
We propose a set of single-chord evaluation measures for MIREX that extends previous iterations of MIREX and combines them with evaluation measures proposed in the literature, providing a more complete assessment of transcription quality. Following Pauwels and Peeters (2013), we suggest using the CSR with five different chord vocabulary mappings.&lt;br /&gt;
&lt;br /&gt;
In each of these calculations, the full chord descriptions of either the estimated or the ground-truth transcriptions, which might contain complex chord annotations, would be mapped to the following classes:&lt;br /&gt;
&lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor: {&amp;lt;code&amp;gt;N, maj, min&amp;lt;/code&amp;gt;};&lt;br /&gt;
# Seventh chords: {&amp;lt;code&amp;gt;N, maj, min, maj7, min7, 7&amp;lt;/code&amp;gt;};&lt;br /&gt;
# Major and minor with inversions: {&amp;lt;code&amp;gt;N, maj, min, maj/3, min/b3, maj/5, min/5&amp;lt;/code&amp;gt;}; or&lt;br /&gt;
# Seventh chords with inversions: {&amp;lt;code&amp;gt;N, maj, min, maj7, min7, 7, maj/3, min/b3, maj7/3, min7/b3, 7/3, maj/5, min/5, maj7/5, min7/5, 7/5, maj7/7, min7/b7, 7/b7&amp;lt;/code&amp;gt;}.&lt;br /&gt;
&lt;br /&gt;
With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. For instance, in the major and minor case, &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt; is mapped to &amp;lt;code&amp;gt;G:maj&amp;lt;/code&amp;gt; because the interval set of &amp;lt;code&amp;gt;G:maj&amp;lt;/code&amp;gt;, {&amp;lt;code&amp;gt;1,3,5&amp;lt;/code&amp;gt;}, is a subset of the interval set of &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt;, {&amp;lt;code&amp;gt;1,3,5,b7,#9&amp;lt;/code&amp;gt;}. In the seventh-chord case, &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt; is mapped to &amp;lt;code&amp;gt;G:7&amp;lt;/code&amp;gt; instead, because the interval set of &amp;lt;code&amp;gt;G:7&amp;lt;/code&amp;gt;, {&amp;lt;code&amp;gt;1, 3, 5, b7&amp;lt;/code&amp;gt;}, is also a subset of &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt; but is larger than that of &amp;lt;code&amp;gt;G:maj&amp;lt;/code&amp;gt;. If a chord cannot be represented by a certain class, e.g., mapping a &amp;lt;code&amp;gt;D:aug&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;F:sus4(9)&amp;lt;/code&amp;gt; to {&amp;lt;code&amp;gt;maj, min&amp;lt;/code&amp;gt;}, the chord is excluded from the evaluation if it occurs in the ground truth, and it is considered a mismatch if it occurs in an estimated annotation.&lt;br /&gt;
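The largest-subset rule can be sketched as follows. The interval-set dictionaries are an illustrative fragment of the major/minor and seventh-chord vocabularies, and the function name map_quality is our own, not part of the MIREX evaluator:

```python
# Map a chord's interval set to the vocabulary entry whose interval set
# is the largest subset of the chord's, per the rule described above.
# Illustrative fragment; not the full MIREX vocabulary tables.

MAJMIN = {
    "maj": {"1", "3", "5"},
    "min": {"1", "b3", "5"},
}
SEVENTHS = dict(MAJMIN)
SEVENTHS.update({
    "7": {"1", "3", "5", "b7"},
    "maj7": {"1", "3", "5", "7"},
    "min7": {"1", "b3", "5", "b7"},
})

def map_quality(intervals, vocabulary):
    """Return the vocabulary quality whose interval set is the largest
    subset of the chord's intervals, or None if no subset exists."""
    best = None
    best_size = 0
    for quality, subset in vocabulary.items():
        if subset.issubset(intervals) and len(subset) > best_size:
            best, best_size = quality, len(subset)
    return best

g7sharp9 = {"1", "3", "5", "b7", "#9"}
print(map_quality(g7sharp9, MAJMIN))    # maj
print(map_quality(g7sharp9, SEVENTHS))  # 7
```

A chord such as D:aug, whose intervals {1, 3, #5} contain no vocabulary entry as a subset, yields None and is handled as described above.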
&lt;br /&gt;
{|&lt;br /&gt;
|+ Most frequent chord qualities in the McGill ''Billboard'' corpus.&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|- &lt;br /&gt;
|maj &lt;br /&gt;
|52&lt;br /&gt;
|52&lt;br /&gt;
|-&lt;br /&gt;
|min&lt;br /&gt;
|13&lt;br /&gt;
|65&lt;br /&gt;
|-&lt;br /&gt;
|7&lt;br /&gt;
|10&lt;br /&gt;
|75&lt;br /&gt;
|-&lt;br /&gt;
|min7&lt;br /&gt;
|8&lt;br /&gt;
|83&lt;br /&gt;
|-&lt;br /&gt;
|maj7&lt;br /&gt;
|3&lt;br /&gt;
|86&lt;br /&gt;
|-&lt;br /&gt;
|5&lt;br /&gt;
|2&lt;br /&gt;
|88&lt;br /&gt;
|-&lt;br /&gt;
|1&lt;br /&gt;
|2&lt;br /&gt;
|90&lt;br /&gt;
|-&lt;br /&gt;
|maj(9)&lt;br /&gt;
|1&lt;br /&gt;
|91&lt;br /&gt;
|-&lt;br /&gt;
|maj6&lt;br /&gt;
|1&lt;br /&gt;
|92&lt;br /&gt;
|-&lt;br /&gt;
|sus4&lt;br /&gt;
|1&lt;br /&gt;
|93&lt;br /&gt;
|-&lt;br /&gt;
|sus7&lt;br /&gt;
|1&lt;br /&gt;
|94&lt;br /&gt;
|-&lt;br /&gt;
|sus9&lt;br /&gt;
|1&lt;br /&gt;
|94&lt;br /&gt;
|-&lt;br /&gt;
|7(#9)&lt;br /&gt;
|1&lt;br /&gt;
|95&lt;br /&gt;
|-&lt;br /&gt;
|min9&lt;br /&gt;
|1&lt;br /&gt;
|96&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus (see table above), which is a balanced sample of American popular music from the 1950s through the 1990s (J.A. Burgoyne, Wild, and Fujinaga 2011). Pure major and minor chords alone account for 65 percent of all chords encountered, whereas augmented and diminished triads each account for 0.2 percent or less of the corpus. Our argument for our particular seventh-chord vocabulary, as opposed to the set of all tetrads, follows similar reasoning: our proposed vocabulary accounts for 86 percent of all chords, whereas no other standard type of seventh chord accounts for more than 0.2 percent of the corpus. The table suggests that in future years, we might consider introducing vocabularies that include power chords, and possibly suspended chords or added sixths and ninths as well.&lt;br /&gt;
&lt;br /&gt;
== Chord Segmentation ==&lt;br /&gt;
&lt;br /&gt;
Besides CSR, the chord transcription literature includes several other metrics for evaluating chord transcriptions, which mainly focus on the segmentation of the automatic transcription. We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al. 2005; Mauch 2010, §2.3.3). Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. Both directions can be combined to yield an overall quality metric (Christopher Harte 2010, §8.3.2):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Q = 1 - \frac{\textrm{maximum of directional Hamming distances in     either direction}}      {\textrm{total duration of song}}&amp;lt;/math&amp;gt;&lt;br /&gt;
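A minimal sketch of this combined metric, assuming annotations are given as (start, end) tuples in seconds; again this is illustrative rather than the official evaluation code:

```python
# Directional Hamming distance: for each segment of one annotation, find
# its maximal overlap with any single segment of the other annotation and
# sum what remains uncovered; Q combines both directions as above.
# Illustrative sketch only.

def overlap(a, b):
    """Overlap in seconds between two (start, end) segments."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def directional_hamming(from_segs, to_segs):
    """Sum over from_segs of (segment duration minus its maximal
    overlap with any single segment of to_segs)."""
    total = 0.0
    for seg in from_segs:
        best = max((overlap(seg, other) for other in to_segs), default=0.0)
        total += (seg[1] - seg[0]) - best
    return total

def quality(reference, estimate, song_duration):
    """Q = 1 minus the worse directional Hamming distance, normalised
    by the total duration of the song."""
    worst = max(directional_hamming(reference, estimate),
                directional_hamming(estimate, reference))
    return 1.0 - worst / song_duration
```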
&lt;br /&gt;
= Submission Format =&lt;br /&gt;
&lt;br /&gt;
== Audio Format ==&lt;br /&gt;
&lt;br /&gt;
Audio tracks in the training directory will be encoded as 44.1 kHz, 16-bit mono WAV files.&lt;br /&gt;
&lt;br /&gt;
== I/O Format ==&lt;br /&gt;
&lt;br /&gt;
The algorithms should output text files with a similar format to that used in the ground truth transcriptions. That is to say, they should be flat text files with chord segment labels and times arranged thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;start_time end_time chord_label&amp;lt;/pre&amp;gt;&lt;br /&gt;
with elements separated by whitespace, times given in seconds, chord labels corresponding to the syntax described by C. Harte et al. (2005), and one chord segment per line. As in all benchmarks after 2008, end times are a mandatory component of the output. For the evaluation process, we will assume enharmonic equivalence for chord roots. We will no longer accept submissions that wish to be evaluated only on major/minor chords using the number format.&lt;br /&gt;
&lt;br /&gt;
== Command line calling format ==&lt;br /&gt;
&lt;br /&gt;
Submissions have to conform to the specified format below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;extractFeaturesAndTrain  &amp;amp;quot;/path/to/trainFileList.txt&amp;amp;quot;  &amp;amp;quot;/path/to/scratch/dir&amp;amp;quot;  &amp;lt;/pre&amp;gt;&lt;br /&gt;
where &amp;lt;code&amp;gt;trainFileList.txt&amp;lt;/code&amp;gt; contains the path to each WAV file. The features extracted at this stage can be stored under &amp;lt;code&amp;gt;/path/to/scratch/dir&amp;lt;/code&amp;gt;. The ground-truth files for supervised learning will be at the same paths with a &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt; extension appended: for example, for &amp;lt;code&amp;gt;/path/to/trainFile1.wav&amp;lt;/code&amp;gt;, there will be a corresponding ground-truth file called &amp;lt;code&amp;gt;/path/to/trainFile1.wav.txt&amp;lt;/code&amp;gt;. For testing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;doChordID.sh &amp;amp;quot;/path/to/testFileList.txt&amp;amp;quot; &amp;amp;quot;/path/to/scratch/dir&amp;amp;quot; &amp;amp;quot;/path/to/results/dir&amp;amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
If there is no training stage, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt;. Programs can use their working directory if they need to keep temporary cache files or internal debugging information. Standard output and standard error will be logged.&lt;br /&gt;
&lt;br /&gt;
== Packaging submissions ==&lt;br /&gt;
&lt;br /&gt;
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). All submissions should include a &amp;lt;code&amp;gt;README&amp;lt;/code&amp;gt; file including the following information:&lt;br /&gt;
&lt;br /&gt;
* Command line calling format for all executables and an example formatted set of commands&lt;br /&gt;
* Number of threads/cores used or whether this should be specified on the command line&lt;br /&gt;
* Expected memory footprint&lt;br /&gt;
* Expected runtime&lt;br /&gt;
* Any required environments (and versions), e.g. Python, Java, bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
= Time and Hardware limits =&lt;br /&gt;
&lt;br /&gt;
A hard limit of 24 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.&lt;br /&gt;
&lt;br /&gt;
= Discussion =&lt;br /&gt;
&lt;br /&gt;
= Bibliography =&lt;br /&gt;
&lt;br /&gt;
Abdallah, Samer A., Katy Noland, Mark B. Sandler, Michael Casey, and Christophe Rhodes. 2005. “Theory and Evaluation of a Bayesian Music Structure Extractor.” In ''Proceedings of the International Society for Music Information Retrieval Conference'', 420–425.&lt;br /&gt;
&lt;br /&gt;
Burgoyne, J. A., J. Wild, and I. Fujinaga. 2011. “An expert ground truth set for audio chord recognition and music analysis.” In ''Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR)'', 633–638.&lt;br /&gt;
&lt;br /&gt;
Burgoyne, John Ashley. 2012. “Stochastic Processes and Database-Driven Musicology.” Ph.D. diss. Montréal, Québec, Canada: McGill University.&lt;br /&gt;
&lt;br /&gt;
Haas, W. B. de, and John Ashley Burgoyne. 2012. ''Parsing the Billboard Chord Transcriptions''. Technical report UU-CS-2012-018, Department of Information and Computing Sciences, Utrecht University.&lt;br /&gt;
&lt;br /&gt;
Harte, C., M. Sandler, S. Abdallah, and E. Gómez. 2005. “Symbolic representation of musical chords: A proposed syntax for text annotations.” In ''Proceedings of the 6th International Society for Music Information Retrieval Conference (ISMIR)'', 66–71.&lt;br /&gt;
&lt;br /&gt;
Harte, Christopher. 2010. “Towards automatic extraction of harmony information from music signals.” Ph.D. diss. Queen Mary, University of London.&lt;br /&gt;
&lt;br /&gt;
Mauch, Matthias. 2010. “Automatic Chord Transcription from Audio Using Computational Models of Musical Context.” Ph.D. diss. Queen Mary University of London.&lt;br /&gt;
&lt;br /&gt;
Pauwels, Johan, and Geoffroy Peeters. 2013. “Evaluating automatically estimated chord sequences.” In ''Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)''. Vancouver, British Columbia, Canada.&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation&amp;diff=9556</id>
		<title>2013:Audio Chord Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2013:Audio_Chord_Estimation&amp;diff=9556"/>
		<updated>2013-09-09T13:28:20Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;= Description =&lt;br /&gt;
&lt;br /&gt;
This task requires participants to extract or transcribe a sequence of chords from an audio music recording. For many applications in music information retrieval, extracting the harmonic structure of an audio track is very desirable, for example for segmenting pieces into characteristic segments, for finding similar pieces, or for semantic analysis of music. The extraction of the harmonic structure requires the estimation of a sequence of chords that is as precise as possible. This includes the full characterisation of chords – root, quality, and bass note – as well as their chronological order, including specific onset times and durations. Audio chord estimation has a long history in MIREX, and readers interested in this history, especially with respect to evaluation methodology, should review the work of Christopher Harte (2010), Pauwels and Peeters (2013), and the [https://www.music-ir.org/mirex/wiki/The_Utrecht_Agreement_on_Chord_Evaluation “Utrecht Agreement”] on evaluation metrics.&lt;br /&gt;
&lt;br /&gt;
= Data =&lt;br /&gt;
&lt;br /&gt;
Two datasets are used to evaluate chord transcription accuracy.&lt;br /&gt;
&lt;br /&gt;
; Isophonics&lt;br /&gt;
: The collected Beatles, Queen, and Zweieck datasets from the Centre for Digital Music at Queen Mary, University of London (http://www.isophonics.net/), as used for Audio Chord Estimation in MIREX for many years. Available from http://www.isophonics.net/. See also Matthias Mauch’s dissertation (2010) and Harte et al.’s introductory paper (2005).&lt;br /&gt;
; Billboard&lt;br /&gt;
: An abridged version of the ''Billboard'' dataset from McGill University, including a representative sample of American popular music from the 1950s through the 1990s. Available from http://billboard.music.mcgill.ca. See also Ashley Burgoyne’s dissertation (2012) and Burgoyne et al.’s introductory paper (2011). Parsing tools for the data are available from http://hackage.haskell.org/package/billboard-parser/ and documented by De Haas and Burgoyne (2012).&lt;br /&gt;
&lt;br /&gt;
== Training and Testing ==&lt;br /&gt;
&lt;br /&gt;
The training and testing divisions differ for the two data sets. The Isophonics data have been available publicly for so long that it no longer makes sense to offer a separate training phase; as such, the entire data set will be used for testing, as in previous years. In contrast, in order to support MIREX, a portion of the ''Billboard'' ground truth has been withheld from the public. Submissions may train on all of the songs that have been publicly released so far: the MIREX servers have access to the ground-truth annotations and the original audio. Whether trained or not, all submissions will be tested against a fresh set of 200 songs that have never been released publicly.&lt;br /&gt;
&lt;br /&gt;
The ground-truth files contain one line per unique chord, in the form &amp;lt;code&amp;gt;{start_time end_time chord}&amp;lt;/code&amp;gt;, e.g.,&lt;br /&gt;
&amp;lt;pre&amp;gt;...&lt;br /&gt;
41.2631021 44.2456460 B&lt;br /&gt;
44.2456460 45.7201230 E&lt;br /&gt;
45.7201230 47.2061900 E:7/3&lt;br /&gt;
47.2061900 48.6922670 A&lt;br /&gt;
48.6922670 50.1551240 A:min/b3&lt;br /&gt;
...&amp;lt;/pre&amp;gt;&lt;br /&gt;
Start and end times are in seconds from the start of the file. Chord labels follow the syntax proposed by C. Harte et al. (2005). Please note that the syntax has changed slightly since it was originally described; in particular, the root is no longer implied as a voiced element of a chord, so a C major chord (notes C, E, and G) should be written C:(1,3,5) instead of just C:(3,5) if using the interval-list representation. As before, the labels C and C:maj are equivalent to C:(1,3,5).&lt;br /&gt;
&lt;br /&gt;
= Evaluation =&lt;br /&gt;
&lt;br /&gt;
To evaluate the quality of an automatic transcription, a transcription is compared to ground truth created by one or more human annotators. MIREX typically uses ''chord symbol recall'' (CSR) to estimate how well the predicted chords match the ground truth:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\textrm{CSR} =   \frac{\textrm{total duration of segments where annotation equals estimation}}  {\textrm{total duration of annotated segments}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
In previous years, MIREX has used an approximate CSR calculated by sampling both the ground-truth and the automatic annotations every 10 ms and dividing the number of correctly annotated samples by the total number of samples. Following Christopher Harte (2010, §8.1.2), however, we can view the ground-truth and estimated annotations as continuous segmentations of the audio and calculate the CSR by considering the cumulative length of the correctly overlapping segments. This way of calculating the CSR is more precise, as the precision of the frame-based method is limited by the frame length, and computationally more efficient, as it reduces the number of segment comparisons. Because pieces of music come in a wide variety of lengths, we will weight the CSR by the length of the song when computing an average for a given corpus. This final number is referred to as the ''weighted chord symbol recall'' (WCSR).&lt;br /&gt;
&lt;br /&gt;
== Chord Vocabularies ==&lt;br /&gt;
&lt;br /&gt;
[chord-eval]&lt;br /&gt;
&lt;br /&gt;
We propose a set of single-chord evaluation measures for MIREX that extends previous iterations of MIREX and combines them with evaluation measures proposed in the literature, providing a more complete assessment of transcription quality. Following Pauwels and Peeters (2013), we suggest using the CSR with five different chord vocabulary mappings.&lt;br /&gt;
&lt;br /&gt;
In each of these calculations, the full chord descriptions of either the estimated or the ground-truth transcriptions, which might contain complex chord annotations, would be mapped to the following classes:&lt;br /&gt;
&lt;br /&gt;
# Chord root note only;&lt;br /&gt;
# Major and minor: {&amp;lt;code&amp;gt;N, maj, min&amp;lt;/code&amp;gt;};&lt;br /&gt;
# Seventh chords: {&amp;lt;code&amp;gt;N, maj, min, maj7, min7, 7&amp;lt;/code&amp;gt;};&lt;br /&gt;
# Major and minor with inversions: {&amp;lt;code&amp;gt;N, maj, min, maj/3, min/b3, maj/5, min/5&amp;lt;/code&amp;gt;}; or&lt;br /&gt;
# Seventh chords with inversions: {&amp;lt;code&amp;gt;N, maj, min, maj7, min7, 7, maj/3, min/b3, maj7/3, min7/b3, 7/3, maj/5, min/5, maj7/5, min7/5, 7/5, maj7/7, min7/b7, 7/b7&amp;lt;/code&amp;gt;}.&lt;br /&gt;
&lt;br /&gt;
With the exception of no-chords, calculating the vocabulary mapping involves examining the root note, the bass note, and the relative interval structure of the chord labels. A mapping exists if both the root notes and bass notes match, and the structure of the output label is the largest possible subset of the input label given the vocabulary. For instance, in the major and minor case, &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt; is mapped to &amp;lt;code&amp;gt;G:maj&amp;lt;/code&amp;gt; because the interval set of &amp;lt;code&amp;gt;G:maj&amp;lt;/code&amp;gt;, {&amp;lt;code&amp;gt;1,3,5&amp;lt;/code&amp;gt;}, is a subset of the interval set of &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt;, {&amp;lt;code&amp;gt;1,3,5,b7,#9&amp;lt;/code&amp;gt;}. In the seventh-chord case, &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt; is mapped to &amp;lt;code&amp;gt;G:7&amp;lt;/code&amp;gt; instead, because the interval set of &amp;lt;code&amp;gt;G:7&amp;lt;/code&amp;gt;, {&amp;lt;code&amp;gt;1, 3, 5, b7&amp;lt;/code&amp;gt;}, is also a subset of &amp;lt;code&amp;gt;G:7(#9)&amp;lt;/code&amp;gt; but is larger than that of &amp;lt;code&amp;gt;G:maj&amp;lt;/code&amp;gt;. If a chord cannot be represented by a certain class, e.g., mapping a &amp;lt;code&amp;gt;D:aug&amp;lt;/code&amp;gt; or &amp;lt;code&amp;gt;F:sus4(9)&amp;lt;/code&amp;gt; to {&amp;lt;code&amp;gt;maj, min&amp;lt;/code&amp;gt;}, the chord is excluded from the evaluation if it occurs in the ground truth, and it is considered a mismatch if it occurs in an estimated annotation.&lt;br /&gt;
&lt;br /&gt;
{|&lt;br /&gt;
|+ Most frequent chord qualities in the McGill ''Billboard'' corpus.&lt;br /&gt;
! Quality&lt;br /&gt;
! Freq. (%)&lt;br /&gt;
! Cum. Freq. (%)&lt;br /&gt;
|- &lt;br /&gt;
|maj &lt;br /&gt;
|52&lt;br /&gt;
|52&lt;br /&gt;
|-&lt;br /&gt;
|min&lt;br /&gt;
|13&lt;br /&gt;
|65&lt;br /&gt;
|-&lt;br /&gt;
|7&lt;br /&gt;
|10&lt;br /&gt;
|75&lt;br /&gt;
|-&lt;br /&gt;
|min7&lt;br /&gt;
|8&lt;br /&gt;
|83&lt;br /&gt;
|-&lt;br /&gt;
|maj7&lt;br /&gt;
|3&lt;br /&gt;
|86&lt;br /&gt;
|-&lt;br /&gt;
|5&lt;br /&gt;
|2&lt;br /&gt;
|88&lt;br /&gt;
|-&lt;br /&gt;
|1&lt;br /&gt;
|2&lt;br /&gt;
|90&lt;br /&gt;
|-&lt;br /&gt;
|maj(9)&lt;br /&gt;
|1&lt;br /&gt;
|91&lt;br /&gt;
|-&lt;br /&gt;
|maj6&lt;br /&gt;
|1&lt;br /&gt;
|92&lt;br /&gt;
|-&lt;br /&gt;
|sus4&lt;br /&gt;
|1&lt;br /&gt;
|93&lt;br /&gt;
|-&lt;br /&gt;
|sus7&lt;br /&gt;
|1&lt;br /&gt;
|94&lt;br /&gt;
|-&lt;br /&gt;
|sus9&lt;br /&gt;
|1&lt;br /&gt;
|94&lt;br /&gt;
|-&lt;br /&gt;
|7(#9)&lt;br /&gt;
|1&lt;br /&gt;
|95&lt;br /&gt;
|-&lt;br /&gt;
|min9&lt;br /&gt;
|1&lt;br /&gt;
|96&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Our recommendations are motivated by the frequencies of chord qualities in the ''Billboard'' corpus (see table above), which is a balanced sample of American popular music from the 1950s through the 1990s (J.A. Burgoyne, Wild, and Fujinaga 2011). Pure major and minor chords alone account for 65 percent of all chords encountered, whereas augmented and diminished triads each account for 0.2 percent or less of the corpus. Our argument for our particular seventh-chord vocabulary, as opposed to the set of all tetrads, follows similar reasoning: our proposed vocabulary accounts for 86 percent of all chords, whereas no other standard type of seventh chord accounts for more than 0.2 percent of the corpus. The table suggests that in future years, we might consider introducing vocabularies that include power chords, and possibly suspended chords or added sixths and ninths as well.&lt;br /&gt;
&lt;br /&gt;
== Chord Segmentation ==&lt;br /&gt;
&lt;br /&gt;
Besides CSR, the chord transcription literature includes several other metrics for evaluating chord transcriptions, which mainly focus on the segmentation of the automatic transcription. We propose to include the directional Hamming distance in the evaluation. The directional Hamming distance is calculated by finding, for each annotated segment, the maximally overlapping segment in the other annotation, and then summing the differences (Abdallah et al. 2005; Mauch 2010, §2.3.3). Depending on the order of application, the directional Hamming distance yields a measure of over- or under-segmentation. Both directions can be combined to yield an overall quality metric (Christopher Harte 2010, §8.3.2):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;Q = 1 - \frac{\textrm{maximum of directional Hamming distances in     either direction}}      {\textrm{total duration of song}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
= Submission Format =&lt;br /&gt;
&lt;br /&gt;
== Audio Format ==&lt;br /&gt;
&lt;br /&gt;
Audio tracks in the training directory will be encoded as 44.1 kHz, 16-bit mono WAV files.&lt;br /&gt;
&lt;br /&gt;
== I/O Format ==&lt;br /&gt;
&lt;br /&gt;
The algorithms should output text files with a similar format to that used in the ground truth transcriptions. That is to say, they should be flat text files with chord segment labels and times arranged thus:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;start_time end_time chord_label&amp;lt;/pre&amp;gt;&lt;br /&gt;
with elements separated by whitespace, times given in seconds, chord labels corresponding to the syntax described by C. Harte et al. (2005), and one chord segment per line. As in all benchmarks after 2008, end times are a mandatory component of the output. For the evaluation process, we will assume enharmonic equivalence for chord roots. We will no longer accept submissions that wish to be evaluated only on major/minor chords using the number format.&lt;br /&gt;
&lt;br /&gt;
== Command line calling format ==&lt;br /&gt;
&lt;br /&gt;
Submissions have to conform to the specified format below:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;extractFeaturesAndTrain  &amp;amp;quot;/path/to/trainFileList.txt&amp;amp;quot;  &amp;amp;quot;/path/to/scratch/dir&amp;amp;quot;  &amp;lt;/pre&amp;gt;&lt;br /&gt;
where &amp;lt;code&amp;gt;trainFileList.txt&amp;lt;/code&amp;gt; contains the path to each WAV file. The features extracted at this stage can be stored under &amp;lt;code&amp;gt;/path/to/scratch/dir&amp;lt;/code&amp;gt;. The ground truth files for the supervised learning will be in the same path with a &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt; extension at the end. For example, for &amp;lt;code&amp;gt;/path/to/trainFile1.wav&amp;lt;/code&amp;gt;, there will be a corresponding ground truth file called &amp;lt;code&amp;gt;/path/to/trainFile1.wav.txt&amp;lt;/code&amp;gt;. For testing:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;doChordID.sh &amp;amp;quot;/path/to/testFileList.txt&amp;amp;quot; &amp;amp;quot;/path/to/scratch/dir&amp;amp;quot; &amp;amp;quot;/path/to/results/dir&amp;amp;quot; &amp;lt;/pre&amp;gt;&lt;br /&gt;
If there is no training, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus &amp;lt;code&amp;gt;.txt&amp;lt;/code&amp;gt;. Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Standard output and standard error will be logged.&lt;br /&gt;
&lt;br /&gt;
== Packaging submissions ==&lt;br /&gt;
&lt;br /&gt;
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). All submissions should include a &amp;lt;code&amp;gt;README&amp;lt;/code&amp;gt; file including the following information:&lt;br /&gt;
&lt;br /&gt;
* Command line calling format for all executables and an example formatted set of commands&lt;br /&gt;
* Number of threads/cores used or whether this should be specified on the command line&lt;br /&gt;
* Expected memory footprint&lt;br /&gt;
* Expected runtime&lt;br /&gt;
* Any required environments (and versions), e.g. Python, Java, bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
= Time and Hardware limits =&lt;br /&gt;
&lt;br /&gt;
A hard limit of 24 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.&lt;br /&gt;
&lt;br /&gt;
= Discussion =&lt;br /&gt;
&lt;br /&gt;
= Bibliography =&lt;br /&gt;
&lt;br /&gt;
Abdallah, Samer A., Katy Noland, Mark B. Sandler, Michael Casey, and Christophe Rhodes. 2005. “Theory and Evaluation of a Bayesian Music Structure Extractor.” In ''Proceedings of the International Society for Music Information Retrieval Conference'', 420–425.&lt;br /&gt;
&lt;br /&gt;
Burgoyne, J. A., J. Wild, and I. Fujinaga. 2011. “An expert ground truth set for audio chord recognition and music analysis.” In ''Proceedings of the 12th International Society for Music Information Retrieval Conference (ISMIR)'', 633–638.&lt;br /&gt;
&lt;br /&gt;
Burgoyne, John Ashley. 2012. “Stochastic Processes and Database-Driven Musicology.” Ph.D. diss. Montréal, Québec, Canada: McGill University. http://digitool.Library.McGill.CA:80/R/-?func=dbin-jump-full&amp;amp;object_id=107704&amp;amp;silo_library=GEN01.&lt;br /&gt;
&lt;br /&gt;
Haas, W. B. de, and John Ashley Burgoyne. 2012. ''Parsing the Billboard Chord Transcriptions''. Technical report UU-CS-2012-018, Department of Information and Computing Sciences, Utrecht University.&lt;br /&gt;
&lt;br /&gt;
Harte, C., M. Sandler, S. Abdallah, and E. Gómez. 2005. “Symbolic representation of musical chords: A proposed syntax for text annotations.” In ''Proceedings of the 6th International Society for Music Information Retrieval Conference (ISMIR)'', 66–71.&lt;br /&gt;
&lt;br /&gt;
Harte, Christopher. 2010. “Towards automatic extraction of harmony information from music signals.” Ph.D. diss. Queen Mary, University of London.&lt;br /&gt;
&lt;br /&gt;
Mauch, Matthias. 2010. “Automatic Chord Transcription from Audio Using Computational Models of Musical Context.” Ph.D. diss. Queen Mary University of London.&lt;br /&gt;
&lt;br /&gt;
Pauwels, Johan, and Geoffroy Peeters. 2013. “Evaluating automatically estimated chord sequences.” In ''Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)''. Vancouver, British Columbia, Canada.&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2012:Audio_Chord_Estimation&amp;diff=8868</id>
		<title>2012:Audio Chord Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2012:Audio_Chord_Estimation&amp;diff=8868"/>
		<updated>2012-08-27T16:48:35Z</updated>

		<summary type="html">&lt;p&gt;J. Ashley Burgoyne: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;[[The Utrecht Agreement on Chord Evaluation]]&lt;br /&gt;
&lt;br /&gt;
===Evaluation of Chord Transcriptions===&lt;br /&gt;
&lt;br /&gt;
Before the final description of the chord evaluation goes live here, please see the discussion based on the [[The Utrecht Agreement on Chord Evaluation]].&lt;br /&gt;
&lt;br /&gt;
== Description ==&lt;br /&gt;
This task requires participants to transcribe a sequence of chords from an audio music recording. For many applications in music information retrieval, extracting the harmonic structure of an audio track is very desirable, for example for dividing pieces into characteristic sections, for finding similar pieces, or for semantic analysis of music.&lt;br /&gt;
&lt;br /&gt;
The extraction of the harmonic structure requires the detection of as many chords as possible in a piece. This includes characterising each chord by its root and type, as well as placing the chords in chronological order with their onsets and durations.&lt;br /&gt;
&lt;br /&gt;
Although some publications are available on this topic [1,2,3,4,5], comparing their results is difficult because different measures are used to assess performance. To overcome this problem, an accurately defined methodology is needed, including a repertoire of detectable chords, a defined test set with ground truth, and unambiguous calculation rules for measuring performance.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
Three datasets are used to evaluate chord transcription accuracy:&lt;br /&gt;
&lt;br /&gt;
=== Beatles dataset ===&lt;br /&gt;
Christopher Harte's Beatles dataset, consisting of annotations of 12 Beatles albums.&lt;br /&gt;
&lt;br /&gt;
The text annotation procedure of musical chords that was used to produce this dataset is presented in [6]. &lt;br /&gt;
&lt;br /&gt;
=== Queen and Zweieck dataset ===&lt;br /&gt;
Matthias Mauch's Queen and Zweieck dataset consisting of 38 songs from Queen and Zweieck.&lt;br /&gt;
&lt;br /&gt;
=== Billboard dataset (abridged) ===&lt;br /&gt;
An abridged version of Ashley Burgoyne's Billboard dataset [9], consisting of about 200 songs for training (previously published) and 200 songs for testing (to be published for the first time at ISMIR).&lt;br /&gt;
&lt;br /&gt;
===Example ground-truth file ===&lt;br /&gt;
The ground-truth files take the form:&lt;br /&gt;
&lt;br /&gt;
 ...&lt;br /&gt;
 41.2631021 44.2456460 B&lt;br /&gt;
 44.2456460 45.7201230 E&lt;br /&gt;
 45.7201230 47.2061900 E:7/3&lt;br /&gt;
 47.2061900 48.6922670 A&lt;br /&gt;
 48.6922670 50.1551240 A:min/b3&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Segmentation Score ===&lt;br /&gt;
&lt;br /&gt;
The segmentation score will be calculated using the directional Hamming distance as described in [8]. An over-segmentation value (m) and an under-segmentation value (f) will be calculated, and the final segmentation score will be taken from the worse of the two, i.e.:&lt;br /&gt;
&lt;br /&gt;
segmentation score = 1 - max(m,f)&lt;br /&gt;
&lt;br /&gt;
m and f are not independent of each other, so combining them this way ensures that a good score in one does not hide a bad score in the other. The combined segmentation score can take values between 0 and 1, with 0 being the worst and 1 being the best result. --Chrish 17:05, 9 September 2009 (UTC)&lt;br /&gt;
&lt;br /&gt;
=== Frame-based recall ===&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For recall evaluation, we may define a different chord dictionary for each level of evaluation (dyads, triads, tetrads, etc.). Each dictionary is a text file containing the chord shorthands / interval lists of the chords that will be considered in that evaluation. The following dictionaries are proposed:&lt;br /&gt;
&lt;br /&gt;
For dyad comparison of major/minor chords only:&lt;br /&gt;
&lt;br /&gt;
N&amp;lt;br&amp;gt;&lt;br /&gt;
X:maj&amp;lt;br&amp;gt;&lt;br /&gt;
X:min&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For comparison of standard triad chords:&lt;br /&gt;
&lt;br /&gt;
N&amp;lt;br&amp;gt;&lt;br /&gt;
X:maj&amp;lt;br&amp;gt;&lt;br /&gt;
X:min&amp;lt;br&amp;gt;&lt;br /&gt;
X:aug&amp;lt;br&amp;gt;&lt;br /&gt;
X:dim&amp;lt;br&amp;gt;&lt;br /&gt;
X:sus2&amp;lt;br&amp;gt;&lt;br /&gt;
X:sus4&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
For comparison of tetrad (quad) chords (currently only for the Beatles and Queen and Zweieck datasets):&lt;br /&gt;
&lt;br /&gt;
N &amp;lt;br&amp;gt;&lt;br /&gt;
X:maj &amp;lt;br&amp;gt;&lt;br /&gt;
X:min&amp;lt;br&amp;gt;&lt;br /&gt;
X:aug&amp;lt;br&amp;gt;&lt;br /&gt;
X:dim&amp;lt;br&amp;gt;&lt;br /&gt;
X:sus2&amp;lt;br&amp;gt;&lt;br /&gt;
X:sus4&amp;lt;br&amp;gt;&lt;br /&gt;
X:maj7&amp;lt;br&amp;gt;&lt;br /&gt;
X:7&amp;lt;br&amp;gt;&lt;br /&gt;
X:maj(9)&amp;lt;br&amp;gt;&lt;br /&gt;
X:aug(7)	&amp;lt;br&amp;gt;&lt;br /&gt;
X:min(7)&amp;lt;br&amp;gt;&lt;br /&gt;
X:min7&amp;lt;br&amp;gt;&lt;br /&gt;
X:min(9)&amp;lt;br&amp;gt;&lt;br /&gt;
X:dim(7)&amp;lt;br&amp;gt;&lt;br /&gt;
X:hdim7	&amp;lt;br&amp;gt;&lt;br /&gt;
X:sus4(7)&amp;lt;br&amp;gt;&lt;br /&gt;
X:sus4(b7)&amp;lt;br&amp;gt;&lt;br /&gt;
X:dim7&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For each evaluation level, the ground truth annotation is compared against the dictionary. Any chord label not belonging to the current dictionary will be replaced with an &amp;quot;X&amp;quot; in a local copy of the annotation and will not be included in the recall calculation.&lt;br /&gt;
&lt;br /&gt;
Note that the level of comparison in terms of intervals can be varied. For example, in a triad evaluation we can consider the first three component intervals in the chord so that a major (1,3,5) and a major7 (1,3,5,7) will be considered the same chord. For a tetrad (quad) evaluation, we would consider the first 4 intervals so major and major7 would then be considered to be different chords.&lt;br /&gt;
&lt;br /&gt;
For the maj/min evaluation (using the first example dictionary), using an interval comparison of 2 (dyad) will compare only the first two intervals of each chord label. This would map augmented and diminished chords to major and minor respectively (and any other symbols that had a major 3rd or minor 3rd as their first interval). Using an interval comparison of 3 with the same dictionary would keep only those chords that have major and minor triads as their first 3 intervals so augmented and diminished chords would be removed from the evaluation.&lt;br /&gt;
&lt;br /&gt;
After the annotation has been &amp;quot;filtered&amp;quot; using a given dictionary, it can be compared against the machine-generated estimates output by the algorithm under test. The chord sequences described in the annotation and estimate text files are sampled at a given frame rate (in this case 10ms per frame) to give two sequences of chord frames which may be compared directly with each other.&lt;br /&gt;
&lt;br /&gt;
To calculate a hit or a miss, the chord labels from the current frame in each sequence are compared. Chord comparison is done by converting each chord label into an ordered list of pitch classes and then comparing the two lists element by element. If the lists match to the required number of intervals, a hit is recorded; otherwise the estimate is considered a miss. Note that, by converting to pitch classes in the comparison, this evaluation ignores enharmonic pitch and interval spellings, so the following chords (a slightly silly example, just for illustration) all evaluate as identical:&lt;br /&gt;
&lt;br /&gt;
C:maj = Dbb:maj = C#:(b1,b3,#4)&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Basic recall calculation algorithm:&lt;br /&gt;
&lt;br /&gt;
1) filter annotated transcription using chord dictionary for a defined number of intervals&lt;br /&gt;
&lt;br /&gt;
2) sample annotated transcription and machine estimated transcription at 10ms intervals to create a sequence of annotation frames and estimate frames&lt;br /&gt;
&lt;br /&gt;
3) start at the first frame&lt;br /&gt;
&lt;br /&gt;
4) get chord label for current annotation frame and estimate frame&lt;br /&gt;
&lt;br /&gt;
5) check annotation label:&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
IF symbol is 'X' (i.e. non-dictionary) &amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
THEN ignore frame (record number of ignored frames)&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ELSE compare annotated/estimated chords for the predefined number of intervals &amp;lt;br&amp;gt;&lt;br /&gt;
increment hit count if chords match&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
ENDIF&lt;br /&gt;
&lt;br /&gt;
6) increment frame count &lt;br /&gt;
&lt;br /&gt;
7) go back to 4 until final chord frame&lt;br /&gt;
--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)&lt;br /&gt;
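The recall loop above can be sketched in Python. The shorthand table and label parsing below are illustrative stand-ins covering only maj and min; the real evaluation uses the full syntax of [6]:

```python
NATURALS = {'C': 0, 'D': 2, 'E': 4, 'F': 5, 'G': 7, 'A': 9, 'B': 11}
SHORTHANDS = {'maj': (0, 4, 7), 'min': (0, 3, 7)}   # semitones above the root

def pitch_classes(label):
    """'Dbb:maj' -> (0, 4, 7); returns None for 'N' (no chord) or 'X'."""
    if label in ('N', 'X'):
        return None
    root, _, quality = label.partition(':')
    pc = (NATURALS[root[0]] + root.count('#') - root.count('b')) % 12
    return tuple(sorted((pc + iv) % 12 for iv in SHORTHANDS[quality or 'maj']))

def frame_recall(annotation, estimate, hop=0.01):
    """Sample both transcriptions every `hop` seconds; skip frames whose
    annotation label is 'X'; count a hit when the pitch classes agree."""
    def label_at(segments, t):
        for start, end, label in segments:
            if start <= t < end:
                return label
        return 'N'
    n_frames = int(round(annotation[-1][1] / hop))
    hits = counted = 0
    for k in range(n_frames):
        t = k * hop
        ann_label = label_at(annotation, t)
        if ann_label == 'X':            # non-dictionary frame: ignore it
            continue
        counted += 1
        if pitch_classes(ann_label) == pitch_classes(label_at(estimate, t)):
            hits += 1
    return hits / counted if counted else 0.0
```

Note that Dbb:maj and C:maj map to the same pitch classes here, matching the enharmonic-equivalence example in the text.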
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
&lt;br /&gt;
=== Audio Format ===&lt;br /&gt;
Audio tracks will be encoded as 44.1 kHz, 16-bit mono WAV files.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== I/O Format ===&lt;br /&gt;
The expected output format for chord transcription files is the one proposed by Christopher Harte [6].&lt;br /&gt;
&lt;br /&gt;
Hence, algorithms should output text files with a similar format to that used in the ground truth transcriptions. That is to say, they should be flat text files with chord segment labels and times arranged thus:&lt;br /&gt;
&lt;br /&gt;
 start_time end_time chord_label&lt;br /&gt;
&lt;br /&gt;
with elements separated by white spaces, times given in seconds, chord labels corresponding to the syntax described in [6] and one chord segment per line. &lt;br /&gt;
&lt;br /&gt;
The chord root is given as a natural (A|B|C|D|E|F|G) followed by optional sharp or flat modifiers (#|b). For the evaluation process we may assume enharmonic equivalence for chord roots. For a given chord type on root X, the chord label can be given as a list of intervals or as a shorthand notation, as shown in the following table:&lt;br /&gt;
&lt;br /&gt;
{|border=&amp;quot;1&amp;quot; cellpadding=&amp;quot;5&amp;quot; cellspacing=&amp;quot;0&amp;quot; align=&amp;quot;center&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
!NAME&lt;br /&gt;
!INTERVALS&lt;br /&gt;
!SHORTHAND&lt;br /&gt;
|-&lt;br /&gt;
|'''Triads:'''&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|major&lt;br /&gt;
|X:(1,3,5)&lt;br /&gt;
|X or X:maj&lt;br /&gt;
|-&lt;br /&gt;
|minor&lt;br /&gt;
|X:(1,b3,5)&lt;br /&gt;
|X:min&lt;br /&gt;
|-&lt;br /&gt;
|diminished&lt;br /&gt;
|X:(1,b3,b5)&lt;br /&gt;
|X:dim&lt;br /&gt;
|-&lt;br /&gt;
|augmented&lt;br /&gt;
|X:(1,3,#5)&lt;br /&gt;
|X:aug&lt;br /&gt;
|-&lt;br /&gt;
|suspended4&lt;br /&gt;
|X:(1,4,5)&lt;br /&gt;
|X:sus4&lt;br /&gt;
|-&lt;br /&gt;
|suspended2 (possible 6th triad)&lt;br /&gt;
|X:(1,2,5)&lt;br /&gt;
|X:sus2&lt;br /&gt;
|-&lt;br /&gt;
|'''Quads:'''&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|major-major7&lt;br /&gt;
|X:(1,3,5,7)&lt;br /&gt;
|X:maj7&lt;br /&gt;
|-&lt;br /&gt;
|major-minor7&lt;br /&gt;
|X:(1,3,5,b7)&lt;br /&gt;
|X:7&lt;br /&gt;
|-&lt;br /&gt;
|major-add9&lt;br /&gt;
|X:(1,3,5,9)&lt;br /&gt;
|X:maj(9)&lt;br /&gt;
|-&lt;br /&gt;
|major-major7-#5&lt;br /&gt;
|X:(1,3,#5,7)&lt;br /&gt;
|X:aug(7)&lt;br /&gt;
|-&lt;br /&gt;
|minor-major7&lt;br /&gt;
|X:(1,b3,5,7)&lt;br /&gt;
|X:min(7)&lt;br /&gt;
|-&lt;br /&gt;
|minor-minor7&lt;br /&gt;
|X:(1,b3,5,b7)&lt;br /&gt;
|X:min7&lt;br /&gt;
|-&lt;br /&gt;
|minor-add9&lt;br /&gt;
|X:(1,b3,5,9)&lt;br /&gt;
|X:min(9)&lt;br /&gt;
|-&lt;br /&gt;
|''minor 7/b5 (ambiguous - could be either of the following):''&lt;br /&gt;
|&lt;br /&gt;
|&lt;br /&gt;
|-&lt;br /&gt;
|minor-major7-b5&lt;br /&gt;
|X:(1,b3,b5,7)&lt;br /&gt;
|X:dim(7)&lt;br /&gt;
|-&lt;br /&gt;
|minor-minor7-b5 (a half-diminished 7th)&lt;br /&gt;
|X:(1,b3,b5,b7)&lt;br /&gt;
|X:hdim7&lt;br /&gt;
|-&lt;br /&gt;
|sus4-major7&lt;br /&gt;
|X:(1,4,5,7)&lt;br /&gt;
|X:sus4(7)&lt;br /&gt;
|-&lt;br /&gt;
|sus4-minor7&lt;br /&gt;
|X:(1,4,5,b7)&lt;br /&gt;
|X:sus4(b7)&lt;br /&gt;
|-&lt;br /&gt;
|diminished7 (omitted from the original list)&lt;br /&gt;
|X:(1,b3,b5,bb7)&lt;br /&gt;
|X:dim7&lt;br /&gt;
|-&lt;br /&gt;
|No Chord&lt;br /&gt;
|N&lt;br /&gt;
|&lt;br /&gt;
|}&lt;br /&gt;
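To make the equivalence between shorthand and interval-list spellings concrete, here is a hypothetical Python sketch that expands an interval list to semitones above the root, using major-scale degree offsets with each flat or sharp shifting by one semitone:

```python
# Major-scale semitone offsets for the scale degrees used in the table.
DEGREE_SEMITONES = {1: 0, 2: 2, 3: 4, 4: 5, 5: 7, 6: 9, 7: 11, 9: 14}

def interval_to_semitones(interval):
    """'b3' -> 3, '#5' -> 8, 'bb7' -> 9 (modulo 12 for extensions)."""
    degree = int(interval.lstrip('#b'))
    return (DEGREE_SEMITONES[degree]
            + interval.count('#') - interval.count('b')) % 12

def expand(interval_list):
    """'(1,b3,b5,bb7)' -> sorted tuple of semitones above the root."""
    return tuple(sorted(interval_to_semitones(iv)
                        for iv in interval_list.strip('()').split(',')))
```

Under this expansion, X:(1,3,5,7) (X:maj7) comes out as (0, 4, 7, 11) and X:(1,b3,b5,bb7) (X:dim7) as (0, 3, 6, 9).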
&lt;br /&gt;
&lt;br /&gt;
Please note that two things have changed in the syntax since it was originally described in [6]. The first change is that the root is no longer implied as a voiced element of a chord so a C major chord (notes C, E and G) should be written C:(1,3,5) instead of just C:(3,5) if using the interval list representation. As before, the labels C and C:maj are equivalent to C:(1,3,5). The second change is that the shorthand label &amp;quot;sus2&amp;quot; (intervals 1,2,5) has been added to the available shorthand list.--[[User:Chrish|Chrish]] 17:05, 9 September 2009 (UTC)&lt;br /&gt;
&lt;br /&gt;
We still accept participants who would like to be evaluated only on major/minor chords and want to use the number format, which is an integer chord id in the range 0-24: values 0-11 denote C major, C# major, ..., B major; values 12-23 denote C minor, C# minor, ..., B minor; and 24 denotes silence or no-chord segments. '''Please note that the format is still the same:'''&lt;br /&gt;
&lt;br /&gt;
 start_time end_time chord_number&lt;br /&gt;
&lt;br /&gt;
Systems should print both onset and offset times, in contrast to the MIREX 2008 chord output format, where only onsets were used.&lt;br /&gt;
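The mapping between major/minor labels and the integer ids described above can be sketched as follows (a hypothetical illustration, not required code):

```python
# Pitch classes in ascending chromatic order starting from C.
ROOTS = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def chord_number(label):
    """Major/minor label to integer id: 0-11 major, 12-23 minor, 24 = N."""
    if label == 'N':
        return 24
    root, _, quality = label.partition(':')
    return ROOTS.index(root) + (12 if quality == 'min' else 0)
```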
&lt;br /&gt;
=== Command line calling format ===&lt;br /&gt;
&lt;br /&gt;
Submissions have to conform to the specified format below:&lt;br /&gt;
&lt;br /&gt;
 ''extractFeaturesAndTrain  &amp;quot;/path/to/trainFileList.txt&amp;quot;  &amp;quot;/path/to/scratch/dir&amp;quot; '' &lt;br /&gt;
&lt;br /&gt;
where &amp;quot;trainFileList.txt&amp;quot; contains the path to each WAV file. The features extracted at this stage can be stored under &amp;quot;/path/to/scratch/dir&amp;quot;.&lt;br /&gt;
The ground truth files for the supervised learning will be in the same path with a &amp;quot;.txt&amp;quot; extension at the end. For example, for &amp;quot;/path/to/trainFile1.wav&amp;quot;, there will be a corresponding ground truth file called &amp;quot;/path/to/trainFile1.wav.txt&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
For testing:&lt;br /&gt;
&lt;br /&gt;
 ''doChordID.sh &amp;quot;/path/to/testFileList.txt&amp;quot;  &amp;quot;/path/to/scratch/dir&amp;quot; &amp;quot;/path/to/results/dir&amp;quot; '' &lt;br /&gt;
&lt;br /&gt;
If there is no training, you can ignore the second argument here. In the results directory, there should be one file for each test file, with the same name as the test file plus &amp;quot;.txt&amp;quot;.&lt;br /&gt;
&lt;br /&gt;
Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Standard output and standard error will be logged.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Packaging submissions ===&lt;br /&gt;
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).&lt;br /&gt;
&lt;br /&gt;
All submissions should include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
* Command line calling format for all executables and an example formatted set of commands&lt;br /&gt;
* Number of threads/cores used or whether this should be specified on the command line&lt;br /&gt;
* Expected memory footprint&lt;br /&gt;
* Expected runtime&lt;br /&gt;
* Any required environments (and versions), e.g. Python, Java, bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.&lt;br /&gt;
 &lt;br /&gt;
A hard limit of 24 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
Please write your comments below with your name and date.&lt;br /&gt;
&lt;br /&gt;
Somewhere in the email discussion on the MIREX list, there was a mention that the recent systems run on the Beatles/Queen/Zweieck dataset might have over-learnt the properties of this dataset. I just wondered whether, during or post-MIREX, there was any way to formally/experimentally demonstrate this? I mean, beyond making the observation that there is a &amp;quot;drop&amp;quot; in performance from an open dataset to a closed one. The issue would seem particularly pertinent with regard to this dataset since it's been public for some time.&lt;br /&gt;
(Matthew Davies, 9th August)&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Bibliography ==&lt;br /&gt;
&lt;br /&gt;
1. Harte, C.A. and Sandler, M.B. (2005). '''Automatic chord identification using a quantised chromagram.''' Proceedings of 118th Audio Engineering Society's Convention.&lt;br /&gt;
&lt;br /&gt;
2. Sailer, C. and Rosenbauer K. (2006). '''A bottom-up approach to chord detection.''' Proceedings of International Computer Music Conference 2006.&lt;br /&gt;
&lt;br /&gt;
3. Shenoy, A. and Wang, Y. (2005). '''Key, chord, and rhythm tracking of popular music recordings.''' Computer Music Journal 29(3), 75-86.&lt;br /&gt;
&lt;br /&gt;
4. Sheh, A. and Ellis, D.P.W. (2003). '''Chord segmentation and recognition using EM-trained hidden Markov models.''' Proceedings of 4th International Conference on Music Information Retrieval.&lt;br /&gt;
&lt;br /&gt;
5. Yoshioka, T. et al. (2004). '''Automatic Chord Transcription with concurrent recognition of chord symbols and boundaries.''' Proceedings of 5th International Conference on Music Information Retrieval.&lt;br /&gt;
&lt;br /&gt;
6. Harte, C. et al. (2005). '''Symbolic representation of musical chords: a proposed syntax for text annotations.''' Proceedings of 6th International Conference on Music Information Retrieval.&lt;br /&gt;
&lt;br /&gt;
7. Papadopoulos, H. and Peeters, G. (2007). '''Large-scale study of chord estimation algorithms based on chroma representation and HMM.''' Proceedings of 5th International Conference on Content-Based Multimedia Indexing.&lt;br /&gt;
&lt;br /&gt;
8. Abdallah, S. et al. (2005). '''Theory and Evaluation of a Bayesian Music Structure Extractor''' (pp. 420-425) Proc. 6th International Conference on Music Information Retrieval, ISMIR 2005.&lt;br /&gt;
&lt;br /&gt;
9. Burgoyne, J. A. et al. (2011). '''An expert ground-truth set for audio chord recognition and music analysis''' (pp. 633–638) Proc. 12th International Society for Music Information Retrieval Conference, ISMIR 2011. [http://ismir2011.ismir.net/papers/OS8-1.pdf (PDF)]&lt;/div&gt;</summary>
		<author><name>J. Ashley Burgoyne</name></author>
		
	</entry>
</feed>