2019: Multiple Fundamental Frequency Estimation & Tracking Results - Su Dataset

From MIREX Wiki

Introduction

Since 2015, a newly annotated polyphonic dataset has been added to this task. This dataset covers a wider range of real-world music than the old dataset, which had been in use since 2009. Specifically, the new dataset contains 3 clips of piano solo, 3 clips of string quartet, 2 clips of piano quintet, and 2 clips of violin sonata (violin with piano accompaniment), all selected from real-world recordings. Each clip is between 20 and 30 seconds long. The dataset is annotated by the method described in the following paper:

Li Su and Yi-Hsuan Yang, "Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription," in Int. Symp. Computer Music Multidisciplinary Research (CMMR), June 2015.

As mentioned in the paper, we did our best to correct the errors in the preliminary annotation (mostly mismatches between onset and offset time stamps) by hand. Since the annotation may still contain errors that we did not find, we decided to make the data and the annotation publicly available after this year's MIREX results are announced. Specifically, we encourage every participant to help us check the annotation; the result of each competing algorithm will be updated based on the revised annotation. We hope this gives participants more detailed insight into how their algorithms behave on the dataset. Moreover, in this way we can join our efforts to create a better dataset for research on multiple-F0 estimation and tracking.

General Legend

Sub code Submission name Abstract Contributors
AR2 qhear PDF Anton Runov
BK1 (piano subtask) PianoTranscriptor.2019 PDF Sebastian Böck, Rainer Kelz
CB1 Silvet PDF Chris Cannam, Emmanouil Benetos
CB2 Silvet Live PDF Chris Cannam, Emmanouil Benetos
HH2 (task1) mffet v.1 PDF Huang Hsiang-Yu
KN3 (piano subtask) AR_BEAM PDF Taegyun Kwon, Juhan Nam, Dasaem Jung
KNJ1 (piano subtask) AR_SIMPLE PDF Taegyun Kwon, Juhan Nam, Dasaem Jung
KY1 (task2) MetaAI_Ensemble_SingleDomain PDF Changhyun Kim, Sangeon Yong
KY2 (task2) MetaAI_SingleModel_MultiDomain PDF Changhyun Kim, Sangeon Yong
SBJ1-4 (task1) SBJ1-4 PDF Peter Steiner, Peter Birkholz, Azarakhsh Jalalvand
YK1 (task2) MetaAI_Ensemble_SingleDomain PDF Sangeon Yong, Changhyun Kim
YK2 (task2) MetaAI_SingleModel_SingleDomain PDF Sangeon Yong, Changhyun Kim

Task 1: Multiple Fundamental Frequency Estimation (MF0E)

MF0E Overall Summary Results

Detailed Results

Precision Recall Accuracy Etot Esubs Emiss Efa
AR2 0.631 0.429 0.406 0.628 0.211 0.360 0.057
CB1 0.617 0.236 0.234 0.773 0.150 0.614 0.009
CB2 0.586 0.224 0.221 0.788 0.162 0.614 0.011
HH2 0.065 0.062 0.053 1.086 0.745 0.193 0.148
SBJ1 0.683 0.399 0.385 0.637 0.153 0.448 0.036
SBJ2 0.689 0.436 0.419 0.604 0.159 0.404 0.040
SBJ3 0.647 0.465 0.439 0.594 0.201 0.334 0.059
SBJ4 0.652 0.483 0.455 0.578 0.200 0.317 0.062

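The error scores in the table (Etot, Esubs, Emiss, Efa) are, in MIREX practice, the frame-level transcription error scores of Poliner and Ellis, accumulated over all analysis frames. A minimal sketch of how they relate to the precision/recall/accuracy columns (the function name and set-per-frame representation are assumptions for illustration):

```python
def frame_error_scores(ref_frames, est_frames):
    """Frame-level multi-F0 scores (after Poliner & Ellis).

    ref_frames, est_frames: lists of sets of reference / estimated
    pitches, one set per analysis frame."""
    n_ref = n_sys = n_corr = 0
    subs = miss = fa = tot = 0
    for ref, est in zip(ref_frames, est_frames):
        nr, ns = len(ref), len(est)
        nc = len(ref & est)           # correctly detected pitches
        n_ref += nr
        n_sys += ns
        n_corr += nc
        subs += min(nr, ns) - nc      # substitutions
        miss += max(0, nr - ns)       # reference pitches missed
        fa += max(0, ns - nr)         # spurious extra pitches
        tot += max(nr, ns) - nc       # total error in this frame
    return {
        'Precision': n_corr / n_sys if n_sys else 0.0,
        'Recall': n_corr / n_ref if n_ref else 0.0,
        'Accuracy': n_corr / (n_ref + n_sys - n_corr)
                    if (n_ref + n_sys - n_corr) else 0.0,
        'Etot': tot / n_ref, 'Esubs': subs / n_ref,
        'Emiss': miss / n_ref, 'Efa': fa / n_ref,
    }
```

Note that because Etot counts both misses and false alarms against the number of reference pitches, it can exceed 1, as for HH2 above.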

Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e., all F0s are mapped to a single octave before evaluation)

Precision Recall Accuracy Etot Esubs Emiss Efa
AR2 0.722 0.492 0.465 0.565 0.149 0.360 0.057
CB1 0.735 0.284 0.281 0.725 0.102 0.614 0.009
CB2 0.736 0.284 0.280 0.728 0.102 0.614 0.011
HH2 0.159 0.154 0.130 0.995 0.654 0.193 0.148
SBJ1 0.752 0.438 0.423 0.598 0.113 0.448 0.036
SBJ2 0.753 0.476 0.457 0.564 0.120 0.404 0.040
SBJ3 0.718 0.515 0.487 0.544 0.151 0.334 0.059
SBJ4 0.719 0.531 0.501 0.530 0.152 0.317 0.062

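The octave folding used for the chroma evaluation above can be sketched as follows. This is one possible implementation, assuming an A4 = 440 Hz reference (so pitch class 0 corresponds to A); the actual evaluation code may differ in convention:

```python
import math

def f0_to_chroma(f0_hz, ref_hz=440.0):
    """Fold an F0 (in Hz) onto a single octave: return its pitch
    class 0-11 relative to the reference frequency (0 = A with the
    default A4 = 440 Hz reference)."""
    semitones = 12.0 * math.log2(f0_hz / ref_hz)
    return int(round(semitones)) % 12
```

Under this folding, octave errors (e.g. 220 Hz vs. 440 Hz) no longer count as mistakes, which is why the chroma precision and recall are uniformly higher than the corresponding non-chroma scores.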

Individual Results Files for Task 1

AR2= Anton Runov
CB1= Chris Cannam, Emmanouil Benetos
CB2= Chris Cannam, Emmanouil Benetos
HH2= Huang Hsiang-Yu
SBJ1= Peter Steiner, Peter Birkholz, Azarakhsh Jalalvand
SBJ2= Peter Steiner, Peter Birkholz, Azarakhsh Jalalvand
SBJ3= Peter Steiner, Peter Birkholz, Azarakhsh Jalalvand
SBJ4= Peter Steiner, Peter Birkholz, Azarakhsh Jalalvand


Info about the filenames

The first two letters of the filename represent the music type:

PQ = piano quintet, PS = piano solo, SQ = string quartet, VS = violin sonata (with piano accompaniment)

Run Times

Friedman tests for Multiple Fundamental Frequency Estimation (MF0E)

The Friedman test was run in MATLAB to test for significant differences amongst systems with regard to performance (accuracy) on individual files.
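The per-file Friedman rank test can be sketched as follows (a simplified version without tie correction, on illustrative scores only; the actual evaluation used MATLAB):

```python
import numpy as np

def friedman_statistic(scores):
    """Friedman chi-square statistic over a (files x systems) score
    matrix: ranks the systems within each file, then measures how far
    the mean ranks deviate from chance. No tie correction (a
    simplification relative to the full test)."""
    n, k = scores.shape
    # within-file ranks 1..k (double argsort; assumes no ties per row)
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1
    mean_ranks = ranks.mean(axis=0)
    chi2 = 12.0 * n / (k * (k + 1)) * np.sum(
        (mean_ranks - (k + 1) / 2.0) ** 2)
    return chi2, mean_ranks

# illustrative per-file accuracies (files x systems), not MIREX data
demo = np.array([[0.45, 0.44, 0.23],
                 [0.42, 0.41, 0.22],
                 [0.40, 0.43, 0.21]])
chi2, mean_ranks = friedman_statistic(demo)
```

The mean ranks returned here are what the Tukey-Kramer HSD comparison below operates on.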

Tukey-Kramer HSD Multi-Comparison

TeamID TeamID Lowerbound Mean Upperbound Significance
SBJ4 SBJ3 -2.2202 1.1000 4.4202 FALSE
SBJ4 SBJ2 -1.3202 2.0000 5.3202 FALSE
SBJ4 AR2 -1.0202 2.3000 5.6202 FALSE
SBJ4 SBJ1 0.1798 3.5000 6.8202 TRUE
SBJ4 CB1 1.2798 4.6000 7.9202 TRUE
SBJ4 CB2 2.0798 5.4000 8.7202 TRUE
SBJ4 HH2 3.3798 6.7000 10.0202 TRUE
SBJ3 SBJ2 -2.4202 0.9000 4.2202 FALSE
SBJ3 AR2 -2.1202 1.2000 4.5202 FALSE
SBJ3 SBJ1 -0.9202 2.4000 5.7202 FALSE
SBJ3 CB1 0.1798 3.5000 6.8202 TRUE
SBJ3 CB2 0.9798 4.3000 7.6202 TRUE
SBJ3 HH2 2.2798 5.6000 8.9202 TRUE
SBJ2 AR2 -3.0202 0.3000 3.6202 FALSE
SBJ2 SBJ1 -1.8202 1.5000 4.8202 FALSE
SBJ2 CB1 -0.7202 2.6000 5.9202 FALSE
SBJ2 CB2 0.0798 3.4000 6.7202 TRUE
SBJ2 HH2 1.3798 4.7000 8.0202 TRUE
AR2 SBJ1 -2.1202 1.2000 4.5202 FALSE
AR2 CB1 -1.0202 2.3000 5.6202 FALSE
AR2 CB2 -0.2202 3.1000 6.4202 FALSE
AR2 HH2 1.0798 4.4000 7.7202 TRUE
SBJ1 CB1 -2.2202 1.1000 4.4202 FALSE
SBJ1 CB2 -1.4202 1.9000 5.2202 FALSE
SBJ1 HH2 -0.1202 3.2000 6.5202 FALSE
CB1 CB2 -2.5202 0.8000 4.1202 FALSE
CB1 HH2 -1.2202 2.1000 5.4202 FALSE
CB2 HH2 -2.0202 1.3000 4.6202 FALSE

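In the table above, a pairwise difference in mean ranks is flagged TRUE exactly when its confidence interval excludes zero. As a trivial sketch (hypothetical helper name):

```python
def is_significant(lower, upper):
    """TRUE when the Tukey-Kramer confidence interval for a pairwise
    mean-rank difference excludes zero."""
    return lower > 0 or upper < 0

# first significant and last non-significant rows of the table above
assert is_significant(0.1798, 6.8202)       # SBJ4 vs SBJ1 -> TRUE
assert not is_significant(-2.0202, 4.6202)  # CB2 vs HH2 -> FALSE
```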

[Figure: 2019 Su accuracy per song — Friedman mean ranks (Task 1)]

Task 2: Note Tracking (NT)

NT Mixed Set Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is counted as correct if its onset is within ±50 ms of a reference note and its F0 is within ± a quarter tone of that reference note's F0; the returned offset values are ignored. In the second setup, in addition to the above requirements, a correct returned note must also have an offset within 20% of the reference note's duration around the reference note's offset, or within 50 ms, whichever is larger.
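The two matching criteria can be sketched as follows (the tuple representation and function name are assumptions for this sketch, not the evaluation code itself):

```python
import math

def note_matches(ref, est, with_offset=False):
    """Check whether an estimated note matches a reference note under
    the criteria above. Notes are (onset_s, offset_s, f0_hz) tuples."""
    ref_on, ref_off, ref_f0 = ref
    est_on, est_off, est_f0 = est
    # onset within +-50 ms of the reference onset
    if abs(est_on - ref_on) > 0.05:
        return False
    # F0 within a quarter tone (50 cents) of the reference F0
    if abs(1200.0 * math.log2(est_f0 / ref_f0)) > 50.0:
        return False
    if with_offset:
        # offset within 20% of the reference duration, or 50 ms,
        # whichever is larger
        tol = max(0.05, 0.2 * (ref_off - ref_on))
        if abs(est_off - ref_off) > tol:
            return False
    return True
```

The "Onset Only" tables below correspond to `with_offset=False`, and the "Onset-Offset" rows to `with_offset=True`.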

AR2 CB1 CB2 KY1 KY2 YK1 YK2
Ave. F-Measure Onset-Offset 0.0856 0.0614 0.0491 0.1262 0.1175 0.1317 0.1235
Ave. F-Measure Onset Only 0.2742 0.2280 0.1653 0.3979 0.4047 0.4422 0.4412
Ave. F-Measure Chroma 0.1018 0.0771 0.0707 0.1420 0.1345 0.1524 0.1489
Ave. F-Measure Onset Only Chroma 0.3041 0.2676 0.2089 0.4141 0.4233 0.4682 0.4690


Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.094 0.085 0.086 0.808
CB1 0.077 0.053 0.061 0.731
CB2 0.055 0.047 0.049 0.803
KY1 0.173 0.106 0.126 0.676
KY2 0.158 0.099 0.118 0.676
YK1 0.148 0.121 0.132 0.683
YK2 0.139 0.113 0.123 0.679


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e., all F0s are mapped to a single octave before evaluation)

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.114 0.099 0.102 0.812
CB1 0.100 0.065 0.077 0.731
CB2 0.081 0.067 0.071 0.803
KY1 0.197 0.118 0.142 0.673
KY2 0.183 0.113 0.134 0.674
YK1 0.172 0.138 0.152 0.678
YK2 0.170 0.134 0.149 0.680



Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.311 0.266 0.274 0.565
CB1 0.291 0.195 0.228 0.508
CB2 0.191 0.159 0.165 0.516
KY1 0.553 0.339 0.398 0.428
KY2 0.529 0.350 0.405 0.419
YK1 0.491 0.409 0.442 0.417
YK2 0.488 0.409 0.441 0.424


Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.348 0.292 0.304 0.546
CB1 0.345 0.229 0.268 0.494
CB2 0.246 0.198 0.209 0.510
KY1 0.577 0.352 0.414 0.428
KY2 0.556 0.365 0.423 0.417
YK1 0.522 0.431 0.468 0.414
YK2 0.522 0.433 0.469 0.425


Run Times

Friedman Tests for Note Tracking

The Friedman test was run in MATLAB to test for significant differences amongst systems with regard to F-measure on individual files.

Tukey-Kramer HSD Multi-Comparison for Task2
TeamID TeamID Lowerbound Mean Upperbound Significance
YK1 YK2 -2.4483 0.4000 3.2483 FALSE
YK1 KY2 -1.0483 1.8000 4.6483 FALSE
YK1 KY1 -0.2483 2.6000 5.4483 FALSE
YK1 AR2 -0.0483 2.8000 5.6483 FALSE
YK1 CB1 1.3517 4.2000 7.0483 TRUE
YK1 CB2 2.1517 5.0000 7.8483 TRUE
YK2 KY2 -1.4483 1.4000 4.2483 FALSE
YK2 KY1 -0.6483 2.2000 5.0483 FALSE
YK2 AR2 -0.4483 2.4000 5.2483 FALSE
YK2 CB1 0.9517 3.8000 6.6483 TRUE
YK2 CB2 1.7517 4.6000 7.4483 TRUE
KY2 KY1 -2.0483 0.8000 3.6483 FALSE
KY2 AR2 -1.8483 1.0000 3.8483 FALSE
KY2 CB1 -0.4483 2.4000 5.2483 FALSE
KY2 CB2 0.3517 3.2000 6.0483 TRUE
KY1 AR2 -2.6483 0.2000 3.0483 FALSE
KY1 CB1 -1.2483 1.6000 4.4483 FALSE
KY1 CB2 -0.4483 2.4000 5.2483 FALSE
AR2 CB1 -1.4483 1.4000 4.2483 FALSE
AR2 CB2 -0.6483 2.2000 5.0483 FALSE
CB1 CB2 -2.0483 0.8000 3.6483 FALSE


[Figure: 2019 Su accuracy per song — Friedman mean ranks (Task 2, onset only)]

NT Piano-Only Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is counted as correct if its onset is within ±50 ms of a reference note and its F0 is within ± a quarter tone of that reference note's F0; the returned offset values are ignored. In the second setup, in addition to the above requirements, a correct returned note must also have an offset within 20% of the reference note's duration around the reference note's offset, or within 50 ms, whichever is larger. The 3 piano solo recordings are evaluated separately for this subtask.

AR2 BK1 CB1 CB2 KN3 KNJ1 KY1 KY2 YK1 YK2
Ave. F-Measure Onset-Offset 0.1009 0.2005 0.0892 0.0744 0.2006 0.0743 0.1602 0.1324 0.1553 0.1387
Ave. F-Measure Onset Only 0.3768 0.5446 0.3686 0.2522 0.5501 0.6038 0.6141 0.6245 0.6297 0.6235
Ave. F-Measure Chroma 0.1118 0.2144 0.0940 0.0894 0.2006 0.0743 0.1602 0.1324 0.1553 0.1387
Ave. F-Measure Onset Only Chroma 0.3950 0.5486 0.3834 0.2692 0.5501 0.6063 0.6166 0.6270 0.6321 0.6259


Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.094 0.110 0.101 0.826
BK1 0.194 0.211 0.201 0.842
CB1 0.098 0.083 0.089 0.838
CB2 0.072 0.079 0.074 0.774
KN3 0.203 0.198 0.201 0.508
KNJ1 0.075 0.073 0.074 0.819
KY1 0.162 0.159 0.160 0.822
KY2 0.133 0.132 0.132 0.838
YK1 0.154 0.157 0.155 0.831
YK2 0.137 0.141 0.139 0.825


Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e., all F0s are mapped to a single octave before evaluation)

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.104 0.122 0.112 0.829
BK1 0.207 0.226 0.214 0.849
CB1 0.103 0.088 0.094 0.839
CB2 0.088 0.094 0.089 0.784
KN3 0.203 0.198 0.201 0.508
KNJ1 0.075 0.073 0.074 0.819
KY1 0.162 0.159 0.160 0.822
KY2 0.133 0.132 0.132 0.837
YK1 0.154 0.157 0.155 0.831
YK2 0.137 0.141 0.139 0.824


Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.347 0.419 0.377 0.539
BK1 0.537 0.560 0.545 0.573
CB1 0.412 0.336 0.369 0.556
CB2 0.245 0.267 0.252 0.540
KN3 0.641 0.508 0.550 0.465
KNJ1 0.614 0.594 0.604 0.338
KY1 0.631 0.599 0.614 0.445
KY2 0.636 0.614 0.624 0.429
YK1 0.625 0.634 0.630 0.445
YK2 0.616 0.631 0.623 0.431


Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
AR2 0.363 0.440 0.395 0.521
BK1 0.541 0.564 0.549 0.570
CB1 0.429 0.349 0.383 0.538
CB2 0.262 0.284 0.269 0.550
KN3 0.641 0.508 0.550 0.466
KNJ1 0.617 0.596 0.606 0.338
KY1 0.634 0.601 0.617 0.445
KY2 0.638 0.617 0.627 0.429
YK1 0.627 0.637 0.632 0.444
YK2 0.618 0.633 0.626 0.431


Individual Results Files for Task 2

AR2= Anton Runov
CB1= Chris Cannam, Emmanouil Benetos
CB2= Chris Cannam, Emmanouil Benetos
BK1= Sebastian Böck, Rainer Kelz
KN3= Taegyun Kwon, Juhan Nam, Dasaem Jung
KNJ1= Taegyun Kwon, Juhan Nam, Dasaem Jung
KY1= Changhyun Kim, Sangeon Yong
KY2= Changhyun Kim, Sangeon Yong
YK1= Sangeon Yong, Changhyun Kim
YK2= Sangeon Yong, Changhyun Kim