2018:Multiple Fundamental Frequency Estimation & Tracking Results - Su Dataset

Introduction

Since 2015, a newly annotated polyphonic dataset has been added to this task. This dataset contains a wider range of real-world music than the old dataset, which has been used since 2009. Specifically, the new dataset contains 3 clips of piano solo, 3 clips of string quartet, 2 clips of piano quintet, and 2 clips of violin sonata (violin with piano accompaniment), all of which are selected from real-world recordings. Each clip is between 20 and 30 seconds long. The dataset is annotated by the method described in the following paper:

Li Su and Yi-Hsuan Yang, "Escaping from the Abyss of Manual Annotation: New Methodology of Building Polyphonic Datasets for Automatic Music Transcription," in Int. Symp. Computer Music Multidisciplinary Research (CMMR), June 2015.

As mentioned in the paper, we did our best to manually correct the errors in the preliminary annotation (mostly mismatches between onset and offset time stamps). Because the annotation may still contain errors that we did not find, we have decided to make the data and the annotation publicly available after this year's MIREX results are announced. We encourage every participant to help us check the annotation. The results of each competing algorithm will be updated based on the revised annotation. We hope this gives participants more detailed information about how their algorithms behave on the dataset, and that by joining efforts we can create a better dataset for research on multiple-F0 estimation and tracking.

General Legend

Sub code Submission name Abstract Contributors
CB1 Silvet PDF Chris Cannam, Emmanouil Benetos
CB2 Silvet Live PDF Chris Cannam, Emmanouil Benetos
KB1 (Note-subtask2 only) PianoTranscriptor PDF Rainer Kelz, Sebastian Böck

Task 1: Multiple Fundamental Frequency Estimation (MF0E)

MF0E Overall Summary Results

Detailed Results

Precision Recall Accuracy Etot Esubs Emiss Efa
CB1 0.617 0.236 0.234 0.773 0.150 0.614 0.009
CB2 0.586 0.224 0.221 0.788 0.162 0.614 0.011
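The error columns are additive: for CB1, Esubs + Emiss + Efa = 0.150 + 0.614 + 0.009 = 0.773 = Etot. The page does not define the measures, so the following is only a minimal sketch, assuming the commonly used frame-level definitions (substitution, miss, and false-alarm errors normalized by the number of reference pitches, and accuracy as TP / (TP + FP + FN)). The function name `frame_metrics` and the per-frame pitch-set representation are illustrative assumptions, not the evaluator's actual code.

```python
def frame_metrics(ref_frames, est_frames):
    """Frame-level scores from two equal-length lists of pitch sets.

    Each frame is a set of quantized pitches (e.g. MIDI note numbers).
    The definitions are assumed, not taken from the MIREX evaluator.
    """
    tp = fp = fn = 0
    n_ref_total = 0
    subs = miss = fa = 0
    for ref, est in zip(ref_frames, est_frames):
        n_ref, n_est = len(ref), len(est)
        n_corr = len(ref & est)              # pitches reported correctly
        tp += n_corr
        fp += n_est - n_corr
        fn += n_ref - n_corr
        n_ref_total += n_ref
        subs += min(n_ref, n_est) - n_corr   # wrong pitch instead of a right one
        miss += max(0, n_ref - n_est)        # too few pitches reported
        fa += max(0, n_est - n_ref)          # too many pitches reported

    def ratio(num, den):
        return num / den if den else 0.0

    e_subs = ratio(subs, n_ref_total)
    e_miss = ratio(miss, n_ref_total)
    e_fa = ratio(fa, n_ref_total)
    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "accuracy": ratio(tp, tp + fp + fn),
        "e_subs": e_subs,
        "e_miss": e_miss,
        "e_fa": e_fa,
        "e_tot": e_subs + e_miss + e_fa,
    }


# Invented three-frame example
ref = [{60, 64, 67}, {60, 64}, {62}]
est = [{60, 64}, {60, 65, 67}, set()]
scores = frame_metrics(ref, est)
```

Under these assumed definitions, a system that reports few pitches per frame gets a high Emiss and a low Efa, which is the pattern both submissions show in the table above.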

Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0's are mapped to a single octave before evaluating)
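Mapping every F0 to a single octave can be sketched as follows. This is an illustration only: the exact mapping used by the evaluator is not given on this page, and the function names, the A4 = 440 Hz reference, and the rounding to the nearest semitone are assumptions.

```python
import math


def hz_to_chroma(f0_hz, a4_hz=440.0):
    """Map a frequency in Hz to a pitch class 0..11 (0 = C, 9 = A)."""
    midi = 69 + 12 * math.log2(f0_hz / a4_hz)  # fractional MIDI note number
    return int(round(midi)) % 12               # discard the octave


def frame_to_chroma(f0s_hz):
    """Fold all F0s of one frame into a set of pitch classes."""
    return {hz_to_chroma(f) for f in f0s_hz}


# A4 and A5 fall on the same pitch class, so an octave error is not penalized.
same_class = hz_to_chroma(440.0) == hz_to_chroma(880.0)
```

This is why the chroma scores are never lower than the corresponding full-pitch scores: an estimate in the wrong octave counts as correct once octaves are folded.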

Precision Recall Accuracy Etot Esubs Emiss Efa
CB1 0.735 0.284 0.281 0.725 0.102 0.614 0.009
CB2 0.736 0.284 0.280 0.728 0.102 0.614 0.011

Individual Results Files for Task 1

CB1= Chris Cannam, Emmanouil Benetos
CB2= Chris Cannam, Emmanouil Benetos

Info about the filenames

The first two letters of the filename represent the music type:

PQ = piano quintet, PS = piano solo, SQ = string quartet, VS = violin sonata (with piano accompaniment)

Run Times

Friedman tests for Multiple Fundamental Frequency Estimation (MF0E)

The Friedman test was run in MATLAB to test for significant differences among systems in their per-file accuracy.
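The Friedman test works on ranks: the systems are ranked within each file, and the ranks are then averaged per system. The sketch below shows only that rank-averaging step in plain Python. It is not the MATLAB procedure used for this page, it omits the significance test and the Tukey-Kramer comparison, and the system names and scores are invented.

```python
def mean_ranks(scores):
    """Mean per-file rank for each system (rank 1 = lowest score).

    scores: dict mapping system name -> list of per-file scores,
    all lists the same length. Tied systems share the average rank.
    """
    systems = list(scores)
    n_files = len(scores[systems[0]])
    totals = {s: 0.0 for s in systems}
    for i in range(n_files):
        row = sorted(systems, key=lambda s: scores[s][i])
        j = 0
        while j < len(row):
            # Extend k over the run of systems tied with row[j] on this file
            k = j
            while k + 1 < len(row) and scores[row[k + 1]][i] == scores[row[j]][i]:
                k += 1
            avg_rank = (j + k) / 2 + 1
            for s in row[j:k + 1]:
                totals[s] += avg_rank
            j = k + 1
    return {s: totals[s] / n_files for s in systems}


# Invented per-file accuracies for two hypothetical systems
ranks = mean_ranks({"A": [0.3, 0.2, 0.5, 0.4], "B": [0.2, 0.2, 0.4, 0.5]})
```

With two systems the mean ranks always lie between 1 and 2, so the difference in mean ranks lies between -1 and 1.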

Tukey-Kramer HSD Multi-Comparison

TeamID TeamID Lowerbound Mean Upperbound Significance
CB1 CB2 -0.2198 0.4000 1.0198 FALSE

Figure: 2018 Su dataset, accuracy per song, Friedman mean ranks (Task 1)

Task 2: Note Tracking (NT)

NT Mixed Set Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is considered correct if its onset is within ±50 ms of a reference note's onset and its F0 is within a quarter tone of that reference note's F0; the returned offset values are ignored. In the second setup, a correct note must additionally have an offset that lies within 20% of the reference note's duration around the reference note's offset, or within 50 ms, whichever is larger.
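The two matching rules can be sketched as follows. This is a minimal illustration and not the MIREX evaluation code: the function names, the `(onset_s, offset_s, f0_hz)` tuple layout, and the greedy one-to-one matching are assumptions (a real evaluator may use an optimal assignment instead). A quarter tone is taken as 50 cents.

```python
import math


def is_match(ref, est, check_offset=False,
             onset_tol=0.05, pitch_tol_cents=50.0,
             offset_ratio=0.2, offset_min_tol=0.05):
    """Return True if the estimated note matches the reference note.

    Notes are (onset_s, offset_s, f0_hz) tuples.
    """
    ref_on, ref_off, ref_f0 = ref
    est_on, est_off, est_f0 = est
    if abs(est_on - ref_on) > onset_tol:
        return False
    if abs(1200 * math.log2(est_f0 / ref_f0)) > pitch_tol_cents:
        return False
    if check_offset:
        # 20% of the reference duration or 50 ms, whichever is larger
        tol = max(offset_min_tol, offset_ratio * (ref_off - ref_on))
        if abs(est_off - ref_off) > tol:
            return False
    return True


def note_scores(ref_notes, est_notes, check_offset=False):
    """Greedy one-to-one matching; returns (precision, recall, f_measure)."""
    unmatched = list(ref_notes)
    hits = 0
    for est in est_notes:
        for ref in unmatched:
            if is_match(ref, est, check_offset):
                unmatched.remove(ref)  # each reference note is used once
                hits += 1
                break
    p = hits / len(est_notes) if est_notes else 0.0
    r = hits / len(ref_notes) if ref_notes else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Invented example: the first estimate ends 0.30 s early, so it only
# counts in the onset-only setup.
ref_notes = [(0.00, 1.00, 440.0), (1.00, 1.20, 523.25)]
est_notes = [(0.03, 0.70, 442.0), (1.02, 1.24, 523.25), (2.00, 2.50, 330.0)]
onset_only = note_scores(ref_notes, est_notes)
onset_offset = note_scores(ref_notes, est_notes, check_offset=True)
```

The second setup can only remove matches relative to the first, which is consistent with the onset-offset F-measures below being lower than the onset-only ones.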

CB1 CB2
Ave. F-Measure Onset-Offset 0.0614 0.0491
Ave. F-Measure Onset Only 0.2280 0.1653
Ave. F-Measure Chroma 0.0771 0.0707
Ave. F-Measure Onset Only Chroma 0.2676 0.2089

Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.077 0.053 0.061 0.731
CB2 0.055 0.047 0.049 0.803

Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0's are mapped to a single octave before evaluating)

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.100 0.065 0.077 0.731
CB2 0.081 0.067 0.071 0.803

Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.291 0.195 0.228 0.508
CB2 0.191 0.159 0.165 0.516

Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.345 0.229 0.268 0.494
CB2 0.246 0.198 0.209 0.510

Run Times

Friedman Tests for Note Tracking

The Friedman test was run in MATLAB to test for significant differences among systems in their per-file F-measure.

Tukey-Kramer HSD Multi-Comparison for Task2
TeamID TeamID Lowerbound Mean Upperbound Significance
CB1 CB2 -0.0198 0.6000 1.2198 FALSE

Figure: 2018 Su dataset, accuracy per song, Friedman mean ranks (Task 2, onset only)

NT Piano-Only Overall Summary Results

This subtask is evaluated in two different ways. In the first setup, a returned note is considered correct if its onset is within ±50 ms of a reference note's onset and its F0 is within a quarter tone of that reference note's F0; the returned offset values are ignored. In the second setup, a correct note must additionally have an offset that lies within 20% of the reference note's duration around the reference note's offset, or within 50 ms, whichever is larger. The 3 piano solo recordings are evaluated separately for this subtask.

CB1 CB2 KB1
Ave. F-Measure Onset-Offset 0.0892 0.0744 0.2136
Ave. F-Measure Onset Only 0.3686 0.2522 0.6090
Ave. F-Measure Chroma 0.0940 0.0894 0.2180
Ave. F-Measure Onset Only Chroma 0.3834 0.2692 0.6090

Detailed Results

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.098 0.083 0.089 0.838
CB2 0.072 0.079 0.074 0.774
KB1 0.227 0.203 0.214 0.837

Detailed Chroma Results

Here, accuracy is assessed on chroma results (i.e. all F0's are mapped to a single octave before evaluating)

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.103 0.088 0.094 0.839
CB2 0.088 0.094 0.089 0.784
KB1 0.232 0.207 0.218 0.839

Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.412 0.336 0.369 0.556
CB2 0.245 0.267 0.252 0.540
KB1 0.658 0.571 0.609 0.595

Chroma Results Based on Onset Only

Precision Recall Ave. F-measure Ave. Overlap
CB1 0.429 0.349 0.383 0.538
CB2 0.262 0.284 0.269 0.550
KB1 0.658 0.571 0.609 0.589

Individual Results Files for Task 2

CB1= Chris Cannam, Emmanouil Benetos
CB2= Chris Cannam, Emmanouil Benetos
KB1= Rainer Kelz, Sebastian Böck