<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Roebel</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Roebel"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/Roebel"/>
	<updated>2026-04-29T18:11:24Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2010:Audio_Onset_Detection&amp;diff=5960</id>
		<title>2010:Audio Onset Detection</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2010:Audio_Onset_Detection&amp;diff=5960"/>
		<updated>2010-05-10T20:54:12Z</updated>

		<summary type="html">&lt;p&gt;Roebel: /* Potential Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Proposers ==&lt;br /&gt;
&lt;br /&gt;
Originally proposed (2005) by Paul Brossier and Pierre Leveau. The task has run in 2005, 2006, 2007, and 2009.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Description ==&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
''The text of this section is largely copied from the 2006 page'' &lt;br /&gt;
&lt;br /&gt;
The onset detection contest is a continuation of the 2005/2006 Onset Detection contest.&lt;br /&gt;
&lt;br /&gt;
== Input data ==&lt;br /&gt;
&lt;br /&gt;
The dataset will essentially be the same as in 2005/2006/2007/2009 unless new or updated datasets are made available.&lt;br /&gt;
&lt;br /&gt;
=== Audio format ===&lt;br /&gt;
&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* file length between 2 and 36 seconds (total time: 14 minutes) &lt;br /&gt;
&lt;br /&gt;
=== Audio content ===&lt;br /&gt;
&lt;br /&gt;
The dataset is subdivided into classes, because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g. segmentation of a single track in a mix, drum transcription, segmentation of databases of complex mixes). The performance of each algorithm will be assessed on the whole dataset but also on each class separately.&lt;br /&gt;
&lt;br /&gt;
The dataset contains 85 files from 5 classes annotated as follows:&lt;br /&gt;
&lt;br /&gt;
* 30 solo drum excerpts cross-annotated by 3 people&lt;br /&gt;
* 30 solo monophonic pitched instruments excerpts cross-annotated by 3 people&lt;br /&gt;
* 10 solo polyphonic pitched instruments excerpts cross-annotated by 3 people&lt;br /&gt;
* 15 complex mixes cross-annotated by 5 people &lt;br /&gt;
&lt;br /&gt;
Moreover the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).&lt;br /&gt;
&lt;br /&gt;
== Submission File formats ==&lt;br /&gt;
'''Note:''' &amp;lt;AudioFileName&amp;gt;.wav indicates the file name.&lt;br /&gt;
&lt;br /&gt;
=== Output data ===&lt;br /&gt;
&lt;br /&gt;
The onset detection algorithms will return onset times in a text file: &lt;br /&gt;
 &amp;lt;Results of evaluated Algo path&amp;gt;/&amp;lt;AudioFileName&amp;gt;.output.&lt;br /&gt;
&lt;br /&gt;
=== Onset file Format ===&lt;br /&gt;
&lt;br /&gt;
 &amp;lt;onset time(in seconds)&amp;gt;\n&lt;br /&gt;
&lt;br /&gt;
where \n denotes the end of line. The &amp;lt; and &amp;gt; characters are not included.&lt;br /&gt;
&lt;br /&gt;
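A minimal Python sketch of reading and writing this one-onset-per-line format (the function names and the six-decimal formatting are illustrative assumptions, not part of the task definition):&lt;br /&gt;

```python
def write_onsets(path, onsets):
    # One onset time (in seconds) per line, newline-terminated.
    with open(path, "w") as fh:
        for t in onsets:
            fh.write(f"{t:.6f}\n")

def read_onsets(path):
    # Parse one float per non-empty line.
    with open(path) as fh:
        return [float(line) for line in fh if line.strip()]
```
&lt;br /&gt;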
=== README file ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program. In particular, each command line to run should be specified, using %input% for the input sound file and %output% for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
For instance, to test the program foobar with different values for parameters param1 and param2, the README file would look like:&lt;br /&gt;
&lt;br /&gt;
 foobar -param1 .1 -param2 1 -i %input% -o %output%&lt;br /&gt;
 foobar -param1 .1 -param2 2 -i %input% -o %output%&lt;br /&gt;
 foobar -param1 .2 -param2 1 -i %input% -o %output%&lt;br /&gt;
 foobar -param1 .2 -param2 2 -i %input% -o %output%&lt;br /&gt;
 foobar -param1 .3 -param2 1 -i %input% -o %output%&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For a submission using MATLAB, the README file could look like:&lt;br /&gt;
&lt;br /&gt;
 matlab -r &amp;quot;foobar(.1,1,'%input%','%output%');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.1,2,'%input%','%output%');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.2,1,'%input%','%output%');quit;&amp;quot; &lt;br /&gt;
 matlab -r &amp;quot;foobar(.2,2,'%input%','%output%');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.3,1,'%input%','%output%');quit;&amp;quot;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The command lines used to evaluate each parameter set over the whole database will be generated automatically from every line in the README file that contains both the '%input%' and '%output%' strings.&lt;br /&gt;
&lt;br /&gt;
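The placeholder substitution described above can be sketched in a few lines of Python (a hypothetical illustration of the mechanism, not the actual MIREX harness):&lt;br /&gt;

```python
def expand_commands(readme_lines, input_wav, output_txt):
    # Keep only lines containing both placeholders, then substitute paths.
    cmds = []
    for line in readme_lines:
        if "%input%" in line and "%output%" in line:
            cmds.append(line.strip()
                        .replace("%input%", input_wav)
                        .replace("%output%", output_txt))
    return cmds
```
&lt;br /&gt;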
== Evaluation procedures ==&lt;br /&gt;
&lt;br /&gt;
''This text has been copied from the 2006 Onset detection page'' &lt;br /&gt;
&lt;br /&gt;
The detected onset times will be compared with the ground-truth ones. For a given ground-truth onset time, if there is a detection in a tolerance time-window around it, it is counted as a correct detection (CD). If not, the ground-truth onset counts as a false negative (FN). The detections outside all the tolerance windows are counted as false positives (FP). Doubled onsets (two detections for one ground-truth onset) and merged onsets (one detection for two ground-truth onsets) will be taken into account in the evaluation. Doubled onsets are a subset of the FP onsets, and merged onsets a subset of the FN onsets.&lt;br /&gt;
&lt;br /&gt;
'''We define:'''&lt;br /&gt;
&lt;br /&gt;
'''Precision'''&lt;br /&gt;
    P = Ocd / (Ocd + Ofp) &lt;br /&gt;
'''Recall'''&lt;br /&gt;
    R = Ocd / (Ocd + Ofn) &lt;br /&gt;
'''F-measure'''&lt;br /&gt;
    F = 2*P*R/(P+R) &lt;br /&gt;
&lt;br /&gt;
'''with these notations:'''&lt;br /&gt;
&lt;br /&gt;
'''Ocd'''&lt;br /&gt;
    number of correctly detected onsets (CD) &lt;br /&gt;
'''Ofn'''&lt;br /&gt;
    number of missed onsets (FN) &lt;br /&gt;
'''Om'''&lt;br /&gt;
    number of merged onsets &lt;br /&gt;
'''Ofp'''&lt;br /&gt;
    number of false positive onsets (FP) &lt;br /&gt;
'''Od'''&lt;br /&gt;
    number of double onsets &lt;br /&gt;
&lt;br /&gt;
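The matching and scoring above can be sketched as a small Python function (a greedy one-to-one matcher under an assumed +/- 50 ms window; this is an illustration, not the official evaluation code):&lt;br /&gt;

```python
import operator

def evaluate(detections, ground_truth, tol=0.05):
    # Greedy one-to-one matching within a +/- tol window around each
    # ground-truth onset; unmatched detections become false positives.
    unmatched = sorted(detections)
    cd = 0
    for g in sorted(ground_truth):
        # Detections lying at most tol seconds away from g.
        hits = [d for d in unmatched if operator.le(abs(d - g), tol)]
        if hits:
            cd += 1
            unmatched.remove(min(hits, key=lambda d: abs(d - g)))
    fp = len(unmatched)            # Ofp
    fn = len(ground_truth) - cd    # Ofn
    p = cd / (cd + fp) if cd + fp else 0.0
    r = cd / (cd + fn) if cd + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```
&lt;br /&gt;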
==== Other indicative measurements: ====&lt;br /&gt;
&lt;br /&gt;
'''FP rate'''&lt;br /&gt;
    FP = 100. * (Ofp) / (Ocd+Ofp) &lt;br /&gt;
'''Doubled Onset rate in FP'''&lt;br /&gt;
    D = 100 * Od / Ofp &lt;br /&gt;
'''Merged Onset rate in FN'''&lt;br /&gt;
    M = 100 * Om / Ofn &lt;br /&gt;
&lt;br /&gt;
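For concreteness, the three percentage rates above translate directly into code (an illustrative sketch; the zero-denominator guards are an assumption, since the definitions leave that case open):&lt;br /&gt;

```python
def indicative_rates(ocd, ofp, ofn, od, om):
    # Percentage rates as defined above; empty denominators give 0.0.
    fp_rate = 100.0 * ofp / (ocd + ofp) if (ocd + ofp) else 0.0
    doubled = 100.0 * od / ofp if ofp else 0.0   # doubled-onset rate in FP
    merged = 100.0 * om / ofn if ofn else 0.0    # merged-onset rate in FN
    return fp_rate, doubled, merged
```
&lt;br /&gt;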
Because files are cross-annotated, the mean Precision and Recall rates are defined by averaging Precision and Recall rates computed for each annotation.&lt;br /&gt;
&lt;br /&gt;
To establish a ranking, we will use the F-measure, widely used in string comparisons. This criterion is arbitrary, but gives an indication of performance. It must be remembered that onset detection is a preprocessing step, so the real cost of an error of each type (false positive or false negative) depends on the application following this task.&lt;br /&gt;
&lt;br /&gt;
=== Evaluation measures: ===&lt;br /&gt;
&lt;br /&gt;
* percentage of correct detections / false positives (can also be expressed as precision/recall)&lt;br /&gt;
* time precision (tolerance from +/- 50 ms downwards). For certain files, we cannot be much more accurate than 50 ms because of the weak annotation precision. This must be taken into account.&lt;br /&gt;
* separate scoring for different instrument types (percussive, strings, winds, etc.) &lt;br /&gt;
&lt;br /&gt;
==== More detailed data: ====&lt;br /&gt;
&lt;br /&gt;
* percentage of doubled detections&lt;br /&gt;
* speed measurements of the algorithms&lt;br /&gt;
* scalability to large files&lt;br /&gt;
* robustness to noise, loudness &lt;br /&gt;
&lt;br /&gt;
== Comments from participants ==&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
axel(dot)roebel[at]ircam(dot)fr&lt;/div&gt;</summary>
		<author><name>Roebel</name></author>
		
	</entry>
</feed>