2009:Audio Onset detection

Proposers

Originally proposed (2005) by Paul Brossier and Pierre Leveau. The task has run in 2005, 2006 and 2007.


Participants

   * <your name here>


Description

The text of this section is largely copied from the 2006 page.

The onset detection contest is a continuation of the 2005/2006/2007 Onset Detection contests.

Input data

The dataset will essentially be the same as in 2005/2006/2007 unless new or updated datasets are made available.

Audio format

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)
  • file length between 2 and 36 seconds (total time: 14 minutes)
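
As an informal illustration (not part of the submission requirements), the following Python sketch checks that a candidate WAV file matches the format above; the file name 'example.wav' is hypothetical.

import wave

def check_format(path):
    # Verify that a WAV file matches the contest format:
    # PCM, 16-bit, 44100 Hz, single channel, 2-36 s long.
    with wave.open(path, 'rb') as w:
        assert w.getframerate() == 44100, "expected 44.1 kHz sample rate"
        assert w.getnchannels() == 1, "expected a single (mono) channel"
        assert w.getsampwidth() == 2, "expected 16-bit samples"
        duration = w.getnframes() / float(w.getframerate())
        assert 2.0 <= duration <= 36.0, "expected 2-36 s file length"
    return duration

# Hypothetical file name, for illustration only.
print(check_format('example.wav'))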

Audio content

The dataset is subdivided into classes, because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g. segmentation of a single track in a mix, drum transcription, segmentation of databases of complex mixes). The performance of each algorithm will be assessed on the whole dataset and also on each class separately.

The dataset contains 85 files from 4 classes, annotated as follows:

  • 30 solo drum excerpts cross-annotated by 3 people
  • 30 solo monophonic pitched instruments excerpts cross-annotated by 3 people
  • 10 solo polyphonic pitched instruments excerpts cross-annotated by 3 people
  • 15 complex mixes cross-annotated by 5 people

Moreover, the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).


Required File formats

Note: <AudioFileName>.wav stands for the name of the input audio file.

Output data

The onset detection algorithms will return onset times in a text file:

<Results of evaluated Algo path>/<AudioFileName>.output

Onset file Format

<onset time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included.
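
For illustration, a minimal Python sketch that writes detected onset times in this format could look as follows; the onset values and output path are hypothetical.

def write_onsets(onset_times, output_path):
    # Write one onset time (in seconds) per line, as specified above.
    with open(output_path, 'w') as f:
        for t in onset_times:
            f.write('%f\n' % t)

# Hypothetical onset times and output path, for illustration only.
write_onsets([0.123, 0.871, 1.502], 'example.output')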

README file

A README file accompanying each submission should contain explicit instructions on how to run the program. In particular, each command line to run should be specified, using %input% for the input sound file and %output% for the resulting text file.

For instance, to test the program foobar with different values for parameters param1 and param2, the README file would look like:

foobar -param1 .1 -param2 1 -i %input% -o %output%
foobar -param1 .1 -param2 2 -i %input% -o %output%
foobar -param1 .2 -param2 1 -i %input% -o %output%
foobar -param1 .2 -param2 2 -i %input% -o %output%
foobar -param1 .3 -param2 1 -i %input% -o %output%
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,1,'%input%','%output%');quit;"
matlab -r "foobar(.1,2,'%input%','%output%');quit;"
matlab -r "foobar(.2,1,'%input%','%output%');quit;" 
matlab -r "foobar(.2,2,'%input%','%output%');quit;"
matlab -r "foobar(.3,1,'%input%','%output%');quit;"
...

The command lines used to evaluate each parameter set over the whole database will be generated automatically from every line in the README file that contains both the '%input%' and '%output%' strings.
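
As a rough sketch of this convention (not the actual MIREX evaluation harness), the substitution could be performed along the following lines in Python; the README path and file names are hypothetical.

import subprocess

def run_readme_commands(readme_path, file_pairs):
    # For every README line containing both '%input%' and '%output%',
    # substitute each (input wav, output text file) pair and run the command.
    with open(readme_path) as f:
        templates = [line.strip() for line in f
                     if '%input%' in line and '%output%' in line]
    for template in templates:
        for wav_path, out_path in file_pairs:
            cmd = template.replace('%input%', wav_path).replace('%output%', out_path)
            subprocess.call(cmd, shell=True)

# Hypothetical paths, for illustration only.
run_readme_commands('README', [('example.wav', 'example.output')])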

Evaluation procedures

This text has been copied from the 2006 Onset Detection page.

The detected onset times will be compared with the ground-truth ones. For a given ground-truth onset time, if there is a detection in a tolerance time-window around it, it is considered as a correct detection (CD). If not, there is a false negative (FN). The detections outside all the tolerance windows are counted as false positives (FP). Doubled onsets (two detections for one ground-truth onset) and merged onsets (one detection for two ground-truth onsets) will be taken into account in the evaluation. Doubled onsets are a subset of the FP onsets, and merged onsets a subset of FN onsets.
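
A minimal Python sketch of this matching step is given below. It assumes a +/- 50 ms tolerance window and a simple greedy one-to-one matching; the official evaluation additionally reports doubled and merged onsets, which this sketch only lumps into the FP and FN counts.

def count_detections(ground_truth, detections, tolerance=0.05):
    # Greedy one-to-one matching of detections to ground-truth onsets
    # within +/- `tolerance` seconds (0.05 s = 50 ms assumed here).
    detections = sorted(detections)
    used = [False] * len(detections)
    cd = 0
    for gt in sorted(ground_truth):
        # Look for the closest unused detection inside the tolerance window.
        best, best_dist = None, tolerance
        for i, d in enumerate(detections):
            if not used[i] and abs(d - gt) <= best_dist:
                best, best_dist = i, abs(d - gt)
        if best is not None:
            used[best] = True
            cd += 1
    fn = len(ground_truth) - cd   # missed ground-truth onsets (includes merged onsets)
    fp = len(detections) - cd     # unmatched detections (includes doubled onsets)
    return cd, fp, fn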

We define:

Precision

   P = Ocd / (Ocd + Ofp)

Recall

   R = Ocd / (Ocd + Ofn) 

F-measure

   F = 2*P*R/(P+R) 

with these notations:

Ocd

   number of correctly detected onsets (CD) 

Ofn

   number of missed onsets (FN) 

Om

   number of merged onsets 

Ofp

   number of false positive onsets (FP) 

Od

   number of double onsets 

Other indicative measurements:

FP rate

   FP = 100 * Ofp / (Ocd + Ofp)

Doubled Onset rate in FP

   D = 100 * Od / Ofp 

Merged Onset rate in FN

   M = 100 * Om / Ofn 

Because files are cross-annotated, the mean Precision and Recall rates are defined by averaging Precision and Recall rates computed for each annotation.

To establish a ranking, we will use the F-measure, widely used in string comparisons. This criterion is arbitrary, but gives an indication of performance. It must be remembered that onset detection is a preprocessing step, so the real cost of an error of each type (false positive or false negative) depends on the application following this task.
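
For illustration, the measures above can be computed from the per-annotation counts as in the following Python sketch. Averaging Precision and Recall over the annotations of a file follows the definition above; computing the F-measure from the averaged rates is an assumption of this sketch, and the example counts are hypothetical.

def precision_recall_f(cd, fp, fn):
    # Precision, Recall and F-measure from the counts Ocd, Ofp, Ofn.
    p = cd / float(cd + fp) if cd + fp else 0.0
    r = cd / float(cd + fn) if cd + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def mean_scores(per_annotation_counts):
    # Average Precision and Recall over the cross-annotations of one file,
    # then derive an F-measure from the averaged rates (assumption).
    scores = [precision_recall_f(cd, fp, fn) for cd, fp, fn in per_annotation_counts]
    mean_p = sum(p for p, _, _ in scores) / len(scores)
    mean_r = sum(r for _, r, _ in scores) / len(scores)
    mean_f = 2 * mean_p * mean_r / (mean_p + mean_r) if mean_p + mean_r else 0.0
    return mean_p, mean_r, mean_f

# Hypothetical counts (Ocd, Ofp, Ofn) for three annotators of one file.
print(mean_scores([(10, 2, 1), (9, 3, 2), (11, 1, 0)]))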

Evaluation measures:

  • percentage of correct detections / false positives (can also be expressed as precision/recall)
  • time precision (tolerance from +/- 50 ms down to smaller windows). For certain files, we cannot be much more accurate than 50 ms because of the limited annotation precision; this must be taken into account.
  • separate scoring for different instrument types (percussive, strings, winds, etc.)

More detailed data:

  • percentage of doubled detections
  • speed measurements of the algorithms
  • scalability to large files
  • robustness to noise, loudness

Comments from participants

<your comments here>