Difference between revisions of "2010:Audio Onset Detection"

From MIREX Wiki

Latest revision as of 03:20, 5 June 2010

Description

Audio Onset Detection concerns itself with finding the time-locations of all sonic events in a piece of audio. This task was originally proposed in 2005 by Paul Brossier and Pierre Leveau. It has subsequently been run in 2005, 2006, 2007, and 2009.


Data

Collections

The dataset will be the same as in 2005/2006/2007/2009 unless new or updated datasets are made available. The current dataset is subdivided into classes because onset detection is sometimes performed in applications dedicated to a single type of signal (e.g., segmentation of a single track in a mix, drum transcription, or segmentation of databases of complex mixes). The performance of each algorithm will be assessed on the whole dataset and also on each class separately.

The dataset contains 85 files from 5 classes annotated as follows:

  • 30 solo drum excerpts cross-annotated by 3 people
  • 30 solo monophonic pitched instrument excerpts cross-annotated by 3 people
  • 10 solo polyphonic pitched instrument excerpts cross-annotated by 3 people
  • 15 complex mixes cross-annotated by 5 people

Moreover, the monophonic pitched instruments class is divided into 6 sub-classes: brass (2 excerpts), winds (4), sustained strings (6), plucked strings (9), bars and bells (4), singing voice (5).


Audio Formats

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)
  • file length between 2 and 36 seconds (total time: 14 minutes)


Evaluation Procedures

The detected onset times will be compared with the ground-truth ones. For a given ground-truth onset time, if there is a detection in a tolerance time-window around it, it is considered as a correct detection (CD). If not, there is a false negative (FN). The detections outside all the tolerance windows are counted as false positives (FP). Doubled onsets (two detections for one ground-truth onset) and merged onsets (one detection for two ground-truth onsets) will be taken into account in the evaluation. Doubled onsets are a subset of the FP onsets, and merged onsets a subset of FN onsets.

We define:

  • Precision P = Ocd / (Ocd + Ofp)
  • Recall R = Ocd / (Ocd + Ofn)
  • F-measure F = 2 * P * R / (P + R)

with these notations:

  • Ocd number of correctly detected onsets (CD)
  • Ofn number of missed onsets (FN)
  • Om number of merged onsets
  • Ofp number of false positive onsets (FP)
  • Od number of double onsets
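
The counting rules and the three scores above can be sketched in Python. This is an illustration only, not the official MIREX evaluator: the function names, the default tolerance of 0.05 s (taken from the +/- 50 ms figure under "Evaluation measures") and the greedy one-to-one matching strategy are all assumptions. With one-to-one matching, a second detection inside an already matched window is counted as a false positive, in line with doubled onsets being a subset of the FP onsets.

```python
def count_onsets(detected, truth, tol=0.05):
    """Return (Ocd, Ofp, Ofn) for onset times given in seconds.

    Assumption: each ground-truth onset can be matched by at most one
    detection, found by a greedy left-to-right scan of the sorted lists.
    """
    detected = sorted(detected)
    truth = sorted(truth)
    i = j = 0
    ocd = 0
    while i < len(detected) and j < len(truth):
        if detected[i] < truth[j] - tol:
            i += 1      # detection before the window: false positive
        elif detected[i] > truth[j] + tol:
            j += 1      # window passed with no detection: false negative
        else:
            ocd += 1    # detection inside the window: correct detection
            i += 1
            j += 1
    ofp = len(detected) - ocd
    ofn = len(truth) - ocd
    return ocd, ofp, ofn


def precision_recall_f(ocd, ofp, ofn):
    """Compute P, R and F from the counts, returning 0.0 for empty cases."""
    p = ocd / (ocd + ofp) if ocd + ofp else 0.0
    r = ocd / (ocd + ofn) if ocd + ofn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

For example, with detections at 0.25, 0.26, 1.0 and 2.46 s and ground truth at 0.243, 1.476, 1.987 and 2.449 s, two onsets are matched, so Ocd = Ofp = Ofn = 2 and P = R = F = 0.5.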

Other indicative measurements:

  • FP rate FP = 100 * Ofp / (Ocd + Ofp)
  • Doubled Onset rate in FP D = 100 * Od / Ofp
  • Merged Onset rate in FN M = 100 * Om / Ofn
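
As a worked example of these three percentages, the following Python sketch computes them from the counts. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for illustration; the page does not say how empty cases are handled.

```python
def indicative_rates(ocd, ofp, ofn, od, om):
    """Return (FP rate, doubled-onset rate in FP, merged-onset rate in FN).

    All three values are percentages. Zero denominators yield 0.0,
    which is an assumption rather than a documented rule.
    """
    fp_rate = 100.0 * ofp / (ocd + ofp) if ocd + ofp else 0.0
    doubled = 100.0 * od / ofp if ofp else 0.0
    merged = 100.0 * om / ofn if ofn else 0.0
    return fp_rate, doubled, merged
```

With Ocd = 8, Ofp = 2, Ofn = 4, Od = 1 and Om = 1, the rates are FP = 20, D = 50 and M = 25.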

Because files are cross-annotated, the mean Precision and Recall rates are defined by averaging Precision and Recall rates computed for each annotation.
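
A minimal Python sketch of this averaging, assuming Precision and Recall have already been computed once per annotator for a given file (the function name and the input layout are hypothetical):

```python
from statistics import mean


def mean_precision_recall(per_annotation):
    """Average (precision, recall) pairs, one pair per annotation."""
    precisions = [p for p, _ in per_annotation]
    recalls = [r for _, r in per_annotation]
    return mean(precisions), mean(recalls)
```

For a file with two annotations scoring (0.5, 0.25) and (1.0, 0.75), the mean Precision is 0.75 and the mean Recall is 0.5.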

To establish a ranking, we will use the F-measure, widely used in information retrieval. This criterion is arbitrary, but gives an indication of performance. It must be remembered that onset detection is a preprocessing step, so the real cost of an error of each type (false positive or false negative) depends on the application following this task.

Evaluation measures

  • percentage of correct detections / false positives (can also be expressed as precision/recall)
  • time precision (tolerance of +/- 50 ms or less). For certain files, we cannot be much more accurate than 50 ms because of the weak annotation precision. This must be taken into account.
  • separate scoring for different instrument types (percussive, strings, winds, etc.)
  • percentage of doubled detections
  • speed measurements of the algorithms


Submission Format

Submissions to this task will have to conform to the format detailed below. Submissions should be packaged and contain at least two files: the algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

Input Data

Participating algorithms will have to read audio in the following format:

  • Sample rate: 44.1 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV
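
A submission may want to verify that a file matches this format before processing it. The following Python sketch does so with the standard-library wave module; the function name and the wording of the messages are illustrative assumptions, not part of the task specification.

```python
import wave


def check_wav_format(path):
    """Return a list of mismatches against the expected input format.

    An empty list means the file is 44.1 kHz, 16-bit, mono PCM WAV.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 44100:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 44100")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample size is {8 * wav.getsampwidth()} bit, expected 16")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    return problems
```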

Output Data

The onset detection algorithms will return onset times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

Output File Format (Audio Onset Detection)

The Audio Onset Detection output file format is an ASCII text format. Each onset time is specified, in seconds, on its own line. Specifically,

<onset time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
1.476
1.987
2.449
3.224
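
A file in this format could be produced as in the Python sketch below. The function names are hypothetical, and both the three-decimal precision (chosen to match the example) and the ascending sort are assumptions, since the specification only requires one onset time in seconds per line.

```python
def format_onsets(onsets):
    """Render onset times (seconds) as one value per line, ascending."""
    return "".join(f"{t:.3f}\n" for t in sorted(onsets))


def write_onsets(path, onsets):
    """Write the onset times to an ASCII text file with \\n line endings."""
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(format_onsets(onsets))
```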

Algorithm Calling Format

The submitted algorithm must take as arguments a SINGLE .wav file to perform the onset detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output  
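
For a submission written in Python, the flag-based form could be parsed as sketched below. This is one possible layout, not a required one: the program name foobar and the option -param1 are the page's own placeholders, while the default threshold of 0.1 and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse 'foobar -param1 .1 -i %input -o %output' style arguments."""
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-param1", dest="param1", type=float, default=0.1,
                        help="detection threshold (default value is an assumption)")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    return parser.parse_args(argv)
```

Calling parse_args(["-param1", ".1", "-i", "in.wav", "-o", "out.txt"]) yields param1 = 0.1, input = "in.wav" and output = "out.txt".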

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string inputs for the full paths and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')

Parameter Sweeps

In past iterations of MIREX, submitters have been allowed to specify a parameter sweep so as to generate a precision-recall operating characteristic that helps evaluate and understand the algorithm. If you wish to do so, please specify TEN different settings for your sweepable parameter. There is no guarantee that all ten will be tested and evaluated, however, because the time constraints for MIREX keep tightening as the number of submissions grows. Therefore, please also specify in the README the ONE parameterization you feel is best. If the whole parameter sweep cannot be evaluated, this single parameterization will be used.


Packaging submissions

  • All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
  • Be sure to follow the Best Coding Practices for MIREX
  • Be sure to follow the MIREX 2010 Submission Instructions

All submissions should include a README file containing the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximately how much scratch disk space will the submission need to store any feature/cache files?
  • Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
  • Any special notes regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.


README File

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for the parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;" 
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.
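
The expansion described above might work roughly as in the following Python sketch. It is a guess at the mechanism rather than the actual MIREX tooling, and the function name is hypothetical. Note that any README line containing both placeholders is picked up, including explanatory prose, so command lines are best kept on lines of their own.

```python
def expand_commands(readme_text, input_path, output_path):
    """Build one command per README line containing %input and %output."""
    commands = []
    for line in readme_text.splitlines():
        if "%input" in line and "%output" in line:
            command = line.strip()
            command = command.replace("%input", input_path)
            command = command.replace("%output", output_path)
            commands.append(command)
    return commands
```

Given a README whose command lines are "foobar -param1 .1 -i %input -o %output" and "foobar -param1 .15 -i %input -o %output", calling expand_commands(text, "a.wav", "a.txt") returns the two commands with the paths filled in, and lines without both placeholders are ignored.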

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 6 hours will be imposed on analysis times.


Submission opening date

Friday 4th June 2010

Submission closing date

TBA