2007:Audio Drum Detection

From MIREX Wiki

Revision as of 01:25, 6 June 2007

Note

The material below is largely taken from the 2005 page.

Proposer

2005: Koen Tanghe (Ghent University). 2007 revival: Jouni Paulus (Tampere University of Technology).

Participants

Jouni Paulus (Tampere University of Technology) jouni[dot]paulus[at]tut[dot]fi

Description

The task consists of determining the positions (localization) and corresponding drum class names (labeling) of drum events in polyphonic music. This rhythmic information is particularly relevant for today's popular music genres: it can help determine tempo and (sub)genre, and it can also be queried directly (e.g., for typical rhythmic sequences or patterns).

1) Input data

The only input for this task is a set of sound file excerpts adhering to the format and content requirements mentioned below.

Audio format:

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • mono
  • 30-second excerpts (longer excerpts or whole pieces?)
  • files are named as "001.wav" to "999.wav" (or with another extension depending on the chosen format)

Audio content:

  • polyphonic music with drums (most)
  • polyphonic music without drums (only a few)
  • different genres / playing styles
  • both live performances and sequenced music
  • different types of drum sets (acoustic, electronic, ...)
  • at least 50 files
  • participants receive a representative subset in advance

Distributed data

  • a representative random subset of the data will be made available to all participants in advance of the evaluation (20% of all available files; the organizers know how many files they received on their FTP site)
  • this data can be used by the participants as they please
  • this data will not be used again during the evaluation
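The distribution rule above (a random 20% of the files released in advance and excluded from the evaluation) can be sketched as follows. This is a hypothetical helper, not the organizers' actual procedure; the function name `split_distributed_subset` and the fixed `seed` are illustrative assumptions. Note that a plain random sample does not by itself guarantee a "representative" subset; stratifying by genre or drum-set type would be one way to enforce that.

```python
import random


def split_distributed_subset(filenames, fraction=0.2, seed=0):
    """Hypothetical helper: split files into (distributed, evaluation).

    `distributed` is a random `fraction` of the files handed to participants
    in advance; those files are left out of `evaluation`, so they are never
    reused during the evaluation itself.
    """
    files = sorted(filenames)           # deterministic starting order
    rng = random.Random(seed)           # fixed seed makes the split reproducible
    n_distributed = round(len(files) * fraction)
    distributed = sorted(rng.sample(files, n_distributed))
    chosen = set(distributed)
    evaluation = [f for f in files if f not in chosen]
    return distributed, evaluation


if __name__ == "__main__":
    names = [f"{i:03d}.wav" for i in range(1, 51)]  # the minimum of 50 files
    dist, evl = split_distributed_subset(names)
    print(len(dist), len(evl))
```

With the minimum of 50 files, this releases 10 files in advance and keeps 40 for the evaluation.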

2) Output results

The output of this task is, for each sound file, an ASCII text file containing 2 columns, where each line represents a drum event. The first column is the position (in seconds) of the drum event, and the second column is the label for the drum event at that position. Multiple drum events may occur at the same time, so there may be multiple lines having the same value in the first column. The file names of the output files are the same as the audio files, but the extension is ".txt" (so: "001.txt" for "001.wav").

Classes and labels that are considered:

  • BD (bass drum)
  • SD (snare drum)
  • HH (hi-hat: open, closed, pedal, ...)
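A minimal sketch of writing and reading the two-column result files with the three labels above. The task only specifies "2 columns", so the tab separator and four-decimal time format used for writing are assumptions; the reader splits on any whitespace so that space-separated files also parse. The function names are hypothetical.

```python
from pathlib import Path

# The three drum classes considered in this task.
VALID_LABELS = {"BD", "SD", "HH"}


def output_path_for(audio_path):
    """'001.wav' -> '001.txt': same base name, '.txt' extension."""
    return Path(audio_path).with_suffix(".txt")


def write_output(audio_path, events):
    """Write (time_in_seconds, label) events, one event per line.

    Assumption: columns are tab-separated and times use 4 decimals.
    Simultaneous events simply produce several lines with the same time.
    """
    lines = []
    for time, label in sorted(events):  # sort by time, then by label
        if label not in VALID_LABELS:
            raise ValueError(f"unknown drum label: {label!r}")
        lines.append(f"{time:.4f}\t{label}")
    out = output_path_for(audio_path)
    out.write_text("\n".join(lines) + "\n", encoding="ascii")
    return out


def read_output(txt_path):
    """Parse a result file into a list of (time, label) tuples."""
    events = []
    for line in Path(txt_path).read_text(encoding="ascii").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        time_str, label = line.split()
        events.append((float(time_str), label))
    return events
```

For example, `write_output("001.wav", [(0.0, "BD"), (0.0, "HH"), (0.5, "SD")])` creates "001.txt" with two lines at time 0.0000 and one at 0.5000.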

Evaluation Procedures

  • F-measure (the harmonic mean of recall and precision, with beta = 1, i.e. equal weight on precision and recall) is calculated for each of the three drum types (BD, SD, and HH), yielding three F-measure scores and their average
  • speed measure: the total run time, from the moment the algorithm starts until it stops, will be reported (relevance?)
  • parameter: the onset-deviation tolerance used when calculating the above F-measure is 30 ms (i.e. a window of [-30 ms, +30 ms] around each true onset time)
  • condition: the actual drum sounds (sound samples) used in the input audio signal of each song are not known in advance
  • condition: participants who provided data and who need training or tuning in advance should use only the data made available to all participants by the organizers. If they do not, they should explicitly state that they used their own data that was donated to the MIREX organizers, so that this is publicly known and their results can be placed in a separate category. They could also submit two versions: one trained on the public data only, and one trained as before on all of their own data. The point is that this must be clear to everyone so that the evaluation results are interpreted correctly.
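The evaluation described above (per-class F-measure with beta = 1 and a ±30 ms tolerance, plus the average of the three scores) can be sketched as follows. Several details are assumptions not fixed by the task text: each reference onset may match at most one detected onset (one-to-one matching), hit and event counts are pooled over all files per class rather than averaged per file, and a class with no reference and no detected events scores 1.0. With beta = 1, F = 2PR/(P+R) simplifies to 2·hits/(n_ref + n_est).

```python
TOLERANCE = 0.030  # seconds: a detection within [-30 ms, +30 ms] is a hit
CLASSES = ("BD", "SD", "HH")


def count_hits(ref_times, est_times, tol=TOLERANCE):
    """Maximum one-to-one matching of onsets within +/- tol (two pointers).

    On sorted lists, greedily pairing the earliest compatible onsets
    yields the largest possible number of matches.
    """
    ref, est = sorted(ref_times), sorted(est_times)
    i = j = hits = 0
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tol:
            hits += 1          # matched pair; neither onset is reused
            i += 1
            j += 1
        elif diff < -tol:
            j += 1             # detection too early for any remaining reference
        else:
            i += 1             # reference too early for any remaining detection
    return hits


def evaluate(file_pairs, tol=TOLERANCE):
    """file_pairs: iterable of (reference_events, detected_events) per file,
    each a list of (time_in_seconds, label). Returns per-class F and mean."""
    totals = {c: [0, 0, 0] for c in CLASSES}  # hits, n_ref, n_est
    for ref_events, est_events in file_pairs:
        for c in CLASSES:
            ref = [t for t, lab in ref_events if lab == c]
            est = [t for t, lab in est_events if lab == c]
            totals[c][0] += count_hits(ref, est, tol)
            totals[c][1] += len(ref)
            totals[c][2] += len(est)
    scores = {}
    for c, (hits, n_ref, n_est) in totals.items():
        # F (beta = 1) = 2PR / (P + R) = 2 * hits / (n_ref + n_est)
        scores[c] = 2 * hits / (n_ref + n_est) if n_ref + n_est else 1.0
    scores["mean"] = sum(scores[c] for c in CLASSES) / len(CLASSES)
    return scores
```

Onsets lying exactly on the ±30 ms boundary are sensitive to floating-point rounding, so a real evaluation may want a small epsilon or integer-millisecond times.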