Difference between revisions of "2010:Audio Beat Tracking"

From MIREX Wiki

Latest revision as of 05:13, 7 June 2010

Description

The aim of the automatic beat tracking task is to track every beat location in a collection of sound files. Unlike the Audio Tempo Extraction task, whose aim is to detect the tempi of each file, the beat tracking task aims to detect all beat locations in each recording. Algorithms will be evaluated on how accurately they predict beat locations annotated by a group of listeners.

Data

Collections

The original 2006 dataset contains 160 30-second excerpts (WAV format) used for the Audio Tempo and Beat contests in 2006. Beat locations in each excerpt have been annotated by 40 different listeners (39 listeners for a few excerpts). These audio recordings were selected to provide a stable tempo value, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters. One disadvantage of this set for beat tracking is that the tempi are rather stable, so it does not test the ability of beat-tracking algorithms to track tempo changes.

The second collection comprises 367 Chopin Mazurkas, represented as full audio tracks (WAV format). The Mazurka dataset contains tempo changes, so it evaluates the ability of algorithms to track them.

Audio Formats

The data are monophonic sound files, with the associated onset times and data about the annotation robustness.

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)
  • file length between 2 and 36 seconds (total time: 14 minutes)


Submission Format

Submissions to this task must conform to the format detailed below. Submissions should be packaged and contain at least two files: the algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

Input Data

Participating algorithms will have to read audio in the following format:

  • Sample rate: 44.1 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV
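
As a rough illustration of the input format listed above, the following Python sketch checks a file's header against those three properties using the standard-library wave module. The function name check_wav and the wording of the messages are hypothetical and not part of the task specification.

```python
import wave


def check_wav(path):
    """Return a list of mismatches against the expected input format.

    An empty list means the file is 44.1 kHz, 16-bit, mono PCM WAV.
    (check_wav is a hypothetical helper, not part of the MIREX tooling.)
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != 44100:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 44100 Hz")
        if wav.getsampwidth() != 2:  # sample width is reported in bytes
            problems.append(f"sample size is {8 * wav.getsampwidth()} bit, expected 16 bit")
        if wav.getnchannels() != 1:
            problems.append(f"{wav.getnchannels()} channels, expected 1 (mono)")
    return problems
```

A submission could run a check like this before analysis and fail early with a clear message rather than produce beat times for audio it misread.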

Output Data

The beat tracking algorithms will return beat-times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

Output File Format (Audio Beat tracking)

The Beat Tracking output file format is an ASCII text format. Each beat time is specified, in seconds, on its own line. Specifically,

<beat time(in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
0.486
0.729
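
The output format above is simple enough to sketch in a few lines of Python. The helper names write_beats and read_beats are hypothetical, and the three-decimal precision is only an assumption taken from the example file; the specification does not state a required precision.

```python
def write_beats(beat_times, path):
    """Write one beat time in seconds per line, each line newline-terminated."""
    # newline="\n" keeps the line ending fixed regardless of platform.
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for beat in sorted(beat_times):
            handle.write(f"{beat:.3f}\n")  # three decimals: an assumption


def read_beats(path):
    """Read a beat file back into a list of floats, skipping blank lines."""
    with open(path, encoding="ascii") as handle:
        return [float(line) for line in handle if line.strip()]
```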

Algorithm Calling Format

The submitted algorithm must take as arguments a SINGLE .wav file on which to perform beat tracking, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called as follows:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output  
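
For a submission written in Python, the flag-based calling convention shown above could be handled roughly as follows. This is only a sketch: the parse_args function is hypothetical, and -param1 stands in for whatever parameter a real submission exposes.

```python
import argparse


def parse_args(argv=None):
    """Parse 'foobar -param1 .1 -i %input -o %output' style arguments."""
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    # Placeholder parameter mirroring the examples; optional, no default assumed.
    parser.add_argument("-param1", dest="param1", type=float, default=None,
                        help="example tuning parameter, e.g. a detection threshold")
    return parser.parse_args(argv)
```

Making both -i and -o required reflects the rule that the output path must always be specifiable by the caller.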

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string inputs for the full paths and names of the input and output files. Parameters can also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')


README File

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for the parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;" 
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.
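
The expansion described above might work along these lines. This is a sketch, not the actual MIREX tooling: the function name expand_readme and the per-template output layout (out_dir/runN/name.txt) are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def expand_readme(readme_text, wav_paths, out_dir):
    """Turn README template lines into concrete command lines.

    Only lines containing both '%input' and '%output' are treated as
    templates; every other line (contact details, '...') is ignored.
    """
    templates = [line.strip() for line in readme_text.splitlines()
                 if "%input" in line and "%output" in line]
    commands = []
    for index, template in enumerate(templates):
        for wav in wav_paths:
            stem = PurePosixPath(wav).stem
            # One output folder per template line (hypothetical layout).
            out = str(PurePosixPath(out_dir) / f"run{index}" / f"{stem}.txt")
            commands.append(template.replace("%input", wav)
                                    .replace("%output", out))
    return commands
```

Because the substitution is purely textual, the same function handles the MATLAB-style lines, where the placeholders sit inside quoted function arguments.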

Evaluation Procedures

The evaluation methods are taken from the beat evaluation toolbox and are described in the following technical report:

M. E. P. Davies, N. Degara and M. D. Plumbley. "Evaluation methods for musical audio beat tracking algorithms". Technical Report C4DM-TR-09-06.

For further details on the specifics of the methods please refer to the paper. However, here is a brief summary with appropriate references:

  • F-measure - the standard calculation as used in onset evaluation but with a 70 ms window.

S. Dixon, "Onset detection revisited," in Proceedings of the 9th International Conference on Digital Audio Effects (DAFx), Montreal, Canada, pp. 133-137, 2006.
S. Dixon, "Evaluation of audio beat tracking system beatroot," Journal of New Music Research, vol. 36, no. 1, pp. 39-51, 2007.

  • Cemgil - beat accuracy is calculated using a Gaussian error function with 40 ms standard deviation.

A. T. Cemgil, B. Kappen, P. Desain, and H. Honing, "On tempo tracking: Tempogram representation and Kalman filtering," Journal of New Music Research, vol. 28, no. 4, pp. 259-273, 2001.

  • Goto - binary decision of correct or incorrect tracking based on statistical properties of a beat error sequence.

M. Goto and Y. Muraoka, "Issues in evaluating beat tracking systems," in Working Notes of the IJCAI-97 Workshop on Issues in AI and Music - Evaluation and Assessment, 1997, pp. 9-16.

  • PScore - McKinney's impulse train cross-correlation method as used in 2006.

M. F. McKinney, D. Moelants, M. E. P. Davies, and A. Klapuri, "Evaluation of audio beat tracking and music tempo extraction algorithms," Journal of New Music Research, vol. 36, no. 1, pp. 1-16, 2007.

  • CMLc, CMLt, AMLc, AMLt - continuity-based evaluation methods based on the longest continuously correctly tracked section.

S. Hainsworth, "Techniques for the automated analysis of musical audio," Ph.D. dissertation, Department of Engineering, Cambridge University, 2004.
A. P. Klapuri, A. Eronen, and J. Astola, "Analysis of the meter of acoustic musical signals," IEEE Transactions on Audio, Speech and Language Processing, vol. 14, no. 1, pp. 342-355, 2006.

  • D, Dg - information-based criteria based on analysis of a beat error histogram (note the results are measured in 'bits' and not percentages); see the technical report for a description.
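
Three of the measures summarised above can be sketched compactly in Python. These are simplified illustrations for a single annotation sequence, not the beat evaluation toolbox itself. They assume the 70 ms window is applied as plus or minus 70 ms, that the Cemgil score is normalised by the mean of the two sequence lengths, and that PScore follows the 2006 task description (impulse trains sampled at 100 Hz over 25 seconds, beats before 5 seconds ignored, and a tolerance of 20% of the median annotated inter-beat interval). All function and parameter names are hypothetical.

```python
import math
import statistics


def f_measure(beats, annotations, window=0.07):
    """F-measure with each annotation matched to at most one beat."""
    beats, annotations = sorted(beats), sorted(annotations)
    hits = i = j = 0
    while i < len(beats) and j < len(annotations):
        diff = beats[i] - annotations[j]
        if abs(diff) <= window:      # beat falls inside the tolerance window
            hits, i, j = hits + 1, i + 1, j + 1
        elif diff < 0:               # beat too early: a false positive
            i += 1
        else:                        # annotation missed: a false negative
            j += 1
    if hits == 0:
        return 0.0
    precision, recall = hits / len(beats), hits / len(annotations)
    return 2 * precision * recall / (precision + recall)


def cemgil(beats, annotations, sigma=0.04):
    """Gaussian-weighted accuracy of the nearest beat to each annotation."""
    if not beats or not annotations:
        return 0.0
    total = sum(math.exp(-min(abs(a - b) for b in beats) ** 2 / (2 * sigma ** 2))
                for a in annotations)
    return total / (0.5 * (len(beats) + len(annotations)))


def p_score(beats, annotations, fs=100, skip=5.0, length=25.0):
    """Impulse-train cross-correlation within a small lag window."""
    n = int(round(length * fs))

    def impulses(times):
        # Sample indices of unit impulses, measured from the skip point.
        idx = {int(round((t - skip) * fs)) for t in times if t >= skip}
        return sorted(k for k in idx if 0 <= k < n)

    y, a = impulses(beats), impulses(annotations)
    if not y or len(a) < 2:
        return 0.0
    # Lag window: 20% of the median annotated inter-beat interval, in samples.
    w = round(0.2 * statistics.median(q - p for p, q in zip(a, a[1:])))
    a_set = set(a)
    hits = sum(1 for k in y for m in range(-w, w + 1) if (k - m) in a_set)
    return hits / max(len(y), len(a))
```

With several annotators per excerpt, as in the 2006 collection, the per-annotator scores would be averaged; that step is omitted here.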

Relevant Development Collections

You can find them here:

https://www.music-ir.org/evaluation/MIREX/data/2006/beat/

User: beattrack Password: b34trx

https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/

User: tempo Password: t3mp0

Data has been uploaded in both .tgz and .zip format.

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

Submission opening date

Friday 4th June 2010

Submission closing date

TBA