<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Michelle+Daniels</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Michelle+Daniels"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/Michelle_Daniels"/>
	<updated>2026-04-17T08:20:18Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2014:Audio_Tempo_Estimation&amp;diff=10264</id>
		<title>2014:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2014:Audio_Tempo_Estimation&amp;diff=10264"/>
		<updated>2014-07-12T23:32:35Z</updated>

		<summary type="html">&lt;p&gt;Michelle Daniels: /* Potential Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, denoted ST1 (a value between 0 and 1). The relative strength ST2 (not output) is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
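&lt;br /&gt;
As an illustration only, a minimal Python sketch for writing an estimate in this format might look like the following (output_path, t1, t2 and st1 are hypothetical names, assumed to be computed already):&lt;br /&gt;
&lt;br /&gt;
 # write the two tempi (BPM, T1 slower) and the relative strength of T1, tab-separated&lt;br /&gt;
 with open(output_path, 'w') as f:&lt;br /&gt;
     f.write('%.1f\t%.1f\t%.2f\n' % (t1, t2, st1))&lt;br /&gt;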
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
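&lt;br /&gt;
As a rough illustration of this reduction (not the original analysis code), the two tempo candidates and their normalized saliences could be derived from a tapping-derived tempo histogram along the following lines; tempo_grid and counts are assumed inputs, and the two largest bins stand in for the two highest peaks:&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 # tempo_grid: candidate tempi in BPM; counts: number of taps at each tempo (assumed inputs)&lt;br /&gt;
 order = np.argsort(counts)[::-1]               # bins sorted by height, largest first&lt;br /&gt;
 t_a, h_a = tempo_grid[order[0]], counts[order[0]]&lt;br /&gt;
 t_b, h_b = tempo_grid[order[1]], counts[order[1]]&lt;br /&gt;
 t1, t2 = min(t_a, t_b), max(t_a, t_b)          # T1 is the slower of the two tempi&lt;br /&gt;
 st1 = (h_a if t_a == t1 else h_b) / float(h_a + h_b)   # heights normalized to sum to 1.0&lt;br /&gt;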
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF &lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by the ground-truth data; it varies from 0 to 1.0), TT1 indicates whether the algorithm identifies T1 to within 8%, and TT2 indicates whether it identifies T2 to within 8%. No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
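&lt;br /&gt;
A minimal Python sketch of this per-excerpt score, assuming TT1 and TT2 are treated as binary hits within the 8% tolerance (gt_t1, gt_t2 and st1 come from the ground truth; est_t1 and est_t2 are the algorithm's two estimates):&lt;br /&gt;
&lt;br /&gt;
 def p_score(gt_t1, gt_t2, st1, est_t1, est_t2, tol=0.08):&lt;br /&gt;
     # TT1 / TT2: 1 if the corresponding ground-truth tempo is matched to within 8%, else 0&lt;br /&gt;
     tt1 = int(abs(est_t1 - gt_t1) &amp;lt;= tol * gt_t1)&lt;br /&gt;
     tt2 = int(abs(est_t2 - gt_t2) &amp;lt;= tol * gt_t2)&lt;br /&gt;
     return st1 * tt1 + (1.0 - st1) * tt2&lt;br /&gt;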
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
The practice data can be found here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;br /&gt;
&lt;br /&gt;
Michelle Daniels / michelledaniels (at) ucsd.edu&lt;/div&gt;</summary>
		<author><name>Michelle Daniels</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2014:Audio_Beat_Tracking&amp;diff=10263</id>
		<title>2014:Audio Beat Tracking</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2014:Audio_Beat_Tracking&amp;diff=10263"/>
		<updated>2014-07-12T23:32:11Z</updated>

		<summary type="html">&lt;p&gt;Michelle Daniels: /* Potential Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
The text of this section was copied from the 2012 Wiki.  Please add your comments and discussion at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
The aim of the automatic beat tracking task is to track the beat locations in a collection of sound files. Unlike the Audio Tempo Extraction task, whose aim is to detect the tempi of each file, the beat tracking task aims at detecting all beat locations in each recording. The algorithms will be evaluated in terms of their accuracy in predicting beat locations annotated by a group of listeners.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
The original 2006 dataset contains 160 30-second excerpts (WAV format) used for the Audio Tempo and Beat contests in 2006. Beat locations in each excerpt have been annotated by 40 different listeners (39 listeners for a few excerpts). These audio recordings were selected to provide a stable tempo, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters. One disadvantage of using this set for beat tracking is that the tempi are rather stable, so this set will not test beat-tracking algorithms in their ability to track tempo changes.&lt;br /&gt;
&lt;br /&gt;
The second collection comprises 367 Chopin Mazurkas, represented as full audio tracks (WAV format). The Mazurka dataset contains tempo changes, so it will evaluate the ability of algorithms to track these.&lt;br /&gt;
&lt;br /&gt;
The third collection was assembled and donated in 2012. This dataset contains 217 excerpts of around 40s each, of which 19 are &amp;quot;easy&amp;quot; and the remaining 198 are &amp;quot;hard&amp;quot;. The harder excerpts were drawn from the following musical styles: Romantic music, film soundtracks, blues, chanson and solo guitar.&lt;br /&gt;
&lt;br /&gt;
This dataset has been designed for radically new techniques that can contend with challenging beat tracking situations such as quiet accompaniment, expressive timing, changes in time signature, slow tempo, and poor sound quality. So, if your beat tracker likes a 4/4 time signature with a steady tempo and needs clear percussive onsets, don't expect it to do very well!&lt;br /&gt;
But don't be deterred, this is for the good of beat tracking. &lt;br /&gt;
&lt;br /&gt;
You can read in detail about how the dataset was made here:&lt;br /&gt;
[http://dx.doi.org/10.1109/TASL.2012.2205244 ''Selective Sampling for Beat Tracking Evaluation'']&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* file length between 2 and 36 seconds (total time: 14 minutes) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
=== Input Data ===&lt;br /&gt;
Participating algorithms will have to read audio in the following format:&lt;br /&gt;
&lt;br /&gt;
* Sample rate: 44.1 KHz&lt;br /&gt;
* Sample size: 16 bit&lt;br /&gt;
* Number of channels: 1 (mono)&lt;br /&gt;
* Encoding: WAV &lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
&lt;br /&gt;
The beat tracking algorithms will return beat-times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.&lt;br /&gt;
&lt;br /&gt;
=== Output File Format (Audio Beat tracking) ===&lt;br /&gt;
&lt;br /&gt;
The Beat Tracking output file format is an ASCII text format. Each beat time is specified, in seconds, on its own line. Specifically, &lt;br /&gt;
&lt;br /&gt;
 &amp;lt;beat time(in seconds)&amp;gt;\n&lt;br /&gt;
&lt;br /&gt;
where \n denotes the end of line. The &amp;lt; and &amp;gt; characters are not included. An example output file would look something like:&lt;br /&gt;
&lt;br /&gt;
 0.243&lt;br /&gt;
 0.486&lt;br /&gt;
 0.729&lt;br /&gt;
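&lt;br /&gt;
As an illustration only, a minimal Python sketch for writing beat times in this format (output_path and beat_times are hypothetical names, assumed to be computed already):&lt;br /&gt;
&lt;br /&gt;
 # beat_times: detected beat positions in seconds, one per line in the output file&lt;br /&gt;
 with open(output_path, 'w') as f:&lt;br /&gt;
     for t in beat_times:&lt;br /&gt;
         f.write('%.3f\n' % t)&lt;br /&gt;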
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the beat tracking on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
For instance, to test the program foobar with different values for parameters param1, the README file would look like:&lt;br /&gt;
&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output&lt;br /&gt;
 foobar -param1 .15 -i %input -o %output&lt;br /&gt;
 foobar -param1 .2 -i %input -o %output&lt;br /&gt;
 foobar -param1 .25 -i %input -o %output&lt;br /&gt;
 foobar -param1 .3 -i %input -o %output&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For a submission using MATLAB, the README file could look like:&lt;br /&gt;
&lt;br /&gt;
 matlab -r &amp;quot;foobar(.1,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.15,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.2,'%input','%output');quit;&amp;quot; &lt;br /&gt;
 matlab -r &amp;quot;foobar(.25,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.3,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.&lt;br /&gt;
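&lt;br /&gt;
In other words, each such README line is treated as a template. A minimal sketch of the kind of substitution involved (illustrative only, with a made-up file name; this is not the actual MIREX runner):&lt;br /&gt;
&lt;br /&gt;
 # expand one README template line for a single test file&lt;br /&gt;
 template = 'foobar -param1 .1 -i %input -o %output'&lt;br /&gt;
 cmd = template.replace('%input', '/path/to/excerpt001.wav')&lt;br /&gt;
 cmd = cmd.replace('%output', '/path/to/excerpt001.txt')&lt;br /&gt;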
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
The evaluation methods are taken from the beat evaluation toolbox and&lt;br /&gt;
are described in the following technical report: &lt;br /&gt;
&lt;br /&gt;
 M. E. P. Davies, N. Degara and M. D. Plumbley. &amp;quot;Evaluation methods for musical audio beat tracking algorithms&amp;quot;. [http://www.elec.qmul.ac.uk/people/markp/2009/DaviesDegaraPlumbley09-evaluation-tr.pdf ''Technical Report C4DM-TR-09-06'']. This link now works! :)&lt;br /&gt;
&lt;br /&gt;
For further details on the specifics of the methods please refer to the&lt;br /&gt;
paper. However, here is a brief summary with appropriate references:&lt;br /&gt;
&lt;br /&gt;
*'''F-measure''' - the standard calculation as used in onset evaluation but&lt;br /&gt;
with a 70ms window (a short illustrative sketch of this calculation follows the list below). &lt;br /&gt;
&lt;br /&gt;
 S. Dixon, &amp;quot;Onset detection revisited,&amp;quot; in ''Proceedings of 9th&lt;br /&gt;
 International Conference on Digital Audio Effects (DAFx)'', Montreal,&lt;br /&gt;
 Canada, pp. 133-137, 2006.&lt;br /&gt;
&lt;br /&gt;
 S. Dixon, &amp;quot;Evaluation of audio beat tracking system beatroot,&amp;quot; ''Journal&lt;br /&gt;
 of New Music Research'', vol. 36, no. 1, pp. 39-51, 2007.&lt;br /&gt;
&lt;br /&gt;
*'''Cemgil''' - beat accuracy is calculated using a Gaussian error function&lt;br /&gt;
with 40ms standard deviation.&lt;br /&gt;
&lt;br /&gt;
 A. T. Cemgil, B. Kappen, P. Desain, and H. Honing, &amp;quot;On tempo tracking:&lt;br /&gt;
 Tempogram representation and Kalman filtering,&amp;quot; ''Journal Of New Music&lt;br /&gt;
 Research'', vol. 28, no. 4, pp. 259-273, 2001&lt;br /&gt;
 &lt;br /&gt;
*'''Goto''' - binary decision of correct or incorrect tracking based on&lt;br /&gt;
statistical properties of a beat error sequence.&lt;br /&gt;
&lt;br /&gt;
 M. Goto and Y. Muraoka, &amp;quot;Issues in evaluating beat tracking systems,&amp;quot; in&lt;br /&gt;
 ''Working Notes of the IJCAI-97 Workshop on Issues in AI and Music -&lt;br /&gt;
 Evaluation and Assessment'', 1997, pp. 9-16.&lt;br /&gt;
&lt;br /&gt;
*'''PScore''' - McKinney's impulse train cross-correlation method as used in&lt;br /&gt;
2006.&lt;br /&gt;
&lt;br /&gt;
 M. F. McKinney, D. Moelants, M. E. P. Davies, and A. Klapuri,&lt;br /&gt;
 &amp;quot;Evaluation of audio beat tracking and music tempo extraction&lt;br /&gt;
 algorithms,&amp;quot; ''Journal of New Music Research'', vol. 36, no. 1, pp. 1-16,&lt;br /&gt;
 2007.&lt;br /&gt;
&lt;br /&gt;
*'''CMLc''', '''CMLt''', '''AMLc''', '''AMLt''' - continuity-based evaluation methods based on&lt;br /&gt;
the longest continuously correctly tracked section. &lt;br /&gt;
&lt;br /&gt;
 S. Hainsworth, &amp;quot;Techniques for the automated analysis of musical audio,&amp;quot;&lt;br /&gt;
 Ph.D. dissertation, Department of Engineering, Cambridge University,&lt;br /&gt;
 2004.&lt;br /&gt;
&lt;br /&gt;
 A. P. Klapuri, A. Eronen, and J. Astola, &amp;quot;Analysis of the meter of&lt;br /&gt;
 acoustic musical signals,&amp;quot; IEEE Transactions on Audio, Speech and&lt;br /&gt;
 Language Processing, vol. 14, no. 1, pp. 342-355, 2006.&lt;br /&gt;
&lt;br /&gt;
*'''D''', '''Dg''' - information-based criteria derived from analysis of a beat error&lt;br /&gt;
histogram (note: the results are measured in 'bits', not percentages); see the&lt;br /&gt;
technical report for a description.&lt;br /&gt;
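&lt;br /&gt;
As an illustration of the simplest of these measures, here is a minimal Python sketch of the F-measure with a 70 ms tolerance window. It is a simplified stand-in, not the beat evaluation toolbox implementation:&lt;br /&gt;
&lt;br /&gt;
 def f_measure(annotated, detected, tol=0.07):&lt;br /&gt;
     # greedily match each annotated beat to at most one detection within the 70 ms window&lt;br /&gt;
     detected = sorted(detected)&lt;br /&gt;
     used = [False] * len(detected)&lt;br /&gt;
     hits = 0&lt;br /&gt;
     for a in sorted(annotated):&lt;br /&gt;
         for i, d in enumerate(detected):&lt;br /&gt;
             if not used[i] and abs(d - a) &amp;lt;= tol:&lt;br /&gt;
                 used[i] = True&lt;br /&gt;
                 hits += 1&lt;br /&gt;
                 break&lt;br /&gt;
     precision = hits / float(len(detected)) if detected else 0.0&lt;br /&gt;
     recall = hits / float(len(annotated)) if annotated else 0.0&lt;br /&gt;
     denom = precision + recall&lt;br /&gt;
     return 2.0 * precision * recall / denom if denom else 0.0&lt;br /&gt;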
&lt;br /&gt;
== Relevant Development Collections ==&lt;br /&gt;
The development data can be found here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;br /&gt;
&lt;br /&gt;
Jose R. Zapata / joser.zapata (at) upb.edu.co&lt;br /&gt;
&lt;br /&gt;
Michelle Daniels / michelledaniels (at) ucsd.edu&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Michelle Daniels</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2014:Audio_Beat_Tracking&amp;diff=10262</id>
		<title>2014:Audio Beat Tracking</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2014:Audio_Beat_Tracking&amp;diff=10262"/>
		<updated>2014-07-12T23:31:48Z</updated>

		<summary type="html">&lt;p&gt;Michelle Daniels: /* Potential Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
The text of this section was copied from the 2012 Wiki.  Please add your comments and discussion at the bottom of this page.&lt;br /&gt;
&lt;br /&gt;
The aim of the automatic beat tracking task is to track the beat locations in a collection of sound files. Unlike the Audio Tempo Extraction task, whose aim is to detect the tempi of each file, the beat tracking task aims at detecting all beat locations in each recording. The algorithms will be evaluated in terms of their accuracy in predicting beat locations annotated by a group of listeners.&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
The original 2006 dataset contains 160 30-second excerpts (WAV format) used for the Audio Tempo and Beat contests in 2006. Beat locations in each excerpt have been annotated by 40 different listeners (39 listeners for a few excerpts). These audio recordings were selected to provide a stable tempo, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters. One disadvantage of using this set for beat tracking is that the tempi are rather stable, so this set will not test beat-tracking algorithms in their ability to track tempo changes.&lt;br /&gt;
&lt;br /&gt;
The second collection comprises 367 Chopin Mazurkas, represented as full audio tracks (WAV format). The Mazurka dataset contains tempo changes, so it will evaluate the ability of algorithms to track these.&lt;br /&gt;
&lt;br /&gt;
The third collection was assembled and donated in 2012. This dataset contains 217 excerpts of around 40s each, of which 19 are &amp;quot;easy&amp;quot; and the remaining 198 are &amp;quot;hard&amp;quot;. The harder excerpts were drawn from the following musical styles: Romantic music, film soundtracks, blues, chanson and solo guitar.&lt;br /&gt;
&lt;br /&gt;
This dataset has been designed for radically new techniques that can contend with challenging beat tracking situations such as quiet accompaniment, expressive timing, changes in time signature, slow tempo, and poor sound quality. So, if your beat tracker likes a 4/4 time signature with a steady tempo and needs clear percussive onsets, don't expect it to do very well!&lt;br /&gt;
But don't be deterred, this is for the good of beat tracking. &lt;br /&gt;
&lt;br /&gt;
You can read in detail about how the dataset was made here:&lt;br /&gt;
[http://dx.doi.org/10.1109/TASL.2012.2205244 ''Selective Sampling for Beat Tracking Evaluation'']&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* file length between 2 and 36 seconds (total time: 14 minutes) &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
=== Input Data ===&lt;br /&gt;
Participating algorithms will have to read audio in the following format:&lt;br /&gt;
&lt;br /&gt;
* Sample rate: 44.1 KHz&lt;br /&gt;
* Sample size: 16 bit&lt;br /&gt;
* Number of channels: 1 (mono)&lt;br /&gt;
* Encoding: WAV &lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
&lt;br /&gt;
The beat tracking algorithms will return beat-times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.&lt;br /&gt;
&lt;br /&gt;
=== Output File Format (Audio Beat tracking) ===&lt;br /&gt;
&lt;br /&gt;
The Beat Tracking output file format is an ASCII text format. Each beat time is specified, in seconds, on its own line. Specifically, &lt;br /&gt;
&lt;br /&gt;
 &amp;lt;beat time(in seconds)&amp;gt;\n&lt;br /&gt;
&lt;br /&gt;
where \n denotes the end of line. The &amp;lt; and &amp;gt; characters are not included. An example output file would look something like:&lt;br /&gt;
&lt;br /&gt;
 0.243&lt;br /&gt;
 0.486&lt;br /&gt;
 0.729&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the beat tracking on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
For instance, to test the program foobar with different values for parameters param1, the README file would look like:&lt;br /&gt;
&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output&lt;br /&gt;
 foobar -param1 .15 -i %input -o %output&lt;br /&gt;
 foobar -param1 .2 -i %input -o %output&lt;br /&gt;
 foobar -param1 .25 -i %input -o %output&lt;br /&gt;
 foobar -param1 .3 -i %input -o %output&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
For a submission using MATLAB, the README file could look like:&lt;br /&gt;
&lt;br /&gt;
 matlab -r &amp;quot;foobar(.1,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.15,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.2,'%input','%output');quit;&amp;quot; &lt;br /&gt;
 matlab -r &amp;quot;foobar(.25,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 matlab -r &amp;quot;foobar(.3,'%input','%output');quit;&amp;quot;&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
The evaluation methods are taken from the beat evaluation toolbox and&lt;br /&gt;
are described in the following technical report: &lt;br /&gt;
&lt;br /&gt;
 M. E. P. Davies, N. Degara and M. D. Plumbley. &amp;quot;Evaluation methods for musical audio beat tracking algorithms&amp;quot;. [http://www.elec.qmul.ac.uk/people/markp/2009/DaviesDegaraPlumbley09-evaluation-tr.pdf ''Technical Report C4DM-TR-09-06'']. This link now works! :)&lt;br /&gt;
&lt;br /&gt;
For further details on the specifics of the methods please refer to the&lt;br /&gt;
paper. However, here is a brief summary with appropriate references:&lt;br /&gt;
&lt;br /&gt;
*'''F-measure''' - the standard calculation as used in onset evaluation but&lt;br /&gt;
with a 70ms window. &lt;br /&gt;
&lt;br /&gt;
 S. Dixon, &amp;quot;Onset detection revisited,&amp;quot; in ''Proceedings of 9th&lt;br /&gt;
 International Conference on Digital Audio Effects (DAFx)'', Montreal,&lt;br /&gt;
 Canada, pp. 133-137, 2006.&lt;br /&gt;
&lt;br /&gt;
 S. Dixon, &amp;quot;Evaluation of audio beat tracking system beatroot,&amp;quot; ''Journal&lt;br /&gt;
 of New Music Research'', vol. 36, no. 1, pp. 39-51, 2007.&lt;br /&gt;
&lt;br /&gt;
*'''Cemgil''' - beat accuracy is calculated using a Gaussian error function&lt;br /&gt;
with 40ms standard deviation.&lt;br /&gt;
&lt;br /&gt;
 A. T. Cemgil, B. Kappen, P. Desain, and H. Honing, &amp;quot;On tempo tracking:&lt;br /&gt;
 Tempogram representation and Kalman filtering,&amp;quot; ''Journal Of New Music&lt;br /&gt;
 Research'', vol. 28, no. 4, pp. 259-273, 2001&lt;br /&gt;
 &lt;br /&gt;
*'''Goto''' - binary decision of correct or incorrect tracking based on&lt;br /&gt;
statistical properties of a beat error sequence.&lt;br /&gt;
&lt;br /&gt;
 M. Goto and Y. Muraoka, &amp;quot;Issues in evaluating beat tracking systems,&amp;quot; in&lt;br /&gt;
 ''Working Notes of the IJCAI-97 Workshop on Issues in AI and Music -&lt;br /&gt;
 Evaluation and Assessment'', 1997, pp. 9-16.&lt;br /&gt;
&lt;br /&gt;
*'''PScore''' - McKinney's impulse train cross-correlation method as used in&lt;br /&gt;
2006.&lt;br /&gt;
&lt;br /&gt;
 M. F. McKinney, D. Moelants, M. E. P. Davies, and A. Klapuri,&lt;br /&gt;
 &amp;quot;Evaluation of audio beat tracking and music tempo extraction&lt;br /&gt;
 algorithms,&amp;quot; ''Journal of New Music Research'', vol. 36, no. 1, pp. 1-16,&lt;br /&gt;
 2007.&lt;br /&gt;
&lt;br /&gt;
*'''CMLc''', '''CMLt''', '''AMLc''', '''AMLt''' - continuity-based evaluation methods based on&lt;br /&gt;
the longest continuously correctly tracked section. &lt;br /&gt;
&lt;br /&gt;
 S. Hainsworth, &amp;quot;Techniques for the automated analysis of musical audio,&amp;quot;&lt;br /&gt;
 Ph.D. dissertation, Department of Engineering, Cambridge University,&lt;br /&gt;
 2004.&lt;br /&gt;
&lt;br /&gt;
 A. P. Klapuri, A. Eronen, and J. Astola, &amp;quot;Analysis of the meter of&lt;br /&gt;
 acoustic musical signals,&amp;quot; IEEE Transactions on Audio, Speech and&lt;br /&gt;
 Language Processing, vol. 14, no. 1, pp. 342-355, 2006.&lt;br /&gt;
&lt;br /&gt;
*'''D''', '''Dg''' - information-based criteria derived from analysis of a beat error&lt;br /&gt;
histogram (note: the results are measured in 'bits', not percentages); see the&lt;br /&gt;
technical report for a description.&lt;br /&gt;
&lt;br /&gt;
== Relevant Development Collections ==&lt;br /&gt;
The development data can be found here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 12 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;br /&gt;
&lt;br /&gt;
Jose R. Zapata / joser.zapata (at) upb.edu.co&lt;br /&gt;
Michelle Daniels / michelledaniels (at) ucsd.edu&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Michelle Daniels</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2014:Audio_Tempo_Estimation&amp;diff=10261</id>
		<title>2014:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2014:Audio_Tempo_Estimation&amp;diff=10261"/>
		<updated>2014-07-12T23:31:07Z</updated>

		<summary type="html">&lt;p&gt;Michelle Daniels: /* Potential Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, denoted ST1 (a value between 0 and 1). The relative strength ST2 (not output) is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF &lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by the ground-truth data; it varies from 0 to 1.0), TT1 indicates whether the algorithm identifies T1 to within 8%, and TT2 indicates whether it identifies T2 to within 8%. No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
The practice data can be found here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;br /&gt;
&lt;br /&gt;
Michelle Daniels / michelledaniels@ucsd.edu&lt;/div&gt;</summary>
		<author><name>Michelle Daniels</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2014:Audio_Tempo_Estimation&amp;diff=10260</id>
		<title>2014:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2014:Audio_Tempo_Estimation&amp;diff=10260"/>
		<updated>2014-07-12T23:30:52Z</updated>

		<summary type="html">&lt;p&gt;Michelle Daniels: /* Potential Participants */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, denoted ST1 (a value between 0 and 1). The relative strength ST2 (not output) is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF &lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by the ground-truth data; it varies from 0 to 1.0), TT1 indicates whether the algorithm identifies T1 to within 8%, and TT2 indicates whether it identifies T2 to within 8%. No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
The practice data can be found here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;br /&gt;
Michelle Daniels / michelledaniels@ucsd.edu&lt;/div&gt;</summary>
		<author><name>Michelle Daniels</name></author>
		
	</entry>
</feed>