2006:Audio Tempo Extraction

From MIREX Wiki

Latest revision as of 13:04, 13 May 2010

Results

Results are on 2006:Audio Tempo Extraction Results page.

Proposer

Martin F. McKinney (Philips) mckinney@alum.mit.edu

Dirk Moelants (IPEM, Ghent University) dirk@moelants.net

Title

Automatic tempo extraction

The contest will be very similar to last year's contest except that we will ignore phase (a parallel beat-tracking contest will also be run this year). We will use the same ground-truth data as last year, but the scoring will be slightly different (see the Evaluation Procedures section below).


Description

This contest will compare current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. We will also test for tempo following if there is enough interest.

We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no "official" tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.

The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can "feel" faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.

There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the "feel", or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.

A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.

Input data

Audio Format:

The sound files are 160 30-second excerpts (WAV format).

Audio Content:

The audio recordings were selected to provide a stable tempo value, a wide distribution of tempo values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.

Output Data

Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (ST1, between 0 and 1). The relative strength of T2, ST2 (not output), is simply 1 - ST1. The results should be written to a text file in the following format:

T1<tab>T2<tab>ST1

Each submission should be accompanied by a README file describing how the program should be used. For instance:

To run the program foobar on the file input.wav and store the results in the file output.txt, the following command should be used:

 foobar -i input.wav > output.txt
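
As an illustration of the output format above, here is a minimal Python sketch of a helper that produces one result line. The function name `format_tempo_line` and the two-decimal number formatting are assumptions made for this example; the task description only specifies the tab-separated order T1, T2, ST1.

```python
def format_tempo_line(t1: float, t2: float, st1: float) -> str:
    """Return one result line in the form T1<tab>T2<tab>ST1.

    Hypothetical helper: the name and the two-decimal formatting are
    assumptions, not part of the task specification.
    """
    # T1 is defined as the slower tempo. If the caller passes the tempi
    # in the wrong order, swap them and flip the strength so that ST1
    # still refers to the slower tempo.
    if t1 > t2:
        t1, t2, st1 = t2, t1, 1.0 - st1
    if not 0.0 <= st1 <= 1.0:
        raise ValueError("ST1 must lie in [0, 1]")
    return f"{t1:.2f}\t{t2:.2f}\t{st1:.2f}"


if __name__ == "__main__":
    # Printing to standard output lets the caller redirect the result
    # to a file, as in the foobar example above.
    print(format_tempo_line(60.0, 120.0, 0.7))
```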

Participants

  • Geoffroy Peeters (IRCAM), peeters@ircam.fr
  • Douglas Eck (University of Montreal), eckdoug@iro.umontreal.ca
  • Matthew Davies (Queen Mary, University of London), matthew.davies@elec.qmul.ac.uk
  • Anssi Klapuri (Tampere University), klap@cs.tut.fi

Other Potential Participants

  • Miguel Alonso (ENST), miguel.alonso@enst.fr
  • George Tzanetakis (University of Victoria), gtzan@cs.uvic.ca
  • Bill Sethares (University of Wisconsin-Madison), sethares@ece.wisc.edu
  • Fabien Gouyon (University Pompeu Fabra) and Simon Dixon (OFAI), fgouyon@iua.upf.es, simon@oefai.at
  • Iasonas Antonopoulos (University of Athens) and Aggelos Pikrakis (University of Athens), jantonop@di.uoa.gr, pikrakis@di.uoa.gr

Evaluation Procedures

This section covers the mechanics of the method; the data (music excerpts and perceptual data) are discussed in the next section. The method has two general steps: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.

1) Perceptual tempo data collection

The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners will be asked to tap to the beat of a series of musical excerpts. Responses will be collected and their perceived tempo will be calculated. For each excerpt, a distribution of perceived tempo will be generated. A relatively simple form of perceived tempo is proposed for this contest: the two highest peaks in the perceived tempo distribution for each excerpt will be taken, along with their respective heights (normalized to sum to 1.0), as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. In addition to tempo, the phase and tapping times of listeners will also be recorded to allow evaluation of the phase-locking ability of tempo-extraction algorithms.
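
The peak-picking step described above can be sketched roughly as follows. This is only an illustration under stated assumptions: each listener's tapping is taken to be already reduced to a single tempo in BPM, the distribution is a simple fixed-width histogram, and a "peak" is approximated by one of the two most populated bins. The function name `two_tempo_ground_truth` and the `bin_width` parameter are hypothetical; the procedure actually used is described in the cited papers.

```python
from collections import Counter


def two_tempo_ground_truth(listener_bpms, bin_width=4.0):
    """Return (T1, T2, ST1) from per-listener tempi (hypothetical sketch).

    Assumptions: one tempo value per listener, a fixed-width histogram,
    and the two most populated bins standing in for the two peaks.
    """
    # Histogram: snap each listener's tempo to the nearest bin centre.
    counts = Counter(round(bpm / bin_width) * bin_width for bpm in listener_bpms)
    if len(counts) < 2:
        raise ValueError("need at least two distinct tempo bins")

    # Take the two most populated bins and their heights.
    (tempo_a, height_a), (tempo_b, height_b) = counts.most_common(2)

    # T1 is the slower of the two tempi.
    if tempo_a > tempo_b:
        tempo_a, height_a, tempo_b, height_b = tempo_b, height_b, tempo_a, height_a

    # Normalize the two heights so that ST1 + ST2 = 1.0.
    st1 = height_a / (height_a + height_b)
    return tempo_a, tempo_b, st1


if __name__ == "__main__":
    taps = [60, 61, 59, 120, 121, 60, 119, 60, 180]
    print(two_tempo_ground_truth(taps))
```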

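References:

  • McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf
  • Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception & Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF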

2) Evaluation of tempo extraction algorithms

Algorithms will process musical excerpts and return two tempi in BPM (T1 and T2, where T1 is the slower of the two). For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:

P = ST1 * TT1 + (1 - ST1) * TT2

where ST1 is the relative perceptual strength of T1 (given by the ground-truth data; it varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to within 8%, and TT2 is the ability of the algorithm to identify T2 to within 8%. No credit will be given for tempi other than T1 and T2.
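
A minimal Python sketch of the per-excerpt score may help make the formula concrete. It rests on assumptions the text does not spell out: TT1 and TT2 are treated as binary (1.0 if the ground-truth tempo is matched, 0.0 otherwise), the 8% tolerance is taken relative to the ground-truth tempo, and either of the two reported tempi may match either ground-truth tempo. The names `within_tolerance` and `p_score` are hypothetical.

```python
def within_tolerance(estimate, reference, tol=0.08):
    """True if the estimate lies within tol (8% by default) of the reference."""
    return abs(estimate - reference) <= tol * reference


def p_score(est_tempi, gt_t1, gt_t2, gt_st1, tol=0.08):
    """P = ST1 * TT1 + (1 - ST1) * TT2 for one excerpt.

    est_tempi: the two tempi (BPM) reported by the algorithm.
    gt_t1, gt_t2, gt_st1: ground-truth tempi and relative strength of T1.
    Assumption: TT1 and TT2 are binary, and either reported tempo may
    match either ground-truth tempo.
    """
    tt1 = 1.0 if any(within_tolerance(e, gt_t1, tol) for e in est_tempi) else 0.0
    tt2 = 1.0 if any(within_tolerance(e, gt_t2, tol) for e in est_tempi) else 0.0
    return gt_st1 * tt1 + (1.0 - gt_st1) * tt2


if __name__ == "__main__":
    # Both ground-truth tempi found, then only T1 found.
    print(p_score((60.0, 121.0), 60.0, 120.0, 0.7))
    print(p_score((60.0, 90.0), 60.0, 120.0, 0.7))
```

Under these assumptions, an algorithm that finds only the more salient tempo still earns most of the credit for an excerpt, and an algorithm that reports an unrelated metrical level for both tempi earns none.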

The algorithm with the best average P-score will win the contest. We will provide some measures of statistical significance to the results, most likely through bootstrapping the test data.
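
The exact significance procedure is not specified here. As one possible illustration, the sketch below computes a percentile bootstrap confidence interval for an algorithm's mean P-score by resampling excerpts with replacement. The function name `bootstrap_mean_ci`, the number of resamples, and the 95% level are all assumptions made for this example.

```python
import random


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, seed=0):
    """Return (mean, lower, upper) for the mean of per-excerpt P-scores.

    Hypothetical sketch of a percentile bootstrap: excerpts are resampled
    with replacement and the mean is recomputed for each resample.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(scores)
    means = sorted(
        sum(rng.choices(scores, k=n)) / n for _ in range(n_resamples)
    )
    # Percentile interval: cut alpha/2 of the resampled means from each tail.
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return sum(scores) / n, lower, upper


if __name__ == "__main__":
    per_excerpt_scores = [1.0, 0.7, 0.3, 0.0, 1.0, 0.7]
    print(bootstrap_mean_ci(per_excerpt_scores))
```

Two algorithms could then be compared by checking whether their intervals overlap, although a paired resampling of the score differences would be a more direct test.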

Relevant Test Collections

We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of our previous experiments (see the McKinney/Moelants ICMPC paper above).

Excerpts were selected to provide:

  • stable tempo within each excerpt
  • a good distribution of tempi across excerpts
  • a large variety of instrumentation and beat strengths (with and without percussion)
  • a variety of musical styles, including many non-Western styles
  • the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter).

We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.

Practice Data

The practice data can be found here:

https://www.music-ir.org/evaluation/MIREX/data/2006/beat/

User: beattrack Password: b34trx

https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/

User: tempo Password: t3mp0

Data has been uploaded in both .tgz and .zip format.