<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Aggelos+Gkiokas</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Aggelos+Gkiokas"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/Aggelos_Gkiokas"/>
	<updated>2026-05-03T05:51:38Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Main_Page&amp;diff=12639</id>
		<title>2018:Main Page</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Main_Page&amp;diff=12639"/>
		<updated>2018-08-03T01:10:35Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* MIREX 2018 Deadline Dates */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Welcome to MIREX 2018==&lt;br /&gt;
&lt;br /&gt;
This is the main page for the 14th running of the Music Information Retrieval Evaluation eXchange (MIREX 2018). The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) at [https://ischool.illinois.edu School of Information Sciences], University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2018. &lt;br /&gt;
&lt;br /&gt;
The MIREX 2018 community will hold its annual meeting as part of [http://ismir2018.ircam.fr The 19th International Society for Music Information Retrieval Conference], ISMIR 2018, which will be held in Paris, France, September 23-27, 2018.&lt;br /&gt;
&lt;br /&gt;
J. Stephen Downie&amp;lt;br&amp;gt;&lt;br /&gt;
Director, IMIRSEL&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Task Leadership Model==&lt;br /&gt;
&lt;br /&gt;
As in previous years, we aim to distribute the organization of tasks more broadly across the community for MIREX 2018. To do so, we really need leaders to help us organize and run each task.&lt;br /&gt;
&lt;br /&gt;
To volunteer to lead a task, please complete the form [https://goo.gl/forms/5igUsWBxrwMvF2h62 here]. Current information about task captains can be found on the [[2018:Task Captains]] page. Please direct any communication to the [https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest] mailing list.&lt;br /&gt;
&lt;br /&gt;
What does it mean to lead a task?&lt;br /&gt;
* Update wiki pages as needed&lt;br /&gt;
* Communicate with submitters and troubleshoot submissions&lt;br /&gt;
* Execute and evaluate submissions&lt;br /&gt;
* Publish final results&lt;br /&gt;
&lt;br /&gt;
Due to the proprietary nature of much of the data, the submission system, evaluation framework, and most of the datasets will continue to be hosted by IMIRSEL. However, we are prepared to provide access to task organizers to manage and run submissions on the IMIRSEL systems.&lt;br /&gt;
&lt;br /&gt;
We really need leaders to help us this year!&lt;br /&gt;
&lt;br /&gt;
==MIREX 2018 Deadline Dates==&lt;br /&gt;
* &amp;lt;del&amp;gt;'''July 21st 2018'''&amp;lt;/del&amp;gt;  '''July 31st 2018'''&lt;br /&gt;
** [[2018:Audio Classification (Train/Test) Tasks]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;, including&lt;br /&gt;
*** Audio US Pop Genre Classification&lt;br /&gt;
*** Audio Latin Genre Classification&lt;br /&gt;
*** Audio Music Mood Classification&lt;br /&gt;
*** Audio Classical Composer Identification&lt;br /&gt;
** [[2018:Audio K-POP Mood Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2018:Audio K-POP Genre Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2018:Audio Fingerprinting]] &amp;lt;TC: Chung-Che Wang&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''July 31st 2018'''&lt;br /&gt;
** [[2018:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2018:Set List Identification]] &amp;lt;TC: Ming-Chi Yen&amp;gt;&lt;br /&gt;
** [[2018:Audio Melody Extraction]] &amp;lt;TC: Derek Wu&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''August 4th 2018'''&lt;br /&gt;
** [[2018:Audio Onset Detection]] &amp;lt;TC: Vidya Rangasayee, Priya Arora, Sebastian Böck&amp;gt;&lt;br /&gt;
** [[2018:Audio Beat Tracking]] &amp;lt;TC: Aggelos Gkiokas&amp;gt;&lt;br /&gt;
** [[2018:Audio Key Detection]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
** [[2018:Audio Downbeat Estimation]] &amp;lt;TC: Mickaël Zehren, Paolo Bientinesi&amp;gt;&lt;br /&gt;
** [[2018:Real-time Audio to Score Alignment (a.k.a Score Following)]] &amp;lt;TC: Julio Carabias&amp;gt;&lt;br /&gt;
** [[2018:Audio Cover Song Identification]] &amp;lt;TC: Chris Tralie&amp;gt;&lt;br /&gt;
** [[2018:Audio Chord Estimation]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''August 11th 2018'''&lt;br /&gt;
** [[2018:Automatic Lyrics-to-Audio Alignment]] &amp;lt;TC: Rong Gong, Georgi Dzhambazov&amp;gt;&lt;br /&gt;
** [[2018:Drum Transcription]] &amp;lt;TC: Richard Vogl, Carl Southall, Chih-Wei Wu&amp;gt;&lt;br /&gt;
** [[2018:Music and/or Speech Detection]] &amp;lt;TC: Blai Meléndez-Catalán, Emilio Molina, David Doukhan, Jan Schlüter&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''August 25th 2018'''&lt;br /&gt;
** [[2018:Patterns for Prediction]] (offshoot of [[2017:Discovery of Repeated Themes &amp;amp; Sections]]) &amp;lt;TC: Iris Ren, Berit Janssen, Tom Collins&amp;gt;&lt;br /&gt;
** [[2018:Audio Tempo Estimation]] &amp;lt;TC: Aggelos Gkiokas&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==MIREX 2018 Submission Instructions==&lt;br /&gt;
* Be sure to read through the rest of this page&lt;br /&gt;
* Be sure to read through the task pages for which you are submitting&lt;br /&gt;
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]&lt;br /&gt;
* Be sure to follow the  [[MIREX 2018 Submission Instructions]] including both the tutorial video and the text&lt;br /&gt;
* The MIREX 2018 Submission System can be found at: https://www.music-ir.org/mirex/sub/ .&lt;br /&gt;
&lt;br /&gt;
==MIREX 2018 Evaluation==&lt;br /&gt;
&lt;br /&gt;
===Note to New Participants===&lt;br /&gt;
Please take the time to read the following review articles that explain the history and structure of MIREX.&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007):&amp;lt;br&amp;gt;&lt;br /&gt;
A window into music information retrieval research. ''Acoustical Science and Technology'' 29 (4): 247-255. &amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen, Andreas F. Ehmann, Mert Bay and M. Cameron Jones. (2010).&amp;lt;br&amp;gt;&lt;br /&gt;
The Music Information Retrieval Evaluation eXchange: Some Observations and Insights.&amp;lt;br&amp;gt;&lt;br /&gt;
''Advances in Music Information Retrieval'' Vol. 274, pp. 93-115&amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://bit.ly/KpM5u5 http://bit.ly/KpM5u5]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Runtime Limits===&lt;br /&gt;
&lt;br /&gt;
We reserve the right to stop any process that exceeds runtime limits for each task.  We will do our best to notify you in enough time to allow revisions, but this may not be possible in some cases. Please respect the published runtime limits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Note to All Participants===&lt;br /&gt;
&lt;br /&gt;
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:&lt;br /&gt;
&lt;br /&gt;
# submit, at submission time, a DRAFT 2-3 page extended abstract PDF in the ISMIR format describing the submitted program(s), to help us and the community better understand how the algorithm works&lt;br /&gt;
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2018 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)&lt;br /&gt;
# present a poster at the MIREX 2018 poster session at ISMIR 2018&lt;br /&gt;
&lt;br /&gt;
===Software Dependency Requests===&lt;br /&gt;
If you have not submitted to MIREX before, or are unsure whether IMIRSEL currently supports some of the software/architecture dependencies for your submission, a [https://goo.gl/forms/96Wndw9j9dzv4x3c2 dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you. &lt;br /&gt;
&lt;br /&gt;
Due to the high volume of submissions expected at MIREX 2018, a submission with hard-to-satisfy dependencies of which the team has not been given sufficient notice may be rejected.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.&lt;br /&gt;
&lt;br /&gt;
==Getting Involved in MIREX 2018==&lt;br /&gt;
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2018 the best yet.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Mailing List Participation===&lt;br /&gt;
If you are interested in formal MIR evaluation, you should also subscribe to the &amp;quot;MIREX&amp;quot; (aka &amp;quot;EvalFest&amp;quot;) mailing list and participate in the community discussions about defining and running MIREX 2018 tasks. Subscription information is available at: &lt;br /&gt;
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central]. &lt;br /&gt;
&lt;br /&gt;
If you are participating in MIREX 2018, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2018 wiki) will be used to record and disseminate task proposals; however, task-related discussions should be conducted on the MIREX organization mailing list (EvalFest) rather than on this wiki, and then summarized here. &lt;br /&gt;
&lt;br /&gt;
Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team, who will implement them in software as part of the NEMA analytics framework. NEMA will be released to the community at or before ISMIR 2018, providing a standardised set of interfaces and outputs for disciplined evaluation procedures across a great many MIR tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Wiki Participation===&lt;br /&gt;
If you find that you cannot edit a MIREX wiki page, you will need to create a new account via: [[Special:Userlogin]].&lt;br /&gt;
&lt;br /&gt;
Please note that because of &amp;quot;spam-bots&amp;quot;, MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).&lt;br /&gt;
&lt;br /&gt;
==MIREX 2005 - 2017 Wikis==&lt;br /&gt;
Content from MIREX 2005 - 2017 is available at:&lt;br /&gt;
'''[[2017:Main_Page|MIREX 2017]]''' &lt;br /&gt;
'''[[2016:Main_Page|MIREX 2016]]''' &lt;br /&gt;
'''[[2015:Main_Page|MIREX 2015]]''' &lt;br /&gt;
'''[[2014:Main_Page|MIREX 2014]]''' &lt;br /&gt;
'''[[2013:Main_Page|MIREX 2013]]''' &lt;br /&gt;
'''[[2012:Main_Page|MIREX 2012]]''' &lt;br /&gt;
'''[[2011:Main_Page|MIREX 2011]]''' &lt;br /&gt;
'''[[2010:Main_Page|MIREX 2010]]''' &lt;br /&gt;
'''[[2009:Main_Page|MIREX 2009]]''' &lt;br /&gt;
'''[[2008:Main_Page|MIREX 2008]]''' &lt;br /&gt;
'''[[2007:Main_Page|MIREX 2007]]''' &lt;br /&gt;
'''[[2006:Main_Page|MIREX 2006]]''' &lt;br /&gt;
'''[[2005:Main_Page|MIREX 2005]]'''&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Main_Page&amp;diff=12638</id>
		<title>2018:Main Page</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Main_Page&amp;diff=12638"/>
		<updated>2018-08-03T01:02:34Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* MIREX 2018 Deadline Dates */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Welcome to MIREX 2018==&lt;br /&gt;
&lt;br /&gt;
This is the main page for the 14th running of the Music Information Retrieval Evaluation eXchange (MIREX 2018). The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) at [https://ischool.illinois.edu School of Information Sciences], University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2018. &lt;br /&gt;
&lt;br /&gt;
The MIREX 2018 community will hold its annual meeting as part of [http://ismir2018.ircam.fr The 19th International Society for Music Information Retrieval Conference], ISMIR 2018, which will be held in Paris, France, September 23-27, 2018.&lt;br /&gt;
&lt;br /&gt;
J. Stephen Downie&amp;lt;br&amp;gt;&lt;br /&gt;
Director, IMIRSEL&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Task Leadership Model==&lt;br /&gt;
&lt;br /&gt;
As in previous years, we aim to distribute the organization of tasks more broadly across the community for MIREX 2018. To do so, we really need leaders to help us organize and run each task.&lt;br /&gt;
&lt;br /&gt;
To volunteer to lead a task, please complete the form [https://goo.gl/forms/5igUsWBxrwMvF2h62 here]. Current information about task captains can be found on the [[2018:Task Captains]] page. Please direct any communication to the [https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest] mailing list.&lt;br /&gt;
&lt;br /&gt;
What does it mean to lead a task?&lt;br /&gt;
* Update wiki pages as needed&lt;br /&gt;
* Communicate with submitters and troubleshoot submissions&lt;br /&gt;
* Execute and evaluate submissions&lt;br /&gt;
* Publish final results&lt;br /&gt;
&lt;br /&gt;
Due to the proprietary nature of much of the data, the submission system, evaluation framework, and most of the datasets will continue to be hosted by IMIRSEL. However, we are prepared to provide access to task organizers to manage and run submissions on the IMIRSEL systems.&lt;br /&gt;
&lt;br /&gt;
We really need leaders to help us this year!&lt;br /&gt;
&lt;br /&gt;
==MIREX 2018 Deadline Dates==&lt;br /&gt;
* &amp;lt;del&amp;gt;'''July 21st 2018'''&amp;lt;/del&amp;gt;  '''July 31st 2018'''&lt;br /&gt;
** [[2018:Audio Classification (Train/Test) Tasks]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;, including&lt;br /&gt;
*** Audio US Pop Genre Classification&lt;br /&gt;
*** Audio Latin Genre Classification&lt;br /&gt;
*** Audio Music Mood Classification&lt;br /&gt;
*** Audio Classical Composer Identification&lt;br /&gt;
** [[2018:Audio K-POP Mood Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2018:Audio K-POP Genre Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2018:Audio Fingerprinting]] &amp;lt;TC: Chung-Che Wang&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''July 31st 2018'''&lt;br /&gt;
** [[2018:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2018:Set List Identification]] &amp;lt;TC: Ming-Chi Yen&amp;gt;&lt;br /&gt;
** [[2018:Audio Melody Extraction]] &amp;lt;TC: Derek Wu&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''August 4th 2018'''&lt;br /&gt;
** [[2018:Audio Onset Detection]] &amp;lt;TC: Vidya Rangasayee, Priya Arora, Sebastian Böck&amp;gt;&lt;br /&gt;
** [[2018:Audio Beat Tracking]] &amp;lt;TC: Aggelos Gkiokas&amp;gt;&lt;br /&gt;
** [[2018:Audio Key Detection]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
** [[2018:Audio Downbeat Estimation]] &amp;lt;TC: Mickaël Zehren, Paolo Bientinesi&amp;gt;&lt;br /&gt;
** [[2018:Real-time Audio to Score Alignment (a.k.a Score Following)]] &amp;lt;TC: Julio Carabias&amp;gt;&lt;br /&gt;
** [[2018:Audio Cover Song Identification]] &amp;lt;TC: Chris Tralie&amp;gt;&lt;br /&gt;
** [[2018:Audio Chord Estimation]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''August 11th 2018'''&lt;br /&gt;
** [[2018:Automatic Lyrics-to-Audio Alignment]] &amp;lt;TC: Rong Gong, Georgi Dzhambazov&amp;gt;&lt;br /&gt;
** [[2018:Drum Transcription]] &amp;lt;TC: Richard Vogl, Carl Southall, Chih-Wei Wu&amp;gt;&lt;br /&gt;
** [[2018:Music and/or Speech Detection]] &amp;lt;TC: Blai Meléndez-Catalán, Emilio Molina, David Doukhan, Jan Schlüter&amp;gt;&lt;br /&gt;
** [[2018:Audio Tempo Estimation]] &amp;lt;TC: Aggelos Gkiokas&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''August 25th 2018'''&lt;br /&gt;
** [[2018:Patterns for Prediction]] (offshoot of [[2017:Discovery of Repeated Themes &amp;amp; Sections]]) &amp;lt;TC: Iris Ren, Berit Janssen, Tom Collins&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==MIREX 2018 Submission Instructions==&lt;br /&gt;
* Be sure to read through the rest of this page&lt;br /&gt;
* Be sure to read through the task pages for which you are submitting&lt;br /&gt;
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]&lt;br /&gt;
* Be sure to follow the  [[MIREX 2018 Submission Instructions]] including both the tutorial video and the text&lt;br /&gt;
* The MIREX 2018 Submission System can be found at: https://www.music-ir.org/mirex/sub/ .&lt;br /&gt;
&lt;br /&gt;
==MIREX 2018 Evaluation==&lt;br /&gt;
&lt;br /&gt;
===Note to New Participants===&lt;br /&gt;
Please take the time to read the following review articles that explain the history and structure of MIREX.&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007):&amp;lt;br&amp;gt;&lt;br /&gt;
A window into music information retrieval research. ''Acoustical Science and Technology'' 29 (4): 247-255. &amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen, Andreas F. Ehmann, Mert Bay and M. Cameron Jones. (2010).&amp;lt;br&amp;gt;&lt;br /&gt;
The Music Information Retrieval Evaluation eXchange: Some Observations and Insights.&amp;lt;br&amp;gt;&lt;br /&gt;
''Advances in Music Information Retrieval'' Vol. 274, pp. 93-115&amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://bit.ly/KpM5u5 http://bit.ly/KpM5u5]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Runtime Limits===&lt;br /&gt;
&lt;br /&gt;
We reserve the right to stop any process that exceeds runtime limits for each task.  We will do our best to notify you in enough time to allow revisions, but this may not be possible in some cases. Please respect the published runtime limits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Note to All Participants===&lt;br /&gt;
&lt;br /&gt;
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:&lt;br /&gt;
&lt;br /&gt;
# submit, at submission time, a DRAFT 2-3 page extended abstract PDF in the ISMIR format describing the submitted program(s), to help us and the community better understand how the algorithm works&lt;br /&gt;
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2018 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)&lt;br /&gt;
# present a poster at the MIREX 2018 poster session at ISMIR 2018&lt;br /&gt;
&lt;br /&gt;
===Software Dependency Requests===&lt;br /&gt;
If you have not submitted to MIREX before, or are unsure whether IMIRSEL currently supports some of the software/architecture dependencies for your submission, a [https://goo.gl/forms/96Wndw9j9dzv4x3c2 dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you. &lt;br /&gt;
&lt;br /&gt;
Due to the high volume of submissions expected at MIREX 2018, a submission with hard-to-satisfy dependencies of which the team has not been given sufficient notice may be rejected.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.&lt;br /&gt;
&lt;br /&gt;
==Getting Involved in MIREX 2018==&lt;br /&gt;
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2018 the best yet.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Mailing List Participation===&lt;br /&gt;
If you are interested in formal MIR evaluation, you should also subscribe to the &amp;quot;MIREX&amp;quot; (aka &amp;quot;EvalFest&amp;quot;) mailing list and participate in the community discussions about defining and running MIREX 2018 tasks. Subscription information is available at: &lt;br /&gt;
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central]. &lt;br /&gt;
&lt;br /&gt;
If you are participating in MIREX 2018, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2018 wiki) will be used to record and disseminate task proposals; however, task-related discussions should be conducted on the MIREX organization mailing list (EvalFest) rather than on this wiki, and then summarized here. &lt;br /&gt;
&lt;br /&gt;
Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team, who will implement them in software as part of the NEMA analytics framework. NEMA will be released to the community at or before ISMIR 2018, providing a standardised set of interfaces and outputs for disciplined evaluation procedures across a great many MIR tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Wiki Participation===&lt;br /&gt;
If you find that you cannot edit a MIREX wiki page, you will need to create a new account via: [[Special:Userlogin]].&lt;br /&gt;
&lt;br /&gt;
Please note that because of &amp;quot;spam-bots&amp;quot;, MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).&lt;br /&gt;
&lt;br /&gt;
==MIREX 2005 - 2017 Wikis==&lt;br /&gt;
Content from MIREX 2005 - 2017 is available at:&lt;br /&gt;
'''[[2017:Main_Page|MIREX 2017]]''' &lt;br /&gt;
'''[[2016:Main_Page|MIREX 2016]]''' &lt;br /&gt;
'''[[2015:Main_Page|MIREX 2015]]''' &lt;br /&gt;
'''[[2014:Main_Page|MIREX 2014]]''' &lt;br /&gt;
'''[[2013:Main_Page|MIREX 2013]]''' &lt;br /&gt;
'''[[2012:Main_Page|MIREX 2012]]''' &lt;br /&gt;
'''[[2011:Main_Page|MIREX 2011]]''' &lt;br /&gt;
'''[[2010:Main_Page|MIREX 2010]]''' &lt;br /&gt;
'''[[2009:Main_Page|MIREX 2009]]''' &lt;br /&gt;
'''[[2008:Main_Page|MIREX 2008]]''' &lt;br /&gt;
'''[[2007:Main_Page|MIREX 2007]]''' &lt;br /&gt;
'''[[2006:Main_Page|MIREX 2006]]''' &lt;br /&gt;
'''[[2005:Main_Page|MIREX 2005]]'''&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12637</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12637"/>
		<updated>2018-08-03T01:01:14Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Evaluation of tempo extraction algorithms */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year, algorithms will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, ST1 (a value between 0 and 1). The relative strength of T2, ST2 (not output), is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
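&lt;br /&gt;
For illustration, a minimal Python sketch of a helper that writes an estimate in this format could look as follows; the function name and the two-decimal formatting are our own illustrative choices, not a requirement of the submission system.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: write T1, T2 and ST1 as one tab-separated line.&lt;br /&gt;
 def write_tempo_estimate(output_path, t1, t2, st1):&lt;br /&gt;
     # T1 must be the slower tempo; swap if needed and flip the salience.&lt;br /&gt;
     if t2 &amp;lt; t1:&lt;br /&gt;
         t1, t2, st1 = t2, t1, 1.0 - st1&lt;br /&gt;
     with open(output_path, "w") as f:&lt;br /&gt;
         f.write("{:.2f}\t{:.2f}\t{:.2f}\n".format(t1, t2, st1))&lt;br /&gt;
&lt;br /&gt;
Called as write_tempo_estimate(output_file, 60.0, 180.0, 0.7), this would reproduce the example line above.&lt;br /&gt;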
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string arguments giving the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
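&lt;br /&gt;
A minimal Python sketch of that reduction step is shown below, assuming the tapped tempi of all listeners for one excerpt have already been pooled into a single list; the function name, the fixed-width binning and the 2 BPM bin width are our own illustrative choices, not the exact procedure used for the annotations.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: reduce pooled tapped tempi to two candidates plus a salience.&lt;br /&gt;
 from collections import Counter&lt;br /&gt;
 def two_tempo_candidates(tapped_bpms, bin_width=2.0):&lt;br /&gt;
     # Histogram the tapped tempi into coarse BPM bins (assumes at least two distinct bins).&lt;br /&gt;
     bins = Counter(round(bpm / bin_width) * bin_width for bpm in tapped_bpms)&lt;br /&gt;
     (t_a, n_a), (t_b, n_b) = bins.most_common(2)&lt;br /&gt;
     # Heights of the two highest peaks, normalized to sum to 1.0.&lt;br /&gt;
     s_a = n_a / (n_a + n_b)&lt;br /&gt;
     t1, t2 = sorted((t_a, t_b))&lt;br /&gt;
     st1 = s_a if t1 == t_a else 1.0 - s_a&lt;br /&gt;
     return t1, t2, st1&lt;br /&gt;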
&lt;br /&gt;
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber and Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
*Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
*Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to '''within 4%''', and TT2 is the ability of the algorithm to identify T2 to '''within 4%'''.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
'''Tempo tolerance''' has changed from '''8%''' to '''4%''' this year. However, in order to compare submitted methods with past years, results will be '''also reported''' for 8% tolerance.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task.&lt;br /&gt;
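&lt;br /&gt;
For illustration, the per-excerpt score could be computed with a small Python sketch like the one below; the function and variable names are our own and this is not the official MIREX evaluation code. Reporting at the 8% tolerance simply reuses the same function with tol=0.08.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: per-excerpt P-score with a 4% tempo tolerance.&lt;br /&gt;
 def p_score(est_t1, est_t2, true_t1, true_t2, st1, tol=0.04):&lt;br /&gt;
     # TT1/TT2 are 1.0 when the estimate lies within tol of the ground-truth tempo.&lt;br /&gt;
     tt1 = 1.0 if abs(est_t1 - true_t1) / true_t1 &amp;lt;= tol else 0.0&lt;br /&gt;
     tt2 = 1.0 if abs(est_t2 - true_t2) / true_t2 &amp;lt;= tol else 0.0&lt;br /&gt;
     return st1 * tt1 + (1.0 - st1) * tt2&lt;br /&gt;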
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
Regarding the GiantSteps tempo dataset (Knees et al. 2015), if you are interested in a '''fair and unbiased evaluation''', you must '''not use''' the dataset for training or validation, but '''only''' for informational purposes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
Giantsteps Dataset:&lt;br /&gt;
&lt;br /&gt;
GiantSteps Audio: https://github.com/GiantSteps/giantsteps-tempo-dataset&lt;br /&gt;
&lt;br /&gt;
GiantSteps Perceptual Annotations: http://www.tagtraum.com/download/schreiber_new_giantsteps_tempo.zip&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12636</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12636"/>
		<updated>2018-08-03T01:00:45Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year, algorithms will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, ST1 (a value between 0 and 1). The relative strength of T2, ST2 (not output), is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
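&lt;br /&gt;
For illustration, a minimal Python sketch of a helper that writes an estimate in this format could look as follows; the function name and the two-decimal formatting are our own illustrative choices, not a requirement of the submission system.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: write T1, T2 and ST1 as one tab-separated line.&lt;br /&gt;
 def write_tempo_estimate(output_path, t1, t2, st1):&lt;br /&gt;
     # T1 must be the slower tempo; swap if needed and flip the salience.&lt;br /&gt;
     if t2 &amp;lt; t1:&lt;br /&gt;
         t1, t2, st1 = t2, t1, 1.0 - st1&lt;br /&gt;
     with open(output_path, "w") as f:&lt;br /&gt;
         f.write("{:.2f}\t{:.2f}\t{:.2f}\n".format(t1, t2, st1))&lt;br /&gt;
&lt;br /&gt;
Called as write_tempo_estimate(output_file, 60.0, 180.0, 0.7), this would reproduce the example line above.&lt;br /&gt;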
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string arguments giving the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
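&lt;br /&gt;
A minimal Python sketch of that reduction step is shown below, assuming the tapped tempi of all listeners for one excerpt have already been pooled into a single list; the function name, the fixed-width binning and the 2 BPM bin width are our own illustrative choices, not the exact procedure used for the annotations.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: reduce pooled tapped tempi to two candidates plus a salience.&lt;br /&gt;
 from collections import Counter&lt;br /&gt;
 def two_tempo_candidates(tapped_bpms, bin_width=2.0):&lt;br /&gt;
     # Histogram the tapped tempi into coarse BPM bins (assumes at least two distinct bins).&lt;br /&gt;
     bins = Counter(round(bpm / bin_width) * bin_width for bpm in tapped_bpms)&lt;br /&gt;
     (t_a, n_a), (t_b, n_b) = bins.most_common(2)&lt;br /&gt;
     # Heights of the two highest peaks, normalized to sum to 1.0.&lt;br /&gt;
     s_a = n_a / (n_a + n_b)&lt;br /&gt;
     t1, t2 = sorted((t_a, t_b))&lt;br /&gt;
     st1 = s_a if t1 == t_a else 1.0 - s_a&lt;br /&gt;
     return t1, t2, st1&lt;br /&gt;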
&lt;br /&gt;
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber and Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
*Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
*Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to '''within 4%''', and TT2 is the ability of the algorithm to identify T2 to '''within 4%'''.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
Tempo tolerance has changed from '''8%''' to '''4%''' this year. However, in order to compare submitted methods with past years, results will be '''also reported''' for 8% tolerance.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task.&lt;br /&gt;
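&lt;br /&gt;
For illustration, the per-excerpt score could be computed with a small Python sketch like the one below; the function and variable names are our own and this is not the official MIREX evaluation code. Reporting at the 8% tolerance simply reuses the same function with tol=0.08.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: per-excerpt P-score with a 4% tempo tolerance.&lt;br /&gt;
 def p_score(est_t1, est_t2, true_t1, true_t2, st1, tol=0.04):&lt;br /&gt;
     # TT1/TT2 are 1.0 when the estimate lies within tol of the ground-truth tempo.&lt;br /&gt;
     tt1 = 1.0 if abs(est_t1 - true_t1) / true_t1 &amp;lt;= tol else 0.0&lt;br /&gt;
     tt2 = 1.0 if abs(est_t2 - true_t2) / true_t2 &amp;lt;= tol else 0.0&lt;br /&gt;
     return st1 * tt1 + (1.0 - st1) * tt2&lt;br /&gt;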
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
Regarding the GiantSteps tempo dataset (Knees et al. 2015), if you are interested in a '''fair and unbiased evaluation''', you must '''not use''' the dataset for training or validation, but '''only''' for informational purposes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
Giantsteps Dataset:&lt;br /&gt;
&lt;br /&gt;
GiantSteps Audio: https://github.com/GiantSteps/giantsteps-tempo-dataset&lt;br /&gt;
&lt;br /&gt;
GiantSteps Perceptual Annotations: http://www.tagtraum.com/download/schreiber_new_giantsteps_tempo.zip&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12635</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12635"/>
		<updated>2018-08-03T01:00:08Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Evaluation of tempo extraction algorithms */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score), it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year, algorithms will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, ST1 (a value between 0 and 1). The relative strength of T2, ST2 (not output), is simply 1 - ST1. The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
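&lt;br /&gt;
For illustration, a minimal Python sketch of a helper that writes an estimate in this format could look as follows; the function name and the two-decimal formatting are our own illustrative choices, not a requirement of the submission system.&lt;br /&gt;
&lt;br /&gt;
 # Illustrative sketch only: write T1, T2 and ST1 as one tab-separated line.&lt;br /&gt;
 def write_tempo_estimate(output_path, t1, t2, st1):&lt;br /&gt;
     # T1 must be the slower tempo; swap if needed and flip the salience.&lt;br /&gt;
     if t2 &amp;lt; t1:&lt;br /&gt;
         t1, t2, st1 = t2, t1, 1.0 - st1&lt;br /&gt;
     with open(output_path, "w") as f:&lt;br /&gt;
         f.write("{:.2f}\t{:.2f}\t{:.2f}\n".format(t1, t2, st1))&lt;br /&gt;
&lt;br /&gt;
Called as write_tempo_estimate(output_file, 60.0, 180.0, 0.7), this would reproduce the example line above.&lt;br /&gt;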
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string arguments giving the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
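As an example of the calling convention (not a required implementation), a Python submission could expose the positional %input/%output interface with a small entry point like the one below. Here estimate_tempo() is a placeholder for the submission's own analysis code, and write_tempo_output() is the hypothetical helper sketched in the Output Data section above.&lt;br /&gt;
&lt;br /&gt;
 import argparse&lt;br /&gt;
 &lt;br /&gt;
 def main():&lt;br /&gt;
     # Matches the call "foobar %input %output" described above.&lt;br /&gt;
     parser = argparse.ArgumentParser(description='MIREX audio tempo estimation')&lt;br /&gt;
     parser.add_argument('input', help='path to the input .wav file')&lt;br /&gt;
     parser.add_argument('output', help='full path of the output text file')&lt;br /&gt;
     args = parser.parse_args()&lt;br /&gt;
     t1, t2, st1 = estimate_tempo(args.input)   # placeholder for the actual algorithm&lt;br /&gt;
     write_tempo_output(args.output, t1, t2, st1)&lt;br /&gt;
 &lt;br /&gt;
 if __name__ == '__main__':&lt;br /&gt;
     main()&lt;br /&gt;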
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and the perceived tempo of each response was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: the two highest peaks in the perceived-tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0), as the two tempo candidates for that excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo.&lt;br /&gt;
&lt;br /&gt;
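To make the peak-picking step concrete, here is a minimal sketch (assuming NumPy) of how two tempo candidates and their normalized saliences could be derived from a set of tap-derived tempo values. It simplifies the distribution to a plain histogram and is not the original analysis code of McKinney and Moelants.&lt;br /&gt;
&lt;br /&gt;
 import numpy as np&lt;br /&gt;
 &lt;br /&gt;
 def two_tempo_candidates(tempo_values, bin_width=2.0):&lt;br /&gt;
     # Histogram of perceived tempi (in BPM), here between 20 and 300 BPM.&lt;br /&gt;
     edges = np.arange(20.0, 300.0 + bin_width, bin_width)&lt;br /&gt;
     counts, edges = np.histogram(tempo_values, bins=edges)&lt;br /&gt;
     top = np.argsort(counts)[-2:]              # the two most populated bins&lt;br /&gt;
     centers = (edges[top] + edges[top + 1]) / 2.0&lt;br /&gt;
     heights = counts[top].astype(float)&lt;br /&gt;
     heights /= heights.sum()                   # normalize saliences to sum to 1.0&lt;br /&gt;
     order = np.argsort(centers)                # T1 is the slower of the two&lt;br /&gt;
     (t1, t2), (st1, st2) = centers[order], heights[order]&lt;br /&gt;
     return t1, t2, st1&lt;br /&gt;
&lt;br /&gt;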
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber and Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
* Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
* Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by the ground-truth data; it varies from 0 to 1.0), TT1 is 1 if the algorithm identifies T1 to '''within 4%''' (and 0 otherwise), and TT2 is likewise 1 if the algorithm identifies T2 to '''within 4%'''. No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The tempo tolerance has changed from '''8%''' to '''4%''' this year. However, in order to compare submitted methods with those from past years, results will '''also be reported''' for an 8% tolerance.&lt;br /&gt;
&lt;br /&gt;
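The following Python sketch shows one literal reading of this formula; it is illustrative only and is not the official MIREX scoring script (matching conventions and rounding may differ).&lt;br /&gt;
&lt;br /&gt;
 def p_score(est_t1, est_t2, gt_t1, gt_t2, gt_st1, tol=0.04):&lt;br /&gt;
     # P = ST1 * TT1 + (1 - ST1) * TT2, with tol=0.04 this year (0.08 for comparison).&lt;br /&gt;
     def within(est, ref):                      # 1 if est is within tol of ref, else 0&lt;br /&gt;
         return 1.0 if abs(est - ref) &amp;lt;= tol * ref else 0.0&lt;br /&gt;
     tt1 = within(est_t1, gt_t1)                # did the algorithm identify T1?&lt;br /&gt;
     tt2 = within(est_t2, gt_t2)                # did the algorithm identify T2?&lt;br /&gt;
     return gt_st1 * tt1 + (1.0 - gt_st1) * tt2&lt;br /&gt;
 &lt;br /&gt;
 # e.g. p_score(60, 180, 60, 180, 0.7) returns 1.0, p_score(90, 180, 60, 180, 0.7) returns 0.3&lt;br /&gt;
&lt;br /&gt;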
The algorithm with the best average P-score will achieve the highest rank in the task.&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variety of musical styles, including many non-Western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
Regarding the GiantSteps tempo dataset (Knees et al. 2015), if you are interested in a '''fair and unbiased evaluation''', you must '''not use''' the dataset for training or validation, but '''only''' for informational purposes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
The practice data can be found here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
The data have been uploaded in both .tgz and .zip formats.&lt;br /&gt;
&lt;br /&gt;
GiantSteps dataset:&lt;br /&gt;
&lt;br /&gt;
GiantSteps Audio: https://github.com/GiantSteps/giantsteps-tempo-dataset&lt;br /&gt;
&lt;br /&gt;
GiantSteps Perceptual Annotations: http://www.tagtraum.com/download/schreiber_new_giantsteps_tempo.zip&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12634</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12634"/>
		<updated>2018-08-03T00:56:33Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Practice Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than it's notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year algorithm will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (0-1). The relative strength  ST2 (not output) is simply 1 - ST1.  The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber und Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
*Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
*Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to within 8%, and TT2 is the ability of the algorithm to identify T2 to within 8%.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical exerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney/Moelants previous experiments (See McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
Regarding the GiantSteps tempo dataset (Knees et al. 2015), if you are interested in a '''fair and unbiased evaluation''', you must '''not use''' the dataset for training or validation, but '''only''' for informational purposes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
Giantsteps Dataset:&lt;br /&gt;
&lt;br /&gt;
GiantSteps Audio: https://github.com/GiantSteps/giantsteps-tempo-dataset&lt;br /&gt;
&lt;br /&gt;
GiantSteps Perceptual Annotations: http://www.tagtraum.com/download/schreiber_new_giantsteps_tempo.zip&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12633</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12633"/>
		<updated>2018-08-03T00:56:11Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Relevant Test Collections */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than it's notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year algorithm will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (0-1). The relative strength  ST2 (not output) is simply 1 - ST1.  The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber und Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
*Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
*Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to within 8%, and TT2 is the ability of the algorithm to identify T2 to within 8%.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical exerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney/Moelants previous experiments (See McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
Regarding the GiantSteps tempo dataset (Knees et al. 2015), if you are interested in a '''fair and unbiased evaluation''', you must '''not use''' the dataset for training or validation, but '''only''' for informational purposes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
Giantsteps Dataset:&lt;br /&gt;
&lt;br /&gt;
GiantSteps Audio: https://github.com/GiantSteps/giantsteps-tempo-dataset&lt;br /&gt;
GiantSteps Perceptual Annotations: http://www.tagtraum.com/download/schreiber_new_giantsteps_tempo.zip&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12632</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12632"/>
		<updated>2018-08-03T00:55:03Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Relevant Test Collections */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than it's notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year algorithm will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (0-1). The relative strength  ST2 (not output) is simply 1 - ST1.  The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber und Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
*Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
*Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by groundtruth data, varies from 0 to 1.0), TT1 is the ability of the algorithm to identify T1 to within 8%, and TT2 is the ability of the algorithm to identify T2 to within 8%.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical exerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney/Moelants previous experiments (See McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
Regarding the GiantSteps tempo dataset (Knees et al. 2015), if you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
Giantsteps Dataset:&lt;br /&gt;
&lt;br /&gt;
GiantSteps Audio: https://github.com/GiantSteps/giantsteps-tempo-dataset&lt;br /&gt;
GiantSteps Perceptual Annotations: http://www.tagtraum.com/download/schreiber_new_giantsteps_tempo.zip&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12631</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12631"/>
		<updated>2018-08-03T00:52:37Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Perceptual tempo data collection */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempos, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than it's notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year algorithm will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. If you are interested in a fair and unbiased evaluation, you must not use the dataset for training or validation, but only for informational purposes.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files, with the associated onset times and data about the annotation robustness.&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi values, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2 (0-1). The relative strength  ST2 (not output) is simply 1 - ST1.  The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file to perform the tempo estimation detection on as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must contain String inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
Perceptual tempo data collection for the GiantSteps dataset (Knees et al. 2015) was conducted in an online tapping experiment described in detail in Schreiber und Müller (2018). Just like the original McKinney/Moelants dataset, its annotations feature two tempi and a relative salience value. The dataset is publicly available for inspection, but must not be used for training.&lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF&lt;br /&gt;
*Schreiber, H. and Müller, M. (2018), A Crowdsourced Experiment for Tempo Estimation of Electronic Dance Music. In Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR), Paris, France, Sept. 2018. URL: http://www.tagtraum.com/download/2018_schreiber_tempo_giantsteps.pdf&lt;br /&gt;
*Knees, P. et al. (2015), Two data sets for tempo estimation and key detection in electronic dance music annotated from user corrections. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR), Málaga, Spain, October 2015. URL: http://www.mtg.upf.edu/system/files/publications/246_Paper.pdf&lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by the ground-truth data, ranging from 0 to 1.0), TT1 indicates whether the algorithm identified T1 to within 8% (1 if so, 0 otherwise), and TT2 indicates the same for T2.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
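&lt;br /&gt;
As an illustration only (not the official IMIRSEL evaluation code), the per-excerpt P-score could be computed along the following lines. The sketch assumes TT1 and TT2 are binary and that either of the two returned tempi may satisfy the 8% tolerance for a given ground-truth tempo; the exact matching rule is defined by the evaluation software:&lt;br /&gt;
&lt;br /&gt;
 def p_score(est_t1, est_t2, gt_t1, gt_t2, gt_st1, tol=0.08):&lt;br /&gt;
     # TT is 1 if some estimate falls within 8% of the ground-truth tempo, else 0.&lt;br /&gt;
     def hit(gt):&lt;br /&gt;
         return int(any(abs(est - gt) &amp;lt;= tol * gt for est in (est_t1, est_t2)))&lt;br /&gt;
     return gt_st1 * hit(gt_t1) + (1.0 - gt_st1) * hit(gt_t2)&lt;br /&gt;
 &lt;br /&gt;
 # Ground truth T1=60, T2=180, ST1=0.7; the algorithm returns 61 and 120.&lt;br /&gt;
 # Only T1 is matched, so P = 0.7 * 1 + 0.3 * 0 = 0.7.&lt;br /&gt;
 print(p_score(61, 120, 60, 180, 0.7))&lt;br /&gt;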
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12630</id>
		<title>2018:Audio Tempo Estimation</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Audio_Tempo_Estimation&amp;diff=12630"/>
		<updated>2018-08-03T00:51:28Z</updated>

		<summary type="html">&lt;p&gt;Aggelos Gkiokas: /* Collections */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
This task compares current methods for the extraction of tempo from musical audio. We distinguish between notated tempo and perceptual tempo and will test for the extraction of perceptual tempo. &lt;br /&gt;
&lt;br /&gt;
We differentiate between notated tempo and perceived tempo. If you have the notated tempo (e.g., from the score) it is straightforward to attach a tempo annotation to an excerpt and run a contest for algorithms to predict the notated tempo. For excerpts for which we have no &amp;quot;official&amp;quot; tempo annotation, we can also annotate the *perceived* tempo. This is not a straightforward task and needs to be done carefully. If you ask a group of listeners (including skilled musicians) to annotate the tempo of music excerpts, they can give you different answers (they tap at different metrical levels) if they are unfamiliar with the piece. For some excerpts the perceived pulse or tempo is less ambiguous and everyone taps at the same metrical level, but for other excerpts the tempo can be quite ambiguous and you get a complete split across listeners.&lt;br /&gt;
&lt;br /&gt;
The annotation of perceptual tempo can take several forms: a probability density function as a function of tempo; a series of tempi, ranked by their respective perceptual salience; etc. These measures of perceptual tempo can be used as a ground truth on which to test algorithms for tempo extraction. The dominant perceived tempo is sometimes the same as the notated tempo but not always. A piece of music can &amp;quot;feel&amp;quot; faster or slower than its notated tempo in that the dominant perceived pulse can be a metrical level higher or lower than the notated tempo.&lt;br /&gt;
&lt;br /&gt;
There are several reasons to examine the perceptual tempo, either in place of or in addition to the notated tempo. For many applications of automatic tempo extractors, the perceived tempo of the music is more relevant than the notated tempo. An automatic playlist generator or music navigator, for instance, might allow listeners to select or filter music by its (automatically extracted) tempo. In this case, the &amp;quot;feel&amp;quot;, or perceptual tempo may be more relevant than the notated tempo. An automatic DJ apparatus might also perform better with a representation of perceived tempo rather than notated tempo.&lt;br /&gt;
&lt;br /&gt;
A more pragmatic reason for using perceptual tempo rather than notated tempo as a ground truth for our contest is that we simply do not have the notated tempo of our test set. If we notate it by having a panel of expert listeners tap along and label the excerpts, we are by default dealing with the perceived tempo. The handling of this data as ground truth must be done with care.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Data ==&lt;br /&gt;
=== Collections ===&lt;br /&gt;
This year algorithms will be evaluated on two datasets:&lt;br /&gt;
&lt;br /&gt;
*MIREX 2006 Tempo dataset collected by Martin F. McKinney (Philips) and Dirk Moelants (IPEM, Ghent University). Composed of 160 30-second clips in WAV format with annotated tempos.&lt;br /&gt;
*GiantSteps tempo dataset (Knees et al. 2015), using the perceptual annotations by Schreiber and Müller (2018). This dataset exclusively features electronic dance music (EDM) and is publicly available. To keep the evaluation fair and unbiased, it must not be used for training or validation; it is provided for inspection only.&lt;br /&gt;
&lt;br /&gt;
=== Audio Formats ===&lt;br /&gt;
The data are monophonic sound files; the associated ground-truth annotations give the two perceived tempi and their relative salience (a minimal loading sketch follows the list below).&lt;br /&gt;
&lt;br /&gt;
* CD-quality (PCM, 16-bit, 44100 Hz)&lt;br /&gt;
* single channel (mono)&lt;br /&gt;
* 30 second clips&lt;br /&gt;
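&lt;br /&gt;
As a minimal sketch of reading one such clip, assuming Python's standard wave module and a hypothetical file name excerpt.wav (not a required interface):&lt;br /&gt;
&lt;br /&gt;
 import wave&lt;br /&gt;
 &lt;br /&gt;
 # Load one excerpt and check it matches the advertised format.&lt;br /&gt;
 with wave.open('excerpt.wav', 'rb') as w:&lt;br /&gt;
     assert w.getnchannels() == 1        # single channel (mono)&lt;br /&gt;
     assert w.getframerate() == 44100    # CD-quality sample rate&lt;br /&gt;
     assert w.getsampwidth() == 2        # 16-bit PCM&lt;br /&gt;
     pcm_bytes = w.readframes(w.getnframes())  # raw samples, roughly 30 s&lt;br /&gt;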
&lt;br /&gt;
&lt;br /&gt;
== Submission Format ==&lt;br /&gt;
Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Input data ===&lt;br /&gt;
Individual audio files in WAV format (30-second clips drawn from the 140 unseen tracks in the dataset). The audio recordings were selected to provide a stable tempo value, a wide distribution of tempi, and a large variety of instrumentation and musical styles. About 20% of the files contain non-binary meters, and a small number of examples contain changing meters.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
=== Output Data ===&lt;br /&gt;
Submitted programs should output two tempi (a slower tempo, T1, and a faster tempo, T2) as well as the strength of T1 relative to T2, denoted ST1 (between 0 and 1). The relative strength ST2 (not output) is simply 1 - ST1.  The tempo estimates from each algorithm should be written to a text file in the following format:&lt;br /&gt;
&lt;br /&gt;
 T1&amp;lt;tab&amp;gt;T2&amp;lt;tab&amp;gt;ST1&lt;br /&gt;
&lt;br /&gt;
E.g.&lt;br /&gt;
 60	180	0.7&lt;br /&gt;
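&lt;br /&gt;
As a hedged sketch of how a submission might write this file (the function name, formatting choices, and example path are placeholders, not a required interface):&lt;br /&gt;
&lt;br /&gt;
 def write_tempo_output(output_path, t1, t2, st1):&lt;br /&gt;
     # T1 must be the slower tempo and ST1 its strength relative to T2 (0-1).&lt;br /&gt;
     if t1 &amp;gt; t2:&lt;br /&gt;
         t1, t2 = t2, t1&lt;br /&gt;
         st1 = 1.0 - st1&lt;br /&gt;
     with open(output_path, 'w') as f:&lt;br /&gt;
         f.write('%.2f\t%.2f\t%.2f\n' % (t1, t2, st1))&lt;br /&gt;
 &lt;br /&gt;
 # Example: values given in the wrong order are swapped and the strength adjusted.&lt;br /&gt;
 write_tempo_output('/tmp/example.txt', 180.0, 60.0, 0.3)  # writes: 60.00  180.00  0.70&lt;br /&gt;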
&lt;br /&gt;
&lt;br /&gt;
=== Algorithm Calling Format ===&lt;br /&gt;
&lt;br /&gt;
The submitted algorithm must take as arguments a SINGLE .wav file on which to perform the tempo estimation, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as ''%input'' and the output file path and name as ''%output'', a program called foobar could be called from the command-line as follows:&lt;br /&gt;
&lt;br /&gt;
 foobar %input %output&lt;br /&gt;
or&lt;br /&gt;
 foobar -i %input -o %output&lt;br /&gt;
&lt;br /&gt;
Moreover, if your submission takes additional parameters, foobar could be called like:&lt;br /&gt;
&lt;br /&gt;
 foobar .1 %input %output&lt;br /&gt;
 foobar -param1 .1 -i %input -o %output  &lt;br /&gt;
&lt;br /&gt;
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string arguments for the full paths and names of the input and output files. Parameters can also be specified as additional input arguments of the function. For example: &lt;br /&gt;
&lt;br /&gt;
 foobar('%input','%output')&lt;br /&gt;
 foobar(.1,'%input','%output')&lt;br /&gt;
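&lt;br /&gt;
For instance, a minimal, hypothetical Python entry point following the second command-line form above might look like this (the estimate_tempo stub and the argument names are assumptions made for the example, not a required template):&lt;br /&gt;
&lt;br /&gt;
 import argparse&lt;br /&gt;
 &lt;br /&gt;
 def estimate_tempo(wav_path):&lt;br /&gt;
     # Placeholder: a real submission would analyze the audio file here.&lt;br /&gt;
     return 60.0, 180.0, 0.7&lt;br /&gt;
 &lt;br /&gt;
 if __name__ == '__main__':&lt;br /&gt;
     parser = argparse.ArgumentParser(description='MIREX tempo estimation example')&lt;br /&gt;
     parser.add_argument('-i', dest='input', required=True)   # %input&lt;br /&gt;
     parser.add_argument('-o', dest='output', required=True)  # %output&lt;br /&gt;
     args = parser.parse_args()&lt;br /&gt;
     t1, t2, st1 = estimate_tempo(args.input)&lt;br /&gt;
     with open(args.output, 'w') as f:&lt;br /&gt;
         f.write('%.2f\t%.2f\t%.2f\n' % (t1, t2, st1))&lt;br /&gt;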
&lt;br /&gt;
&lt;br /&gt;
=== README File ===&lt;br /&gt;
&lt;br /&gt;
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Evaluation Procedures ==&lt;br /&gt;
&lt;br /&gt;
This section focuses on the mechanics of the method while we discuss the data (music excerpts and perceptual data) in the next section. There are two general steps to the method: 1) collection of perceptual tempo annotations; and 2) evaluation of tempo extraction algorithms.&lt;br /&gt;
&lt;br /&gt;
=== Perceptual tempo data collection ===&lt;br /&gt;
&lt;br /&gt;
The following procedure is described in more detail in McKinney and Moelants (2004) and Moelants and McKinney (2004). Listeners were asked to tap to the beat of a series of musical excerpts. Responses were collected and their perceived tempo was calculated. For each excerpt, a distribution of perceived tempo was generated. A relatively simple form of perceived tempo was proposed for this contest: The two highest peaks in the perceived tempo distribution for each excerpt were taken, along with their respective heights (normalized to sum to 1.0) as the two tempo candidates for that particular excerpt. The height of a peak in the distribution is assumed to represent the perceptual salience of that tempo. &lt;br /&gt;
&lt;br /&gt;
==== References ====&lt;br /&gt;
* McKinney, M.F. and Moelants, D. (2004), Deviations from the resonance theory of tempo induction, Conference on Interdisciplinary Musicology, Graz. URL: http://www-gewi.uni-graz.at/staff/parncutt/cim04/CIM04_paper_pdf/McKinney_Moelants_CIM04_proceedings_t.pdf&lt;br /&gt;
* Moelants, D. and McKinney, M.F. (2004), Tempo perception and musical content: What makes a piece slow, fast, or temporally ambiguous? International Conference on Music Perception &amp;amp; Cognition, Evanston, IL. URL: http://icmpc8.umn.edu/proceedings/ICMPC8/PDF/AUTHOR/MP040237.PDF &lt;br /&gt;
&lt;br /&gt;
=== Evaluation of tempo extraction algorithms ===&lt;br /&gt;
Algorithms will process musical excerpts and return the following data: Two tempi in BPM (T1 and T2, where T1 is the slower of the two tempi).  For a given algorithm, the performance, P, for each audio excerpt will be given by the following equation:&lt;br /&gt;
&lt;br /&gt;
 P = ST1 * TT1 + (1 - ST1) * TT2&lt;br /&gt;
&lt;br /&gt;
where ST1 is the relative perceptual strength of T1 (given by the ground-truth data, ranging from 0 to 1.0), TT1 indicates whether the algorithm identified T1 to within 8% (1 if so, 0 otherwise), and TT2 indicates the same for T2.  No credit will be given for tempi other than T1 and T2.&lt;br /&gt;
&lt;br /&gt;
The algorithm with the best average P-score will achieve the highest rank in the task. &lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Relevant Test Collections ==&lt;br /&gt;
We will use a collection of 160 musical excerpts for the evaluation procedure. 40 of the excerpts have been taken from one of McKinney and Moelants' previous experiments (see the McKinney/Moelants ICMPC paper above).&lt;br /&gt;
&lt;br /&gt;
Excerpts were selected to provide:&lt;br /&gt;
&lt;br /&gt;
* stable tempo within each excerpt&lt;br /&gt;
* a good distribution of tempi across excerpts&lt;br /&gt;
* a large variety of instrumentation and beat strengths (with and without percussion)&lt;br /&gt;
* a variation of musical styles, including many non-western styles&lt;br /&gt;
* the presence of non-binary meters (about 20% have a ternary element and there are a few examples with odd or changing meter). &lt;br /&gt;
&lt;br /&gt;
We will provide 20 excerpts with ground truth data for participants to try/tune their algorithms before submission. The remaining 140 excerpts will be novel to all participants.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Practice Data===&lt;br /&gt;
You can find it here:&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/beat/&lt;br /&gt;
&lt;br /&gt;
User: beattrack Password: b34trx&lt;br /&gt;
&lt;br /&gt;
https://www.music-ir.org/evaluation/MIREX/data/2006/tempo/&lt;br /&gt;
&lt;br /&gt;
User: tempo Password: t3mp0&lt;br /&gt;
&lt;br /&gt;
Data has been uploaded in both .tgz and .zip format.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Time and hardware limits ==&lt;br /&gt;
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.&lt;br /&gt;
&lt;br /&gt;
A hard limit of 8 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
== Potential Participants ==&lt;br /&gt;
name / email&lt;/div&gt;</summary>
		<author><name>Aggelos Gkiokas</name></author>
		
	</entry>
</feed>