Revision as of 08:38, 22 July 2014

Description

This task is new for 2014!

This text has been adapted from the Audio Beat Tracking Wiki page. Please add your comments and discussion at the bottom of this page.

The aim of the automatic downbeat estimation task is to identify the locations of downbeats in a collection of sound files. While this is similar to the Audio Beat Tracking task, here the aim is to find the first beat of each bar (measure) rather than all beat times. Algorithms are not required to estimate beat times or time-signature in addition to downbeats.

Submitted algorithms will be evaluated in terms of their accuracy in finding downbeat locations (only) as annotated by musical experts across several diverse datasets.

Update 27/06/14: A small set of training data is being prepared and will be available soon.

Data

Collections

Ballroom The ballroom dataset contains eight different dance styles (Cha Cha, Jive, Quickstep, Rumba, Samba, Tango, Viennese Waltz and Waltz). It consists of 697 excerpts, each 30 s in duration. The dataset contains two different meters, 3/4 and 4/4, but every piece has a constant meter. For further information see Dixon et al. (2004) and Krebs et al. (2013). Note that we are using the ground truth annotations from Krebs et al. (2013), available at https://github.com/CPJKU/BallroomAnnotations

Isophonics (Beatles only) The Beatles dataset from the Centre for Digital Music at Queen Mary, University of London (http://www.isophonics.net/), as also used for Audio Chord Estimation in MIREX for many years. This dataset contains 179 complete songs (all except Revolution 9), the majority of which are in 4/4. For further information see Mauch et al (2009).

Turkish Data The Turkish corpus is an extended version of the annotated data used in Srinivasamurthy et al. (2014). It includes 85 excerpts of one minute each, and each piece belongs to one of four rhythm classes that are referred to as usul in Turkish Art music. 30 pieces are in the 9/8-usul Aksak, 18 pieces in the 10/8-usul Curcuna, 28 pieces in the 8/8-usul Düyek, and 9 pieces in the 4/4-usul Sofyan.

Cretan Data The corpus of Cretan music consists of 42 full length pieces of Cretan leaping dances. While there are several dances that differ in terms of their steps, the differences in the sound are most noticeable in the melodic content, and all pieces can be considered to belong to one rhythmic style. All these dances are usually notated using a 2/4 time signature, and the accompanying rhythmical patterns are usually played on a Cretan lute. While a variety of rhythmic patterns exist, they do not relate to a specific dance and can be assumed to occur in all of the 42 songs in this corpus.

Carnatic Data The Carnatic music dataset is a subset of the CompMusic Carnatic Music Rhythm Dataset. It includes 118 two-minute excerpts spanning the four most commonly used tālas of Carnatic music (a tāla is the rhythmic framework of Carnatic music, consisting of time cycles). There are 30 examples in each of ādi tāla (8 beats/cycle), rūpaka tāla (3 beats/cycle) and miśra chāpu tāla (7 beats/cycle), and 28 examples in khaṇḍa chāpu tāla (5 beats/cycle). The beats of the tāla in miśra chāpu and khaṇḍa chāpu are non-uniform, but for consistency with the other datasets, a uniform beat pulse was obtained by interpolating the non-uniformly spaced beat locations. The recordings include both vocal and instrumental music and are representative of present-day performance practice. All recordings contain percussion accompaniment, mainly the Mridangam.

HJDB (to be confirmed) The HJDB dataset contains 236 excerpts of Hardcore, Jungle and Drum and Bass music between 30s and 2 minutes in length. All excerpts are in 4/4 and have a constant tempo. For further information see Hockman et al (2012).

In total this makes 1357 excerpts (of which 259 are full length songs).
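The excerpt total can be checked directly from the per-collection counts given above. The short sketch below only restates that arithmetic; the dictionary keys are labels chosen here for illustration, and the total assumes the HJDB collection (still to be confirmed) is included.

```python
# Excerpt counts as stated in the collection descriptions above.
collection_sizes = {
    "Ballroom": 697,
    "Isophonics (Beatles only)": 179,
    "Turkish": 85,
    "Cretan": 42,
    "Carnatic": 118,
    "HJDB": 236,  # listed as "to be confirmed"
}

# Per-class breakdowns given for the Turkish and Carnatic collections.
turkish_classes = {"Aksak": 30, "Curcuna": 18, "Düyek": 28, "Sofyan": 9}
carnatic_talas = {"adi": 30, "rupaka": 30, "misra chapu": 30, "khanda chapu": 28}

total = sum(collection_sizes.values())
print(total)  # 1357
print(sum(turkish_classes.values()))  # 85
print(sum(carnatic_talas.values()))  # 118
```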

Audio Formats

The data are monophonic sound files:

CD-quality (PCM, 16-bit, 44100 Hz) for all except Ballroom (originally lower quality, but resampled to 44100 Hz)
single channel (mono)

Submission Format

Submissions to this task will have to conform to a specified format detailed below. Submissions should be packaged and contain at least two files: The algorithm itself and a README containing contact information and detailing, in full, the use of the algorithm.

Input Data

Participating algorithms will have to read audio in the following format:

Sample rate: 44.1 kHz
Sample size: 16 bit
Number of channels: 1 (mono)
Encoding: WAV
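Participants may want to confirm that their test audio matches this format before running an algorithm. The following is a minimal sketch using Python's standard `wave` module; the function name `check_wav_format` is a hypothetical helper and not part of any MIREX tooling.

```python
import wave


def check_wav_format(path):
    """Return True if the file is a 44.1 kHz, 16-bit, mono PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 44100      # sample rate in Hz
            and wav.getsampwidth() == 2      # bytes per sample, i.e. 16 bit
            and wav.getnchannels() == 1      # mono
        )
```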

Output Data

The downbeat estimation algorithms will return downbeat times in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

Output File Format (Audio Downbeat Estimation)

The downbeat output file format is an ASCII text format. Each downbeat time is specified, in seconds, on its own line. Specifically,

<downbeat time (in seconds)>\n

where \n denotes the end of line. The < and > characters are not included. An example output file would look something like:

0.243
1.486
2.729
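As an illustration, a file in this format could be produced with a few lines of Python. This is only a sketch: the function name `write_downbeats` is hypothetical, and the three-decimal precision is an assumption taken from the example above rather than a stated requirement.

```python
def write_downbeats(times, output_path):
    """Write one downbeat time (in seconds) per line, in ascending order."""
    with open(output_path, "w", encoding="ascii", newline="\n") as f:
        for t in sorted(times):
            # Three decimals mirrors the example file; the task text does
            # not prescribe a precision.
            f.write(f"{t:.3f}\n")
```

Calling `write_downbeats([0.243, 1.486, 2.729], "out.txt")` would write the three example lines shown above.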

Algorithm Calling Format

The submitted algorithm must take as arguments a SINGLE .wav file on which to perform the downbeat estimation, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, such as a detection threshold, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string inputs for the full path and names of the input and output files. Parameters could also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')
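For submissions written in other languages, the flag-based calling form above can be supported with a standard argument parser. The sketch below uses Python's `argparse`; `estimate_downbeats` is a hypothetical placeholder for the actual algorithm, and `-param1` simply mirrors the example parameter name used on this page.

```python
import argparse


def estimate_downbeats(input_path, threshold):
    """Hypothetical placeholder: a real submission would analyse the audio here."""
    return []


def build_parser():
    """Build a parser for: foobar -param1 .1 -i %input -o %output"""
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-param1", type=float, default=0.1,
                        help="example parameter, e.g. a detection threshold")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    downbeats = estimate_downbeats(args.input, args.param1)
    # One downbeat time per line, as required by the output file format.
    with open(args.output, "w", encoding="ascii") as f:
        for t in downbeats:
            f.write(f"{t:.3f}\n")
```

A script built this way would end with a call to `main()`, so that `foobar -param1 .1 -i %input -o %output` reads its arguments from the command line.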

README File

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with different values for the parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output
foobar -param1 .15 -i %input -o %output
foobar -param1 .2 -i %input -o %output
foobar -param1 .25 -i %input -o %output
foobar -param1 .3 -i %input -o %output
...

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"
matlab -r "foobar(.15,'%input','%output');quit;"
matlab -r "foobar(.2,'%input','%output');quit;" 
matlab -r "foobar(.25,'%input','%output');quit;"
matlab -r "foobar(.3,'%input','%output');quit;"
...

The different command lines to evaluate the performance of each parameter set over the whole database will be generated automatically from each line in the README file containing both '%input' and '%output' strings.
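The expansion described here amounts to a simple placeholder substitution. The sketch below shows one way it could work; it is an assumption about the process, not the organisers' actual script, and the function name `expand_commands` is hypothetical.

```python
def expand_commands(readme_lines, input_path, output_path):
    """Return runnable commands for one audio file.

    Only lines containing both '%input' and '%output' are treated as
    command templates; all other README lines are ignored.
    """
    commands = []
    for line in readme_lines:
        if "%input" in line and "%output" in line:
            command = line.strip()
            command = command.replace("%input", input_path)
            command = command.replace("%output", output_path)
            commands.append(command)
    return commands
```

Under this reading, README lines such as contact details or a trailing "..." are skipped because they lack the two placeholders, which is why every command line in the README should contain both.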

Evaluation Procedures

For the evaluation procedure we will use:

F-measure - the standard calculation as used in onset and beat tracking evaluation with a +/-70ms window, see Dixon (2007).

Given the high diversity of musical styles included in the task, results will be reported separately for each individual dataset.
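To make the F-measure concrete, the sketch below counts an estimated downbeat as a hit if it lies within 70 ms of an annotated downbeat, with each annotation matched to at most one estimate. It is an illustrative approximation, not the official evaluation code; the function name `f_measure` and the treatment of the no-hit case (returning 0) are assumptions made here.

```python
def f_measure(annotations, estimates, tolerance=0.07):
    """F-measure of estimated downbeat times against annotated ones (seconds)."""
    annotations = sorted(annotations)
    estimates = sorted(estimates)
    hits = 0
    i = j = 0
    # Walk both sorted lists, pairing each annotation with at most one estimate.
    while i < len(annotations) and j < len(estimates):
        diff = estimates[j] - annotations[i]
        if abs(diff) <= tolerance:
            hits += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early for this and all later annotations
        else:
            i += 1  # no remaining estimate falls inside this annotation's window
    if hits == 0:
        return 0.0
    precision = hits / len(estimates)
    recall = hits / len(annotations)
    return 2 * precision * recall / (precision + recall)
```

For example, with annotations at 0.243, 1.486 and 2.729 s and estimates at 0.25, 1.5 and 2.9 s, two of the three estimates fall inside the window, so precision and recall are both 2/3 and the F-measure is 2/3.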

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result.

Potential Participants

name / email

Jose R. Zapata / joser.zapata (at) upb.edu.co

Discussion

name / email

Bibliography

S. Dixon, F. Gouyon and G. Widmer, Towards Characterisation of Music via Rhythmic Patterns, In Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), pp 509-516.

S. Dixon, Evaluation of audio beat tracking system BeatRoot, Journal of New Music Research, vol. 36, no. 1, pp. 39-51, 2007.

J. A. Hockman, M. E. P. Davies and I. Fujinaga, One in the Jungle: Downbeat Detection in Hardcore, Jungle, and Drum and Bass, In Proceedings of the 13th International Society for Music Information Retrieval Conference (ISMIR), Porto, Portugal, pp. 169-174, 2012.

F. Krebs, S. Boeck, and G. Widmer, Rhythmic Pattern Modeling for Beat- and Downbeat Tracking in Musical Audio, In Proceedings of 14th International Society for Music Information Retrieval Conference (ISMIR), Curitiba, Brazil, 2013.

M. Mauch, C. Cannam, M. E. P. Davies, S. Dixon, C. Harte, S. Kolozali and D. Tidhar, OMRAS2 Metadata Project 2009, Late-breaking session at the 10th International Conference on Music Information Retrieval, 2009.

A. Srinivasamurthy, A. Holzapfel and X. Serra, In Search of Automatic Rhythm Analysis Methods for Turkish and Indian Art Music, Journal of New Music Research, vol. 43, no. 1, pp. 94-114, 2014.


Difference between revisions of "2014:Audio Downbeat Estimation"
