Difference between revisions of "2010:Audio Key Detection"

From MIREX Wiki
m (moved 2010:Audio Key detection to 2010:Audio Key Detection: better capitalization)
Line 3: Line 3:
 
Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. The goal of this task is the identification of the key from music in audio format.
 
Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. The goal of this task is the identification of the key from music in audio format.
  
 +
== Data ==
 +
=== Collections ===
 +
The collection used for this year's evaluation is the same as the one used in 2005. It consists of 1252 classical music audio pieces rendered from MIDI using the timidity MIDI synthesizer.
  
 +
=== Audio Formats ===
  
==System Specs==
+
* CD-quality (PCM, 16-bit, 44100 Hz)
'''Input''': Call to individual .wav or .mid files, or an ASCII file list of all files (with full paths).
+
* single channel (mono)
  
'''Ground-truth''': One ground-truth file per .wav file, in ASCII tab delimited format:
+
== Submission Format ==
<pitch (e.g. Ab, A, A#, Bb, B …, G#>\t< major or minor>\n
 
where the < and > characters are not included and \t denotes a tab and \n denotes a new line.
 
  
Note: The framework is aware of the equivalence of certain notes and will handle the mapping internally.
+
Submissions to this task must conform to the format detailed below. Each submission should be packaged and contain at least two files: the algorithm itself and a README that gives contact information and describes, in full, how to use the algorithm.
  
'''Output''': One output file per .wav file, in ASCII tab delimited format:
+
=== Input Data ===
<pitch (e.g. Ab, A, A#, Bb, B …, G#>\t< major or minor>\n
+
Participating algorithms will have to read audio in the following format:
  
'''Audio''': (PCM, 16-bit, 44100 Hz) single channel (mono) Excerpts synthesized from MIDI
+
* Sample rate: 44.1 kHz
 +
* Sample size: 16 bit
 +
* Number of channels: 1 (mono)
 +
* Encoding: WAV
  
'''MIDI''': Excerpts of MIDI files
+
=== Output Data ===
  
 +
The audio key detection algorithms will return the estimated key in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.
  
==Evaluation Procedures==
+
=== Output File Format (Audio Key Detection) ===
  
'''Test Set''': The test set we propose to use will consist of pieces for which the keys are known. For example, symphonies and concertos by well-known composers often have the keys stated in the title of the piece. The excerpts will typically be the beginnings of the pieces, as this is the one part of a piece in which the global, known key can be guaranteed to be established. Different excerpt durations will be considered: 30 seconds, 20 seconds and 10 seconds.
+
The Audio Key Detection output file is a single-line, tab-delimited ASCII text file. The tonic is reported first, followed by a TAB and the mode. Sharps are written with the "#" symbol (e.g. A# for A sharp) and flats with a lowercase "b" (e.g. Bb for B flat). The output file should therefore be of the form:
  
'''Input/Output''': The input to the system should be some musical excerpt (either audio or MIDI) and the output should be a key name, for example C major or E flat minor. Only pitch class numbers will be taken into account during evaluation, for instance C sharp major and D flat major will be considered equivalent.
+
<tonic {A, A#, Bb, ...}>\t<mode {major, minor}>\n
  
'''System Calibration''': The test set will be randomly split into training and test data. Training data will be provided to the participants so that they determine the optimal settings for the parameters of their algorithms.
+
where \t denotes a tab and \n denotes the end of the line. The < and > characters are not included. An example output file would look like:
  
'''Evaluation ''': The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how ‘close’ each identified key is to the corresponding correct key. Keys will be considered as ‘close’ if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:
+
C    major
 +
 +
or
 +
 +
G# minor
 +
 
 +
=== Algorithm Calling Format ===
 +
 
 +
The submitted algorithm must take as arguments a SINGLE .wav file on which to perform key detection, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:
 +
 
 +
foobar %input %output
 +
foobar -i %input -o %output
 +
 
 +
Moreover, if your submission takes additional parameters, foobar could be called like:
 +
 
 +
foobar .1 %input %output
 +
foobar -param1 .1 -i %input -o %output 
 +
 
 +
If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string inputs for the full paths and names of the input and output files. Parameters can also be specified as input arguments of the function. For example:
 +
 
 +
foobar('%input','%output')
 +
foobar(.1,'%input','%output')
 +
 
 +
=== README File ===
 +
 
 +
A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.
 +
 
 +
For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:
 +
 
 +
foobar -param1 .1 -i %input -o %output
 +
 
 +
For a submission using MATLAB, the README file could look like:
 +
 
 +
matlab -r "foobar(.1,'%input','%output');quit;"
 +
 
 +
== Evaluation Procedures ==
 +
The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how "close" each identified key is to the corresponding correct key. Keys will be considered as "close" if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:
  
 
{|
 
{|
Line 42: Line 84:
 
|-
 
|-
 
|Parallel major/minor||0.2
 
|Parallel major/minor||0.2
|}
+
|}  
 
 
'''Comments''': Many excellent suggestions were made in the review process. Some of the ideas included: using actual audio files from recordings for the audio portion of the contest, employing other metrics used in information retrieval literature, using test data from a wider variety of genres, and considering the detection of key modulations.
 
 
 
As this is a first attempt at evaluating key-finding across different systems employing a variety of algorithm combinations, we have opted to keep the evaluation procedure as simple and streamlined as possible. The results of this contest will lay the groundwork from which we can expand the techniques for key-finding evaluation.
 
 
 
==Relevant Test Collections==
 
 
 
'''Symbolic Data''': The dataset contains 500 classical music MIDI files selected from the Classical Music Archives (http://www.classicalarchives.com) and labelled with the key stated in their title.
 
 
 
Examples of pieces include, but are not limited to, the following:
 
 
 
Pieces from the Baroque period:
 
Bach (http://www.classicalarchives.com/bach.html) – Keyboard Works, Chamber Works, and Orchestral Works.

Vivaldi (http://www.classicalarchives.com/vivaldi.html) – Concerti and Chamber Works.
 
 
 
Pieces from the Classical period:
 
Handel (http://www.classicalarchives.com/handel.html) – Orchestral Works, Keyboard Works, and Chamber Works.

Haydn (http://www.classicalarchives.com/haydn.html) – Keyboard Works, Chamber Works, and Orchestral Works.

Mozart (http://www.classicalarchives.com/mozart.html) – Keyboard Works, Symphonies and Concertos, and Chamber Works.

Early Beethoven (http://www.classicalarchives.com/beethovn.html) – Piano Works, Symphonies, Concertos, and Chamber Works.
 
 
 
Pieces from the Romantic period:
 
Late Beethoven (http://www.classicalarchives.com/beethovn.html) – Piano Works, Symphonies, Concertos, and Chamber Works.

Brahms (http://www.classicalarchives.com/brahms.html) – Keyboard Works, Chamber Works, Concertos and Orchestral Works.

Chopin (http://www.classicalarchives.com/chopin.html) – Piano Works.
 
 
 
'''Audio Data''': The dataset contains the same pieces synthesized from MIDI to CD-quality (16-bit, 44100 Hz, mono) WAV files using various software MIDI synthesizers (Winamp, Cakewalk, etc). The synthesizer for each piece was selected randomly.
 
  
By using the same data for both the symbolic and audio key-finding methods, we will be able to evaluate and compare both approaches. It should be noted that even though synthesized MIDI is a simple alternative to actual audio, it is an appropriate approach for an evaluation where we are considering both audio and symbolic algorithms. Also, this controlled method eliminates possible tuning issues that are sometimes present in recorded audio.
+
== Relevant Development Collections ==

Revision as of 12:42, 26 May 2010

Description

Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. The goal of this task is the identification of the key from music in audio format.

Data

Collections

The collection used for this year's evaluation is the same as the one used in 2005. It consists of 1252 classical music audio pieces rendered from MIDI using the timidity MIDI synthesizer.

Audio Formats

  • CD-quality (PCM, 16-bit, 44100 Hz)
  • single channel (mono)

Submission Format

Submissions to this task must conform to the format detailed below. Each submission should be packaged and contain at least two files: the algorithm itself and a README that gives contact information and describes, in full, how to use the algorithm.

Input Data

Participating algorithms will have to read audio in the following format:

  • Sample rate: 44.1 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV
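
As a rough illustration of the input format listed above, the following Python sketch checks a file against the three stated properties using the standard-library wave module. The function name check_wav_format and the wording of its messages are assumptions made for this example; they are not part of the task definition.

```python
import wave

# Expected input format as stated in the task description.
EXPECTED_RATE_HZ = 44100
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bit
EXPECTED_CHANNELS = 1            # mono


def check_wav_format(path):
    """Return a list of mismatches between the file and the expected format.

    An empty list means the file matches. The function name and the
    message wording are illustrative only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample size is {8 * wav.getsampwidth()} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"file has {wav.getnchannels()} channels")
    return problems
```

A submission could run such a check before analysis and report any mismatch instead of silently processing unexpected audio.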

Output Data

The audio key detection algorithms will return the estimated key in an ASCII text file for each input .wav audio file. The specification of this output file is immediately below.

Output File Format (Audio Key Detection)

The Audio Key Detection output file is a single-line, tab-delimited ASCII text file. The tonic is reported first, followed by a TAB and the mode. Sharps are written with the "#" symbol (e.g. A# for A sharp) and flats with a lowercase "b" (e.g. Bb for B flat). The output file should therefore be of the form:

<tonic {A, A#, Bb, ...}>\t<mode {major, minor}>\n

where \t denotes a tab and \n denotes the end of the line. The < and > characters are not included. An example output file would look like:

C    major

or

G# minor
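
The output format described above can be sketched in Python as a pair of helpers that write and read the single line. This is a minimal sketch: the helper names format_key_line and parse_key_line are hypothetical, and the tonic pattern (a letter A to G with an optional "#" or "b") is an assumption derived from the examples given, not an official grammar.

```python
import re

# Assumed tonic spelling: one letter A-G, optionally followed by "#" or "b".
_TONIC_PATTERN = re.compile(r"[A-G][#b]?")
_MODES = ("major", "minor")


def format_key_line(tonic, mode):
    """Build the single output line: tonic, TAB, mode, end of line."""
    if not _TONIC_PATTERN.fullmatch(tonic):
        raise ValueError(f"unexpected tonic: {tonic!r}")
    if mode not in _MODES:
        raise ValueError(f"unexpected mode: {mode!r}")
    return f"{tonic}\t{mode}\n"


def parse_key_line(line):
    """Split an output line back into (tonic, mode), validating both fields."""
    # Unpacking raises ValueError unless there are exactly two tab-separated fields.
    tonic, mode = line.rstrip("\r\n").split("\t")
    format_key_line(tonic, mode)  # reuse the validation above
    return tonic, mode
```

For example, format_key_line("G#", "minor") produces the second example line above with a real tab character between the two fields.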

Algorithm Calling Format

The submitted algorithm must take as arguments a SINGLE .wav file on which to perform key detection, as well as the full output path and filename of the output file. The ability to specify the output path and file name is essential. Denoting the input .wav file path and name as %input and the output file path and name as %output, a program called foobar could be called from the command-line as follows:

foobar %input %output
foobar -i %input -o %output

Moreover, if your submission takes additional parameters, foobar could be called like:

foobar .1 %input %output
foobar -param1 .1 -i %input -o %output  

If your submission is in MATLAB, it should be submitted as a function. Once again, the function must accept string inputs for the full paths and names of the input and output files. Parameters can also be specified as input arguments of the function. For example:

foobar('%input','%output')
foobar(.1,'%input','%output')
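
For a submission written in Python, the flag-based calling convention above (foobar -param1 .1 -i %input -o %output) might be handled along the following lines. This is only a sketch under stated assumptions: estimate_key is a hypothetical placeholder that returns a fixed answer, and the default value of param1 is invented for the example.

```python
import argparse


def estimate_key(wav_path, param1):
    """Hypothetical placeholder for a submission's key estimator."""
    # A real submission would analyse the audio here; this stub returns a fixed key.
    return "C", "major"


def build_parser():
    """Command-line interface mirroring the foobar example calls."""
    parser = argparse.ArgumentParser(prog="foobar")
    parser.add_argument("-i", dest="input", required=True,
                        help="full path of the input .wav file")
    parser.add_argument("-o", dest="output", required=True,
                        help="full path of the output text file")
    # The default of 0.1 is an assumption made for this example.
    parser.add_argument("-param1", type=float, default=0.1,
                        help="optional algorithm parameter")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    tonic, mode = estimate_key(args.input, args.param1)
    # Write exactly one tab-delimited line to the requested output path.
    with open(args.output, "w", encoding="ascii", newline="\n") as out:
        out.write(f"{tonic}\t{mode}\n")
    return 0
```

A script entry point would call main() with no arguments so that the command-line values are used; passing an explicit list such as ["-i", "in.wav", "-o", "out.txt"] is convenient for testing.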

README File

A README file accompanying each submission should contain explicit instructions on how to run the program (as well as contact information, etc.). In particular, each command line to run should be specified, using %input for the input sound file and %output for the resulting text file.

For instance, to test the program foobar with a specific value for parameter param1, the README file would look like:

foobar -param1 .1 -i %input -o %output

For a submission using MATLAB, the README file could look like:

matlab -r "foobar(.1,'%input','%output');quit;"

Evaluation Procedures

The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. The key of the piece is the one defined by the composer in the title of the piece. We will then determine how "close" each identified key is to the corresponding correct key. Keys will be considered as "close" if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. A correct key assignment will be given a full point, and incorrect assignments will be allocated fractions of a point according to the following table:

Relation to correct key    Points
Same                       1
Perfect fifth              0.5
Relative major/minor       0.3
Parallel major/minor       0.2
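
The scoring table above can be sketched in Python as follows. Several details are assumptions made for this example rather than statements from the task description: a perfect fifth is counted in either direction and only when the modes agree, enharmonic spellings are compared by pitch class, and any relationship not listed in the table scores 0. The names pitch_class and key_score are hypothetical.

```python
# Semitone offsets of the natural notes above C.
_NATURAL_PITCH_CLASSES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def pitch_class(tonic):
    """Map a tonic name such as "C", "A#" or "Bb" to a pitch class 0-11."""
    value = _NATURAL_PITCH_CLASSES[tonic[0]]
    accidental = tonic[1:]
    if accidental == "#":
        value += 1
    elif accidental == "b":
        value -= 1
    return value % 12


def key_score(estimated, reference):
    """Score an estimated (tonic, mode) pair against the reference pair."""
    est_pc, est_mode = pitch_class(estimated[0]), estimated[1]
    ref_pc, ref_mode = pitch_class(reference[0]), reference[1]
    interval = (est_pc - ref_pc) % 12  # semitones from reference up to estimate

    if est_mode == ref_mode:
        if interval == 0:
            return 1.0  # same key
        if interval in (5, 7):
            return 0.5  # perfect fifth (assumption: above or below)
        return 0.0
    if interval == 0:
        return 0.2      # parallel major/minor
    # The relative minor lies 9 semitones above a major tonic;
    # the relative major lies 3 semitones above a minor tonic.
    if (ref_mode == "major" and interval == 9) or (ref_mode == "minor" and interval == 3):
        return 0.3      # relative major/minor
    return 0.0          # assumption: unlisted relationships earn no points
```

With these assumptions, an estimate of A minor against a reference of C major scores 0.3, and C# major against Db major counts as the same key.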

Relevant Development Collections