Difference between revisions of "2010:Audio Melody Extraction"
From MIREX Wiki
Revision as of 09:15, 21 April 2010
Description
The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio.
The task consists of two parts:
- Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not),
 - Pitch detection (deciding the most likely melody pitch for each time frame).
 
We structure the submission so that these two parts can be done independently: an algorithm may (via a negative pitch value) guess a pitch even for frames that it judges unvoiced. Algorithms that do not discriminate between melodic and non-melodic parts are also welcome!
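The sign convention described above can be sketched in a few lines of Python. This is only an illustration, not the official evaluation code: the function names are hypothetical, the magnitude of a negative value is assumed to be the pitch guess, and the 50-cent tolerance is an assumption that this page does not state.

```python
import math
from typing import Optional, Sequence, Tuple


def decode_estimate(value_hz: float) -> Tuple[bool, Optional[float]]:
    """Split one output value into (voiced?, pitch guess in Hz or None)."""
    if value_hz > 0:
        return True, value_hz        # voiced frame with a pitch
    if value_hz < 0:
        return False, -value_hz      # judged unvoiced, but a pitch is still guessed
    return False, None               # 0 Hz: unvoiced, no guess


def raw_pitch_accuracy(ref_hz: Sequence[float],
                       est_hz: Sequence[float],
                       tol_cents: float = 50.0) -> float:
    """Fraction of voiced reference frames whose pitch guess is within tol_cents.

    The tolerance is an assumed value; both sequences are assumed to be
    on the same time grid already.
    """
    hits = 0
    voiced_ref = 0
    for ref, est in zip(ref_hz, est_hz):
        if ref <= 0:
            continue                 # only voiced reference frames count
        voiced_ref += 1
        _, guess = decode_estimate(est)
        if guess is not None and abs(1200.0 * math.log2(guess / ref)) <= tol_cents:
            hits += 1
    return hits / voiced_ref if voiced_ref else 0.0


reference = [220.0, 220.0, 0.0, 440.0]
with_zeros = [220.0, 0.0, 0.0, 440.0]        # second frame wrongly called unvoiced
with_guesses = [220.0, -221.0, 0.0, 440.0]   # same voicing error, pitch still guessed
```

In this toy example the second frame is a voicing error in both outputs, but only the output that keeps a negative pitch guess gets credit for that frame in the pitch measure, which is the independence the paragraph above describes.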
Discussions for 2010
Discussions from 2009
https://www.music-ir.org/mirex/2009/index.php/Audio_Melody_Extraction#Discussions_for_2009
Dataset
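- MIR-1K database (http://unvoicedsoundseparation.googlepages.com/mir-1k) : dataset for MIREX (http://mirlab.org/dataset/public/MIR-1K_for_MIREX.rar), Karaoke recordings of Chinese songs. Instruments: singing voice (male, female), synthetic accompaniment.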
 - MIREX08 database : 4 excerpts of 1 min. from "north Indian classical vocal performances", instruments: singing voice (male, female), tanpura (Indian instrument, perpetual background drone), harmonium (secondary melodic instrument) and tablas (pitched percussions).
 - MIREX05 database : 25 phrase excerpts of 10-40 sec from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano.
 - ISMIR04 database : 20 excerpts of about 20 s each.
 - CD-quality (PCM, 16-bit, 44100 Hz)
 - single channel (mono)
 - manually annotated reference data (10 ms time grid)
 
Output Format
- To allow for generalization among potential approaches (i.e. frame size, hop size, etc.), submitted algorithms should output pitch estimates, in Hz, at discrete instants in time.
 - Each line of the output file therefore contains the time stamp, [space or tab], the corresponding frequency value, [new line].
 - The time grid of the reference file is 10 ms, but a submission may use a different time grid for its output (for example 5.8 ms).
 - Instants identified as unvoiced (there is no dominant melody) can be reported either as 0 Hz or as a negative pitch value. If negative pitch values are given, the statistics for Raw Pitch Accuracy and Raw Chroma Accuracy may be improved.
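As a rough illustration of the format above, the following Python sketch writes and reads the two-column output and maps an arbitrary output grid onto a 10 ms grid. Several things here are assumptions rather than part of the task definition: time stamps are taken to be in seconds, three decimal places are used when writing, the function names are hypothetical, and nearest-neighbour selection is only one possible way to align the grids (this page does not say how the evaluation does it).

```python
import bisect
from typing import Iterable, List, Optional, Tuple

Row = Tuple[float, float]  # (time stamp, frequency in Hz)


def format_estimates(rows: Iterable[Row]) -> str:
    """One 'time<TAB>frequency' pair per line."""
    return "".join(f"{t:.3f}\t{f:.3f}\n" for t, f in rows)


def parse_estimates(text: str) -> List[Row]:
    """Read lines whose two columns are separated by a space or a tab."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue                      # skip empty lines
        t, f = line.split()               # splits on any whitespace
        rows.append((float(t), float(f)))
    return rows


def to_reference_grid(rows: List[Row],
                      hop_s: float = 0.010,
                      n_frames: Optional[int] = None) -> List[Row]:
    """Pick, for each reference instant k * hop_s, the nearest estimate (assumed scheme)."""
    times = [t for t, _ in rows]          # must be sorted in ascending order
    if n_frames is None:
        n_frames = int(round(times[-1] / hop_s)) + 1
    out = []
    for k in range(n_frames):
        t_ref = k * hop_s
        i = bisect.bisect_left(times, t_ref)
        if i == 0:
            j = 0
        elif i == len(times):
            j = len(times) - 1
        else:
            # choose the closer of the two neighbouring estimates
            j = i if times[i] - t_ref < t_ref - times[i - 1] else i - 1
        out.append((t_ref, rows[j][1]))
    return out


# Example: estimates on a 5 ms grid, the last frame reported as unvoiced with a guess.
fine = [(0.000, 100.0), (0.005, 101.0), (0.010, 102.0), (0.015, 103.0), (0.020, -104.0)]
text = format_estimates(fine)
coarse = to_reference_grid(parse_estimates(text))
```

Copying the nearest value, rather than interpolating, keeps 0 Hz and negative values intact, so the voicing decision of each frame survives the change of grid.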
 
Relevant Development Collections
- Graham's collection: you can find the test set here, and further explanations on the pages http://www.ee.columbia.edu/~graham/mirex_melody/ and http://labrosa.ee.columbia.edu/projects/melody/
 
- For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions, including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. (full test set with the reference transcriptions (28.6 MB))