2010:Audio Melody Extraction

Description

The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio.

The task consists of two parts:

  • Voicing detection (deciding whether a particular time frame contains a "melody pitch" or not),
  • Pitch detection (deciding the most likely melody pitch for each time frame).

We structure the submission so that these two parts can be carried out independently, i.e. it is possible (via a negative pitch value) to report a pitch guess even for frames judged unvoiced. Algorithms that do not discriminate between melodic and non-melodic parts are also welcome!
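For illustration only, the following Python sketch (hypothetical names; nothing here is prescribed by the task) shows one way a frame-wise estimate could encode both decisions in a single value, following the negative-pitch convention described above:

    # Sketch only: a positive value asserts a voiced melody pitch, a negative
    # value records a pitch guess for a frame judged unvoiced, and 0 means
    # unvoiced with no pitch guess.
    def encode_frame(pitch_hz, is_voiced):
        """Combine a pitch estimate (Hz) and a voicing decision into one value."""
        if pitch_hz <= 0:               # no usable pitch estimate at all
            return 0.0
        return pitch_hz if is_voiced else -pitch_hz

    # Example: a 220 Hz estimate on a frame judged unvoiced is reported as -220.0
    print(encode_frame(220.0, is_voiced=False))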


Discussions for 2010

Discussions from 2009

https://www.music-ir.org/mirex/2009/index.php/Audio_Melody_Extraction#Discussions_for_2009

Dataset

  • MIR-1K database: karaoke recordings of Chinese songs, prepared for MIREX. Instruments: singing voice (male, female), synthetic accompaniment.
  • MIREX08 database: 4 one-minute excerpts from "north Indian classical vocal performances". Instruments: singing voice (male, female), tanpura (Indian instrument, perpetual background drone), harmonium (secondary melodic instrument) and tabla (pitched percussion).
  • MIREX05 database: 25 phrase excerpts of 10-40 s from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano.
  • ISMIR04 database: 20 excerpts of about 20 s each.
  • Common characteristics:
      • CD-quality audio (PCM, 16-bit, 44100 Hz)
      • single channel (mono)
      • manually annotated reference data (10 ms time grid)

Output Format

  • In order to allow for generalization among potential approaches (e.g. frame size, hop size), submitted algorithms should output pitch estimates, in Hz, at discrete instants in time.
  • Each line of the output file therefore contains a time stamp, a space or tab separator, and the corresponding frequency value, followed by a new line.
  • The time grid of the reference file is 10 ms; the submission may nevertheless use a different time grid for its output (for example 5.8 ms).
  • Instants identified as unvoiced (no dominant melody) can be reported either as 0 Hz or as a negative pitch value. Reporting a negative pitch value (a pitch guess for an unvoiced frame) may improve the Raw Pitch Accuracy and Raw Chroma Accuracy statistics. A short I/O sketch illustrating this format is given below.
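To make the file format concrete, here is a rough Python sketch (hypothetical function names, a 10 ms hop, and nearest-frame resampling chosen purely for illustration; this is not the official evaluation code) that writes estimates in the layout described above and maps estimates from an arbitrary output grid onto the 10 ms reference grid:

    import bisect

    def write_melody_output(path, frame_values, hop_seconds=0.01):
        """Write one 'time [tab] frequency] line per frame, both as decimal numbers."""
        with open(path, "w") as f:
            for i, freq in enumerate(frame_values):
                f.write(f"{i * hop_seconds:.4f}\t{freq:.4f}\n")

    def resample_to_reference(times, freqs, ref_times):
        """For each reference instant (10 ms grid), pick the estimate closest in time."""
        out = []
        for t in ref_times:
            i = bisect.bisect_left(times, t)
            if i == 0:
                out.append(freqs[0])
            elif i == len(times):
                out.append(freqs[-1])
            else:
                # choose the nearer of the two neighbouring estimates
                nearer_right = (times[i] - t) < (t - times[i - 1])
                out.append(freqs[i] if nearer_right else freqs[i - 1])
        return out

    # Example: four frames, including an unvoiced frame carrying a pitch guess (-220 Hz)
    write_melody_output("melody_output.txt", [0.0, -220.0, 221.5, 223.0])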

Relevant Development Collections

  • MIR-1K: MIR-1K for MIREX (note that this is not the set used for evaluation; the MIREX 2009 dataset used for evaluation last year was created in the same way but has different content and singers).
  • For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions, including excerpts from genres such as Rock, R&B, Pop, Jazz, and Opera, as well as audio synthesized from MIDI. (full test set with the reference transcriptions (28.6 MB))

Potential Participants

  • Chao-Ling Leon Hsu and Jyh-Shing Roger Jang (Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan)