Difference between revisions of "2009:Audio Melody Extraction"

Revision as of 11:57, 15 July 2009

Description

The text of this section is copied from the 2008 page. Please add your comments and discussions for 2009.

The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio. The task consists of two parts: voicing detection (deciding whether a particular time frame contains a "melody pitch" or not) and pitch detection (deciding the most likely melody pitch for each time frame). We structure the submission to allow these parts to be done independently, i.e. it is possible (via a negative pitch value) to guess a pitch even for frames that were judged unvoiced. Algorithms that do not discriminate between melodic and non-melodic parts are also welcome!
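As a rough illustration of how voicing and pitch can be carried by a single signed value, here is a minimal Python sketch. The function name `decode_estimate` is hypothetical, and treating 0 Hz as "unvoiced, no pitch guess" is an assumption that this description does not state.

```python
from typing import Tuple


def decode_estimate(freq_hz: float) -> Tuple[bool, float]:
    """Split one signed frequency value into (voiced, pitch_hz).

    Assumed convention (only the negative case comes from the task text):
      freq_hz > 0  -> frame judged voiced, pitch is freq_hz
      freq_hz < 0  -> frame judged unvoiced, but |freq_hz| is still a pitch guess
      freq_hz == 0 -> frame judged unvoiced, no pitch guess (assumption)
    """
    voiced = freq_hz > 0.0
    pitch_hz = abs(freq_hz)
    return voiced, pitch_hz


# Three example frames: voiced, unvoiced with a guess, unvoiced without one.
frames = [220.0, -220.0, 0.0]
decoded = [decode_estimate(f) for f in frames]
```

With this split, the voicing decision and the pitch guess can be scored separately, which is the independence the paragraph above describes.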

(The audio melody extraction evaluation will essentially be a re-run of last year's contest, i.e. the same test data will be used.)

Discussions for 2009

Your comments here.

New evaluations for 2009?

We would like to know whether there are potential participants for this year's evaluation on Audio Melody Extraction.

There was also interest last year in evaluating the results at the note level (rather than frame by frame), following the multipitch evaluation. However, this was not done, probably because of a lack of both participants and a suitable database. Would there be more people this year?

cheers, Jean-Louis, 9th July 2009

Chao-Ling's Comments 14/07/2009

Hi everyone. I would like to suggest that we have a separate evaluation on the songs where the main melody is carried by the human singing voice as opposed to other musical instruments (as in Vishu's comment in MIREX2008). We have proposed a pitch extraction approach for singing voices, and it may not perform well for other instruments.

In addition, we have prepared a dataset called MIR-1K and would like to add it as part of the training/evaluation dataset. It contains 1000 song clips recorded at a 16 kHz sample rate with 16-bit resolution. The duration of each clip ranges from 4 to 13 seconds, and the total length of the dataset is 133 minutes. These clips were extracted from 110 karaoke songs, each of which contains a mixed track and a music accompaniment track.

Dataset

MIREX05 database: 25 phrase excerpts of 10-40 s from the following genres: Rock, R&B, Pop, Jazz, Solo classical piano
ISMIR04 database: 20 excerpts of about 20 s each
CD-quality (PCM, 16-bit, 44100 Hz)
single channel (mono)
manually annotated reference data (10 ms time grid)

Output Format

In order to allow for generalization among potential approaches (i.e. frame size, hop size, etc.), submitted algorithms should output pitch estimates, in Hz, at discrete instants in time. The output file therefore contains, in succession, the time stamp [space or tab] the corresponding frequency value. [...] Chroma Accuracy may be improved.
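The line layout described above (time stamp, a space or tab, then the frequency in Hz) could be written and read back as in the following Python sketch. The helper names `write_melody` and `read_melody` are hypothetical. The time stamp in seconds, the tab separator, and the number of decimals are all assumptions, not requirements stated here.

```python
import io
from typing import Iterable, List, TextIO, Tuple


def write_melody(out: TextIO, estimates: Iterable[Tuple[float, float]]) -> None:
    """Write one 'time<TAB>frequency' line per estimate.

    Assumptions: time stamps are in seconds, frequencies in Hz, and a
    negative frequency marks a pitch guess for a frame judged unvoiced.
    """
    for time_s, freq_hz in estimates:
        out.write(f"{time_s:.3f}\t{freq_hz:.2f}\n")


def read_melody(inp: TextIO) -> List[Tuple[float, float]]:
    """Parse lines separated by a space or a tab; skip malformed lines."""
    rows = []
    for line in inp:
        parts = line.split()  # splits on any run of spaces or tabs
        if len(parts) != 2:
            continue
        rows.append((float(parts[0]), float(parts[1])))
    return rows


# Round trip through an in-memory buffer instead of a real file.
buf = io.StringIO()
write_melody(buf, [(0.0, 0.0), (0.01, 440.0), (0.02, -440.0)])
parsed = read_melody(io.StringIO(buf.getvalue()))
```

Splitting on any whitespace keeps the reader tolerant of both separators allowed by the format.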

Relevant Test Collections

For the ISMIR 2004 Audio Description Contest, the Music Technology Group of the Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions, including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. (full test set with the reference transcriptions (28.6 MB))
Graham's collection: the test set and further explanations can be found via the pages http://www.ee.columbia.edu/~graham/mirex_melody/ and http://labrosa.ee.columbia.edu/projects/melody/

Potential Participants

Vishweshwara Rao & Preeti Rao (Indian Institute of Technology Bombay, India)
Jean-Louis Durrieu, Gaël Richard and Bertrand David (Institut Télécom, Télécom ParisTech, CNRS LTCI, Paris, France)
Chao-Ling Leon Hsu, Jyh-Shing Roger Jang, and Liang-Yu Davidson Chen (Department of Computer Science, National Tsing-Hua University, Hsinchu, Taiwan)
Morten Wendelboe (Institute of Computer Science, Copenhagen University, Denmark)
Sihyun Joo & Seokhwan Jo & Chan D. Yoo (Korea Advanced Institute of Science and Technology, Daejeon, Korea)
