2006:Audio Melody Extraction

From MIREX Wiki

Latest revision as of 13:15, 13 May 2010
== Results ==

Results are on the [[2006:Audio Melody Extraction Results]] page.
== Goal ==

To extract the melody line from polyphonic audio.
== Description ==

The aim of the MIREX audio melody extraction evaluation is to identify the melody pitch contour from polyphonic musical audio. The task consists of two parts: voicing detection (deciding whether a particular time frame contains a "melody pitch" or not) and pitch detection (deciding the most likely melody pitch for each time frame). The submission is structured so that these parts can be done independently, i.e. it is possible (via a negative pitch value) to guess a pitch even for frames judged unvoiced. Algorithms that do not discriminate between melodic and non-melodic parts are also welcome!

(The audio melody extraction evaluation is essentially a re-run of last year's contest, i.e. the same test data is used.)
'''Dataset:'''

* 25 phrase excerpts of 10-40 sec from the following genres: Rock, R&B, Pop, Jazz, and solo classical piano
* CD quality (PCM, 16-bit, 44100 Hz)
* single channel (mono)
* manually annotated reference data (10 ms time grid)
'''Output Format:'''

* In order to allow for generalization among potential approaches (e.g. frame size, hop size), submitted algorithms should output pitch estimates, in Hz, at discrete instants in time.
* Each line of the output file therefore contains a time stamp, a space or tab, and the corresponding frequency value, followed by a newline.
* The time grid of the reference file is 10 ms, but the submission may use a different time grid for its output (for example 5.8 ms).
* Instants identified as unvoiced (no dominant melody) can be reported either as 0 Hz or as a negative pitch value. If negative pitch values are given, the statistics for Raw Pitch Accuracy and Raw Chroma Accuracy may be improved.
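The output convention above can be sketched in a few lines. This is an illustrative example only (the function and file names are hypothetical, not part of any official MIREX kit):

```python
def write_melody_output(path, times, freqs):
    """Write one "time<TAB>frequency" pair per line, time in seconds
    and frequency in Hz. Unvoiced frames may be written as 0 Hz, or as
    a negative value if the algorithm still wants its pitch guess
    scored for Raw Pitch/Chroma Accuracy."""
    with open(path, "w") as f:
        for t, hz in zip(times, freqs):
            f.write(f"{t:.4f}\t{hz:.2f}\n")

# Example: a 5.8 ms hop; the second frame is judged unvoiced but still
# carries a pitch guess of 220 Hz, encoded as -220.
write_melody_output("melody.txt", [0.0, 0.0058, 0.0116], [440.0, -220.0, 441.5])
```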
'''Relevant Test Collections'''

* For the ISMIR 2004 Audio Description Contest, the Music Technology Group of Pompeu Fabra University assembled a diverse set of audio segments and corresponding melody transcriptions, including audio excerpts from such genres as Rock, R&B, Pop, Jazz, Opera, and MIDI. ([http://www.iua.upf.es/mtg/ismir2004/contest/melodyContest/FullSet.zip full test set with the reference transcriptions (28.6 MB)])
* Graham's collection: the test set is available [http://www.ee.columbia.edu/~graham/mirex_melody/melody_example_files.zip here], with further explanations at [http://www.ee.columbia.edu/~graham/mirex_melody/ http://www.ee.columbia.edu/~graham/mirex_melody/] and [http://labrosa.ee.columbia.edu/projects/melody/ http://labrosa.ee.columbia.edu/projects/melody/].

'''ATTENTION!''' The timing offsets (and time grids) of the test collections vary. Use Graham's collection to adjust the timing offset of your algorithm.
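Timing alignment matters because estimates are scored frame by frame against the 10 ms reference grid. The sketch below is illustrative and not the official MIREX evaluation code: it assumes nearest-neighbour resampling onto the reference grid and the commonly used quarter-tone (50 cent) tolerance for Raw Pitch Accuracy and Raw Chroma Accuracy.

```python
import math

def resample_to_grid(times, freqs, n_frames, hop=0.010, offset=0.0):
    """Nearest-neighbour resampling of a pitch track onto the reference
    grid (10 ms by default). `offset` shifts the submission's time
    stamps, e.g. to compensate for a collection's timing offset."""
    out, j = [], 0
    for i in range(n_frames):
        t = i * hop
        # advance while the next estimate is at least as close to t
        while j + 1 < len(times) and \
                abs(times[j + 1] + offset - t) <= abs(times[j] + offset - t):
            j += 1
        out.append(freqs[j])
    return out

def _cents(f_est, f_ref):
    return 1200.0 * math.log2(f_est / f_ref)

def raw_pitch_accuracy(ref, est, tol_cents=50.0):
    """Fraction of voiced reference frames (ref > 0) whose estimate is
    within tol_cents of the reference pitch. abs() recovers pitch
    guesses reported as negative values on frames judged unvoiced."""
    voiced = [(r, abs(e)) for r, e in zip(ref, est) if r > 0]
    if not voiced:
        return 0.0
    hits = sum(1 for r, e in voiced if e > 0 and abs(_cents(e, r)) <= tol_cents)
    return hits / len(voiced)

def raw_chroma_accuracy(ref, est, tol_cents=50.0):
    """Like raw pitch accuracy, but octave errors are forgiven by
    folding the cent difference onto a single octave."""
    voiced = [(r, abs(e)) for r, e in zip(ref, est) if r > 0]
    if not voiced:
        return 0.0
    hits = 0
    for r, e in voiced:
        if e > 0:
            d = _cents(e, r) % 1200.0
            if min(d, 1200.0 - d) <= tol_cents:
                hits += 1
    return hits / len(voiced)
```

For example, an 880 Hz estimate against a 440 Hz reference is an octave error: it misses Raw Pitch Accuracy but is credited by Raw Chroma Accuracy.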
  
== Submission Procedure ==

See the official set of [https://www.music-ir.org/mirex2006/index.php/MIREX_2006_Submission_Instructions submission guidelines] for MIREX 2006, and additionally Andreas' instructions about the [https://www.music-ir.org/mirex2006/index.php/Best_Coding_Practices_for_MIREX "Best Coding Practices for MIREX"].
 
== Potential Participants ==

The following researchers have confirmed their interest in participating:

* Karin Dressler (Fraunhofer IDMT) - dresslkn@idmt.fraunhofer.de - Likely
* Matti Ryynänen and Anssi Klapuri (Tampere University of Technology) - matti.ryynanen@tut.fi - Likely
* Graham Poliner and Dan Ellis (Columbia University)
* Paul Brossier (Queen Mary, University of London)

Additional potential participants include:

* Emmanuel Vincent (Queen Mary, University of London)
* Rui Pedro Paiva (University of Coimbra)
* Matija Marolt (University of Ljubljana)
* Masataka Goto (AIST)
== Evaluation Results ==

See the [https://www.music-ir.org/mirex2006/index.php/Audio_Melody_Extraction_Results Audio Melody Extraction results page].

== Dan's comments ==

In the absence of better suggestions, I propose that we re-run the 2005 audio melody extraction in 2006, i.e. use the same test data at least. We may subsequently come up with some improved metrics (particularly with a view to significance testing), but basically the results will be directly comparable with last year's. --[[User:Dpwe|Dpwe]] 15:28, 23 March 2006 (CST)
== Karin's comments ==

'''Published and "secret" test data:'''

I would find it very useful if melody extraction results were computed for two different data sets: one published (for example the ISMIR2004 data set) and one "secret" (the MIREX2005 data set). This way the new results will be comparable with last year's results, and at the same time we gain better insight into the performance of the algorithms in very specific situations, because we have access to the music files. I think this knowledge would be very valuable for the further development of the algorithms.