2007:Multiple Fundamental Frequency Estimation & Tracking
Revision as of 15:09, 15 February 2007

Description

A complex music signal can be represented by the F0 contours of its constituent sources, a representation that is useful in many music information retrieval systems. There have been many attempts at multi-F0 estimation and at the related task of melody extraction. The goal of multiple-F0 tracking is to extract the F0 contour of each source from a complex music signal. In this task we would like to evaluate state-of-the-art multi-F0 tracking algorithms. Since F0 tracking of all sources in a complex audio mixture can be very hard, we have to restrict the problem space. The possible cases are:

1. Multiple instruments active at the same time, each playing monophonically (one note at a time) and each with a different timbre, in a single-channel input.

2. Multiple sources each playing polyphonically (e.g. chords…) in a single channel input.

3. Multiple sources each playing polyphonically in a stereo panned mixture.

We are most interested in the first case, which is general yet feasible. The third case, which is a subset of the first, should be considered a subtask: in most professional recordings, sources are recorded individually and panned across the two stereo channels, and researchers should take advantage of that.

Data

Since extracting the F0 contours of all sources is a challenging task, the number of sources should be limited to 4-5 pitched instruments (no percussion). Annotating the ground-truth data is an important issue. One option is to start with MIDI files and use a realistic synthesizer to create the audio, which gives completely accurate ground truth. A real-world data set could be the RWC database, but it is already available to participants. Please make your recommendations on creating a database for this task.
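To illustrate the MIDI option, here is a minimal sketch of deriving a frame-level ground-truth F0 grid from MIDI note events. The note list, 10 ms hop size, and helper names are illustrative choices, not part of the task specification.

```python
# Sketch: frame-level ground truth from MIDI note events.
# Hop size and note list below are illustrative assumptions.

def midi_to_hz(pitch):
    """MIDI note number -> frequency in Hz (A4 = 440 Hz)."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def ground_truth_track(notes, duration_sec, hop_sec=0.01):
    """One F0 value per analysis frame; 0.0 marks unvoiced frames.

    notes: list of (onset_sec, offset_sec, midi_pitch) for one source.
    """
    n_frames = int(duration_sec / hop_sec) + 1
    track = [0.0] * n_frames
    for onset, offset, pitch in notes:
        start = int(round(onset / hop_sec))
        stop = min(int(round(offset / hop_sec)), n_frames)
        for i in range(start, stop):
            track[i] = midi_to_hz(pitch)
    return track

# A single source playing A4 (MIDI 69) from 0.10 s to 0.30 s:
track = ground_truth_track([(0.10, 0.30, 69)], duration_sec=0.5)
```

One such track per source gives the set of reference contours a submitted system would be scored against.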

Evaluation

The evaluation will be similar to the previous Audio Melody Extraction tasks, based on voicing and F0 detection for each source. Each F0 contour extracted from a song by the proposed system will be scored against whichever ground-truth contour for that song yields the highest score. A second score, based on the raw per-frame frequency estimates without tracking, will also be reported.
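The best-match scoring described above can be sketched as follows. The 50-cent tolerance is an assumption borrowed from melody-extraction practice, not part of this task's definition.

```python
import math

def frame_accuracy(est, ref, tol_cents=50.0):
    """Fraction of frames where voicing and F0 agree (0.0 = unvoiced frame)."""
    correct = 0
    for e, r in zip(est, ref):
        if r == 0.0 and e == 0.0:
            correct += 1                              # both unvoiced
        elif r > 0.0 and e > 0.0:
            if abs(1200.0 * math.log2(e / r)) <= tol_cents:
                correct += 1                          # voiced, within tolerance
    return correct / len(ref)

def best_match_score(est_contour, ref_contours):
    """Score an estimated contour against its best-fitting ground-truth contour."""
    return max(frame_accuracy(est_contour, ref) for ref in ref_contours)
```

The raw per-frame score mentioned above would use `frame_accuracy` directly on the pooled frequency estimates, without the contour matching step.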

Comments

chunghsin yeh

Reading the above suggestion, we don't understand exactly how the contours are defined. If a contour is like a melody, the problem seems ill-posed. We therefore suppose the different contours correspond to the f0 contours of individual notes. The task would then consist of multiple levels of evaluation using different data sets.

1. single frame evaluation

 using either artificially mixed monophonic samples:
 -- mixing with equal/non-equal energy
 -- random mix or musical mix
 or midi recordings as suggested above
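The artificial-mixture option above can be sketched as follows; scaling by RMS is one plausible reading of "equal/non-equal energy", and the toy signals and ratio are illustrative.

```python
import math

def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix(a, b, ratio_db=0.0):
    """Sum two equal-length signals, scaling b so its energy sits
    ratio_db decibels relative to a (0.0 dB = equal energy)."""
    gain = (rms(a) / rms(b)) * 10.0 ** (ratio_db / 20.0)
    return [x + gain * y for x, y in zip(a, b)]

# Equal-energy mix of two toy signals:
m = mix([1.0, -1.0], [0.5, -0.5])
```

A "random mix" would draw the monophonic samples at random, while a "musical mix" would pick pitch combinations that occur in real scores.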

Note, however, that even with midi recordings the ground truth is not perfect: note-off events will not necessarily align with the end of the instrument's sound, unless you plan to interrupt the sound at the note-off. One may define a tolerance range after the note-off event, in which the f0 of the note may or may not be detected by the algorithms. Frames in the tolerance range would not be evaluated, as long as any f0 detected there is the correct f0 of the previous note.
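The tolerance idea above could be implemented as a per-frame skip mask; the 10-frame window and 50-cent match threshold here are illustrative choices, not proposed standards.

```python
import math

def evaluable_frames(ref, est, tol_frames=10, tol_cents=50.0):
    """Return one flag per frame: False means 'skip this frame when scoring'.

    After each note-off in ref (voiced -> unvoiced transition), up to
    tol_frames frames are skipped whenever the estimate still reports
    the f0 of the note that just ended.
    """
    keep = [True] * len(ref)
    for i in range(1, len(ref)):
        if ref[i] == 0.0 and ref[i - 1] > 0.0:        # note-off at frame i
            prev_f0 = ref[i - 1]
            for j in range(i, min(i + tol_frames, len(ref))):
                if ref[j] > 0.0:                      # next note has begun
                    break
                if est[j] > 0.0 and abs(1200.0 * math.log2(est[j] / prev_f0)) <= tol_cents:
                    keep[j] = False                   # lingering f0: tolerated
    return keep

flags = evaluable_frames([440.0, 440.0, 0.0, 0.0], [440.0, 440.0, 440.0, 0.0])
```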

2. multiple frames (tracking) evaluation

  using the midi database as above.

We're willing to share our single frame database (artificial mixtures) as well as some scripts for building the reference data.

cyeh(at)ircam.fr

Moderators

Mert Bay mertbay@uiuc.edu, Andreas Ehmann aehmann@uiuc.edu, Anssi Klapuri klap@cs.tut.fi