2008:Query by Singing/Humming

From MIREX Wiki
Revision as of 12:33, 13 August 2008

Status

The goal of the Query-by-Singing/Humming (QBSH) task is the evaluation of MIR systems that take sung or hummed queries from real-world users as input. More information can be found in:

Please feel free to edit this page.

Query Data

1. Roger Jang's corpus (MIREX2006 QBSH corpus), comprising 2797 queries along with 48 ground-truth MIDI files. All queries are sung from the beginning of the reference songs.

2. ThinkIT corpus, comprising 355 queries and 106 monophonic ground-truth MIDI files (in MIDI format 0 or 1). There is no guarantee that queries are sung from the beginning. This corpus will be published after the task has been run.

3. Noise MIDI files will come from the 5000+ item Essen collection (available at http://www.esac-data.org/).

To build a large test set that reflects real-world queries, it is suggested that every participant contribute to the evaluation corpus.

Evaluation Corpus Contribution

Every participant will be asked to contribute 100~200 wave queries as test data. These queries will be released after the competition as a public-domain QBSH dataset. Programs for recording wave queries will be provided shortly. We aim for an evaluation dataset of around 1000~2000 wave queries in total.

Task description

Classic QBSH evaluation:

  • Input: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and ThinkIT corpus.
  • Database: ground-truth and noise MIDI files (all monophonic), comprising the 48+106 ground-truth files from Roger Jang's and the ThinkIT corpora along with a cleaned version of the Essen database (the 2000+ MIDIs used last year).
  • Output: top-20 candidate list.
  • Evaluation: Mean Reciprocal Rank (MRR) and Top-X hit rate (see the sketch after this list).
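
As an illustration only, the sketch below computes MRR and Top-X hit rate from a result file in the format described under Interface below, given a ground-truth mapping from query id to the correct database key; the function and variable names are assumptions for this sketch, not part of the task specification.

 # Minimal evaluation sketch (assumed names): MRR and Top-X hit rate
 # over result lines of the form "query_xxxxx: key1 key2 ...".
 def evaluate(result_path, ground_truth, top_x=10):
     reciprocal_ranks = []
     hits = 0
     with open(result_path) as f:
         for line in f:
             if not line.strip():
                 continue
             query_id, candidates = line.split(":", 1)
             ranked = candidates.split()
             truth = ground_truth[query_id.strip()]
             if truth in ranked:
                 rank = ranked.index(truth) + 1   # 1-based rank of the correct song
                 reciprocal_ranks.append(1.0 / rank)
                 if rank <= top_x:
                     hits += 1
             else:
                 reciprocal_ranks.append(0.0)     # correct song not in the top-20 list
     mrr = sum(reciprocal_ranks) / len(reciprocal_ranks)
     hit_rate = hits / float(len(reciprocal_ranks))
     return mrr, hit_rate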

Following Rainer Typke's suggestion, participants are encouraged to submit separate transcriber and matcher modules rather than integrated systems, so that algorithms can share intermediate steps. Transcribers and matchers from different submissions can then work together through the same pre-defined interface, which makes it possible to find the best combination.

Interface

The following is based on Xiao Wu's suggestion from last year, with some modifications.

1. Database indexing/building. Calling format should look like

indexing %db_list% %dir_workspace_root%

where db_list is the input list of database MIDI files, each named uniq_key.mid. For example:

./QBSH/Database/00001.mid
./QBSH/Database/00002.mid
./QBSH/Database/00003.mid
./QBSH/Database/00004.mid
...

Output indexed files are placed into dir_workspace_root.
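
A minimal sketch of a module following this calling convention is shown below; the index format and the note-extraction step are placeholders (assumptions), since the task only fixes the command line and the output directory.

 # Minimal indexing sketch (internals are assumed): read db_list, take the
 # unique key from each MIDI file name, and write one index file per song
 # into dir_workspace_root. extract_note_sequence() is a placeholder for
 # whatever melody representation a submission actually uses.
 import json
 import os
 import sys
 
 def extract_note_sequence(midi_path):
     # Placeholder: a real module would parse the MIDI file here.
     return []
 
 def main(db_list, dir_workspace_root):
     os.makedirs(dir_workspace_root, exist_ok=True)
     with open(db_list) as f:
         for line in f:
             midi_path = line.strip()
             if not midi_path:
                 continue
             uniq_key = os.path.splitext(os.path.basename(midi_path))[0]
             entry = {"key": uniq_key, "notes": extract_note_sequence(midi_path)}
             with open(os.path.join(dir_workspace_root, uniq_key + ".idx"), "w") as out:
                 json.dump(entry, out)
 
 if __name__ == "__main__":
     main(sys.argv[1], sys.argv[2])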

2. Pitch tracker. Calling format:

pitch_transcriber %query.list% %dir_query_pitch%

Each input file dir_query/query_xxxxx.wav listed in query.list produces a transcription dir_query_pitch/query_xxxxx.pitch, and each text line of the generated pitch file represents one query pitch, formatted as pitch %onset_time% %duration% %midi_pitch%. Example:

pitch 2000 250 297.93
pitch 2250 250 294.17
pitch 2500 500 337.72
pitch 3200 220 298.80
...

Here onset_time and duration are given in milliseconds.
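
Note that the pitch values in the example (297.93, 294.17, ...) appear to be frequencies in Hz rather than MIDI note numbers (297.93 Hz corresponds to MIDI pitch 62.25). Under that assumption, a small parsing and conversion sketch looks like this:

 # Sketch (assumes the last column is a frequency in Hz): parse a .pitch
 # file and convert each value to a fractional MIDI note number.
 import math
 
 def hz_to_midi(freq_hz):
     # MIDI 69 = A4 = 440 Hz, 12 semitones per octave.
     return 69.0 + 12.0 * math.log2(freq_hz / 440.0)
 
 def read_pitch_file(path):
     events = []
     with open(path) as f:
         for line in f:
             parts = line.split()
             if len(parts) != 4 or parts[0] != "pitch":
                 continue
             onset_ms, duration_ms = int(parts[1]), int(parts[2])
             events.append((onset_ms, duration_ms, hz_to_midi(float(parts[3]))))
     return events
 
 # Example: hz_to_midi(297.93) is about 62.25, i.e. a quarter tone above D4.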

3. Pitch matcher. Calling format:

pitch_matcher %db_list% %pitch.list% %result%

where pitch.list looks like

dir_query_pitch/query_00001.pitch
dir_query_pitch/query_00002.pitch
dir_query_pitch/query_00003.pitch
...

and the result file gives the top-20 candidates (if available) for each query:

query_00001: 00025 01003 02200 ... 
query_00002: 01547 02313 07653 ... 
query_00003: 03142 00320 00973 ... 
...
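
For completeness, a matcher could write its result file as in the sketch below; the data layout (a dict of ranked keys per query) is an assumption for this sketch, not prescribed by the task.

 # Sketch (assumed data layout): write one result line per query, listing
 # up to 20 candidate database keys from best to worst match.
 def write_results(result_path, ranked_candidates):
     # ranked_candidates: {"query_00001": ["00025", "01003", ...], ...}
     with open(result_path, "w") as out:
         for query_id, keys in sorted(ranked_candidates.items()):
             out.write("%s: %s\n" % (query_id, " ".join(keys[:20])))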

Participants

If you think there is even a slight chance that you might want to participate, please add your name and e-mail address to this list.

  • Liang-Yu Davidson Chen (davidson833 at mirlab dot org)
  • Lei Wang (leiwang.mir at gmail dot com)
  • Xiao Wu (xwu2006 at gmail dot com)
  • Matti Ryynänen and Anssi Klapuri (Tampere University of Technology), matti.ryynanen <at> tut.fi, anssi.klapuri <at> tut.fi

Xiao Wu's Comments

In my opinion, QBSH (even QBH against a monophonic database) is still far from "a solved problem". Many problems still challenge our systems (robustness in noisy environments, efficiency in databases of 10000 or more items, etc.). So this year we may set up a tougher test for the participants.