Difference between revisions of "2007:Query by Singing/Humming"

From MIREX Wiki

Revision as of 10:28, 24 May 2007

Status

The goal of the Query-by-Singing/Humming (QBSH) task is to evaluate MIR systems that take queries sung or hummed by real-world users as input. More information can be found in:

Please feel free to edit this page.

Query Data

1. Roger Jang's corpus (MIREX2006 QBSH corpus), which comprises 2797 queries along with 48 ground-truth MIDI files. All queries start from the beginning of the references.

2. ThinkIT corpus, comprising 355 queries and 106 monophonic ground-truth MIDI files (in MIDI format 0 or 1). There is no "singing from beginning" guarantee. This corpus will be published after the task has been run.

3. Noise MIDI will be the 5000+ Essen collection (can be accessed from http://www.esac-data.org/).

To build a large test set which can reflect real-world queries, it is suggested that every participant makes a contribution to the evaluation corpus.

Task description

Classic QBSH evaluation:

  • Input: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and ThinkIT corpus.
  • Database: ground-truth and noise MIDI files (all monophonic), comprising the 48+106 ground-truth files from Roger Jang's and ThinkIT's corpora along with the 5000+ Essen noise MIDI files.
  • Output: top-20 candidate list.
  • Evaluation: Mean Reciprocal Rank (MRR) and Top-X hit rate.
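The two evaluation measures named above can be sketched as follows. This is only an illustration under stated assumptions: the function names, the dictionary layout, and the ground-truth keys in the example are hypothetical and not part of the task definition. A query whose correct item is missing from its candidate list is assumed to contribute a reciprocal rank of 0.

```python
def reciprocal_rank(candidates, truth):
    """Return 1/rank of `truth` in the ranked candidate list, or 0.0 if absent."""
    for rank, key in enumerate(candidates, start=1):
        if key == truth:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results, ground_truth):
    """Average the reciprocal rank over every query listed in ground_truth."""
    total = sum(reciprocal_rank(results.get(q, []), t)
                for q, t in ground_truth.items())
    return total / len(ground_truth)


def top_x_hit_rate(results, ground_truth, x):
    """Fraction of queries whose correct item appears among the first x candidates."""
    hits = sum(1 for q, t in ground_truth.items()
               if t in results.get(q, [])[:x])
    return hits / len(ground_truth)


# Hypothetical data: candidate lists per query and the correct database key.
results = {
    "query_00001": ["00025", "01003", "02200"],
    "query_00002": ["01547", "02313", "07653"],
}
ground_truth = {"query_00001": "01003", "query_00002": "09999"}

mrr = mean_reciprocal_rank(results, ground_truth)
top1 = top_x_hit_rate(results, ground_truth, 1)
top3 = top_x_hit_rate(results, ground_truth, 3)
```

In this made-up example the first query is answered at rank 2 and the second is missed, so only the first contributes to either measure.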

So that algorithms can share intermediate steps, participants are encouraged, following Rainer Typke's suggestion, to submit separate transcriber and matcher modules rather than integrated systems. Transcribers and matchers from different submissions can then work together through the same pre-defined interface, which makes it possible to find the best combination. In addition, note-based approaches (symbolic approaches) and pitch-contour-based approaches (non-symbolic approaches?) are compared.

File:Framework.jpg

Participants

If you think there is a slight chance that you might want to participate, please add your name and e-mail address to this list:

  • Xiao Wu (xwu at hccl dot ioa dot ac dot cn)
  • Maarten Grachten (maarten dot grachten at jku dot at)
  • Jiang Danning (jiangdn at cn dot ibm dot com)
  • Niko Mikkila (mikkila at cs dot helsinki dot fi)
  • Rainer Typke (rainer dot typke at ofai dot at) (note matcher; I need a MySQL database to participate)
  • Carlos Gómez (cgomez at ldc dot usb dot ve) (note matcher)
  • Ean Nugent (nugente at andrews dot edu) (I would like more background information)

Interface suggestion commented by xwu

1. Database indexing/building. Calling format should look like

indexing %db_list% %dir_workspace_root%

where db_list is the input list of database midi files named as uniq_key.mid. For example:

./QBSH/Database/00001.mid
./QBSH/Database/00002.mid
./QBSH/Database/00003.mid
./QBSH/Database/00004.mid
...

Output indexed files are placed into dir_workspace_root.

2. Note transcriber. Calling format:

note_transcriber %query.list% %dir_query_note%

For each input file dir_query/query_xxxxx.wav in query.list, the transcriber writes a transcription dir_query_note/query_xxxxx.note. Each text line of the generated note file represents one query note, formatted as note %onset_time% %duration% %midi_note%. Example:

note 2000 250 62.25
note 2250 250 62.03
note 2500 500 64.42
note 3200 220 62.30
...

Here onset_time and duration are given in milliseconds.
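A matcher reading this note format might parse it roughly as below. This is a minimal sketch, not a reference parser: the Note type and the parse_notes function are hypothetical names, and lines that do not match the four-field "note" layout (blank lines, for instance) are assumed to be skipped rather than treated as errors.

```python
from typing import List, NamedTuple


class Note(NamedTuple):
    onset_ms: float      # onset time in milliseconds
    duration_ms: float   # duration in milliseconds
    midi_note: float     # possibly fractional MIDI note number


def parse_notes(text: str) -> List[Note]:
    """Parse lines of the form 'note <onset> <duration> <midi_note>'."""
    notes = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a 4-field "note" record.
        if len(fields) != 4 or fields[0] != "note":
            continue
        notes.append(Note(float(fields[1]), float(fields[2]), float(fields[3])))
    return notes


sample = """note 2000 250 62.25
note 2250 250 62.03
note 2500 500 64.42
note 3200 220 62.30
"""
notes = parse_notes(sample)

# Pitch intervals between consecutive notes, in semitones.
intervals = [b.midi_note - a.midi_note for a, b in zip(notes, notes[1:])]
```

The interval list is included only to show one thing a note matcher could derive from the parsed records; the page does not prescribe any particular matching feature.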

3. Note matcher. Calling format:

note_matcher %note.list% %result%

where note.list looks like

dir_query_note/query_00001.note
dir_query_note/query_00002.note
dir_query_note/query_00003.note
...

and the result file gives the top-20 candidates (or fewer, if fewer are available) for each query:

query_00001: 00025 01003 02200 ... 
query_00002: 01547 02313 07653 ... 
query_00003: 03142 00320 00973 ... 
...
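An evaluation script could read such a result file along the following lines. This is a sketch under assumptions: parse_results is a hypothetical helper, candidates are assumed to be whitespace-separated database keys after the colon, and lines without a colon are ignored.

```python
def parse_results(text, max_candidates=20):
    """Map each query name to its ranked list of candidate database keys."""
    results = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # ignore blank or malformed lines
        query, _, rest = line.partition(":")
        # Keep at most max_candidates entries, in the order they are listed.
        results[query.strip()] = rest.split()[:max_candidates]
    return results


sample = (
    "query_00001: 00025 01003 02200\n"
    "query_00002: 01547 02313 07653\n"
    "query_00003: 03142 00320 00973\n"
)
parsed = parse_results(sample)
```

Keys are kept as strings so that leading zeros in names such as 00025 are preserved and still match the uniq_key part of the database file names.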

4. Pitch tracker. Calling format:

pitch_tracker %query.list% %dir_query_pitch%

For each input file dir_query/query_xxxxx.wav in query.list, the tracker writes a corresponding file dir_query_pitch/query_xxxxx.pitch, which gives the pitch sequence on the MIDI note scale at a resolution of 10 ms:

0
0
62.23
62.25
62.21
...

Thus a query lasting x seconds should produce a pitch file with 100*x lines. Frames of silence/rest are set to 0.
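The pitch file layout can be illustrated with a short sketch. It assumes that a tracker produces one fundamental-frequency estimate in Hz per 10 ms frame and converts it with the usual equal-temperament convention (A4 = 440 Hz = MIDI note 69); the page itself does not specify how the MIDI values are obtained, and all function names here are hypothetical.

```python
import math

FRAME_MS = 10  # frame resolution stated for the pitch file


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def expected_frames(duration_s):
    """Number of lines a pitch file should have for a query of duration_s seconds."""
    return round(duration_s * 1000 / FRAME_MS)


def format_pitch_lines(freqs_hz):
    """One line per frame; unvoiced frames (frequency <= 0) are written as 0."""
    return "\n".join("0" if f <= 0 else f"{hz_to_midi(f):.2f}" for f in freqs_hz)


# Two silent frames followed by two voiced frames (hypothetical estimates).
lines = format_pitch_lines([0, 0, 293.66, 440.0])
frames_for_8s = expected_frames(8)
```

Writing silence as a literal 0 follows the example listing above; two decimal places are used only because the example values are shown that way.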

5. Pitch matcher. Similar to the note matcher:

 pitch_matcher %pitch.list% %result%

6. Hybrid matcher. Both note and pitch are utilized. Calling format:

 note_pitch_matcher %note.list% %pitch.list% %result%