2007:Query by Singing/Humming


Status

The goal of the Query-by-Singing/Humming (QBSH) task is to evaluate MIR systems that take queries sung or hummed by real-world users as input. More information can be found in:

Please feel free to edit this page.

Query Data

1. Roger Jang's corpus (MIREX2006 QBSH corpus), which comprises 2797 queries along with 48 ground-truth MIDI files. All queries are sung from the beginning of the references.

2. ThinkIT corpus, comprising 355 queries and 106 monophonic ground-truth MIDI files (in MIDI format 0 or 1). There is no "singing from the beginning" guarantee. This corpus will be published after the task has been run.

3. Noise MIDI files will be the 5000+ Essen collection (accessible at http://www.esac-data.org/).

To build a large test set which can reflect real-world queries, it is suggested that every participant makes a contribution to the evaluation corpus.

Task description

Classic QBSH evaluation:

  • Input: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and ThinkIT corpus.
  • Database: ground-truth and noise MIDI files (all monophonic), comprising the 48+106 ground-truth files from Roger Jang's and ThinkIT's corpora along with a cleaned version of the Essen database (the 2000+ MIDIs used last year).
  • Output: top-20 candidate list.
  • Evaluation: Mean Reciprocal Rank (MRR) and Top-X hit rate.
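The two evaluation measures can be sketched in a few lines of Python. This is a minimal illustration, not the official MIREX scoring code: the function names and the two dictionaries (query name to candidate list, query name to ground-truth key) are assumptions made here for the example, and a query whose ground truth is missing from its candidate list is assumed to contribute 0 to both measures.

```python
from typing import Dict, List


def reciprocal_rank(candidates: List[str], truth: str) -> float:
    """Return 1/rank of the ground-truth key, or 0.0 if it is not listed."""
    for rank, key in enumerate(candidates, start=1):
        if key == truth:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results: Dict[str, List[str]],
                         truths: Dict[str, str]) -> float:
    """Average the reciprocal rank over all queries that have a ground truth."""
    if not truths:
        return 0.0
    total = sum(reciprocal_rank(results.get(query, []), truth)
                for query, truth in truths.items())
    return total / len(truths)


def top_x_hit_rate(results: Dict[str, List[str]],
                   truths: Dict[str, str], x: int) -> float:
    """Fraction of queries whose ground truth is among the first x candidates."""
    if not truths:
        return 0.0
    hits = sum(1 for query, truth in truths.items()
               if truth in results.get(query, [])[:x])
    return hits / len(truths)


if __name__ == "__main__":
    # Hypothetical candidate lists and ground-truth keys, for illustration only.
    results = {"query_00001": ["00025", "01003", "02200"],
               "query_00002": ["01547", "02313", "07653"]}
    truths = {"query_00001": "01003", "query_00002": "09999"}
    print(mean_reciprocal_rank(results, truths))  # (1/2 + 0) / 2 = 0.25
    print(top_x_hit_rate(results, truths, 3))     # 1 hit out of 2 = 0.5
```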

So that algorithms can share intermediate steps, participants are encouraged to submit separate transcriber and matcher modules instead of integrated ones, following Rainer Typke's suggestion. Transcribers and matchers from different submissions can then work together through the same pre-defined interface, which makes it possible to find the best combination. In addition, note-based approaches (symbolic approaches) and pitch-contour-based approaches (non-symbolic approaches?) will be compared.

[Figure: 2007 framework.jpg]

Participants

If you think there is a slight chance that you might want to participate, please add your name and e-mail address to this list:

  • Xiao Wu & Ming Li ({xwu,mli} at hccl dot ioa dot ac dot cn)
  • Mohamed Sordo & Maarten Grachten (e-mail at my homepage)
  • Niko Mikkila (mikkila at cs dot helsinki dot fi)
  • Rainer Typke (rainer dot typke at ofai dot at) (note matcher; I need a MySQL database to participate)
  • Carlos Gómez (cgomez at ldc dot usb dot ve) (note matcher)
  • Ean Nugent (nugente at andrews dot edu) (I would like more background information)
  • Jiang Danning (jiangdn at cn dot ibm dot com)
  • J.-S. Roger Jang (jang at cs dot nthu dot edu dot tw)
  • Alexandra Uitdenbogerd (sandrau at rmit dot edu dot au)
  • Ming Xuan, Zou (felix at wayne dot cs dot nthu dot edu dot tw)
  • Shen Huang & Lei Wang ({shenhuang,leiwang} at hitic dot ia dot ac dot cn)

Interface suggestion commented by Xiao Wu

1. Database indexing/building. Calling format should look like

indexing %db_list% %dir_workspace_root%

where db_list is the input list of database midi files named as uniq_key.mid. For example:

./QBSH/Database/00001.mid
./QBSH/Database/00002.mid
./QBSH/Database/00003.mid
./QBSH/Database/00004.mid
...

Output indexed files are placed into dir_workspace_root.
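As a rough sketch of how an indexer might read db_list, the snippet below maps each unique key (the file name without the .mid extension) to its path. The function name and the duplicate-key check are assumptions made for this example; the interface above only specifies the list format and the uniq_key.mid naming.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable


def keys_from_db_list(lines: Iterable[str]) -> Dict[str, str]:
    """Map each unique key (file name without .mid) to its listed path."""
    keys: Dict[str, str] = {}
    for line in lines:
        path = line.strip()
        if not path:
            continue  # tolerate blank lines in the list file
        key = PurePosixPath(path).stem
        if key in keys:
            # Assumption: keys must be unique, as the name uniq_key suggests.
            raise ValueError(f"duplicate database key {key!r}")
        keys[key] = path
    return keys


if __name__ == "__main__":
    listing = ["./QBSH/Database/00001.mid\n", "./QBSH/Database/00002.mid\n"]
    for key, path in keys_from_db_list(listing).items():
        print(key, path)
```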

2. Note transcriber. Calling format:

note_transcriber %query.list% %dir_query_note%

Each input file dir_query/query_xxxxx.wav listed in query.list produces a transcription dir_query_note/query_xxxxx.note. Each text line of the generated note file represents one query note, formatted as note %onset_time% %duration% %midi_note%. Example:

note 2000 250 62.25
note 2250 250 62.03
note 2500 500 64.42
note 3200 220 62.30
...

Here onset_time and duration are given in milliseconds.
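A matcher has to read these note files back in. The following is a minimal parsing sketch; the Note type and the function name are hypothetical, and rejecting any line that does not have exactly the four fields described above is an assumption rather than part of the proposed interface.

```python
from typing import Iterable, List, NamedTuple


class Note(NamedTuple):
    onset_ms: float     # onset time in milliseconds
    duration_ms: float  # duration in milliseconds
    midi_note: float    # possibly fractional MIDI note number


def parse_note_lines(lines: Iterable[str]) -> List[Note]:
    """Parse lines of the form 'note <onset_time> <duration> <midi_note>'."""
    notes: List[Note] = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 4 or fields[0] != "note":
            raise ValueError(f"line {number}: malformed note line {line!r}")
        onset, duration, midi = (float(value) for value in fields[1:])
        notes.append(Note(onset, duration, midi))
    return notes


if __name__ == "__main__":
    sample = ["note 2000 250 62.25", "note 2250 250 62.03"]
    for note in parse_note_lines(sample):
        print(note.onset_ms, note.duration_ms, note.midi_note)
```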

3. Note matcher. Calling format:

note_matcher %note.list% %result%

where note.list looks like

dir_query_note/query_00001.note
dir_query_note/query_00002.note
dir_query_note/query_00003.note
...

and the result file gives the top-20 candidates (if available) for each query:

query_00001: 00025 01003 02200 ... 
query_00002: 01547 02313 07653 ... 
query_00003: 03142 00320 00973 ... 
...
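For illustration, the result format could be written and read with helpers like the ones below. The function names are hypothetical, and cutting each list to at most 20 candidates is an assumption about how an over-long list would be handled.

```python
from typing import Dict, Iterable, List


def format_result_line(query: str, candidates: List[str],
                       max_candidates: int = 20) -> str:
    """Build one result line: the query name, a colon, then the candidate keys."""
    return f"{query}: {' '.join(candidates[:max_candidates])}"


def parse_result_lines(lines: Iterable[str],
                       max_candidates: int = 20) -> Dict[str, List[str]]:
    """Read result lines into a mapping from query name to candidate keys."""
    results: Dict[str, List[str]] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        query, separator, rest = line.partition(":")
        if not separator:
            raise ValueError(f"missing ':' in result line {line!r}")
        # A query may have fewer than 20 candidates, or none at all.
        results[query.strip()] = rest.split()[:max_candidates]
    return results


if __name__ == "__main__":
    line = format_result_line("query_00001", ["00025", "01003", "02200"])
    print(line)
    print(parse_result_lines([line]))
```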

4. Pitch tracker. Calling format:

pitch_tracker %query.list% %dir_query_pitch%

Each input file dir_query/query_xxxxx.wav listed in query.list produces a corresponding pitch file dir_query_pitch/query_xxxxx.pitch, which gives the pitch sequence on the MIDI note scale at a resolution of 10 ms:

0
0
62.23
62.25
62.21
...

Thus a query lasting x seconds should produce a pitch file with 100*x lines. Frames of silence/rest are set to 0.
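The pitch file format can be illustrated with a short sketch. The helper names are hypothetical. The Hz-to-MIDI conversion uses the common convention that A4 = 440 Hz corresponds to MIDI note 69, which the interface above does not state explicitly and is assumed here.

```python
import math
from typing import Iterable, List

FRAMES_PER_SECOND = 100  # one pitch value every 10 ms


def hz_to_midi(frequency_hz: float) -> float:
    """Convert a frequency to a fractional MIDI note; 0 marks silence/rest."""
    if frequency_hz <= 0.0:
        return 0.0
    # Assumed tuning convention: A4 = 440 Hz = MIDI note 69.
    return 69.0 + 12.0 * math.log2(frequency_hz / 440.0)


def parse_pitch_lines(lines: Iterable[str]) -> List[float]:
    """Read one pitch value per line, skipping blank lines."""
    return [float(line) for line in lines if line.strip()]


def duration_seconds(pitch: List[float]) -> float:
    """A file with 100*x lines corresponds to a query lasting x seconds."""
    return len(pitch) / FRAMES_PER_SECOND


def voiced_frames(pitch: List[float]) -> List[float]:
    """Drop the frames marked as silence/rest (value 0)."""
    return [value for value in pitch if value != 0.0]


if __name__ == "__main__":
    pitch = parse_pitch_lines(["0", "0", "62.23", "62.25", "62.21"])
    print(duration_seconds(pitch), len(voiced_frames(pitch)))
```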

5. Pitch matcher. Similar to the note matcher:

pitch_matcher %pitch.list% %result%

6. Hybrid matcher. Both note and pitch are utilized. Calling format:

note_pitch_matcher %note.list% %pitch.list% %result%

Comments from Xiao Wu

The ThinkIT QBH corpus is now available at TITcorpus. In all, there are 355 audio files along with 106 MIDI files.

Comments from Roger Jang

Is there any time constraint on running each query against 5000+ MIDIs? [It seems a little bit quiet on this page.]

Comments from Xiao Wu

To Roger: Stephen suggested using the "cleaned version" of the Essen folk songs (2000+ MIDIs, which were also used in MIREX QBSH 2006), so the problem size is not that large.

Comments from Xiao Wu

Note that the list parameters on the command line, such as "query.list" and "note.list", are list FILES rather than multiline arguments. Thanks to Carlos for pointing out this ambiguity.
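To make the point about list files concrete, here is a minimal sketch of how an evaluation script might write such a file and pass its path as a single argument. The helper names are hypothetical, and the executable name is simply taken from the calling format above; an actual submission may be named differently.

```python
import os
import tempfile
from typing import List, Sequence


def write_list_file(paths: Sequence[str], list_path: str) -> None:
    """Write one path per line; the file itself is what gets passed on the command line."""
    with open(list_path, "w", encoding="utf-8", newline="\n") as handle:
        for path in paths:
            handle.write(f"{path}\n")


def note_transcriber_command(query_list: str, dir_query_note: str) -> List[str]:
    """Build the argument vector: executable, list FILE path, output directory."""
    return ["note_transcriber", query_list, dir_query_note]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workspace:
        list_path = os.path.join(workspace, "query.list")
        write_list_file(["dir_query/query_00001.wav",
                         "dir_query/query_00002.wav"], list_path)
        # The list file path is one argument, not one argument per query.
        print(note_transcriber_command(list_path, "dir_query_note"))
```

The resulting argument list could be handed to subprocess.run once a real transcriber executable is available.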