2007:Query by Singing/Humming

From MIREX Wiki

Latest revision as of 22:00, 19 December 2011

Status

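The goal of the Query-by-Singing/Humming (QBSH) task is to evaluate MIR systems whose input consists of queries sung or hummed by real-world users. More information can be found in:

  • MIREX2006 QBSH Task Proposal (2006:QBSH:_Query-by-Singing/Humming)
  • MIREX2006 QBSH Task Discussion (2006:QBSH_Discussion_Page)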

Please feel free to edit this page.

Query Data

1. Roger Jang's corpus (the MIREX2006 QBSH corpus), comprising 2797 queries and 48 ground-truth MIDI files. All queries start from the beginning of the reference songs.

2. The ThinkIT corpus, comprising 355 queries and 106 monophonic ground-truth MIDI files (MIDI format 0 or 1). Queries are not guaranteed to start from the beginning of the song. This corpus will be published after the task has been run.

3. The noise MIDI files will be the Essen collection of 5000+ MIDIs (available at http://www.esac-data.org/).

To build a large test set that reflects real-world queries, every participant is encouraged to contribute to the evaluation corpus.

Task description

Classic QBSH evaluation:

  • Input: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and the ThinkIT corpus.
  • Database: monophonic ground-truth and noise MIDI files, comprising the 48 + 106 ground-truth files from Roger Jang's and ThinkIT's corpora plus a cleaned version of the Essen database (the 2000+ MIDIs used last year).
  • Output: top-20 candidate list.
  • Evaluation: Mean Reciprocal Rank (MRR) and Top-X hit rate.
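
The page does not spell out how the two measures are computed. A minimal Python sketch, assuming each query has exactly one correct song and that a query whose correct song is missing from its top-20 list scores a reciprocal rank of 0 (the function names, dictionary layout and toy inputs are illustrative, not part of the task definition):

```python
def reciprocal_rank(candidates, truth):
    """Return 1/rank of the ground-truth ID in the ranked candidate list, or 0.0 if absent."""
    for rank, cand in enumerate(candidates, start=1):
        if cand == truth:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results, ground_truth):
    """results: query ID -> ranked song IDs; ground_truth: query ID -> correct song ID."""
    scores = [reciprocal_rank(results.get(q, []), t) for q, t in ground_truth.items()]
    return sum(scores) / len(scores)


def top_x_hit_rate(results, ground_truth, x):
    """Fraction of queries whose correct song appears among the first x candidates."""
    hits = sum(1 for q, t in ground_truth.items() if t in results.get(q, [])[:x])
    return hits / len(ground_truth)


if __name__ == "__main__":
    # Toy data for illustration only.
    results = {
        "query_00001": ["00025", "01003", "02200"],
        "query_00002": ["01547", "02313", "07653"],
    }
    truth = {"query_00001": "01003", "query_00002": "09999"}
    print(mean_reciprocal_rank(results, truth))  # 0.25
    print(top_x_hit_rate(results, truth, 1))     # 0.0
    print(top_x_hit_rate(results, truth, 3))     # 0.5
```

A query whose correct song appears at rank 2 contributes 1/2 to the MRR average and counts as a hit for Top-3 but not for Top-1.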

Following Rainer Typke's suggestion, participants are encouraged to submit separate transcriber and matcher modules rather than integrated systems, so that algorithms can share intermediate steps. Transcribers and matchers from different submissions can then be combined through the same pre-defined interface, which makes it possible to find the best combination. In addition, note-based (symbolic) approaches and pitch-contour-based (non-symbolic?) approaches will be compared.

Figure: 2007 framework (2007_framework.jpg)

Participants

If you think there is even a slight chance that you might want to participate, please add your name and e-mail address to this list:

  • Xiao Wu & Ming Li ({xwu,mli} at hccl dot ioa dot ac dot cn)
  • Mohamed Sordo & Maarten Grachten (e-mail at my homepage: http://waits.cp.jku.at/~maarten/)
  • Niko Mikkila (mikkila at cs dot helsinki dot fi)
  • Rainer Typke (rainer dot typke at ofai dot at) (note matcher; I need a MySQL database to participate)
  • Carlos Gómez (cgomez at ldc dot usb dot ve) (note matcher)
  • Ean Nugent (nugente at andrews dot edu) (I would like more background information)
  • Jiang Danning (jiangdn at cn dot ibm dot com)
  • J.-S. Roger Jang (jang at cs dot nthu dot edu dot tw)
  • Alexandra Uitdenbogerd (sandrau at rmit dot edu dot au)
  • Ming Xuan, Zou (felix at wayne dot cs dot nthu dot edu dot tw)
  • Shen Huang & Lei Wang ((shenhuang,leiwang) at hitic dot ia dot ac dot cn)

Interface suggestion by Xiao Wu

1. Database indexing/building. The calling format should look like:

indexing %db_list% %dir_workspace_root%

where db_list is a list file of the database MIDI files, each named uniq_key.mid. For example:

./QBSH/Database/00001.mid
./QBSH/Database/00002.mid
./QBSH/Database/00003.mid
./QBSH/Database/00004.mid
...

The indexed output files are placed in dir_workspace_root.
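
A possible skeleton for the indexing step, as a sketch: it reads db_list as a list file (one path per line) and takes uniq_key from each file name. The index.tsv file written to dir_workspace_root is a hypothetical placeholder; a real submission would store whatever index its own matcher needs.

```python
import os


def read_list_file(path):
    """Read a list file: one file path per line; blank lines are skipped."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def keys_from_db_list(paths):
    """Map each uniq_key (file name without the .mid extension) to its path."""
    index = {}
    for p in paths:
        key, ext = os.path.splitext(os.path.basename(p))
        if ext.lower() != ".mid":
            raise ValueError(f"expected a .mid file, got: {p}")
        index[key] = p
    return index


def build_index(db_list_path, workspace_root):
    """Hypothetical indexing step: record uniq_key -> path in the workspace."""
    index = keys_from_db_list(read_list_file(db_list_path))
    os.makedirs(workspace_root, exist_ok=True)
    # Placeholder index format: one "uniq_key<TAB>path" line per database file.
    with open(os.path.join(workspace_root, "index.tsv"), "w", encoding="utf-8") as out:
        for key in sorted(index):
            out.write(f"{key}\t{index[key]}\n")
    return index
```

A command-line wrapper for indexing %db_list% %dir_workspace_root% would simply call build_index(sys.argv[1], sys.argv[2]).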

2. Note transcriber. Calling format:

note_transcriber %query.list% %dir_query_note%

For each input file dir_query/query_xxxxx.wav listed in query.list, the transcriber writes a transcription dir_query_note/query_xxxxx.note. Each line of the generated note file represents one query note in the format note %onset_time% %duration% %midi_note%. Example:

note 2000 250 62.25
note 2250 250 62.03
note 2500 500 64.42
note 3200 220 62.30
...

Here onset_time and duration are given in milliseconds.
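
A small parser for the .note format, as a sketch. It assumes onset_time and duration are integer milliseconds, as in the example above, and that midi_note may be fractional; the Note type and function names are hypothetical.

```python
from typing import NamedTuple


class Note(NamedTuple):
    onset_ms: int     # onset time in milliseconds
    duration_ms: int  # duration in milliseconds
    pitch: float      # MIDI note number, possibly fractional


def parse_note_lines(lines):
    """Parse lines of the form 'note <onset_time> <duration> <midi_note>'."""
    notes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # tolerate blank lines
        if fields[0] != "note" or len(fields) != 4:
            raise ValueError(f"malformed note line: {line!r}")
        notes.append(Note(int(fields[1]), int(fields[2]), float(fields[3])))
    return notes


def format_note(note):
    """Write a Note back out in the transcriber's text format."""
    return f"note {note.onset_ms} {note.duration_ms} {note.pitch:.2f}"


if __name__ == "__main__":
    example = [
        "note 2000 250 62.25",
        "note 2250 250 62.03",
        "note 2500 500 64.42",
        "note 3200 220 62.30",
    ]
    notes = parse_note_lines(example)
    # Rest between consecutive notes = next onset - (onset + duration).
    gaps = [b.onset_ms - (a.onset_ms + a.duration_ms) for a, b in zip(notes, notes[1:])]
    print(gaps)  # [0, 0, 200]
```

Because duration is stored separately from onset_time, rests show up as gaps between one note's end and the next note's onset, such as the 200 ms rest before the last note of the example.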

3. Note matcher. Calling format:

note_matcher %note.list% %result%

where note.list looks like

dir_query_note/query_00001.note
dir_query_note/query_00002.note
dir_query_note/query_00003.note
...

and the result file gives the top-20 candidates (if available) for each query:

query_00001: 00025 01003 02200 ... 
query_00002: 01547 02313 07653 ... 
query_00003: 03142 00320 00973 ... 
...
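
Writing and reading the result file can be sketched as follows, assuming the query ID is the note (or pitch) file name without its extension and that candidate IDs are the uniq_key values from db_list. These helper names are illustrative only.

```python
import os


def query_id_from_path(path):
    """'dir_query_note/query_00001.note' -> 'query_00001'."""
    return os.path.splitext(os.path.basename(path))[0]


def format_result_line(query_id, candidates, top_n=20):
    """'<query_id>: <id1> <id2> ...' with at most top_n candidates, best first."""
    return f"{query_id}: " + " ".join(candidates[:top_n])


def parse_result_lines(lines):
    """Read result lines back into {query_id: [candidate IDs in rank order]}."""
    results = {}
    for line in lines:
        if not line.strip():
            continue
        query_id, _, rest = line.partition(":")
        results[query_id.strip()] = rest.split()
    return results
```

Truncating to top_n when writing keeps each line within the top-20 limit even if a matcher scores the whole database.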

4. Pitch tracker. Calling format:

pitch_tracker %query.list% %dir_query_pitch%

For each input file dir_query/query_xxxxx.wav listed in query.list, the tracker writes a corresponding dir_query_pitch/query_xxxxx.pitch file containing the pitch sequence on the MIDI note scale at a resolution of 10 ms:

0
0
62.23
62.25
62.21
...

Thus an x-second query should produce a pitch file with 100*x lines. Frames of silence or rest are set to 0.
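
The pitch tracking itself is up to each participant. Once a tracker has produced one F0 estimate (in Hz) per 10 ms frame, converting it to the .pitch format only needs the standard mapping MIDI = 69 + 12 * log2(f / 440 Hz). The sketch below assumes unvoiced frames are reported as 0 or None; the function names are hypothetical.

```python
import math


def hz_to_midi(freq_hz):
    """Fractional MIDI note number for a frequency in Hz (A4 = 440 Hz = note 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def pitch_file_lines(frame_f0_hz):
    """One output line per 10 ms frame: MIDI pitch to two decimals, '0' for silence/rest.

    frame_f0_hz holds one F0 estimate per 10 ms frame from some pitch tracker;
    values that are None or <= 0 are treated as unvoiced.
    """
    lines = []
    for f0 in frame_f0_hz:
        if f0 is None or f0 <= 0:
            lines.append("0")
        else:
            lines.append(f"{hz_to_midi(f0):.2f}")
    return lines


def expected_line_count(duration_s, hop_ms=10):
    """An x-second query at 10 ms resolution gives 100*x lines."""
    return round(duration_s * 1000 / hop_ms)
```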

5. Pitch matcher. Similar to the note matcher:

 pitch_matcher %pitch.list% %result%
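
The page leaves the matching algorithm open. As a deliberately naive illustration (not any participant's method), a pitch matcher could drop rest frames, subtract the median pitch so the comparison ignores the singer's key, and slide the query over each reference contour. It assumes the database MIDI files have already been rendered into the same 10 ms MIDI-pitch representation, and it ignores tempo differences, which practical systems typically handle with techniques such as dynamic time warping or linear scaling.

```python
from statistics import median


def voiced(contour):
    """Keep only voiced frames; 0 marks silence/rest in the .pitch format."""
    return [p for p in contour if p > 0]


def center(values):
    """Subtract the median so a transposed melody yields the same sequence."""
    m = median(values)
    return [v - m for v in values]


def contour_distance(query, reference):
    """Smallest mean absolute difference (semitones) between the centered query and
    any equally long, centered window of the reference. No tempo alignment is done."""
    q = voiced(query)
    r = voiced(reference)
    n = len(q)
    if n == 0 or len(r) < n:
        return float("inf")
    q = center(q)
    best = float("inf")
    for start in range(len(r) - n + 1):
        window = center(r[start:start + n])
        best = min(best, sum(abs(a - b) for a, b in zip(q, window)) / n)
    return best


def rank_candidates(query, references, top_n=20):
    """references maps uniq_key -> contour; return up to top_n keys, closest first."""
    return sorted(references, key=lambda k: contour_distance(query, references[k]))[:top_n]
```

Sliding over every start position lets queries that do not begin at the start of the song, as in the ThinkIT corpus, still find a match.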

6. Hybrid matcher. Uses both the note and the pitch transcriptions. Calling format:

 note_pitch_matcher %note.list% %pitch.list% %result%
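
One hypothetical way to build a hybrid matcher is late fusion: run a note matcher and a pitch matcher, rescale each matcher's distances to [0, 1], and rank by a weighted sum. The weight and the min-max normalization below are assumptions for illustration, not a prescribed method.

```python
def min_max(scores):
    """Rescale a {candidate: distance} dict to [0, 1]; all-equal scores map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in scores.items()}


def fuse(note_dist, pitch_dist, weight=0.5, top_n=20):
    """Rank candidates by weight*note + (1 - weight)*pitch after per-matcher
    normalization. Only candidates scored by both matchers are considered."""
    common = note_dist.keys() & pitch_dist.keys()
    if not common:
        return []
    n = min_max({k: note_dist[k] for k in common})
    p = min_max({k: pitch_dist[k] for k in common})
    combined = {k: weight * n[k] + (1 - weight) * p[k] for k in common}
    # Break ties by candidate ID so the ranking is deterministic.
    return sorted(common, key=lambda k: (combined[k], k))[:top_n]
```

With weight=1.0 the ranking reduces to the note matcher alone, and with weight=0.0 to the pitch matcher alone.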

Comments from Xiao Wu

The ThinkIT QBH corpus is now available at TITcorpus (http://159.226.60.224/en/Thinkit.QBH.corpus.rar). In total it contains 355 audio files and 106 MIDI files.

Comments from Roger Jang

Is there a time constraint on running each query against 5000+ MIDIs? [This page seems a little quiet.]

Comments from Xiao Wu

To Roger: Stephen suggested using the "cleaned version" of the Essen folk songs (the 2000+ MIDIs also used in MIREX QBSH 2006), so the problem size is not that large.

Comments from Xiao Wu

Note that command-line list parameters such as "query.list" and "note.list" are list FILES, not multiline arguments. Thanks to Carlos for pointing out this ambiguity.