Difference between revisions of "2008:Query by Singing/Humming"

From MIREX Wiki
 
Latest revision as of 04:14, 29 August 2008

Status

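The goal of the Query-by-Singing/Humming (QBSH) task is to evaluate MIR systems that take as input queries sung or hummed by real-world users. More information can be found in:

  • 2006:QBSH: Query-by-Singing/Humming (MIREX2006 QBSH task proposal)
  • 2006:QBSH Discussion Page (MIREX2006 QBSH task discussion)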

Please feel free to edit this page.

Query Data

1. Roger Jang's corpus (MIREX2006 QBSH corpus, http://neural.cs.nthu.edu.tw/jang2/dataSet/childSong4public/QBSH-corpus/), which comprises 2797 queries along with 48 ground-truth MIDI files. All queries start from the beginning of the references.

2. ThinkIT corpus, comprising 355 queries and 106 monophonic ground-truth MIDI files (in MIDI format 0 or 1). There is no "singing from beginning" guarantee. This corpus will be published after the task has been run.

3. Noise MIDI will be the 5000+ Essen collection (accessible at http://www.esac-data.org/).

To build a large test set which can reflect real-world queries, it is suggested that every participant makes a contribution to the evaluation corpus.

Evaluation Corpus Contribution

Every participant will be asked to contribute 100~200 wave queries (8 kHz, 16-bit) as well as the ground-truth MIDI files as test data. Please make sure your contributed data conforms to the format used in the ThinkIT corpus (TITcorpus, http://159.226.60.224/en/Thinkit.QBH.corpus.rar). These test data will be released after the competition as a public-domain QBSH dataset.

A simple tool for recording query data is available at http://mirlab.org/users/davidson833/code/downloads/QBSH_RecordingProgram.rar. You may need .NET 2.0 or above installed on your system to run this program. The generated files conform to the format used in the ThinkIT corpus. Of course, you are also welcome to use your own program to record the query data.

Task description

Classic QBSH evaluation:

  • Input: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and ThinkIT corpus.
  • Database: ground-truth and noise MIDI files (all monophonic), comprising the 48+106 ground-truth files from Roger Jang's and ThinkIT's corpora along with a cleaned version of the Essen database (2000+ MIDI files, which were used last year).
  • Output: top-20 candidate list.
  • Evaluation: Top-10 hit rate (1 point is scored for a hit in the top 10 and 0 is scored otherwise).
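
The scoring rule above can be illustrated with a minimal sketch. It assumes the results have already been loaded into a dictionary mapping each query name to its ranked candidate list, and that the ground truth is available as a dictionary from query name to database key; both structures and all names are illustrative assumptions, not part of the task definition.

```python
def top_n_hit_rate(results, ground_truth, n=10):
    """Fraction of queries whose ground-truth key is among the first n candidates.

    results: dict of query name -> ranked list of database keys (assumed layout).
    ground_truth: dict of query name -> correct database key (assumed layout).
    """
    if not ground_truth:
        return 0.0
    hits = 0
    for query_id, truth in ground_truth.items():
        # A query with no result line scores 0, like any other miss.
        candidates = results.get(query_id, [])
        if truth in candidates[:n]:
            hits += 1  # 1 point for a hit in the top n, 0 otherwise
    return hits / len(ground_truth)


# Made-up example data, for illustration only.
example_results = {
    "query_00001": ["00025", "01003", "02200"],
    "query_00002": ["01547", "02313", "07653"],
}
example_truth = {"query_00001": "01003", "query_00002": "00001"}
print(top_n_hit_rate(example_results, example_truth))
```

In this made-up example the first query is a hit and the second is a miss, so half of the queries score a point.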

To allow algorithms to share intermediate steps, participants are encouraged, following Rainer Typke's suggestion, to submit separate tracker and matcher modules instead of integrated ones. Trackers and matchers from different submissions can then work together through the same pre-defined interface, which makes it possible for us to find the best combination.

Interface I: Breakdown Version

The following is based on a suggestion made by Xiao Wu last year, with some modifications.

1. Database indexing/building. Calling format should look like

indexing %dbMidi.list% %dir_workspace_root%

where %dbMidi.list% is the input list of database MIDI files, each named uniq_key.mid. For example:

./QBSH/Database/00001.mid
./QBSH/Database/00002.mid
./QBSH/Database/00003.mid
./QBSH/Database/00004.mid
...

Output indexed files are placed into %dir_workspace_root%.
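
As a rough illustration of the first step of indexing, the sketch below derives the unique key from each path in a database list. The function name and the duplicate check are assumptions made for this example; the task only specifies the list format and the calling convention.

```python
import os


def read_db_list(lines):
    """Map each unique key (file name without the .mid extension) to its path."""
    index = {}
    for line in lines:
        path = line.strip()
        if not path:
            continue  # ignore blank lines in the list
        key = os.path.splitext(os.path.basename(path))[0]
        if key in index:
            # Assumption: keys must be unique, as the name uniq_key.mid suggests.
            raise ValueError("duplicate key: " + key)
        index[key] = path
    return index


example_list = [
    "./QBSH/Database/00001.mid",
    "./QBSH/Database/00002.mid",
]
print(read_db_list(example_list))
```

A real indexing program would go on to parse each MIDI file and write its own index structures into %dir_workspace_root%; that part depends entirely on the matcher and is not sketched here.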

2. Pitch tracker. Calling format:

pitch_tracker %queryWave.list% %dir_query_pitch%

For each input file dir_query/query_xxxxx.wav listed in %queryWave.list%, the tracker writes a corresponding transcription %dir_query_pitch%/query_xxxxx.pitch, which gives the pitch sequence on the MIDI note scale at a resolution of 10 ms:

0
0
62.23
62.25
62.21
...

Thus a query lasting x seconds should produce a pitch file with 100*x lines. Frames of silence/rest are set to 0.
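
The pitch file format can be sketched as follows, assuming the tracker already has one fundamental-frequency estimate in Hz per 10 ms frame, with 0 marking silence. That input representation and the function names are assumptions for illustration; the conversion uses the standard mapping in which A4 = 440 Hz corresponds to MIDI note 69.

```python
import math


def hz_to_midi(freq_hz):
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(freq_hz / 440.0)


def format_pitch_lines(frame_freqs_hz):
    """Return one text line per 10 ms frame; silence/rest frames become "0"."""
    return [
        "0" if freq <= 0 else "{:.2f}".format(hz_to_midi(freq))
        for freq in frame_freqs_hz
    ]


# Hypothetical frames: two silent frames followed by one frame at 440 Hz.
for line in format_pitch_lines([0, 0, 440.0]):
    print(line)
```

Writing these lines to %dir_query_pitch%/query_xxxxx.pitch, one per line, yields a file with 100 lines per second of audio as long as the tracker emits exactly one estimate every 10 ms.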

3. Pitch matcher. Calling format:

pitch_matcher %dbMidi.list% %queryPitch.list% %resultFile%

where %queryPitch.list% looks like

dir_query_pitch/query_00001.pitch
dir_query_pitch/query_00002.pitch
dir_query_pitch/query_00003.pitch
...

and the result file gives the top-20 candidates (or fewer, if fewer are available) for each query:

query_00001: 00025 01003 02200 ... 
query_00002: 01547 02313 07653 ... 
query_00003: 03142 00320 00973 ... 
...
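
A minimal sketch of reading and writing lines in this result format is given below. The helper names are hypothetical, and truncating to 20 candidates reflects the top-20 output described above rather than any stated parsing rule.

```python
MAX_CANDIDATES = 20  # the task asks for a top-20 candidate list


def format_result_line(query_id, candidates):
    """Build one result line: the query name, a colon, then ranked database keys."""
    return query_id + ": " + " ".join(candidates[:MAX_CANDIDATES])


def parse_result_line(line):
    """Split one result line into (query name, ranked list of database keys)."""
    query_id, _, rest = line.partition(":")
    return query_id.strip(), rest.split()[:MAX_CANDIDATES]


example_line = format_result_line("query_00001", ["00025", "01003", "02200"])
print(example_line)
print(parse_result_line(example_line))
```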

Interface II: Integrated Version

If you want to pack everything together, the calling format should be much simpler:

qbshMainProgram %dbMidi.list% %queryWave.list% %resultFile% %dir_workspace_root%

You can use %dir_workspace_root% to store any temporary indexing/database structures. The result file should have the same format as described for the pitch matcher above.
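
For an integrated submission written in a scripting language, the argument handling might look like the sketch below. Only the argument order comes from the calling format above; the dictionary keys and helper names are illustrative assumptions, and the tracking and matching themselves are left out.

```python
USAGE = "usage: qbshMainProgram dbMidi.list queryWave.list resultFile dir_workspace_root"


def parse_args(argv):
    """Split an argv-style list (program name first) into named arguments."""
    if len(argv) != 5:
        raise SystemExit(USAGE)
    # Key names are hypothetical; only the positional order is given by the task.
    return {
        "db_list": argv[1],
        "query_list": argv[2],
        "result_file": argv[3],
        "workspace": argv[4],
    }


def read_list(path):
    """Return the non-empty lines of a list file, stripped of whitespace."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


example_args = parse_args(
    ["qbshMainProgram", "dbMidi.list", "queryWave.list", "result.txt", "workspace"]
)
print(example_args["result_file"])
```

A complete program would pass sys.argv to parse_args, read both lists with read_list, and then run its own indexing, tracking and matching before writing the result file.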

Participants

If you think there is a slight chance that you might want to participate, please add your name and e-mail address to this list

  • Liang-Yu Davidson Chen and Jyh-Shing Roger Jang (National Tsing Hua University), davidson833 at mirlab dot org, jang at cs dot nthu dot edu dot tw
  • Lei Wang (leiwang.mir at gmail dot com)
  • Xiao Wu (xwu2006 at gmail dot com)
  • Matti Ryynänen and Anssi Klapuri (Tampere University of Technology), matti.ryynanen <at> tut.fi, anssi.klapuri <at> tut.fi

Xiao Wu's Comments

In my opinion, QBSH (even QBH on a monophonic database) is still far from "a solved problem". Many problems are still challenging our systems (robustness in noisy environments, efficiency on databases of 10000 or more files, etc.). So, this year we may set up a tougher test for the participants.

Lei Wang's Comments

I looked through the Essen collection today, and I found that some of its songs are the same as the ground-truth MIDI files in Roger's corpus. These MIDI files should not be added as noise MIDI, right?

Davidson's Comments

Since there are 5000+ noise MIDI files in the Essen collection, and the MIREX committee will add extra MIDI files, the effect of repeated MIDI files should be minimal.

And in terms of the evaluation method, shall we use Mean Reciprocal Rank (MRR) as last year?

Mert's Comments

Hi everyone. The noise collection used in this task is a subset of the Essen collection: 2000 files that were selected manually and that are different from the queries.

Davidson's Comments

A method to alleviate the repetitive MIDI problem is to change the evaluation method to "hit in the top 10", i.e. 1 point is scored if the ground truth MIDI is selected in the top 10 or 0 points otherwise.

Matti's Comments (August 22, 2008)

Hi everyone. I noticed that the submission format for the pitch tracking part just changed from the <onset> <pitch> format to <frame_time> <pitch>, where frame times are on a discrete time grid. However, I already submitted my code with the previous format, and my method requires note onsets instead of frame-wise pitch tracks. I'd prefer the <onset> <pitch> format, since it allows using both note-based and pitch-track-based approaches. Anyhow, my submitted code will run as it is, and I hope that this is no problem for the evaluation.

Leiwang's Comments (August 23, 2008)

Hi, everyone. I found that the evaluation method this year is "hit in the top 10". It is quite different from the previous one. Do we need additional evaluation methods, such as "top 1 hit", "top 3 hit" or "MRR"?

Thinkit's query contribution

The ThinkIT QBH corpus is available at TITcorpus (http://159.226.60.224/en/Thinkit.QBH.corpus.rar): 355 audio files along with 106 MIDI files.

Morten's Comments (August 29, 2008)

Where can we find the cleaned version of the Essen database? At http://www.esac-data.org/ I can find all the files, but how is the "cleaned version" defined? It would be nice if someone could provide the cleaned version in MIDI.