2008:Query by Singing/Humming

From MIREX Wiki
Latest revision as of 05:14, 29 August 2008

Status

The goal of the Query-by-Singing/Humming (QBSH) task is to evaluate MIR systems that take as input queries sung or hummed by real-world users. More information can be found in:

Please feel free to edit this page.

Query Data

1. Roger Jang's corpus (the MIREX2006 QBSH corpus), which comprises 2797 queries along with 48 ground-truth MIDI files. All queries start from the beginning of the reference songs.

2. The ThinkIT corpus, which comprises 355 queries and 106 monophonic ground-truth MIDI files (in MIDI format 0 or 1). Queries are not guaranteed to start from the beginning of the song. This corpus will be published after the task has been run.

3. The noise MIDI files will be the 5000+ Essen collection (available at http://www.esac-data.org/).

To build a large test set that reflects real-world queries, every participant is encouraged to contribute to the evaluation corpus.

Evaluation Corpus Contribution

Every participant will be asked to contribute 100 to 200 wave queries (8 kHz, 16-bit) along with the ground-truth MIDI files as test data. Please make sure your contributed data conform to the format used in the ThinkIT corpus (TITcorpus). These test data will be released after the competition as a public-domain QBSH dataset.

Here is a simple tool for recording query data. You may need .NET 2.0 or above installed on your system to run this program. The generated files conform to the format used in the ThinkIT corpus. You are of course also welcome to use your own program to record the query data.
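Before submitting recordings, it may help to check each file against the "8 kHz, 16-bit" requirement above. The sketch below uses Python's standard `wave` module; the function name `check_query_wav` is hypothetical, and since the page does not state a required channel count, the channel count is only reported, not enforced.

```python
import wave

def check_query_wav(path):
    """Return (ok, info) for a contributed query recording.

    ok is True when the file uses an 8000 Hz sampling rate and 2-byte
    (16-bit) samples. The channel count is reported but not checked,
    because the page does not specify it.
    """
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        info = {
            "rate": rate,
            "sample_width_bytes": w.getsampwidth(),
            "channels": w.getnchannels(),
            "seconds": w.getnframes() / rate,
        }
    ok = info["rate"] == 8000 and info["sample_width_bytes"] == 2
    return ok, info
```

`wave.open` also accepts a file-like object, so the same check works on in-memory data.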

Task description

Classic QBSH evaluation:

  • Input: human singing/humming snippets (.wav). Queries are from Roger Jang's corpus and the ThinkIT corpus.
  • Database: monophonic ground-truth and noise MIDI files, comprising the 48 + 106 ground-truth files from Roger Jang's and ThinkIT's corpora along with a cleaned version of the Essen database (the 2000+ MIDI files used last year).
  • Output: top-20 candidate list.
  • Evaluation: Top-10 hit rate (1 point is scored for a hit in the top 10 and 0 is scored otherwise).
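The top-10 hit rate described above can be sketched as follows. This is an illustration only, not the official MIREX evaluation code; `results` and `ground_truth` are hypothetical dictionaries keyed by query ID.

```python
def top10_hit_rate(results, ground_truth):
    """Top-10 hit rate.

    results: dict mapping query_id -> ranked candidate keys (best first)
    ground_truth: dict mapping query_id -> key of the correct MIDI file
    Each query scores 1 if its ground-truth key is among its first 10
    candidates and 0 otherwise; the rate is the mean score over all queries.
    """
    hits = sum(1 for q, truth in ground_truth.items()
               if truth in results.get(q, [])[:10])
    return hits / len(ground_truth)
```

A query with no result line simply scores 0.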

Following Rainer Typke's suggestion, participants are encouraged to submit separate tracker and matcher modules instead of integrated ones, so that algorithms can share intermediate steps. Trackers and matchers from different submissions can then work together through the same predefined interface, which makes it possible to find the best combination.

Interface I: Breakdown Version

The following is based on Xiao Wu's suggestion from last year, with some modifications.

1. Database indexing/building. The calling format should look like:

indexing %dbMidi.list% %dir_workspace_root%

where %dbMidi.list% is the input list of database MIDI files, each named uniq_key.mid. For example:

./QBSH/Database/00001.mid
./QBSH/Database/00002.mid
./QBSH/Database/00003.mid
./QBSH/Database/00004.mid
...

Output indexed files are placed into %dir_workspace_root%.
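An indexer will typically start by reading %dbMidi.list% and recovering each uniq_key from the file name. A minimal sketch, assuming one path per line with blank lines ignored; `parse_db_list` is a hypothetical helper that takes any iterable of lines, such as an open file.

```python
import os

def parse_db_list(lines):
    """Map each uniq_key to its MIDI path.

    Expects one path per line (e.g. ./QBSH/Database/00001.mid); blank
    lines are skipped and non-.mid entries are rejected.
    """
    db = {}
    for line in lines:
        path = line.strip()
        if not path:
            continue
        key, ext = os.path.splitext(os.path.basename(path))
        if ext.lower() != ".mid":
            raise ValueError(f"not a .mid file: {path}")
        db[key] = path
    return db
```

Usage: `with open(db_list_path) as f: db = parse_db_list(f)`.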

2. Pitch tracker. Calling format:

pitch_tracker %queryWave.list% %dir_query_pitch%

For each input file dir_query/query_xxxxx.wav in %queryWave.list%, the tracker outputs a corresponding transcription %dir_query_pitch%/query_xxxxx.pitch, which gives the pitch sequence on the MIDI note scale at a resolution of 10 ms, one value per line:

0
0
62.23
62.25
62.21
...

Thus an x-second query should produce a pitch file with 100*x lines. Frames of silence or rest are set to 0.
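The sketch below shows one way to write and read this format. It assumes the tracker has already produced one fundamental-frequency estimate in Hz per 10 ms frame, with 0 for unvoiced frames (the F0 estimator itself is not shown). It uses the standard conversion from Hz to the MIDI note scale, 69 + 12·log2(f/440), where A4 = 440 Hz = note 69. The function names are hypothetical.

```python
import math

def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to the MIDI note scale (A4 = 440 Hz = 69).

    Unvoiced frames (f0 <= 0) map to 0, the marker for silence/rest.
    """
    if f0_hz <= 0:
        return 0.0
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)

def format_pitch_file(f0_track_hz):
    """Render one line per 10 ms frame, so an x-second query gives 100*x lines."""
    lines = []
    for f0 in f0_track_hz:
        midi = hz_to_midi(f0)
        lines.append("0" if midi == 0.0 else f"{midi:.2f}")
    return "\n".join(lines) + "\n"

def read_pitch_file(text):
    """Parse the contents of a .pitch file into a list of floats, one per frame."""
    return [float(tok) for tok in text.split()]
```

For example, `format_pitch_file([0, 440, 880])` returns `"0\n69.00\n81.00\n"`.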

3. Pitch matcher. Calling format:

pitch_matcher %dbMidi.list% %queryPitch.list% %resultFile%

where %queryPitch.list% looks like

dir_query_pitch/query_00001.pitch
dir_query_pitch/query_00002.pitch
dir_query_pitch/query_00003.pitch
...

and the result file gives the top-20 candidates (if available) for each query:

query_00001: 00025 01003 02200 ... 
query_00002: 01547 02313 07653 ... 
query_00003: 03142 00320 00973 ... 
...
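A matcher needs to emit, and an evaluator needs to read, lines of the form `query_00001: 00025 01003 02200 ...`. A minimal sketch, assuming the query ID is the pitch file name without its extension and the candidates are uniq_keys ranked best first; both helper names are hypothetical.

```python
def format_result_line(query_id, ranked_keys, top_n=20):
    """Build one result line: 'query_id: key1 key2 ...'.

    At most top_n candidates are written, best first; fewer are written
    if the matcher returned fewer.
    """
    return f"{query_id}: " + " ".join(ranked_keys[:top_n])

def parse_result_file(text):
    """Parse a result file back into {query_id: [ranked keys]}."""
    results = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip blank or malformed lines
        query_id, rest = line.split(":", 1)
        results[query_id.strip()] = rest.split()
    return results
```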

Interface II: Integrated Version

If you prefer to pack everything together, the calling format is much simpler:

qbshMainProgram %dbMidi.list% %queryWave.list% %resultFile% %dir_workspace_root%

You can use %dir_workspace_root% to store any temporary indexing/database structures. The result file should have the same format as mentioned previously.
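If the integrated program is written in Python, its entry point might parse the four positional arguments as sketched below. The page does not prescribe an implementation language; the argument names and the `parse_main_args` function are hypothetical, and only their order follows the calling format above.

```python
import argparse

def parse_main_args(argv=None):
    """Parse the integrated calling format:
    qbshMainProgram %dbMidi.list% %queryWave.list% %resultFile% %dir_workspace_root%
    """
    p = argparse.ArgumentParser(prog="qbshMainProgram")
    p.add_argument("db_midi_list", help="list of database MIDI files")
    p.add_argument("query_wave_list", help="list of query .wav files")
    p.add_argument("result_file", help="output file, same format as Interface I")
    p.add_argument("workspace_root", help="directory for temporary index data")
    return p.parse_args(argv)
```

Passing `argv=None` makes `argparse` read `sys.argv[1:]`, so the same function serves both the real command line and tests.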

Participants

If you think there is a slight chance that you might want to participate, please add your name and e-mail address to this list:

  • Liang-Yu Davidson Chen and Jyh-Shing Roger Jang (National Tsing Hua University), davidson833 at mirlab dot org, jang at cs dot nthu dot edu dot tw
  • Lei Wang (leiwang.mir at gmail dot com)
  • Xiao Wu (xwu2006 at gmail dot com)
  • Matti Ryynänen and Anssi Klapuri (Tampere University of Technology), matti.ryynanen <at> tut.fi, anssi.klapuri <at> tut.fi

Xiao Wu's Comments

In my opinion, QBSH (even QBH on a monophonic database) is still far from "a solved problem". Many problems still challenge our systems (robustness in noisy environments, efficiency on databases of 10000 or more songs, etc.). So this year we may set up a tougher test for the participants.

Lei Wang's Comments

I looked through the Essen collection today, and I found that some of its songs are the same as ground-truth MIDIs in Roger's corpus. These MIDI files should not be added as noise MIDIs, right?
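One way to screen noise files for such duplicates is to compare transposition-invariant signatures of the melodies. The sketch below is an illustrative heuristic, not the procedure used to build the cleaned Essen set. It assumes each MIDI file has already been reduced to a list of note numbers (MIDI parsing is not shown), and it only catches note-for-note duplicates, possibly transposed; the helper names are hypothetical.

```python
def interval_signature(midi_pitches):
    """Transposition-invariant signature of a monophonic melody:
    the tuple of successive pitch intervals in semitones."""
    return tuple(b - a for a, b in zip(midi_pitches, midi_pitches[1:]))

def find_duplicates(ground_truth, noise):
    """Return the noise keys whose signature matches some ground-truth melody.

    Both arguments map key -> list of MIDI note numbers.
    """
    truth_sigs = {interval_signature(p) for p in ground_truth.values()}
    return sorted(k for k, p in noise.items()
                  if interval_signature(p) in truth_sigs)
```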

Davidson's Comments

Since there are 5000+ noise MIDIs in the Essen collection, and the MIREX committee will add more, the effect of repeated MIDIs should be minimal.

As for the evaluation method, shall we use Mean Reciprocal Rank (MRR) as last year?

Mert's Comments

Hi everyone. The noise collection used in this task is a subset of the Essen collection: 2000 files that were selected manually to be different from the queries.

Davidson's Comments

One way to alleviate the repeated-MIDI problem is to change the evaluation method to "hit in the top 10", i.e. 1 point is scored if the ground-truth MIDI is among the top 10 candidates and 0 points otherwise.

Matti's Comments (August 22, 2008)

Hi everyone. I noticed that the submission format for the pitch tracking part just changed from the <onset> <pitch> format to <frame_time> <pitch>, where frame times lie on a discrete time grid. However, I already submitted my code in the previous format, and my method requires note onsets instead of frame-wise pitch tracks. I'd prefer the <onset> <pitch> format, since it allows both note-based and pitch-track-based approaches. Anyhow, my submitted code will run as is, and I hope this is not a problem for the evaluation.
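A note-based matcher that receives the frame-wise format can recover approximate note onsets by segmenting the track. The sketch below is a deliberately simple heuristic chosen for illustration, not Matti's method: a new note starts after silence or when the pitch jumps by more than `max_jump` semitones between frames, and very short segments are dropped. The function name and both thresholds are hypothetical.

```python
def frames_to_notes(frames, frame_ms=10, max_jump=0.5, min_frames=3):
    """Group a frame-wise pitch track (10 ms frames, 0 = silence) into notes.

    Returns a list of (onset_ms, duration_ms, mean_midi_pitch). A note ends at
    silence or at a pitch jump larger than max_jump semitones; segments shorter
    than min_frames frames are discarded.
    """
    notes = []
    start, seg = None, []

    def flush(end_idx):
        # Close the current segment (frames start..end_idx-1) if long enough.
        if len(seg) >= min_frames:
            notes.append((start * frame_ms, (end_idx - start) * frame_ms,
                          sum(seg) / len(seg)))

    for i, p in enumerate(frames):
        if p > 0 and seg and abs(p - seg[-1]) <= max_jump:
            seg.append(p)  # same note continues
            continue
        if seg:
            flush(i)
        start, seg = (i, [p]) if p > 0 else (None, [])
    if seg:
        flush(len(frames))
    return notes
```

Smaller `max_jump` values split vibrato into many short notes, so the thresholds would need tuning on real queries.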

Lei Wang's Comments (August 23, 2008)

Hi everyone. I noticed that the evaluation method this year is "hit in the top 10", which is quite different from last year's. Do we need additional evaluation measures, such as "top 1 hit", "top 3 hit" or "MRR"?
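The measures suggested here can all be computed in one pass from the rank of the ground truth in each candidate list. A minimal sketch, not the official evaluation code: it assumes the same hypothetical `results` and `ground_truth` dictionaries as a parsed result file, counts a ground truth missing from the list as a miss contributing 0 to MRR, and ranks from 1.

```python
def evaluate(results, ground_truth, ks=(1, 3, 10)):
    """Return {'top1': ..., 'top3': ..., 'top10': ..., 'MRR': ...}.

    results: query_id -> ranked candidate keys (best first)
    ground_truth: query_id -> correct key
    """
    n = len(ground_truth)
    hits = {k: 0 for k in ks}
    rr_sum = 0.0
    for q, truth in ground_truth.items():
        ranked = results.get(q, [])
        if truth not in ranked:
            continue  # miss: no hit, reciprocal rank 0
        rank = ranked.index(truth) + 1  # ranks start at 1
        rr_sum += 1.0 / rank
        for k in ks:
            if rank <= k:
                hits[k] += 1
    scores = {f"top{k}": hits[k] / n for k in ks}
    scores["MRR"] = rr_sum / n
    return scores
```

Because only the top-20 candidates are reported, a ground truth ranked below 20 is indistinguishable from one that was never found, so the MRR computed this way is truncated at rank 20.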

ThinkIT's query contribution

The ThinkIT QBH corpus (355 audio files along with 106 MIDI files) is available at TITcorpus.

Morten's Comments (August 29, 2008)

Where can we find the cleaned version of the Essen database? At http://www.esac-data.org/ I can find all the files, but how is the "cleaned version" defined? It would be nice if someone could provide the cleaned version in MIDI.