2007:Symbolic Melodic Similarity
Revision as of 21:32, 10 August 2007
Overview
This page is devoted to discussions of the MIREX 07 Symbolic Melodic Similarity contest. Discussions on the MIREX 07 Symbolic Melodic Similarity contest planning list will be briefly digested on this page. A full digest of the discussions is available to subscribers from the MIREX 07 Symbolic Melodic Similarity contest planning list archives. You can subscribe to this list to participate in the discussion.
Additionally, you can read information about the Symbolic Melodic Similarity tasks that were run in the 2005 and 2006 MIREX editions.
Task description
Retrieve the most similar items from a collection of symbolic documents, given a query, and rank them by melodic similarity. The following tasks could be defined this year:
Task 1: Monophonic to monophonic. Both the query and the documents in the collection will be monophonic.
Task 2: Monophonic to polyphonic. The documents will be polyphonic (i.e. can have simultaneous notes), but the query will still be monophonic.
Task 3: Polyphonic to polyphonic. Both the query and documents will be polyphonic.
For now, the description of these tasks is intentionally open; the details are to be determined in the discussion section. Also note that the realization of these tasks is subject to the number of participants interested in each task.
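As a rough illustration of what "rank them by melodic similarity" could mean for the monophonic-to-monophonic case, the sketch below compares melodies by the edit distance between their pitch-interval sequences, which makes the comparison independent of transposition. This is only one simple baseline chosen for illustration, not a method prescribed by the task. The function names, the representation of a melody as a list of MIDI pitch numbers, and the tie-breaking by document name are all assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def intervals(pitches: Sequence[int]) -> List[int]:
    """Turn a pitch sequence into successive pitch intervals (in semitones)."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance with unit costs, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            substitution = prev[j - 1] + (0 if x == y else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1]


def rank_by_melodic_similarity(
    query: Sequence[int], collection: Dict[str, Sequence[int]]
) -> List[Tuple[str, int]]:
    """Return (document name, distance) pairs, most similar first.

    Hypothetical helper: melodies are lists of MIDI pitch numbers, and
    ties are broken alphabetically by document name.
    """
    q = intervals(query)
    scored = [(name, edit_distance(q, intervals(p))) for name, p in collection.items()]
    return sorted(scored, key=lambda item: (item[1], item[0]))
```

A real submission would typically also take rhythm into account and, for the polyphonic tasks, would need to locate the matching fragment inside each document, which this sketch does not attempt.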
Data
Task 1
- 5,274 tunes belonging to the Essen folksong collection. The tunes are in standard MIDI file format. Download: http://www.ldc.usb.ve/~cgomez/essen.tar.gz (< 1 MB)
Task 2a
- 1,000 polyphonic Karaoke files. Download: http://www.ldc.usb.ve/~cgomez/SMS-kar.tgz (9.5 MB)
Task 2b
- 10,000 mostly polyphonic MIDI files crawled from the Web. Download: http://www.ldc.usb.ve/~cgomez/SMS-mix.tgz (69 MB)
Evaluation and ground truth
The same method for building the ground truth as last year can be used. This method has the advantage that no ground truth needs to be built in advance. After the algorithms have been submitted, their results are pooled for every query, and human evaluators are asked to judge the relevance of the matches for some queries. To make this evaluation feasible, it is important that the algorithms return not only the names of the matching MIDI files for tasks 2 and 3, but also where the matching fragment starts and ends in each matching MIDI file.
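The pooling step described above can be sketched as follows: for every query, the top results of all submitted algorithms are merged into one deduplicated set that the human evaluators then judge. This is a minimal sketch under stated assumptions, not the actual MIREX tooling. The `Match` record, the `pool_results` function, the `depth` cut-off, and the unit of the start and end positions are all hypothetical; the page only states that matches should report where the fragment starts and ends.

```python
from typing import Dict, List, NamedTuple, Set


class Match(NamedTuple):
    """One returned match (hypothetical record layout)."""
    midi_file: str
    start: float  # start of the matching fragment; the unit is an assumption
    end: float    # end of the matching fragment; the unit is an assumption


def pool_results(
    runs: Dict[str, Dict[str, List[Match]]], depth: int
) -> Dict[str, List[Match]]:
    """Merge the top `depth` matches of every algorithm, per query.

    `runs` maps algorithm name -> query id -> ranked list of matches.
    Identical matches returned by several algorithms appear only once
    in the pool, so each one needs to be judged only once.
    """
    pools: Dict[str, Set[Match]] = {}
    for per_query in runs.values():
        for query_id, ranked in per_query.items():
            pools.setdefault(query_id, set()).update(ranked[:depth])
    # Sort for a stable presentation order; this does not imply a ranking.
    return {query_id: sorted(matches) for query_id, matches in pools.items()}
```

Note that two matches in the same file with different fragment boundaries are treated as distinct here; whether overlapping fragments should be merged before judging is a design decision the sketch leaves open.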
Potential participants
If you think there is a slight chance that you might consider participating, please add your name here. Please indicate as well in which tasks you wish to participate.
- Carlos Gómez (monophonic-to-monophonic)
- Rainer Typke (monophonic to monophonic and monophonic to polyphonic, maybe polyphonic-polyphonic)
- Ean Nugent (polyphonic-to-polyphonic) (in_christ_by_grace@yahoo.com)
- Sven Ahlbäck (monophonic-to-monophonic and maybe monophonic-to-polyphonic)
- Kjell Lemström and Niko Mikkilä (all tasks)
Confirmed participants
Monophonic-monophonic
- Carlos Gómez
- Kjell Lemström and Niko Mikkilä
- Mohamed Sordo and Maarten Grachten
Monophonic-polyphonic
- Kjell Lemström and Niko Mikkilä
Polyphonic-polyphonic
- Kjell Lemström and Niko Mikkilä
- Ean Nugent
Discussion
Comments from Carlos Gómez
For the monophonic task, an interesting variation this year would be to use a collection from a source other than RISM. Since RISM snippets were used in the two previous competitions, past participants may be more interested in participating this year if the data differs in some respect. These are some possible sources for this year's collection:
- The Themefinder database (www.themefinder.org) comprises three collections: classical themes, folksong themes and renaissance incipits. Incipits in the renaissance collection are tagged with their RISM number, so this collection is probably similar to the ones used previously, but the other two collections could be tried. Their music comes from other periods or genres, but the format is still the same (fragments of around 15 notes).
- The Meldex (NZDL) database contains around 10,000 melodies [1], which appear to be longer on average than the Themefinder and RISM snippets.
- A perhaps less feasible option is the digital version of the Barlow and Morgenstern dictionary of musical themes, which can be browsed at www.multimedialibrary.com. This database is copyrighted by the publishers of the book and the authors of the website, but we could try to ask for permission to use it.
- Another large monophonic database is HymnQuest.
These references were taken from the list of candidate music IR test collections maintained by Donald Byrd.
Comments from Rainer Typke
- I would suggest, just for the sake of clarity, adding the word "melodic" to the task description: "...Retrieve the most similar items from a collection of symbolic documents, given a query, and rank them by *melodic* similarity." I think this is what the sentence intends anyway.
- Would it possibly make sense to merge this task with the Query by Singing/Humming task? This could save everybody some work and also lead to interesting combinations of different algorithms.