2007:Symbolic Melodic Similarity

Revision as of 16:11, 25 July 2007

Overview

This page is devoted to discussions of the MIREX 07 Symbolic Melodic Similarity contest. Discussions on the MIREX 07 Symbolic Melodic Similarity contest planning list will be briefly digested on this page. A full digest of the discussions is available to subscribers from the MIREX 07 Symbolic Melodic Similarity contest planning list archives. You can subscribe to this list to participate in the discussion.

Additionally, you can read information about the Symbolic Melodic Similarity tasks that were run in the 2005 and 2006 MIREX editions.

Task description

Retrieve the most similar items from a collection of symbolic documents, given a query, and rank them by melodic similarity. The following tasks could be defined this year:

Task 1: Monophonic to monophonic. Both the query and the documents in the collection will be monophonic.

Task 2: Monophonic to polyphonic. The documents will be polyphonic (i.e. can have simultaneous notes), but the query will still be monophonic.

Task 3: Polyphonic to polyphonic. Both the query and documents will be polyphonic.

For now, the description of these tasks is intentionally open; the details are to be determined in the discussion section. Also note that whether each task is run depends on the number of participants interested in it.
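To make the tasks concrete, here is a minimal sketch of a Task 1 (monophonic-to-monophonic) matcher. It assumes melodies are given as lists of MIDI pitch numbers and ranks by edit distance over pitch intervals; this is only one of many plausible similarity measures, not a method prescribed by the contest, and the tune names are made up.

```python
def intervals(pitches):
    """Convert MIDI pitch numbers to pitch intervals, making the
    comparison transposition-invariant."""
    return [b - a for a, b in zip(pitches, pitches[1:])]

def edit_distance(xs, ys):
    """Classic Levenshtein distance between two interval sequences."""
    prev = list(range(len(ys) + 1))
    for i, x in enumerate(xs, 1):
        cur = [i]
        for j, y in enumerate(ys, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (x != y)))    # substitution
        prev = cur
    return prev[-1]

def rank(query, collection):
    """Rank documents (name -> pitch list) by melodic similarity
    to the monophonic query; most similar first."""
    q = intervals(query)
    scored = sorted((edit_distance(q, intervals(p)), name)
                    for name, p in collection.items())
    return [name for _, name in scored]

tunes = {"tune_a": [60, 62, 64, 65],   # C D E F
         "tune_b": [60, 64, 67, 72]}   # C E G C
# The query is tune_a transposed up a whole tone, so it ranks first.
print(rank([62, 64, 66, 67], tunes))
```

Tasks 2 and 3 would additionally require locating the best-matching fragment inside a polyphonic document, which this sketch does not attempt.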

Data

Task 1

  • 5,274 tunes belonging to the Essen folksong collection. The tunes are in standard MIDI file format. Download (< 1 MB)

Task 2

  • Collection 1: 10,000 mostly polyphonic MIDI files crawled from the Web. Download (68 MB)
  • Collection 2: 500 MIDI files of classical works and individual movements. These files have two sources: the MuseData database and the Mutopia Project library.
    • MuseData files
      Important notice: This data comes from CCARH and it is used under a license signed by me (Carlos Gómez). The data can be downloaded from the link below, but it may not be redistributed. See the full text of the license.
      Download (< 1 MB) You can also read a detailed description of the contents.

Evaluation and ground truth

The same method for building the ground truth as last year can be used. This method has the advantage that no ground truth needs to be built in advance. After the algorithms have been submitted, their results are pooled for every query, and human evaluators are asked to judge the relevance of the matches for some queries. To make this evaluation feasible, it is important that for Tasks 2 and 3 the algorithms return not only the names of the matching MIDI files, but also where the matching fragment starts and ends within each file.
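The pooling step described here can be sketched as follows. This is an illustrative sketch, not the actual MIREX evaluation code; the `Match` record (file name plus start and end note indices of the matching fragment) is an assumed result format.

```python
from collections import namedtuple

# Hypothetical result record: the matching MIDI file plus where the
# matching fragment starts and ends (note indices) within that file.
Match = namedtuple("Match", ["midi_file", "start", "end"])

def pool(runs, depth=10):
    """Pool the top-`depth` results from every algorithm's run for a
    single query, dropping duplicates, so human evaluators judge each
    candidate fragment only once."""
    seen = set()
    pooled = []
    for run in runs:
        for match in run[:depth]:
            if match not in seen:
                seen.add(match)
                pooled.append(match)
    return pooled
```

Evaluators would then rate each fragment in the pooled list for melodic relevance to the query, without knowing which algorithm returned it.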

Potential participants

If you think there is a slight chance that you might consider participating, please add your name here. Please indicate as well in which tasks you wish to participate.

  • Carlos Gómez (monophonic-to-monophonic)
  • Rainer Typke (monophonic to monophonic and monophonic to polyphonic, maybe polyphonic-polyphonic)
  • Ean Nugent (monophonic-to-monophonic and maybe monophonic-to-polyphonic) (in_christ_by_grace@yahoo.com)
  • Sven Ahlbäck (monophonic-to-monophonic and maybe monophonic-to-polyphonic)

Discussion

Comments from Carlos Gómez

For the monophonic task, an interesting variation this year would be to use a collection from a different source than RISM. Since RISM snippets have been used in the two previous competitions, past participants may be more interested in participating this year if the data differs in some respect. These are some possible sources for this year's collection:

  • The Themefinder database (www.themefinder.org) comprises three collections: classical themes, folksong themes, and renaissance incipits. Incipits in the renaissance collection are tagged with their RISM number, so that collection is probably similar to those used previously, but the other two collections could be tried. The difference is that the music in those collections comes from other periods or genres, while the format is still the same (fragments of around 15 notes).
  • The Meldex (NZDL) database contains around 10,000 melodies [1], which appear to be longer on average than the Themefinder and RISM snippets.
  • A perhaps less feasible option is the digital version of the Barlow and Morgenstern dictionary of musical themes, which can be browsed at www.multimedialibrary.com. This database is copyrighted by the publishers of the book and the authors of the website, but we could ask for permission to use it.
  • Another large monophonic database is HymnQuest.

These references were taken from the list of candidate music IR test collections maintained by Donald Byrd.

Comments from Rainer Typke

  • I would suggest, just for the sake of clarity, adding the word "melodic" to the task description: "...Retrieve the most similar items from a collection of symbolic documents, given a query, and rank them by *melodic* similarity." I think this is what the sentence intends anyway.
  • Would it possibly make sense to merge this task with the Query by Singing/Humming task? This could save everybody some work and also lead to interesting combinations of different algorithms.