2006:Symbolic Melodic Similarity

From MIREX Wiki
Some points for discussion

- We would need to establish ground truths for both data sets.
- For the RISM dataset, we already have software that facilitates creating a ground truth: a filtering script that helps select approximately 50 candidate matches for each query (to be presented to human experts), and a web-based script that collects human experts' opinions on what an ideal search result would be and consolidates those opinions.
- Selecting 50 candidates out of 35,000 to show to human experts is error-prone: we might well miss some incipits that would be true positives. Such omissions may only show up when a submitted algorithm finds an incipit that should reasonably be classified as a true positive but was accidentally excluded during filtering. We need to decide before the competition how we will correct the ground truth in such cases.
- For the polyphonic dataset, which does not exist yet, we have no ground-truth tools either.
- Establishing a ground truth is a lot of work, so it would be desirable to share it, for example among the participants of the competition.
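How the judging work would be split is left open here. One simple possibility, sketched purely as a hypothetical scheme (the function name `assign_queries` and the default of two judges per query are assumptions, not decisions by the organizers), is to hand out queries round-robin so that every query gets several independent judges and workloads stay balanced:

```python
def assign_queries(queries, participants, judges_per_query=2):
    """Spread ground-truth judging over participants (hypothetical scheme).

    Each query goes to judges_per_query distinct participants, taken
    round-robin, so per-participant workloads differ by at most one query.
    """
    if judges_per_query > len(participants):
        raise ValueError("judges_per_query exceeds the number of participants")
    n = len(participants)
    assignment = {p: [] for p in participants}
    slot = 0
    for query in queries:
        # judges_per_query consecutive slots (mod n) are always distinct
        # participants, because judges_per_query <= n.
        for _ in range(judges_per_query):
            assignment[participants[slot % n]].append(query)
            slot += 1
    return assignment


print(assign_queries(["q1", "q2", "q3", "q4"], ["A", "B", "C"]))
# {'A': ['q1', 'q2', 'q4'], 'B': ['q1', 'q3', 'q4'], 'C': ['q2', 'q3']}
```

Having more than one judge per query would also make it possible to consolidate diverging opinions, as the existing RISM tool already does for expert rankings.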
 
 
Please share your thoughts!

Best regards,
Anna Pienimäki and Rainer Typke

Revision as of 09:35, 16 May 2006

Overview

This page is devoted to discussions of the MIREX 06 Symbolic Melodic Similarity contest. Discussions on the MIREX 06 Symbolic Melodic Similarity contest planning list will be briefly summarized on this page; the full discussions are available to subscribers in the planning list archives.

Task suggestion: Symbolic Melodic Similarity

We are willing to organize another symbolic melodic similarity task.

Proposed tasks

1. Retrieve the most similar incipits from the UK subset of the RISM A/II collection (about 15,000 incipits), given one of the incipits as a query, and rank them by melodic similarity. Both the query and the collection are monophonic.

2. Like task 1, but with a collection of mostly polyphonic MIDI files to be searched for matches. The query would still be monophonic. The collection would consist of 10,000 files picked at random from a collection of about 60,000 MIDI files harvested from the Web. They cover different genres (Western and Asian popular music, classical music, and ringtones, to name just a few).

3. Like task 2, but with a smaller, more focused collection: about 1000 .kar files (Karaoke MIDI files) with mostly Western popular music.
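The tasks leave the choice of similarity measure to the participants. Purely to illustrate what "rank by melodic similarity" involves for task 1, the sketch below uses a toy baseline chosen for illustration only (not any participant's algorithm): it compares pitch-interval sequences with edit distance, which makes the comparison independent of transposition. It assumes each incipit has already been read from its MIDI file into a list of MIDI pitch numbers; `intervals`, `edit_distance`, and `rank_incipits` are hypothetical names.

```python
def intervals(pitches):
    """Turn a list of MIDI pitch numbers into successive intervals in semitones."""
    return [b - a for a, b in zip(pitches, pitches[1:])]


def edit_distance(a, b):
    """Levenshtein distance (unit costs) between two interval sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute (free if equal)
        prev = cur
    return prev[-1]


def rank_incipits(query, collection, top_x=10):
    """Return the names of the top_x incipits closest to the query.

    collection maps an incipit name to its list of MIDI pitch numbers.
    Ties are broken alphabetically by name so the ranking is deterministic.
    """
    q = intervals(query)
    ranked = sorted(collection.items(),
                    key=lambda item: (edit_distance(q, intervals(item[1])), item[0]))
    return [name for name, _ in ranked[:top_x]]


query = [60, 62, 64, 65]
collection = {
    "a.mid": [67, 69, 71, 72],  # same melody transposed up a fifth
    "b.mid": [60, 62, 64, 67],  # last interval differs
    "c.mid": [60, 59, 57, 55],  # descending line
}
print(rank_incipits(query, collection, top_x=2))  # ['a.mid', 'b.mid']
```

This baseline ignores rhythm entirely, and it does not handle the polyphonic collections of tasks 2 and 3, where a melody or voice would first have to be identified in each file before any such comparison.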


Inputs/Outputs

Task 1: Input: about 15,000 MIDI files containing mostly monophonic incipits, and a MIDI file containing the monophonic query. Expected output: a list of the names of the X most similar MIDI files, ordered by similarity (the value of X is to be decided).

Task 2: Input: about 10,000 mostly polyphonic MIDI files plus a MIDI file containing a monophonic query. Output: a list of the X most similar file names, ordered by similarity, plus, for each file, the times (offsets from the beginning of the file, in seconds) at which the matching passage starts and ends.

Task 3: Like task 2.
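The exact output format for tasks 2 and 3 is not fixed on this page. A minimal sketch of one possible result record, assuming a tab-separated text file with the query name on the first line and one ranked match per line after it (the layout, the `Match` class, and `format_results` are all hypothetical), could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One entry of a task 2/3 result list (hypothetical field names)."""
    midi_file: str
    start_sec: float  # offset from the start of the file where the match begins
    end_sec: float    # offset where the matching passage ends

    def __post_init__(self):
        # Reject spans that cannot locate a passage in the file.
        if not 0 <= self.start_sec < self.end_sec:
            raise ValueError(f"invalid span for {self.midi_file}: "
                             f"{self.start_sec}-{self.end_sec}")


def format_results(query_name, matches, x):
    """Write the first x matches (already ordered by similarity), one per line."""
    lines = [query_name]
    for rank, m in enumerate(matches[:x], start=1):
        lines.append(f"{rank}\t{m.midi_file}\t{m.start_sec:.2f}\t{m.end_sec:.2f}")
    return "\n".join(lines)


results = [Match("song_a.mid", 12.5, 20.0), Match("song_b.mid", 3.0, 9.25)]
print(format_results("query01.mid", results, x=10))
```

For task 1 the same layout would simply drop the two time columns, since only file names are requested there.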


Building of the ground truth

Unlike last year, it is now nearly impossible to manually build a proper ground truth in advance.

Suggestion: Pool the top M results of all participating algorithms and let every participant judge the relevance of the matches for some of the queries. To keep that burden manageable, it is important that the algorithms for tasks 2 and 3 return not only the names of the matching MIDI files but also where the matching passage starts and ends in each of them.
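A minimal sketch of the pooling step for one query, assuming each algorithm's run is a ranked list of (file, start, end) tuples in the spirit of the task 2 output described above; `build_pool` and this tuple layout are hypothetical:

```python
from collections import defaultdict


def build_pool(runs, m):
    """Pool the top-m matches of every algorithm for a single query.

    runs maps an algorithm name to its ranked list of
    (midi_file, start_sec, end_sec) tuples. The result maps each distinct
    file to the passages reported for it, so a judge sees every file once
    even if several algorithms returned it.
    """
    pool = defaultdict(list)
    for algorithm, ranked in runs.items():
        for midi_file, start_sec, end_sec in ranked[:m]:
            pool[midi_file].append((algorithm, start_sec, end_sec))
    return dict(pool)


runs = {
    "alg1": [("a.mid", 1.0, 5.0), ("b.mid", 0.0, 4.0), ("c.mid", 2.0, 3.0)],
    "alg2": [("b.mid", 0.5, 4.5), ("d.mid", 7.0, 9.0)],
}
pool = build_pool(runs, m=2)
print(sorted(pool))  # ['a.mid', 'b.mid', 'd.mid']
```

Grouping by file means a judge opens each file once and can compare the passages that different algorithms located in it. The number of files to judge per query is at most M times the number of algorithms, and smaller whenever the algorithms agree.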