Difference between revisions of "2010:Symbolic Melodic Similarity"

From MIREX Wiki

Revision as of 19:48, 24 June 2010

Description

The goal of SMS is to retrieve the most similar items from a collection of symbolic pieces, given a symbolic query, and rank them by melodic similarity. There will be only one task this year, comprising a set of six "base" monophonic MIDI queries to be matched against a monophonic MIDI collection.

Each system will be given a query and asked to return the 10 most melodically similar songs from the Essen Collection (5,274 pieces in MIDI format; see the ESAC Data Homepage for more information). For each of the six "base" queries, we have created four classes of error mutations; the query set thus comprises the following query classes:

  1. No errors (i.e., "base")
  2. One note deleted
  3. One note inserted
  4. One interval enlarged
  5. One interval compressed

Each system will be asked to return the top ten items for each of the 30 total queries. That is to say, 6 (base queries) × 5 (versions) = 30 query/candidate lists to be returned.
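The query classes above can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: a melody is reduced to a list of MIDI pitch numbers, the function names are hypothetical, and the task description does not say how the organizers actually generated the mutations (for instance, whether changing one interval also shifts the notes that follow, as assumed here).

```python
# Hypothetical sketch of the five query classes; not the organizers' procedure.
# A melody is modelled as a list of MIDI pitch numbers (durations are ignored).

def delete_note(melody, i):
    """Return a copy of the melody with the note at index i removed."""
    return melody[:i] + melody[i + 1:]


def insert_note(melody, i, pitch):
    """Return a copy of the melody with a new pitch inserted before index i."""
    return melody[:i] + [pitch] + melody[i:]


def change_interval(melody, i, delta):
    """Widen (delta > 0) or narrow (delta < 0) the interval between notes
    i and i+1 by |delta| semitones. Assumption: all later notes are shifted
    too, so every other interval is preserved. A repeated note (interval 0)
    is treated as an ascending interval."""
    interval = melody[i + 1] - melody[i]
    direction = 1 if interval >= 0 else -1
    shift = direction * delta
    return melody[:i + 1] + [p + shift for p in melody[i + 1:]]


def query_versions(base, i=0, new_pitch=61):
    """Build the five versions (base plus four mutations) of one base query."""
    return {
        "base": list(base),
        "note_deleted": delete_note(base, i + 1),
        "note_inserted": insert_note(base, i + 1, new_pitch),
        "interval_enlarged": change_interval(base, i, 1),
        "interval_compressed": change_interval(base, i, -1),
    }


versions = query_versions([60, 62, 64, 65])
total_queries = 6 * len(versions)  # six base queries, five versions each
```

With six base queries, the five versions per query give the 30 result lists mentioned above.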


Data

  • 5,274 tunes belonging to the Essen folksong collection. The tunes are in standard MIDI file format. Download (< 1 MB)


Evaluation

The 2010 SMS task replicates the 2007 task. After the algorithms have been submitted, their results will be pooled for every query, and human evaluators, using the Evalutron 6000 system, will be asked to judge the relevance of the matches to the queries.

For each query (and its four mutations), the returned results (candidates) from all systems will be anonymously grouped together (query set) for evaluation by the human graders. The graders will be provided with only the "base" perfect version against which to evaluate the candidates and thus will not know whether the candidates came from a perfect or mutated query. We expect that each query/candidate set will be evaluated by one individual grader. Using the Evalutron 6000 system, the graders will give each query/candidate pair two types of scores: one "BROAD" categorical score with three categories, NS (not similar), SS (somewhat similar) and VS (very similar), and one "FINE" score (in the range from 0 to 10).
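The pooling step described above can be sketched as follows. This is a hypothetical illustration, not the Evalutron 6000 implementation: the function name, the shape of the `runs` dictionary and the `base_of` mapping from each query file to its base query are all assumptions made for the example.

```python
from collections import defaultdict


def pool_candidates(runs, base_of):
    """Group candidates from all systems by base query.

    runs    -- assumed shape: {system_name: {query_file: [candidate, ...]}}
    base_of -- assumed mapping from every query file (base or mutated)
               to the name of its base query
    Returns {base_query: sorted list of unique candidates}, with system
    names dropped so that the pooled list is anonymous.
    """
    pools = defaultdict(set)
    for system_results in runs.values():
        for query_file, candidates in system_results.items():
            pools[base_of[query_file]].update(candidates)
    return {base: sorted(cands) for base, cands in pools.items()}


# Illustrative data only; file and system names are made up.
example_runs = {
    "systemA": {"q1_base.mid": ["s1.mid", "s2.mid"], "q1_del.mid": ["s2.mid", "s3.mid"]},
    "systemB": {"q1_base.mid": ["s2.mid", "s4.mid"], "q1_del.mid": ["s1.mid"]},
}
example_base_of = {"q1_base.mid": "q1", "q1_del.mid": "q1"}
example_pool = pool_candidates(example_runs, example_base_of)
```

Each pooled candidate would then be judged once against the base query, regardless of which system or which mutation retrieved it.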

For more information, do take a look at the 2007 SMS Results Page.

Submission Format

I/O formats

This section describes the input and output files used in this task, as well as the command line calling format required of submissions.


Collection and Query list files

A UTF-8 text file listing the paths to the files to be used as the collection or as queries (one path per line).

e.g.

 /aDirectory/collectionFolder/b002342.mid
 /aDirectory/collectionFolder/a005921.mid
 ...
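A submission could read such a list file along the following lines. This is a minimal sketch; the function name is hypothetical, and skipping blank lines is an assumption made for robustness rather than a stated requirement.

```python
def read_list_file(list_path):
    """Return the paths listed in a UTF-8 text file, one path per line.

    Blank lines and surrounding whitespace are ignored (an assumption;
    the task description only says "one path per line").
    """
    with open(list_path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

The same function can be used for both the collection list and the query list, since they share the same format.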


Output file

The name of the query file, followed by a list of the 10 most similar matching MIDI files, ordered by melodic similarity. Write one query per line.

E.g.

query1.mid song242.mid song213.mid song1242.mid ...
query2.mid song5454.mid song423.mid song454.mid ...
...
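Writing the output file in this layout might look like the sketch below. The function names are hypothetical, and two details are assumptions read off the example rather than stated rules: fields are separated by single spaces, and only file names (not full paths) are written.

```python
import os


def format_result_line(query_path, ranked_paths, top_n=10):
    """Build one output line: the query name followed by its top matches.

    ranked_paths is expected to be ordered from most to least similar.
    """
    names = [os.path.basename(query_path)]
    names += [os.path.basename(path) for path in ranked_paths[:top_n]]
    return " ".join(names)


def write_results(output_path, results):
    """Write one line per query; results is a list of (query, ranked) pairs."""
    with open(output_path, "w", encoding="utf-8") as handle:
        for query_path, ranked_paths in results:
            handle.write(format_result_line(query_path, ranked_paths) + "\n")
```

If full paths are required in the output instead, the `os.path.basename` calls can simply be dropped.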


Command line calling format

Parameters:
- path to a list file containing paths to the approximately 5,000 MIDI files that make up the collection to be searched.
- path to a list file containing paths to the files to be used as queries (one path per line).
- (optional) path to scratch directory.
- path to write the output file to.

E.g.

  doSMS.sh -numThreads 4 /path/to/scratch/folder /path/to/collectionListFile.txt /path/to/queryFile.txt /path/to/outputResultsFile.txt

The program will be called only once to index the collection and perform the queries.
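A Python submission might parse this calling format as sketched below. The argument order follows the `doSMS.sh` example (scratch directory first), which differs from the order of the parameter list, so participants should state the order they support in their README. The `-numThreads` flag appears only in the example; treating it as an optional integer with a default of 1 is an assumption, as are the function and attribute names.

```python
import argparse


def build_parser():
    """Parser mirroring the doSMS.sh example (argument order is an assumption)."""
    parser = argparse.ArgumentParser(prog="doSMS")
    # Single-dash long flag, as written in the example invocation.
    parser.add_argument("-numThreads", dest="num_threads", type=int, default=1)
    parser.add_argument("scratch_dir", help="directory for feature/cache files")
    parser.add_argument("collection_list", help="list file of collection MIDI paths")
    parser.add_argument("query_list", help="list file of query MIDI paths")
    parser.add_argument("output_file", help="where to write the result lines")
    return parser


def parse_args(argv):
    """Parse an explicit argument list (pass sys.argv[1:] in a real script)."""
    return build_parser().parse_args(argv)


example_args = parse_args([
    "-numThreads", "4",
    "/path/to/scratch/folder",
    "/path/to/collectionListFile.txt",
    "/path/to/queryFile.txt",
    "/path/to/outputResultsFile.txt",
])
```

Because the program is called only once, it would index the whole collection and then loop over every path in the query list within a single run.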


Packaging submissions

  • All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
  • Be sure to follow the Best Coding Practices for MIREX
  • Be sure to follow the MIREX 2010 Submission Instructions

All submissions should include a README file containing the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximately how much scratch disk space will the submission need to store any feature/cache files?
  • Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
  • Any special notes about running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.

Time and hardware limits

Due to the potentially high number of participants in this and other tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 48 hours total runtime will be imposed on each submission. Submissions exceeding this runtime may not receive a result.


Submission opening date

Friday 4th June 2010

Submission closing date

TBA