2010:Audio Cover Song Identification
2010 AUDIO COVER SONG IDENTIFICATION TASK OVERVIEW
The text of this section is copied from the 2009 page. Please add your comments and discussions for 2010.
The Audio Cover Song task was a new task for MIREX 2006 and was last run in 2008. It was closely related to the 2010:Audio Music Similarity and Retrieval (AMS) task as the cover songs were embedded in the Audio Music Similarity and Retrieval test collection.
Description
Within the collection of pieces in the cover song datasets, there are embedded a number of different "original songs" or compositions, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
Using each of these version files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other versions of the "seed/query" file.
On top of the previous Audio Cover Song dataset, we are going to use the Mazurka dataset. We are going to randomly choose 11 versions from 49 mazurkas and run it as a separate subtask. The I/O format will be the same as previous years. Systems will return a distance matrix of 539x539.
Task specific mailing list
A specific mailing list is provided for the discussion of this task and related tasks (2010:Audio Classification (Test/Train) tasks, 2010:Audio_Cover_Song_Identification, 2010:Audio_Tag_Classification, 2010:Audio_Music_Similarity_and_Retrieval) at: https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00. If you wish to participate in any of these tasks, please sign up to this mailing list, as discussion of the task format and evaluation should be conducted there.
Data
Two datasets will be used to evaluate cover song identification:
US Pop Music Collection (aka Mixed Collection)
This is the "original" ACS collection. Within the 1000 pieces in the Audio Cover Song database, there are embedded 30 different "cover songs" each represented by 11 different "versions" for a total of 330 audio files.
Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.
Collection statistics:
- 16-bit, monophonic, 22.05 kHz, WAV
- The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
- Size: 1000 tracks
- Queries: 330 tracks
Sapp's Mazurka Collection Information
In addition to our original ACS dataset, we will use the Mazurka.org dataset put together by Craig Sapp. We will randomly choose 11 versions from 49 mazurkas and run it as a separate ACS subtask. Systems should return a distance matrix of 539x539, from which we will locate the ranks of each of the associated cover versions.
Collection statistics:
- 16-bit, monophonic, 22.05 kHz, WAV
- Size: 539 tracks
- Queries: 539 tracks
Evaluation
The following evaluation metrics will be computed for each submission:
- Total number of covers identified in top 10
- Mean number of covers identified in top 10 (average performance)
- Mean (arithmetic) of Avg. Precisions
- Mean rank of first correctly identified cover
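As an illustration of how the per-query quantities behind these metrics could be derived from one row of a distance matrix, here is a minimal Python sketch. The function names are hypothetical, and the choices to exclude the query itself from the ranking and to break ties by track index are assumptions; the official evaluation code may differ on both points.

```python
from typing import List, Optional, Sequence, Set


def rank_candidates(distances: Sequence[float], query_index: int) -> List[int]:
    """Return candidate indexes sorted by increasing distance, without the query itself."""
    order = sorted(range(len(distances)), key=lambda i: distances[i])
    return [i for i in order if i != query_index]


def covers_in_top_k(ranking: Sequence[int], covers: Set[int], k: int = 10) -> int:
    """Count how many true covers appear among the first k ranked candidates."""
    return sum(1 for i in ranking[:k] if i in covers)


def average_precision(ranking: Sequence[int], covers: Set[int]) -> float:
    """Average of the precision values measured at the rank of each true cover."""
    hits = 0
    total = 0.0
    for rank, i in enumerate(ranking, start=1):
        if i in covers:
            hits += 1
            total += hits / rank
    return total / len(covers) if covers else 0.0


def rank_of_first_cover(ranking: Sequence[int], covers: Set[int]) -> Optional[int]:
    """Return the 1-based rank of the first true cover, or None if none is ranked."""
    for rank, i in enumerate(ranking, start=1):
        if i in covers:
            return rank
    return None


# Toy example: track 0 is the query and tracks 2 and 3 are its other versions.
row = [0.0, 0.9, 0.2, 0.5, 0.1]
ranking = rank_candidates(row, query_index=0)
covers = {2, 3}
```

Averaging `average_precision` over all queries would give the mean of average precisions, and summing or averaging `covers_in_top_k` would give the two top-10 figures.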
Ranking and significance testing
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.
For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:
@InProceedings{jones2007hsj,
  title = {Human Similarity Judgements: Implications for the Design of Formal Evaluations},
  author = {M.C. Jones and J.S. Downie and A.F. Ehmann},
  booktitle = {Proceedings of ISMIR 2007 International Society of Music Information Retrieval},
  year = {2007}
}
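To make the ranking step more concrete, the following Python sketch computes the basic Friedman test statistic from a table of average precision scores. It is only a rough illustration: the layout of `scores` (one row per song group, one column per system) is an assumption, the usual correction for ties is left out, and the Tukey-Kramer HSD comparison that follows the test is not shown.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = mean_rank
        pos = end + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square statistic without tie correction.

    scores[g][s] is assumed to hold the average precision of system s
    on song group g.
    """
    n = len(scores)      # number of song groups (blocks)
    k = len(scores[0])   # number of systems (treatments)
    rank_sums = [0.0] * k
    for row in scores:
        for s, rank in enumerate(average_ranks(row)):
            rank_sums[s] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Toy data: three song groups, three systems that are always ordered the same way.
scores = [
    [0.9, 0.5, 0.1],
    [0.8, 0.6, 0.2],
    [0.7, 0.4, 0.3],
]
statistic = friedman_statistic(scores)
```

Because the test works on within-group ranks rather than raw scores, song groups that are uniformly hard or easy for every system do not dominate the comparison.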
Runtime performance
In addition, computation times for feature extraction and training/classification will be measured.
Submission Format
Submission to this task will have to conform to a specified format detailed below.
Implementation details
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.
The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
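A submission could expose the thread count roughly as in the Python sketch below. The names `extract_all` and `extract_one` are hypothetical placeholders, and a thread pool is only one possible choice: in CPython, CPU-bound pure-Python feature extraction would more likely need a process pool or native code to benefit from several cores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

Feature = TypeVar("Feature")


def extract_all(
    paths: Sequence[str],
    extract_one: Callable[[str], Feature],
    num_threads: int = 1,
) -> List[Feature]:
    """Apply extract_one to every audio path using num_threads worker threads.

    Results are returned in the same order as paths, so each feature can
    still be matched to the audio file it came from.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(extract_one, paths))


# Stand-in "feature extractor" for demonstration only: the length of the path.
features = extract_all(["a.wav", "bb.wav", "ccc.wav"], len, num_threads=2)
```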
I/O formats
Input Files
The feature extraction list file format will be of the form:
/path/to/audio/file/000.wav\n
/path/to/audio/file/001.wav\n
/path/to/audio/file/002.wav\n
...
The query list file format will be very similar. It lists a subset of the files from the feature extraction list file and takes the form:
/path/to/audio/file/182.wav\n
/path/to/audio/file/245.wav\n
/path/to/audio/file/432.wav\n
...
The query list file contains <number of queries> rows in total. Query ids are assigned from the pool of <number of candidates> collection ids and should match the ids within the candidate collection.
Lines will be terminated by a '\n' character.
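Reading the two list files and mapping each query back to its position in the collection could look like the following Python sketch. The function names are hypothetical, and the use of 1-based track indexes is an assumption based on the numbering shown in the distance matrix format.

```python
from typing import Dict, List, Sequence


def parse_list_text(text: str) -> List[str]:
    """Split list-file contents into one path per line, skipping empty lines."""
    return [line for line in text.split("\n") if line.strip()]


def read_list_file(path: str) -> List[str]:
    """Read an ASCII list file with one audio path per line and no header."""
    with open(path, "r", encoding="ascii", newline="\n") as handle:
        return parse_list_text(handle.read())


def query_track_indexes(collection: Sequence[str], queries: Sequence[str]) -> List[int]:
    """Return the 1-based collection index of each query, in query-list order."""
    position: Dict[str, int] = {p: i for i, p in enumerate(collection, start=1)}
    return [position[q] for q in queries]


# Toy example using in-memory text instead of real files.
collection = parse_list_text("/a/000.wav\n/a/001.wav\n/a/002.wav\n")
queries = parse_list_text("/a/002.wav\n/a/000.wav\n")
indexes = query_track_indexes(collection, queries)
```

A query path that is missing from the collection raises a `KeyError` here, which surfaces a mismatch between the two list files early.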
Output File
The only output will be a distance matrix file that is <number of queries> rows by <number of candidates> columns in the following format:
Distance matrix header text with system name
1\t</path/to/audio/file/track1.wav>
2\t</path/to/audio/file/track2.wav>
3\t</path/to/audio/file/track3.wav>
4\t</path/to/audio/file/track4.wav>
...
N\t</path/to/audio/file/trackN.wav>
Q/R\t1\t2\t3\t4\t...\tN
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>
where N is <number of candidates> and the queries are drawn from this set (and bear the same track indexes if possible).
which might look like:
Example distance matrix 0.1
1 /path/to/audio/file/track1.wav
2 /path/to/audio/file/track2.wav
3 /path/to/audio/file/track3.wav
4 /path/to/audio/file/track4.wav
5 /path/to/audio/file/track5.wav
Q/R 1 2 3 4 5
1 0.00000 1.24100 0.2e-4 0.42559 0.21313
3 50.2e-4 0.62640 0.00000 0.38000 0.15152
Note that indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed we will be able to interpret the data.
All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.
To summarize, the distance matrix file should begin with a system name line, followed by <number of candidates> rows of file paths and a Q/R row of column indexes. The matrix itself should be composed of <number of queries> rows (one for each original track query) and <number of candidates> columns of distances separated by tab characters. Each row corresponds to a particular query song (the track to find covers of).
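The layout described above could be produced with a small Python helper like the sketch below. The function name and the `.6g` number formatting are assumptions made for illustration; only the tab separation, the header layout, and the ban on negative, infinite or NaN distances come from the task description.

```python
import math
from typing import List, Sequence


def format_distance_matrix(
    system_name: str,
    collection: Sequence[str],
    query_ids: Sequence[int],
    distances: Sequence[Sequence[float]],
) -> str:
    """Build the text of a distance matrix file.

    query_ids holds the 1-based collection index of each query, and
    distances[q] holds that query's distance to every collection track.
    """
    n = len(collection)
    if len(query_ids) != len(distances):
        raise ValueError("one row of distances is needed per query")
    lines: List[str] = [system_name]
    for index, path in enumerate(collection, start=1):
        lines.append(f"{index}\t{path}")
    lines.append("Q/R\t" + "\t".join(str(i) for i in range(1, n + 1)))
    for query_id, row in zip(query_ids, distances):
        if len(row) != n:
            raise ValueError("each row needs one distance per collection track")
        for value in row:
            # Distances must be zero or positive, and never infinite or NaN.
            if not math.isfinite(value) or value < 0:
                raise ValueError(f"invalid distance: {value!r}")
        lines.append(f"{query_id}\t" + "\t".join(format(v, ".6g") for v in row))
    return "\n".join(lines) + "\n"


# Toy example: two tracks, both used as queries.
text = format_distance_matrix(
    "demo",
    ["/a/1.wav", "/a/2.wav"],
    [1, 2],
    [[0.0, 1.241], [0.5, 0.0]],
)
```

Validating the values before writing means a numerical problem shows up as an explicit error instead of an unreadable results file.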
Command Line Calling Format
/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>

<collection_list_file>: Text file containing <number of candidates> full path file names for the <number of candidates> audio files in the collection (including the <number of queries> query documents). Example: /path/to/coversong/collection.txt
<query_list_file>: Text file containing the <number of queries> full path file names for the <number of queries> query documents. Example: /path/to/coversong/queries.txt
<working_directory>: Full path to a temporary directory where the submission will have write access for caching features or calculations. Example: /tmp/submission_id/
<output_file>: Full path to the file where the submission should output the distance matrix (<number of candidates> header rows + <number of queries> x <number of candidates> data matrix). Example: /path/to/coversong/results/submission_id.txt
E.g.
/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt
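For submissions written in Python, the four positional arguments could be handled with the standard `argparse` module, as in this sketch. The optional `--threads` flag is a hypothetical addition for illustration; the task description only says that the thread count should ideally be a command line parameter and does not fix its name.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Create a parser matching the four-argument calling format."""
    parser = argparse.ArgumentParser(description="Audio cover song identification submission")
    parser.add_argument("collection_list_file", help="list of all audio files in the collection")
    parser.add_argument("query_list_file", help="list of the query audio files")
    parser.add_argument("working_directory", help="scratch directory for cached features")
    parser.add_argument("output_file", help="path of the distance matrix file to write")
    # Hypothetical optional flag; not part of the official calling format.
    parser.add_argument("--threads", type=int, default=1, help="number of worker threads")
    return parser


def parse_arguments(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv, or sys.argv[1:] when argv is None."""
    return build_parser().parse_args(argv)


# Example call with the placeholder paths from the calling format.
args = parse_arguments([
    "/path/to/feat_extract_file.txt",
    "/path/to/query_file.txt",
    "/path/to/scratch/dir",
    "/path/to/output_file.txt",
])
```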
Packaging submissions
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).
All submissions should include a README file containing the following information:
- Command line calling format for all executables and an example formatted set of commands
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Any required environments (and versions), e.g. python, java, bash, matlab.
Time and hardware limits
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.
Submission opening date
TBA
Submission closing date
TBA