2024:Cover Song Identification
Description
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or "cover songs".
Within the collection of pieces in the cover song dataset, a number of different "original songs" or compositions are embedded, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
Using each of these version files in turn as the "seed/query" file, we examine the returned ranked lists of items from each algorithm for the presence of the other versions of the "seed/query" file.
Data
For this contest, we have created a new in-house cover song dataset (Mirex2024 CSI) to minimize overlap with publicly available datasets, such as SHS100K and Da-Tacos. This dataset was processed using an existing CSI model to ensure minimal duplication from these public datasets.
This dataset consists of 1000 tracks representing 80 distinct songs. Each song is represented by several different cover versions spanning various genres and styles (e.g., classical, jazz, gospel, rock, folk-rock, etc.), allowing the evaluation of algorithms across a diverse range of musical interpretations.
Collection statistics:
- 16-bit, monophonic, 22.05kHz, WAV format
- Size: 1000 tracks
- Queries: 1000 tracks (since each track can be a query)
Evaluation
The following evaluation metrics will be computed for each submission:
- Mean (arithmetic) of Avg. Precisions
- Mean rank of first correctly identified cover
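As an illustration only, the two metrics above could be computed from a distance matrix roughly as follows. This is a minimal sketch, not the official evaluation code: the function names, the `distances` and `song_of` structures, and the choice to exclude the query itself from its own ranked list are all assumptions made for this example.

```python
from typing import Dict, Hashable, List, Sequence, Set


def average_precision(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Average precision of one ranked list, given the set of true covers."""
    hits = 0
    precision_sum = 0.0
    for rank, track in enumerate(ranked, start=1):
        if track in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this hit
    return precision_sum / len(relevant)


def rank_of_first_cover(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> int:
    """1-based rank of the first true cover in the ranked list."""
    for rank, track in enumerate(ranked, start=1):
        if track in relevant:
            return rank
    raise ValueError("no cover of the query appears in the ranked list")


def evaluate(
    distances: Dict[Hashable, Dict[Hashable, float]],
    song_of: Dict[Hashable, Hashable],
) -> Dict[str, float]:
    """distances[q][c] is the distance from query q to candidate c;
    song_of[t] is the (hypothetical) composition label of track t."""
    ap_scores: List[float] = []
    first_ranks: List[int] = []
    for query, row in distances.items():
        # Assumption: the query is not counted as its own cover.
        candidates = [t for t in row if t != query]
        relevant = {t for t in candidates if song_of[t] == song_of[query]}
        if not relevant:
            continue  # a query with no other version cannot be scored
        ranked = sorted(candidates, key=lambda t: row[t])  # smallest distance first
        ap_scores.append(average_precision(ranked, relevant))
        first_ranks.append(rank_of_first_cover(ranked, relevant))
    if not ap_scores:
        raise ValueError("no query has another version in the collection")
    return {
        "mean_average_precision": sum(ap_scores) / len(ap_scores),
        "mean_rank_of_first_cover": sum(first_ranks) / len(first_ranks),
    }
```

Ties between equal distances are broken here by the order of the candidates in each row; the official evaluation may treat ties differently.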
Runtime performance
In addition, computation times for feature extraction and training/classification will be measured.
Submission Format
Submissions to this task must conform to the format detailed below.
Implementation details
Scratch folders will be provided for all submissions for the storage of feature files and any model or index files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.
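Because filenames are unique, one simple way to track which feature file belongs to which audio file is to derive the feature filename from the audio filename. The sketch below assumes this approach; the function name and the `.feat.npy` suffix are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def feature_path(audio_path: str, scratch_dir: str) -> Path:
    """Map an audio file to its cached feature file inside the scratch folder.

    Relies on every audio track having a unique filename, so the stem
    (e.g. "000" for "000.wav") is enough to identify the track.
    """
    return Path(scratch_dir) / (Path(audio_path).stem + ".feat.npy")
```

With this convention no separate index file is needed: the querying stage can recompute the same path from the query list.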
The audio files to be used in the task will be specified in a simple ASCII list file. This file will contain one path per line with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants could attempt to use parallelism. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
I/O formats
Input Files
The feature extraction list file format will be of the form:
/path/to/audio/file/000.wav\n
/path/to/audio/file/001.wav\n
/path/to/audio/file/002.wav\n
...
The query list file format is very similar and lists a subset of the files from the feature extraction list file:
/path/to/audio/file/182.wav\n
/path/to/audio/file/245.wav\n
/path/to/audio/file/432.wav\n
...
For a total of <number of queries> rows -- query ids are assigned from the pool of <number of candidates> collection ids and should match the ids within the candidate collection.
Lines will be terminated by a '\n' character.
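A minimal sketch of reading these list files and matching each query to its position in the collection might look like this. The function names are hypothetical, and the use of 1-based positions is an assumption made to match the track indexes used in the distance matrix file.

```python
from typing import List


def read_list_file(list_path: str) -> List[str]:
    """Read one audio path per line; the file has no header line."""
    with open(list_path, "r", encoding="ascii") as handle:
        # Drop the trailing '\n' and skip any blank lines at the end of the file.
        return [line.rstrip("\n") for line in handle if line.strip()]


def query_indexes(collection: List[str], queries: List[str]) -> List[int]:
    """1-based position of each query path within the collection list."""
    position = {path: index for index, path in enumerate(collection, start=1)}
    return [position[path] for path in queries]  # KeyError if a query is not in the collection
```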
Output File
The only output will be a distance matrix file that is <number of queries> rows by <number of candidates> columns in the following format:
Distance matrix header text with system name
1\t</path/to/audio/file/track1.wav>
2\t</path/to/audio/file/track2.wav>
3\t</path/to/audio/file/track3.wav>
4\t</path/to/audio/file/track4.wav>
...
N\t</path/to/audio/file/trackN.wav>
Q/R\t1\t2\t3\t4\t...\tN
1\t<dist 1 to 1>\t<dist 1 to 2>\t<dist 1 to 3>\t<dist 1 to 4>\t...\t<dist 1 to N>
3\t<dist 3 to 1>\t<dist 3 to 2>\t<dist 3 to 3>\t<dist 3 to 4>\t...\t<dist 3 to N>
where N is <number of candidates> and the queries are drawn from this set (and bear the same track indexes if possible).
which might look like:
Example distance matrix 0.1
1	/path/to/audio/file/track1.wav
2	/path/to/audio/file/track2.wav
3	/path/to/audio/file/track3.wav
4	/path/to/audio/file/track4.wav
5	/path/to/audio/file/track5.wav
Q/R	1	2	3	4	5
1	0.00000	1.24100	0.2e-4	0.42559	0.21313
3	50.2e-4	0.62640	0.00000	0.38000	0.15152
Note that indexes of the queries refer back to the track list at the top of the distance matrix file to identify the query track. However, as long as you ensure that the query songs are listed in exactly the same order as they appear in the query list file you are passed, we will be able to interpret the data.
All distances should be zero or positive (0.0+) and should not be infinite or NaN. Values should be separated by a TAB.
To summarize, the distance matrix should be preceded by a system name, <number of candidates> rows of file paths and should be composed of <number of candidates> columns of distance (separated by tab characters) and <number of queries> rows (one for each original track query). Each row corresponds to a particular query song (the track to find covers of).
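The layout summarized above could be written out along the following lines. This is a sketch under stated assumptions: the function name and parameters are hypothetical, and the five-decimal formatting simply mirrors the example values rather than any stated requirement.

```python
import math
from typing import Sequence


def write_distance_matrix(
    output_path: str,
    system_name: str,
    collection: Sequence[str],
    query_indexes: Sequence[int],
    distances: Sequence[Sequence[float]],
) -> None:
    """Write the header line, the track list, the Q/R line and one row per query.

    distances[i][j] is the distance from query i (in query list order)
    to candidate j (in collection list order).
    """
    n = len(collection)
    if len(distances) != len(query_indexes):
        raise ValueError("one distance row is required per query")
    with open(output_path, "w", encoding="ascii", newline="\n") as out:
        out.write(system_name + "\n")
        for index, path in enumerate(collection, start=1):
            out.write(f"{index}\t{path}\n")
        out.write("Q/R\t" + "\t".join(str(i) for i in range(1, n + 1)) + "\n")
        for query_index, row in zip(query_indexes, distances):
            if len(row) != n:
                raise ValueError("each row needs one distance per candidate")
            for value in row:
                # Distances must be zero or positive, and never infinite or NaN.
                if not math.isfinite(value) or value < 0:
                    raise ValueError(f"invalid distance: {value!r}")
            out.write(str(query_index) + "\t" + "\t".join(f"{v:.5f}" for v in row) + "\n")
```

Validating each value before writing avoids producing a file that would be rejected for containing negative, infinite or NaN distances.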
Command Line Calling Format
/path/to/submission <collection_list_file> <query_list_file> <working_directory> <output_file>
<collection_list_file>: Text file containing <number of candidates> full path file names for the <number of candidates> audio files in the collection (including the <number of queries> query documents).
    Example: /path/to/coversong/collection.txt
<query_list_file>: Text file containing the <number of queries> full path file names for the <number of queries> query documents.
    Example: /path/to/coversong/queries.txt
<working_directory>: Full path to a temporary directory where submission will have write access for caching features or calculations.
    Example: /tmp/submission_id/
<output_file>: Full path to file where submission should output the distance matrix (<number of candidates> header rows + <number of queries> x <number of candidates> data matrix).
    Example: /path/to/coversong/results/submission_id.txt
E.g.
/path/to/m/submission.sh /path/to/feat_extract_file.txt /path/to/query_file.txt /path/to/scratch/dir /path/to/output_file.txt
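For a Python submission, the four positional arguments could be parsed as sketched below. The optional `--threads` flag is an assumption added for illustration (the calling format above does not define it); it reflects the suggestion that the number of threads be specifiable on the command line.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the calling format: collection list, query list, working dir, output file."""
    parser = argparse.ArgumentParser(description="Cover song identification submission")
    parser.add_argument("collection_list_file", help="text file listing all candidate audio paths")
    parser.add_argument("query_list_file", help="text file listing the query audio paths")
    parser.add_argument("working_directory", help="scratch directory for cached features")
    parser.add_argument("output_file", help="where to write the distance matrix")
    # Hypothetical extra flag, not part of the required calling format.
    parser.add_argument("--threads", type=int, default=1, help="number of worker threads")
    return parser.parse_args(argv)
```

A wrapper script such as the `submission.sh` in the example above could then forward its arguments unchanged to this entry point.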
Packaging submissions
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).
All submissions should include a README file including the following information:
- Command line calling format for all executables and an example formatted set of commands
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Any required environments (and versions), e.g. python, java, bash, matlab.
Time and hardware limits
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.