2008:Audio Artist Identification
Audio Artist Identification
Status
A provisional specification of the artist identification task is detailed below. This proposal may be refined based on feedback from the participants.
Note that audio artist identification algorithms have been evaluated at ISMIR 2004, MIREX 2005 and MIREX 2007, although there was no artist identification task in 2006.
Please feel free to edit this page, but please conduct discussion of the task format and evaluation on the MRX-COM00 mailing list.
Data
The collection used at MIREX 2007 will be re-used. Suggestions for alternative datasets are welcomed.
Collection statistics: 3150 30-second, 22.05 kHz mono WAV audio clips drawn from 105 artists (30 clips per artist, drawn from 3 albums).
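As a rough check on storage and listening time, the figures above can be multiplied out. The sketch assumes 16-bit samples (per the audio format list below) and a minimal 44-byte WAV header. Real files may carry extra chunks, so treat the byte count as approximate.

```python
# Back-of-envelope size of the collection, from the statistics above.
ARTISTS = 105
CLIPS_PER_ARTIST = 30
CLIP_SECONDS = 30
SAMPLE_RATE_HZ = 22050
BYTES_PER_SAMPLE = 2   # 16-bit mono
WAV_HEADER_BYTES = 44  # assumption: minimal RIFF/WAVE header, no extra chunks

total_clips = ARTISTS * CLIPS_PER_ARTIST
bytes_per_clip = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CLIP_SECONDS + WAV_HEADER_BYTES
total_bytes = total_clips * bytes_per_clip
total_hours = total_clips * CLIP_SECONDS / 3600

print(total_clips)                  # 3150
print(bytes_per_clip)               # 1323044
print(round(total_bytes / 1e9, 2))  # 4.17 (GB)
print(total_hours)                  # 26.25
```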
Evaluation
Participating algorithms will be evaluated with 3-fold cross-validation. Album filtering will be applied to the training and test splits, i.e. the training and test sets of each fold will contain tracks from different albums.
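One way to build album-filtered folds is sketched below. This is not necessarily the evaluator's procedure. It assumes the album of each track is known to the evaluator, although submissions never see it. It then deals each artist's albums round-robin across the folds. With 3 albums per artist and 3 folds, every fold holds out exactly one album per artist. The function name and the `(path, artist, album)` tuple layout are hypothetical.

```python
from collections import defaultdict

def album_filtered_folds(tracks, n_folds=3):
    """Split tracks into n_folds (train, test) pairs so that no album appears
    on both the train and the test side of any fold.

    tracks: list of (path, artist, album) tuples (hypothetical layout).
    """
    albums_by_artist = defaultdict(set)
    for _, artist, album in tracks:
        albums_by_artist[artist].add(album)

    # Assign every (artist, album) pair to one fold; keying on the pair avoids
    # collisions between identically titled albums by different artists.
    fold_of = {}
    for artist, albums in albums_by_artist.items():
        for i, album in enumerate(sorted(albums)):
            fold_of[(artist, album)] = i % n_folds

    folds = []
    for k in range(n_folds):
        test = [t for t in tracks if fold_of[(t[1], t[2])] == k]
        train = [t for t in tracks if fold_of[(t[1], t[2])] != k]
        folds.append((train, test))
    return folds
```

Because whole albums move between folds together, a classifier cannot score well by recognising album-specific production characteristics instead of the artist.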
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
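The reported figures could be computed roughly as follows. The sketch makes two assumptions the page does not specify. First, the standard deviation is the sample standard deviation of the per-fold accuracies. Second, the confusion matrix is pooled over all folds.

```python
from collections import Counter
from statistics import mean, stdev

def fold_accuracy(truth, predicted):
    """Fraction of test tracks whose predicted artist matches the true artist."""
    correct = sum(1 for t, p in zip(truth, predicted) if t == p)
    return correct / len(truth)

def summarize(folds):
    """folds: list of (truth, predicted) label lists, one pair per CV fold.

    Returns (mean accuracy, sample std of per-fold accuracy, pooled confusion
    matrix as a Counter keyed by (true_artist, predicted_artist)).
    """
    accuracies = [fold_accuracy(t, p) for t, p in folds]
    confusion = Counter()
    for truth, predicted in folds:
        confusion.update(zip(truth, predicted))
    return mean(accuracies), stdev(accuracies), confusion
```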
Other standard techniques for evaluating classification performance will also be used, including McNemar's test and Friedman's ANOVA. Additional proposals for statistical significance testing are welcome.
In addition, computation times for feature extraction and for training/classification will be measured.
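When two submissions classify the same test tracks, McNemar's test considers only the tracks where exactly one of them is correct. Below is a minimal sketch of the common chi-squared form with continuity correction. The page does not state which variant the evaluation uses, and Friedman's ANOVA is not covered here.

```python
import math

def mcnemar(truth, pred_a, pred_b):
    """McNemar's test comparing classifiers A and B on the same test tracks.

    b = tracks A got right and B got wrong; c = the reverse. Tracks that both
    got right (or both got wrong) say nothing about which system is better.
    Returns (statistic, p_value) under a chi-squared distribution with 1 dof.
    """
    triples = list(zip(truth, pred_a, pred_b))
    b = sum(1 for t, a, s in triples if a == t and s != t)
    c = sum(1 for t, a, s in triples if a != t and s == t)
    if b + c == 0:
        return 0.0, 1.0  # the systems never disagree on correctness
    # Continuity-corrected statistic, clamped so |b - c| < 1 gives 0.
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of chi-squared(1): P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```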
Submission format
Submissions to this task must conform to the format detailed below.
Audio formats
Participating algorithms will have to read audio in the following format:
- Sample rate: 22.05 kHz
- Sample size: 16 bit
- Number of channels: 1 (mono)
- Encoding: WAV
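Participants may want to check locally that their reader accepts files in this format. The sketch below uses Python's standard `wave` module and assumes "22.05 kHz" means exactly 22050 Hz. Note that `wave` reads only uncompressed PCM WAV files and raises `wave.Error` for anything else.

```python
import wave

# Expected input format, per the list above.
EXPECTED_RATE = 22050    # assumption: "22.05 kHz" is exactly 22050 Hz
EXPECTED_SAMPWIDTH = 2   # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1    # mono

def check_wav_format(path):
    """Return a list of human-readable mismatches; an empty list means OK."""
    problems = []
    with wave.open(path, "rb") as w:
        if w.getframerate() != EXPECTED_RATE:
            problems.append(f"sample rate {w.getframerate()} Hz")
        if w.getsampwidth() != EXPECTED_SAMPWIDTH:
            problems.append(f"sample size {8 * w.getsampwidth()} bits")
        if w.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{w.getnchannels()} channels")
    return problems
```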
Implementation details
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
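Each audio track has a unique file name, so its base name alone can serve as the key for its feature file. The helper below is a hypothetical illustration of that mapping. The flat scratch layout and the `.feat` suffix are assumptions, not requirements.

```python
import os

def feature_path(scratch_dir, audio_path, suffix=".feat"):
    """Map an audio file to its feature file inside the scratch folder.

    Relies on the guarantee that audio file names are unique, so the base
    name (without extension) identifies the track. The ".feat" suffix is a
    hypothetical choice.
    """
    base = os.path.splitext(os.path.basename(audio_path))[0]
    return os.path.join(scratch_dir, base + suffix)
```

With this scheme, the classification step can locate features for any test path without a separate index file.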
The audio files to be used in the task will be specified in a simple ASCII list file. For feature extraction and classification this file will contain one path per line with no header line. For model training this file will contain one path per line, followed by a tab character and the artist label, again with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
Hence, participants should provide either two executables, or a single executable with command line parameters that select which of the two processes to run.
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants should attempt to use parallelism wherever possible. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single-threaded submissions will, of course, be accepted but may be disadvantaged by the time limits.
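Below is a minimal sketch of respecting a thread-count parameter with Python's `concurrent.futures`. `extract_features` is a hypothetical stand-in for real feature extraction. Python threads only run CPU-bound work in parallel when that work releases the GIL, as many NumPy routines do. Otherwise `ProcessPoolExecutor` is the usual substitute.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_features(audio_path):
    """Hypothetical stand-in for real feature extraction."""
    return len(audio_path)

def extract_all(paths, num_threads=1):
    """Extract features for every path using num_threads workers.

    pool.map preserves input order, so results line up with paths.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return dict(zip(paths, pool.map(extract_features, paths)))
```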
I/O formats
This section describes the input and output files used in this task, as well as the command line calling format required of submissions.
Feature extraction list file
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
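Reading a feature extraction (or test) list file amounts to collecting one path per line. The sketch below also skips blank lines, which is a defensive choice rather than part of the format. It decodes the file as ASCII, as the format states, so a non-ASCII path would raise an error.

```python
def read_path_list(list_file):
    """Read a list file with one path per line and no header line."""
    with open(list_file, encoding="ascii") as f:
        # Drop line terminators; ignore blank lines (defensive assumption).
        return [line.rstrip("\r\n") for line in f if line.strip()]
```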
Training list file
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.
E.g. <example path and filename>\t<artist classification>
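Each training list line splits at its tab into a path and an artist label. This sketch splits at the first tab, which assumes paths contain no tabs. It reports malformed lines with their line number. The function name is hypothetical.

```python
def read_training_list(list_file):
    """Parse "<path>\t<artist label>" lines (no header) into (path, artist) pairs."""
    pairs = []
    with open(list_file, encoding="ascii") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate blank lines (defensive assumption)
            path, sep, artist = line.partition("\t")
            if not sep or not artist:
                raise ValueError(f"{list_file}:{lineno}: expected '<path>\\t<artist>'")
            pairs.append((path, artist))
    return pairs
```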
Test (classification) list file
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
Classification output file
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.
E.g. <example path and filename>\t<artist classification>
The path to which this list file should be written must be accepted as a parameter on the command line.
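Writing the output file is the reverse of reading a training list. The sketch below writes Unix line endings and rejects any value that contains a tab or a newline, since either would corrupt the two-column format. Both choices are assumptions rather than stated requirements.

```python
def write_classification_output(out_file, predictions):
    """Write (path, artist) pairs as "<path>\t<artist label>" lines, no header."""
    with open(out_file, "w", encoding="ascii", newline="\n") as f:
        for path, artist in predictions:
            # A tab or line break inside either field would break the format.
            if any(ch in field for field in (path, artist) for ch in "\t\r\n"):
                raise ValueError(f"unsafe characters in {path!r} / {artist!r}")
            f.write(f"{path}\t{artist}\n")
```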
Example submission calling formats
extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

extractFeatures.sh -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt
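The single-executable variant can be parsed with Python's `argparse`, which accepts single-dash long options such as `-numThreads`. The sketch mirrors the positional arguments in the `myAlgo.sh` examples. This includes the `-classify` form, which lists no scratch folder. The mode table and the return shape are illustrative assumptions.

```python
import argparse

# Positional arguments expected by each mode, following the myAlgo.sh examples.
EXPECTED_PATHS = {
    "extract": ["scratch", "extract_list"],
    "train": ["scratch", "train_list"],
    "classify": ["test_list", "output_list"],
    "TrainAndClassify": ["scratch", "train_list", "test_list", "output_list"],
}

def parse_args(argv):
    """Return (mode, num_threads, {role: path}) for a myAlgo.sh-style call."""
    parser = argparse.ArgumentParser(prog="myAlgo.sh")
    mode = parser.add_mutually_exclusive_group(required=True)
    for name in EXPECTED_PATHS:
        mode.add_argument("-" + name, dest="mode", action="store_const", const=name)
    parser.add_argument("-numThreads", type=int, default=1)
    parser.add_argument("paths", nargs="+")
    args = parser.parse_args(argv)

    names = EXPECTED_PATHS[args.mode]
    if len(args.paths) != len(names):
        parser.error(f"-{args.mode} expects {len(names)} paths: {' '.join(names)}")
    return args.mode, args.numThreads, dict(zip(names, args.paths))
```

For example, `parse_args(["-extract", "-numThreads", "8", "/s", "/l.txt"])` yields mode `"extract"`, 8 threads and the scratch and list paths.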
Packaging submissions
All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
All submissions should include a README file containing the following information:
- Command line calling format for all executables
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Any required environments (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
Time and hardware limits
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
A hard limit of 24 hours will be imposed on feature extraction times.
A hard limit of 24 hours will be imposed on each training/classification cycle, leading to a total limit of 72 hours for training/classification across the three cross-validation folds.
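Participants may want to dry-run a submission locally against these limits. The helper below is a hypothetical sketch built on `subprocess.run` with a `timeout`. It does not describe how IMIRSEL enforces the limits. On timeout, `subprocess.run` kills only the direct child, so grandchild processes started by a wrapper script may keep running.

```python
import subprocess

HOUR = 3600
EXTRACTION_LIMIT_S = 24 * HOUR  # one feature extraction run
FOLD_LIMIT_S = 24 * HOUR        # each training/classification cycle
N_FOLDS = 3
TOTAL_FOLD_LIMIT_S = N_FOLDS * FOLD_LIMIT_S  # 72 hours over all folds

def run_with_limit(cmd, limit_s):
    """Run cmd (a list of arguments). Return True if it exited with status 0
    within limit_s seconds; False if it failed or was killed on timeout."""
    try:
        return subprocess.run(cmd, timeout=limit_s).returncode == 0
    except subprocess.TimeoutExpired:
        return False
```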
Submission opening date
7th August 2007 - provisional
Submission closing date
TBA