Difference between revisions of "2007:Audio Artist Identification"

From MIREX Wiki

Latest revision as of 15:57, 13 May 2010

Status

A provisional specification of the artist identification task is detailed below. This proposal may be refined based on feedback from the participants.

Note that audio artist identification algorithms were evaluated at ISMIR 2004 and MIREX 2005. However, there was no artist identification task in 2006.

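Related MIREX 2007 task proposals:

  • 2007:Audio Music Mood Classification
  • 2007:Audio Music Similarity and Retrieval
  • 2007:Audio Genre Classification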

Please feel free to edit this page.

Data

Collection statistics: 3150 30-second, 22.05 kHz, mono WAV audio clips drawn from 105 artists (30 clips per artist, drawn from 3 albums).

Evaluation

Participating algorithms will be evaluated with 3-fold cross-validation. Album filtering will be applied to the test and training splits, i.e. the training and test sets will contain tracks from different albums.
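One way to build album-filtered folds is sketched below. It assumes each track is described by an (artist, album, path) tuple and that every artist contributes three albums, as in the collection statistics. The function names and the rule for assigning albums to folds are illustrative assumptions, not the organisers' actual procedure.

```python
from collections import defaultdict

def album_filtered_folds(tracks, n_folds=3):
    """Assign each track to a fold so that no album spans two folds.

    tracks: list of (artist, album, path) tuples (hypothetical layout).
    Returns a list of n_folds lists of tracks.
    """
    albums_by_artist = defaultdict(set)
    for artist, album, _ in tracks:
        albums_by_artist[artist].add(album)

    # The i-th album of every artist (in sorted order) goes to fold i % n_folds.
    fold_of_album = {}
    for artist, albums in albums_by_artist.items():
        for i, album in enumerate(sorted(albums)):
            fold_of_album[(artist, album)] = i % n_folds

    folds = [[] for _ in range(n_folds)]
    for track in tracks:
        artist, album, _ = track
        folds[fold_of_album[(artist, album)]].append(track)
    return folds

def train_test_splits(folds):
    """Yield (train, test) pairs, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [t for j, fold in enumerate(folds) if j != i for t in fold]
        yield train, test
```

With three albums per artist, each fold then holds one album per artist, so every artist appears in both the training and the test set of every split while no album is shared between them.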

The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
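The measures named above could be computed along the following lines. This is a minimal sketch: the dictionaries mapping file paths to artist labels, the function names, and the choice of a population standard deviation taken over per-fold accuracies are all assumptions, since the page does not say how the standard deviation is defined.

```python
from collections import Counter
from statistics import mean, pstdev

def fold_accuracy(truth, predicted):
    """Fraction of tracks whose predicted artist equals the true artist.

    truth, predicted: dicts mapping audio path -> artist label.
    A track missing from `predicted` counts as an error.
    """
    correct = sum(1 for path, artist in truth.items()
                  if predicted.get(path) == artist)
    return correct / len(truth)

def confusion_counts(truth, predicted):
    """Count (true artist, predicted artist) pairs."""
    return Counter((artist, predicted.get(path))
                   for path, artist in truth.items())

def summarise(fold_accuracies):
    """Mean and population standard deviation of per-fold accuracies."""
    return mean(fold_accuracies), pstdev(fold_accuracies)
```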

Otherwise, standard techniques for evaluating classification performance will be used, including techniques to estimate error bars or statistical significance. Proposals for statistical significance testing are more than welcome.

In addition, computation times for feature extraction and training/classification will be measured.

Submission format

Submissions to this task will have to conform to the format detailed below.

Audio formats

Participating algorithms will have to read audio in the following format:

  • Sample rate: 22.05 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV
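A submission may want to confirm that an input file matches this format before processing it. The sketch below uses Python's standard `wave` module. The function name is hypothetical, and the default rate of 22050 Hz is an assumption taken from the collection statistics.

```python
import wave

def check_wav_format(path, expected_rate=22050):
    """Return True if the file is 16-bit mono PCM WAV at the expected rate."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getnchannels() == 1          # mono
            and wav.getsampwidth() == 2      # 2 bytes per sample = 16 bit
            and wav.getframerate() == expected_rate
        )
```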

Implementation details

Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique filenames will be assigned to each audio track.
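Because every audio track has a unique filename, one simple way to keep track of feature files is to derive each feature file name from the audio file name. The sketch below illustrates this. The function name and the `.features` suffix are assumptions chosen for illustration.

```python
import os

def feature_path(scratch_dir, audio_path, suffix=".features"):
    """Map an audio file to its feature file inside the scratch folder.

    Relies on audio filenames being unique, so the directory part of
    audio_path can be dropped without risking collisions.
    """
    stem = os.path.splitext(os.path.basename(audio_path))[0]
    return os.path.join(scratch_dir, stem + suffix)
```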

The audio files to be used in the task will be specified in a simple ASCII list file. For feature extraction and classification this file will contain one path per line with no header line. For model training this file will contain one path per line, followed by a tab character and the artist label, again with no header line. Executables will have to accept the path to these list files as a command line parameter. The formats for the list files are specified below.

Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Hence, participants should provide either two executables or a single executable with command line parameters to run the two separate processes.

Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants should attempt to use parallelism wherever possible. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single-threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
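A configurable thread count can be passed straight through to a worker pool, as in the sketch below. The names `extract_all` and `extract_one` are hypothetical. Note that in CPython, threads only speed up work that releases the global interpreter lock (file I/O, most numerical library calls), so a process pool may suit pure-Python feature extraction better.

```python
from concurrent.futures import ThreadPoolExecutor

def extract_all(paths, extract_one, num_threads=1):
    """Apply extract_one to every path using num_threads worker threads.

    Results are returned in the same order as the input paths.
    """
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(extract_one, paths))
```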

I/O formats

This section describes the input and output files used in this task, as well as the command line calling format requirements for submissions.


Feature extraction list file

The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.

Training list file

The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.

E.g. <example path and filename>\t<artist classification>
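Reading such a file takes only a few lines. The sketch below is one possible reader; the function name is hypothetical, and skipping blank lines is a convenience not required by the format.

```python
def read_training_list(path):
    """Return a list of (audio_path, artist_label) pairs.

    Expects one "<path>\\t<artist>" entry per line and no header line.
    """
    pairs = []
    with open(path, encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate a trailing blank line
            audio_path, artist = line.split("\t", 1)
            pairs.append((audio_path, artist))
    return pairs
```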

Test (classification) list file

The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

Classification output file

Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.

E.g. <example path and filename>\t<artist classification>

The path to which this list file should be written must be accepted as a parameter on the command line.
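Writing the output file is the mirror image of reading the training list. The sketch below assumes the predictions are held as (path, artist label) pairs; the function name is hypothetical.

```python
def write_classifications(output_path, predictions):
    """Write one "<path>\\t<artist>" line per prediction, with no header.

    predictions: iterable of (audio_path, artist_label) pairs.
    """
    with open(output_path, "w", encoding="ascii", newline="\n") as handle:
        for audio_path, artist in predictions:
            handle.write(f"{audio_path}\t{artist}\n")
```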

Example submission calling formats

  extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
  TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

  extractFeatures.sh -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
  TrainAndClassify.sh -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

  extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
  Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
  Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt

  myAlgo.sh -extract -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
  myAlgo.sh -TrainAndClassify -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

  myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
  myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
  myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt
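For a single-executable submission, the myAlgo.sh style of calling format could be parsed as sketched below with Python's standard `argparse` module. The parser layout (a `mode` attribute, a default of one thread, and a generic `paths` list) is an assumption for illustration; only the flag names come from the examples above.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the myAlgo.sh calling formats."""
    parser = argparse.ArgumentParser(prog="myAlgo")
    # Exactly one mode flag must be given.
    mode = parser.add_mutually_exclusive_group(required=True)
    for name in ("extract", "train", "classify", "TrainAndClassify"):
        mode.add_argument("-" + name, dest="mode",
                          action="store_const", const=name)
    parser.add_argument("-numThreads", type=int, default=1)
    # Scratch folder and list-file paths, interpreted according to the mode.
    parser.add_argument("paths", nargs="+")
    return parser
```

Calling `build_parser().parse_args()` inside the executable then yields the selected mode, the thread count and the positional paths in the order given on the command line.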

Packaging submissions

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

  • Command line calling format for all executables
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Any required environments (and versions) such as Matlab, Java, Python, Bash, Ruby, etc.


Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be specified.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 24 hours will be imposed on each training/classification cycle, leading to a total runtime limit of 72 hours over the three cross-validation folds.

Submission opening date

14th August 2007 - provisional

Submission closing date

28th August 2007 - provisional


Audio format poll

<poll>
Use clips from tracks for analysis to reduce processing load (and perhaps increase size of dataset)?
Yes
No
</poll>

<poll>
What is your preferred clip length if we do end up using clips?
30 secs
60 secs
90 secs
120 secs
</poll>

<poll>
What is your preferred audio format? Remember that the less audio data we have to process the larger the dataset can be...
22 kHz mono WAV
22 kHz stereo WAV
44 kHz mono WAV
44 kHz stereo WAV
22 kHz mono MP3 128kb
22 kHz stereo MP3 128kb
44 kHz mono MP3 128kb
44 kHz stereo MP3 128kb
</poll>

Participants

If you think there is a slight chance that you might want to participate, please add your name and email address here.

  • Thomas Lidy (lastname@ifs.tuwien.ac.at)
  • Francois Pachet and Pierre Roy (lastname@csl.sony.fr)
  • Elias Pampalk (firstname.lastname@gmail.com)
  • Tim Pohle (firstname.lastname@jku.at)
  • Kris West (kw at cmp dot uea dot ac dot uk)
  • James Bergstra (bergstrj at iro umontreal ca)
  • Vitor Soares (firstname.lastname@clustermedialabs.com)
  • ...