Difference between revisions of "2010:Audio Classification (Train/Test) Tasks"

From MIREX Wiki
(Data)
(Example submission calling formats)
 
(21 intermediate revisions by 3 users not shown)
Line 1: Line 1:
 
== Description ==
 
== Description ==
Many tasks in music classification can be characterized as a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "super" task which includes various audio classification tasks that follow this Train/Test process. This year, three classification tasks are included:
+
Many tasks in music classification can be characterized as a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2010, five classification sub-tasks are included:
  
 
*Audio Artist Identification
 
*Audio Artist Identification
*Audio Genre Classification
+
*Audio Classical Composer Identification
 +
*Audio US Pop Music Genre Classification
 +
*Audio Latin Music Genre Classification
 
*Audio Mood Classification
 
*Audio Mood Classification
  
All three classification tasks were conducted in previous MIREX runs (please see [[#Links to Previous MIREX Runs of These Classification Tasks]]). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats. Please feel free to edit this page and conduct discussion of the task format and evaluation on the [mailto:mrx-com00@lists.lis.uiuc.edu MRX-COM00] mailing list ([https://mail.lis.uiuc.edu/mailman/listinfo/mrx-com00 List interface]).
+
All five classification tasks were conducted in previous MIREX runs (please see [[#Links to Previous MIREX Runs of These Classification Tasks]]). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.  
 +
 
 +
 
 +
=== Task specific mailing list ===
 +
In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., [[2010:Audio Classification (Train/Test) Tasks]], [[2010:Audio Cover Song Identification]], [[2010:Audio Tag Classification]], [[2010:Audio Music Similarity and Retrieval]]). This year, however, we are asking that all discussions take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.
  
 
== Data ==
 
== Data ==
  
 
=== Audio Artist Identification ===  
 
=== Audio Artist Identification ===  
There are two datasets for this task:
+
This dataset requires algorithms to classify music audio according to the performing artist. The collection used at MIREX 2009 will be re-used.
  
1) The collection used at MIREX 2009 will be re-used. Collection statistics:  
+
Collection statistics:  
 +
* 3150 30-second 22.05kHz mono wav audio clips drawn from a collection of US Pop music.
 +
* 105 artists (30 clips per artist drawn from 3 albums).
  
* 3150 30-second 22.05kHz mono wav audio clips drawn from 105 artists (30 clips per artist drawn from 3 albums).
 
  
2) The second collection is composed of classical composers:
+
=== Audio Classical Composer Identification ===
 +
This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.
  
* 2772 30-second 22.05 kHz mono wav clips organised into 11 "classical" composers (252 clips per composer). At present the database contains tracks for:
+
Collection statistics:
 +
* 2772 30-second 22.05 kHz mono wav clips
 +
* 11 "classical" composers (252 clips per composer), including:
 
** Bach
 
** Bach
 
** Beethoven
 
** Beethoven
Line 32: Line 42:
 
** Vivaldi
 
** Vivaldi
  
=== Audio Genre Classification ===
 
This task will use two different datasets:
 
 
1) The MIREX 2007 Genre Collection:
 
 
The first collection may either be the MIREX 2007 genre classification set (details below) or a new dataset drawn from the same distribution of over 22,000 tracks. If a new set is selected it is expected to contain 10-12 genres, with between 700 and 1000 tracks per genre.
 
 
MIREX 2007 collection statistics: 7000 30-second audio clips in 22.05kHz mono WAV format drawn from 10 genres (700 clips from each genre). Genres:
 
  
* Blues
+
=== Audio US Pop Music Genre Classification ===
* Jazz
+
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.
* Country/Western
 
* Baroque
 
* Classical
 
* Romantic
 
* Electronica
 
* Hip-Hop
 
* Rock
 
* HardRock/Metal
 
  
 +
Collection statistics:
 +
* 7000 30-second audio clips in 22.05kHz mono WAV format
 +
* 10 genres (700 clips from each genre), including:
 +
** Blues
 +
** Jazz
 +
** Country/Western
 +
** Baroque
 +
** Classical
 +
** Romantic
 +
** Electronica
 +
** Hip-Hop
 +
** Rock
 +
** HardRock/Metal
  
2) Latin Genre Collection:
 
  
Carlos Silla (cns2 (at) kent (dot) ac (dot) uk) has contributed a second dataset of Latin popular and dance music sourced from Brazil and hand labeled by music experts. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.
+
===  Audio Latin Music Genre Classification ===
 +
This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [http://ismir2008.ismir.net/papers/ISMIR2008_106.pdf] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.
  
More precisely, the Latin Music Database has 3,227 audio files from 10 Latin music genres:
+
Collection statistics:
 +
* 3,227 audio files in 22.05kHz mono WAV format
 +
* 10 Latin music genres, including:
 +
** Axe
 +
** Bachata
 +
** Bolero
 +
** Forro
 +
** Gaucha
 +
** Merengue
 +
** Pagode
 +
** Sertaneja
 +
** Tango
  
* Axé
 
* Bachata
 
* Bolero
 
* Forró
 
* Gaúcha
 
* Merengue
 
* Pagode
 
* Sertaneja
 
* Tango
 
  
 
=== Audio Mood Classification ===
 
=== Audio Mood Classification ===
 +
This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production music sourced from the APM collection [http://www.apmmusic.com]). The MIREX 2007 Mood Classification dataset [http://ismir2008.ismir.net/papers/ISMIR2008_263.pdf] will be re-used.
  
The MIREX 2007 Mood Classification dataset will be used.  
+
Collection statistics:
 +
* 600 30-second audio clips in 22.05kHz mono WAV format selected from the APM collection [http://www.apmmusic.com], and labeled by human judges using the Evalutron6000 system.
 +
* 5 mood categories [http://ismir2007.ismir.net/proceedings/ISMIR2007_p067_hu.pdf] each of which contains 120 clips:
 +
**Cluster_1: passionate, rousing, confident, boisterous, rowdy
 +
**Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
 +
**Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
 +
**Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
 +
**Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral
  
The dataset consists of 600 30-second audio clips selected from the APM collection (www.apmmusic.com), and labeled by human judges using the Evalutron6000 system.
 
 
There are 5 mood categories each of which contains 120 clips:
 
*Cluster_1: passionate, rousing, confident, boisterous, rowdy
 
*Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
 
*Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
 
*Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
 
*Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral
 
  
 
== Audio Formats ==
 
== Audio Formats ==
For all three tasks, participating algorithms will have to read audio in the following format:
+
For all datasets, participating algorithms will have to read audio in the following format:
 +
 
 +
* Sample rate: 22.05 kHz
 +
* Sample size: 16 bit
 +
* Number of channels: 1 (mono)
 +
* Encoding: WAV
  
*Sample rate: 22.05 kHz
 
*Sample size: 16 bit
 
*Number of channels: 1 (mono)
 
*Encoding: WAV
 
  
 
== Evaluation ==
 
== Evaluation ==
This section first describes evaluation methods common to all the three tasks, then specifies settings unique to each of the tasks.  
+
This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.  
  
For all three tasks, participating algorithms will be evaluated with 3-fold cross validation. For '''Artist Identification''', album filtering will be applied to the test and training splits, i.e. training and test sets will contain tracks from different albums; for '''Genre Classification''', artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.
+
Participating algorithms will be evaluated with 3-fold cross validation. For '''Artist Identification''' and '''Classical Composer Identification''', album filtering will be applied to the test and training splits, i.e. training and test sets will contain tracks from different albums; for '''US Pop Genre Classification''' and '''Latin Genre Classification''', artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.
  
 
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
 
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
  
Classification accuracies will be tested for statistically significant differences using two techniques:
+
Classification accuracies will be tested for statistically significant differences using Friedman's ANOVA with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.
 
 
* McNemar's test (Dietterich, 1997) is a statistical process that can validate the significance of differences between two classifiers
 
 
 
A significance test matrix will be provided to display significant differences between algorithms at p-values of 0.05 and 0.01.
 
 
 
* Friedman's Anova with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.  
 
  
 
In addition computation times for feature extraction and training/classification will be measured.
 
In addition computation times for feature extraction and training/classification will be measured.
  
=== Audio Genre Classification ===
 
A hierarchical genre taxonomy will be provided to all participating algorithms. This taxonomy will have at most two or three levels depending on the collection composition.
 
 
In addition to the aforementioned measures, an accuracy statistic will be computed that discounts confusion between similar classes, as was used in the MIREX 2005 audio genre task. This will be defined as follows:
 
 
* 1.0 point will be scored for correctly assigning the genre label, i.e. for a two level hierarchy correctly assigning the labels Jazz&Blues and Blues to an example scores 1.0 point.
 
* Tracks misclassified as a class on the same branch of the genre hierarchy as the true class will score a number of points equal to the number of nodes in the hierarchy shared with the true class, divided by the length of the correct branch. I.e. in a two level hierarchy containing the following branches:
 
 
  JazzBlues, Jazz
 
  JazzBlues, Blues
 
  CountryWestern
 
  GeneralClassical, Baroque
 
  GeneralClassical, Classical
 
  GeneralClassical, Romantic
 
  Electronica
 
  HipHop
 
  GeneralRock, Rock
 
  GeneralRock, HardRockMetal
 
 
 
misclassifying a Jazz example as blues will score 0.5 points.
 
 
* Tracks misclassified as a completely dissimilar class will score 0.0 points.
 
* Test significance of differences in error rates of each system at each iteration using McNemar's test, mean average and standard deviation of P-values.
 
  
== Submission ==
+
== Submission Format ==
 
=== File I/O Format ===
 
=== File I/O Format ===
 +
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:
  
For all the three tasks, scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
 
  
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:
+
==== Feature extraction list file ====
 +
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
 +
I.e.
 +
<example path and filename>
  
==== Feature extraction list file ====
+
E.g.
 +
/path/to/track1.wav
 +
/path/to/track2.wav
 +
...
  
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line.
 
E.g. <example path and filename>
 
  
 
==== Training list file ====
 
==== Training list file ====
 +
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.
  
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.
+
I.e.
 +
<example path and filename>\t<class label>
 +
 
 +
E.g.
 +
/path/to/track1.wav rock
 +
/path/to/track2.wav blues
 +
...
  
  E.g. <example path and filename>\t<class label>
 
  
 
==== Test (classification) list file ====
 
==== Test (classification) list file ====
 +
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
 +
 +
I.e.
 +
<example path and filename>
  
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
+
E.g.
 +
/path/to/track1.wav
 +
/path/to/track2.wav
 +
...
  
  
 
==== Classification output file ====
 
==== Classification output file ====
 +
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the predicted class (artist, genre or mood) label, again with no header line.
  
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the artist label, again with no header line.
+
I.e.
 +
<example path and filename>\t<class label>
 +
 
 +
E.g.
 +
/path/to/track1.wav classical
 +
/path/to/track2.wav blues
 +
...
  
  E.g. <example path and filename>\t<class label>
 
  
 
=== Submission calling formats ===
 
=== Submission calling formats ===
 
 
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
 
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
  
 
Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.
 
Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.
  
Also, executables will have to accept the paths to the aforementioned list files as command line parameters.
+
Executables will have to accept the paths to the aforementioned list files as command line parameters.
 +
 
 +
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
 +
 
  
 
==== Example submission calling formats ====
 
==== Example submission calling formats ====
Line 176: Line 181:
 
   extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 
   extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 
   Train.sh /path/to/scratch/folder /path/to/trainListFile.txt  
 
   Train.sh /path/to/scratch/folder /path/to/trainListFile.txt  
   Classify.sh /path/to/testListFile.txt /path/to/outputListFile.txt
+
   Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
  
 
   myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 
   myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 
   myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt  
 
   myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt  
   myAlgo.sh -classify /path/to/testListFile.txt /path/to/outputListFile.txt
+
   myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
  
Multi-processor compute nodes (2, 4 or 8 cores) will be used to run this task. Hence, participants should attempt to use parallelism where-ever possible. Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 2, 4 or 8 thread configurations. Single threaded submissions will, of course, be accepted but may be disadvantaged by time constraints.
+
Multi-processor compute nodes will be used to run this task; however, we ask that submissions use no more than 4 cores (as we will be running a lot of submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.
  
   extractFeatures.sh -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
+
   extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
   TrainAndClassify.sh -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
+
   TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
  
   myAlgo.sh -extract -numThreads 8 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
+
   myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
   myAlgo.sh -TrainAndClassify -numThreads 8 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
+
   myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
  
 
=== Packaging submissions ===
 
=== Packaging submissions ===
  
 
* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
 
* All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). [mailto:mirproject@lists.lis.uiuc.edu IMIRSEL] should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
* Be sure to follow the [[https://www.music-ir.org/mirex/2006/index.php/Best_Coding_Practices_for_MIREX Best Coding Practices for MIREX]]
+
* Be sure to follow the [[2006:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
 
* Be sure to follow the [[MIREX 2010 Submission Instructions]]
 
* Be sure to follow the [[MIREX 2010 Submission Instructions]]
  
 
All submissions should include a README file including the following information:
 
All submissions should include a README file including the following information:
  
* Command line calling format for all executables
+
* Command line calling format for all executables including examples
 
* Number of threads/cores used or whether this should be specified on the command line
 
* Number of threads/cores used or whether this should be specified on the command line
 
* Expected memory footprint
 
* Expected memory footprint
 
* Expected runtime
 
* Expected runtime
 
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
 
* Approximately how much scratch disk space will the submission need to store any feature/cache files?
* Any required environments (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
+
* Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
 
* Any special notice regarding running your algorithm
 
* Any special notice regarding running your algorithm
  
Line 209: Line 214:
  
 
=== Time and hardware limits ===
 
=== Time and hardware limits ===
 
 
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
 
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
  
 
A hard limit of 24 hours will be imposed on feature extraction times.
 
A hard limit of 24 hours will be imposed on feature extraction times.
  
A hard limit of 24 hours will be imposed on each training/classification cycle, leading to a total runtime limit of 72 hours.
+
A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.
 
 
=== Specific to Audio Genre Classification: Genre hierarchy ===
 
 
 
A genre hierarchy file will be provided to submissions requesting one. There is no guarantee that the tree defined by this file will be balanced (all branches being the same length). Therefore, the tree defined may have branches of length 1, 2 or 3 (excluding the root node).
 
 
 
This file will have a number of lines equal to the number of genres (with no header line). Each line in the file will conform to one of the following formats:
 
 
 
  Highest_level_classification\tMid_level_classification\tLowest_level_classification
 
  Highest_level_classification\tLowest_level_classification
 
  Lowest_level_classification
 
 
 
where \t represents a tab character and Lowest_level_classification is the actual genre label applied to files.
 
 
 
E.g. a simple file for a 4 class genre taxonomy might look like:
 
 
 
  Rock&Pop Rock Alternative Rock
 
  Rock&Pop Rock
 
  Rock&Pop Pop
 
  Classical
 
 
 
  
 
=== Submission opening date ===
 
=== Submission opening date ===
  
TBA
+
Friday 4th June 2010
  
 
=== Submission closing date ===
 
=== Submission closing date ===
  
 
TBA
 
TBA
 +
  
 
== Links to Previous MIREX Runs of These Classification Tasks ==
 
== Links to Previous MIREX Runs of These Classification Tasks ==

Latest revision as of 10:15, 14 July 2010

Description

Many tasks in music classification can be characterized as a two-stage process: training classification models using labeled data and testing the models using new/unseen data. Therefore, we propose this "meta" task which includes various audio classification tasks that follow this Train/Test process. For MIREX 2010, five classification sub-tasks are included:

  • Audio Artist Identification
  • Audio Classical Composer Identification
  • Audio US Pop Music Genre Classification
  • Audio Latin Music Genre Classification
  • Audio Mood Classification

All five classification tasks were conducted in previous MIREX runs (please see #Links to Previous MIREX Runs of These Classification Tasks). This page presents the evaluation of these tasks, including the datasets as well as the submission rules and formats.


Task specific mailing list

In the past we have used a specific mailing list for the discussion of this task and related tasks (e.g., 2010:Audio Classification (Train/Test) Tasks, 2010:Audio Cover Song Identification, 2010:Audio Tag Classification, 2010:Audio Music Similarity and Retrieval). This year, however, we are asking that all discussions take place on the MIREX "EvalFest" list. If you have a question or comment, simply include the task name in the subject heading.

Data

Audio Artist Identification

This dataset requires algorithms to classify music audio according to the performing artist. The collection used at MIREX 2009 will be re-used.

Collection statistics:

  • 3150 30-second 22.05kHz mono wav audio clips drawn from a collection of US Pop music.
  • 105 artists (30 clips per artist drawn from 3 albums).


Audio Classical Composer Identification

This dataset requires algorithms to classify music audio according to the composer of the track (drawn from a collection of performances of a variety of classical music genres). The collection used at MIREX 2009 will be re-used.

Collection statistics:

  • 2772 30-second 22.05 kHz mono wav clips
  • 11 "classical" composers (252 clips per composer), including:
    • Bach
    • Beethoven
    • Brahms
    • Chopin
    • Dvorak
    • Handel
    • Haydn
    • Mendelssohn
    • Mozart
    • Schubert
    • Vivaldi


Audio US Pop Music Genre Classification

This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of US Pop music tracks). The MIREX 2007 Genre dataset will be re-used, which was drawn from the USPOP 2002 and USCRAP collections.

Collection statistics:

  • 7000 30-second audio clips in 22.05kHz mono WAV format
  • 10 genres (700 clips from each genre), including:
    • Blues
    • Jazz
    • Country/Western
    • Baroque
    • Classical
    • Romantic
    • Electronica
    • Hip-Hop
    • Rock
    • HardRock/Metal


Audio Latin Music Genre Classification

This dataset requires algorithms to classify music audio according to the genre of the track (drawn from a collection of Latin popular and dance music, sourced from Brazil and hand labeled by music experts). Carlos Silla's (cns2 (at) kent (dot) ac (dot) uk) Latin popular and dance music dataset [1] will be re-used. This collection is likely to contain a greater number of styles of music that will be differentiated by rhythmic characteristics than the MIREX 2007 dataset.

Collection statistics:

  • 3,227 audio files in 22.05kHz mono WAV format
  • 10 Latin music genres, including:
    • Axe
    • Bachata
    • Bolero
    • Forro
    • Gaucha
    • Merengue
    • Pagode
    • Sertaneja
    • Tango


Audio Mood Classification

This dataset requires algorithms to classify music audio according to the mood of the track (drawn from a collection of production music sourced from the APM collection [2]). The MIREX 2007 Mood Classification dataset [3] will be re-used.

Collection statistics:

  • 600 30-second audio clips in 22.05kHz mono WAV format selected from the APM collection [4], and labeled by human judges using the Evalutron6000 system.
  • 5 mood categories [5] each of which contains 120 clips:
    • Cluster_1: passionate, rousing, confident, boisterous, rowdy
    • Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
    • Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
    • Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
    • Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral


Audio Formats

For all datasets, participating algorithms will have to read audio in the following format:

  • Sample rate: 22.05 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV
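
As an illustration only, a submission could sanity-check its input against the format listed above before extracting features. The sketch below uses Python's standard `wave` module; the function name `check_wav_format` and its default arguments are hypothetical and not part of any MIREX tooling, and the 22050 Hz default assumes the 22.05 kHz rate quoted in the collection statistics.

```python
import wave


def check_wav_format(path, rate=22050, sample_bytes=2, channels=1):
    """Return True if the WAV file matches the expected task format.

    Defaults assume 22.05 kHz, 16-bit (2 bytes per sample), mono audio.
    """
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == rate
            and wav.getsampwidth() == sample_bytes
            and wav.getnchannels() == channels
        )
```

A file that fails this check could be reported to the organisers rather than silently resampled.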


Evaluation

This section first describes evaluation methods common to all the datasets, then specifies settings unique to each of the tasks.

Participating algorithms will be evaluated with 3-fold cross validation. For Artist Identification and Classical Composer Identification, album filtering will be applied to the test and training splits, i.e. training and test sets will contain tracks from different albums; for US Pop Genre Classification and Latin Genre Classification, artist filtering will be applied to the test and training splits, i.e. training and test sets will contain different artists.
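
The exact procedure used to build the filtered folds is not described on this page. The following is a minimal sketch of one way such a split could be made, so that participants can reproduce similar conditions locally. It assumes each track is given as a hypothetical `(path, label, group)` tuple, where `group` is the album (for album filtering) or the artist (for artist filtering), and that every group belongs to a single class.

```python
from collections import defaultdict


def filtered_folds(tracks, n_folds=3):
    """Split tracks into folds so that no group spans two folds.

    tracks: list of (path, label, group) tuples; `group` is the album or
    artist used for filtering. Assumes each group has exactly one label.
    """
    # Collect the tracks of every group, organised by class label.
    groups_by_label = defaultdict(lambda: defaultdict(list))
    for path, label, group in tracks:
        groups_by_label[label][group].append((path, label, group))

    folds = [[] for _ in range(n_folds)]
    for label in sorted(groups_by_label):
        groups = groups_by_label[label]
        sizes = [0] * n_folds  # tracks of this class placed in each fold
        # Place the largest groups first, each into the currently
        # smallest fold for this class, to keep classes balanced.
        for group in sorted(groups, key=lambda g: (-len(groups[g]), g)):
            target = sizes.index(min(sizes))
            folds[target].extend(groups[group])
            sizes[target] += len(groups[group])
    return folds
```

Fold `i` would then serve as the test set while the remaining folds form the training set, so training and test sets never share an album or artist.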

The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
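
For participants who want to check their own output before submitting, the raw accuracy and confusion matrix can be sketched as follows. This is not the official evaluation code; the function names are hypothetical, and both arguments are assumed to be dictionaries mapping audio paths to class labels.

```python
from collections import Counter


def accuracy(truth, predicted):
    """Fraction of tracks whose predicted label equals the true label."""
    correct = sum(1 for path in truth if predicted.get(path) == truth[path])
    return correct / len(truth)


def confusion_matrix(truth, predicted, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter((truth[path], predicted.get(path)) for path in truth)
    return [[counts[(t, p)] for p in labels] for t in labels]
```

A track missing from the predictions counts as an error in `accuracy` and does not appear in any column of the matrix.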

Classification accuracies will be tested for statistically significant differences using Friedman's ANOVA with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.
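
To give a feel for the ranking step, the sketch below computes the textbook Friedman statistic from a table of accuracies. It is an illustration under stated assumptions, not the organisers' implementation: `scores[b][a]` is assumed to hold the accuracy of algorithm `a` on block `b` (for example, a fold), no tie correction is applied to the statistic, and the Tukey-Kramer HSD comparison that follows is not shown.

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores):
    """Return (Q, mean rank per algorithm) for scores[block][algorithm]."""
    n = len(scores)       # number of blocks
    k = len(scores[0])    # number of algorithms
    rank_sums = [0.0] * k
    for row in scores:
        for a, rank in enumerate(average_ranks(row)):
            rank_sums[a] += rank
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, [r / n for r in rank_sums]
```

A higher mean rank here means a higher accuracy; under the null hypothesis Q is approximately chi-square distributed with k - 1 degrees of freedom.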

In addition, computation times for feature extraction and training/classification will be measured.


Submission Format

File I/O Format

The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:


Feature extraction list file

The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line. I.e.

<example path and filename>

E.g.

/path/to/track1.wav
/path/to/track2.wav
...


Training list file

The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.

I.e.

<example path and filename>\t<class label>

E.g.

/path/to/track1.wav	rock
/path/to/track2.wav	blues
...
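
A training list in this format could be read with a few lines of code. The sketch below is one possible reader; the function name `read_training_list` is hypothetical, and it assumes that paths contain no tab characters.

```python
def read_training_list(list_path):
    """Read 'path<TAB>label' lines into a list of (path, label) pairs."""
    pairs = []
    with open(list_path, "r", encoding="ascii") as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue  # tolerate a trailing blank line
            audio_path, label = line.split("\t", 1)
            pairs.append((audio_path, label))
    return pairs
```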


Test (classification) list file

The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.

I.e.

<example path and filename>

E.g.

/path/to/track1.wav
/path/to/track2.wav
...


Classification output file

Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the predicted class (artist, genre or mood) label, again with no header line.

I.e.

<example path and filename>\t<class label>

E.g.

/path/to/track1.wav	classical
/path/to/track2.wav	blues
...
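
Writing the output file might look like the sketch below. The helper name and the (path, label) input layout are assumptions for illustration.

```python
def write_output_file(path, predictions):
    """Write (audio_path, label) pairs, one per line, tab-separated, no header."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        for audio_path, label in predictions:
            handle.write(f"{audio_path}\t{label}\n")
```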


Submission calling formats

Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.

Hence, participants should provide two executables or command line parameters for a single executable to run the two separate processes.

Executables will have to accept the paths to the aforementioned list files as command line parameters.

Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
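
Since every audio track has a unique file name, one simple way to keep track of feature files is to derive the feature file name from the audio file name. The sketch below assumes this scheme; the `.feat` suffix and the function name are made up for the example.

```python
from pathlib import Path


def feature_path(scratch_dir, audio_path, suffix=".feat"):
    """Map an audio file to its feature file inside the scratch folder.

    The full audio file name (including its extension) is kept, so the
    mapping stays unique as long as the audio file names are unique.
    """
    return Path(scratch_dir) / (Path(audio_path).name + suffix)
```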


Example submission calling formats

 extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

 extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
 Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt

 myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
 myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt

Multi-processor compute nodes will be used to run this task; however, we ask that submissions use no more than 4 cores (as we will be running a lot of submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.

 extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
 myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
 myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
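
For submissions written in Python, a single executable accepting these parameters could parse its command line roughly as follows. This is a sketch under assumptions: it follows the `-extract` / `-train` / `-classify` variant of the examples, and the parser layout, the default of one thread and the 1 to 4 range check are choices made for the example rather than MIREX requirements.

```python
import argparse


def build_parser():
    """Build a parser for the single-executable calling format."""
    parser = argparse.ArgumentParser(prog="myAlgo.sh", allow_abbrev=False)
    # Exactly one processing stage must be selected per run.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("-extract", action="store_true")
    mode.add_argument("-train", action="store_true")
    mode.add_argument("-classify", action="store_true")
    # No more than 4 cores may be used.
    parser.add_argument("-numThreads", type=int, choices=range(1, 5), default=1)
    parser.add_argument("scratch", help="path to the scratch folder")
    parser.add_argument("list_files", nargs="+",
                        help="list file path(s) for the selected stage")
    return parser
```

A wrapper script would then call `build_parser().parse_args()` and dispatch on `args.extract`, `args.train` or `args.classify`.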

Packaging submissions

  • All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
  • Be sure to follow the Best Coding Practices for MIREX
  • Be sure to follow the MIREX 2010 Submission Instructions

All submissions should include a README file containing the following information:

  • Command line calling format for all executables including examples
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Approximate scratch disk space needed to store any feature/cache files
  • Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
  • Any special notes regarding running your algorithm

Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.

A hard limit of 24 hours will be imposed on feature extraction times.

A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.

Submission opening date

Friday 4th June 2010

Submission closing date

TBA


Links to Previous MIREX Runs of These Classification Tasks

Audio Artist Identification

Artist Identification in MIREX 2009 || Results(Classical Composer)

Artist Identification in MIREX 2008 || Results(Classical Composer) || Results(Artist Identification)

Artist Identification in MIREX 2007 || Results

Classical Composer Identification in MIREX 2007 || Results

Artist Identification in MIREX 2005 || Results

Audio Artist Identification in ISMIR2004 Audio Description Contest

Audio Genre Classification

Audio Genre Classification in MIREX 2009 || Results(Latin Set) || Results(Mixed Set)

Audio Genre Classification in MIREX 2008 || Results

Audio Genre Classification in MIREX 2007 || Results

Audio Genre Classification in MIREX 2005 || Results

Audio Artist Identification in ISMIR2004 Audio Description Contest

Audio Mood Classification

Audio Mood Classification in MIREX 2009 || Results

Audio Mood Classification in MIREX 2008 || Results

Audio Mood Classification in MIREX 2007 || Results