Difference between revisions of "2013:Audio K-POP Mood Classification"
Kahyun Choi (talk | contribs) (→Description)
Revision as of 10:17, 17 September 2013
K-POP Mood Classification
Description
This year, IMIRSEL and KETI collaboratively developed a mood classification dataset of K-POP music. There are 1894 K-POP songs in five mood categories, annotated by a number of American and Korean annotators. The goals of this task are 1) to see whether mood classification models developed on Western (or other cultural) music can be applied to K-POP music; and 2) to see whether classification models are equally effective at predicting mood labels assigned by American annotators and mood labels assigned by Korean annotators.
Data
There are 1438 songs, and each song is labeled with one of five mood categories (the same five mood categories as in the Audio Mood Classification sub-task in 2013:Audio Classification (Train/Test) Tasks):
- Cluster_1: passionate, rousing, confident, boisterous, rowdy
- Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
- Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
- Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
- Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral
How the mood categories were identified is described in Xiao Hu and J. Stephen Downie (2007). Exploring mood metadata: Relationships with genre, artist and usage metadata, Proceedings of the 8th International Conference on Music Information Retrieval (ISMIR) [1].
The songs are NOT evenly distributed across the five moods. The folds in the cross-validation will be split in a stratified manner (i.e., the songs in each mood category are evenly distributed across folds).
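The stratified split described above can be sketched as follows. This is a minimal illustration, not the organizers' procedure: the function name `stratified_folds` and the round-robin assignment are assumptions, and the sketch ignores artist filtering.

```python
from collections import defaultdict


def stratified_folds(labels, n_folds=3):
    """Assign item indices to folds so each class is spread evenly across folds.

    Hypothetical helper: items of each class are dealt to the folds in turn,
    so per-class counts differ by at most one between folds.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        for position, idx in enumerate(by_class[label]):
            folds[position % n_folds].append(idx)
    return folds


if __name__ == "__main__":
    example = ["Cluster_1"] * 6 + ["Cluster_3"] * 3
    for number, fold in enumerate(stratified_folds(example)):
        print(number, [example[i] for i in fold])
```

Because every class starts at the first fold, earlier folds can end up slightly larger when class sizes are not multiples of the fold count.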
The audio is in 22.05 kHz mono WAV format, and each clip is 30 seconds long.
There will be two independent tasks: one uses mood annotations by American annotators, the other uses those by Korean annotators.
Annotations
Each song is annotated by three American annotators and three Korean annotators. The songs used in this task have majority vote (2 of 3 annotators agreed) or unanimous agreement (all 3 annotators agreed) on one of the five mood categories.
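The agreement rule above can be illustrated with a short sketch. It assumes the rule is applied to one group of three annotators at a time; the function name `agreed_label` is hypothetical.

```python
from collections import Counter


def agreed_label(votes):
    """Return the label chosen by at least 2 of the 3 annotators, else None.

    Assumes `votes` holds the three labels from one annotator group.
    """
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else None


if __name__ == "__main__":
    print(agreed_label(["Cluster_1", "Cluster_1", "Cluster_3"]))
    print(agreed_label(["Cluster_1", "Cluster_2", "Cluster_3"]))
```

A song for which the function returns None has no majority label and would not qualify under the rule stated above.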
Evaluation
The evaluation method is the same as that in the mood classification sub-tasks in Audio Classification (Train/Test) Task. Participating algorithms will be evaluated with 3-fold cross validation. Artist filtering will be used in the test and training splits, i.e. training and test sets will contain different artists.
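Artist filtering can be sketched as below. This is only an illustration under stated assumptions: the function name `artist_folds` and the greedy balancing are hypothetical, and the sketch does not also stratify by mood, so it is not the procedure used to build the actual folds.

```python
from collections import defaultdict


def artist_folds(track_artist, n_folds=3):
    """Group tracks into folds so that no artist appears in more than one fold.

    track_artist maps a track path to its artist name (hypothetical input).
    Artists with the most tracks are placed first, each into the fold that
    is currently smallest, to keep fold sizes roughly balanced.
    """
    by_artist = defaultdict(list)
    for track, artist in track_artist.items():
        by_artist[artist].append(track)

    folds = [[] for _ in range(n_folds)]
    for artist in sorted(by_artist, key=lambda a: (-len(by_artist[a]), a)):
        smallest = min(range(n_folds), key=lambda i: len(folds[i]))
        folds[smallest].extend(by_artist[artist])
    return folds


if __name__ == "__main__":
    tracks = {"a1.wav": "A", "a2.wav": "A", "b1.wav": "B", "c1.wav": "C", "c2.wav": "C"}
    print(artist_folds(tracks))
```

When one fold is held out for testing and the others are used for training, the training and test sets then contain different artists.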
The raw classification (identification) accuracy, standard deviation and a confusion matrix for each algorithm will be computed.
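As an illustration of these measures, the sketch below builds a confusion matrix from true and predicted labels and derives the raw accuracy from it. The function names and the row/column convention (rows are true classes, columns are predicted classes) are assumptions made for this example.

```python
def confusion_matrix(truth, predicted, classes):
    """Count predictions: matrix[i][j] is the number of items of true class
    classes[i] that were predicted as classes[j]."""
    index = {name: position for position, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true_label, predicted_label in zip(truth, predicted):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Raw accuracy: correctly classified items (the diagonal) over all items."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


if __name__ == "__main__":
    classes = ["Cluster_1", "Cluster_2", "Cluster_3"]
    truth = ["Cluster_1", "Cluster_1", "Cluster_2", "Cluster_3"]
    predicted = ["Cluster_1", "Cluster_2", "Cluster_2", "Cluster_3"]
    matrix = confusion_matrix(truth, predicted, classes)
    print(matrix, accuracy(matrix))
```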
Classification accuracies will be tested for statistically significant differences using Friedman's ANOVA with Tukey-Kramer honestly significant difference (HSD) tests for multiple comparisons. This test will be used to rank the algorithms and to group them into sets of equivalent performance.
In addition, computation times for feature extraction and training/classification will be measured.
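For readers unfamiliar with the Friedman test, the sketch below computes its test statistic from a table of accuracies. It is a simplified illustration: it assumes each row is one fold and each column one algorithm, it uses the textbook formula without the correction for ties, and it does not perform the Tukey-Kramer HSD comparison. The function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 (lowest) upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for i in order[start:end + 1]:
            ranks[i] = mean_rank
        start = end + 1
    return ranks


def friedman_statistic(scores):
    """scores[b][a] is the accuracy of algorithm a on block b (assumed: a fold).

    Returns 12 / (n k (k + 1)) * sum(R_a^2) - 3 n (k + 1), where R_a is the
    sum of within-block ranks of algorithm a, n the number of blocks and
    k the number of algorithms. No tie correction is applied.
    """
    n_blocks = len(scores)
    n_algos = len(scores[0])
    rank_sums = [0.0] * n_algos
    for row in scores:
        for algo, rank in enumerate(average_ranks(row)):
            rank_sums[algo] += rank
    squared = sum(r * r for r in rank_sums)
    return 12.0 * squared / (n_blocks * n_algos * (n_algos + 1)) - 3.0 * n_blocks * (n_algos + 1)


if __name__ == "__main__":
    made_up_scores = [[0.5, 0.6, 0.7], [0.4, 0.5, 0.9], [0.3, 0.6, 0.8]]
    print(friedman_statistic(made_up_scores))
```

The accuracies in the usage example are made up for illustration and are not results from this task.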
Submission Format
The following specifications are the same as those in 2013:Audio Classification (Train/Test) Tasks.
File I/O Format
The audio files to be used in these tasks will be specified in a simple ASCII list file. The formats for the list files are specified below:
Feature extraction list file
The list file passed for feature extraction will be a simple ASCII list file. This file will contain one path per line with no header line. I.e.
<example path and filename>
E.g.
/path/to/track1.wav
/path/to/track2.wav
...
Training list file
The list file passed for model training will be a simple ASCII list file. This file will contain one path per line, followed by a tab character and the class (artist, genre or mood) label, again with no header line.
I.e.
<example path and filename>\t<class label>
E.g.
/path/to/track1.wav	rock
/path/to/track2.wav	blues
...
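A training list in this format could be read with a few lines of code. The sketch below is one possible approach; the function name `parse_training_lines` is hypothetical, and skipping blank lines is an added tolerance rather than part of the specification.

```python
def parse_training_lines(lines):
    """Turn 'path<TAB>label' lines into a list of (path, label) pairs.

    `lines` can be any iterable of strings, such as an open text file.
    A non-empty line without a tab raises ValueError.
    """
    pairs = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate a blank line, e.g. at the end of the file
        audio_path, label = line.split("\t", 1)
        pairs.append((audio_path, label))
    return pairs


if __name__ == "__main__":
    sample = ["/path/to/track1.wav\tCluster_1\n", "/path/to/track2.wav\tCluster_3\n"]
    print(parse_training_lines(sample))
```

With a real list file, the same function can be called on the file object returned by `open(path, encoding="ascii")`.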
Test (classification) list file
The list file passed for testing classification will be a simple ASCII list file identical in format to the Feature extraction list file. This file will contain one path per line with no header line.
I.e.
<example path and filename>
E.g.
/path/to/track1.wav
/path/to/track2.wav
...
Classification output file
Participating algorithms should produce a simple ASCII list file identical in format to the Training list file. This file will contain one path per line, followed by a tab character and the predicted class label, again with no header line.
I.e.
<example path and filename>\t<class label>
E.g.
/path/to/track1.wav	classical
/path/to/track2.wav	blues
...
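Producing the output file amounts to writing one tab-separated pair per line. The sketch below shows one way to format the text; the function name `format_output_lines` is hypothetical.

```python
def format_output_lines(predictions):
    """Format (audio_path, label) pairs as 'path<TAB>label' lines, no header."""
    return "".join(f"{path}\t{label}\n" for path, label in predictions)


if __name__ == "__main__":
    text = format_output_lines([
        ("/path/to/track1.wav", "Cluster_1"),
        ("/path/to/track2.wav", "Cluster_3"),
    ])
    print(text, end="")
```

The returned text can be written to the output list file path with an ordinary file write using ASCII encoding.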
Submission calling formats
Algorithms should divide their feature extraction and training/classification into separate runs. This will facilitate a single feature extraction step for the task, while training and classification can be run for each cross-validation fold.
Hence, participants should provide either two executables, or a single executable with command line parameters that select which of the two processes to run.
Executables will have to accept the paths to the aforementioned list files as command line parameters.
Scratch folders will be provided for all submissions for the storage of feature files and any model files to be produced. Executables will have to accept the path to their scratch folder as a command line parameter. Executables will also have to track which feature files correspond to which audio files internally. To facilitate this process, unique file names will be assigned to each audio track.
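Since audio file names are unique, one simple way to track which feature file belongs to which audio file is to derive the feature file name from the audio file name. The sketch below assumes this scheme; the function name `feature_path` and the `.feat` suffix are hypothetical and not required by the task.

```python
from pathlib import PurePosixPath


def feature_path(scratch_dir, audio_path, suffix=".feat"):
    """Map an audio file to its feature file inside the scratch folder.

    Relies on audio file names being unique, so the directory part of
    audio_path can be dropped without causing collisions.
    """
    file_name = PurePosixPath(audio_path).name + suffix
    return str(PurePosixPath(scratch_dir) / file_name)


if __name__ == "__main__":
    print(feature_path("/path/to/scratch/folder", "/path/to/track1.wav"))
```

The feature extraction run would write to this path and the training/classification run would read from it, so no separate index file is needed.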
Example submission calling formats
extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

extractFeatures.sh /path/to/scratch/folder /path/to/featureExtractionListFile.txt
Train.sh /path/to/scratch/folder /path/to/trainListFile.txt
Classify.sh /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -train /path/to/scratch/folder /path/to/trainListFile.txt
myAlgo.sh -classify /path/to/scratch/folder /path/to/testListFile.txt /path/to/outputListFile.txt
Multi-processor compute nodes will be used to run this task. However, we ask that submissions use no more than 4 cores (as we will be running many submissions and will need to run some in parallel). Ideally, the number of threads to use should be specified as a command line parameter. Alternatively, implementations may be provided in hard-coded 1, 2 or 4 thread/core configurations.
extractFeatures.sh -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
TrainAndClassify.sh -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt

myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/featureExtractionListFile.txt
myAlgo.sh -TrainAndClassify -numThreads 4 /path/to/scratch/folder /path/to/trainListFile.txt /path/to/testListFile.txt /path/to/outputListFile.txt
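A single-executable submission needs to parse calls of the shape shown in these examples. The sketch below shows one possible way to do so with Python's standard `argparse` module. The function name `build_parser`, the destination names `mode`, `scratch` and `list_files`, and the default of 1 thread are assumptions made for this illustration.

```python
import argparse


def build_parser():
    """Parser for calls such as:
    myAlgo.sh -extract -numThreads 4 /path/to/scratch/folder /path/to/list.txt
    """
    parser = argparse.ArgumentParser(prog="myAlgo.sh")

    # Exactly one mode flag must be given; it is stored in args.mode.
    mode = parser.add_mutually_exclusive_group(required=True)
    for name in ("extract", "train", "classify", "TrainAndClassify"):
        mode.add_argument("-" + name, dest="mode", action="store_const", const=name)

    # Thread count limited to the 1, 2 or 4 configurations mentioned above.
    parser.add_argument("-numThreads", type=int, choices=(1, 2, 4), default=1)

    parser.add_argument("scratch", help="path to the scratch folder")
    parser.add_argument("list_files", nargs="+", help="list file paths, in call order")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.mode, args.numThreads, args.scratch, args.list_files)
```

How many list files each mode expects (one for extraction, three for combined training and classification) would still need to be checked after parsing.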
Packaging submissions
- All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed). IMIRSEL should be notified of any dependencies that you cannot include with your submission at the earliest opportunity (in order to give them time to satisfy the dependency).
- Be sure to follow the Best Coding Practices for MIREX
- Be sure to follow the MIREX 2013 Submission Instructions
All submissions should include a README file with the following information:
- Command line calling format for all executables including examples
- Number of threads/cores used or whether this should be specified on the command line
- Expected memory footprint
- Expected runtime
- Approximate scratch disk space needed to store any feature/cache files
- Any required environments/architectures (and versions) such as Matlab, Java, Python, Bash, Ruby etc.
- Any special notes on running your algorithm
Note that the information that you place in the README file is extremely important in ensuring that your submission is evaluated properly.
Time and hardware limits
Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
A hard limit of 24 hours will be imposed on feature extraction times.
A hard limit of 48 hours will be imposed on the 3 training/classification cycles, leading to a total runtime limit of 72 hours for each submission.
Potential Participants
name / email