2007:Audio Music Mood Classification


Introduction

In music psychology and music education, the emotion component of music has been recognized as the element most strongly associated with music expressivity (e.g. Juslin et al. 2006, see #Related Papers). Music information behavior studies (e.g. Cunningham, Jones and Jones 2004; Vignoli 2004; Cunningham, Bainbridge and Falconer 2006, see #Related Papers) have also identified music mood/emotion as an important criterion used by people in music seeking and organization. Several experiments have been conducted in the MIR community to classify music by mood (e.g. Lu, Liu and Zhang 2006; Pohle, Pampalk and Widmer 2005; Mandel, Poliner and Ellis 2006; Feng, Zhuang and Pan 2003, see #Related Papers). Please note: the MIR community tends to use the word "mood" while music psychologists prefer "emotion". We follow the MIR tradition and use "mood" hereafter.

However, evaluating music mood classification is difficult because mood is a highly subjective notion. Each of the aforementioned experiments used different mood categories and different datasets, making comparison across previous work virtually impossible. A music mood classification contest in MIREX will help build the first community-available test set and valuable ground truth.

This is the first time MIREX has attempted a music mood classification evaluation. Many issues are involved in this evaluation task, so let us start discussing them on this wiki. If needed, we will set up a mailing list devoted to the discussion.

Mood Categories

IMIRSEL has derived a set of 5 mood clusters from the AMG mood repository (Hu & Downie 2007, see #Related Papers). The mood clusters reduce the diverse mood space to a tangible set of categories while remaining rooted in the socio-cultural context of pop music. Therefore, we propose to use the 5 mood clusters as the categories in this year's audio mood classification contest. Each cluster is a collection of AMG mood labels which collectively define it:

  • Cluster_1: passionate, rousing, confident, boisterous, rowdy
  • Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
  • Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
  • Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
  • Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

At this point, IMIRSEL and Cyril Laurier at the Music Technology Group of Barcelona have manually validated the mood clusters and exemplar songs in each cluster. Please see #Exemplar Songs in Each Category for details.


Previous Discussion on Mood Taxonomy

Discussion on Mood Categories

Ground Truth

Corresponding to how the mood taxonomy is set up, there are several ways to obtain ground truth for evaluation purposes.

1. Human judgment: we can elicit subjective judgments from human evaluators using an online application comparable to IMIRSEL's Evalutron 6000. Details need further discussion. To start, we propose that human evaluators choose one mood label from a set for each music piece. Each piece would be judged by at least 3 evaluators, and a label receiving at least 2 votes would be assigned to the piece as ground truth (see the sketch after this list). Of course there will be disagreement, and depending on the number of available categories, votes on some pieces may be too scattered, invalidating the judgments on those pieces.

2. Collect labels from popular music websites. One problem is that AMG only provides labels for albums, and even where labels for tracks are available, they might not cover the pieces in our contest dataset.

3. Obtain datasets used in existing research. Those datasets have already been labeled by individual researchers, which is another good way of obtaining ground truth.
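
To make the voting rule in option 1 concrete, below is a minimal sketch (in Python) of how ground-truth labels could be assigned from evaluator votes. The function name and the two-vote threshold follow the proposal above, but the code itself is only an illustration, not part of the evaluation infrastructure.

  from collections import Counter

  def assign_ground_truth(votes, min_votes=2):
      """Return the majority mood label for a piece, or None if votes are too scattered.

      votes: mood labels chosen by the (at least 3) human evaluators of one piece.
      A label is accepted as ground truth only if it received at least min_votes votes.
      """
      if not votes:
          return None
      label, count = Counter(votes).most_common(1)[0]
      return label if count >= min_votes else None

  # Two of three evaluators agree -> Cluster_3 becomes the ground truth
  assign_ground_truth(["Cluster_3", "Cluster_3", "Cluster_5"])   # 'Cluster_3'
  # All three evaluators disagree -> the piece is dropped from the ground truth
  assign_ground_truth(["Cluster_1", "Cluster_2", "Cluster_4"])   # None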

Data Collection

So far researchers have been using personal collections or collections owned by their institutions. It would be best if we could reuse their collections, since the ground truth is already available. Otherwise, the IMIRSEL lab has the USPOP and USCRAP collections, but ground truth labels would still need to be obtained for them.

Data format

Much existing work downmixes stereo music to a mono signal with a sampling frequency of 22050 Hz and 16-bit precision. We will keep this format in this contest.
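
As an illustration of the preprocessing implied by this format, the sketch below (Python, using NumPy and SciPy) downmixes a stereo WAV file to mono, resamples it to 22050 Hz and writes it back as 16-bit PCM. The file names and the channel-averaging downmix are assumptions, since the exact conversion procedure has not been specified.

  import numpy as np
  from math import gcd
  from scipy.io import wavfile
  from scipy.signal import resample_poly

  def to_contest_format(in_path, out_path, target_sr=22050):
      """Downmix to mono, resample to 22050 Hz, and save as 16-bit PCM."""
      sr, data = wavfile.read(in_path)              # 16-bit WAV -> int16 samples
      data = data.astype(np.float64)
      if data.ndim == 2:                            # stereo: average the two channels
          data = data.mean(axis=1)
      if sr != target_sr:                           # rational-factor resampling
          g = gcd(sr, target_sr)
          data = resample_poly(data, target_sr // g, sr // g)
      data = np.clip(data, -32768, 32767).astype(np.int16)   # back to 16-bit precision
      wavfile.write(out_path, target_sr, data)

  # to_contest_format("track_44k_stereo.wav", "track_22k_mono.wav")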

Tuning set

It is unlikely that we will be able to distribute a tuning dataset against which algorithms may be tested prior to the contest. Participants should feel free to use any data other than the contest collection to tune their algorithms. The evaluation will use n-fold cross-validation on the contest collection to reduce any bias from data splitting.

Evaluation

As in the earlier genre classification task, we will use accuracy and its standard deviation as evaluation measures (in the case of uneven class sizes, both measures will be normalized according to class size).
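
The sketch below (Python, using scikit-learn) shows one way the two measures could be computed over n folds. The normalized measure is taken here to be the macro-averaged per-class accuracy (each class weighted equally), which is one reasonable reading of "normalized according to class size" rather than a confirmed MIREX specification.

  import numpy as np
  from sklearn.model_selection import StratifiedKFold
  from sklearn.metrics import accuracy_score, recall_score

  def cross_validated_accuracy(clf, X, y, n_folds=3, seed=0):
      """Return (mean, std) of raw and class-normalized accuracy over n folds."""
      skf = StratifiedKFold(n_splits=n_folds, shuffle=True, random_state=seed)
      raw, normalized = [], []
      for train_idx, test_idx in skf.split(X, y):
          clf.fit(X[train_idx], y[train_idx])
          pred = clf.predict(X[test_idx])
          raw.append(accuracy_score(y[test_idx], pred))
          # macro-averaged recall = mean of per-class accuracies, so every
          # mood cluster counts equally even when class sizes are uneven
          normalized.append(recall_score(y[test_idx], pred, average="macro"))
      return (np.mean(raw), np.std(raw)), (np.mean(normalized), np.std(normalized))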

The significance of differences in error rates between systems will be tested using McNemar's test; the mean and standard deviation of the resulting p-values will be reported.
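
Following Dietterich (1997, see #Related Papers), McNemar's test compares two systems through the pieces on which exactly one of them errs. A minimal sketch of the p-value computation (Python with SciPy, chi-square approximation with continuity correction) is given below as an illustration of the intended significance test.

  import numpy as np
  from scipy.stats import chi2

  def mcnemar_p_value(y_true, pred_a, pred_b):
      """P-value of McNemar's test on the error patterns of two systems.

      b = pieces system A classifies correctly and B incorrectly; c = the reverse.
      Small p-values indicate the two error rates differ significantly.
      """
      correct_a = np.asarray(pred_a) == np.asarray(y_true)
      correct_b = np.asarray(pred_b) == np.asarray(y_true)
      b = int(np.sum(correct_a & ~correct_b))
      c = int(np.sum(~correct_a & correct_b))
      if b + c == 0:
          return 1.0                               # the systems never disagree
      stat = (abs(b - c) - 1) ** 2 / (b + c)       # continuity-corrected statistic
      return float(chi2.sf(stat, df=1))            # chi-square with 1 degree of freedom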

Important Dates

TBD

File Format

TBD

Submission Format

TBD

Challenging Issues

  1. Mood-changeable pieces: some pieces may start in one mood but end in another. For each of these, we can either label it with the most salient mood or simply rule it out on account of inconsistent judgments.
  2. Multiple-label classification: it is possible that one piece has two or more correct mood labels, but as a start we strongly suggest holding a less confusing single-label contest and leaving this challenge to future MIREXs.

Participants

If you think there is a slight chance that you might consider participating, please add your name and email address here.

  • Kris West (kw at cmp dot uea dot ac dot uk)
  • Cyril Laurier (claurier at iua dot upf dot edu)
  • Elias Pampalk (firstname.lastname@gmail.com)
  • Yuriy Molchanyuk (molchanyuk at onu.edu.ua)
  • Shigeki Sagayama (sagayama at hil dot t.u-tokyo.ac.jp)
  • Guillaume Nargeot (killy971 at gmail dot com)
  • Zhongzhe Xiao (zhongzhe dot xiao at ec-lyon dot fr)
  • Kyogu Lee (kglee at ccrma.stanford.edu)
  • Vitor Soares (firstname.lastname@clustermedialabs.com)
  • Wai Cheung (wlche1@infotech.monash.edu.au)

Moderators

  • J. Stephen Downie (IMIRSEL, University of Illinois, USA)
  • Xiao Hu (IMIRSEL, University of Illinois, USA)
  • Cyril Laurier (Music Technology Group, Barcelona, Spain)

Related Papers

  1. Dietterich, T. (1997). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1895-1924.
  2. Hu, Xiao and J. Stephen Downie (2007). Exploring mood metadata: Relationships with genre, artist and usage metadata. Accepted in the Eighth International Conference on Music Information Retrieval (ISMIR 2007), Vienna, September 23-27, 2007.
  3. Juslin, P. N., Karlsson, J., Lindström, E., Friberg, A. and Schoonderwaldt, E. (2006). Play It Again With Feeling: Computer Feedback in Musical Communication of Emotions. Journal of Experimental Psychology: Applied, Vol. 12, No. 2, 79-95.
  4. Vignoli (ISMIR 2004) Digital Music Interaction Concepts: A User Study
  5. Cunningham, Jones and Jones (ISMIR 2004) Organizing Digital Music For Use: An Examination of Personal Music Collections.
  6. Cunningham, Bainbridge and Falconer (ISMIR 2006) 'More of an Art than a Science': Supporting the Creation of Playlists and Mixes.
  7. Lu, Liu and Zhang (2006) Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 1, January 2006.
    Part of this paper appeared in ISMIR 2003 http://ismir2003.ismir.net/papers/Liu.PDF
  8. Pohle, Pampalk, and Widmer (CBMI 2005) Evaluation of Frequently Used Audio Features for Classification of Music into Perceptual Categories.
    It separates "mood" and "emotion" as two classification dimensions, which are mostly combined in other studies.
  9. Mandel, Poliner and Ellis (2006) Support vector machine active learning for music retrieval. Multimedia Systems, Vol.12(1). Aug.2006.
  10. Feng, Zhuang and Pan (SIGIR 2003) Popular music retrieval by detecting mood
  11. Li and Ogihara (ISMIR 2003) Detecting emotion in music
  12. Hilliges, Holzer, Klüber and Butz (2006) AudioRadar: A metaphorical visualization for the navigation of large music collections. In Proceedings of the International Symposium on Smart Graphics 2006, Vancouver, Canada.
    It summarizes implicit problems in traditional genre/artist-based music organization.
  13. Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3), 217-238.