2007:Audio Music Mood Classification
Audio Music Mood Classification
Introduction
In music psychology and music education, the emotional component of music has been recognized
as the one most strongly associated with musical expressivity (e.g., Juslin et al. 2006; see
Related Papers). Music information behavior studies (e.g., Cunningham, Jones and Jones 2004;
Cunningham, Vignoli 2004; Bainbridge and Falconer 2006; see Related Papers) have also
identified music mood/emotion as an important criterion used by people in music seeking
and organization. Several experiments have been conducted in the MIR community to classify
music by mood (e.g., Lu, Liu and Zhang 2006; Pohle, Pampalk, and Widmer 2005; Mandel,
Poliner and Ellis 2006; Feng, Zhuang and Pan 2003; see Related Papers). Please note: the
MIR community tends to use the word "mood" while music psychologists prefer "emotion".
We follow the MIR tradition and use "mood" hereafter.
However, evaluation of music mood classification is difficult because music mood is a very
subjective notion. Each of the aforementioned experiments used different mood categories and
a different dataset, making comparison with previous work virtually impossible. A
contest on music mood classification in MIREX will help build the first widely
recognized mood taxonomy, a scalable test set, and valuable ground truth.
This is the first time MIREX has attempted a music mood classification evaluation. There
are many issues involved in this evaluation task, so let us start discussing them on the wiki.
If needed, we will set up a mailing list devoted to the discussion.
Mood taxonomy
There are two main approaches to setting up a music mood taxonomy:
1. Start from theories of music perception: (Lu, Liu and Zhang 2006) adopted Thayer's two-dimensional energy-stress mood model, which
divides music mood into four clusters:
- Contentment (low energy, low stress)
- Depression (low energy, high stress)
- Exuberance (high energy, low stress)
- Anxious/Frantic (high energy, high stress)
The authors argue "these four clusters almost cover the basic mood response to music and
they are usually in the most highly rated emotions as discovered in [music perception and
education]". Also, they pointed out there are alternative adjectives that are equivalent
to the above four, and those may be better in describing music mood, such as
Tenderness, Sadness, Happiness, and Fear/Anger.
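The four quadrants listed above can be sketched as a simple lookup. This is only an illustration: the function name `thayer_cluster`, the numeric scale, and the threshold of 0.0 separating "low" from "high" are assumptions, not part of Thayer's model or of the cited work.

```python
def thayer_cluster(energy, stress, threshold=0.0):
    """Map an (energy, stress) pair to one of the four clusters listed above.

    Assumption: both values lie on a scale where `threshold` separates
    "low" from "high"; the cited papers do not prescribe such a scale.
    """
    high_energy = energy > threshold
    high_stress = stress > threshold
    if high_energy and high_stress:
        return "Anxious/Frantic"
    if high_energy:
        return "Exuberance"
    if high_stress:
        return "Depression"
    return "Contentment"


# Example: an energetic, low-stress piece
print(thayer_cluster(0.8, -0.5))
```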
(Feng, Zhuang and Pan 2003) seemed to follow this approach and used four categories,
happiness, sadness, anger and fear.
(Li and Ogihara 2003) followed another mood model, the Farnsworth model, and assigned binary labels (existence versus non-existence) based on the ten adjective groups in Farnsworth. We
agree with the authors that this many labels made the task too difficult, and thus
performance was very low.
2. Derive from the practice of music information services: Popular music websites and software (e.g., All Music Guide (AMG),
MoodLogic) seek to exploit emotional aspects of music and
provide mood labels for albums or sound tracks. (Mandel, Poliner and Ellis 2006) used the mood
labels on AMG that were applied to 50 or more songs, which resulted in 100 mood labels. An
advantage of this approach is that the ground truth is already provided by those websites.
However, such a large number of categories seems overwhelming for the evaluation and
post-evaluation analysis. It would be ideal if we could come up with a method to cluster those
labels into a smaller number of categories (perhaps under the direction of
music psychological theories). In this way, we can leverage the available labels and keep
the contest at a manageable scale.
Ground Truth
Depending on how the mood taxonomy is set up, there are several ways to obtain
ground truth for evaluation purposes.
1. Human judgment: we can elicit subjective judgments from human evaluators by using an
online application comparable to IMIRSEL's Evalutron 6000. Details need to be further
discussed. To start, we propose that human evaluators choose one mood label from a set for
each music piece. Each piece should be judged by at least 3 evaluators, and a label with at
least 2 votes will be assigned to the piece as ground truth. Of course there will be
disagreement, and depending on the number of available categories, votes on some pieces may
be too scattered, which would invalidate the judgments on those pieces.
2. Collect labels from popular music websites. A problem is that AMG only provides labels for
albums. And even if labels for tracks are available, they might not be available for the
pieces that we own.
3. Obtain datasets used in existing research. Those datasets have been labeled by
individual researchers.
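The voting rule proposed under human judgment (at least 3 evaluators per piece, a label needs at least 2 votes) could be sketched as follows. The function name `consensus_label` is hypothetical, and treating a tie for first place as "no consensus" is an assumption, since the proposal does not say how ties should be handled.

```python
from collections import Counter


def consensus_label(votes, min_judges=3, min_votes=2):
    """Return the ground-truth label for one piece, or None if there is no consensus.

    votes: list of mood labels, one per human evaluator.
    Assumption: a tie for the top label (e.g. 2 vs. 2) counts as no consensus.
    """
    if len(votes) < min_judges:
        return None  # not enough evaluators judged this piece
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    if top_count < min_votes:
        return None  # votes too scattered
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None  # tie between the two most frequent labels
    return top_label


print(consensus_label(["happy", "happy", "sad"]))   # agreed label
print(consensus_label(["happy", "sad", "angry"]))   # scattered votes
```

Pieces for which the function returns None would be the ones dropped from the ground truth because the judgments are too scattered.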
Data Collection
So far researchers have been using personal collections or those owned by their
institutions. It would be best if we could reuse their collections because the ground
truth is already available. Otherwise, the IMIRSEL lab has the USPOP and USCRAP collections,
but ground truth labels would need to be obtained for them.
Data format
Many existing studies convert stereo music to a mono signal with a sampling frequency of 22050 Hz and 16-bit precision. We will keep this format in this contest.
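As a rough illustration of the stereo-to-mono and 16-bit steps, the sketch below averages the two channels and quantizes to signed 16-bit integers. It assumes input samples are floats in [-1.0, 1.0]; the function name `downmix_to_mono_int16` is hypothetical, and resampling to 22050 Hz is deliberately not covered, since that is better left to a dedicated audio library.

```python
def downmix_to_mono_int16(left, right):
    """Average two channels and quantize to signed 16-bit integers.

    Assumption: `left` and `right` are equal-length sequences of floats
    in [-1.0, 1.0]. Out-of-range values are clipped before quantization.
    """
    mono = []
    for l_sample, r_sample in zip(left, right):
        sample = (l_sample + r_sample) / 2.0
        sample = max(-1.0, min(1.0, sample))      # clip to the valid range
        mono.append(int(round(sample * 32767)))   # scale to the 16-bit range
    return mono


print(downmix_to_mono_int16([1.0, -1.0, 0.0], [1.0, -1.0, 0.0]))
```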
Training set
It is unlikely that the contest will distribute a training dataset. Participants should
feel free to use any data other than the contest collection to tune their algorithms. The
evaluation will use n-fold cross-validation on the contest collection to reduce any
bias from data splitting.
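A minimal sketch of how n-fold cross-validation splits could be generated is shown here. The function name `n_fold_splits`, the fixed random seed, and the round-robin assignment of items to folds are illustrative assumptions; the actual MIREX splitting procedure is not specified in this proposal.

```python
import random


def n_fold_splits(n_items, n_folds, seed=0):
    """Yield (train_indices, test_indices) pairs for n-fold cross-validation.

    Every item appears in exactly one test fold. The shuffle uses a fixed
    seed so that the same splits can be reproduced for every submission.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin into n_folds groups.
    folds = [indices[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


for train, test in n_fold_splits(10, 3):
    print(len(train), len(test))
```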
Evaluation
As in the earlier genre classification task, we will use accuracy and the standard deviation
of results (in the event of uneven class sizes, both will be normalised according to
class size).
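One possible reading of "normalised according to class size" is the mean of the per-class accuracies, so that every mood class counts equally regardless of how many pieces it contains. The sketch below follows that reading; it is an assumption, not the confirmed MIREX definition, and the function names are hypothetical.

```python
from collections import defaultdict


def accuracy(y_true, y_pred):
    """Fraction of pieces whose predicted label matches the ground truth."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def class_normalised_accuracy(y_true, y_pred):
    """Mean of per-class accuracies (each class weighted equally)."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    per_class = [hits[label] / totals[label] for label in totals]
    return sum(per_class) / len(per_class)


y_true = ["happy", "happy", "happy", "sad"]
y_pred = ["happy", "happy", "happy", "happy"]
print(accuracy(y_true, y_pred))                   # 0.75
print(class_normalised_accuracy(y_true, y_pred))  # 0.5
```

In this toy case a system that always answers "happy" scores 0.75 on raw accuracy but only 0.5 once classes are weighted equally. The standard deviation across folds can then be computed from the per-fold values, for example with `statistics.stdev`.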
We will test the significance of differences in the error rates of the systems at each
iteration using McNemar's test, and report the mean and standard deviation of the P-values.
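For reference, McNemar's test on two systems compares only the pieces on which exactly one of them is correct. The sketch below uses the chi-square approximation with continuity correction (one degree of freedom); choosing this variant over the exact binomial form is an assumption, as is the function name `mcnemar_p_value`.

```python
import math


def mcnemar_p_value(truth, pred_a, pred_b):
    """Two-sided p-value of McNemar's test for two classifiers on the same pieces.

    b = pieces system A gets right and system B gets wrong
    c = pieces system A gets wrong and system B gets right
    Uses the continuity-corrected chi-square statistic with 1 degree of freedom.
    """
    b = sum(1 for t, a, bb in zip(truth, pred_a, pred_b) if a == t and bb != t)
    c = sum(1 for t, a, bb in zip(truth, pred_a, pred_b) if a != t and bb == t)
    if b + c == 0:
        return 1.0  # the systems never disagree in correctness
    statistic = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 d.o.f.: erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(statistic / 2.0))


truth = ["x"] * 12
pred_a = ["x"] * 10 + ["y"] * 2   # A is correct on the first 10 pieces only
pred_b = ["y"] * 10 + ["x"] * 2   # B is correct on the last 2 pieces only
print(mcnemar_p_value(truth, pred_a, pred_b))
```

Running this once per cross-validation fold gives the per-iteration P-values whose mean and standard deviation are reported.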
Important Dates
TBD
File Format
TBD
Submission Format
TBD
Challenging Issues
- Pieces with changing mood: some pieces may start in one mood but end in another.
For each of those, we can either label it with the most salient mood or just let
inconsistent judgments rule it out.
- Multiple label classification: it is possible that one piece can have two or more
correct mood labels, but we strongly suggest starting with a less confusing contest and
leaving this challenge to future MIREXs.
Opt-in survey of Audio music mood classification researchers
In this section we would like to take a brief 'opt-in' survey of researchers actively
working in this field. Please feel free to add yourself to the list (or email your details
to the moderators listed below).