Audio Music Mood Classification
 
 
 
== Introduction ==
 
In music psychology and music education, the emotion component of music has been recognized as the one most strongly associated with music expressivity (e.g. Juslin et al. 2006 [[#Related Papers]]). Music information behavior studies (e.g. Cunningham, Jones and Jones 2004; Vignoli 2004; Cunningham, Bainbridge and Falconer 2006 [[#Related Papers]]) have also identified music mood/emotion as an important criterion people use when seeking and organizing music. Several experiments have been conducted in the MIR community to classify music by mood (e.g. Lu, Liu and Zhang 2006; Pohle, Pampalk and Widmer 2005; Mandel, Poliner and Ellis 2006; Feng, Zhuang and Pan 2003 [[#Related Papers]]). Please note: the MIR community tends to use the word "mood" while music psychologists prefer "emotion"; we follow the MIR tradition and use "mood" hereafter.
 
 
However, evaluating music mood classification is difficult because music mood is a very subjective notion. Each of the aforementioned experiments used different mood categories and a different dataset, making comparison with previous work virtually impossible. A contest on music mood classification in MIREX will help build the first widely recognized mood taxonomy, a scalable test set and precious ground truth.
  
This is the first time MIREX has attempted a music mood classification evaluation. There are many issues involved in this task, so let us start discussing them on the wiki. If needed, we will set up a mailing list devoted to the discussion.
 
 
== Mood taxonomy ==
 
There are mainly two approaches to setting up a music mood taxonomy:
  
 
1. Start from theories in music perception:
 
(Lu, Liu and Zhang 2006) adopted Thayer's two-dimensional energy-stress mood model, which divides music mood into four clusters:
# Contentment (low energy, low stress)
# Depression (low energy, high stress)
# Exuberance (high energy, low stress)
# Anxious/Frantic (high energy, high stress)
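
As an illustration only, here is a minimal sketch of how the energy-stress quadrants of Thayer's model map onto these four clusters; the normalised axes and the zero threshold are assumptions made for the example, not part of the model.

<pre>
# Hypothetical mapping of Thayer's energy-stress plane onto the four clusters.
# Assumes energy and stress have been normalised so that 0 separates "low"
# from "high"; how these values are extracted from audio is not specified here.

def thayer_cluster(energy: float, stress: float) -> str:
    """Map an (energy, stress) point to one of the four mood clusters."""
    if energy < 0 and stress < 0:
        return "Contentment"      # low energy, low stress
    if energy < 0:
        return "Depression"       # low energy, high stress
    if stress < 0:
        return "Exuberance"       # high energy, low stress
    return "Anxious/Frantic"      # high energy, high stress

print(thayer_cluster(0.7, -0.3))  # -> Exuberance
</pre>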
  
The authors argue that "these four clusters almost cover the basic mood response to music and they are usually in the most highly rated emotions as discovered in [music perception and education]". They also pointed out that there are alternative adjectives equivalent to the above four which may describe music mood better, such as <i>Tenderness, Sadness, Happiness, and Fear/Anger</i>.
 
 
(Feng, Zhuang and Pan 2003) seemed to follow this approach and used four categories: happiness, sadness, anger and fear.
  
 
(Li and Ogihara 2003) followed another mood model, the Farnsworth model, and assigned binary labels (existence versus non-existence) for the ten adjective groups in Farnsworth. We agree with the authors that this many labels made the task too difficult, thus the performance was very low.
 
 
 
2. Derive from the practice of music information services:
 
Popular music websites and software (e.g. AllMusicGuide [http://www.allmusic.com], MoodLogic [http://www.moodlogic.net]) seek to exploit emotional aspects of music and provide mood labels for albums or sound tracks. (Mandel, Poliner and Ellis 2006) used the AMG mood labels that were applied to 50 or more songs, which resulted in 100 mood labels. An advantage of this approach is that the ground truth is already provided by those websites. However, such a large number of categories seems overwhelming for the evaluation and post-evaluation analysis. It would be ideal if we could come up with a method to cluster those labels into a smaller number of categories (perhaps guided by music psychology theories). In this way, we can leverage the available labels and keep the contest at a manageable scale.
 
 
 
== Ground Truth ==
 
Corresponding to how the mood taxonomy is set up, there are several ways to obtain ground truth for evaluation purposes.
 
 
1. Human judgment: we can elicit subjective judgments from human evaluators using an online application comparable to IMIRSEL's Evalutron 6000. Details need to be discussed further. To start, we propose that human evaluators choose one mood label from a fixed set for each music piece. Each piece would be judged by at least 3 evaluators, and a label receiving at least 2 votes would be assigned to the piece as ground truth. Of course there will be disagreement, and depending on the number of available categories, the votes for some pieces may be too scattered, which would invalidate the judgments on those pieces.
  
2. Collect labels from popular music websites. One problem is that AMG only provides labels for albums, and even if labels for individual tracks were available, they might not cover the pieces that we own.
  
3. Obtain datasets used in existing research. Those datasets have already been labeled by the individual researchers.
 
 
 
== Data Collection ==
 
So far researchers have been using personal collections or collections owned by their institutions. It would be best if we could reuse those collections because the ground truth is already available. Otherwise, the IMIRSEL lab has the USPOP and USCRAP collections, but ground truth labels would still need to be obtained for them.
 
 
 
=== Data format ===  
 
Most existing work downmixes stereo music to a mono signal with a sampling frequency of 22050 Hz and 16-bit precision. We will keep this format in this contest.
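
For illustration, a sketch of converting a track to this format, assuming the librosa and soundfile Python libraries are available; librosa downmixes and resamples on load, and the file names are placeholders.

<pre>
import librosa
import soundfile as sf

# Load any input file as mono at 22050 Hz; librosa averages the channels
# and resamples for us.
y, sr = librosa.load("track.mp3", sr=22050, mono=True)

# Write the result as 16-bit PCM WAV, matching the proposed contest format.
sf.write("track_mono_22050.wav", y, sr, subtype="PCM_16")
</pre>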
  
 
=== Training set ===
 
It is unlikely that the contest will distribute any training dataset. Participants should feel free to use any data other than the contest collection to tune their algorithms. The evaluation will use n-fold cross-validation on the contest collection to eliminate any bias from data splitting.
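
For illustration, a sketch of such a cross-validation protocol using scikit-learn; the random features, labels and the SVM classifier are placeholders, not a prescription of what participants should submit.

<pre>
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Placeholder data: 100 pieces, 20 features, 4 mood classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 4, size=100)

# Stratified folds keep class proportions similar in every split, which
# matters when the mood categories are unevenly sized.
accuracies = []
for train_idx, test_idx in StratifiedKFold(n_splits=5, shuffle=True, random_state=0).split(X, y):
    clf = SVC(kernel="rbf").fit(X[train_idx], y[train_idx])
    accuracies.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(f"mean accuracy {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")
</pre>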
 
 
== Evaluation ==  
 
As in the previous genre classification task, we will use the accuracy and standard deviation of the results (in the event of uneven class sizes, both will be normalised according to class size).
 
 
Test significance of differences in error rates of each system at each iteration using McNemar's test, mean average and standard deviation of P-values.  
  
 
== Important Dates ==
 
TBD

== File Format ==
TBD

== Submission Format ==
TBD
  
 
== Challenging Issues ==  
 
# Mood changeable pieces: some pieces may start in one mood but end in another. For each of those, we can either label the piece with its most salient mood or simply let inconsistent judgments rule it out.
# Multiple label classification: it is possible that one piece has two or more correct mood labels, but we strongly suggest starting with a less confusing contest and leaving this challenge to future MIREXs.

== Opt-in survey of Audio music mood classification researchers ==
In this section we would like to take a brief 'opt-in' survey of researchers actively working in this field. Please feel free to add yourself to the list (or email your details to the moderators listed below).

== Moderators ==
* J. Stephen Downie (IMIRSEL, University of Illinois, USA) - [mailto:jdownie@uiuc.edu]
* Xiao Hu (IMIRSEL, University of Illinois, USA) - [mailto:xiaohu@uiuc.edu]

== Related Papers ==
# Juslin, Karlsson, Lindstrom, Friberg and Schoonderwaldt (2006), '''Play It Again With Feeling: Computer Feedback in Musical Communication of Emotions'''. Journal of Experimental Psychology: Applied, Vol. 12, No. 2, 79-95.
# [http://ismir2004.ismir.net/proceedings/p075-page-415-paper152.pdf Vignoli (ISMIR 2004), '''Digital Music Interaction Concepts: A User Study''']
# [http://pubdb.medien.ifi.lmu.de/cgi-bin//info.pl?hilliges2006audio Hilliges, Holzer, Klüber and Butz (2006), '''AudioRadar: A metaphorical visualization for the navigation of large music collections''']. In Proceedings of the International Symposium on Smart Graphics 2006, Vancouver, Canada. <i>Summarizes implicit problems in traditional genre/artist based music organization.</i>
# [http://ismir2004.ismir.net/proceedings/p082-page-447-paper221.pdf Cunningham, Jones and Jones (ISMIR 2004), '''Organizing Digital Music For Use: An Examination of Personal Music Collections''']
# [http://ismir2006.ismir.net/PAPERS/ISMIR0685_Paper.pdf Cunningham, Bainbridge and Falconer (ISMIR 2006), '''"More of an Art than a Science": Supporting the Creation of Playlists and Mixes''']
# Lu, Liu and Zhang (2006), '''Automatic Mood Detection and Tracking of Music Audio Signals'''. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 1, January 2006. <i>An earlier version of this paper appeared in [http://ismir2003.ismir.net/papers/Liu.PDF ISMIR 2003].</i>
# [http://www.cp.jku.at/research/papers/Pohle_CBMI_2005.pdf Pohle, Pampalk and Widmer (CBMI 2005), '''Evaluation of Frequently Used Audio Features for Classification of Music into Perceptual Categories''']. <i>Separates "mood" and "emotion" into two classification dimensions, which are mostly combined in other studies.</i>
# [http://www.ee.columbia.edu/~dpwe/pubs/MandPE06-svm.pdf Mandel, Poliner and Ellis (2006), '''Support vector machine active learning for music retrieval''']. Multimedia Systems, Vol. 12(1), August 2006.
# [http://doi.acm.org/10.1145/860435.860508 Feng, Zhuang and Pan (SIGIR 2003), '''Popular music retrieval by detecting mood''']
# [http://ismir2003.ismir.net/papers/Li.PDF Li and Ogihara (ISMIR 2003), '''Detecting emotion in music''']