Difference between revisions of "2007:Audio Music Mood Classification"

From MIREX Wiki

Revision as of 16:08, 19 June 2007

Introduction

In music psychology and music education, the emotional component of music has been recognized as the one most strongly associated with musical expressivity (e.g. Juslin et al. 2006 #Related Papers). Music information behavior studies (e.g. Cunningham, Jones and Jones 2004; Vignoli 2004; Cunningham, Bainbridge and Falconer 2006 #Related Papers) have also identified music mood/emotion as an important criterion used by people in music seeking and organization. Several experiments have been conducted in the MIR community to classify music by mood (e.g. Lu, Liu and Zhang 2006; Pohle, Pampalk and Widmer 2005; Mandel, Poliner and Ellis 2006; Feng, Zhuang and Pan 2003 #Related Papers). Please note: the MIR community tends to use the word "mood", while music psychologists prefer "emotion". We follow the MIR tradition and use "mood" hereafter.

However, evaluating music mood classification is difficult because music mood is a very subjective notion. Each of the aforementioned experiments used different mood categories and different datasets, making comparison across previous work virtually impossible. A contest on music mood classification in MIREX will help build the first community-available test set and valuable ground truth.

This is the first time MIREX has attempted a music mood classification evaluation. There are many issues involved in this evaluation task, so let us start discussing them on this wiki. If needed, we will set up a mailing list devoted to the discussion.

Mood Categories

The IMIRSEL has derived a set of 5 mood clusters from the AMG mood repository (Hu & Downie 2007 #Related Papers). The mood clusters effectively reduce the diverse mood space into a tangible set of categories, yet remain rooted in the social-cultural context of pop music. Therefore, we propose to use the 5 mood clusters as the categories in this year's audio mood classification contest. Each cluster is a collection of AMG mood labels that collectively define it:

  • Cluster_1: passionate, rousing, confident, boisterous, rowdy
  • Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
  • Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
  • Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
  • Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

At this moment, the IMIRSEL and Cyril Laurier at the Music Technology Group of Barcelona have manually validated the mood clusters and exemplar songs in each cluster. Please see #Exemplar Songs in Each Category for details.


Previous Discussion on Mood Taxonomy

Discussion on Mood Categories

Exemplar Songs in Each Category

Exemplar songs for each mood cluster are manually selected by multiple human assessors. The purpose is to further clarify the perceptual identities of the mood clusters.

There are 190 candidate songs in the intersection of the AMG mood repository and the USPOP collection in IMIRSEL, and each of these songs has exactly one mood cluster label assigned by AMG editors. The mood labels by AMG editors are an important benchmark that can help us reach cross-listener consistency on such a subjective task. So far, 6 human assessors have listened to the 190 songs and assigned cluster labels to them. 49 songs are unanimously labeled by the 6 human assessors and AMG, and another 42 songs are unanimously labeled by the 6 human assessors. The song titles are listed in exemplar songs.
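The selection of unanimously labeled songs described above can be sketched as follows. This is only an illustration, not the procedure IMIRSEL actually ran: the function names and the dictionary layout are hypothetical, and the sketch assumes that the second group consists of songs on which the assessors agree with each other but not with the AMG label.

```python
from typing import Dict, List, Optional, Tuple


def unanimous_label(labels: List[str]) -> Optional[str]:
    """Return the shared cluster label if all assessors agree, else None."""
    if labels and len(set(labels)) == 1:
        return labels[0]
    return None


def select_exemplars(
    assessor_labels: Dict[str, List[str]],  # hypothetical: song -> one label per assessor
    amg_labels: Dict[str, str],             # hypothetical: song -> AMG cluster label
) -> Tuple[List[str], List[str]]:
    """Split songs into (unanimous incl. AMG, unanimous among assessors only)."""
    agree_with_amg: List[str] = []
    assessors_only: List[str] = []
    for song, labels in assessor_labels.items():
        label = unanimous_label(labels)
        if label is None:
            continue  # assessors disagree, so the song is not an exemplar
        if amg_labels.get(song) == label:
            agree_with_amg.append(song)
        else:
            assessors_only.append(song)
    return agree_with_amg, assessors_only
```

With the real data, the first list would correspond to the 49 songs and the second to the 42 songs mentioned above, if the assumption about the second group holds.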

The advantages of the exemplar songs are twofold: 1. they will help people better understand what kind of mood each cluster refers to; 2. they can possibly be taken as training data for the algorithms (see the section #Training Set).

Note: Lyrics issue: when labeling the songs, the human assessors were asked to ignore lyrics. Because this contest focuses on music audio, lyrics should not be taken into consideration.


Previous Discussion on Ground Truth

Training Set

Some potential participants have requested that a training set be provided, and the exemplar songs described above can serve as (seeds of) training data. However, due to copyright issues, we cannot distribute the audio files of the exemplar songs. There are two ways to address this issue:

1) The IMIRSEL announces the bibliographic information of the exemplar songs (e.g. title, artist), and the participants locate the audio files themselves and/or find other audio clips guided by the exemplar songs (i.e. seeds) for training purposes. Participants train their models in house and submit the trained models.

2) The IMIRSEL announces the bibliographic information of the exemplar songs (e.g. title, artist) to help participants understand the mood categories. The IMIRSEL prepares a certain number (e.g. 30) of short audio clips (e.g. 30 seconds) for each mood cluster. Participating algorithms/models are trained and tested within IMIRSEL.


Song Pool

The pool of songs to be classified is from the same collection as the exemplar songs. Currently, the contest organizers are seeking additional songs in genres other than pop music to supplement the USPOP collection. Having songs from a variety of genres in each mood cluster will make the contest harder and more interesting. However, due to time and resource constraints, the song pool may still end up being dominated by pop music, which hopefully is still of interest to most participants.

Proposed audio format:

30 second clips, 22.05kHz, mono, 16bit, WAV files
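A clip can be checked against the proposed format with Python's standard `wave` module. The sketch below is illustrative only: the function name `check_clip` and the half-second duration tolerance are assumptions, not part of the proposal.

```python
import wave

EXPECTED_RATE_HZ = 22050          # 22.05 kHz
EXPECTED_CHANNELS = 1             # mono
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16 bit
EXPECTED_DURATION_S = 30.0        # 30 second clips


def check_clip(path, tolerance_s=0.5):
    """Return a list of format problems for one WAV file (empty if it conforms).

    tolerance_s is an assumed allowance on the clip duration.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {rate} Hz")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"{8 * wav.getsampwidth()} bit samples")
        duration = wav.getnframes() / rate
        if abs(duration - EXPECTED_DURATION_S) > tolerance_s:
            problems.append(f"duration is {duration:.2f} s")
    return problems
```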

We will randomly select a certain number of songs from the USPOP and other (to-be-decided) collections as the audio pool. This number should make the contest interesting enough, but not too hard. And the songs need to cover all 5 mood clusters.
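One possible way to draw a random pool that is guaranteed to cover all mood clusters is to sample within each cluster separately. The sketch below assumes an equal number of songs per cluster and a fixed random seed for reproducibility; neither is stated in the proposal, and the function name `sample_pool` is hypothetical.

```python
import random
from collections import defaultdict


def sample_pool(song_labels, per_cluster, seed=0):
    """Randomly pick `per_cluster` songs from every cluster.

    song_labels maps a song identifier to its cluster label.
    Raises ValueError if some cluster has too few candidate songs.
    """
    by_cluster = defaultdict(list)
    for song, cluster in sorted(song_labels.items()):
        by_cluster[cluster].append(song)

    rng = random.Random(seed)  # fixed seed so the draw can be repeated
    pool = {}
    for cluster in sorted(by_cluster):
        songs = by_cluster[cluster]
        if len(songs) < per_cluster:
            raise ValueError(f"{cluster} has only {len(songs)} songs")
        for song in rng.sample(songs, per_cluster):
            pool[song] = cluster
    return pool
```

Sampling per cluster also keeps the class sizes even, which simplifies the evaluation.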

Evaluation

Like the genre classification task before, we will use accuracy and its standard deviation as evaluation measures (in the cases of uneven class sizes, both measures will be normalized according to class size).
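As an illustration of accuracy normalized by class size, one common reading is the mean of the per-class accuracies, so that every cluster counts equally regardless of how many songs it contains. The sketch below follows that reading; it also assumes that the standard deviation is taken across cross-validation folds, which the text does not state. The function names are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def normalized_accuracy(true_labels, predicted_labels):
    """Mean of per-class accuracies, so each class has equal weight."""
    totals = Counter(true_labels)
    correct = Counter(
        t for t, p in zip(true_labels, predicted_labels) if t == p
    )
    return mean(correct[c] / totals[c] for c in totals)


def summarize_folds(fold_accuracies):
    """Return (mean, population standard deviation) over folds (assumed setup)."""
    return mean(fold_accuracies), pstdev(fold_accuracies)
```

For example, a system that always predicts the majority class on a test set with 3 songs of one class and 1 of another has a raw accuracy of 0.75 but a normalized accuracy of 0.5.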

We will test the significance of differences between the error rates of the systems using McNemar's test, and report the mean and standard deviation of the resulting P-values.
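For a pair of systems, McNemar's test only looks at the songs that exactly one of the two systems classifies correctly. A minimal sketch, assuming the continuity-corrected chi-square form with one degree of freedom (the variant discussed by Dietterich, see #Related Papers), could look like this; the function name is hypothetical.

```python
import math


def mcnemar_test(true_labels, pred_a, pred_b):
    """Return (chi-square statistic, p-value) for systems A and B.

    Uses the continuity-corrected statistic (|n01 - n10| - 1)^2 / (n01 + n10)
    and the chi-square distribution with one degree of freedom.
    """
    n01 = n10 = 0
    for t, a, b in zip(true_labels, pred_a, pred_b):
        if a != t and b == t:
            n01 += 1  # A wrong, B right
        elif a == t and b != t:
            n10 += 1  # A right, B wrong
    if n01 + n10 == 0:
        return 0.0, 1.0  # the systems never disagree in correctness
    statistic = (abs(n01 - n10) - 1) ** 2 / (n01 + n10)
    # Survival function of chi-square with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value
```

When the number of disagreements is small, an exact binomial version of the test may be preferable to this chi-square approximation.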

Important Dates

TBD

File Format

TBD

Submission Format

TBD

Challenging Issues

  1. Mood-changing pieces: some pieces may start in one mood but end in another. For each such piece, we can either label it with its most salient mood or let inconsistent judgments rule it out.
  2. Multiple-label classification: it is possible that one piece has two or more correct mood labels, but as a start, we strongly suggest holding a less confusing contest and leaving this challenge to future MIREXs.

Participants

If you think there is a slight chance that you might consider participating, please add your name and email address here.

  • Kris West (kw at cmp dot uea dot ac dot uk)
  • Cyril Laurier (claurier at iua dot upf dot edu)
  • Elias Pampalk (firstname.lastname@gmail.com)
  • Yuriy Molchanyuk (molchanyuk at onu.edu.ua)
  • Shigeki Sagayama (sagayama at hil dot t.u-tokyo.ac.jp)
  • Guillaume Nargeot (killy971 at gmail dot com)
  • Zhongzhe Xiao (zhongzhe dot xiao at ec-lyon dot fr)
  • Kyogu Lee (kglee at ccrma.stanford.edu)
  • Vitor Soares (firstname.lastname@clustermedialabs.com)
  • Wai Cheung (wlche1@infotech.monash.edu.au)

Moderators

  • J. Stephen Downie (IMIRSEL, University of Illinois, USA)
  • Xiao Hu (IMIRSEL, University of Illinois, USA)
  • Cyril Laurier (Music Technology Group, Barcelona, Spain)

Related Papers

  1. Dietterich, T. (1998). Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7), 1895-1923.
  2. Hu, Xiao and J. Stephen Downie (2007). Exploring mood metadata: Relationships with genre, artist and usage metadata. Accepted in the Eighth International Conference on Music Information Retrieval (ISMIR 2007), Vienna, September 23-27, 2007.
  3. Juslin, P.N., Karlsson, J., Lindström, E., Friberg, A. and Schoonderwaldt, E. (2006). Play It Again With Feeling: Computer Feedback in Musical Communication of Emotions. Journal of Experimental Psychology: Applied, Vol. 12, No. 2, 79-95.
  4. Vignoli (ISMIR 2004) Digital Music Interaction Concepts: A User Study
  5. Cunningham, Jones and Jones (ISMIR 2004) Organizing Digital Music For Use: An Examination of Personal Music Collections.
  6. Cunningham, Bainbridge and Falconer (ISMIR 2006) 'More of an Art than a Science': Supporting the Creation of Playlists and Mixes.
  7. Lu, Liu and Zhang (2006), Automatic Mood Detection and Tracking of Music Audio Signals. IEEE Transactions on Audio, Speech, and Language Processing, Vol. 14, No. 1, January 2006.
    Part of this paper appeared in ISMIR 2003 http://ismir2003.ismir.net/papers/Liu.PDF
  8. Pohle, Pampalk, and Widmer (CBMI 2005) Evaluation of Frequently Used Audio Features for Classification of Music into Perceptual Categories.
    It separates "mood" and "emotion" as two classification dimensions, which are mostly combined in other studies.
  9. Mandel, Poliner and Ellis (2006) Support vector machine active learning for music retrieval. Multimedia Systems, Vol.12(1). Aug.2006.
  10. Feng, Zhuang and Pan (SIGIR 2003) Popular music retrieval by detecting mood
  11. Li and Ogihara (ISMIR 2003) Detecting emotion in music
  12. Hilliges, Holzer, Klüber and Butz (2006) AudioRadar: A metaphorical visualization for the navigation of large music collections. In Proceedings of the International Symposium on Smart Graphics 2006, Vancouver, Canada.
    It summarizes implicit problems in traditional genre/artist-based music organization.
  13. Juslin, P. N., & Laukka, P. (2004). Expression, perception, and induction of musical emotions: A review and a questionnaire study of everyday listening. Journal of New Music Research, 33(3), 217-238.