2007:Evalutron6000 Walkthrough For Audio Mood Classification

Special Comments about Validating the Audio Mood Classification Candidates

Task Description: Ignoring Lyrics

The goal of Audio Music Mood Classification is to evaluate how well various algorithms can classify music pieces according to the mood they express. The candidate files cover a variety of genres, melodies, instrumentations and tempos. We need you to consider the overall effect of each music piece. For vocal music, please ignore the lyrics, as it is unreasonable to expect algorithms to determine the mood by understanding the lyrics.

Validating Expectations and "Reasonableness"

For each candidate, we need you to assign a mood cluster label. The 5 mood clusters used in this contest are derived from the AMG (allmusicguide.com) mood labels which collectively define the clusters (see the contest wiki Audio Music Mood Classification for more details):

Cluster_1: passionate, rousing, confident, boisterous, rowdy
Cluster_2: rollicking, cheerful, fun, sweet, amiable/good natured
Cluster_3: literate, poignant, wistful, bittersweet, autumnal, brooding
Cluster_4: humorous, silly, campy, quirky, whimsical, witty, wry
Cluster_5: aggressive, fiery, tense/anxious, intense, volatile, visceral

If you don't think a candidate fits in any mood cluster, you may choose "Other" for it. Please bear in mind that your reasonable judgments are very important: only those candidates on which a majority of the human judges agree will be used in the contest.
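The agreement rule mentioned above can be pictured with a short sketch. It is an illustration only: the function name `majority_label` and the strict more-than-half threshold are assumptions, since this walkthrough does not state the exact rule applied to the collected judgments.

```python
from collections import Counter


def majority_label(judgments):
    """Return the label chosen by more than half of the judges, else None.

    `judgments` is a list of labels such as "Cluster_3" or "Other".
    The strict-majority threshold is an assumption for illustration.
    """
    if not judgments:
        return None
    # most_common(1) yields the single most frequent label and its count.
    label, count = Counter(judgments).most_common(1)[0]
    return label if count > len(judgments) / 2 else None
```

Under this assumed rule, a clip labeled Cluster_3 by two of three judges would be kept with that label, while a clip that receives three different labels would have no majority and would be left out.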

Understanding the Mood Clusters

We urge you to listen to the exemplar clips for each mood cluster, located on the Instructions page of the Evalutron, before you start, and to apply a level of "reasonableness" to your judgments. Whenever you feel the need, please go back to that page and refresh your ears.

The exemplar clips were manually selected by 6 human assessors from a set of songs in the USPOP collection. Their purpose is to help clarify the perceptual identities of the mood clusters. Bibliographic information on more exemplar songs can be found on the exemplar songs page.

Each candidate clip to be evaluated is pre-labeled with one of the mood clusters, according to its metadata provided by APM (apmmusic.com). Depending on the quality of that metadata and the method we used to select the candidates, the pre-labels may or may not be correct. This is why we need your judgments to validate and/or correct the labels. However, the pre-labels do give a sense of the identity of each cluster, and thus the candidates are organized by their pre-labels in the Evalutron. While you should remain suspicious of the pre-labels, we encourage you to navigate between clusters and listen to clips in different ones from time to time, which can help remind your ears of the nuances between clusters.

Evalutron 6000 design and use details follow below and should clarify for you the use of the system.

Coming and Going: Avoiding Grader Fatigue

Listening to, and then labeling, hundreds of audio files is very tiring. We have built into the back-end of the Evalutron 6000 system a rather robust database system that records in near-real-time your grading scores. These scores are saved along with information about which candidates you have yet to review. All this information is stored in association with your personal sign-in ID. This means you can break up your grading over several days at times convenient and productive to you. In fact, we recommend that you not try to tackle your "assignment" in one big chunk as fresh ears are happy ears and happy ears make for better evaluations.

Basic Technical Requirements

In order to use the Evalutron 6000 you will need a modern web browser (e.g., Firefox, Mozilla, Safari, Internet Explorer) that supports JavaScript (ECMAScript) and cookies. Evalutron has been tested on Windows XP, MacOS X, RedHat Linux, and Solaris. We have found the combination of Firefox and Flash Player to be the most stable across platforms. Despite our best efforts, Internet Explorer continues to generate seemingly random errors, and we ask that you avoid using it. A decent amount of bandwidth (e.g., DSL, cable modem or better) is also advisable to minimize download times.

A screen resolution of 1024 x 768 is the minimum advised. Higher resolutions appear to work well too. We have noted some odd behaviour if you adjust your resolution in mid-session, so it is probably best not to do this. Adjusting font size via your browser can sometimes make things look a bit tidier and does not seem to have adverse effects.

In general, if you are having trouble, please try accessing the Evalutron 6000 using another machine/platform/browser combination. If you are still having difficulty, please contact [1].

Getting Started

When first visiting the Evalutron 6000 homepage, you will see a page similar to this (Fig. 1).

File:E6kamc home page scaled.PNG

Figure 1. Evalutron 6000 start page.

Four Important Issues Before Proceeding

This system is NOT open to the general public. Due to the legalities imposed upon us by the University of Illinois and the US Federal Government, we are allowed to accept only those graders that have a stake in Music Information Retrieval and Music Digital Library research. Acceptable persons include MIREX participants, subscribers to the music-ir[AT]ircam.fr list, ISMIR attendees, computational musicologists, music technology researchers, etc. If you are in doubt about your acceptability, please contact Prof. Downie at [2].
Do NOT create an account for yourself if you are merely curious to see what the Evalutron 6000 is set up to do. The account creation process scientifically distributes Query and Candidate sets to each account holder. The act of creating a "curiosity account" seriously disrupts the administration of the results data we are collecting.
IF YOU ARE REALLY CURIOUS: We have created a small "sandbox" version of the AudioSim Evalutron with fake data at: https://music-ir.org/eval6000/sandbox. You still must go through all the official procedures, but your scores will not affect anything as the data is "made up".
Please do NOT create multiple accounts for yourself for this particular evaluation task. If you are also grading the Audio Music Similarity and Retrieval (AudioSim) task or Symbolic Melody Similarity (SMS) task, you will be creating another task-specific account for AudioSim or SMS in the AudioSim or SMS space. The system is designed to track individuals based upon unique sign-in IDs. The creation of multiple accounts for one person causes the system to improperly distribute the assignments to your fellow graders. If you have difficulties, such as forgetting your sign-in password, please contact us at [3] or try the "Forgot Your Password?" link found at the top of the homepage (Fig. 1).

Registration

First you must register a new account. Click on the "Register" link (listed under Step 2) on the page to create an account.

The registration page is fairly straightforward (Fig. 2). Required fields are marked in blue with asterisks. You can create any username and password you wish. Passwords must be at least 6 characters long and are case-sensitive. Before completing the registration, you must read and agree to the terms of the Informed Consent document. The evaluation, because it is using human judgments of music mood, is considered a human-subjects research project and the Evalutron is basically a survey instrument. To indicate your consent to participate in the evaluation, check the "I Agree" checkbox below the informed consent document.

If you have questions about your rights as a subject in this research project, you should contact the UIUC IRB office (http://www.irb.uiuc.edu) for more information. The research protocol for this project is IRB# 07066.

File:E6ksms registration page scaled.png

Figure 2. Evalutron 6000 registration page.

Listening to Exemplar Clips

After you complete the registration, the system will ask you to sign in with your newly created username and password. After signing in for the first time, you will be asked to listen to exemplar clips (Fig. 3). To start listening, click on "HERE" on this page or on the "Exemplar Audio Clips" link on the homepage.

File:E6kamc login page scaled.png

Figure 3. Initial login page.

The links lead you to the Instructions page (Fig. 4). You will be presented with 3 exemplar clips for each cluster. You must finish listening to all the exemplar clips before the system can assign you candidates for your evaluation.

File:E6kamc instruction page scaled.png

Figure 4. Instructions page.

When you finish listening to all the exemplar clips, you can proceed to the actual evaluation by clicking the "PLEASE START EVALUATION with Flash MP3 Player" link. At this time, a set of candidate clips across all 5 clusters will be assigned to you, for you to finish before the deadline of Aug. 20.

If you haven't listened to all the examples, the system will keep directing you back to the Instructions page and tell you how many examples you have yet to listen to (Fig. 5). This is to make sure you have listened to all the examples before starting the evaluation, because the exemplar songs are important in helping to reach cross-validator consistency.

File:E6kamc example check scaled.png

Figure 5. Example check page.

After you have listened to the examples, you have a chance to decide not to take part in the evaluation. Although we regret losing a valuable contributor, we would appreciate it if you left without proceeding to the next stage; otherwise the system will take you into consideration when distributing candidates. Although you may stop and leave at any time during the evaluation, we would appreciate it if you tried to finish all (or most) of your assignments, so that each valid human judge shares an equal workload.

Evaluation Pages: Mood Clusters and Candidate Lists

Once you have got your assignments, you'll see a mood cluster evaluation page (Fig. 6). On this page, you will see the mood cluster label and the list of candidate players. The candidates are drawn from the APM collection and are 30 seconds long.

File:E6kamc evaluation page scaled.png

Figure 6. Sample evaluation page.

Please do not be puzzled if your first assignment is not Mood Cluster #1 or if your Candidate List has candidates in a seemingly random order: This is deliberate! The Evalutron 6000 is designed to build customized randomized lists and orderings for each grader to minimize fatigue and ordering effects. We encourage you to move between the different Mood Clusters in the system as you do your evaluations, so as to reinforce the distinctions between Mood Clusters. The "Mood Cluster #" buttons visible in Figure 6 are designed to guide you through the different Mood Cluster Lists.
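As a rough picture of what a customized, randomized ordering per grader could look like, here is a minimal sketch. How the Evalutron 6000 actually builds its lists is not described in this walkthrough; the function name `grader_ordering` and the idea of seeding the shuffle with the sign-in ID are assumptions made for illustration.

```python
import random


def grader_ordering(grader_id, candidate_ids):
    """Return a shuffled copy of candidate_ids that is stable per grader.

    Seeding a private random generator with the (hypothetical) sign-in ID
    gives the same grader the same ordering on every visit, without
    touching the global random state.
    """
    rng = random.Random(grader_id)
    order = list(candidate_ids)
    rng.shuffle(order)
    return order
```

A stable per-grader ordering of this kind would also fit the advice above about coming and going: a grader returning on another day would find the list in the same order.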

Please note that the list of candidates is longer than the visible page (i.e., there are more candidates than may be immediately visible on the page). Please scroll to the bottom of the candidate list to make sure you have evaluated each song.

The procedure for listening to a candidate is the same as listening to a query -- click the "Play Candidate" buttons to load the clip into the player and listen to it (Fig. 4). Once you have a feeling for how similar the candidate is to the query, click the "Not Similar," "Somewhat Similar" or "Very Similar" radio buttons to the right of the candidate. Note: you can continue to replay the query during candidate playback.

File:E6ksms select category.png

Figure 5. Close up image of Broad Category selection buttons.

Depending on how you graded the candidate, you should see the candidate box react to your grade, indicating that the vote has been logged in the database (Fig. 6). If you indicated the song was "Somewhat Similar", the box will turn yellow; if "Very Similar", it will turn green; if "Not Similar", it will turn red. In each case the box will also state "SAVED".

File:E6ksms select category saved.png

Figure 6. Close up image of Broad Category selection buttons with "Somewhat Similar" selected and "SAVED" automatically.

We also need each grader to assign a fine-grained score for the similarity of the candidate to the query on a scale of 0-10 (Fig. 7). More information about this process is given below.

File:E6ksms select score.png

Figure 7. Close up image of the Fine Score selection scale.

The system will automatically record the score when you let go of the scaler (Fig. 8).

File:E6ksms select score saved.png

Figure 8. Close up image of a Fine Score selection scale with a random score "SAVED".

You can also manually enter the score in the box, but in this case you MUST click the button labeled "Click to Save" to record your score (Fig. 9). After the score is saved, the scaler will automatically move to the correct position and the label on the button will change to "Saved" (Fig. 8).

File:E6ksms select score before save.PNG

Figure 9. Close up image of a Fine Score selection scale before "SAVED".

You can always change your evaluation for any candidate by toggling the radio buttons. You can also go back and adjust the Fine Score selection scale.

When you have completed evaluating all of the candidates for a single query, you can click on the "Next Query" button at the top right of the evaluation page. This button will load a new Query and associated Candidate List for you to evaluate. Using the "My Assignment" tab, you can check the list of queries and candidates that were assigned to you, and also see how far you are in completing your evaluation task (Fig. 10). You must assign BOTH a Broad Category and a Fine Score for each candidate before the system registers it as a "completed" candidate.

File:E6ksms my assignment scaled.png

Figure 10. My Assignment page.
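The completion rule described above (a candidate counts as "completed" only when it has BOTH a Broad Category and a Fine Score) can be sketched as follows. The `Judgment` class and the function names are hypothetical and are not taken from the Evalutron 6000 code; only the two-field requirement and the 0-10 scale come from this walkthrough.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Judgment:
    """One grader's entries for one candidate (hypothetical structure)."""
    broad_category: Optional[str] = None  # e.g. "Somewhat Similar"
    fine_score: Optional[float] = None    # value on the 0-10 scale


def is_completed(judgment: Judgment) -> bool:
    """True only when both fields are set and the score is within 0-10."""
    has_category = judgment.broad_category is not None
    has_score = (judgment.fine_score is not None
                 and 0 <= judgment.fine_score <= 10)
    return has_category and has_score


def count_completed(judgments: List[Judgment]) -> int:
    """Number of candidates that would register as completed."""
    return sum(1 for j in judgments if is_completed(j))
```

Note that a Fine Score of 0 is a valid entry in this sketch, so the check tests for a missing value rather than for a falsy one.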

You can also see a list of all of the queries you have been assigned at the bottom of each Candidate List page. You can return to any query by clicking on the button for that query here (Fig. 11). You can re-evaluate any candidate for any query at any time, up to the closing of the evaluation system.

File:Eval6 queries detail.png

Figure 11. Close up image of Query List buttons which allow for revisiting of completed query/candidate sets.
