Difference between revisions of "2010:Evalutron6000 Walkthrough"

From MIREX Wiki

Latest revision as of 09:13, 17 July 2010

UPDATE 2010

The 2010 Evalutron has a new look and implementation. It is easier to use than the original implementation. However, it is necessary to read through this document before you start using the system. Enjoy!

Four Important Issues Before Proceeding

  1. This system is NOT open to the general public. Because of legal requirements imposed upon us by the University of Illinois and the US Federal Government, we are allowed to accept only graders who have a stake in Music Information Retrieval and Music Digital Library research. Acceptable persons include MIREX participants, subscribers to the music-ir[AT]ircam.fr list, ISMIR attendees, computational musicologists, music technology researchers, etc. If you are in doubt about your acceptability, please contact Prof. Downie at jdownie@illinois.edu.
  2. Do NOT click the "Get Assignments" link if you are merely curious to see what the Evalutron 6000 is set up to do. The assignment distribution process scientifically distributes Query and Candidate sets to each account asking for assignments. To have a look at the Evalutron evaluation page, please click the "Try E6K Demo" link on the left side of the page.
  3. The Evalutron is set up for both the Audio Music Similarity (AMS) task and the Symbolic Melodic Similarity (SMS) task. Please do NOT click the "Get Assignments" link under a task you do not intend to participate in. Otherwise, you will receive assignments for that task, which will prevent other graders from getting them.
  4. Please do NOT create multiple accounts for yourself on Evalutron. The system is designed to track individuals based upon unique sign-in IDs. The creation of multiple accounts for one person causes the system to improperly distribute the assignments to your fellow graders. If you have difficulties, such as forgetting your sign-in password, please contact us at mirproject@lists.lis.illinois.edu or try the "Forgot Password" link found at the left side of the homepage (Fig. 1).

Welcome to the Evalutron 6000

To use the Evalutron 6000 you will need a modern web browser (e.g., Firefox, Internet Explorer, Safari, or Mozilla) that supports JavaScript, Flash, and cookies. Evalutron has been tested on Windows XP, Windows 7, MacOS X, and Ubuntu Linux. If you are using a different platform and are having trouble, please try accessing Evalutron 6000 from another machine. If you are still having difficulty, contact us at mirex@imirsel.org.

When you first visit the Evalutron 6000, you will see its home page (Fig. 1).

2010 e6k home.png

Figure 1. Evalutron 6000 home page.

Register to the Evalutron 6000

If you have an account with the MIREX submission system, then you can use the same account for Evalutron 6000. Otherwise, you must register a new account. Click on the "Register" link on the left side of the page to create an account.

The registration page is fairly straightforward (Fig. 2). All fields are required. You can create any username and password you wish. Usernames must be at least 5 characters long; passwords must be at least 8 characters long and are case-sensitive. After you submit the form, you will receive an email with an activation link. Clicking that link will complete your registration.

2010 e6k register.png

Figure 2. Evalutron 6000 registration page.
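The length rules above can be sketched as a small check. This is only an illustration of the stated rules, not the Evalutron's actual validation code; the function names `registration_errors` and `passwords_match` are hypothetical.

```python
MIN_USERNAME_LENGTH = 5  # stated in the walkthrough
MIN_PASSWORD_LENGTH = 8  # stated in the walkthrough


def registration_errors(username: str, password: str) -> list[str]:
    """Return a list of problems; an empty list means both length rules pass."""
    errors = []
    if len(username) < MIN_USERNAME_LENGTH:
        errors.append(
            f"username must be at least {MIN_USERNAME_LENGTH} characters long"
        )
    if len(password) < MIN_PASSWORD_LENGTH:
        errors.append(
            f"password must be at least {MIN_PASSWORD_LENGTH} characters long"
        )
    return errors


def passwords_match(stored: str, entered: str) -> bool:
    """Case-sensitive comparison: 'Secret123' and 'secret123' differ."""
    return stored == entered
```

A username such as "abc" paired with the password "short" would fail both checks, while "grader" with "longenough" would pass.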

Agree to the Informed Consent

Before starting evaluation, every evaluator must read and agree to the terms of the Informed Consent document. Otherwise, you will be redirected to the informed consent page when you try to get evaluation assignments. Clicking the "Informed Consent" link on the left-side menu will show the form (Fig. 3). Because the evaluation uses human judgments of similarity, it is considered a human-subjects research project, and the Evalutron is a survey instrument. To indicate your consent to participate in the evaluation, scroll down the page and check the checkbox below the informed consent document.

If you have questions about your rights as a subject in this research project, you should contact the UIUC IRB office (http://www.irb.uiuc.edu) for more information. The research protocol for this project is IRB# 07066.

2010 e6k consent.png

Figure 3. Evalutron 6000 informed consent page.

Get Your Assignments

To start the evaluation process, click the "My Assignments" link on the left-side menu. The assignment page is similar to Fig. 4. This page shows all tasks available in the Evalutron 6000; initially, no assignments are listed. Be careful to select the task you intend to participate in, and click the "Get Assignment" button under this task. The system will assign you a number of queries to evaluate.

Please note that once assignments are made, they cannot be changed or removed. Therefore, please NEVER click "Get Assignment" for a task you do not intend to participate in.

2010 e6k getAssignment.png

Figure 4. Evalutron 6000 get assignment page.

Evaluate

Clicking the "Evaluate Query" button under a query will lead you to the evaluation page of that query (Fig. 5). This page consists of instructions at the top and a list of query-candidate pairs. Please read the instructions carefully. The query is the same for all candidates on a single page; it is repeated next to each candidate so that you can easily compare the two. For the Audio Music Similarity (AMS) task, each query and candidate is 30 seconds long. For the Symbolic Melodic Similarity (SMS) task, the length is about 10 seconds. Clicking the player button beside a query or candidate will load the song into the player and begin playing. Clicking the player button again will pause it. We recommend you listen to the entire query at least once before evaluating any candidate files.

2010 e6k eval page.png

Figure 5. Sample evaluation page.

Please note there are more candidates than may be immediately visible on the page. Please scroll to the bottom of the candidate list to make sure you've evaluated each song.

Once you have a feeling for whether or not the candidate is similar to the query, click the "Not Similar", "Somewhat Similar" or "Very Similar" radio button to the right of the query-candidate pair. Each grader will also need to assign a fine-grained score for the similarity of the candidate to the query on a scale of 0-100 (with 0 indicating completely different and 100 perfectly similar or identical). To input the fine score, move the slider; once you let go of the slider, the system will automatically record the score.

2010 e6k qcq detail.png

Figure 6. Close up image of a query-candidate pair and evaluation buttons.

Only after you input BOTH the broad category and the fine score can a query-candidate pair be marked as green, indicating you have completed evaluating this query-candidate pair.

2010 e6k qcq done detail.png

Figure 7. Close up image of a completed query-candidate pair.
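The completion rule (both the broad category and the fine score are required before a pair turns green) can be sketched as follows. This is a minimal illustration under the assumption that a pair simply stores the two values; the class `PairEvaluation` and its field names are hypothetical and are not taken from the Evalutron code.

```python
from dataclasses import dataclass
from typing import Optional

# The three broad categories named in the walkthrough.
BROAD_CATEGORIES = ("Not Similar", "Somewhat Similar", "Very Similar")


@dataclass
class PairEvaluation:
    """Hypothetical record of one grader's judgment of a query-candidate pair."""

    broad_category: Optional[str] = None  # None until a radio button is chosen
    fine_score: Optional[int] = None      # None until the slider is moved

    def is_complete(self) -> bool:
        """True only when BOTH a valid category and a 0-100 score are set."""
        return (
            self.broad_category in BROAD_CATEGORIES
            and self.fine_score is not None
            and 0 <= self.fine_score <= 100
        )
```

A pair with only a broad category, or only a fine score, would still count as incomplete under this sketch.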

You can always change your evaluation for any candidate by toggling the radio buttons and adjusting the Fine Score slider. Once an evaluation has been made, however, it cannot be retracted, only changed (i.e., you cannot "unvote").

Work on Another Query

At the bottom of the evaluation page, there is a "View All Assignments" button (Fig. 8). At any time, clicking this button will take you to your assignment page, similar to the one shown in Fig. 4. When you have finished evaluating all of the candidates for this query, you may click this button to continue with another query.

2010 e6k eval page bottom.png

Figure 8. Close up image of "View All Assignments" button which loads the assignment page.

You may also click the "My Assignments" link on the left-side menu to go to the My Assignments page.

On the My Assignments page, you may click another "Evaluate Query" button to load the evaluation page (like Fig. 5) for that query.

On the My Assignments page, you will see a list of all of the queries you have evaluated (or are evaluating). You can return to any query by clicking the "Evaluate Query" button underneath it (Fig. 4). You can re-evaluate any candidate for any query at any time, up to the closing of the evaluation system.

Monitor Progress

At any time, a grader may monitor his/her progress on the My Assignments page (Fig. 9). Each query has a status bar in which the completed portion is marked green and the unfinished portion red. When all the bars are green, all assignments are completed.

2010 e6k progress.png

Figure 9. Progress indicated on the My Assignments page.
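One way to think about the status bars is as the fraction of query-candidate pairs completed for each query. The sketch below assumes exactly that; the actual computation used by the Evalutron is not described in this document, and the function names are hypothetical.

```python
def query_progress(completed_pairs: int, total_pairs: int) -> float:
    """Fraction of a query's candidates that are fully evaluated (the green part)."""
    if total_pairs <= 0:
        raise ValueError("a query must have at least one candidate")
    return completed_pairs / total_pairs


def all_assignments_done(progress_by_query: dict[str, float]) -> bool:
    """True when every query's bar is fully green.

    Note: an empty dict (no assignments yet) also returns True here.
    """
    return all(progress >= 1.0 for progress in progress_by_query.values())
```

For instance, a query with 5 of 10 candidates evaluated would show a bar that is half green and half red.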


Grading Expectations and "Reasonableness"

For each query-candidate pair, we need you to assign BOTH a Broad Category score AND a Fine Score (i.e., a numeric grade between 0 and 100, where 0 represents completely different and 100 perfectly similar or identical). You have the freedom to make whatever associations you desire between a particular Broad Category score and its related Fine Score. In fact, we expect to see variations across evaluators with regard to the relationships between Broad Categories and Fine Scores, as this is a normal part of human subjectivity. However, we will be using the two different types of scores to do important inter-related post-Evalutron calculations, so please be thoughtful in selecting your Broad Categories and related Fine Scores. What we are really asking here is that you apply a level of "reasonableness" to both your scores and your associations. For example, if you score a candidate in the VERY SIMILAR category, a Fine Score of 21 would not be, by most standards, "reasonable". The same applies at the other extreme: a Broad Category score of NOT SIMILAR should not be associated with a Fine Score of, say, 72 or 84.
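The idea of "reasonableness" can be illustrated with a simple self-check. The score ranges below are purely hypothetical: the Evalutron does not prescribe any mapping between Broad Categories and Fine Scores, and graders remain free to choose their own. The ranges are chosen only so that the examples given above (VERY SIMILAR with 21, NOT SIMILAR with 72 or 84) are flagged.

```python
# ASSUMED ranges for illustration only; they are not defined by the Evalutron.
# The ranges deliberately overlap, since graders legitimately differ.
ASSUMED_RANGES = {
    "Not Similar": (0, 40),
    "Somewhat Similar": (20, 80),
    "Very Similar": (60, 100),
}


def looks_reasonable(broad_category: str, fine_score: int) -> bool:
    """Check a fine score against the assumed range for its broad category."""
    low, high = ASSUMED_RANGES[broad_category]
    return low <= fine_score <= high
```

Under these assumed ranges, a VERY SIMILAR candidate with a Fine Score of 21 would be flagged, while a SOMEWHAT SIMILAR candidate with a Fine Score of 50 would not.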


Coming and Going: Avoiding Grader Fatigue

Listening to, and then comparing, hundreds of audio files is very tiring. We have built into the back-end of the Evalutron 6000 system a rather robust database system that records your grading scores in near real time. These scores are saved along with information about which queries and candidates you have yet to review. All this information is stored in association with your personal sign-in ID. This means you can break up your grading over several days, at times that are convenient and productive for you. In fact, we recommend that you not try to tackle your "assignment" in one big chunk, as fresh ears are happy ears and happy ears make for better evaluations.