2006 Plenary Notes

Oct. 12th, 2006 @ Empress Crystal Hall, Victoria

Opening

Professor Stephen Downie gave the opening remarks:

We will present certificates to participants; feel free to grab yours if you are leaving.
Thanks to the IMIRSEL team members.

Overview

MIREX was highly successful this year. We got everything done on time!
Matlab is widely used (universal retrieval language!)
All the evaluation result data files are available on the wiki.

Tasks

Some tasks were split into sub-tasks as they matured.
New tasks:
- Audio cover song: 13 different songs, each of which has 11 different versions
- Score following: groundwork laid for future years
- QBSH (query by singing/humming): 48 ground-truth melodies, with multiple query versions for each of them. About 2000 noise songs were selected from the Essen dataset. Both audio and MIDI query input are supported.
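With 13 songs in 11 versions each (143 recordings), one natural way to score a cover song system is to use each version as a query and count how many of that song's other versions appear near the top of the returned list. The sketch below assumes that measure, a default cutoff of 10, and a hypothetical input format (`ranked_lists`, `song_of`); it is illustrative and not necessarily the scoring MIREX used.

```python
def covers_in_top_n(ranked_lists, song_of, n=10):
    """Count, for each query, how many of its top-n results are versions of the same song.

    ranked_lists: dict mapping query id -> candidate ids, best match first (hypothetical format).
    song_of: dict mapping every version id -> the id of the song it is a version of.
    """
    counts = {}
    for query, ranking in ranked_lists.items():
        # Ignore the query itself if a system returns it in its own list.
        top = [c for c in ranking if c != query][:n]
        counts[query] = sum(1 for c in top if song_of[c] == song_of[query])
    return counts


def mean_count(counts):
    """Average number of covers found per query."""
    return sum(counts.values()) / len(counts)
```

For the real task each query has 10 other versions, so a per-query count of 10 at n=10 would be a perfect result.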
Please think about new tasks for next year.
New evaluations:
- Evalutron 6000 collected real-world human judgments.
- Audio onset detection accepted multiple parameter settings per algorithm.
- Friedman test: adopted from the valuable experience of TREC (the Text REtrieval Conference), the annual evaluation campaign in text retrieval.

Onset Detection

By tuning the parameters, we can find an optimal setting that represents the best tradeoff between precision and recall. We need a new dataset to see whether the tuned parameters generalize to unseen data. Question: comparison to last year's results? Answer:
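Precision and recall for onset detection are commonly computed by matching each detected onset to at most one annotated onset within a small tolerance window. The sketch below assumes a ±50 ms window and a hypothetical list of `(time, strength)` candidates from an onset detection function; sweeping the peak-picking threshold exposes the precision/recall tradeoff, and the threshold with the highest F-measure plays the role of the "optimal setting". Because that threshold is chosen on the same data it is scored on, its F-measure is optimistic, which is why a new dataset is needed.

```python
def onset_prf(detected, reference, tol=0.05):
    """Precision, recall and F-measure for onset times in seconds.

    Each reference onset is matched by at most one detection within +/- tol seconds
    (tol=0.05 is an assumed window). Greedy matching of sorted lists is optimal here.
    """
    detected = sorted(detected)
    reference = sorted(reference)
    i = j = matched = 0
    while i < len(detected) and j < len(reference):
        diff = detected[i] - reference[j]
        if abs(diff) <= tol:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # detection too early for any remaining reference: false positive
        else:
            j += 1  # reference too early for any remaining detection: missed onset
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


def best_threshold(candidates, reference, thresholds, tol=0.05):
    """Return (threshold, F-measure) for the peak-picking threshold with the highest F.

    candidates: list of (time_seconds, strength) pairs (hypothetical detector output).
    """
    best = None
    for th in thresholds:
        detected = [t for t, strength in candidates if strength >= th]
        _, _, f = onset_prf(detected, reference, tol)
        if best is None or f > best[1]:
            best = (th, f)
    return best
```

A lower threshold keeps more candidates (higher recall, lower precision); a higher one does the opposite.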

Evalutron 6000

Two kinds of judgment:

category judgment: Not similar; Similar; Very similar
continuous score: from 0 to 10, with one digit after the decimal point.
the system: built on open-source CMS (content management system) software
we still have data that we haven't fully processed (other user/evaluator behaviors)
new evaluation on other facets? e.g. mood
suggestions?
we appreciate the evaluators' volunteer work. Your work makes life beautiful!
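The two kinds of judgment can be captured in one record per evaluated pair. This is a hypothetical schema (the field names and `parse_judgment` are not taken from the Evalutron 6000); it only enforces the constraints stated above: one of three categories and a 0 to 10 score with at most one digit after the decimal point. `Decimal` is used so the decimal-place check is exact rather than subject to floating-point rounding.

```python
from dataclasses import dataclass
from decimal import Decimal

# The three category labels from the session notes.
CATEGORIES = ("Not similar", "Similar", "Very similar")


@dataclass(frozen=True)
class Judgment:
    evaluator: str
    query: str
    candidate: str
    category: str
    score: Decimal


def parse_judgment(evaluator, query, candidate, category, score_text):
    """Build a Judgment, raising ValueError for an unknown category or out-of-range score."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    score = Decimal(score_text)
    if not Decimal("0") <= score <= Decimal("10"):
        raise ValueError("score must be between 0 and 10")
    # normalize() drops trailing zeros, so "7.50" is accepted as 7.5 but "7.25" is rejected.
    if score.normalize().as_tuple().exponent < -1:
        raise ValueError("score allows at most one digit after the decimal point")
    return Judgment(evaluator, query, candidate, category, score)
```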

Question: consistency across users? Answer: the data appear to be quite consistent. More analysis can be done on the data, which are publicly accessible.
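One simple way to look at consistency across users is to correlate the continuous scores that each pair of evaluators gave to the same query/candidate pairs. The session did not say which measure was used; Pearson correlation and the `scores` input format (evaluator to pair id to score) are assumptions for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length score lists; None if either list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return None
    return sxy / sqrt(sxx * syy)


def pairwise_consistency(scores):
    """Correlate every pair of evaluators over the query/candidate pairs both of them judged.

    scores: dict mapping evaluator -> dict mapping pair id -> continuous score (0-10).
    Returns dict mapping (evaluator_a, evaluator_b) -> correlation (None if undefined).
    """
    evaluators = sorted(scores)
    result = {}
    for i, a in enumerate(evaluators):
        for b in evaluators[i + 1:]:
            shared = sorted(set(scores[a]) & set(scores[b]))
            if len(shared) < 2:
                continue  # too little overlap to correlate
            xs = [scores[a][p] for p in shared]
            ys = [scores[b][p] for p in shared]
            result[(a, b)] = pearson(xs, ys)
    return result
```

Correlation only captures agreement in ordering and spacing; two evaluators who differ by a constant offset still correlate perfectly.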

automatic evaluation using available metadata (vs human judgment)
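Automatic evaluation with metadata treats, for example, "same genre" as a stand-in for "similar". A quick way to compare that proxy with human judgment is the fraction of judged pairs on which the two agree. The sketch assumes genre labels as the metadata and counts both "Similar" and "Very similar" as a human "yes"; both choices are hypothetical.

```python
# Hypothetical choice: both "Similar" and "Very similar" count as a human "yes".
RELEVANT = {"Similar", "Very similar"}


def metadata_agreement(judgments, genre_of):
    """Fraction of judged pairs where "same genre" agrees with the human similar/not-similar call.

    judgments: list of (query, candidate, category) tuples, category as in the Evalutron notes.
    genre_of: dict mapping track id -> genre label (assumed metadata).
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    agree = 0
    for query, candidate, category in judgments:
        metadata_says_similar = genre_of[query] == genre_of[candidate]
        human_says_similar = category in RELEVANT
        if metadata_says_similar == human_says_similar:
            agree += 1
    return agree / len(judgments)
```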

Friedman tests

a non-parametric, rank-based test whose statistic is compared against a chi-square distribution
the Matlab script is on the wiki
used to compare different algorithms
this test is conservative
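The Friedman test ranks the algorithms within each block (for example, each query), sums the ranks per algorithm, and compares Q = 12 / (n k (k+1)) * sum_j R_j^2 - 3 n (k+1) against a chi-square distribution with k - 1 degrees of freedom, where n is the number of blocks, k the number of algorithms, and R_j the rank sum of algorithm j. The Python sketch below mirrors that formula; it is not the Matlab script on the wiki, and it applies no tie correction. For p-values, SciPy's `scipy.stats.friedmanchisquare` computes the statistic (with a tie correction) from one sample per algorithm.

```python
def average_ranks(values):
    """Rank values within one block (1 = smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend over a run of tied values.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # ranks are 1-based
        for idx in range(pos, end + 1):
            ranks[order[idx]] = avg
        pos = end + 1
    return ranks


def friedman_statistic(scores):
    """Return (Q, degrees of freedom) for a Friedman test without tie correction.

    scores: list of blocks (e.g. one row per query), each a list of k algorithm scores.
    """
    n = len(scores)
    k = len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        if len(row) != k:
            raise ValueError("every block must score the same k algorithms")
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    q = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)
    return q, k - 1
```

Because only within-block ranks enter Q, the test ignores how large the score differences are, which is one reason it is considered conservative.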

Future MIREX plans

Discussion

Encourage everyone to participate.
Need data!
Metadata: handy ground truth
Reuse data: for at least two or three years
Submissions: robustness, platform, scalability, parallelization

Acknowledgement

Mellon Foundation

Kris: call for organizers!
Alexandra Uitdenbogerd: "similarity" judgments are difficult. It might be easier to make judgments on genre, for example.
Audience: How long did it take to evaluate one pair? Stephen: we have the data, but have not dug into it yet.
Bergstra: can you make the contests year-round?

Audience: are you aware of the work on labelling images, the "ESP game"? People label images while playing a game; they went through the IRB at CMU.
Audience 2: it would be good to reach some conclusions, to get some sense of what makes them different.
Stephen: the IPM journal will have a special issue on MIREX, and I'd like to organize it by contest. There have been a lot of discussions on the mailing lists for audio similarity and symbolic melody similarity.
