2006 Plenary Notes
Oct. 12th @ Empress Crystal Hall, Victoria
Opening
Professor Stephen Downie gave the opening remarks:
- We will present certificates for participants. Feel free to grab yours if you are leaving.
- Appreciation to IMIRSEL team members.
Overview
- This year's MIREX has been highly successful. We got everything done on time!
- Matlab is widely used (universal retrieval language!)
- All the evaluation result data files are available on the wiki.
Tasks
- We added sub-tasks as the tasks matured.
- New tasks:
- Audio cover song: 13 different songs, each of which has 11 different versions
- Score following: groundwork done for future years
- QBSH: 48 ground-truth melodies, with different versions of queries on the 48 melodies. About 2000 noise songs were selected from the Essen dataset. Both audio input and MIDI input are supported.
- Please think about new tasks next year.
- New evaluations:
- Evalutron 6000 collected real-world human judgments.
- Audio onset detection supported multiple parameter settings per submission.
- Friedman test: drawing on valuable experience from TREC, the annual evaluation campaign in the text retrieval area.
Onset Detection
By tuning the parameters, we can find an optimal setting that trades off precision against recall. We need a new dataset to see whether the tuned parameters hold up on unseen data.
Question: how do the results compare to last year's?
Answer: this year is better, because multiple parameter tunings were allowed.
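For concreteness, here is a minimal sketch of the kind of parameter sweep described above, assuming a detector callable that maps a threshold to a list of onset times in seconds; the tolerance window, threshold values, and F-measure scoring are illustrative, not the actual MIREX onset-detection evaluation code.

```python
# Hypothetical sketch: sweep a detection-threshold parameter and keep the
# setting with the best F-measure (the precision/recall trade-off above).
# The matching tolerance and scoring are illustrative only.

def evaluate(reference, detected, tolerance=0.05):
    """Match detected onsets to reference onsets within +/- tolerance seconds."""
    matched, used = 0, set()
    for r in reference:
        for i, d in enumerate(detected):
            if i not in used and abs(d - r) <= tolerance:
                matched += 1
                used.add(i)
                break
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f

def best_setting(reference, detector, thresholds):
    """Return (f_measure, threshold) for the best threshold in the sweep."""
    return max((evaluate(reference, detector(t))[2], t) for t in thresholds)
```

The risk noted above is exactly why a new dataset is wanted: a threshold tuned this way may not transfer to unseen data.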
Evalutron 6000
Two judgments:
- category judgment: Not similar; Similar; Very similar
- continuous score: from 0 to 10, allowing one digit after the decimal point.
- the system: built on open-source CMS software
- still have data that we haven't fully processed (other user/evaluator behaviors)
- new evaluations on other facets? e.g. mood
- suggestions?
- we appreciate the evaluators' volunteer work. Your work makes life beautiful!
Question: consistency across users?
Answer: the data appear to be quite consistent. More analysis can be done on the data, which are publicly accessible.
- automatic evaluation using available metadata (vs human judgment)
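As one illustration of the kind of follow-up analysis mentioned above, here is a minimal sketch that estimates evaluator consistency from released judgments; the input format and the choice of Spearman correlation are assumptions for illustration, not the analysis actually performed for MIREX.

```python
# Hypothetical sketch: estimate agreement between Evalutron 6000 evaluators by
# correlating their continuous (0-10) scores on query/candidate pairs that two
# evaluators both judged. Field names and input format are assumed.
from collections import defaultdict
from itertools import combinations
from scipy.stats import spearmanr

def pairwise_agreement(judgments):
    """judgments: iterable of (evaluator_id, pair_id, score) tuples."""
    scores = defaultdict(dict)              # evaluator -> {pair_id: score}
    for evaluator, pair, score in judgments:
        scores[evaluator][pair] = score

    correlations = []
    for a, b in combinations(scores, 2):
        shared = sorted(set(scores[a]) & set(scores[b]))
        if len(shared) >= 3:                # need a few shared pairs to correlate
            rho, _ = spearmanr([scores[a][p] for p in shared],
                               [scores[b][p] for p in shared])
            correlations.append(rho)
    return sum(correlations) / len(correlations) if correlations else None
```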
Friedman tests
- a non-parametric rank-based test; its statistic is compared against a chi-square distribution
- Matlab script code is on the wiki
- used to compare different algorithms
- this test is conservative
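Since the notes only point to the Matlab script on the wiki, here is a minimal sketch of the same idea in Python; the scores below are invented, with one row per query and one column per algorithm.

```python
# Hypothetical sketch: Friedman test over per-query scores of three algorithms.
# Rows are queries, columns are algorithms; the numbers are invented.
from scipy.stats import friedmanchisquare

scores = [
    # algo_A, algo_B, algo_C
    [0.61, 0.55, 0.70],
    [0.42, 0.40, 0.51],
    [0.75, 0.69, 0.73],
    [0.58, 0.50, 0.66],
    [0.33, 0.35, 0.44],
]

# friedmanchisquare expects one sequence per algorithm ("treatment").
columns = list(zip(*scores))
statistic, p_value = friedmanchisquare(*columns)
print(f"Friedman chi-square = {statistic:.3f}, p = {p_value:.3f}")
```

Because the test ranks algorithms within each query rather than comparing raw scores, it makes few distributional assumptions, which is also why it tends to be conservative.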
Future MIREX plans
Please see the PowerPoint slides.
Acknowledgement
Mellon Foundation
Discussion
- Encourage everyone to participate.
- Need data!
- Metadata: handy ground truth
- reuse data: for at least two or three years
- submissions: robustness, platform, scalability, parallelization
Kris: call for organizers!
Alexandra Uitdenbogerd: "similarity" judgment is difficult. It might be easier to make judgments on genre, for example.
audience1: How long was needed to evaluate one pair?
Stephen: we have the data, but have not dug into it.
Bergstra: can you run the contests year-round?
Stephen: some of them, yes.
audience1: please be aware of related work on labelling images, the "ESP game": people label images while playing a game. They went through the IRB at CMU.
audience2: can we reach some conclusions, to get a sense of what makes the systems different?
Stephen: the IPM journal will have a special issue on MIREX; I'd like to organize it by contest. There have been a lot of discussions on the mailing lists about audio similarity and symbolic melody similarity.
audience3: could the data be made available to participants after evaluation? It would be a big reward for participants and an incentive for participation.
Stephen: audio is hard to move.
Mert: we can distribute features.
audience3: we would be willing to pay $0.50 per song.
Stephen: I like this motivation model too, but the copyright is really tricky. We will work towards that. This also raises funding issues.
Kris: keeping the data "unknown" is a bonus that helps avoid overfitting.
audience4: let old algorithms run in new years, so as to see how their results vary.
Stephen: I/O formats change across years. We will try to make I/O stable.
Alexandra Uitdenbogerd: some participants may not want their algorithms run against new datasets, but stable I/O is really nice. Better to make source code accessible for individuals who want to share their code.
Onset detection
audience4: can we have individual results for each entrant? Because metrics and statistical tests can change, only raw results last.
Andy: the raw results are available, but the ground truth is Martin's data.
Audio similarity
A link to Elias' paper on this task.
Paul: organizers should attend the spring meeting and finalize the evaluation; better not to change the evaluation at the last minute. New modifications can take effect next year.
Elias: this is very good, consistency is high.
Stephen: a precise definition of the task would help -- what are we going to compare! A bit worried about variance. I hope we are not getting malicious people.
Elias: "audio similarity" means too many things; can anyone give a better name?
Andy: we improved compared to last year, this is exciting.
QBSH
Roger: it is easy to get data; all you need to do is sing into a microphone. I hope every participant contributes some data (both ground truth and queries).
Rainer: this year we have both audio and MIDI, but the MIDI was generated by pv5, with no segmentation, so it might hurt the results using MIDI input.
Symbolic Melody Similarity
Alexandra: the query set is quite small.
Stephen: we haven't done the Friedman test for this contest yet.
Rainer: more data means more evaluation burden; it really depends on how much we'd like to do. There is a link on the wiki to my processing results.
Score Following
Organizer (Diemo Schwarz): I am glad we have a framework now. Next year we will have more participants. For audio-to-symbolic alignment, we have high precision after quite a lot of hand work.
Offline analysis can be another topic.
Next year: augment the database and change the measures.
Audio Cover Song
Stephen: I will lead this contest next year; we will get more songs and build a larger database.
- Folks, please post your posters (PDF) on the wiki.
New tasks
1. Andy: pitch detection
2. Stephen: similarity and metadata such as mood, usage, etc.
3. Eric Nicoles: encourage you to keep up the symbolic contests.
4. collaborative filtering: the textual data can be shared by participants and would encourage participation. Norman at last.fm has a lot of data.
Audience1: We might have a problem making our data public.
Kris: connect collaborative filtering data to audio.
Stephen: start thinking about this NOW! Thanks everyone!!!
Digest the MIREX 2006 results; think about MIREX 2007!