2007:Discussion Points Euro MIREX Meeting

Welcome

This page is open to all folks interested in helping to set the agenda for the Euro MIREX Planning Meeting (22-23 April, 2007). This includes meeting participants and anyone who would like to see a topic addressed by the participants. Any and all topics, suggestions, gripes, etc. related to MIREX are not only welcomed, they are encouraged.

AGENDA TOPICS, IDEAS, SUGGESTIONS, GRIPES, ETC.

Topic 1: First Task Proposal For MIREXDIYDEMO Prototype: Audio Cover Song?

I would appreciate comments and feedback on the idea that we re-run the Audio Cover Song task with the option (encouragement?) that participants use the proposed MIREXDIYDEMO framework for submission and evaluation. I believe that Audio Cover Song is a good choice because the ground-truth is fixed (i.e., either the returned items are cover songs or not) and thus requires no human intervention. I am open to other suggestions along the same lines. It does look now (as of 11 April 2007) that we will only be able to support MATLAB and Java submissions, but this might be subject to change. Fool around with the preliminary prototype at http://cluster3.lis.uiuc.edu:8080/mirexdiydemo/. By the time we reach Vienna, we hope to have a Version II up, but this, unfortunately (for security reasons), will probably not be open to the general community.

JDownie 10:50, 12 April 2007 (CDT)

Topic 2: First Proposal on Audio Music Mood Classification

Main points to discuss:

  • Taxonomy
    • Derived from music perception theories: dimensional (like Thayer's) or categorical (like Juslin's or Hevner's)
    • Taken from a practical data collection (maybe with clustering to reduce the number of categories): AllMusicGuide, MoodLogic, LastFM
    • Both?
  • Ground Truth
    • Human evaluators using an online application comparable to IMIRSEL's Evalutron 6000
    • Collect labels from popular music websites (LastFM?)
    • Both?
  • Data
    • Why not use stereo files? (general question)


-- Claurier 06:53, 19 April 2007 (CDT)

Topic 3: Resurrecting drum detection task from 2005

The task could be run quite easily based on the decisions and specifications from 2005:

  • Polyphonic music data from three collections. (New data from ENST-drums?)
  • Transcribe kick, snare, and hi-hat.
  • Evaluation routines ready (an illustrative matching sketch follows this list).
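
A minimal sketch of the kind of matching-based scoring this implies, assuming an onset-matching scheme with a tolerance window; the ±50 ms tolerance, the greedy matching, and all names are illustrative assumptions, not the actual 2005 routines:

```python
# Hypothetical per-instrument drum transcription scoring (not the 2005 code).
# Onset times are in seconds; TOLERANCE is an assumed matching window.

TOLERANCE = 0.050  # 50 ms; the value actually used in 2005 may differ

def count_matches(reference, detected, tolerance=TOLERANCE):
    """Greedily match each reference onset to at most one unused detected onset."""
    detected = sorted(detected)
    used = [False] * len(detected)
    matched = 0
    for ref in sorted(reference):
        for i, det in enumerate(detected):
            if not used[i] and abs(det - ref) <= tolerance:
                used[i] = True
                matched += 1
                break
    return matched

def precision_recall_f(reference, detected):
    matched = count_matches(reference, detected)
    precision = matched / len(detected) if detected else 0.0
    recall = matched / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

# One score per instrument class (kick, snare, hi-hat), e.g. averaged over songs.
ground_truth = {"kick": [0.50, 1.00, 1.52], "snare": [0.75, 1.25],
                "hihat": [0.50, 0.75, 1.00, 1.25]}
transcription = {"kick": [0.51, 1.04], "snare": [0.74, 1.26, 1.40],
                 "hihat": [0.50, 1.00, 1.26]}
for instrument in ("kick", "snare", "hihat"):
    p, r, f = precision_recall_f(ground_truth[instrument], transcription[instrument])
    print(f"{instrument}: P={p:.2f} R={r:.2f} F={f:.2f}")
```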

But would there be enough participants?

  • Yoshii, Tanghe, Gillet, Dittmar, Paulus, somebody new?

Topic 4: New task of structural analysis

To discuss:

  • Problem definition: just chorus detection or full description?
    • Chorus detection has been studied more, and data would be available from RWC. Chorus detection is in a way an easier problem to specify and to evaluate.
    • Full description is harder. What is a correct description? (Levels of hierarchy, locations of borders between parts, etc.)
  • Data (big problem)
    • RWC: AIST annotations of chorus sections, available for all
    • TUT-structure data: commercial audio (also part from RWC Pop), annotations of clear repeated parts + some solos etc. Annotations do not cover whole songs. Amount of data limited, just 50 pieces. Ground truth not available publicly, but for the evaluation.
    • Beatles data: Some annotations are known to have been made based on Alan Pollack's "Notes on ..." series. Availability of the ground truth?
  • Evaluation?
    • Accept only a description identical to the ground truth, or allow partial credit? (A minimal partial-credit sketch follows this list.)
    • Handle different levels of hierarchy?
    • Borders between parts/segments: allow shifts? Defining the locations is difficult even for humans.
  • Participants?
  • Start planning and collecting data now, but run in 2008?
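
To make the partial-credit question concrete, here is a minimal sketch of one possible measure, a label-agnostic pairwise frame agreement between two structural descriptions; the 0.5 s frame length, the segment format, and all names are illustrative assumptions, not an agreed MIREX evaluation. Boundary shifts could be handled separately, e.g. by scoring detected borders against annotated ones within a fixed tolerance window.

```python
# Hypothetical partial-credit scoring for structural descriptions.
# A description is a list of (start_sec, end_sec, label) segments; labels in
# the two descriptions do not need to match, only the grouping they induce.

FRAME = 0.5  # seconds per frame; an assumed analysis resolution

def to_frame_labels(segments, duration, frame=FRAME):
    """Label each fixed-length frame with the segment that covers it."""
    n = int(duration / frame)
    labels = [None] * n
    for start, end, label in segments:
        for i in range(int(start / frame), min(n, int(end / frame))):
            labels[i] = label
    return labels

def pairwise_f(reference, estimate, duration):
    """F-measure over frame pairs that are grouped together in each description."""
    ref = to_frame_labels(reference, duration)
    est = to_frame_labels(estimate, duration)
    n = len(ref)
    same_ref = same_est = same_both = 0
    for i in range(n):
        for j in range(i + 1, n):
            in_ref = ref[i] == ref[j]
            in_est = est[i] == est[j]
            same_ref += in_ref
            same_est += in_est
            same_both += in_ref and in_est
    precision = same_both / same_est if same_est else 0.0
    recall = same_both / same_ref if same_ref else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Annotated structure of one song vs. an algorithm's output (different label sets).
annotation = [(0, 15, "intro"), (15, 45, "verse"), (45, 75, "chorus"),
              (75, 105, "verse"), (105, 135, "chorus")]
estimate = [(0, 17, "A"), (17, 46, "B"), (46, 105, "C"), (105, 135, "B")]
print(f"pairwise F = {pairwise_f(annotation, estimate, duration=135):.2f}")
```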

-- Paulus 09:02, 20 April 2007 (CDT)

Topic 5: (Audio) Rhythmic similarity

Construct a task to evaluate comparison of audio music tracks by rhythmic characteristics alone.

  • Perhaps model on audio similarity
  • Human evaluation?
  • Automated statistical evaluation?
    • Maybe easier to establish a ground-truth than for wider music similarity (e.g. use ballroom dance music, modern dance music genres)
  • Could be combined/compared with (timbral) music similarity algorithms.

Topic 6: Instrument recognition

  • monophonic
  • polyphonic

Topic 7: Cover song

Is anyone working on audio melodic and harmonic similarity?

  • What about remix finding as well?

Topic 8: Symbolic tasks

Topic 9: Bandwidth within the MIR community for human evaluations

Topic 10: Software/Infrastructure components participants need for their submissions

  • Java, Python, MATLAB, MEX, Linux shells, Perl, MySQL DBs, C libraries

Topic 11: MIREXDIY

What is to prevent me from submitting a module which simply writes the sample data to stdout?

  • Hard limit on quantity (in KB) of console output? (See the sketch below.)
    • Once the limit is passed, only a sample of console output is returned?
    • Until acknowledged as safe by someone at IMIRSEL?
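
A minimal sketch of what such a cap could look like on the runner side; the 64 KB limit, the sample size, and all names are illustrative assumptions rather than an agreed IMIRSEL policy:

```python
# Hypothetical console-output cap for a submitted module (illustrative only).
import subprocess

MAX_CONSOLE_BYTES = 64 * 1024  # assumed hard limit on console output
SAMPLE_BYTES = 2 * 1024        # assumed size of the sample returned past the limit

def run_capped(command):
    """Run a submission, keeping at most MAX_CONSOLE_BYTES of its console output."""
    proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    captured = bytearray()
    total = 0
    while True:
        chunk = proc.stdout.read(4096)
        if not chunk:
            break
        total += len(chunk)
        # Keep reading even after the cap so the child never blocks on a full pipe.
        if len(captured) < MAX_CONSOLE_BYTES:
            captured.extend(chunk)
    proc.wait()
    if total > MAX_CONSOLE_BYTES:
        # Past the limit, return only a short head sample until someone at
        # IMIRSEL has acknowledged the submission as safe.
        return proc.returncode, bytes(captured[:SAMPLE_BYTES]) + b"\n[console output truncated]\n"
    return proc.returncode, bytes(captured)

# Usage sketch (hypothetical submission entry point):
# code, log = run_capped(["java", "-jar", "submission.jar", "fileList.txt"])
```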

Topic 12: Audio sim

... ...

  • Clips presented to evaluators to be the same as those presented to algorithms, to prevent inconsistency


Topic 13: Brainstorming full task list and potential contact points/moderators

(need at least 3 participants)

  • Instrument recognition (monophonic and polyphonic)
    • Josh Reiss & Geoffrey Peters (others on wiki)


  • Polyphonic transcription / Multi-F0 estimation
    • Josh Reiss & Geoffrey Peters (others on wiki: Mert Bay, Chunghsin Yeh), Matti Ryynänen


  • Mood classification
    • Downie, Xiao Hu +
    • Kris West (kw at cmp dot uea dot ac dot uk)
    • Cyril Laurier (claurier at iua dot upf dot edu)
    • Elias Pampalk (firstname.lastname@gmail.com)
    • Human eval
    • ground-truth proposals on the way


  • Rhythmic music audio similarity
    • Kris West, (Hamish?), (Elias?), (Fabien G.?), Ben Fields, Thomas Lidy
    • Query by tapping (etc.)
      • Rainer - Cory McKay, Paulus
      • Note Roger Jang's sung query DB


  • Symbolic Rhythmic similarity
    • Hamish


  • Artist ID
    • scrape/request ground-truth from Last.fm/AMG etc.
  1. Thomas Lidy (lastname@ifs.tuwien.ac.at)
  2. Francois Pachet and Pierre Roy (lastname@csl.sony.fr)
  3. Elias Pampalk (firstname.lastname@gmail.com)
  4. Tim Pohle (firstname.lastname@jku.at)
  5. Kris West (kw at cmp dot uea dot ac dot uk)


  • Symbolic melodic similarity
    • Rainer, Klaus Frieler, Niko Mikkila, Dan Mullensiefen(?)
    • Same DB?


  • Web-based artist similarity
    • Crawl the web and rank specified artists according to their similarity
    • List allowable data sources?
    • Dan Ellis' group has some ground-truth
    • Implement a game to evaluate?
    • Are there legal/IRB issues?


  • Audio Genre classification
  1. Thomas Lidy (lastname@ifs.tuwien.ac.at)
  2. Francois Pachet and Pierre Roy (lastname@csl.sony.fr)
  3. Elias Pampalk (firstname.lastname@gmail.com)
  4. Tim Pohle (firstname.lastname@jku.at)
  5. Kris West (kw at cmp dot uea dot ac dot uk)
  6. Enric Guaus (firstname.lastname@iua.upf.edu)
  7. Abhinav Singh (abhinavs at iitg.ernet.in) and S.R.M. Prasanna (prasanna at iitg.ernet.in)
  8. Ben Fields (map01bf at gold dot ac dot uk)
  9. Tom Diethe (initial.surname@cs.ucl.ac.uk)
    • Artist/Album-filtering
    • Hierarchical discounting of confusions


  • Audio Chord detection
    • Katja Rosenbauer (Fraunhofer IDMT, Ilmenau, Germany), Christian Dittmar (Fraunhofer IDMT, Ilmenau, Germany), (Ben Fields?) (Chris Harte's method), Matti Ryynänen (TUT)


  • Audio/Symbolic Key detection
    • Geoffrey Peters
    • Key change over time?
    • Synthesised from MIDI


  • Audio sim
    • write-up (many) discussions from meeting/lists
  1. Klaas Bosteels (firstname.lastname@gmail.com)
  2. Thomas Lidy (lastname@ifs.tuwien.ac.at)
  3. Elias Pampalk (firstname.lastname@gmail.com)
  4. Tim Pohle (firstname.lastname@jku.at)
  5. Kris West (kw at cmp dot uea dot ac dot uk)
  6. Julien Ricard (firstname.lastname@gmail.com)
  7. Abhinav Singh (abhinavs at iitg.ernet.in) and S.R.M. Prasanna (prasanna at iitg.ernet.in)
  8. Ben Fields (map01bf at gold dot ac dot uk)
  9. Christoph Bastuck (bsk at idmt.fhg.de)
  10. Aliaksandr Paradzinets (aliaksandr.paradzinets {at} ec-lyon.fr)
  11. (very large number of non-submitting, interested parties)


  • Audio beat detection/tracking
    • Martin McKinney, Geoffrey Peters (big problems last time), (MTG?), Matthew Davies, Simon Dixon
    • MTG data already open
    • Fuzzy scoring to make fair?


  • Segmenting (Audio) - structure analysis/phrase boundary detection
    • Josh Reiss, Ben Fields, Jouni Paulus
    • Produce human annotations and get probability distribution across many evaluators


  • Onset Detection
    • Dan Stowell (Queen Mary)
    • Alexandre Lacoste (Montréal)
    • Axel Roebel (Paris)


  • Real-time Audio to Score Alignment (a.k.a. Score Following)


  • Query by Singing/Humming
    • Rainer, Roger Jang, Niko Mikkila

Topic 14: Relationships between features

There is an interest in storing and re-using features calculated from audio, which leads to the question of how to manage the relationships between low-level and higher-level features.

  • Do we want to use an ontology to describe features so that they can be equated?
  • How should data be passed around?
  • Personally, from an M2K point of view, I would like to see intelligent caching. My concept of a feature is related to what is upstream of it. So if I ask for the GMMs of MFCCs of Magical Mystery Tour (with a particular parameter set), they might already have been calculated and cached; if not, the system knows to look for the MFCCs of MMT, if they haven't been calculated the system needs to get MMT, etc. This has the advantage that a policy might be set for certain features that they can be made public -- e.g. MFCCs and phaseless STFT might be okay, full STFT might not be, but onsets determined from full STFT might be. Hamish 04:59, 23 April 2007 (CDT)
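
A minimal sketch of the kind of upstream-aware caching Hamish describes above; the feature names, the parameter handling, and the cache layout are illustrative assumptions, not an M2K design:

```python
# Hypothetical feature cache where a feature's identity includes what is
# upstream of it, so "GMMs of MFCCs of Magical Mystery Tour" is recomputed
# only if no identical chain has been cached before.
import hashlib
import json

class FeatureCache:
    def __init__(self):
        self._store = {}      # cache key -> computed value
        self._public = set()  # feature names whose cached values may be shared

    def make_key(self, name, params, upstream_key):
        payload = json.dumps([name, params, upstream_key], sort_keys=True)
        return hashlib.sha1(payload.encode("utf-8")).hexdigest()

    def get(self, name, params, upstream_key, compute):
        """Return the cached value for this exact chain, computing it if needed."""
        key = self.make_key(name, params, upstream_key)
        if key not in self._store:
            self._store[key] = compute()
        return self._store[key], key

    def mark_public(self, name):
        """Policy hook: e.g. MFCCs might be shareable while the full STFT is not."""
        self._public.add(name)

# Usage sketch: each step runs only if its exact upstream chain is not cached yet.
cache = FeatureCache()
audio, audio_key = cache.get("audio", {"track": "Magical Mystery Tour"}, None,
                             lambda: "raw samples")    # fetch MMT
mfccs, mfcc_key = cache.get("mfcc", {"n_coeffs": 20}, audio_key,
                            lambda: "mfcc frames")     # MFCCs of MMT
gmms, gmm_key = cache.get("gmm", {"components": 30}, mfcc_key,
                          lambda: "gmm model")         # GMMs of those MFCCs
cache.mark_public("mfcc")  # per-feature publication policy
```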