2005:Symbolic Genre Class

From MIREX Wiki
Revision as of 00:25, 28 February 2005 by 70.48.173.175 (talk | contribs) (Relevant Test Collections)

Proposer

Cory McKay (McGill University) cory.mckay@mail.mcgill.ca

Title

AMENDED Genre Classification of MIDI Files Proposal

Description

Submitted software will automatically classify MIDI recordings into genre categories.

1) Genre Categories

Two sets of genre categories will be used, one consisting of a few categories and one consisting of more categories. Systems will be trained and tested separately on these two taxonomies. This will allow measurements of how well the systems perform coarse classifications as well as fine classifications.

Although the categories will be organized hierarchically, submitted software will only need to produce classifications of leaf categories. This means that entrants who have not implemented hierarchical classification can simply treat the problem as a flat classification among the leaf categories, and can effectively ignore the hierarchy. The use of a hierarchical structure is suggested because this reflects the natural way in which humans appear to organize genre, and it allows one to take advantage of hierarchical classification techniques if desired. The approach proposed here has the advantage of allowing entrants to treat the problem as either a flat or hierarchical classification problem, whatever their preference.

Based on responses to the original proposal, each recording will belong to one and only one category. Although the original proposal of allowing multiple memberships is more realistic, it is inconsistent with how most systems have been implemented. Furthermore, allowing multiple classifications would have greatly complicated the task of the evaluation committee, so the choice of requiring single category classifications is probably better for all involved.

I have two taxonomies that I have used in past research, one consisting of 9 unique leaf categories, and the other of 38 unique leaf categories. I also have 25 hand-labelled MIDI recordings for each category (a total of 950 annotated recordings) that could be used for training and/or validation. Although alternative suggestions are certainly welcome, I propose these taxonomies and recordings simply because I have them and am more than willing to share them.

The taxonomies I propose are as follows. Leaf categories are marked with a plus sign to their right; they are the only classification outputs required of systems, so that flat classification remains possible if desired:


9 Leaf Category Taxonomy:

  • Jazz
    • Bebop +
    • Jazz Soul +
    • Swing +
  • Popular
    • Rap +
    • Punk +
    • Country +
  • Western Classical
    • Baroque +
    • Modern Classical +
    • Romantic +


38 Leaf Category Taxonomy:

  • Country
    • Bluegrass +
    • Contemporary +
    • Trad. Country +
  • Jazz
    • Bop
      • Bebop +
      • Cool +
    • Fusion
      • Bossa Nova +
      • Jazz Soul +
      • Smooth Jazz +
    • Ragtime +
    • Swing +
  • Modern Pop
    • Adult Contemporary +
    • Dance
      • Dance Pop +
      • Pop Rap +
      • Techno +
    • Smooth Jazz +
  • Rap
    • Hardcore Rap +
    • Pop Rap +
  • Rhythm and Blues
    • Blues
      • Blues Rock +
      • Chicago Blues +
      • Country Blues +
      • Soul Blues +
    • Funk +
    • Jazz Soul +
    • Rock and Roll +
    • Soul +
  • Rock
    • Classic Rock
      • Blues Rock +
      • Hard Rock +
      • Psychedelic +
    • Modern Rock
      • Alternative Rock +
      • Hard Rock +
      • Metal +
      • Punk +
  • Western Classical
    • Baroque +
    • Classical +
    • Early Music
      • Medieval +
      • Renaissance +
    • Modern Classical +
    • Romantic +
  • Western Folk
    • Bluegrass +
    • Celtic +
    • Country Blues +
    • Flamenco +
  • Worldbeat
    • Latin
      • Bossa Nova +
      • Salsa +
      • Tango +
    • Reggae +


Please note that, in the 38-category taxonomy, some leaf categories belong to more than one parent. This is done in order to model the interrelationships between categories more realistically. Those performing flat classification can ignore this fact and simply use the 38 unique leaf categories.
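As an illustration of the point above, the following Python sketch flattens an excerpt of the 38-leaf taxonomy (the Jazz, Modern Pop and Rap branches only) into unique leaf labels and records which leaves appear under more than one parent. The nested-dictionary representation and the names `TAXONOMY_EXCERPT` and `leaf_parents` are assumptions made for this example, not part of the proposal.

```python
from collections import defaultdict

# Excerpt of the 38-leaf taxonomy listed above (assumed representation):
# an empty dict marks a leaf category.
TAXONOMY_EXCERPT = {
    "Jazz": {
        "Bop": {"Bebop": {}, "Cool": {}},
        "Fusion": {"Bossa Nova": {}, "Jazz Soul": {}, "Smooth Jazz": {}},
        "Ragtime": {},
        "Swing": {},
    },
    "Modern Pop": {
        "Adult Contemporary": {},
        "Dance": {"Dance Pop": {}, "Pop Rap": {}, "Techno": {}},
        "Smooth Jazz": {},
    },
    "Rap": {"Hardcore Rap": {}, "Pop Rap": {}},
}


def leaf_parents(tree, path=()):
    """Map each leaf name to the list of ancestor paths under which it appears."""
    found = defaultdict(list)
    for name, children in tree.items():
        if children:
            # Internal node: recurse and merge the results of the subtree.
            for leaf, paths in leaf_parents(children, path + (name,)).items():
                found[leaf].extend(paths)
        else:
            # Leaf node: remember the path of ancestors leading to it.
            found[name].append(path)
    return found


parents = leaf_parents(TAXONOMY_EXCERPT)

# Flat classification only needs the unique leaf names.
flat_labels = sorted(parents)

# Hierarchical classifiers may also want to know which leaves are shared.
shared = {leaf: paths for leaf, paths in parents.items() if len(paths) > 1}

print(len(flat_labels), "unique leaves in this excerpt")
for leaf, paths in shared.items():
    print(leaf, "appears under", [" > ".join(p) for p in paths])
```

A flat classifier would use only `flat_labels`, while a hierarchical one could use the ancestor paths in `parents`.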

Of course, as stated earlier, alternative suggestions are certainly welcome, and it may be desirable to use subsets of these taxonomies, as some participants have expressed an interest in using a smaller number of categories.

I have the taxonomies and file information stored in XML files, using the following two DTDs (including a short example for each):

<!DOCTYPE taxonomy_file [

  <!ELEMENT taxonomy_file (taxonomy_comments, parent_category+)>
  <!ELEMENT taxonomy_comments (#PCDATA)>
  <!ELEMENT parent_category (category_name, sub_category+)>
  <!ELEMENT category_name (#PCDATA)>
  <!ELEMENT sub_category (category_name, sub_category*)>

]>

  <parent_category>
        <category_name>Modern Pop</category_name>
        <sub_category>
           <category_name>Adult Contemporary</category_name>
        </sub_category>
        <sub_category>
           <category_name>Dance</category_name>
           <sub_category>
              <category_name>Dance Pop</category_name>
           </sub_category>
           <sub_category>
              <category_name>Pop Rap</category_name>
           </sub_category>
           <sub_category>
              <category_name>Techno</category_name>
           </sub_category>
        </sub_category>
        <sub_category>
           <category_name>Smooth Jazz</category_name>
        </sub_category>
  </parent_category>
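A file following the taxonomy DTD above could be read roughly as follows. This is only a sketch using Python's standard `xml.etree.ElementTree` module; the wrapping `taxonomy_file` element and the comment text are filled in from the DTD for the sake of a complete example, and the helper name `leaf_names` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The Modern Pop example from above, wrapped in the root element that the
# DTD requires (the comment text is a placeholder for this sketch).
TAXONOMY_XML = """
<taxonomy_file>
  <taxonomy_comments>Excerpt for illustration</taxonomy_comments>
  <parent_category>
    <category_name>Modern Pop</category_name>
    <sub_category><category_name>Adult Contemporary</category_name></sub_category>
    <sub_category>
      <category_name>Dance</category_name>
      <sub_category><category_name>Dance Pop</category_name></sub_category>
      <sub_category><category_name>Pop Rap</category_name></sub_category>
      <sub_category><category_name>Techno</category_name></sub_category>
    </sub_category>
    <sub_category><category_name>Smooth Jazz</category_name></sub_category>
  </parent_category>
</taxonomy_file>
"""


def leaf_names(node):
    """Return the names of all leaf categories below a category element."""
    children = node.findall("sub_category")
    if not children:
        # No nested sub_category elements: this category is a leaf.
        return [node.findtext("category_name").strip()]
    leaves = []
    for child in children:
        leaves.extend(leaf_names(child))
    return leaves


root = ET.fromstring(TAXONOMY_XML.strip())

# Collect unique leaves in document order, since a leaf may be listed
# under more than one parent in the full taxonomy.
leaves = []
for parent in root.findall("parent_category"):
    for leaf in leaf_names(parent):
        if leaf not in leaves:
            leaves.append(leaf)

print(leaves)
```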

<!DOCTYPE recordings_file [

  <!ELEMENT recordings_file (comments, file_path, recording+)>
  <!ELEMENT comments (#PCDATA)>
  <!ELEMENT file_path (#PCDATA)>
  <!ELEMENT recording (filename, role, model_genres, title, artist)>
  <!ELEMENT filename (#PCDATA)>
  <!ELEMENT role (#PCDATA)>
  <!ELEMENT model_genres (genre*)>
  <!ELEMENT genre (#PCDATA)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>

]>

  <recording>
     <filename>JAMES_BROWN_I_GOT_THE_FEELIN_.mid</filename>
     <role>classification</role>
     <model_genres>
        <genre>Funk</genre>
     </model_genres>
     <title>I Got the Feelin'</title>
     <artist>James Brown</artist>
  </recording>
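The recordings format can be read in much the same way. The sketch below assumes that `file_path` holds the directory containing the MIDI files, which the DTD itself does not state; the directory `/hypothetical/midi`, the comment text and the helper name `load_recordings` are likewise made up for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

# The James Brown example from above, wrapped in the root element that the
# DTD requires. The file_path value is a hypothetical directory.
RECORDINGS_XML = """
<recordings_file>
  <comments>Excerpt for illustration</comments>
  <file_path>/hypothetical/midi</file_path>
  <recording>
    <filename>JAMES_BROWN_I_GOT_THE_FEELIN_.mid</filename>
    <role>classification</role>
    <model_genres><genre>Funk</genre></model_genres>
    <title>I Got the Feelin'</title>
    <artist>James Brown</artist>
  </recording>
</recordings_file>
"""


def load_recordings(xml_text):
    """Return a list of (midi_path, model_genres) pairs from a recordings file."""
    root = ET.fromstring(xml_text.strip())
    # Assumption: file_path is the directory that filenames are relative to.
    base = root.findtext("file_path").strip()
    recordings = []
    for rec in root.findall("recording"):
        path = posixpath.join(base, rec.findtext("filename").strip())
        # model_genres may be empty (genre*), e.g. for unlabelled test files.
        genres = [g.text.strip() for g in rec.findall("model_genres/genre")]
        recordings.append((path, genres))
    return recordings


recordings = load_recordings(RECORDINGS_XML)
print(recordings)
```

Since `model_genres` is declared as `(genre*)`, a test file without model classifications simply yields an empty genre list.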

Of course, alternative formats could certainly be used if preferred by participants. Rudi's format suggestions, for example, sound great to me. I simply propose the above XML format because that is how my data is currently annotated, but this can be changed. The general feeling of the interested participants, based on e-mail correspondence, is that the simpler format proposed by Rudi would be a better choice for the purposes of MIREX, so it may be worthwhile pursuing that option.


2) Training and Testing Recordings

As mentioned above, I will be glad to make my collection of 950 MIDI files available, including model classifications. There are 25 recordings belonging to each of the 38 leaf categories provided above. I should add as a warning, however, that although my acquisition of the MIDI files through downloading was legal here in Canada, the copyright situation may be more complicated in some countries. As has been suggested by one of the reviewers, one possibility is to use the D2K interface to make them available. Alternative arrangements could be made for those who have difficulty accessing the D2K system. Pedro José Ponce de León Amador also has a library of MIDI files that he would like to make available.

The evaluation committee may wish to test the entrants using cross-validation with the MIDI files I am offering. Cross-validation has the advantage that no additional validation set needs to be collected and annotated by the committee. The committee would then need to train each system, however, which may pose problems for some systems, and some participants have expressed a preference for training their own systems.

So, perhaps it would be preferable for the evaluation committee to distribute training recordings to all participants so that they may each train their own systems with them. The evaluation committee could then use their own small validation set to test performance. It is suggested here that if training is performed ahead of time by individual participants, then the validation recordings should be kept confidential until after evaluation in order to ensure that no classifiers were trained with them. This would, however, require the gathering and annotation of, perhaps, 10 validation files per category. The committee may therefore wish to reduce the number of categories in the 38-category taxonomy.


3) Input Data

Training will be performed by providing the software (through a command-line argument) with two files, one specifying the possible taxonomies and one specifying the training MIDI file paths and their associated model classifications. Testing will be performed the same way, except that no model classifications will be provided for the MIDI files.


4) Output Data

The software will produce a text file listing test recording file paths and the genre that each has been classified as.
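The exact layout of this text file is not specified here. As one possible reading, the sketch below writes one line per test recording, with the file path and the predicted genre separated by a tab; the tab separator, the example paths and the function name `write_predictions` are all assumptions for illustration.

```python
import io


def write_predictions(predictions, out):
    """Write one 'path<TAB>genre' line per test recording (assumed layout)."""
    for path, genre in predictions:
        out.write(f"{path}\t{genre}\n")


# Hypothetical classifier results for two test recordings.
results = [
    ("/hypothetical/midi/a.mid", "Funk"),
    ("/hypothetical/midi/b.mid", "Swing"),
]

# An in-memory buffer stands in for the output text file in this sketch.
buffer = io.StringIO()
write_predictions(results, buffer)
print(buffer.getvalue(), end="")
```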

Participants

  • Confirmed: George Tzanetakis (University of Victoria), gtzan@cs.uvic.ca
  • Confirmed: Cory McKay & Ichiro Fujinaga (McGill University), cory.mckay@mail.mcgill.ca
  • Confirmed: Pedro J. Ponce de Leon & Jose M. Inesta (Universidad de Alicante), pierre@dlsi.ua.es
  • Expressed interest: Rudi Cilibrasi (CWI) cilibrar@cwi.nl
  • Contacted but no response yet: Roberto Basili, Alfredo Serafini & Armando Stellato (University of Rome Tor Vergata), basili@info.uniroma2.it
  • Contacted but no response yet: Man-Kwan Shan & Fang-Fei Kuo (National Cheng Chi University), mkshan@cs.nccu.edu.tw

Evaluation Procedures

Entries will be evaluated based on their success rates with respect to both fine and coarse classifications. Entrants will have the option of enabling their software to output classifications of "unknown," which will be penalized less severely during evaluation than misclassifications, as classifications flagged as uncertain are much better than false classifications in a practical context.
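The proposal does not fix the size of the penalties. Purely as a sketch of how such a scheme might be scored, the following Python function gives one point per correct classification and subtracts a smaller penalty for "unknown" than for a wrong genre; the weights 1.0 and 0.5, and the function name `score`, are hypothetical values chosen for this example.

```python
def score(predicted, truth, wrong_penalty=1.0, unknown_penalty=0.5):
    """Average per-recording score with a reduced penalty for 'unknown'.

    The penalty values are hypothetical; the only constraint taken from
    the proposal is that 'unknown' costs less than a misclassification.
    """
    if len(predicted) != len(truth):
        raise ValueError("predicted and truth must have the same length")
    if not unknown_penalty < wrong_penalty:
        raise ValueError("'unknown' must be penalized less than an error")
    total = 0.0
    for guess, actual in zip(predicted, truth):
        if guess == actual:
            total += 1.0              # correct classification
        elif guess == "unknown":
            total -= unknown_penalty  # flagged as uncertain
        else:
            total -= wrong_penalty    # false classification
    return total / len(truth)


predicted = ["Funk", "unknown", "Swing", "Bebop"]
truth = ["Funk", "Soul", "Swing", "Cool"]
print(score(predicted, truth))  # (1 - 0.5 + 1 - 1) / 4 = 0.125
```

Under these assumed weights, a system that answers "unknown" instead of guessing wrongly on the last recording would score 0.25 rather than 0.125.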

Submissions in C/C++, Java, MATLAB and Python (and other languages?) will be accepted.

Relevant Test Collections

  • The 950 MIDI files I have available
  • The research MIDI library of Pedro José Ponce de León Amador
  • Collections of other participants (e.g. Pedro has some MIDI files available)
  • On-line repositories of MIDI files (sample links available at http://www.music.mcgill.ca/~cmckay/midi.html, although these were collected about a year ago)
  • Research databases

Review 1

The problem is very interesting for MIR, but too vaguely described. The role of the committee is not to propose anything, but to review the proposed evaluation sessions. Thus the author should propose a detailed list of genres and corresponding data.

I'm not against organizing the genres hierarchically and associating several genres to each file, but this raises many issues that are not discussed at all here. If a track belongs to several genres, are these genres equally weighted or not? Are they determined by asking several people to classify each track into one genre, or by asking each one to classify each track into several genres? If there are coarse categories for classical and folk music, where lies the fine category of classical music adapted from folk songs? I suggest that the contest concentrates on the single genre problem.

The choice of the genre classes is a crucial issue for the contest to be held several times. Indeed existing databases can be reused only when the defined categories are identical each year. Obviously the list of categories should reflect the list of MIDI music available on the internet. It would help if some data were already labeled according to this list.

The list of relevant data should be developed. How many files are needed for learning and testing? Have the participants already collected some labeled data that they could give to the organizers? How much?

Regarding the release of the data, I think that it would be better not to release anything. The training and test data should always be accessible through the D2K interface, and thus no copyright problem would appear. Is it possible to ensure that the test data are used only for testing and not for learning? Is it possible to implement learning easily in M2K? (each algorithm may use different structures to store learnt data)

Finally, the evaluation procedure seems nice, but I don't have any clue whether the proposed participants are really interested.

Review 2

This is an interesting topic, one that I haven't seen much work on. I do not believe that it's difficult to get a large collection of midi files. Many are in public domain, were never intended to be copyrighted, or have copyleft / creative commons licences. However, it's still difficult to assemble a reasonable collection of midi files of appropriate length which accurately represent a sufficient number of genres. This must be addressed.

A key point is that it requires the Contest Committee to hand-label a large number of midi files. We also need to determine what our genres are. Is the Committee capable and willing to do this? I personally would find it very difficult to determine the genre of a midi recording which I don't recognize. MIDI all sounds like Muzak to me, unless I know the original audio recording. Has anyone tried midi-based genre classification before?

I have no problems with the suggested evaluation and testing procedures.

I think we need some more feedback on whether people are really interested in this. Most researchers who use MIDI, to my knowledge, aren't concerned with genre issues. George typically works with audio, so the proposer is the only one I'm aware of who I know is interested. I could be wrong so let's ask around. We also need to explore the hand-labelling task, and to see if we can assemble a decent collection (which we should do regardless of this proposal).

If there is significant interest, and the labeling can be done, then we should accept it.

Downie's Comments

1. Happy to see another symbolic proposal!

2. See my comments w/r/t the Audio Genre proposal. We need to make these two tasks as similar as possible!

Rudi's Comments

Looks good to me. I agree with the first reviewer's comments wholeheartedly. I think it may be too complicated to do hierarchical genre classification. What if we just restrict ourselves to just 2-5 genres and pretend they are disjoint? And then only pick songs that clearly fit one or the other. I'm not against a hierarchical system necessarily, but it does seem like it may involve so much more work and arbitrariness in labelling, scoring, etc. If you just want to get something a bit more interesting than simple Jazz / Rock / Classical then how about happy / sad music? We could train on two different dimensions in two parts (or perhaps on the same set of songs?) to add a little variety without much additional complexity on the part of the participants or organizers. Or how about "hit song" (greater than 1 million copies sold or something) versus "not hit" like that Hit Song predictor that got some press lately.

I agree that we will need to get some more parameters about the number of MIDI files involved in the experiment. Let's put a finer point on the data model. Each training sample will have

  • a MIDI file, provided as an absolute pathname string
  • a string song title
  • a string artist/group name
  • a numerical genre classification code as an integer
  • any other codes (e.g. happy/sad or hit/not-hit) also as integers

On each run of the system, the training set will be partitioned into five parts and set up for five-fold cross-validation testing. It will be given each song for training along with one of the N different integer label codes for whatever test is in progress. The program will read from standard input the following information, one record per line:

  • first line, the number of MIDI songs for training as ASCII decimal, then a space, then the number of testing songs, then a newline
  • next the training songs, one per line, with an ASCII decimal label code then a space then the absolute filename of the MIDI file, then a newline
  • next the testing songs, one per line, as an absolute filename

The program is to output one integer prediction per line, for each test song, in order.

The program may assume the current directory is readable, searchable, and writable.
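The input and output format described above can be sketched in Python as follows. The code shows both sides under stated assumptions: `fold_inputs` is a hypothetical harness helper that builds the standard-input text for each of five folds, `read_task` is a hypothetical parser on the program side, and `majority_baseline` is a placeholder "classifier" that ignores the MIDI content entirely. The file paths and label codes are invented for the example.

```python
import io
from collections import Counter


def fold_inputs(samples, k=5):
    """Yield (stdin_text, held_out_labels) for each of k folds.

    samples is a list of (integer_label, absolute_midi_path) pairs.
    """
    for i in range(k):
        test = samples[i::k]
        train = [s for j, s in enumerate(samples) if j % k != i]
        lines = [f"{len(train)} {len(test)}"]                 # header line
        lines += [f"{label} {path}" for label, path in train]  # training songs
        lines += [path for _, path in test]                    # testing songs
        yield "\n".join(lines) + "\n", [label for label, _ in test]


def read_task(stream):
    """Parse the format above into (training, testing) lists."""
    n_train, n_test = (int(x) for x in stream.readline().split())
    training = []
    for _ in range(n_train):
        # Split on the first space only, so paths may contain spaces.
        label, path = stream.readline().rstrip("\n").split(" ", 1)
        training.append((int(label), path))
    testing = [stream.readline().rstrip("\n") for _ in range(n_test)]
    return training, testing


def majority_baseline(training, testing):
    """Placeholder classifier: always predict the most common training label."""
    most_common = Counter(label for label, _ in training).most_common(1)[0][0]
    return [most_common] * len(testing)


# Ten hypothetical songs: seven with label code 0, three with label code 1.
samples = [(0 if i < 7 else 1, f"/hypothetical/midi/song{i}.mid") for i in range(10)]

# Run the first fold through the parser and the placeholder classifier.
stdin_text, held_out = next(fold_inputs(samples))
training, testing = read_task(io.StringIO(stdin_text))
output = "".join(f"{p}\n" for p in majority_baseline(training, testing))
print(output, end="")  # one integer prediction per line, in test order
```

In a real entry, `read_task` would read from `sys.stdin` and the baseline would be replaced by an actual genre classifier.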