2005:Audio Key Finding
Proposer
Arpi Mardirossian, Ching-Hua Chuan and Elaine Chew (University of Southern California) mardiros@usc.edu
Title
Evaluation of Key Finding Algorithms
Description
Determination of the key is a prerequisite for any analysis of tonal music. As a result, extensive work has been done in the area of automatic key detection. However, among this plethora of key finding algorithms, what seems to be lacking is a formal and extensive evaluation process. We propose the evaluation of key-finding algorithms at the 2005 MIREX.
There are significant contributions in the area of key finding for both audio and symbolic representations. Thus the same contest was also proposed for MIDI data. Algorithms that determine the key from audio should be robust enough to handle frequency interferences and harmonic effects caused by the use of multiple instruments.
Potential Participants
- Emilia Gómez (egomez@iua.upf.es) and Perfecto Herrera (perfecto.herrera@iua.upf.es): [high].
- Steffen Pauws (steffen.pauws@philips.com): [high].
- Ching-Hua Chuan (chinghuc@usc.edu) and Elaine Chew (echew@usc.edu): [high].
- Ozgur Izmirli (oizm@conncoll.edu): [moderate].
- Yongwei Zhu (ywzhu@i2r.a-start.edu.sg) and Mohan Kankanhalli (mohan@comp.nus.edu.sg): [unknown].
Evaluation Procedures
The following evaluation outline is a general guideline that will be compatible with both audio and symbolic key finding algorithms. It is safe to assume that each key finding algorithm will have its own set of parameters. The creators of the system should pre-determine the optimal settings for the parameters. Once these settings are determined, an accuracy rate may be calculated. The input of the test should be an excerpt of each piece in the test set, and the output will be the key name, for example, C major or E flat minor. We plan to use pieces for which the keys are known, for example, symphonies and concertos by well-known composers where the keys are stated in the title of the piece. The excerpts will typically be the beginnings of the pieces, as this is the only part of a piece in which the global, known key can be guaranteed to be established.
The error analysis will center on comparing the key identified by the algorithm to the actual key of the piece. We will then determine how 'close' each identified key is to the corresponding correct key. Keys will be considered as 'close' if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor. It can be assumed that an algorithm that returns a key closely related to the actual key is superior to one that returns an unrelated key. We may then use this information to generate further metrics.
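As an illustration only, the closeness relationships named above could be checked along the following lines. This is a minimal sketch, not the proposers' procedure: the function names, the key-name format ("C major", "E flat minor"), and the choice to count a fifth in either direction and only between keys of the same mode are all assumptions made for the example.

```python
# Pitch classes of the natural notes, in semitones above C.
NATURALS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"flat": -1, "sharp": 1}


def parse_key(name):
    """Parse a name such as 'C major' or 'E flat minor' into (pitch_class, mode)."""
    tokens = name.split()
    tonic, mode = tokens[0].upper(), tokens[-1].lower()
    shift = ACCIDENTALS[tokens[1].lower()] if len(tokens) == 3 else 0
    return (NATURALS[tonic] + shift) % 12, mode


def relationship(estimated, actual):
    """Label the estimated key relative to the actual key (hypothetical helper)."""
    est_pc, est_mode = parse_key(estimated)
    act_pc, act_mode = parse_key(actual)
    interval = (est_pc - act_pc) % 12  # semitones from actual tonic up to estimated tonic
    if interval == 0:
        return "correct" if est_mode == act_mode else "parallel"
    # Assumption: a fifth above (7) or below (5), same mode only.
    if est_mode == act_mode and interval in (5, 7):
        return "perfect fifth"
    # Relative minor lies 9 semitones above a major tonic; relative major 3 above a minor tonic.
    if (act_mode, est_mode, interval) in {("major", "minor", 9), ("minor", "major", 3)}:
        return "relative"
    return "other"


print(relationship("G major", "C major"))  # perfect fifth
print(relationship("A minor", "C major"))  # relative
print(relationship("C minor", "C major"))  # parallel
```

Counting how often each label occurs over the test set would give one of the "further metrics" mentioned above, though how the labels should be weighted is left open by the proposal.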
Clearly, the optimal parameters may vary for different styles of music, and by composer. If time permits and the systems allow, we may next focus on pieces for which the algorithm has identified an incorrect key under the optimal settings of the parameters and determine whether the incorrect assignments were due to improper parameter selection. We may then calculate the percent of the pieces that had an incorrect assignment under the optimal settings but have a correct assignment with other settings.
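The percentage described above might be computed as in the following sketch. The data layout (dictionaries keyed by piece name) and the function name are hypothetical, and the denominator is assumed to be the whole test set; the proposal could equally be read as a percentage of the incorrectly labelled pieces only.

```python
def recovered_percentage(truth, optimal, alternatives):
    """Percent of all pieces that are wrong under the optimal settings
    but correct under at least one other setting.

    truth:        piece -> actual key
    optimal:      piece -> key returned under the optimal settings
    alternatives: piece -> list of keys returned under other settings
    """
    if not truth:
        return 0.0
    recovered = [
        piece
        for piece, key in truth.items()
        if optimal[piece] != key and key in alternatives.get(piece, [])
    ]
    return 100.0 * len(recovered) / len(truth)


# Made-up example data, for illustration only.
truth = {"p1": "C major", "p2": "A minor", "p3": "G major", "p4": "D minor"}
optimal = {"p1": "C major", "p2": "C major", "p3": "G major", "p4": "F major"}
alternatives = {"p2": ["A minor", "E minor"], "p4": ["F major"]}
print(recovered_percentage(truth, optimal, alternatives))  # 25.0
```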
Relevant Test Collections
Audio data can be obtained from HNH Hong Kong International, Ltd. (http://www.naxos.com), if the agreement with the company is now in effect for MIR testing. We have determined that excerpts of only fifteen to thirty seconds may be sufficient for key finding using audio data. Copyright regulations state that up to 33% of audio files may be copied without any violations of such regulations. This is advantageous since fifteen- to thirty-second excerpts will be well within this limit.
Review 1
The proposals contemplate two different evaluations for key estimation: one for MIDI and another one for Audio Data. Maybe these two proposals could be merged into a single one. At least part of the data could be shared, by having a test collection including Audio Data and its MIDI representation, or a MIDI representation and the Audio generated from it by a MIDI synthesizer. This way, we could evaluate and compare approaches dealing with MIDI & Audio.
Regarding the key estimation contest from audio data, it seems that only classical music is considered. Would it be possible to generalize to some other styles, for instance popular music for which the key is known?
Regarding evaluation measures for audio data, it is said that "Keys will be considered as 'close' if they have one of the following relationships: distance of perfect fifth, relative major and minor, and parallel major and minor". What about tuning errors? In the case of audio, there are different tuning systems that can be used. The detection algorithm should be able to estimate where the key is "tuned" (A 440 or 442,...). Keys should also be considered as 'close' if they have a relationship of "1 semitone", to account for the difference between the real key (according to its tuning) & the labelled key (A major). In the case of MIDI, this problem does not appear.
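To put numbers on the tuning question, the deviation of a reference pitch from A = 440 Hz can be expressed in cents (100 cents per equal-tempered semitone). The sketch below is illustrative only; the function names are hypothetical, and the 415 Hz value is included as a commonly used Baroque pitch, not as something drawn from the proposed test set.

```python
import math


def cents_from_standard(reference_hz, standard_hz=440.0):
    """Deviation of a tuning reference from the standard, in cents."""
    return 1200.0 * math.log2(reference_hz / standard_hz)


def nearest_semitone_shift(reference_hz, standard_hz=440.0):
    """Number of whole semitones by which the labelled key would appear shifted."""
    return round(cents_from_standard(reference_hz, standard_hz) / 100.0)


for hz in (440.0, 442.0, 415.0):
    print(hz, round(cents_from_standard(hz), 1), nearest_semitone_shift(hz))
```

Under these assumptions, A = 442 Hz stays well inside the labelled key, whereas a recording tuned near A = 415 Hz sounds roughly a semitone below its labelled key, which is the case the "1 semitone" relationship would cover.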
Will there be some training data, so that participants can try their algorithms?
I cannot tell whether the suggested participants are willing to participate. Another potential candidate could be Hendrik Purwins.
Review 2
General comments:
Title: "Evaluation of Key Finding Algorithms Using Audio Data" or "Evaluation of Key Finding Algorithms Part 1"
Description paragraph: Par 2, Line 2 - sentence requires correction
The problem is well defined and the mentioned possible participants seem likely to participate.
Regarding the evaluation procedures, the length of the input excerpt would have to be determined (15 to 30 seconds - any studies on the ideal length?).
Assumption of closeness:
- Perfect 5th: Is this generally accepted as an almost similar key?
- Parallel major or minor: Not too certain if this needs to be clarified (Ignore this comment if this is generally understood by the majority working in this field)
Based on the error analysis approach outlined, would the algorithm that performs best with the new parameter settings be considered superior?
The test data are relevant. Are there any alternative data sets if the Naxos collection does not become available?
Downie's Comments
1. Am intrigued and heartened by the fact that both an audio and a symbolic version of the task have been proposed.
2. The modality question does arise and like Review #2, I would like to understand better the gradations of "failure" (i.e., the Perfect 5th issue), etc.
3. I would very much like to see a direct tie in with symbolic and audio data (i.e., a one-to-one match of score with audio), if possible.
4. Wonder if we could frame this for evaluation purposes as a more traditional IR task? For example, Find all pieces in Key X...find all pieces in a minor mode.....and the kicker...find all pieces transposed from their original keys!
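If the task were framed as retrieval in the way item 4 suggests, a query such as "find all pieces in Key X" could be scored with precision and recall. The following is a rough sketch under that assumption; the function name and the dictionary layout are hypothetical, and nothing here reflects an agreed MIREX procedure.

```python
def precision_recall(truth, predicted, query_key):
    """Precision and recall for the query 'find all pieces in query_key'.

    truth:     piece -> actual key
    predicted: piece -> key returned by the algorithm
    """
    relevant = {piece for piece, key in truth.items() if key == query_key}
    retrieved = {piece for piece, key in predicted.items() if key == query_key}
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Made-up example data, for illustration only.
truth = {"p1": "C major", "p2": "C major", "p3": "A minor", "p4": "G major"}
predicted = {"p1": "C major", "p2": "G major", "p3": "C major", "p4": "G major"}
print(precision_recall(truth, predicted, "C major"))  # (0.5, 0.5)
```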
Emmanuel's Comments
I was the one to decide that the original proposal on key finding should be split into two proposals on audio key finding and symbolic key finding. Indeed the audio and symbolic parts involve completely separate data and separate participants. From the committee's point of view, this needs as much annotation and testing work as two independent proposals. I did not ask the authors about it, so it's not their fault.
I am strongly in favor of merging the two proposals into a single one again. But then the symbolic and audio data need to correspond to the same titles as much as possible, so that the performances can be compared. Can the RWC database or another database be used for it? Also the participants need to submit algorithms for both tasks if possible. I suppose it won't be too hard for audio key finding algorithms to work also on symbolic data, since audio data may be easily synthesized from symbolic data using a conventional MIDI synthesizer.