2006:Audio Music Similarity and Retrieval

This page is devoted to discussions of the evaluation of Audio Music Similarity algorithms at MIREX 2006. Discussions have already begun on the MIREX 06 "AudioSim06" contest planning list and will be briefly digested here. A full digest of the discussions is available to subscribers from the MIREX 06 "AudioSim06" contest planning list archives.

As consensus is reached on the planning list, a full proposal (Audio Music Similarity proposal) will be produced for the format of the evaluation, including pseudocode for the evaluation metric and suggested formats for submitted algorithms. A skeleton of the proposal is already available on the Audio Music Similarity proposal page.



Although extracting genre labels and extracting artist labels from audio are interesting tasks, I (KW) believe that they are often used to evaluate more general music similarity techniques that compare two songs based on their audio content. These techniques are hard to evaluate directly, for example with listening tests: it is not practical to have a human listener rank the similarities of even a small test collection for a number of queries, as this might require many hours of listening. We have therefore begun discussing other methods of evaluating music similarity techniques, such as those described in Logan & Salomon (A Music Similarity Function Based on Signal Analysis, ICME 2001), where the 5, 10 or 20 most similar songs were retrieved for each query and the average number of retrieved songs in the same genre, by the same artist and from the same album was calculated. This evaluation could be extended to multiple genres if data is available. I believe it is also important that we evaluate other characteristics of these algorithms, such as descriptor extraction time, query time and memory footprint, which may indicate how applicable a technique is to a given application.
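
As a rough illustration of the retrieval-based evaluation described above, the following Python sketch counts, for each query, how many of the N nearest songs share the query's genre, artist or album, and averages that count over all queries. It is only a sketch under stated assumptions, not the agreed evaluation metric: the `Song` record, the function name `mean_matches_in_top_n`, the precomputed pairwise distance matrix and the toy data are all hypothetical, and excluding the query song from its own result list is an assumption rather than something taken from the paper.

```python
from collections import namedtuple

import numpy as np

# Hypothetical metadata record; the real collection format is not yet decided.
Song = namedtuple("Song", ["genre", "artist", "album"])


def mean_matches_in_top_n(distances, songs, n, field):
    """Average number of the n nearest songs that share `field` with the query.

    distances: square matrix, distances[q][i] = distance from song q to song i.
    field: one of "genre", "artist" or "album".
    Assumption: the query itself is excluded from its own result list.
    """
    distances = np.asarray(distances, dtype=float)
    total = 0
    for q, query in enumerate(songs):
        # Rank all songs by increasing distance to the query, dropping the query.
        ranked = [i for i in np.argsort(distances[q], kind="stable") if i != q]
        total += sum(
            getattr(songs[i], field) == getattr(query, field) for i in ranked[:n]
        )
    return total / len(songs)


# Toy collection of four songs with a made-up symmetric distance matrix.
songs = [
    Song("rock", "A", "a1"),
    Song("rock", "A", "a1"),
    Song("rock", "B", "b1"),
    Song("jazz", "C", "c1"),
]
distances = [
    [0, 1, 2, 5],
    [1, 0, 3, 6],
    [2, 3, 0, 4],
    [5, 6, 4, 0],
]

for field in ("genre", "artist", "album"):
    print(field, mean_matches_in_top_n(distances, songs, n=2, field=field))
```

With a real submission, the distance matrix would come from the algorithm under test, and the same function could be called with n set to 5, 10 and 20.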