2010:Audio Music Similarity and Retrieval Results

From MIREX Wiki
Revision as of 13:00, 29 July 2010 by Jdownie

Introduction

These are the results for the 2010 running of the Audio Music Similarity and Retrieval task set. For background information about this task set please refer to the Audio Music Similarity and Retrieval page.

Each system was given 7000 songs chosen from IMIRSEL's "uspop", "uscrap", "american", "classical" and "sundry" collections. Each system then returned a 7000x7000 distance matrix. 100 songs were randomly selected as queries from the 10 genre groups (10 per genre), and the 5 most highly ranked songs out of the 7000 were extracted for each query (after filtering out the query itself; returned results from the same artist as the query were also omitted). Then, for each query, the returned results (candidates) from all participants were pooled and evaluated by human graders using the Evalutron 6000 grading system. Each individual query/candidate set was evaluated by a single grader. For each query/candidate pair, graders provided two scores: one categorical BROAD score with 3 categories (NS, SS, VS, as explained below) and one FINE score in the range from 0 to 10. A description and analysis is provided below.

The systems read in 30 second audio clips as their raw data. The same 30 second clips were used in the grading stage.
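The candidate-selection step described above (nearest neighbours from the distance matrix, excluding the query and same-artist tracks) can be sketched as follows. This is a minimal illustration with made-up data, not IMIRSEL's actual evaluation code; the function name and the toy matrix are assumptions.

```python
import numpy as np

def top_candidates(dist, artists, query, k=5):
    """Return indices of the k nearest tracks to `query`,
    skipping the query itself and tracks by the same artist."""
    order = np.argsort(dist[query])          # nearest first
    picks = []
    for idx in order:
        if idx == query or artists[idx] == artists[query]:
            continue
        picks.append(int(idx))
        if len(picks) == k:
            break
    return picks

# Toy example: 4 tracks, the first two by the same artist "a"
dist = np.array([[0.0, 0.2, 0.5, 0.9],
                 [0.2, 0.0, 0.4, 0.8],
                 [0.5, 0.4, 0.0, 0.3],
                 [0.9, 0.8, 0.3, 0.0]])
artists = ["a", "a", "b", "c"]
print(top_candidates(dist, artists, query=0, k=2))  # [2, 3]
```

Track 1 is skipped despite being nearest to track 0, because it shares the query's artist.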


General Legend

Team ID

Sub code  Submission name                                               Abstract  Contributors
BWL1      MTG-AMS                                                       PDF       Dmitry Bogdanov, Nicolas Wack, Cyril Laurier
ML1       Musical Audio Similarity Submission MIR                       PDF       Mathieu Lagrange
ML2       Musical Audio Similarity Submission with Uvic First Version   PDF       Mathieu Lagrange
ML3       Musical Audio Similarity Submission with Uvic Second Version  PDF       Mathieu Lagrange
PS1       PS09                                                          PDF       Tim Pohle, Dominik Schnitzer
PSS1      PSS10                                                         PDF       Tim Pohle, Klaus Seyerlehner, Dominik Schnitzer
RZ1       RND                                                           PDF       Rainer Zufall
SSPK2     cbmr_sim                                                      PDF       Klaus Seyerlehner, Markus Schedl, Tim Pohle, Peter Knees
TLN1      Post-Processing 1 of Marsyas similarity results               PDF       George Tzanetakis, Mathieu Lagrange, Steven Ness
TLN2      Post-Processing 2 of Marsyas similarity results               PDF       George Tzanetakis, Mathieu Lagrange, Steven Ness
TNL1      MarsyasSimilarity                                             PDF       George Tzanetakis, Steven Ness, Mathieu Lagrange

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Understanding Summary Measures

Fine = Has a range from 0 (failure) to 10 (perfection).
Broad = Has a range from 0 (failure) to 2 (perfection) as each query/candidate pair is scored with either NS=0, SS=1 or VS=2.
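The two summary measures for one query reduce to simple means over its five candidates. A minimal sketch with hypothetical grader scores (the example values are invented, not taken from the 2010 results):

```python
# Map the categorical BROAD labels onto the 0..2 numeric scale.
BROAD_VALUES = {"NS": 0, "SS": 1, "VS": 2}

# Hypothetical grader scores for one query's five candidates.
broad = ["VS", "SS", "NS", "SS", "VS"]
fine  = [8.0, 5.5, 1.0, 6.0, 9.0]

mean_broad = sum(BROAD_VALUES[b] for b in broad) / len(broad)
mean_fine  = sum(fine) / len(fine)
print(mean_broad, mean_fine)  # 1.2 5.9
```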

Human Evaluation

Overall Summary Results

[Results file /nema-raid/www/mirex/results/2010/ams/evalutron/summary_evalutron.csv not found; summary table unavailable.]

Friedman's Tests

Friedman's Test (FINE Scores)

The Friedman test was run in MATLAB against the Fine summary data over the 100 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

[Results file /nema-raid/www/mirex/results/2010/ams/evalutron/evalutron.fine.friedman.tukeyKramerHSD.csv not found; table unavailable.]

https://music-ir.org/mirex/results/2010/ams/evalutron/small.evalutron.fine.friedman.tukeyKramerHSD.png
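The MATLAB command above runs a Friedman test over the per-query scores and then a Tukey-Kramer multiple comparison across systems. The Friedman step can be sketched in Python with SciPy; the data here is synthetic (assumed shapes: 100 queries as blocks, 3 systems as treatments), not the actual 2010 scores.

```python
import numpy as np
from scipy.stats import friedmanchisquare

# Synthetic per-query mean FINE scores: rows = 100 queries (blocks),
# columns = 3 systems (treatments). Real data would come from the
# summary CSVs referenced on this page.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 10, size=(100, 3))
scores[:, 0] += 1.5  # pretend one system is consistently better

stat, p = friedmanchisquare(scores[:, 0], scores[:, 1], scores[:, 2])
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```

A small p-value indicates at least one system differs; the Tukey-Kramer step (multcompare in MATLAB) then identifies which pairs differ.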

Friedman's Test (BROAD Scores)

The Friedman test was run in MATLAB against the BROAD summary data over the 100 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);

[Results file /nema-raid/www/mirex/results/2010/ams/evalutron/evalutron.cat.friedman.tukeyKramerHSD.csv not found; table unavailable.]

https://music-ir.org/mirex/results/2010/ams/evalutron/small.evalutron.cat.friedman.tukeyKramerHSD.png


Summary Results by Query

FINE Scores

These are the mean FINE scores per query assigned by Evalutron graders. The FINE scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0.0 and 10.0. A perfect score would be 10. Genre labels have been included for reference.

[Results file /nema-raid/www/mirex/results/2010/ams/evalutron/fine_scores.csv not found; table unavailable.]

BROAD Scores

These are the mean BROAD scores per query assigned by Evalutron graders. The BROAD scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 (not similar) and 2 (very similar). A perfect score would be 2. Genre labels have been included for reference.

[Results file /nema-raid/www/mirex/results/2010/ams/evalutron/cat_scores.csv not found; table unavailable.]


Raw Scores

The raw data derived from the Evalutron 6000 human evaluations are located on the 2010:Audio Music Similarity and Retrieval Raw Data page.

Metadata and Distance Space Evaluation

The following reports provide evaluation statistics based on analysis of the distance space and metadata matches and include:

  • Neighbourhood clustering by artist, album and genre
  • Artist-filtered genre clustering
  • How often the triangular inequality holds
  • Statistics on 'hubs' (tracks similar to many tracks) and orphans (tracks that are not similar to any other tracks at N results).
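Two of the statistics above can be sketched directly from a distance matrix. This is a toy illustration (function names and data are assumptions, not the report generators actually used):

```python
import numpy as np

def hub_counts(dist, n=5):
    """For each track, count how many other tracks list it among
    their n nearest neighbours (each query excludes itself).
    counts == 0 marks an orphan; counts much larger than n, a hub."""
    counts = np.zeros(dist.shape[0], dtype=int)
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])
        for j in [t for t in order if t != i][:n]:
            counts[j] += 1
    return counts

def triangle_violation_rate(dist, eps=1e-12):
    """Fraction of ordered triples (i, j, k) of distinct tracks where
    d(i, k) > d(i, j) + d(j, k), i.e. the triangle inequality fails."""
    n = dist.shape[0]
    total = bad = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i == j or j == k or i == k:
                    continue
                total += 1
                if dist[i, k] > dist[i, j] + dist[j, k] + eps:
                    bad += 1
    return bad / total

# Toy metric space: 4 tracks as points on a line, d(x, y) = |x - y|.
pts = np.array([0.0, 1.0, 3.0, 7.0])
dist = np.abs(pts[:, None] - pts[None, :])
print(triangle_violation_rate(dist))  # 0.0 -- a true metric never violates it
print(hub_counts(dist, n=2))          # [2 3 3 0] -- track 3 is an orphan
```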

Reports

ANO = Anonymous
BF1 = Benjamin Fields (chr12)
BF2 = Benjamin Fields (mfcc10)
BSWH1 = Dmitry Bogdanov, Joan Serrà, Nicolas Wack, and Perfecto Herrera (clas)
BSWH2 = Dmitry Bogdanov, Joan Serrà, Nicolas Wack, and Perfecto Herrera (hybrid)
CL1 = Chuan Cao, Ming Li
CL2 = Chuan Cao, Ming Li
GT = George Tzanetakis
LR = Thomas Lidy, Andreas Rauber
ME1 = François Maillet, Douglas Eck (mlp)
ME2 = François Maillet, Douglas Eck (sda)
PS1 = Tim Pohle, Dominik Schnitzer (2007)
PS2 = Tim Pohle, Dominik Schnitzer (2010)
SH1 = Stephan Hübler
SH2 = Stephan Hübler

Run Times

[Results file /nema-raid/www/mirex/results/2010/ams/audiosim.runtime.csv not found; run-time table unavailable.]