2007:Audio Music Similarity and Retrieval Results
Introduction
These are the results for the 2007 running of the Audio Music Similarity and Retrieval task set. For background information about this task set, please refer to the Audio Music Similarity and Retrieval page.
Each system was given 7000 songs drawn from IMIRSEL's "uspop", "uscrap", "american", "classical" and "sundry" collections and returned a 7000x7000 distance matrix. 100 songs were randomly selected as queries from the 10 genre groups (10 per genre). For each query, the 5 most highly ranked songs out of the 7000 were extracted from each system's ranking, after filtering out the query itself and any results by the same artist as the query. The returned results (candidates) from all participants were then pooled for each query and evaluated by human graders using the Evalutron 6000 grading system. Each individual query/candidate set was evaluated by a single grader. For each query/candidate pair, graders provided two scores: one categorical (BROAD) score with three categories, NS, SS and VS, as explained below, and one FINE score in the range 0 to 10. A description and analysis are provided below.
The systems read in 30-second audio clips as their raw data. The same 30-second clips were used in the grading stage.
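As an illustration of the candidate-extraction step described above, here is a minimal MATLAB sketch. The distance matrix D and the artist labels artistId are hypothetical placeholders, not the actual IMIRSEL data or evaluation code.

 % Minimal sketch of extracting a 5-song candidate list for one query
 % from a system's distance matrix (placeholder data, lower = more similar).
 nSongs   = 7000;
 D        = rand(nSongs);                 % hypothetical 7000x7000 distance matrix
 artistId = randi(600, 1, nSongs);        % hypothetical artist label per song

 queryIdx   = 42;                               % one of the 100 selected queries
 [~, order] = sort(D(queryIdx, :), 'ascend');   % rank all songs by distance to the query

 % Filter out the query itself and songs by the query's artist, keep the top 5.
 keep       = order(order ~= queryIdx & artistId(order) ~= artistId(queryIdx));
 candidates = keep(1:5);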
Summary Data on Human Evaluations (Evalutron 6000)
Number of evaluators = 20
Number of evaluations per query/candidate pair = 1
Number of queries per grader = 5
Average size of the candidate lists = 48.32
Number of randomly selected queries = 100
Number of query/candidate pairs graded = 4832
General Legend
Team ID
BK1 = Klaas Bosteels, Etienne E. Kerre 1
BK2 = Klaas Bosteels, Etienne E. Kerre 2
CB1 = Christoph Bastuck 1
CB2 = Christoph Bastuck 2
CB3 = Christoph Bastuck 3
GT = George Tzanetakis
LB = Luke Barrington, Douglas Turnbull, David Torres, Gert Lanckriet
ME = Michael I. Mandel, Daniel P. W. Ellis
PC = Aliaksandr Paradzinets, Liming Chen
PS = Tim Pohle, Dominik Schnitzer
TL1 = Thomas Lidy, Andreas Rauber, Antonio Pertusa, José Manuel Iñesta 1
TL2 = Thomas Lidy, Andreas Rauber, Antonio Pertusa, José Manuel Iñesta 2
Broad Categories
NS = Not Similar
SS = Somewhat Similar
VS = Very Similar
Calculating Summary Measures
Fine(1) = Sum of fine-grained human similarity decisions (0-10).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgement).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgement using only Very Similar).
(1)Normalized to the range 0 to 1.
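For illustration, a minimal MATLAB sketch of these summary measures for a single query's candidate list, using made-up example grades rather than actual MIREX data:

 % Example broad grades for one query's 5 candidates, coded NS=0, SS=1, VS=2,
 % and example fine grades in the range 0..10 (made-up values).
 broad = [ 2   1   2   1   0 ];
 fine  = [7.5 3.0 9.0 6.0 2.5];

 FINE     = mean(fine)  / 10;                       % fine-grained score, normalized to 0..1
 PSum     = mean(broad) /  2;                       % NS=0, SS=1, VS=2
 wc = [0 1 3];  WCsum = mean(wc(broad + 1)) / 3;    % 'World Cup' weighting
 sd = [0 1 4];  SDsum = mean(sd(broad + 1)) / 4;    % 'Stephen Downie' weighting
 Greater0 = mean(broad >= 1);                       % SS or VS counted as relevant
 Greater1 = mean(broad == 2);                       % only VS counted as relevant

Dividing each mean by the maximum per-pair score is equivalent to normalizing the summed decisions to the range 0 to 1, as noted above.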
Overall Summary Results
NB: The results for BK2 were interpolated from partial data due to a runtime error.
(Results file ams07_overall_summary2.csv not available.)
Friedman Test with Multiple Comparisons Results (p=0.05)
The Friedman test was run in MATLAB against the Fine summary data over the 100 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);
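For context, a minimal MATLAB sketch of how this command fits into the full test, assuming fineScores is a 100x12 matrix of per-query FINE summary scores (queries in rows, the 12 systems in columns); the matrix here is random placeholder data, not the actual results.

 % Hypothetical per-query FINE scores: 100 queries (rows) x 12 systems (columns).
 fineScores = rand(100, 12);

 % The Friedman test ranks the systems within each query, then tests for differences.
 [p, tbl, stats] = friedman(fineScores, 1, 'off');

 % Multiple comparisons between systems, as reported above.
 [c, m, h, gnames] = multcompare(stats, 'ctype', 'tukey-kramer', ...
                                 'estimate', 'friedman', 'alpha', 0.05);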
(Results files ams07_sum_friedman_fine.csv and ams07_detail_friedman_fine.csv not available.)
Summary Results by Query
These are the mean FINE scores per query assigned by Evalutron graders. The FINE scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0.0 and 10.0. A perfect score would be 10. Genre labels have been included for reference.
(Results file ams07_fine_by_query_with_genre.csv not available.)
These are the mean BROAD scores per query assigned by Evalutron graders. The BROAD scores for the 5 candidates returned per algorithm, per query, have been averaged. Values are bounded between 0 (not similar) and 2 (very similar). A perfect score would be 2. Genre labels have been included for reference.
(Results file ams07_broad_by_query_with_genre.csv not available.)
Anonymized Metadata
Raw Scores
The raw data derived from the Evalutron 6000 human evaluations are located on the Audio Music Similarity and Retrieval Raw Data page.