2006:Audio Music Similarity and Retrieval Results
Introduction
These are the results for the 2006 running of the Audio Music Similarity and Retrieval task set. For background information about this task set, please refer to the Audio Music Similarity and Retrieval page.
Number of evaluators = 2x
Number of evaluations per query/candidate pair = 3
Number of queries per grader = 7~8
Size of the candidate lists = Maximum 30 (with no overlap)
Number of randomly selected queries = 60
General Legend
Team ID
EP = Elias Pampalk
TP = Tim Pohle
VS = Vitor Soares
LR = Thomas Lidy and Andreas Rauber
KWT = Kris West (Trans)
KWL = Kris West (Likely)
Broad Categories
NS = Not Similar
SS = Somewhat Similar
VS = Very Similar
Calculating Summary Measures
Fine(1) = Sum of fine-grained human similarity decisions (0-10).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgement).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgement using only Very Similar).
(1) Normalized to the range 0 to 1. A sketch of these computations is given below.
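The following is a minimal MATLAB sketch of how these summary measures could be computed for one query's candidate list. The variable names (fine, broad) and the 0/1/2 encoding of the broad categories are illustrative assumptions, not the official evaluation code.

% Hypothetical inputs: 'fine' holds the 0-10 fine-grained scores and
% 'broad' the broad categories encoded as 0 (NS), 1 (SS), 2 (VS),
% one entry per human judgment on the candidate list.
n = numel(broad);
Fine     = sum(fine) / (10*n);        % fine-grained scores, max 10 each
PSum     = sum(broad) / (2*n);        % NS=0, SS=1, VS=2
wc = [0 1 3];                         % 'World Cup' weights
WCsum    = sum(wc(broad+1)) / (3*n);  % rewards Very Similar
sd = [0 1 4];                         % 'Stephen Downie' weights
SDsum    = sum(sd(broad+1)) / (4*n);  % strongly rewards Very Similar
Greater0 = mean(broad >= 1);          % SS and VS both count as relevant
Greater1 = mean(broad == 2);          % only VS counts as relevant

Dividing by the maximum attainable sum in each case is what maps every measure onto the 0 to 1 range.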
Overall Summary Results
(Overall summary results table missing: /nema-raid/www/mirex/results/mirex06_as_overalllist.csv not found.)
http://staff.aist.go.jp/elias.pampalk/papers/mirex06/friedman.png
This figure shows the official ranking of the submissions, computed using a Friedman test. The blue lines indicate the significance boundaries at the p=0.05 level. As can be seen, none of the pairwise differences between submissions are statistically significant. For a more detailed description and discussion see [1].
Audio Music Similarity and Retrieval Runtime Data
(Runtime data table missing: /nema-raid/www/mirex/results/as06_runtime.csv not found.)
For a description of the computers the submissions ran on, see MIREX_2006_Equipment.
Friedman Test with Multiple Comparisons Results (p=0.05)
The Friedman test was run in MATLAB against the Fine summary data over the 60 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer', 'estimate', 'friedman', 'alpha', 0.05);
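For context, the stats structure consumed by multcompare is produced by a preceding call to friedman. A minimal sketch, assuming a hypothetical matrix scores holding the normalized Fine score of each submission, with one row per query and one column per system:

% scores: 60 queries (rows) x 6 submissions (columns); the matrix
% name and layout are assumptions for illustration only.
[p, tbl, stats] = friedman(scores, 1, 'off');  % rank-based test across queries
[c, m, h, gnames] = multcompare(stats, 'ctype', 'tukey-kramer', ...
    'estimate', 'friedman', 'alpha', 0.05);
% Each row of c compares two submissions; their difference is significant
% at p = 0.05 when the confidence interval (columns 3 and 5) excludes zero.

Under this layout the six columns would correspond to the six submissions listed in the legend above (EP, TP, VS, LR, KWT, KWL).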
(Friedman test result tables missing: /nema-raid/www/mirex/results/AV_sum_friedman.csv and /nema-raid/www/mirex/results/AV_fine_result.csv not found.)
Summary Results by Query
(Per-query summary table missing: /nema-raid/www/mirex/results/mirex06_as_uberlist.csv not found.)
Raw Scores
The raw data derived from the Evalutron 6000 human evaluations are located on the Audio Music Similarity and Retrieval Raw Data page.