2006:Audio Music Similarity and Retrieval Results
Introduction
These are the results for the 2006 running of the Audio Music Similarity and Retrieval task set. For background information about this task set, please refer to the 2006:Audio Music Similarity and Retrieval page.
Each system was given 5000 songs chosen from the "uspop", "uscrap" and "cover song" collections, and each system returned a 5000x5000 distance matrix. 60 songs were randomly selected as queries, and the five most highly ranked songs out of the 5000 were extracted for each query (after filtering out the query itself, results from the same artist and members of the cover song collection). For each query, the returned results from all participants were then pooled and evaluated by human graders, each query/candidate pair being evaluated by 3 different graders using the Evalutron 6000 system. Graders were asked to provide two scores: one categorical score with three categories (NS, SS, VS, as explained below) and one fine score (in the range 0 to 10). An automated statistical evaluation based on a metadata catalog was also conducted; a description and analysis are provided below.
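To make the candidate-extraction step concrete, the following is a minimal MATLAB sketch (not the actual MIREX harness) of how the top five results for one query might be pulled from a system's distance matrix while applying the filters described above; the variable names dist, artist and isCover are assumptions.

```matlab
% Minimal sketch of the candidate-extraction step, under assumed inputs:
%   dist    - 5000x5000 distance matrix returned by a system
%   artist  - 5000x1 cell array of artist labels
%   isCover - 5000x1 logical vector flagging cover-song collection members
q = 42;                                        % index of one query song
[~, order] = sort(dist(q, :), 'ascend');       % rank all songs by distance
keep = (order ~= q) ...                        % filter out the query itself
     & ~strcmp(artist(order), artist{q}) ...   % filter same-artist results
     & ~isCover(order);                        % filter cover-song tracks
candidates = order(keep);
top5 = candidates(1:5);                        % the 5 most highly ranked songs
```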
Summary Data on Human Evaluations (Evalutron 6000)
Number of evaluators = 24
Number of evaluations per query/candidate pair = 3
Number of queries per grader = 7~8
Size of the candidate lists = Maximum 30 (with no overlap)
Number of randomly selected queries = 60
General Legend
Team ID
EP = Elias Pampalk
TP = Tim Pohle
VS = Vitor Soares
LR = Thomas Lidy and Andreas Rauber
KWT = Kris West (Trans)
KWL = Kris West (Likely)
Broad Categories
NS = Not Similar
SS = Somewhat Similar
VS = Very Similar
Calculating Summary Measures
Fine(1) = Sum of fine-grained human similarity decisions (0-10).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgement).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgement using only Very Similar).
(1)Normalized to the range 0 to 1.
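As an illustration of how these measures can be computed, here is a minimal MATLAB sketch for one system; fineScores (a vector of fine grades in [0, 10]) and broadScores (a cell array of 'NS'/'SS'/'VS' grades over all of that system's graded query/candidate pairs) are hypothetical names, and dividing by the maximum possible score is one way to realise the stated 0-1 normalisation.

```matlab
% Hypothetical sketch of the summary measures; fineScores and broadScores
% are assumed inputs covering all of one system's graded pairs.
bc = categorical(broadScores, {'NS', 'SS', 'VS'});
Fine     = mean(fineScores) / 10;                      % normalised to [0,1]
PSum     = mean(1*(bc == 'SS') + 2*(bc == 'VS')) / 2;  % NS=0, SS=1, VS=2
WCsum    = mean(1*(bc == 'SS') + 3*(bc == 'VS')) / 3;  % NS=0, SS=1, VS=3
SDsum    = mean(1*(bc == 'SS') + 4*(bc == 'VS')) / 4;  % NS=0, SS=1, VS=4
Greater0 = mean(bc ~= 'NS');                           % SS or VS is relevant
Greater1 = mean(bc == 'VS');                           % only VS is relevant
```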
Overall Summary Results
Measure | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
Fine | 0.430 | 0.423 | 0.404 | 0.393 | 0.372 | 0.339 |
PSum | 0.425 | 0.411 | 0.388 | 0.374 | 0.349 | 0.313 |
WCsum | 0.358 | 0.340 | 0.323 | 0.306 | 0.280 | 0.248 |
SDsum | 0.324 | 0.305 | 0.290 | 0.271 | 0.246 | 0.216 |
Greater0 | 0.627 | 0.623 | 0.586 | 0.579 | 0.557 | 0.509 |
Greater1 | 0.223 | 0.199 | 0.191 | 0.169 | 0.142 | 0.118 |
http://staff.aist.go.jp/elias.pampalk/papers/mirex06/friedman.png
This figure shows the official ranking of the submissions computed using a Friedman test. The blue lines indicate significance boundaries at the p=0.05 level. As can be seen, the differences are not significant. For a more detailed description and discussion see http://staff.aist.go.jp/elias.pampalk/papers/pam_mirex06.pdf.
Audio Music Similarity and Retrieval Runtime Data
Team ID | Stage | Machine | Run-time (seconds) |
---|---|---|---|
EP | feature | beer 6 | 5889 |
EP | distance | beer 6 | 6066 |
KWT | feature | beer 6 | 29899 |
KWT | distance | beer 6 | 25352 |
KWL | both | beer 4 | 47698 |
LR | feature | beer 4 | 13794 |
LR | distance | beer 4 | 131 |
TP | feature | beer 8 | 14333 |
TP | distance | beer 8 | 3337 |
For a description of the computers the submissions ran on, see 2006:MIREX_2006_Equipment.
Friedman Test with Multiple Comparisons Results (p=0.05)
The Friedman test was run in MATLAB against the Fine summary data over the 60 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);
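For context, here is a sketch of how the whole test might have been run; FINE, a 60x6 matrix of per-query Fine scores (rows = queries, columns = systems), is an assumed name, and the multcompare call is the command quoted above.

```matlab
% Hypothetical sketch of the full significance test.
[p, tbl, stats] = friedman(FINE, 1, 'off');    % Friedman's ANOVA over queries
[c, m, h, gnames] = multcompare(stats, 'ctype', 'tukey-kramer', ...
                                'estimate', 'friedman', 'alpha', 0.05);
% A row of c whose confidence interval excludes zero marks a pair of systems
% whose difference is significant at the 0.05 level.
```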
Friedman's ANOVA Table
Source | SS | df | MS | Chi-sq | Prob>Chi-sq |
---|---|---|---|---|---|
Columns | 84.7333 | 5 | 16.9467 | 24.2905 | 0.00019091 |
Error | 961.7667 | 295 | 3.2602 | | |
Total | 1046.5 | 359 | | | |
Team ID | Team ID | Lower bound | Mean | Upper bound | Significance |
---|---|---|---|---|---|
EP | TP | -0.963 | 0.008 | 0.980 | FALSE |
EP | VS | -0.755 | 0.217 | 1.188 | FALSE |
EP | LR | -0.630 | 0.342 | 1.313 | FALSE |
EP | KWT | -0.030 | 0.942 | 1.913 | FALSE |
EP | KWL | 0.320 | 1.292 | 2.263 | TRUE |
TP | VS | -0.763 | 0.208 | 1.180 | FALSE |
TP | LR | -0.638 | 0.333 | 1.305 | FALSE |
TP | KWT | -0.038 | 0.933 | 1.905 | FALSE |
TP | KWL | 0.312 | 1.283 | 2.255 | TRUE |
VS | LR | -0.847 | 0.125 | 1.097 | FALSE |
VS | KWT | -0.247 | 0.725 | 1.697 | FALSE |
VS | KWL | 0.103 | 1.075 | 2.047 | TRUE |
LR | KWT | -0.372 | 0.600 | 1.572 | FALSE |
LR | KWL | -0.022 | 0.950 | 1.922 | FALSE |
KWT | KWL | -0.622 | 0.350 | 1.322 | FALSE |
Summary Results by Query
Fine
queryID | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
a001528 | 0.428 | 0.318 | 0.387 | 0.459 | 0.354 | 0.331 |
a004667 | 0.429 | 0.503 | 0.579 | 0.529 | 0.383 | 0.429 |
a000518 | 0.107 | 0.217 | 0.221 | 0.213 | 0.145 | 0.206 |
a002693 | 0.657 | 0.495 | 0.337 | 0.519 | 0.345 | 0.483 |
a004830 | 0.338 | 0.348 | 0.345 | 0.311 | 0.413 | 0.354 |
a002784 | 0.371 | 0.432 | 0.280 | 0.347 | 0.281 | 0.282 |
a005705 | 0.590 | 0.739 | 0.500 | 0.389 | 0.337 | 0.247 |
a006272 | 0.258 | 0.233 | 0.219 | 0.149 | 0.247 | 0.200 |
a007005 | 0.188 | 0.078 | 0.166 | 0.190 | 0.183 | 0.085 |
a008401 | 0.365 | 0.242 | 0.319 | 0.176 | 0.514 | 0.247 |
a008850 | 0.111 | 0.464 | 0.211 | 0.211 | 0.123 | 0.113 |
a007054 | 0.230 | 0.315 | 0.354 | 0.295 | 0.295 | 0.343 |
a008365 | 0.260 | 0.344 | 0.337 | 0.280 | 0.374 | 0.145 |
b000990 | 0.477 | 0.401 | 0.403 | 0.366 | 0.487 | 0.335 |
b001799 | 0.301 | 0.437 | 0.277 | 0.464 | 0.327 | 0.303 |
b001516 | 0.367 | 0.579 | 0.471 | 0.358 | 0.329 | 0.358 |
b002576 | 0.138 | 0.228 | 0.167 | 0.213 | 0.445 | 0.167 |
b004483 | 0.479 | 0.434 | 0.266 | 0.285 | 0.477 | 0.264 |
b006517 | 0.599 | 0.739 | 0.149 | 0.709 | 0.615 | 0.176 |
b005395 | 0.374 | 0.402 | 0.342 | 0.511 | 0.503 | 0.319 |
b007493 | 0.809 | 0.677 | 0.736 | 0.429 | 0.577 | 0.785 |
b005447 | 0.447 | 0.342 | 0.553 | 0.254 | 0.426 | 0.419 |
b009401 | 0.639 | 0.513 | 0.695 | 0.612 | 0.557 | 0.464 |
b006979 | 0.228 | 0.219 | 0.442 | 0.314 | 0.158 | 0.224 |
b012801 | 0.316 | 0.215 | 0.278 | 0.443 | 0.419 | 0.527 |
b008611 | 0.337 | 0.329 | 0.277 | 0.299 | 0.247 | 0.227 |
b013992 | 0.266 | 0.309 | 0.334 | 0.331 | 0.219 | 0.207 |
b015082 | 0.497 | 0.530 | 0.494 | 0.527 | 0.280 | 0.500 |
b015991 | 0.818 | 0.617 | 0.784 | 0.787 | 0.305 | 0.385 |
b009364 | 0.393 | 0.412 | 0.535 | 0.443 | 0.489 | 0.384 |
a007915 | 0.656 | 0.500 | 0.690 | 0.373 | 0.248 | 0.397 |
a002856 | 0.309 | 0.164 | 0.173 | 0.167 | 0.420 | 0.307 |
a000751 | 0.653 | 0.475 | 0.353 | 0.305 | 0.171 | 0.244 |
a002907 | 0.487 | 0.509 | 0.376 | 0.156 | 0.269 | 0.315 |
a000193 | 0.344 | 0.425 | 0.296 | 0.305 | 0.199 | 0.321 |
b006599 | 0.353 | 0.300 | 0.363 | 0.483 | 0.201 | 0.245 |
b010953 | 0.631 | 0.564 | 0.747 | 0.731 | 0.497 | 0.502 |
a003397 | 0.325 | 0.361 | 0.223 | 0.315 | 0.161 | 0.167 |
a006525 | 0.448 | 0.421 | 0.192 | 0.231 | 0.397 | 0.155 |
b012279 | 0.592 | 0.306 | 0.564 | 0.305 | 0.537 | 0.565 |
a004526 | 0.497 | 0.401 | 0.421 | 0.461 | 0.590 | 0.485 |
b010504 | 0.727 | 0.594 | 0.631 | 0.615 | 0.377 | 0.428 |
b017426 | 0.393 | 0.423 | 0.396 | 0.424 | 0.449 | 0.467 |
b011185 | 0.611 | 0.455 | 0.524 | 0.547 | 0.353 | 0.465 |
b011453 | 0.475 | 0.317 | 0.292 | 0.410 | 0.500 | 0.535 |
b006618 | 0.480 | 0.534 | 0.584 | 0.499 | 0.526 | 0.559 |
b017223 | 0.059 | 0.202 | 0.179 | 0.202 | 0.452 | 0.467 |
a001530 | 0.615 | 0.593 | 0.337 | 0.625 | 0.550 | 0.312 |
b019063 | 0.445 | 0.403 | 0.383 | 0.433 | 0.398 | 0.162 |
b005063 | 0.587 | 0.711 | 0.334 | 0.235 | 0.507 | 0.271 |
a004035 | 0.495 | 0.557 | 0.530 | 0.516 | 0.299 | 0.292 |
a003713 | 0.425 | 0.409 | 0.259 | 0.321 | 0.467 | 0.427 |
b015200 | 0.198 | 0.236 | 0.109 | 0.198 | 0.140 | 0.174 |
a004755 | 0.556 | 0.584 | 0.699 | 0.493 | 0.481 | 0.437 |
b019276 | 0.310 | 0.447 | 0.360 | 0.393 | 0.210 | 0.347 |
b018901 | 0.711 | 0.743 | 0.796 | 0.717 | 0.602 | 0.446 |
b005570 | 0.334 | 0.363 | 0.424 | 0.331 | 0.268 | 0.298 |
b006144 | 0.513 | 0.565 | 0.671 | 0.600 | 0.537 | 0.421 |
b002169 | 0.274 | 0.426 | 0.423 | 0.344 | 0.307 | 0.386 |
b016133 | 0.487 | 0.305 | 0.441 | 0.415 | 0.330 | 0.226 |
Ave. Fine Score: | 0.430 | 0.423 | 0.404 | 0.393 | 0.372 | 0.339 |
PSum
queryID | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
a001528 | 0.400 | 0.300 | 0.333 | 0.433 | 0.300 | 0.233 |
a004667 | 0.367 | 0.400 | 0.467 | 0.467 | 0.300 | 0.333 |
a000518 | 0.033 | 0.167 | 0.200 | 0.267 | 0.133 | 0.267 |
a002693 | 0.700 | 0.467 | 0.300 | 0.500 | 0.367 | 0.467 |
a004830 | 0.300 | 0.267 | 0.233 | 0.267 | 0.400 | 0.300 |
a002784 | 0.433 | 0.467 | 0.333 | 0.433 | 0.300 | 0.267 |
a005705 | 0.633 | 0.800 | 0.500 | 0.367 | 0.267 | 0.233 |
a006272 | 0.167 | 0.133 | 0.100 | 0.000 | 0.067 | 0.100 |
a007005 | 0.167 | 0.067 | 0.167 | 0.167 | 0.167 | 0.133 |
a008401 | 0.267 | 0.133 | 0.233 | 0.067 | 0.567 | 0.167 |
a008850 | 0.033 | 0.433 | 0.200 | 0.133 | 0.067 | 0.067 |
a007054 | 0.200 | 0.267 | 0.333 | 0.267 | 0.233 | 0.400 |
a008365 | 0.167 | 0.300 | 0.300 | 0.233 | 0.367 | 0.067 |
b000990 | 0.567 | 0.433 | 0.400 | 0.367 | 0.533 | 0.333 |
b001799 | 0.367 | 0.500 | 0.267 | 0.500 | 0.367 | 0.367 |
b001516 | 0.367 | 0.633 | 0.467 | 0.300 | 0.333 | 0.300 |
b002576 | 0.100 | 0.233 | 0.200 | 0.233 | 0.433 | 0.133 |
b004483 | 0.533 | 0.433 | 0.200 | 0.233 | 0.533 | 0.200 |
b006517 | 0.633 | 0.833 | 0.067 | 0.767 | 0.667 | 0.133 |
b005395 | 0.467 | 0.500 | 0.433 | 0.633 | 0.600 | 0.400 |
b007493 | 0.900 | 0.733 | 0.867 | 0.533 | 0.567 | 0.900 |
b005447 | 0.433 | 0.400 | 0.667 | 0.167 | 0.467 | 0.433 |
b009401 | 0.733 | 0.533 | 0.733 | 0.700 | 0.667 | 0.567 |
b006979 | 0.200 | 0.200 | 0.433 | 0.300 | 0.133 | 0.200 |
b012801 | 0.267 | 0.100 | 0.267 | 0.400 | 0.433 | 0.567 |
b008611 | 0.267 | 0.300 | 0.233 | 0.200 | 0.200 | 0.133 |
b013992 | 0.300 | 0.267 | 0.267 | 0.233 | 0.200 | 0.067 |
b015082 | 0.533 | 0.633 | 0.567 | 0.567 | 0.267 | 0.633 |
b015991 | 0.967 | 0.733 | 0.900 | 0.967 | 0.300 | 0.333 |
b009364 | 0.333 | 0.300 | 0.500 | 0.367 | 0.433 | 0.267 |
a007915 | 0.733 | 0.567 | 0.800 | 0.300 | 0.267 | 0.367 |
a002856 | 0.300 | 0.067 | 0.067 | 0.133 | 0.467 | 0.333 |
a000751 | 0.733 | 0.433 | 0.267 | 0.200 | 0.067 | 0.100 |
a002907 | 0.367 | 0.500 | 0.367 | 0.100 | 0.200 | 0.300 |
a000193 | 0.367 | 0.433 | 0.267 | 0.233 | 0.100 | 0.233 |
b006599 | 0.167 | 0.133 | 0.167 | 0.300 | 0.100 | 0.167 |
b010953 | 0.767 | 0.600 | 0.800 | 0.900 | 0.500 | 0.633 |
a003397 | 0.300 | 0.400 | 0.200 | 0.267 | 0.133 | 0.100 |
a006525 | 0.467 | 0.400 | 0.167 | 0.200 | 0.400 | 0.067 |
b012279 | 0.600 | 0.233 | 0.600 | 0.233 | 0.533 | 0.567 |
a004526 | 0.433 | 0.333 | 0.333 | 0.433 | 0.567 | 0.500 |
b010504 | 0.767 | 0.633 | 0.700 | 0.667 | 0.333 | 0.367 |
b017426 | 0.400 | 0.533 | 0.433 | 0.467 | 0.467 | 0.533 |
b011185 | 0.533 | 0.333 | 0.467 | 0.500 | 0.233 | 0.433 |
b011453 | 0.433 | 0.233 | 0.200 | 0.367 | 0.433 | 0.433 |
b006618 | 0.533 | 0.567 | 0.633 | 0.533 | 0.500 | 0.600 |
b017223 | 0.033 | 0.100 | 0.133 | 0.167 | 0.500 | 0.500 |
a001530 | 0.733 | 0.633 | 0.367 | 0.667 | 0.600 | 0.333 |
b019063 | 0.500 | 0.433 | 0.367 | 0.400 | 0.367 | 0.100 |
b005063 | 0.567 | 0.700 | 0.367 | 0.167 | 0.533 | 0.233 |
a004035 | 0.500 | 0.567 | 0.633 | 0.533 | 0.167 | 0.233 |
a003713 | 0.400 | 0.333 | 0.067 | 0.267 | 0.433 | 0.367 |
b015200 | 0.167 | 0.267 | 0.067 | 0.200 | 0.100 | 0.200 |
a004755 | 0.467 | 0.533 | 0.600 | 0.367 | 0.400 | 0.333 |
b019276 | 0.300 | 0.500 | 0.333 | 0.467 | 0.133 | 0.333 |
b018901 | 0.700 | 0.667 | 0.800 | 0.700 | 0.600 | 0.367 |
b005570 | 0.333 | 0.367 | 0.367 | 0.333 | 0.200 | 0.200 |
b006144 | 0.433 | 0.533 | 0.667 | 0.600 | 0.433 | 0.333 |
b002169 | 0.167 | 0.400 | 0.467 | 0.333 | 0.233 | 0.367 |
b016133 | 0.467 | 0.267 | 0.433 | 0.333 | 0.300 | 0.167 |
Ave. PSum Score: | 0.425 | 0.411 | 0.388 | 0.374 | 0.349 | 0.313 |
WCsum
queryID | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
a001528 | 0.289 | 0.200 | 0.244 | 0.333 | 0.222 | 0.178 |
a004667 | 0.289 | 0.289 | 0.356 | 0.356 | 0.200 | 0.244 |
a000518 | 0.022 | 0.111 | 0.133 | 0.200 | 0.089 | 0.200 |
a002693 | 0.622 | 0.333 | 0.200 | 0.400 | 0.289 | 0.400 |
a004830 | 0.200 | 0.178 | 0.156 | 0.178 | 0.289 | 0.200 |
a002784 | 0.356 | 0.378 | 0.267 | 0.356 | 0.222 | 0.200 |
a005705 | 0.578 | 0.733 | 0.422 | 0.267 | 0.178 | 0.156 |
a006272 | 0.111 | 0.089 | 0.067 | 0.000 | 0.044 | 0.067 |
a007005 | 0.111 | 0.044 | 0.111 | 0.133 | 0.111 | 0.089 |
a008401 | 0.200 | 0.133 | 0.200 | 0.044 | 0.511 | 0.111 |
a008850 | 0.022 | 0.333 | 0.133 | 0.089 | 0.044 | 0.044 |
a007054 | 0.156 | 0.222 | 0.244 | 0.200 | 0.156 | 0.311 |
a008365 | 0.111 | 0.200 | 0.200 | 0.178 | 0.289 | 0.044 |
b000990 | 0.533 | 0.356 | 0.311 | 0.289 | 0.467 | 0.244 |
b001799 | 0.267 | 0.400 | 0.178 | 0.378 | 0.289 | 0.244 |
b001516 | 0.311 | 0.600 | 0.422 | 0.267 | 0.311 | 0.267 |
b002576 | 0.067 | 0.156 | 0.133 | 0.156 | 0.356 | 0.111 |
b004483 | 0.422 | 0.311 | 0.156 | 0.200 | 0.444 | 0.133 |
b006517 | 0.556 | 0.800 | 0.044 | 0.711 | 0.622 | 0.111 |
b005395 | 0.378 | 0.422 | 0.356 | 0.556 | 0.533 | 0.311 |
b007493 | 0.867 | 0.711 | 0.822 | 0.467 | 0.511 | 0.867 |
b005447 | 0.378 | 0.356 | 0.600 | 0.133 | 0.400 | 0.378 |
b009401 | 0.644 | 0.400 | 0.644 | 0.600 | 0.556 | 0.489 |
b006979 | 0.156 | 0.156 | 0.378 | 0.267 | 0.111 | 0.178 |
b012801 | 0.200 | 0.067 | 0.200 | 0.311 | 0.289 | 0.467 |
b008611 | 0.200 | 0.222 | 0.200 | 0.133 | 0.133 | 0.089 |
b013992 | 0.200 | 0.178 | 0.178 | 0.156 | 0.133 | 0.044 |
b015082 | 0.489 | 0.556 | 0.467 | 0.467 | 0.222 | 0.578 |
b015991 | 0.956 | 0.667 | 0.867 | 0.956 | 0.267 | 0.267 |
b009364 | 0.244 | 0.200 | 0.400 | 0.311 | 0.356 | 0.178 |
a007915 | 0.667 | 0.489 | 0.756 | 0.244 | 0.200 | 0.311 |
a002856 | 0.222 | 0.044 | 0.067 | 0.089 | 0.356 | 0.244 |
a000751 | 0.689 | 0.422 | 0.200 | 0.133 | 0.044 | 0.067 |
a002907 | 0.311 | 0.444 | 0.311 | 0.067 | 0.133 | 0.222 |
a000193 | 0.311 | 0.378 | 0.222 | 0.178 | 0.067 | 0.178 |
b006599 | 0.111 | 0.089 | 0.111 | 0.244 | 0.067 | 0.111 |
b010953 | 0.711 | 0.511 | 0.756 | 0.867 | 0.422 | 0.578 |
a003397 | 0.222 | 0.333 | 0.156 | 0.222 | 0.111 | 0.067 |
a006525 | 0.444 | 0.378 | 0.133 | 0.200 | 0.378 | 0.044 |
b012279 | 0.533 | 0.200 | 0.556 | 0.200 | 0.467 | 0.489 |
a004526 | 0.356 | 0.244 | 0.289 | 0.378 | 0.467 | 0.378 |
b010504 | 0.689 | 0.533 | 0.644 | 0.622 | 0.289 | 0.289 |
b017426 | 0.333 | 0.444 | 0.311 | 0.378 | 0.356 | 0.444 |
b011185 | 0.422 | 0.267 | 0.333 | 0.378 | 0.178 | 0.311 |
b011453 | 0.356 | 0.178 | 0.133 | 0.289 | 0.311 | 0.356 |
b006618 | 0.422 | 0.511 | 0.556 | 0.444 | 0.378 | 0.511 |
b017223 | 0.022 | 0.067 | 0.111 | 0.111 | 0.422 | 0.422 |
a001530 | 0.644 | 0.511 | 0.311 | 0.556 | 0.489 | 0.244 |
b019063 | 0.422 | 0.356 | 0.289 | 0.311 | 0.311 | 0.067 |
b005063 | 0.511 | 0.622 | 0.267 | 0.111 | 0.400 | 0.178 |
a004035 | 0.422 | 0.489 | 0.556 | 0.422 | 0.111 | 0.178 |
a003713 | 0.356 | 0.267 | 0.044 | 0.200 | 0.356 | 0.267 |
b015200 | 0.111 | 0.200 | 0.044 | 0.133 | 0.067 | 0.133 |
a004755 | 0.378 | 0.444 | 0.489 | 0.244 | 0.311 | 0.267 |
b019276 | 0.222 | 0.378 | 0.267 | 0.378 | 0.089 | 0.222 |
b018901 | 0.600 | 0.556 | 0.733 | 0.600 | 0.533 | 0.244 |
b005570 | 0.244 | 0.289 | 0.289 | 0.222 | 0.133 | 0.133 |
b006144 | 0.333 | 0.422 | 0.578 | 0.533 | 0.356 | 0.289 |
b002169 | 0.111 | 0.378 | 0.378 | 0.244 | 0.156 | 0.311 |
b016133 | 0.356 | 0.178 | 0.378 | 0.244 | 0.222 | 0.133 |
Ave. WCsum Score: | 0.358 | 0.340 | 0.323 | 0.306 | 0.280 | 0.248 |
SDsum
queryID | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
a001528 | 0.233 | 0.150 | 0.200 | 0.283 | 0.183 | 0.150 |
a004667 | 0.250 | 0.233 | 0.300 | 0.300 | 0.150 | 0.200 |
a000518 | 0.017 | 0.083 | 0.100 | 0.167 | 0.067 | 0.167 |
a002693 | 0.583 | 0.267 | 0.150 | 0.350 | 0.250 | 0.367 |
a004830 | 0.150 | 0.133 | 0.117 | 0.133 | 0.233 | 0.150 |
a002784 | 0.317 | 0.333 | 0.233 | 0.317 | 0.183 | 0.167 |
a005705 | 0.550 | 0.700 | 0.383 | 0.217 | 0.133 | 0.117 |
a006272 | 0.083 | 0.067 | 0.050 | 0.000 | 0.033 | 0.050 |
a007005 | 0.083 | 0.033 | 0.083 | 0.117 | 0.083 | 0.067 |
a008401 | 0.167 | 0.133 | 0.183 | 0.033 | 0.483 | 0.083 |
a008850 | 0.017 | 0.283 | 0.100 | 0.067 | 0.033 | 0.033 |
a007054 | 0.133 | 0.200 | 0.200 | 0.167 | 0.117 | 0.267 |
a008365 | 0.083 | 0.150 | 0.150 | 0.150 | 0.250 | 0.033 |
b000990 | 0.517 | 0.317 | 0.267 | 0.250 | 0.433 | 0.200 |
b001799 | 0.217 | 0.350 | 0.133 | 0.317 | 0.250 | 0.183 |
b001516 | 0.283 | 0.583 | 0.400 | 0.250 | 0.300 | 0.250 |
b002576 | 0.050 | 0.117 | 0.100 | 0.117 | 0.317 | 0.100 |
b004483 | 0.367 | 0.250 | 0.133 | 0.183 | 0.400 | 0.100 |
b006517 | 0.517 | 0.783 | 0.033 | 0.683 | 0.600 | 0.100 |
b005395 | 0.333 | 0.383 | 0.317 | 0.517 | 0.500 | 0.267 |
b007493 | 0.850 | 0.700 | 0.800 | 0.433 | 0.483 | 0.850 |
b005447 | 0.350 | 0.333 | 0.567 | 0.117 | 0.367 | 0.350 |
b009401 | 0.600 | 0.333 | 0.600 | 0.550 | 0.500 | 0.450 |
b006979 | 0.133 | 0.133 | 0.350 | 0.250 | 0.100 | 0.167 |
b012801 | 0.167 | 0.050 | 0.167 | 0.267 | 0.217 | 0.417 |
b008611 | 0.167 | 0.183 | 0.183 | 0.100 | 0.100 | 0.067 |
b013992 | 0.150 | 0.133 | 0.133 | 0.117 | 0.100 | 0.033 |
b015082 | 0.467 | 0.517 | 0.417 | 0.417 | 0.200 | 0.550 |
b015991 | 0.950 | 0.633 | 0.850 | 0.950 | 0.250 | 0.233 |
b009364 | 0.200 | 0.150 | 0.350 | 0.283 | 0.317 | 0.133 |
a007915 | 0.633 | 0.450 | 0.733 | 0.217 | 0.167 | 0.283 |
a002856 | 0.183 | 0.033 | 0.067 | 0.067 | 0.300 | 0.200 |
a000751 | 0.667 | 0.417 | 0.167 | 0.100 | 0.033 | 0.050 |
a002907 | 0.283 | 0.417 | 0.283 | 0.050 | 0.100 | 0.183 |
a000193 | 0.283 | 0.350 | 0.200 | 0.150 | 0.050 | 0.150 |
b006599 | 0.083 | 0.067 | 0.083 | 0.217 | 0.050 | 0.083 |
b010953 | 0.683 | 0.467 | 0.733 | 0.850 | 0.383 | 0.550 |
a003397 | 0.183 | 0.300 | 0.133 | 0.200 | 0.100 | 0.050 |
a006525 | 0.433 | 0.367 | 0.117 | 0.200 | 0.367 | 0.033 |
b012279 | 0.500 | 0.183 | 0.533 | 0.183 | 0.433 | 0.450 |
a004526 | 0.317 | 0.200 | 0.267 | 0.350 | 0.417 | 0.317 |
b010504 | 0.650 | 0.483 | 0.617 | 0.600 | 0.267 | 0.250 |
b017426 | 0.300 | 0.400 | 0.250 | 0.333 | 0.300 | 0.400 |
b011185 | 0.367 | 0.233 | 0.267 | 0.317 | 0.150 | 0.250 |
b011453 | 0.317 | 0.150 | 0.100 | 0.250 | 0.250 | 0.317 |
b006618 | 0.367 | 0.483 | 0.517 | 0.400 | 0.317 | 0.467 |
b017223 | 0.017 | 0.050 | 0.100 | 0.083 | 0.383 | 0.383 |
a001530 | 0.600 | 0.450 | 0.283 | 0.500 | 0.433 | 0.200 |
b019063 | 0.383 | 0.317 | 0.250 | 0.267 | 0.283 | 0.050 |
b005063 | 0.483 | 0.583 | 0.217 | 0.083 | 0.333 | 0.150 |
a004035 | 0.383 | 0.450 | 0.517 | 0.367 | 0.083 | 0.150 |
a003713 | 0.333 | 0.233 | 0.033 | 0.167 | 0.317 | 0.217 |
b015200 | 0.083 | 0.167 | 0.033 | 0.100 | 0.050 | 0.100 |
a004755 | 0.333 | 0.400 | 0.433 | 0.183 | 0.267 | 0.233 |
b019276 | 0.183 | 0.317 | 0.233 | 0.333 | 0.067 | 0.167 |
b018901 | 0.550 | 0.500 | 0.700 | 0.550 | 0.500 | 0.183 |
b005570 | 0.200 | 0.250 | 0.250 | 0.167 | 0.100 | 0.100 |
b006144 | 0.283 | 0.367 | 0.533 | 0.500 | 0.317 | 0.267 |
b002169 | 0.083 | 0.367 | 0.333 | 0.200 | 0.117 | 0.283 |
b016133 | 0.300 | 0.133 | 0.350 | 0.200 | 0.183 | 0.117 |
Ave. SDsum Score: | 0.324 | 0.305 | 0.290 | 0.271 | 0.246 | 0.216 |
Greater0
queryID | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
a001528 | 0.733 | 0.600 | 0.600 | 0.733 | 0.533 | 0.400 |
a004667 | 0.600 | 0.733 | 0.800 | 0.800 | 0.600 | 0.600 |
a000518 | 0.067 | 0.333 | 0.400 | 0.467 | 0.267 | 0.467 |
a002693 | 0.933 | 0.867 | 0.600 | 0.800 | 0.600 | 0.667 |
a004830 | 0.600 | 0.533 | 0.467 | 0.533 | 0.733 | 0.600 |
a002784 | 0.667 | 0.733 | 0.533 | 0.667 | 0.533 | 0.467 |
a005705 | 0.800 | 1.000 | 0.733 | 0.667 | 0.533 | 0.467 |
a006272 | 0.333 | 0.267 | 0.200 | 0.000 | 0.133 | 0.200 |
a007005 | 0.333 | 0.133 | 0.333 | 0.267 | 0.333 | 0.267 |
a008401 | 0.467 | 0.133 | 0.333 | 0.133 | 0.733 | 0.333 |
a008850 | 0.067 | 0.733 | 0.400 | 0.267 | 0.133 | 0.133 |
a007054 | 0.333 | 0.400 | 0.600 | 0.467 | 0.467 | 0.667 |
a008365 | 0.333 | 0.600 | 0.600 | 0.400 | 0.600 | 0.133 |
b000990 | 0.667 | 0.667 | 0.667 | 0.600 | 0.733 | 0.600 |
b001799 | 0.667 | 0.800 | 0.533 | 0.867 | 0.600 | 0.733 |
b001516 | 0.533 | 0.733 | 0.600 | 0.400 | 0.400 | 0.400 |
b002576 | 0.200 | 0.467 | 0.400 | 0.467 | 0.667 | 0.200 |
b004483 | 0.867 | 0.800 | 0.333 | 0.333 | 0.800 | 0.400 |
b006517 | 0.867 | 0.933 | 0.133 | 0.933 | 0.800 | 0.200 |
b005395 | 0.733 | 0.733 | 0.667 | 0.867 | 0.800 | 0.667 |
b007493 | 1.000 | 0.800 | 1.000 | 0.733 | 0.733 | 1.000 |
b005447 | 0.600 | 0.533 | 0.867 | 0.267 | 0.667 | 0.600 |
b009401 | 1.000 | 0.933 | 1.000 | 1.000 | 1.000 | 0.800 |
b006979 | 0.333 | 0.333 | 0.600 | 0.400 | 0.200 | 0.267 |
b012801 | 0.467 | 0.200 | 0.467 | 0.667 | 0.867 | 0.867 |
b008611 | 0.467 | 0.533 | 0.333 | 0.400 | 0.400 | 0.267 |
b013992 | 0.600 | 0.533 | 0.533 | 0.467 | 0.400 | 0.133 |
b015082 | 0.667 | 0.867 | 0.867 | 0.867 | 0.400 | 0.800 |
b015991 | 1.000 | 0.933 | 1.000 | 1.000 | 0.400 | 0.533 |
b009364 | 0.600 | 0.600 | 0.800 | 0.533 | 0.667 | 0.533 |
a007915 | 0.933 | 0.800 | 0.933 | 0.467 | 0.467 | 0.533 |
a002856 | 0.533 | 0.133 | 0.067 | 0.267 | 0.800 | 0.600 |
a000751 | 0.867 | 0.467 | 0.467 | 0.400 | 0.133 | 0.200 |
a002907 | 0.533 | 0.667 | 0.533 | 0.200 | 0.400 | 0.533 |
a000193 | 0.533 | 0.600 | 0.400 | 0.400 | 0.200 | 0.400 |
b006599 | 0.333 | 0.267 | 0.333 | 0.467 | 0.200 | 0.333 |
b010953 | 0.933 | 0.867 | 0.933 | 1.000 | 0.733 | 0.800 |
a003397 | 0.533 | 0.600 | 0.333 | 0.400 | 0.200 | 0.200 |
a006525 | 0.533 | 0.467 | 0.267 | 0.200 | 0.467 | 0.133 |
b012279 | 0.800 | 0.333 | 0.733 | 0.333 | 0.733 | 0.800 |
a004526 | 0.667 | 0.600 | 0.467 | 0.600 | 0.867 | 0.867 |
b010504 | 1.000 | 0.933 | 0.867 | 0.800 | 0.467 | 0.600 |
b017426 | 0.600 | 0.800 | 0.800 | 0.733 | 0.800 | 0.800 |
b011185 | 0.867 | 0.533 | 0.867 | 0.867 | 0.400 | 0.800 |
b011453 | 0.667 | 0.400 | 0.400 | 0.600 | 0.800 | 0.667 |
b006618 | 0.867 | 0.733 | 0.867 | 0.800 | 0.867 | 0.867 |
b017223 | 0.067 | 0.200 | 0.200 | 0.333 | 0.733 | 0.733 |
a001530 | 1.000 | 1.000 | 0.533 | 1.000 | 0.933 | 0.600 |
b019063 | 0.733 | 0.667 | 0.600 | 0.667 | 0.533 | 0.200 |
b005063 | 0.733 | 0.933 | 0.667 | 0.333 | 0.933 | 0.400 |
a004035 | 0.733 | 0.800 | 0.867 | 0.867 | 0.333 | 0.400 |
a003713 | 0.533 | 0.533 | 0.133 | 0.467 | 0.667 | 0.667 |
b015200 | 0.333 | 0.467 | 0.133 | 0.400 | 0.200 | 0.400 |
a004755 | 0.733 | 0.800 | 0.933 | 0.733 | 0.667 | 0.533 |
b019276 | 0.533 | 0.867 | 0.533 | 0.733 | 0.267 | 0.667 |
b018901 | 1.000 | 1.000 | 1.000 | 1.000 | 0.800 | 0.733 |
b005570 | 0.600 | 0.600 | 0.600 | 0.667 | 0.400 | 0.400 |
b006144 | 0.733 | 0.867 | 0.933 | 0.800 | 0.667 | 0.467 |
b002169 | 0.333 | 0.467 | 0.733 | 0.600 | 0.467 | 0.533 |
b016133 | 0.800 | 0.533 | 0.600 | 0.600 | 0.533 | 0.267 |
Ave. Greater0 Score: | 0.627 | 0.623 | 0.586 | 0.579 | 0.557 | 0.509 |
Greater1
queryID | EP | TP | VS | LR | KWT | KWL |
---|---|---|---|---|---|---|
a001528 | 0.067 | 0.000 | 0.067 | 0.133 | 0.067 | 0.067 |
a004667 | 0.133 | 0.067 | 0.133 | 0.133 | 0.000 | 0.067 |
a000518 | 0.000 | 0.000 | 0.000 | 0.067 | 0.000 | 0.067 |
a002693 | 0.467 | 0.067 | 0.000 | 0.200 | 0.133 | 0.267 |
a004830 | 0.000 | 0.000 | 0.000 | 0.000 | 0.067 | 0.000 |
a002784 | 0.200 | 0.200 | 0.133 | 0.200 | 0.067 | 0.067 |
a005705 | 0.467 | 0.600 | 0.267 | 0.067 | 0.000 | 0.000 |
a006272 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
a007005 | 0.000 | 0.000 | 0.000 | 0.067 | 0.000 | 0.000 |
a008401 | 0.067 | 0.133 | 0.133 | 0.000 | 0.400 | 0.000 |
a008850 | 0.000 | 0.133 | 0.000 | 0.000 | 0.000 | 0.000 |
a007054 | 0.067 | 0.133 | 0.067 | 0.067 | 0.000 | 0.133 |
a008365 | 0.000 | 0.000 | 0.000 | 0.067 | 0.133 | 0.000 |
b000990 | 0.467 | 0.200 | 0.133 | 0.133 | 0.333 | 0.067 |
b001799 | 0.067 | 0.200 | 0.000 | 0.133 | 0.133 | 0.000 |
b001516 | 0.200 | 0.533 | 0.333 | 0.200 | 0.267 | 0.200 |
b002576 | 0.000 | 0.000 | 0.000 | 0.000 | 0.200 | 0.067 |
b004483 | 0.200 | 0.067 | 0.067 | 0.133 | 0.267 | 0.000 |
b006517 | 0.400 | 0.733 | 0.000 | 0.600 | 0.533 | 0.067 |
b005395 | 0.200 | 0.267 | 0.200 | 0.400 | 0.400 | 0.133 |
b007493 | 0.800 | 0.667 | 0.733 | 0.333 | 0.400 | 0.800 |
b005447 | 0.267 | 0.267 | 0.467 | 0.067 | 0.267 | 0.267 |
b009401 | 0.467 | 0.133 | 0.467 | 0.400 | 0.333 | 0.333 |
b006979 | 0.067 | 0.067 | 0.267 | 0.200 | 0.067 | 0.133 |
b012801 | 0.067 | 0.000 | 0.067 | 0.133 | 0.000 | 0.267 |
b008611 | 0.067 | 0.067 | 0.133 | 0.000 | 0.000 | 0.000 |
b013992 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
b015082 | 0.400 | 0.400 | 0.267 | 0.267 | 0.133 | 0.467 |
b015991 | 0.933 | 0.533 | 0.800 | 0.933 | 0.200 | 0.133 |
b009364 | 0.067 | 0.000 | 0.200 | 0.200 | 0.200 | 0.000 |
a007915 | 0.533 | 0.333 | 0.667 | 0.133 | 0.067 | 0.200 |
a002856 | 0.067 | 0.000 | 0.067 | 0.000 | 0.133 | 0.067 |
a000751 | 0.600 | 0.400 | 0.067 | 0.000 | 0.000 | 0.000 |
a002907 | 0.200 | 0.333 | 0.200 | 0.000 | 0.000 | 0.067 |
a000193 | 0.200 | 0.267 | 0.133 | 0.067 | 0.000 | 0.067 |
b006599 | 0.000 | 0.000 | 0.000 | 0.133 | 0.000 | 0.000 |
b010953 | 0.600 | 0.333 | 0.667 | 0.800 | 0.267 | 0.467 |
a003397 | 0.067 | 0.200 | 0.067 | 0.133 | 0.067 | 0.000 |
a006525 | 0.400 | 0.333 | 0.067 | 0.200 | 0.333 | 0.000 |
b012279 | 0.400 | 0.133 | 0.467 | 0.133 | 0.333 | 0.333 |
a004526 | 0.200 | 0.067 | 0.200 | 0.267 | 0.267 | 0.133 |
b010504 | 0.533 | 0.333 | 0.533 | 0.533 | 0.200 | 0.133 |
b017426 | 0.200 | 0.267 | 0.067 | 0.200 | 0.133 | 0.267 |
b011185 | 0.200 | 0.133 | 0.067 | 0.133 | 0.067 | 0.067 |
b011453 | 0.200 | 0.067 | 0.000 | 0.133 | 0.067 | 0.200 |
b006618 | 0.200 | 0.400 | 0.400 | 0.267 | 0.133 | 0.333 |
b017223 | 0.000 | 0.000 | 0.067 | 0.000 | 0.267 | 0.267 |
a001530 | 0.467 | 0.267 | 0.200 | 0.333 | 0.267 | 0.067 |
b019063 | 0.267 | 0.200 | 0.133 | 0.133 | 0.200 | 0.000 |
b005063 | 0.400 | 0.467 | 0.067 | 0.000 | 0.133 | 0.067 |
a004035 | 0.267 | 0.333 | 0.400 | 0.200 | 0.000 | 0.067 |
a003713 | 0.267 | 0.133 | 0.000 | 0.067 | 0.200 | 0.067 |
b015200 | 0.000 | 0.067 | 0.000 | 0.000 | 0.000 | 0.000 |
a004755 | 0.200 | 0.267 | 0.267 | 0.000 | 0.133 | 0.133 |
b019276 | 0.067 | 0.133 | 0.133 | 0.200 | 0.000 | 0.000 |
b018901 | 0.400 | 0.333 | 0.600 | 0.400 | 0.400 | 0.000 |
b005570 | 0.067 | 0.133 | 0.133 | 0.000 | 0.000 | 0.000 |
b006144 | 0.133 | 0.200 | 0.400 | 0.400 | 0.200 | 0.200 |
b002169 | 0.000 | 0.333 | 0.200 | 0.067 | 0.000 | 0.200 |
b016133 | 0.133 | 0.000 | 0.267 | 0.067 | 0.067 | 0.067 |
Ave. Greater1 Score: | 0.223 | 0.199 | 0.191 | 0.169 | 0.142 | 0.118 |
Raw Scores
The raw data derived from the Evalutron 6000 human evaluations are located on the 2006:Audio Music Similarity and Retrieval Raw Data page.
Query Meta Data
queryID | artist | genre |
---|---|---|
a001528 | Xpression | Jazz |
a004667 | The Tony Rich Project | R&B |
a000518 | Junior C. | Reggae |
a002693 | B.J. Thomas | Country |
a004830 | Luciano & Co | Reggae |
a002784 | Elton John | Rock |
a005705 | Jessica | R&B |
a006272 | Orlando Barroso | Latin |
a007005 | Big Time Operator | Jazz |
a008401 | Prince Malachi | Reggae |
a008850 | Elida y Avante | Latin |
a007054 | Profyle | R&B |
a008365 | Barbara Sfraga | Jazz |
b000990 | Guns N' Roses | Rock |
b001799 | Enya | New Age |
b001516 | Britney Spears | Rock |
b002576 | Depeche Mode | Rock |
b004483 | Elvis Costello | Rock |
b006517 | Paul Van Dyk | Electronica & Dance |
b005395 | Ozzy Osbourne | Rock |
b007493 | Eminem | Rap & Hip Hop |
b005447 | Mudvayne | Rock |
b009401 | Ja Rule | Rap & Hip Hop |
b006979 | Cat Stevens | Rock |
b012801 | The Chemical Brothers | Electronica & Dance |
b008611 | The Cranberries | Rock |
b013992 | Enigma | New Age |
b015082 | DMX | Rap & Hip Hop |
b015991 | Tim McGraw | Country |
b009364 | Bon Jovi | Rock |
a007915 | Victor Sanz | Country |
a002856 | Atomic Babies | Electronica & Dance |
a000751 | Brian Hughes | Jazz |
a002907 | Gary Meek | Jazz |
a000193 | Mercurio | Latin |
b006599 | Selena | Latin |
b010953 | Jessica Andrews | Country |
a003397 | Roy Davis Jr. | Electronica & Dance |
a006525 | Wind Machine | New Age |
b012279 | OutKast | Rap & Hip Hop |
a004526 | Shannon | R&B |
b010504 | LL Cool J | Rap & Hip Hop |
b017426 | Shaggy | Reggae |
b011185 | Sting | Rock |
b011453 | Neil Young | Rock |
b006618 | Foo Fighters | Rock |
b017223 | Nirvana | Rock |
a001530 | Mötley Crüe | Rock |
b019063 | Smashing Pumpkins | Rock |
b005063 | Sublime | Rock |
a004035 | Toy-Box | Electronica & Dance |
a003713 | Brian Bromberg | Jazz |
b015200 | Mike Oldfield | New Age |
a004755 | Profyle | R&B |
b019276 | Robbie Williams | Rock |
b018901 | Nelly | Rap & Hip Hop |
b005570 | Everything But the Girl | Rock |
b006144 | Def Leppard | Rock |
b002169 | No Doubt | Rock |
b016133 | Janet Jackson | Rock |
Results from Automatic Evaluation
Statistic | Pohle | Pampalk | Lidy & Rauber | West (Trans) | West (Likely) |
---|---|---|---|---|---|
top20genre% | 60.84% | 60.64% | 56.96% | 53.18% | 47.76% |
top20artist% | 41.32% | 34.73% | 27.73% | 20.68% | 15.85% |
top20album% | 36.57% | 30.54% | 32.16% | 24.72% | 19.42% |
artist-filtered genre% | 58.91% | 60.70% | 56.71% | 54.06% | 49.20% |
mean artist-filtered genre% | 27.27% | 28.27% | 26.01% | 21.62% | 19.56% |
avg dist - genre | 0.6970 | 0.9924 | 0.9524 | 0.9738 | 0.9830 |
avg dist - artist | 0.4244 | 0.9772 | 0.7339 | 0.8734 | 0.6010 |
avg dist - album | 0.3721 | 0.9758 | 0.7205 | 0.8689 | 0.5702 |
triangular inequality | 32.02% | 100.00% | 100.00% | 100.00% | 55.08% |
top20always-sim | 260 | 1928 | 137 | 173 | 90 |
top20never-sim% | 0.0% | 0.0% | 0.0% | 0.0% | 0.0% |
Other Results from Automatic Evaluation
See the 2006:Audio Music Similarity and Retrieval Other Automatic Evaluation Results page.
Introduction to automatic evaluation
Automated evaluation of music similarity techniques based on a metadata catalogue has several advantages:
- It does not require costly human ‘graders’
- It allows testing of incremental changes in indexing algorithms
- It can achieve complete coverage over the test collection
- It provides a target for machine-learning, feature-selection and optimisation experiments
- It can predict the visualisation performance of an indexing technique
- It can identify indexing ‘anomalies’ in the indices tested
Automated ‘pseudo-objective’ evaluation of music similarity estimation techniques was introduced by Logan & Salomon [1] and was shown to be highly correlated with careful human-based evaluations by Pampalk [2]. The results of this contest support the conclusions of Pampalk [2], although further work is required to fully understand the evaluation statistics.
Description of evaluation statistics
The evaluation statistics are:
- Neighbourhood clustering (artist, genre, album): the average % of the top N results for each query in the collection with the same label.
- Artist-filtered genre neighbourhood: the average % of the top N results for each query belonging to the same genre label, ignoring matches from the same artist; this ensures that results reflect musical rather than merely audio similarity (a code sketch follows this list).
- Mean artist-filtered genre neighbourhood: a normalised form of the above statistic that weights each genre equally, penalising lop-sided performance.
- Normalised average distance between examples: the average distance between examples with the same label; indicates the degree of clustering and the potential for visual organisation of a collection.
- Always similar (hubs): the largest number of times an example appears in the top N results for other queries; a result that appears too often will adversely affect performance without affecting the other statistics.
- Never similar (orphans): the % of examples that never appear in a top N result list and so cannot be retrieved by search.
- Triangular inequality (metric space): indicates whether the function produces a metric distance space and therefore which visualisation techniques may be applied to it.
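The following MATLAB sketch shows one way the artist-filtered genre neighbourhood might be computed; dist, genre and artist are assumed names, and N = 20 matches the top-20 statistics reported above.

```matlab
% Hypothetical sketch of the artist-filtered genre neighbourhood. Assumed
% inputs: dist (full distance matrix), genre and artist (cell arrays of
% labels, one entry per track).
N = 20;
nTracks = size(dist, 1);
hits = zeros(nTracks, 1);
for q = 1:nTracks
    [~, order] = sort(dist(q, :), 'ascend');          % rank by distance
    order = order(order ~= q & ~strcmp(artist(order), artist{q}));
    topN = order(1:N);                                % artist-filtered top N
    hits(q) = mean(strcmp(genre(topN), genre{q}));    % fraction same-genre
end
afGenre = 100 * mean(hits);   % average % over every query in the collection
```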
Normalisation
Each of the neighbourhood statistics described above has been normalised by the number of examples of each class (a genre, album or artist) available in the test database. For example, if the collection contained 20 tracks by a particular artist and a particular system retrieved 10 of those examples in its top 50 results, it would normally achieve an artist neighbourhood score of 20% (10 of 50 results), while the normalised form of the metric would report a score of 50% (10 of the 20 available matches were retrieved). Such normalisation is intended to avoid bias introduced into the results by the skewed distribution of the examples according to each label set.
The mean artist-filtered genre neighbourhood is a normalised form of the artist-filtered genre neighbourhood metric which gives equal weight to the performance of a system on each genre class. This version of the statistic is intended to match the prior probabilities, or distribution of examples according to genre labels, used as queries in the human listening test (where an equal number of examples from each class was selected by stratified random sampling), rather than the distribution of examples appearing in the database.
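Here is a sketch of the class-normalised variant, reusing hits and genre from the previous sketch; averaging within each genre first, then across genres, gives every genre equal weight regardless of how many examples it has in the database.

```matlab
% Hypothetical sketch of the mean (class-normalised) artist-filtered genre
% neighbourhood, reusing hits and genre from the previous sketch.
[classes, ~, idx] = unique(genre);             % map each track to its genre
perClass = accumarray(idx, hits, [], @mean);   % mean score within each genre
meanAFGenre = 100 * mean(perClass);            % unweighted mean over genres
```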
Music-similarity evaluation issues
Care must be taken with all evaluations of audio music similarity estimation techniques, as there is great potential in these experiments for over-fitting and for producing over-optimistic estimates of a system's performance on novel test data.
The metadata catalog used to conduct automated evaluations should be as accurate as possible. However, the technique seems relatively robust to a degree of noise in the catalogue, perhaps due to its coarse granularity.
Small test collections do not allow us to accurately predict performance on larger test collections. For example:
- Indexing anomalies (‘hubs’ and ‘orphans’) cannot yet be understood:
  - a single ‘hub’ was found in the results of one system, appearing in nearly 2/5 of result lists;
  - removing this one example from the collection of 5000 tracks makes it appear that the system does not suffer from indexing anomalies.
- What will be the number and coverage of ‘hubs’ in a 100,000 song DB?
Directions for further work on evaluating audio music similarity
- Establish whether the stratified sampling used in the human evaluations is optimal for producing results that reflect human perception of the quality of music indexes, or whether the database should be sampled randomly.
  - This will influence the selection of a statistic for use in automated evaluations or optimisation experiments (the artist-filtered genre or the mean artist-filtered genre).
- Explain the indexing anomalies in some techniques.
- Determine a safe minimum size for a test collection to be used to predict performance on an ‘industrial-sized’ collection.
- Establish the optimal granularity, or range of granularities, for a genre catalogue to be used in this type of evaluation (8, 32 or 256 classes?) and integrate a confusion-cost matrix to reduce the penalisation of confusion between similar genres of music (e.g. Punk and Heavy Metal) relative to confusion between highly dissimilar genres (e.g. Classical and Heavy Metal).
Evaluation Tools in Music-2-Knowledge (M2K)
The tools used to produce the evaluation statistics for MIREX 2006 will be released as part of M2K 1.2 (forthcoming). These tools provide services to:
- import collection metadata and distance matrices
- generate a stratified query set
- extract artist-filtered results (for use in human evaluation exps)
- calculate any of the evaluation statistics described above.
These tools may be used on the command line by implementing the MIREX distance matrix file format, with M2K in the Data-2-Knowledge toolkit (D2K), or integrated into existing Java code with the new M2K API.
To obtain a copy of the evaluation tools prior to the M2K 1.2 release, contact Kris West.
Comments
The evaluation statistics for the MIREX 2006 Audio Music Similarity contest seem to support the contention that genre, artist and artist-filtered genre neighbourhood statistics are correlated with human perception of the performance of music similarity estimators, as they all reproduce the ranking produced by the human evaluation. However, the differences between systems in that evaluation are not statistically significant, so no firm conclusion can be drawn. Average distance statistics produce a different ranking, but they are intended to correlate with visualisation performance rather than search. Kriswest
A statistic for evaluation and use in selection & optimization experiments
As each statistic was found to be correlated with the results of the listening test, any of them *may* be used to evaluate performance and to guide model optimisation or feature selection/weighting experiments. However, unfiltered genre and artist identification statistics are known to allow overfitting and to produce over-optimistic performance estimates. In a model optimisation or feature selection experiment these statistics are more likely to indicate Audio-similarity performance than actual Music-similarity performance, and may lead to the selection of sub-optimal features or models. The artist-filtered genre neighbourhood can be used to avoid this effect.
The results from MIREX 2006 do not show a significant drop in performance using the artist-filtered genre statistic, as would normally be expected. This may be due to the excessively skewed distribution of examples in the database (roughly 50% of examples are labelled as Rock/Pop, while a further 25% are Rap & Hip-Hop); hence, the difference between the results produced and the random baseline is not well emphasised. Normalising this statistic by the prior probabilities of examples in the database (taking the mean of the diagonal of the artist-filtered genre confusion matrix) equally weights the contribution of each class to the final statistic and prevents performance on a single class from dominating the statistic. This normalised statistic shows a drastic reduction in the performance estimates for each system and increases the relative distance between the systems in the evaluation. Kriswest
References
1. Logan and Salomon (ICME 2001), A Music Similarity Function Based On Signal Analysis. http://gatekeeper.research.compaq.com/pub/compaq/CRL/publications/logan/icme2001_logan.pdf
One of the first papers on this topic. Reports a small-scale listening test (2 users) in which items in playlists are rated as similar or not similar to the query song. In addition, automatic evaluation is reported: the percentage of the top 5, 10 and 20 most similar songs in the same genre/artist/album as the query.
2. E. Pampalk, Computational Models of Music Similarity and their Application in Music Information Retrieval. PhD thesis, Vienna University of Technology, Austria, March 2006. http://www.ofai.at/~elias.pampalk/publications/pampalk06thesis.pdf