2011:Symbolic Melodic Similarity Results

Introduction

These are the results for the 2011 running of the Symbolic Melodic Similarity task set. For background information about this task set, please refer to the 2011:Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar songs drawn from the Essen Collection (5,274 pieces in MIDI format; see the ESAC Data Homepage for more information). For each query, we made four classes of error-mutations, so the query set comprises the following classes (a rough sketch of these mutations appears after the list):

  • 0. No errors
  • 1. One note deleted
  • 2. One note inserted
  • 3. One interval enlarged
  • 4. One interval compressed
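
The exact mutation procedure was applied by the task organizers and is not specified on this page; the following is only a minimal Python sketch of what the four error classes could look like, assuming a melody is represented as a list of MIDI pitch numbers and the mutation position is chosen at random.

  import random

  def mutate(pitches, kind, seed=0):
      """Illustrative sketch of the four error classes (not the official procedure).

      kind: 1 = one note deleted, 2 = one note inserted,
            3 = one interval enlarged, 4 = one interval compressed.
      """
      rng = random.Random(seed)
      melody = list(pitches)
      i = rng.randrange(1, len(melody))              # note whose preceding interval is mutated
      if kind == 1:                                  # delete one note
          del melody[i]
      elif kind == 2:                                # insert one extra note
          melody.insert(i, melody[i - 1] + rng.choice([-2, -1, 1, 2]))
      else:                                          # widen or narrow one interval
          interval = melody[i] - melody[i - 1]
          direction = 1 if interval >= 0 else -1
          step = direction if kind == 3 else -direction
          if kind == 4 and interval == 0:
              step = 0                               # a unison cannot be compressed
          melody[i:] = [p + step for p in melody[i:]]  # shift the tail so later intervals are preserved
      return melody

  perfect = [60, 62, 64, 65, 67, 65, 64, 62, 60]     # hypothetical perfect query
  mutated = {kind: mutate(perfect, kind) for kind in (1, 2, 3, 4)}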

For each query (and its 4 mutations), the returned results (candidates) from all systems were grouped together into a query set for evaluation by the human graders. The graders heard only the perfect version of the query against which to evaluate the candidates, and they did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by one individual grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: one categorical score with three categories (NS, SS, VS, as explained below) and one fine score (in the range from 0 to 100).

Evalutron 6000 Summary Data

Number of evaluators = 6
Number of evaluations per query/candidate pair = 1
Number of queries per grader = 1
Total number of candidates returned = 3900
Total number of unique query/candidate pairs graded = 895
Average number of query/candidate pairs evaluated per grader = 149
Number of queries = 30 (6 perfect queries, each error-mutated 4 different ways: 6 × 5 = 30)

General Legend

Sub code | Submission name | Abstract | Contributors
KW1 | Tir'a'Mir - melodic cosine | PDF | Jacek Wolkowicz, Vlado Keselj
LJY1 | LEE1 | PDF | Juwan Lee, Seokhwan Jo, Chang D. Yoo
LJY2 | LEE2 | PDF | Juwan Lee, Seokhwan Jo, Chang D. Yoo
UL1 | Shape | PDF | Julián Urbano, Juan Lloréns, Jorge Morato
UL2 | Pitch | PDF | Julián Urbano, Juan Lloréns, Jorge Morato
UL3 | Time | PDF | Julián Urbano, Juan Lloréns, Jorge Morato
WK1 | Tir'a'Mir - binary | PDF | Jacek Wolkowicz, Vlado Keselj
WK2 | Tir'a'Mir - melodic cosine | PDF | Jacek Wolkowicz, Vlado Keselj
WK3 | Tir'a'Mir - cosine combine | PDF | Jacek Wolkowicz, Vlado Keselj
WK4 | Tir'a'Mir - tfidf | PDF | Jacek Wolkowicz, Vlado Keselj
WK5 | Tir'a'Mir - melodic bm25 | PDF | Jacek Wolkowicz, Vlado Keselj
WK6 | Tir'a'Mir - bm25 combine | PDF | Jacek Wolkowicz, Vlado Keselj

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
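
ADR and NRGB are rank-list measures that compare a system's ordering against a partially ordered ground-truth list; their full definitions are outside the scope of this page. AP and PND are the standard ranked-retrieval measures, and a minimal Python sketch of how they could be computed for one query (assuming a ranked list of returned document IDs and a set of relevant IDs) looks like this:

  def average_precision(ranked_ids, relevant):
      """Non-interpolated Average Precision for a single query."""
      hits, precision_sum = 0, 0.0
      for rank, doc_id in enumerate(ranked_ids, start=1):
          if doc_id in relevant:
              hits += 1
              precision_sum += hits / rank          # precision at each relevant hit
      return precision_sum / len(relevant) if relevant else 0.0

  def precision_at_n(ranked_ids, relevant, n=10):
      """Precision at N Documents (each system returned 10 candidates in this task)."""
      return sum(1 for doc_id in ranked_ids[:n] if doc_id in relevant) / n

  # Hypothetical ranked list and relevance set for one query.
  ranked = ["E0123", "E0456", "E0789", "E0111", "E0222",
            "E0333", "E0444", "E0555", "E0666", "E0777"]
  relevant = {"E0123", "E0789", "E0444", "E0999"}
  print(average_precision(ranked, relevant), precision_at_n(ranked, relevant))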

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-100).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgment).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgment using only Very Similar).

(1) Normalized to the range 0 to 1.
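
The exact normalization is not spelled out on this page; a plausible reading, sketched below in Python, is that each measure is summed over a query's candidates and divided by the maximum attainable sum (e.g. 2 per candidate for PSum, 100 per candidate for Fine), which places every measure in the range 0 to 1.

  # Minimal sketch of the summary measures above, assuming one list of broad
  # judgments ('NS'/'SS'/'VS') and one list of fine scores (0-100) per query,
  # and assuming normalization by the maximum attainable sum.
  BROAD_MAPS = {
      "PSum":     {"NS": 0, "SS": 1, "VS": 2},
      "WCsum":    {"NS": 0, "SS": 1, "VS": 3},
      "SDsum":    {"NS": 0, "SS": 1, "VS": 4},
      "Greater0": {"NS": 0, "SS": 1, "VS": 1},
      "Greater1": {"NS": 0, "SS": 0, "VS": 1},
  }

  def summary_measures(broad, fine):
      """Return the six normalized summary measures for one query's candidate list."""
      n = len(broad)
      scores = {"Fine": sum(fine) / (100.0 * n)}
      for name, mapping in BROAD_MAPS.items():
          scores[name] = sum(mapping[b] for b in broad) / (max(mapping.values()) * n)
      return scores

  # Hypothetical judgments for the 10 candidates returned for one query.
  broad = ["VS", "VS", "SS", "NS", "SS", "NS", "NS", "SS", "VS", "NS"]
  fine = [90, 85, 55, 10, 60, 5, 15, 50, 80, 20]
  print(summary_measures(broad, fine))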

Summary Results

Run Times

The run time data file (/nema-raid/www/mirex/results/2011/sms/sms_runtimes.csv) was not found, so no run times are available for this task.

Overall Scores (Includes Perfect and Error Candidates)

SCORE LJY1 LJY2 UL1 UL2 UL3 WK1 WK2 WK3 WK4 WK5 WK6
ADR 0.6446 0.6595 0.6508 0.6752 0.7257 0.6761 0.665 0.6565 0.6504 0.6543 0.6529
NRGB 0.6255 0.6396 0.6269 0.6512 0.6962 0.6522 0.644 0.6393 0.6319 0.6344 0.6332
AP 0.4807 0.498 0.6262 0.6241 0.6122 0.6104 0.581 0.4618 0.5381 0.5371 0.4969
PND 0.4974 0.5281 0.6625 0.6325 0.6211 0.6048 0.5914 0.497 0.5388 0.5519 0.5059
Fine 0.481 0.494 0.594 0.568 0.552 0.515 0.489 0.434 0.462 0.465 0.458
Wcsum 0.407 0.43 0.543 0.519 0.511 0.457 0.434 0.369 0.393 0.392 0.394
Psum 0.463 0.487 0.615 0.575 0.572 0.497 0.477 0.42 0.428 0.428 0.443
Sdsum 0.378 0.402 0.508 0.491 0.481 0.437 0.413 0.343 0.376 0.374 0.37
Greater0 0.633 0.657 0.83 0.743 0.753 0.617 0.603 0.573 0.533 0.537 0.59
Greater1 0.293 0.317 0.4 0.407 0.39 0.377 0.35 0.267 0.323 0.32 0.297

download these results as csv

Scores by Query Error Types

SCORE LJY1 LJY2 UL1 UL2 UL3 WK1 WK2 WK3 WK4 WK5 WK6
ADR 0.667 0.6883 0.6827 0.6983 0.7231 0.6817 0.692 0.6671 0.6781 0.6877 0.6658
NRGB 0.6446 0.6679 0.647 0.6729 0.6874 0.6627 0.6708 0.6577 0.6628 0.672 0.6463
AP 0.4849 0.5255 0.6537 0.6328 0.593 0.5977 0.6006 0.4694 0.5517 0.565 0.5053
PND 0.5143 0.5548 0.6952 0.6619 0.6048 0.6119 0.6119 0.5238 0.5548 0.5952 0.5238
Fine 0.472 0.503 0.631 0.595 0.564 0.522 0.473 0.434 0.42 0.445 0.477
Wcsum 0.4 0.428 0.589 0.533 0.528 0.444 0.411 0.372 0.328 0.35 0.417
Psum 0.458 0.492 0.667 0.6 0.592 0.492 0.458 0.425 0.367 0.392 0.467
Sdsum 0.371 0.396 0.55 0.5 0.496 0.421 0.388 0.346 0.308 0.329 0.392
Greater0 0.633 0.683 0.9 0.8 0.783 0.633 0.6 0.583 0.483 0.517 0.617
Greater1 0.283 0.3 0.433 0.4 0.4 0.35 0.317 0.267 0.25 0.267 0.317

download these results as csv

SCORE LJY1 LJY2 UL1 UL2 UL3 WK1 WK2 WK3 WK4 WK5 WK6
ADR 0.6675 0.6783 0.6247 0.6897 0.7581 0.7047 0.6656 0.6521 0.6417 0.6522 0.6211
NRGB 0.6462 0.6612 0.616 0.6674 0.7331 0.6826 0.6456 0.6312 0.6204 0.6306 0.6067
AP 0.4767 0.4851 0.6124 0.589 0.6036 0.6062 0.5331 0.4238 0.4781 0.4806 0.4584
PND 0.4722 0.4889 0.6444 0.5778 0.6333 0.5889 0.5222 0.4389 0.4722 0.4722 0.4833
Fine 0.472 0.503 0.631 0.595 0.564 0.522 0.473 0.434 0.42 0.445 0.477
Wcsum 0.4 0.428 0.589 0.533 0.528 0.444 0.411 0.372 0.328 0.35 0.417
Psum 0.458 0.492 0.667 0.6 0.592 0.492 0.458 0.425 0.367 0.392 0.467
Sdsum 0.371 0.396 0.55 0.5 0.496 0.421 0.388 0.346 0.308 0.329 0.392
Greater0 0.633 0.683 0.9 0.8 0.783 0.633 0.6 0.583 0.483 0.517 0.617
Greater1 0.283 0.3 0.433 0.4 0.4 0.35 0.317 0.267 0.25 0.267 0.317

download these results as csv

SCORE LJY1 LJY2 UL1 UL2 UL3 WK1 WK2 WK3 WK4 WK5 WK6
ADR 0.5947 0.6054 0.6572 0.6639 0.6978 0.6541 0.6426 0.6391 0.6262 0.6197 0.6524
NRGB 0.5819 0.5842 0.6373 0.6446 0.6645 0.6294 0.6289 0.624 0.6128 0.6136 0.6403
AP 0.4385 0.4497 0.5977 0.596 0.5875 0.5914 0.5511 0.4294 0.5161 0.5011 0.4977
PND 0.481 0.4952 0.6563 0.6119 0.5714 0.5786 0.6063 0.4571 0.5159 0.5048 0.4905
Fine 0.486 0.484 0.604 0.557 0.561 0.503 0.502 0.424 0.458 0.465 0.463
Wcsum 0.411 0.411 0.561 0.528 0.511 0.461 0.444 0.344 0.394 0.389 0.422
Psum 0.475 0.475 0.642 0.583 0.575 0.492 0.483 0.4 0.425 0.425 0.467
Sdsum 0.379 0.379 0.521 0.5 0.479 0.446 0.425 0.317 0.379 0.371 0.4
Greater0 0.667 0.667 0.883 0.75 0.767 0.583 0.6 0.567 0.517 0.533 0.6
Greater1 0.283 0.283 0.4 0.417 0.383 0.4 0.367 0.233 0.333 0.317 0.333

download these results as csv

SCORE LJY1 LJY2 UL1 UL2 UL3 WK1 WK2 WK3 WK4 WK5 WK6
ADR 0.6413 0.6543 0.6379 0.6933 0.7266 0.67 0.6579 0.6504 0.633 0.6416 0.653
NRGB 0.6181 0.6355 0.6119 0.6681 0.7034 0.6417 0.6315 0.6292 0.6199 0.6137 0.6217
AP 0.4774 0.4865 0.6142 0.6444 0.6303 0.5871 0.5831 0.4571 0.5296 0.5097 0.4656
PND 0.4694 0.5139 0.6333 0.6444 0.6333 0.5778 0.5667 0.4778 0.5389 0.5333 0.4611
Fine 0.465 0.473 0.537 0.537 0.556 0.465 0.479 0.417 0.454 0.431 0.404
Wcsum 0.383 0.417 0.478 0.483 0.528 0.4 0.411 0.35 0.406 0.35 0.317
Psum 0.425 0.467 0.533 0.525 0.583 0.433 0.45 0.392 0.433 0.383 0.358
Sdsum 0.363 0.392 0.45 0.463 0.5 0.383 0.392 0.329 0.392 0.333 0.296
Greater0 0.55 0.617 0.7 0.65 0.75 0.533 0.567 0.517 0.517 0.483 0.483
Greater1 0.3 0.317 0.367 0.4 0.417 0.333 0.333 0.267 0.35 0.283 0.233

download these results as csv

SCORE LJY1 LJY2 UL1 UL2 UL3 WK1 WK2 WK3 WK4 WK5 WK6
ADR 0.6526 0.671 0.6515 0.6309 0.7228 0.6698 0.667 0.674 0.6733 0.6699 0.6723
NRGB 0.6369 0.6492 0.622 0.6031 0.6928 0.6448 0.6431 0.6546 0.6436 0.6421 0.6513
AP 0.526 0.543 0.653 0.6585 0.6468 0.6697 0.6372 0.5294 0.615 0.6292 0.5577
PND 0.55 0.5875 0.6833 0.6667 0.6625 0.6667 0.65 0.5875 0.6125 0.6542 0.5708
Fine 0.485 0.495 0.571 0.546 0.522 0.534 0.479 0.44 0.479 0.497 0.458
Wcsum 0.417 0.439 0.517 0.494 0.478 0.483 0.433 0.389 0.417 0.439 0.389
Psum 0.475 0.492 0.583 0.55 0.533 0.525 0.475 0.442 0.45 0.475 0.442
Sdsum 0.388 0.413 0.483 0.467 0.45 0.463 0.413 0.363 0.4 0.421 0.363
Greater0 0.65 0.65 0.783 0.717 0.7 0.65 0.6 0.6 0.55 0.583 0.6
Greater1 0.3 0.333 0.383 0.383 0.367 0.4 0.35 0.283 0.35 0.367 0.283

download these results as csv

Friedman Test with Multiple Comparisons Results (p=0.05)

The Friedman test was run in MATLAB against the Fine summary data over the 30 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);
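
MATLAB's friedman/multcompare combination performs both the omnibus test and the Tukey-Kramer pairwise comparisons. As a rough point of reference, the omnibus Friedman test alone could be reproduced in Python as sketched below, assuming a 30 × 11 matrix of per-query Fine scores (rows = queries, columns = systems); the pairwise confidence intervals in the table below came from multcompare and are not reproduced here.

  import numpy as np
  from scipy.stats import friedmanchisquare

  # Hypothetical placeholder for the per-query Fine scores: 30 queries x 11 systems,
  # columns ordered LJY1, LJY2, UL1, UL2, UL3, WK1, WK2, WK3, WK4, WK5, WK6.
  fine_scores = np.random.default_rng(0).random((30, 11))

  stat, p_value = friedmanchisquare(*(fine_scores[:, j] for j in range(fine_scores.shape[1])))
  print(f"Friedman chi-square = {stat:.3f}, p = {p_value:.4f}")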

TeamID 1 TeamID 2 Lowerbound Mean Upperbound Significance
UL1 UL2 -1.9997 0.4667 2.9331 FALSE
UL1 UL3 -0.8664 1.6000 4.0664 FALSE
UL1 WK1 0.4003 2.8667 5.3331 TRUE
UL1 LJY2 0.8003 3.2667 5.7331 TRUE
UL1 WK2 0.7336 3.2000 5.6664 TRUE
UL1 LJY1 1.5503 4.0167 6.4831 TRUE
UL1 WK5 1.2503 3.7167 6.1831 TRUE
UL1 WK4 1.2836 3.7500 6.2164 TRUE
UL1 WK3 1.4836 3.9500 6.4164 TRUE
UL2 UL3 -1.3331 1.1333 3.5997 FALSE
UL2 WK1 -0.0664 2.4000 4.8664 FALSE
UL2 LJY2 0.3336 2.8000 5.2664 TRUE
UL2 WK2 0.2669 2.7333 5.1997 TRUE
UL2 LJY1 1.0836 3.5500 6.0164 TRUE
UL2 WK5 0.7836 3.2500 5.7164 TRUE
UL2 WK4 0.8169 3.2833 5.7497 TRUE
UL2 WK3 1.0169 3.4833 5.9497 TRUE
UL3 WK1 -1.1997 1.2667 3.7331 FALSE
UL3 LJY2 -0.7997 1.6667 4.1331 FALSE
UL3 WK2 -0.8664 1.6000 4.0664 FALSE
UL3 LJY1 -0.0497 2.4167 4.8831 FALSE
UL3 WK5 -0.3497 2.1167 4.5831 FALSE
UL3 WK4 -0.3164 2.1500 4.6164 FALSE
UL3 WK3 -0.1164 2.3500 4.8164 FALSE
WK1 LJY2 -2.0664 0.4000 2.8664 FALSE
WK1 WK2 -2.1331 0.3333 2.7997 FALSE
WK1 LJY1 -1.3164 1.1500 3.6164 FALSE
WK1 WK5 -1.6164 0.8500 3.3164 FALSE
WK1 WK4 -1.5831 0.8833 3.3497 FALSE
WK1 WK3 -1.3831 1.0833 3.5497 FALSE
LJY2 WK2 -2.5331 -0.0667 2.3997 FALSE
LJY2 LJY1 -1.7164 0.7500 3.2164 FALSE
LJY2 WK5 -2.0164 0.4500 2.9164 FALSE
LJY2 WK4 -1.9831 0.4833 2.9497 FALSE
LJY2 WK3 -1.7831 0.6833 3.1497 FALSE
WK2 LJY1 -1.6497 0.8167 3.2831 FALSE
WK2 WK5 -1.9497 0.5167 2.9831 FALSE
WK2 WK4 -1.9164 0.5500 3.0164 FALSE
WK2 WK3 -1.7164 0.7500 3.2164 FALSE
LJY1 WK5 -2.7664 -0.3000 2.1664 FALSE
LJY1 WK4 -2.7331 -0.2667 2.1997 FALSE
LJY1 WK3 -2.5331 -0.0667 2.3997 FALSE
WK5 WK4 -2.4331 0.0333 2.4997 FALSE
WK5 WK3 -2.2331 0.2333 2.6997 FALSE
WK4 WK3 -2.2664 0.2000 2.6664 FALSE

download these results as csv

Figure: Friedman's test with multiple comparisons on the Fine scores (2011 sms fine scores friedmans.png).

Raw Scores

The raw data derived from the Evalutron 6000 human evaluations are located on the 2011:Symbolic Melodic Similarity Raw Data page.