2015:Symbolic Melodic Similarity Results

From MIREX Wiki
Revision as of 19:34, 20 October 2015 by Kahyun Choi (talk | contribs) (Evalutron 6000 Summary Data)

Introduction

These are the results for the 2015 running of the Symbolic Melodic Similarity task set. For background information about this task set please refer to the 2015:Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar songs drawn from the Essen Collection (5,274 pieces in MIDI format; see the ESAC Data Homepage for more information). For each query, we made four classes of error-mutations, so the set comprises the following query classes:

  • 0. No errors
  • 1. One note deleted
  • 2. One note inserted
  • 3. One interval enlarged
  • 4. One interval compressed

For each query (and its 4 mutations), the returned results (candidates) from all systems were grouped together into a query set for evaluation by the human graders. The graders heard only the perfect version of each query, against which they evaluated the candidates, and did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by one individual grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: one categorical score with 3 categories (NS, SS, VS, as explained below) and one fine score (in the range from 0 to 100).

Evalutron 6000 Summary Data

Number of evaluators = 4
Number of evaluations per query/candidate pair = 1.5
Number of queries per grader = 1.5
Total number of unique query/candidate pairs graded = 436
Average number of query/candidate pairs evaluated per grader = 73
Number of queries = 6 perfect queries, each error-mutated 4 different ways, for 30 queries in total

General Legend

Sub code  Submission name  Abstract  Contributors
JU1       ShapeH           PDF       Julián Urbano
JU2       ShapeTime        PDF       Julián Urbano
JU3       Time             PDF       Julián Urbano
SNT1      FLDC-10-4        PDF       Shiho Sugimoto, Yuto Nakashima, Masayuki Takeda
SNT2      FLDC-12-4        PDF       Shiho Sugimoto, Yuto Nakashima, Masayuki Takeda
SNT3      NCD-LZF          PDF       Shiho Sugimoto, Yuto Nakashima, Masayuki Takeda

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
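Of these measures, Average Precision is the standard non-interpolated IR definition (ADR, NRGB, and PND are task-specific and not sketched here). As an illustrative sketch only, with a hypothetical function name and toy data, AP averages the precision at each rank where a relevant document is retrieved:

```python
def average_precision(relevant, ranked):
    """Non-interpolated Average Precision: mean of precision@k taken at
    each rank k where a relevant document appears in the ranked list,
    divided by the total number of relevant documents."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Toy example: relevant docs "a" and "b" retrieved at ranks 1 and 3.
ap = average_precision({"a", "b"}, ["a", "x", "b"])  # (1/1 + 2/3) / 2
```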

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-100).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgment).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgment using only Very Similar).

(1) Normalized to the range 0 to 1.
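The broad-category measures above differ only in the weight given to each category. As a hedged sketch (the function name and the per-candidate averaging are assumptions; the category weights are from the definitions above), a system's score for a set of broad judgments can be computed as:

```python
# Category weights exactly as defined in 'Calculating Summary Measures'.
SCHEMES = {
    "PSum":     {"NS": 0, "SS": 1, "VS": 2},
    "WCSum":    {"NS": 0, "SS": 1, "VS": 3},  # 'World Cup' scoring
    "SDSum":    {"NS": 0, "SS": 1, "VS": 4},  # 'Stephen Downie' scoring
    "Greater0": {"NS": 0, "SS": 1, "VS": 1},  # binary relevance
    "Greater1": {"NS": 0, "SS": 0, "VS": 1},  # binary, VS only
}

def summary_score(judgments, scheme):
    """Mean broad-category score per candidate for one system; the
    averaging over candidates is an assumption made for illustration."""
    weights = SCHEMES[scheme]
    return sum(weights[j] for j in judgments) / len(judgments)
```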

Summary Results

Overall Scores (Includes Perfect and Error Candidates)

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5184 0.7438 0.7267 0.7445 0.7372 0.6931
NRGB 0.4783 0.7118 0.6929 0.6895 0.6947 0.6482
AP 0.4822 0.6139 0.6178 0.6307 0.6426 0.5539
PND 0.4387 0.6209 0.5992 0.6117 0.6185 0.5429
Fine 45.4067 45.93 44.43 39.12 40.61 36.9767
PSum 1.0367 1.0433 1.0333 0.89 0.94333 0.80333
WCSum 1.38 1.3933 1.3867 1.2033 1.2833 1.0933
SDSum 1.7233 1.7433 1.74 1.5167 1.6233 1.3833
Greater0 0.69333 0.69333 0.68 0.57667 0.60333 0.51333
Greater1 0.34333 0.35 0.35333 0.31333 0.34 0.29


Scores by Query Error Types

No Errors

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
Fine 48.1333 48.1833 44.45 40.1167 42.0667 37.55
PSum 1.1 1.1 1.0167 0.91667 0.98333 0.81667
WCSum 1.45 1.45 1.3667 1.2333 1.3333 1.1167
SDSum 1.8 1.8 1.7167 1.55 1.6833 1.4167
Greater0 0.75 0.75 0.66667 0.6 0.63333 0.51667
Greater1 0.35 0.35 0.35 0.31667 0.35 0.3


Note Deletions

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5243 0.7860 0.7540 0.7656 0.7392 0.7268
NRGB 0.4852 0.7534 0.7170 0.6932 0.7025 0.6683
AP 0.4676 0.6511 0.6473 0.6591 0.6150 0.6245
PND 0.4471 0.6693 0.6455 0.5899 0.6177 0.5754
Fine 48.6833 49.0333 45.7167 40.3 40.2 38.95
PSum 1.0833 1.1167 1.05 0.9 0.93333 0.86667
WCSum 1.45 1.4833 1.4333 1.2333 1.2667 1.2
SDSum 1.8167 1.85 1.8167 1.5667 1.6 1.5333
Greater0 0.71667 0.75 0.66667 0.56667 0.6 0.53333
Greater1 0.36667 0.36667 0.38333 0.33333 0.33333 0.33333


Note Insertions

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5155 0.7426 0.7324 0.8020 0.7844 0.6716
NRGB 0.4844 0.7165 0.7072 0.7320 0.7413 0.6319
AP 0.4975 0.6472 0.6500 0.7145 0.7148 0.5275
PND 0.4500 0.6389 0.6111 0.6722 0.7000 0.5222
Fine 47.0833 48.5167 45.35 40.1167 39.6167 36.95
PSum 1.1333 1.1333 1.1 0.93333 0.91667 0.8
WCSum 1.4667 1.5 1.45 1.25 1.25 1.0667
SDSum 1.8 1.8667 1.8 1.5667 1.5833 1.3333
Greater0 0.8 0.76667 0.75 0.61667 0.58333 0.53333
Greater1 0.33333 0.36667 0.35 0.31667 0.33333 0.26667


Enlarged Intervals

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.4924 0.7086 0.6994 0.7176 0.7296 0.6810
NRGB 0.4407 0.6546 0.6546 0.6767 0.6952 0.6493
AP 0.4028 0.5444 0.6258 0.5528 0.6242 0.5217
PND 0.3556 0.5222 0.5222 0.5833 0.6389 0.5278
Fine 37.9833 38.0833 44.4833 32.4833 37.8667 34.5833
PSum 0.83333 0.83333 1.0167 0.73333 0.88333 0.75
WCSum 1.1333 1.1333 1.3833 0.98333 1.2 1.0167
SDSum 1.4333 1.4333 1.75 1.2333 1.5167 1.2833
Greater0 0.53333 0.53333 0.65 0.48333 0.56667 0.48333
Greater1 0.3 0.3 0.36667 0.25 0.31667 0.26667


Compressed Intervals

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5114 0.7417 0.7145 0.7151 0.7313 0.6729
NRGB 0.4643 0.7156 0.6935 0.6545 0.6645 0.6195
AP 0.4888 0.6073 0.5530 0.6015 0.6310 0.5085
PND 0.4659 0.6325 0.6087 0.5714 0.5774 0.5226
Fine 45.15 45.8333 42.15 42.5833 43.3 36.85
PSum 1.0333 1.0333 0.98333 0.96667 1 0.78333
WCSum 1.4 1.4 1.3 1.3167 1.3667 1.0667
SDSum 1.7667 1.7667 1.6167 1.6667 1.7333 1.35
Greater0 0.66667 0.66667 0.66667 0.61667 0.63333 0.5
Greater1 0.36667 0.36667 0.31667 0.35 0.36667 0.28333


Friedman Test with Multiple Comparisons Results (p=0.05)

The Friedman test was run in MATLAB against the Fine summary data over the 30 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);
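The MATLAB command above runs the Tukey-Kramer post-hoc comparison after a Friedman test. As a self-contained sketch of the omnibus Friedman statistic only (the function name is hypothetical, the data are illustrative, and the post-hoc step is not reproduced), the test ranks the systems within each query and compares rank sums:

```python
def friedman_stat(rows):
    """Friedman chi-square statistic. rows: one list per query, one
    score per system. Ranks scores within each row (average ranks for
    ties), then applies chi2 = 12/(n*k*(k+1)) * sum(Rj^2) - 3*n*(k+1)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            # extend j over the run of tied values starting at i
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # 1-based average rank for the tie group
            for m in range(i, j + 1):
                ranks[order[m]] = avg
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)

# Toy data: 3 queries, 3 systems, identical ordering in every row.
stat = friedman_stat([[1, 2, 3], [1, 2, 3], [1, 2, 3]])  # -> 6.0
```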

Row Labels JU1 JU2 JU3 SNT1 SNT2 SNT3
q01 48.8 48.8 35.3 47.5 44.2 34.7
q01_1 48.8 48.8 35.3 52.9 48.5 40.3
q01_2 33 38.3 36.9 47.6 44.3 22.4
q01_3 18 18 29.6 34.9 47 25.6
q01_4 36.2 36.2 22.5 48.9 51.3 25.6
q02 74.4 72.5 74.4 64.1 66.5 65.3
q02_1 76.7 76.7 78.5 64.1 65.8 71.6
q02_2 71.8 71.8 72.5 64.1 66.4 64.2
q02_3 70.1 70.1 72.9 58.2 63 62.2
q02_4 72.5 74.4 71 64.1 66.5 64.2
q03 37.5 37.5 25.7 9.8 15.8 27.4
q03_1 41.9 41.9 26.4 9.8 15.8 23.3
q03_2 37.5 37.5 26.4 9.8 16.2 31.1
q03_3 37.5 37.5 26.5 9.8 15.8 21.3
q03_4 37.5 37.5 25.7 9.8 15.8 25.3
q04 40.7 40.7 33.3 49.6 49.8 24.5
q04_1 32.4 32.4 42.4 55.6 52.2 23.9
q04_2 47.7 48.8 44.4 55.7 51.5 23.1
q04_3 37.1 37.1 40.1 24.8 30.9 24.8
q04_4 41 41 36.5 49.6 49.6 26.9
q05 59.1 61.3 59.6 43.5 39.2 37.8
q05_1 64 66.1 53.3 35.3 33.6 34
q05_2 59.1 61.3 57.4 36.7 34 46.2
q05_3 36.9 37.5 61.3 41 33.6 35.1
q05_4 59.1 61.3 59.6 44.5 39.2 46
q06 28.3 28.3 38.4 26.2 36.9 35.6
q06_1 28.3 28.3 38.4 24.1 25.3 40.6
q06_2 33.4 33.4 34.5 26.8 25.3 34.7
q06_3 28.3 28.3 36.5 26.2 36.9 38.5
q06_4 24.6 24.6 37.6 38.6 37.4 33.1


TeamID TeamID Lowerbound Mean Upperbound Significance
JU2 JU1 -1.0440 0.3167 1.6774 FALSE
JU2 JU3 -1.0940 0.2667 1.6274 FALSE
JU2 SNT2 -0.1940 1.1667 2.5274 FALSE
JU2 SNT1 0.0893 1.4500 2.8107 TRUE
JU2 SNT3 0.3393 1.7000 3.0607 TRUE
JU1 JU3 -1.4107 -0.0500 1.3107 FALSE
JU1 SNT2 -0.5107 0.8500 2.2107 FALSE
JU1 SNT1 -0.2274 1.1333 2.4940 FALSE
JU1 SNT3 0.0226 1.3833 2.7440 TRUE
JU3 SNT2 -0.4607 0.9000 2.2607 FALSE
JU3 SNT1 -0.1774 1.1833 2.5440 FALSE
JU3 SNT3 0.0726 1.4333 2.7940 TRUE
SNT2 SNT1 -1.0774 0.2833 1.6440 FALSE
SNT2 SNT3 -0.8274 0.5333 1.8940 FALSE
SNT1 SNT3 -1.1107 0.2500 1.6107 FALSE


[Figure: Friedman test with multiple comparisons on the Fine scores (2015 sms fine scores friedmans.png)]