2015:Symbolic Melodic Similarity Results

From MIREX Wiki
Revision as of 19:34, 20 October 2015 by Kahyun Choi (talk | contribs) (Evalutron 6000 Summary Data)

Introduction

These are the results for the 2015 running of the Symbolic Melodic Similarity task set. For background information about this task set please refer to the 2015:Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar songs drawn from the Essen Collection (5,274 pieces in MIDI format; see the ESAC Data Homepage for more information). For each query, we made four classes of error-mutations, so the set comprises the following query classes:

  • 0. No errors
  • 1. One note deleted
  • 2. One note inserted
  • 3. One interval enlarged
  • 4. One interval compressed

For each query (and its 4 mutations), the returned results (candidates) from all systems were grouped together into a query set for evaluation by the human graders. The graders heard only the perfect version of each query, against which they evaluated the candidates, and did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by one individual grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: one categorical score with 3 categories (NS, SS, VS, as explained below) and one fine score (in the range from 0 to 100).

Evalutron 6000 Summary Data

Number of evaluators = 4
Number of evaluations per query/candidate pair = 1.5
Number of queries per grader = 1.5
Total number of unique query/candidate pairs graded = 436
Average number of query/candidate pairs evaluated per grader = 73
Number of queries = 6 perfect queries, each error-mutated 4 different ways, for 30 queries in total

General Legend

Sub code  Submission name  Abstract  Contributors
JU1       ShapeH           PDF       Julián Urbano
JU2       ShapeTime        PDF       Julián Urbano
JU3       Time             PDF       Julián Urbano
SNT1      FLDC-10-4        PDF       Shiho Sugimoto, Yuto Nakashima, Masayuki Takeda
SNT2      FLDC-12-4        PDF       Shiho Sugimoto, Yuto Nakashima, Masayuki Takeda
SNT3      NCD-LZF          PDF       Shiho Sugimoto, Yuto Nakashima, Masayuki Takeda

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
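Of these measures, Average Precision is the standard non-interpolated IR definition (ADR, NRGB, and PND are task-specific and not sketched here). As an illustrative sketch only, with a hypothetical function name and toy data, AP averages the precision at each rank where a relevant document is retrieved:

```python
def average_precision(relevant, ranked):
    """Non-interpolated Average Precision: mean of precision@k taken at
    each rank k where a relevant document appears in the ranked list,
    divided by the total number of relevant documents."""
    hits, precisions = 0, []
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(relevant) if relevant else 0.0

# Toy example: relevant docs "a" and "b" retrieved at ranks 1 and 3.
ap = average_precision({"a", "b"}, ["a", "x", "b"])  # (1/1 + 2/3) / 2
```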

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-100).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgment).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgment using only Very Similar).

(1) Normalized to the range 0 to 1.
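The broad-category measures above differ only in the weight given to each category. As a hedged sketch (the function name and the per-candidate averaging are assumptions; the category weights are from the definitions above), a system's score for a set of broad judgments can be computed as:

```python
# Category weights exactly as defined in 'Calculating Summary Measures'.
SCHEMES = {
    "PSum":     {"NS": 0, "SS": 1, "VS": 2},
    "WCSum":    {"NS": 0, "SS": 1, "VS": 3},  # 'World Cup' scoring
    "SDSum":    {"NS": 0, "SS": 1, "VS": 4},  # 'Stephen Downie' scoring
    "Greater0": {"NS": 0, "SS": 1, "VS": 1},  # binary relevance
    "Greater1": {"NS": 0, "SS": 0, "VS": 1},  # binary, VS only
}

def summary_score(judgments, scheme):
    """Mean broad-category score per candidate for one system; the
    averaging over candidates is an assumption made for illustration."""
    weights = SCHEMES[scheme]
    return sum(weights[j] for j in judgments) / len(judgments)
```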

Summary Results

Overall Scores (Includes Perfect and Error Candidates)

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5184 0.7438 0.7267 0.7445 0.7372 0.6931
NRGB 0.4783 0.7118 0.6929 0.6895 0.6947 0.6482
AP 0.4822 0.6139 0.6178 0.6307 0.6426 0.5539
PND 0.4387 0.6209 0.5992 0.6117 0.6185 0.5429
Fine 45.4067 45.93 44.43 39.12 40.61 36.9767
PSum 1.0367 1.0433 1.0333 0.89 0.94333 0.80333
WCSum 1.38 1.3933 1.3867 1.2033 1.2833 1.0933
SDSum 1.7233 1.7433 1.74 1.5167 1.6233 1.3833
Greater0 0.69333 0.69333 0.68 0.57667 0.60333 0.51333
Greater1 0.34333 0.35 0.35333 0.31333 0.34 0.29


Scores by Query Error Types

No Errors

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
Fine 48.1333 48.1833 44.45 40.1167 42.0667 37.55
PSum 1.1 1.1 1.0167 0.91667 0.98333 0.81667
WCSum 1.45 1.45 1.3667 1.2333 1.3333 1.1167
SDSum 1.8 1.8 1.7167 1.55 1.6833 1.4167
Greater0 0.75 0.75 0.66667 0.6 0.63333 0.51667
Greater1 0.35 0.35 0.35 0.31667 0.35 0.3


Note Deletions

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5243 0.7860 0.7540 0.7656 0.7392 0.7268
NRGB 0.4852 0.7534 0.7170 0.6932 0.7025 0.6683
AP 0.4676 0.6511 0.6473 0.6591 0.6150 0.6245
PND 0.4471 0.6693 0.6455 0.5899 0.6177 0.5754
Fine 48.6833 49.0333 45.7167 40.3 40.2 38.95
PSum 1.0833 1.1167 1.05 0.9 0.93333 0.86667
WCSum 1.45 1.4833 1.4333 1.2333 1.2667 1.2
SDSum 1.8167 1.85 1.8167 1.5667 1.6 1.5333
Greater0 0.71667 0.75 0.66667 0.56667 0.6 0.53333
Greater1 0.36667 0.36667 0.38333 0.33333 0.33333 0.33333


Note Insertions

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5155 0.7426 0.7324 0.8020 0.7844 0.6716
NRGB 0.4844 0.7165 0.7072 0.7320 0.7413 0.6319
AP 0.4975 0.6472 0.6500 0.7145 0.7148 0.5275
PND 0.4500 0.6389 0.6111 0.6722 0.7000 0.5222
Fine 47.0833 48.5167 45.35 40.1167 39.6167 36.95
PSum 1.1333 1.1333 1.1 0.93333 0.91667 0.8
WCSum 1.4667 1.5 1.45 1.25 1.25 1.0667
SDSum 1.8 1.8667 1.8 1.5667 1.5833 1.3333
Greater0 0.8 0.76667 0.75 0.61667 0.58333 0.53333
Greater1 0.33333 0.36667 0.35 0.31667 0.33333 0.26667


Enlarged Intervals

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.4924 0.7086 0.6994 0.7176 0.7296 0.6810
NRGB 0.4407 0.6546 0.6546 0.6767 0.6952 0.6493
AP 0.4028 0.5444 0.6258 0.5528 0.6242 0.5217
PND 0.3556 0.5222 0.5222 0.5833 0.6389 0.5278
Fine 37.9833 38.0833 44.4833 32.4833 37.8667 34.5833
PSum 0.83333 0.83333 1.0167 0.73333 0.88333 0.75
WCSum 1.1333 1.1333 1.3833 0.98333 1.2 1.0167
SDSum 1.4333 1.4333 1.75 1.2333 1.5167 1.2833
Greater0 0.53333 0.53333 0.65 0.48333 0.56667 0.48333
Greater1 0.3 0.3 0.36667 0.25 0.31667 0.26667


Compressed Intervals

SCORE JU1 JU2 JU3 SNT1 SNT2 SNT3
ADR 0.5114 0.7417 0.7145 0.7151 0.7313 0.6729
NRGB 0.4643 0.7156 0.6935 0.6545 0.6645 0.6195
AP 0.4888 0.6073 0.5530 0.6015 0.6310 0.5085
PND 0.4659 0.6325 0.6087 0.5714 0.5774 0.5226
Fine 45.15 45.8333 42.15 42.5833 43.3 36.85
PSum 1.0333 1.0333 0.98333 0.96667 1 0.78333
WCSum 1.4 1.4 1.3 1.3167 1.3667 1.0667
SDSum 1.7667 1.7667 1.6167 1.6667 1.7333 1.35
Greater0 0.66667 0.66667 0.66667 0.61667 0.63333 0.5
Greater1 0.36667 0.36667 0.31667 0.35 0.36667 0.28333


Friedman Test with Multiple Comparisons Results (p=0.05)

The Friedman test was run in MATLAB against the Fine summary data over the 30 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer','estimate', 'friedman', 'alpha', 0.05);
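The MATLAB command above runs the Tukey-Kramer post-hoc comparison after a Friedman test. As a self-contained sketch of the omnibus Friedman statistic only (the function name is hypothetical, the data are illustrative, and the post-hoc step is not reproduced), the test ranks the systems within each query and compares rank sums:

```python
def friedman_stat(rows):
    """Friedman chi-square statistic. rows: one list per query, one
    score per system. Ranks scores within each row (average ranks for
    ties), then applies chi2 = 12/(n*k*(k+1)) * sum(Rj^2) - 3*n*(k+1)."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        order = sorted(range(k), key=lambda j: row[j])
        ranks = [0.0] * k
        i = 0
        while i < k:
            # extend j over the run of tied values starting at i
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            avg = (i + j) / 2 + 1  # 1-based average rank for the tie group
            for m in range(i, j + 1):
                ranks[order[m]] = avg
            i = j + 1
        for j in range(k):
            rank_sums[j] += ranks[j]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)

# Toy data: 3 queries, 3 systems, identical ordering in every row.
stat = friedman_stat([[1, 2, 3], [1, 2, 3], [1, 2, 3]])  # -> 6.0
```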

Row Labels JU1 JU2 JU3 SNT1 SNT2 SNT3
q01 48.8 48.8 35.3 47.5 44.2 34.7
q01_1 48.8 48.8 35.3 52.9 48.5 40.3
q01_2 33 38.3 36.9 47.6 44.3 22.4
q01_3 18 18 29.6 34.9 47 25.6
q01_4 36.2 36.2 22.5 48.9 51.3 25.6
q02 74.4 72.5 74.4 64.1 66.5 65.3
q02_1 76.7 76.7 78.5 64.1 65.8 71.6
q02_2 71.8 71.8 72.5 64.1 66.4 64.2
q02_3 70.1 70.1 72.9 58.2 63 62.2
q02_4 72.5 74.4 71 64.1 66.5 64.2
q03 37.5 37.5 25.7 9.8 15.8 27.4
q03_1 41.9 41.9 26.4 9.8 15.8 23.3
q03_2 37.5 37.5 26.4 9.8 16.2 31.1
q03_3 37.5 37.5 26.5 9.8 15.8 21.3
q03_4 37.5 37.5 25.7 9.8 15.8 25.3
q04 40.7 40.7 33.3 49.6 49.8 24.5
q04_1 32.4 32.4 42.4 55.6 52.2 23.9
q04_2 47.7 48.8 44.4 55.7 51.5 23.1
q04_3 37.1 37.1 40.1 24.8 30.9 24.8
q04_4 41 41 36.5 49.6 49.6 26.9
q05 59.1 61.3 59.6 43.5 39.2 37.8
q05_1 64 66.1 53.3 35.3 33.6 34
q05_2 59.1 61.3 57.4 36.7 34 46.2
q05_3 36.9 37.5 61.3 41 33.6 35.1
q05_4 59.1 61.3 59.6 44.5 39.2 46
q06 28.3 28.3 38.4 26.2 36.9 35.6
q06_1 28.3 28.3 38.4 24.1 25.3 40.6
q06_2 33.4 33.4 34.5 26.8 25.3 34.7
q06_3 28.3 28.3 36.5 26.2 36.9 38.5
q06_4 24.6 24.6 37.6 38.6 37.4 33.1


TeamID TeamID Lowerbound Mean Upperbound Significance
JU2 JU1 -1.0440 0.3167 1.6774 FALSE
JU2 JU3 -1.0940 0.2667 1.6274 FALSE
JU2 SNT2 -0.1940 1.1667 2.5274 FALSE
JU2 SNT1 0.0893 1.4500 2.8107 TRUE
JU2 SNT3 0.3393 1.7000 3.0607 TRUE
JU1 JU3 -1.4107 -0.0500 1.3107 FALSE
JU1 SNT2 -0.5107 0.8500 2.2107 FALSE
JU1 SNT1 -0.2274 1.1333 2.4940 FALSE
JU1 SNT3 0.0226 1.3833 2.7440 TRUE
JU3 SNT2 -0.4607 0.9000 2.2607 FALSE
JU3 SNT1 -0.1774 1.1833 2.5440 FALSE
JU3 SNT3 0.0726 1.4333 2.7940 TRUE
SNT2 SNT1 -1.0774 0.2833 1.6440 FALSE
SNT2 SNT3 -0.8274 0.5333 1.8940 FALSE
SNT1 SNT3 -1.1107 0.2500 1.6107 FALSE


[Figure: Friedman test with multiple comparisons on the Fine scores (2015 sms fine scores friedmans.png)]