2007:Symbolic Melodic Similarity Results

Introduction

These are the results for the 2007 running of the Symbolic Melodic Similarity task set. For background information about this task set please refer to the Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar songs from the Essen Collection (5274 pieces in MIDI format)[insert ref]. For each query, we created four classes of error mutations, so the query set comprises the following classes:

No errors
One note deleted
One note inserted
One interval enlarged
One interval compressed
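
The four mutation classes can be illustrated with a minimal sketch. It assumes a melody is represented as a list of MIDI pitch numbers and that resizing an interval shifts all later notes so that only that one interval changes. The function names and this representation are hypothetical; the page does not describe how the task organizers actually generated the mutated queries.

```python
def delete_note(pitches, i):
    """Return a copy of the melody with the note at index i removed."""
    return pitches[:i] + pitches[i + 1:]


def insert_note(pitches, i, pitch):
    """Return a copy of the melody with an extra note inserted before index i."""
    return pitches[:i] + [pitch] + pitches[i:]


def resize_interval(pitches, i, delta):
    """Change the size of the interval between notes i and i+1 by delta semitones.

    A positive delta enlarges the interval, a negative delta compresses it.
    All notes after index i are shifted by the same amount, so every other
    interval in the melody is unchanged (an assumption of this sketch).
    """
    interval = pitches[i + 1] - pitches[i]
    direction = 1 if interval >= 0 else -1
    shift = direction * delta
    return pitches[:i + 1] + [p + shift for p in pitches[i + 1:]]


melody = [60, 62, 64, 65, 67]             # illustrative melody, not from the test set
deleted = delete_note(melody, 2)           # one note deleted
inserted = insert_note(melody, 2, 63)      # one note inserted
enlarged = resize_interval(melody, 1, 2)   # one interval enlarged
compressed = resize_interval(melody, 1, -1)  # one interval compressed
```

Shifting the tail of the melody keeps the change confined to a single interval; altering only one pitch would instead change the two intervals on either side of it.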

For each query (and its 4 mutations), the candidates returned by all systems were grouped together into a query set for evaluation by the human graders. The graders were provided with only the perfect version of the query against which to evaluate the candidates, and did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by one grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: one categorical score with three categories (NS, SS, VS, as explained below) and one fine score in the range from 0 to 10.

Evalutron 6000 Summary Data

Number of evaluators = 6
Number of evaluations per query/candidate pair = 1
Number of queries per grader = 1
Total number of candidates returned = 2400
Total number of unique query/candidate pairs graded = 799
Average number of query/candidate pairs evaluated per grader: 133
Number of queries = 6 (perfect), each also error-mutated 4 different ways, for 6 × 5 = 30 in total
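
The figures above are mutually consistent, as the following short check suggests. The number of systems (8) is an assumption inferred from the distinct team IDs in the legend; it is not stated in the summary data itself.

```python
perfect_queries = 6
mutations_per_query = 4
results_per_query = 10
systems = 8          # assumption: count of distinct team IDs in the legend
graders = 6
unique_pairs = 799

# Each perfect query plus its 4 mutated versions.
total_queries = perfect_queries * (1 + mutations_per_query)

# Every system returns 10 candidates for every query.
total_candidates = total_queries * results_per_query * systems

# Unique pairs are fewer than total candidates because systems
# often return the same candidate for the same query.
pairs_per_grader = unique_pairs / graders

print(total_queries, total_candidates, round(pairs_per_grader))
```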

General Legend

Team ID

Prefix R = RISM collection, K = Karaoke collection, M = Polyphonic collection

FHAR = Ferraro, Hanna, Allai & Robine
GAR1 = Gomez, C., Abad-Mota & Ruckhaus
GAR2 = Gomez, C., Abad-Mota & Ruckhaus
AP1 = Pinto, A.
AP2 = Pinto, A.
AU1 =
AU2 =
AU3 =

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
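
As a rough illustration of two of these headings, the sketch below computes Precision at N Documents and non-interpolated Average Precision from a ranked list of binary relevance flags. It uses the standard textbook definitions; the function names and the example ranking are hypothetical, and the exact relevance threshold and cutoffs used for the MIREX tables are not given on this page. ADR and NRGB depend on grouped ground-truth rankings and are not sketched here.

```python
def precision_at_n(relevant, n):
    """Fraction of the top n ranked candidates that are relevant.

    relevant: list of booleans in rank order (best-ranked first).
    """
    return sum(relevant[:n]) / n


def average_precision(relevant, total_relevant=None):
    """Non-interpolated average precision over a ranked list.

    Precision is taken at the rank of each relevant candidate and averaged.
    If total_relevant is not given, the number of relevant candidates that
    appear in the list is used as the denominator (an assumption of this sketch).
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(relevant, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    denominator = total_relevant if total_relevant is not None else hits
    return precision_sum / denominator if denominator else 0.0


ranking = [True, False, True, False, False]  # illustrative judgements only
p_at_3 = precision_at_n(ranking, 3)
ap = average_precision(ranking)
```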

Calculating Summary Measures

Fine⁽¹⁾ = Sum of fine-grained human similarity decisions (0-10).
PSum⁽¹⁾ = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum⁽¹⁾ = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum⁽¹⁾ = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0⁽¹⁾ = NS=0, SS=1, VS=1 (binary relevance judgement).
Greater1⁽¹⁾ = NS=0, SS=0, VS=1 (binary relevance judgement using only Very Similar).

⁽¹⁾Normalized to the range 0 to 1.
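
A minimal sketch of how these measures could be computed for one system follows. The weights come from the definitions above. The normalization step (dividing by the maximum attainable sum) is an assumption, since the page states only that the scores are normalized to the range 0 to 1. The function name and the sample judgements are hypothetical.

```python
# Weights for each broad-category measure, as defined in the list above.
BROAD_WEIGHTS = {
    "PSum": {"NS": 0, "SS": 1, "VS": 2},
    "WCsum": {"NS": 0, "SS": 1, "VS": 3},
    "SDsum": {"NS": 0, "SS": 1, "VS": 4},
    "Greater0": {"NS": 0, "SS": 1, "VS": 1},
    "Greater1": {"NS": 0, "SS": 0, "VS": 1},
}


def summary_measures(judgements):
    """Compute the normalized summary measures for a list of judgements.

    judgements: non-empty list of (broad, fine) pairs, where broad is
    "NS", "SS" or "VS" and fine is a score between 0 and 10.
    Each sum is divided by its maximum possible value (assumed normalization).
    """
    n = len(judgements)
    scores = {"Fine": sum(fine for _, fine in judgements) / (10 * n)}
    for name, weights in BROAD_WEIGHTS.items():
        total = sum(weights[broad] for broad, _ in judgements)
        scores[name] = total / (max(weights.values()) * n)
    return scores


# Illustrative judgements only; not taken from the MIREX data.
sample = [("VS", 9.0), ("SS", 5.0), ("NS", 1.0), ("VS", 8.0)]
scores = summary_measures(sample)
```

The heavier VS weights in WCsum and SDsum pull the normalized score down when a system earns mostly SS judgements, which is how those measures reward Very Similar results.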

Overall Summary Results

Overall Summaries Presented by Error Types

Results table unavailable (source file SMS07_Results_by_error.csv not found).

Task I: RISM Collection Summary Results

There is an error with this data set; please stand by. Results table unavailable (source file sms06_rism_results3.csv not found).

Task IIa: Karaoke Collection Summary Results

Results table unavailable (source file sms06_kar_results3.csv not found).

Task IIb: Mixed Polyphonic Collection Summary Results

Results table unavailable (source file sms06_mix_results3.csv not found).

Raw Scores

The raw data derived from the Evalutron 6000 human evaluations are located on the Symbolic Melodic Similarity Raw Data page.
