Difference between revisions of "2010:Symbolic Melodic Similarity Results"

From MIREX Wiki

Revision as of 11:54, 27 July 2010

Introduction

These are the results for the 2010 running of the Symbolic Melodic Similarity task set. For background information about this task set, please refer to the 2010:Symbolic Melodic Similarity page.

Each system was given a query and returned the 10 most melodically similar songs drawn from the Essen Collection (5,274 pieces in MIDI format; see the ESAC Data Homepage for more information). For each query, four classes of error-mutations were made, so the set comprises the following query classes:

  • 0. No errors
  • 1. One note deleted
  • 2. One note inserted
  • 3. One interval enlarged
  • 4. One interval compressed

For each query (and its 4 mutations), the returned results (candidates) from all systems were grouped together into a query set for evaluation by the human graders. The graders heard only the perfect version of each query, against which they evaluated the candidates, and did not know whether a candidate came from a perfect or a mutated query. Each query/candidate set was evaluated by 1 individual grader. Using the Evalutron 6000 system, the graders gave each query/candidate pair two types of scores: 1 categorical score with 3 categories (NS, SS, VS, as explained below) and 1 fine score (in the range from 0 to 100).

Evalutron 6000 Summary Data

Number of evaluators = 6
Number of evaluations per query/candidate pair = 1
Number of queries per grader = 1
Total number of candidates returned = 3900
Total number of unique query/candidate pairs graded = 895
Average number of query/candidate pairs evaluated per grader = 149
Number of queries = 6 (perfect) with each perfect query error-mutated 4 different ways = 30

General Legend

Team ID

HFAR = Pierre Hanna, Pascal Ferraro, Julien Allali, Matthias Robine
DR_C = David Rizo 1
DR_T = David Rizo 2
DR_T3 = David Rizo 3
DR_PR = David Rizo 4
JU1 = Julián Urbano 1
JU2 = Julián Urbano 2
JU3 = Julián Urbano 3
JU4 = Julián Urbano 4
LL_S2 = Mika Laitinen, Kjell Lemström 1
LL_W2 = Mika Laitinen, Kjell Lemström 2
SU_ngr5 = Iman Suyoto, Alexandra Uitdenbogerd 1
SU_pioi = Iman Suyoto, Alexandra Uitdenbogerd 2

Broad Categories

NS = Not Similar
SS = Somewhat Similar
VS = Very Similar

Table Headings

ADR = Average Dynamic Recall
NRGB = Normalized Recall at Group Boundaries
AP = Average Precision (non-interpolated)
PND = Precision at N Documents
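As a sketch of the simpler rank-based headings, the following computes Average Precision (non-interpolated) and Precision at N Documents over one ranked candidate list with binary relevance. The data are hypothetical illustration values; ADR and NRGB additionally account for the graded ordering of result groups and are not shown here.

```python
# Sketch: AP (non-interpolated) and precision at N for one ranked result
# list under binary relevance. All data are hypothetical illustrations.

def average_precision(relevant, ranking):
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def precision_at_n(relevant, ranking, n):
    """Fraction of the top-n returned documents that are relevant."""
    return sum(1 for doc in ranking[:n] if doc in relevant) / n

relevant = {"a", "c", "e"}               # hypothetical relevant candidates
ranking = ["a", "b", "c", "d", "e", "f"] # hypothetical system output
print(average_precision(relevant, ranking))  # hits at ranks 1, 3, 5
print(precision_at_n(relevant, ranking, 5))
```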

Calculating Summary Measures

Fine(1) = Sum of fine-grained human similarity decisions (0-100).
PSum(1) = Sum of human broad similarity decisions: NS=0, SS=1, VS=2.
WCsum(1) = 'World Cup' scoring: NS=0, SS=1, VS=3 (rewards Very Similar).
SDsum(1) = 'Stephen Downie' scoring: NS=0, SS=1, VS=4 (strongly rewards Very Similar).
Greater0(1) = NS=0, SS=1, VS=1 (binary relevance judgment).
Greater1(1) = NS=0, SS=0, VS=1 (binary relevance judgment using only Very Similar).

(1)Normalized to the range 0 to 1.
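The summary measures above can be sketched as follows for one system's graded candidate list. The judgments are hypothetical illustration data, and the normalization (dividing each sum by its maximum attainable value) is an assumption consistent with the 0-to-1 range stated above; the category weights follow the definitions exactly.

```python
# Sketch: the summary measures defined above for one returned candidate
# list. Judgments are hypothetical; normalization by the maximum attainable
# sum is an assumed reading of "normalized to the range 0 to 1".

def summary_measures(broad, fine):
    """broad: 'NS'/'SS'/'VS' label per candidate; fine: 0-100 score per candidate."""
    n = len(broad)
    weights = {
        "PSum":     {"NS": 0, "SS": 1, "VS": 2},
        "WCsum":    {"NS": 0, "SS": 1, "VS": 3},  # 'World Cup' scoring
        "SDsum":    {"NS": 0, "SS": 1, "VS": 4},  # 'Stephen Downie' scoring
        "Greater0": {"NS": 0, "SS": 1, "VS": 1},  # binary relevance
        "Greater1": {"NS": 0, "SS": 0, "VS": 1},  # only Very Similar counts
    }
    measures = {"Fine": sum(fine) / (100 * n)}    # fine scores run 0-100
    for name, w in weights.items():
        measures[name] = sum(w[b] for b in broad) / (max(w.values()) * n)
    return measures

broad = ["VS", "SS", "NS", "VS", "SS", "NS", "NS", "SS", "VS", "NS"]
fine = [90, 55, 10, 80, 60, 5, 15, 45, 95, 20]
print(summary_measures(broad, fine))
```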

Summary Results

Run Times

(Table 2010/sms_runtimes.csv not rendered: file not found.)

Overall Scores (Includes Perfect and Error Candidates)

(Table 2010/SMS10_overall_norm.csv not rendered: file not found.)

Overall Summaries (Presented by Error Types)

(Table 2010/SMS10_errors_norm.csv not rendered: file not found.)

Friedman Test with Multiple Comparisons Results (p=0.05)

The Friedman test was run in MATLAB against the Fine summary data over the 30 queries.
Command: [c,m,h,gnames] = multcompare(stats, 'ctype', 'tukey-kramer', 'estimate', 'friedman', 'alpha', 0.05);

(Table 2010/sms10_sum_friedman_fine.csv not rendered: file not found.)
(Table 2007/sms10_detail_friedman_fine.csv not rendered: file not found.)
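The statistic behind the MATLAB command can be sketched in pure Python. The per-query Fine scores below are hypothetical illustration data, and the Tukey-Kramer multiple-comparison step from multcompare is not reproduced.

```python
# Sketch: the Friedman statistic computed by hand over per-query Fine
# scores for three hypothetical systems (illustration data; assumes no
# tied scores within a query, so no tie correction is applied).

def friedman_statistic(score_matrix):
    """score_matrix: one row per query, one column per system. Returns the
    Friedman statistic, compared against chi-square with k-1 df."""
    n = len(score_matrix)        # number of queries
    k = len(score_matrix[0])     # number of systems
    rank_sums = [0.0] * k
    for row in score_matrix:
        order = sorted(range(k), key=lambda j: row[j])  # ascending ranks
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)

scores = [  # rows: queries; columns: hypothetical systems A, B, C
    [0.80, 0.70, 0.40],
    [0.75, 0.65, 0.30],
    [0.90, 0.85, 0.50],
    [0.60, 0.55, 0.20],
    [0.85, 0.80, 0.45],
]
print(friedman_statistic(scores))
```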

(Image: 2010 sms fine scores friedmans.png)

Raw Scores

The raw data derived from the Evalutron 6000 human evaluations are located on the 2010:Symbolic Melodic Similarity Raw Data page.