Introduction
This page reports the results of the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the 2018:Music and/or Speech Detection page.
General Legend
Sub code | Abstract | Contributors
DD1 | PDF | David Doukhan
JHKK1 | PDF | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
JHKK2 | PDF | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
JHKK3 | PDF | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
LN1 | PDF | Minsuk Choi, Jongpil Lee, Juhan Nam
MM1 | PDF | Matija Marolt
MM2 | PDF | Matija Marolt
MM3 | PDF | Matija Marolt
MMG1 | PDF | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
MMG2 | PDF | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
Statistics notation
<class>_F = segment-level F-measure for the class <class>
<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>
<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
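The notation above can be illustrated with a minimal Python sketch. It is not the official evaluation code: the function names, the fixed segment length, the fixed offset tolerance, and the greedy matching strategy are all assumptions made for illustration, and the actual evaluation may differ in these details.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (onset, offset) in seconds for one class


def f_measure(tp: int, fp: int, fn: int) -> float:
    """F-measure from true positive, false positive and false negative counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def segment_f(reference: List[Interval], estimated: List[Interval],
              duration: float, seg: float = 1.0) -> float:
    """Segment-level F: compare class activity on a fixed time grid.

    The segment length `seg` is an assumed parameter, not the official value.
    """
    def active(intervals: List[Interval], t: float) -> bool:
        return any(on <= t < off for on, off in intervals)

    tp = fp = fn = 0
    for i in range(int(round(duration / seg))):
        t = (i + 0.5) * seg  # midpoint of segment i
        ref, est = active(reference, t), active(estimated, t)
        tp += ref and est
        fp += est and not ref
        fn += ref and not est
    return f_measure(tp, fp, fn)


def event_f(reference: List[Interval], estimated: List[Interval],
            tol: float = 0.5, check_offset: bool = False) -> float:
    """Event-level F with a tolerance `tol` in seconds.

    check_offset=False corresponds to the "_on" (onset-only) variants,
    check_offset=True to the "_onoff" (onset-offset) variants.
    Greedy one-to-one matching is a simplification assumed here.
    """
    matched = set()
    tp = 0
    for est_on, est_off in estimated:
        for j, (ref_on, ref_off) in enumerate(reference):
            if j in matched or abs(est_on - ref_on) > tol:
                continue
            if check_offset and abs(est_off - ref_off) > tol:
                continue
            matched.add(j)
            tp += 1
            break
    return f_measure(tp, len(estimated) - tp, len(reference) - tp)


# Hypothetical annotations for one class, e.g. Music
reference = [(0.0, 5.0), (10.0, 15.0)]
estimated = [(0.3, 5.2), (10.2, 13.0), (20.0, 21.0)]
print(event_f(reference, estimated, tol=0.5))                     # 0.8
print(event_f(reference, estimated, tol=0.5, check_offset=True))  # 0.4
print(round(segment_f(reference, estimated, duration=25.0), 4))   # 0.8421
```

In this toy example the second estimated event starts on time but ends two seconds early, so it counts as a hit for the onset-only measure and as a miss for the onset-offset measure. This is one reason the _onoff columns in the tables below are consistently lower than the corresponding _on columns.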
Task 1: Music Detection
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Music_F | No-Music_F
DD1 | 0.6860 | 0.5424 | 0.7611
JHKK1 | 0.7798 | 0.7123 | 0.8215
JHKK2 | 0.8005 | 0.7415 | 0.8375
LN1 | 0.6251 | 0.5022 | 0.6987
MM1 | 0.6135 | 0.3899 | 0.7172
MM2 | 0.6807 | 0.5478 | 0.7531
MM3 | 0.6075 | 0.3124 | 0.7254
MMG1 | 0.9049 | 0.8996 | 0.9097
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff
DD1 | 0.2877 | 0.093 | 0.312 | 0.1142
JHKK1 | 0.2303 | 0.0765 | 0.294 | 0.1173
JHKK2 | 0.2522 | 0.0931 | 0.3245 | 0.1389
LN1 | 0.1348 | 0.0139 | 0.1704 | 0.0231
MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831
MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049
MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676
MMG1 | 0.5177 | 0.2693 | 0.5813 | 0.3502
Dataset 2
Segment-level Evaluation
Sub code | Accuracy | Music_F | No-Music_F
DD1 | 0.9257 | 0.9334 | 0.9162
JHKK1 | 0.9415 | 0.9487 | 0.9318
JHKK2 | 0.9153 | 0.9309 | 0.8907
LN1 | 0.7814 | 0.8053 | 0.7499
MM1 | 0.915 | 0.9228 | 0.9054
MM2 | 0.9032 | 0.9158 | 0.8859
MM3 | 0.8725 | 0.8791 | 0.8652
MMG1 | 0.9025 | 0.9223 | 0.8691
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff
DD1 | 0.4089 | 0.2235 | 0.4402 | 0.248
JHKK1 | 0.1659 | 0.0347 | 0.2334 | 0.0636
JHKK2 | 0.167 | 0.029 | 0.2015 | 0.0599
LN1 | 0.0991 | 0.0228 | 0.1319 | 0.0428
MM1 | 0.1412 | 0.0159 | 0.1843 | 0.0392
MM2 | 0.1540 | 0.0312 | 0.231 | 0.0791
MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535
MMG1 | 0.1358 | 0.0173 | 0.1936 | 0.0347
Task 2: Speech Detection
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Speech_F | No-Speech_F
DD1 | 0.877 | 0.9186 | 0.7493
JHKK3 | 0.8307 | 0.8795 | 0.7143
LN1 | 0.6908 | 0.7472 | 0.6007
MM1 | 0.8626 | 0.9115 | 0.6948
MM2 | 0.8619 | 0.909 | 0.713
MM3 | 0.8508 | 0.9086 | 0.5966
Event-level Evaluation
Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
DD1 | 0.415 | 0.1603 | 0.4477 | 0.2122
JHKK3 | 0.2882 | 0.0777 | 0.3289 | 0.0962
LN1 | 0.2686 | 0.0529 | 0.3484 | 0.0883
MM1 | 0.4607 | 0.2068 | 0.4898 | 0.2336
MM2 | 0.4422 | 0.1999 | 0.5093 | 0.266
MM3 | 0.4439 | 0.1775 | 0.4879 | 0.2122
Dataset 2
Segment-level Evaluation
Sub code | Accuracy | Speech_F | No-Speech_F
DD1 | 0.9617 | 0.9583 | 0.9648
JHKK3 | 0.8575 | 0.8305 | 0.8765
LN1 | 0.8636 | 0.8314 | 0.885
MM1 | 0.9367 | 0.9326 | 0.9405
MM2 | 0.9226 | 0.914 | 0.9296
MM3 | 0.8973 | 0.8973 | 0.8974
Event-level Evaluation
Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
DD1 | 0.6037 | 0.4139 | 0.6318 | 0.435
JHKK3 | 0.1585 | 0.0405 | 0.2095 | 0.0563
LN1 | 0.1775 | 0.0399 | 0.2426 | 0.0738
MM1 | 0.0632 | 0.0015 | 0.0947 | 0.0150
MM2 | 0.1162 | 0.0211 | 0.1737 | -
MM3 | 0.0796 | 0.0152 | 0.123 | 0.0281
Task 3: Music and Speech Detection
Dataset 1
Segment-level Evaluation
Sub code | Music_F | Speech_F
LN1 | 0.4936 | 0.7718
MM1 | 0.3899 | 0.9115
MM2 | 0.5478 | 0.909
MM3 | 0.3124 | 0.9086
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
LN1 | 0.1116 | 0.0088 | 0.1459 | 0.0186 | 0.2645 | 0.0462 | 0.348 | 0.0786
MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831 | 0.4607 | 0.2068 | 0.4898 | 0.2336
MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049 | 0.4422 | 0.1999 | 0.5093 | 0.266
MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676 | 0.4439 | 0.1775 | 0.4879 | 0.2122
Task 4: Music Relative Loudness Estimation
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Fg-Music_F | Bg-Music_F | No-Music_F
MMG2 | 0.8615 | 0.788 | 0.821 | 0.9064
Event-level Evaluation
Sub code | Fg-Music_F_500_on | Fg-Music_F_500_onoff | Fg-Music_F_1000_on | Fg-Music_F_1000_onoff | Bg-Music_F_500_on | Bg-Music_F_500_onoff | Bg-Music_F_1000_on | Bg-Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
MMG2 | 0.3298 | 0.1775 | 0.4106 | 0.2742 | 0.3853 | 0.1388 | 0.4463 | 0.2024 | 0.5254 | 0.3123 | 0.5927 | 0.3925