Introduction
These are the results of the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the 2018:Music and/or Speech Detection page.
General Legend
Sub code | Abstract | Contributors
DD1 | PDF | David Doukhan, Eliott Lechapt, Marc Evrard, Jean Carrive
JHKK1 | PDF | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
JHKK2 | PDF | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
JHKK3 | PDF | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
LN1 | PDF | Minsuk Choi, Jongpil Lee, Juhan Nam
MM1 | PDF | Matija Marolt
MM2 | PDF | Matija Marolt
MM3 | PDF | Matija Marolt
MMG1 | PDF | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
MMG2 | PDF | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
MMG3 | PDF | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
Statistics notation
Accuracy = segment-level accuracy
<class>_P = segment-level precision for the class <class>
<class>_R = segment-level recall for the class <class>
<class>_F = segment-level F-measure for the class <class>
<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>
<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
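As an illustration of the notation above, the following is a minimal sketch of how segment-level and onset-only event-level scores could be computed. It is not the official evaluation code. The function names, the representation of the annotations (one label per fixed-length segment, onsets in seconds), and the greedy one-to-one onset matching are all assumptions made for this example, and the tooling actually used for the evaluation may differ in these details.

```python
from typing import Sequence, Tuple


def segment_prf(reference: Sequence[str], estimated: Sequence[str],
                target: str) -> Tuple[float, float, float]:
    """Segment-level precision, recall and F-measure for one class.

    Assumes both sequences hold one label per segment and are aligned.
    """
    pairs = list(zip(reference, estimated))
    tp = sum(r == target and e == target for r, e in pairs)
    fp = sum(r != target and e == target for r, e in pairs)
    fn = sum(r == target and e != target for r, e in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def onset_f_measure(ref_onsets: Sequence[float], est_onsets: Sequence[float],
                    tolerance: float = 0.5) -> float:
    """Onset-only event-level F-measure with a tolerance in seconds.

    Assumption: each reference onset is matched greedily to the earliest
    unused estimated onset that lies within the tolerance.
    """
    if not ref_onsets or not est_onsets:
        return 0.0
    unused = sorted(est_onsets)
    hits = 0
    for onset in sorted(ref_onsets):
        for i, candidate in enumerate(unused):
            if abs(candidate - onset) <= tolerance:
                hits += 1
                del unused[i]  # each estimate may match only once
                break
    if hits == 0:
        return 0.0
    precision = hits / len(est_onsets)
    recall = hits / len(ref_onsets)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for illustration only.
reference = ["music", "music", "no-music", "no-music", "music"]
estimated = ["music", "no-music", "no-music", "music", "music"]
print(segment_prf(reference, estimated, "music"))

ref_onsets = [1.0, 5.0, 9.0]
est_onsets = [1.3, 5.8, 9.2, 12.0]
print(onset_f_measure(ref_onsets, est_onsets, tolerance=0.5))  # 500 ms
print(onset_f_measure(ref_onsets, est_onsets, tolerance=1.0))  # 1000 ms
```

With the toy onsets, widening the tolerance from 500 ms to 1000 ms lets the estimate at 5.8 s match the reference at 5.0 s, which is why the 1000 ms columns in the tables below are never lower than the 500 ms ones. An onset-offset variant would additionally require the offset of the matched event to fall within the tolerance.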
Datasets description
Dataset description
Task 1: Music Detection
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Music_P | Music_R | Music_F | No-Music_P | No-Music_R | No-Music_F
DD1 | 0.6860 | 0.905 | 0.3873 | 0.5424 | 0.6294 | 0.9624 | 0.7611
JHKK1 | 0.7798 | 0.9564 | 0.5675 | 0.7123 | 0.7092 | 0.9761 | 0.8215
JHKK2 | 0.8005 | 0.9824 | 0.5955 | 0.7415 | 0.7256 | 0.9902 | 0.8375
LN1(GAFMFSF) | 0.6251 | 0.6915 | 0.3943 | 0.5022 | 0.5988 | 0.8385 | 0.6987
MM1 | 0.6135 | 0.8072 | 0.257 | 0.3899 | 0.5786 | 0.9432 | 0.7172
MM2 | 0.6807 | 0.857 | 0.4026 | 0.5478 | 0.6292 | 0.938 | 0.7531
MM3 | 0.6075 | 0.9873 | 0.1856 | 0.3124 | 0.5698 | 0.9978 | 0.7254
MMG1 | 0.9049 | 0.9131 | 0.8865 | 0.8996 | 0.8978 | 0.9219 | 0.9097
MMG3 | 0.8506 | 0.967 | 0.7134 | 0.8211 | 0.7866 | 0.9775 | 0.8717
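The reported F-measures can be sanity-checked against the precision and recall columns. The sketch below assumes the conventional definition of the F-measure as the harmonic mean of precision and recall; the page itself does not state the formula. The two rows are copied from the Task 1, Dataset 1 segment-level table (Music class), and the helper name `f_measure` is only illustrative.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Music_P, Music_R, reported Music_F) from Task 1, Dataset 1, segment level.
rows = {
    "DD1": (0.905, 0.3873, 0.5424),
    "MMG1": (0.9131, 0.8865, 0.8996),
}

for code, (p, r, reported) in rows.items():
    recomputed = f_measure(p, r)
    # Inputs are rounded to 4 decimals, so allow a small discrepancy.
    status = "ok" if abs(recomputed - reported) < 5e-4 else "mismatch"
    print(f"{code}: recomputed {recomputed:.4f}, reported {reported:.4f} ({status})")
```

Because the published precision and recall are themselves rounded, a recomputed value can differ from the reported one in the last digit without indicating an error.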
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff
DD1 | 0.2877 | 0.093 | 0.312 | 0.1142
JHKK1 | 0.2303 | 0.0765 | 0.294 | 0.1173
JHKK2 | 0.2522 | 0.0931 | 0.3245 | 0.1389
LN1(GAFMFSF) | 0.1348 | 0.0139 | 0.1704 | 0.0231
MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831
MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049
MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676
MMG1 | 0.5177 | 0.2693 | 0.5813 | 0.3502
MMG3 | 0.4403 | 0.1991 | 0.4973 | 0.2788
Dataset 2
Segment-level Evaluation
Sub code | Accuracy | Music_P | Music_R | Music_F | No-Music_P | No-Music_R | No-Music_F
DD1 | 0.9257 | 0.9751 | 0.8950 | 0.9334 | 0.8694 | 0.9683 | 0.9162
JHKK1 | 0.9415 | 0.9665 | 0.9315 | 0.9487 | 0.9094 | 0.9553 | 0.9318
JHKK2 | 0.9153 | 0.885 | 0.9817 | 0.9309 | 0.97 | 0.8233 | 0.8907
LN1(GAFMFSF) | 0.7814 | 0.8319 | 0.7804 | 0.8053 | 0.7196 | 0.7828 | 0.7499
LN1(GAFMF) | 0.7751 | 0.8481 | 0.7456 | 0.7936 | 0.6978 | 0.8161 | 0.7523
LN1(GAFSF) | 0.7996 | 0.836 | 0.8137 | 0.8247 | 0.7507 | 0.78 | 0.7651
MM1 | 0.915 | 0.9765 | 0.8747 | 0.9228 | 0.8483 | 0.9708 | 0.9054
MM2 | 0.9032 | 0.9246 | 0.9072 | 0.9158 | 0.8745 | 0.8977 | 0.8859
MM3 | 0.8725 | 0.9794 | 0.7973 | 0.8791 | 0.7764 | 0.9769 | 0.8652
MMG1 | 0.9025 | 0.8586 | 0.9961 | 0.9223 | 0.9931 | 0.7726 | 0.8691
MMG3 | 0.949 | 0.9299 | 0.9865 | 0.9574 | 0.9795 | 0.8969 | 0.9364
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff
DD1 | 0.4089 | 0.2235 | 0.4402 | 0.248
JHKK1 | 0.1659 | 0.0347 | 0.2334 | 0.0636
JHKK2 | 0.167 | 0.029 | 0.2015 | 0.0599
LN1(GAFMFSF) | 0.0991 | 0.0228 | 0.1319 | 0.0428
LN1(GAFMF) | 0.1037 | 0.0257 | 0.139 | 0.0449
LN1(GAFSF) | 0.1026 | 0.0249 | 0.1385 | 0.0425
MM1 | 0.1412 | 0.0159 | 0.1843 | 0.0392
MM2 | 0.1540 | 0.0312 | 0.231 | 0.0791
MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535
MMG1 | 0.1358 | 0.0173 | 0.1936 | 0.0347
MMG3 | 0.1785 | 0.0298 | 0.2645 | 0.0595
Task 2: Speech Detection
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Speech_P | Speech_R | Speech_F | No-Speech_P | No-Speech_R | No-Speech_F
DD1 | 0.877 | 0.909 | 0.9285 | 0.9186 | 0.7751 | 0.7251 | 0.7493
JHKK3 | 0.8307 | 0.9379 | 0.8279 | 0.8795 | 0.6219 | 0.839 | 0.7143
LN1(GAFMFSF) | 0.6908 | 0.9579 | 0.6125 | 0.7472 | 0.4457 | 0.9213 | 0.6007
MM1 | 0.8626 | 0.8795 | 0.946 | 0.9115 | 0.7953 | 0.6169 | 0.6948
MM2 | 0.8619 | 0.8945 | 0.9241 | 0.909 | 0.7516 | 0.6782 | 0.713
MM3 | 0.8508 | 0.8383 | 0.9917 | 0.9086 | 0.9458 | 0.4357 | 0.5966
Event-level Evaluation
Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
DD1 | 0.415 | 0.1603 | 0.4477 | 0.2122
JHKK3 | 0.2882 | 0.0777 | 0.3289 | 0.0962
LN1 | 0.2686 | 0.0529 | 0.3484 | 0.0883
MM1 | 0.4607 | 0.2068 | 0.4898 | 0.2336
MM2 | 0.4422 | 0.1999 | 0.5093 | 0.266
MM3 | 0.4439 | 0.1775 | 0.4879 | 0.2122
Dataset 2
Segment-level Evaluation
Sub code | Accuracy | Speech_P | Speech_R | Speech_F | No-Speech_P | No-Speech_R | No-Speech_F
DD1 | 0.9617 | 0.9603 | 0.9564 | 0.9583 | 0.9633 | 0.9662 | 0.9648
JHKK3 | 0.8575 | 0.9125 | 0.7619 | 0.8305 | 0.8222 | 0.9384 | 0.8765
LN1(GAFMFSF) | 0.8636 | 0.9587 | 0.7339 | 0.8314 | 0.8113 | 0.9733 | 0.885
LN1(GAFMF) | 0.8754 | 0.9591 | 0.7604 | 0.8483 | 0.8267 | 0.9726 | 0.8937
LN1(GAFSF) | 0.8597 | 0.959 | 0.7249 | 0.8256 | 0.8062 | 0.9739 | 0.8821
MM1 | 0.9367 | 0.9134 | 0.9526 | 0.9326 | 0.9585 | 0.9232 | 0.9405
MM2 | 0.9226 | 0.9328 | 0.8959 | 0.914 | 0.9147 | 0.9451 | 0.9296
MM3 | 0.8973 | 0.8289 | 0.9781 | 0.8973 | 0.978 | 0.829 | 0.8974
Event-level Evaluation
Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
DD1 | 0.6037 | 0.4139 | 0.6318 | 0.435
JHKK3 | 0.1585 | 0.0405 | 0.2095 | 0.0563
LN1(GAFMFSF) | 0.1775 | 0.0399 | 0.2426 | 0.0738
LN1(GAFMF) | 0.1903 | 0.0548 | 0.2606 | 0.0918
LN1(GAFSF) | 0.1839 | 0.0452 | 0.2446 | 0.0731
MM1 | 0.0632 | 0.0015 | 0.0947 | 0.0150
MM2 | 0.1162 | 0.0211 | 0.1737 | 0.0469
MM3 | 0.0796 | 0.0152 | 0.123 | 0.0281
Task 3: Music and Speech Detection
Dataset 1
Segment-level Evaluation
Sub code | Music_P | Music_R | Music_F | Speech_P | Speech_R | Speech_F
LN1(GAFMFSF) | 0.624 | 0.4082 | 0.4936 | 0.9683 | 0.6415 | 0.7718
MM1 | 0.8072 | 0.257 | 0.3899 | 0.8795 | 0.946 | 0.9115
MM2 | 0.857 | 0.4026 | 0.5478 | 0.8945 | 0.9241 | 0.909
MM3 | 0.9873 | 0.1856 | 0.3124 | 0.8383 | 0.9917 | 0.9086
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
LN1(GAFMFSF) | 0.1116 | 0.0088 | 0.1459 | 0.0186 | 0.2645 | 0.0462 | 0.348 | 0.0786
MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831 | 0.4607 | 0.2068 | 0.4898 | 0.2336
MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049 | 0.4422 | 0.1999 | 0.5093 | 0.266
MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676 | 0.4439 | 0.1775 | 0.4879 | 0.2122
Dataset 2
Segment-level Evaluation
Sub code | Music_P | Music_R | Music_F | Speech_P | Speech_R | Speech_F
LN1(GAFMFSF) | 0.813 | 0.7599 | 0.7855 | 0.9671 | 0.7511 | 0.8455
LN1(GAFMF) | 0.7682 | 0.7504 | 0.7592 | 0.9747 | 0.6625 | 0.7888
LN1(GAFSF) | 0.797 | 0.7965 | 0.7968 | 0.9637 | 0.7178 | 0.8227
MM1 | 0.9765 | 0.8747 | 0.9228 | 0.9134 | 0.9526 | 0.9326
MM2 | 0.9246 | 0.9072 | 0.9158 | 0.9328 | 0.8959 | 0.914
MM3 | 0.9794 | 0.7973 | 0.8791 | 0.8289 | 0.9781 | 0.8973
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff
LN1(GAFMFSF) | 0.087 | 0.0232 | 0.1133 | 0.0375 | 0.2233 | 0.0766 | 0.3148 | 0.1277
LN1(GAFMF) | 0.0727 | 0.0197 | 0.0965 | 0.031 | 0.1918 | 0.0505 | 0.2637 | 0.0889
LN1(GAFSF) | 0.0677 | 0.0145 | 0.0977 | 0.0266 | 0.2063 | 0.0524 | 0.2804 | 0.092
MM1 | 0.1412 | 0.0157 | 0.1843 | 0.0392 | 0.0632 | 0.0015 | 0.0947 | 0.015
MM2 | 0.154 | 0.0312 | 0.231 | 0.0791 | 0.1162 | 0.0211 | 0.1737 | 0.0469
MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535 | 0.0796 | 0.0152 | 0.123 | 0.0281
Task 4: Music Relative Loudness Estimation
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Fg-Music_P | Fg-Music_R | Fg-Music_F | Bg-Music_P | Bg-Music_R | Bg-Music_F | No-Music_P | No-Music_R | No-Music_F
MMG2 | 0.8615 | 0.8025 | 0.774 | 0.788 | 0.8211 | 0.821 | 0.821 | 0.9026 | 0.9103 | 0.9064
Event-level Evaluation
Sub code | Fg-Music_F_500_on | Fg-Music_F_500_onoff | Fg-Music_F_1000_on | Fg-Music_F_1000_onoff | Bg-Music_F_500_on | Bg-Music_F_500_onoff | Bg-Music_F_1000_on | Bg-Music_F_1000_onoff | No-Music_F_500_on | No-Music_F_500_onoff | No-Music_F_1000_on | No-Music_F_1000_onoff
MMG2 | 0.3298 | 0.1775 | 0.4106 | 0.2742 | 0.3853 | 0.1388 | 0.4463 | 0.2024 | 0.5254 | 0.3123 | 0.5927 | 0.3925