2018:Music and or Speech Detection Results
Latest revision as of 15:44, 24 September 2018
Introduction
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [2018:Music and/or Speech Detection](https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection) page.
General Legend
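Sub code | Abstract | Contributors |
---|---|---|
DD1 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/DD1.pdf) | David Doukhan, Eliott Lechapt, Marc Evrard, Jean Carrive |
JHKK1 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf) | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon |
JHKK2 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf) | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon |
JHKK3 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf) | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon |
LN1 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf) | Minsuk Choi, Jongpil Lee, Juhan Nam |
MM1 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf) | Matija Marolt |
MM2 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf) | Matija Marolt |
MM3 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf) | Matija Marolt |
MMG1 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf) | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez |
MMG2 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf) | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez |
MMG3 | [PDF](https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf) | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez |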
Statistics notation
Accuracy = segment-level accuracy
<class>_P = segment-level precision for the class <class>
<class>_R = segment-level recall for the class <class>
<class>_F = segment-level F-measure for the class <class>
<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>
<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
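The segment-level statistics above can be illustrated with a small sketch. It assumes, for illustration only, that the reference and estimated annotations for one class are lists of (onset, offset) pairs in seconds, that both are rasterized into fixed 10 ms frames (a frame counts as active when its centre lies inside a segment), and that the complementary class (e.g. No-Music) is every inactive frame. The hop size, the rasterization rule and the function names are hypothetical and are not taken from the MIREX evaluation code.

```python
import math
from typing import Dict, List, Sequence, Tuple

Segment = Tuple[float, float]  # (onset, offset) in seconds


def rasterize(segments: Sequence[Segment], duration: float, hop: float = 0.01) -> List[bool]:
    """Hypothetical helper: mark frames whose centre (i + 0.5) * hop lies in [onset, offset)."""
    n_frames = int(round(duration / hop))
    active = [False] * n_frames
    for onset, offset in segments:
        start = max(0, math.ceil(onset / hop - 0.5))
        stop = min(n_frames, math.ceil(offset / hop - 0.5))
        for i in range(start, stop):
            active[i] = True
    return active


def prf(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision, recall and balanced F-measure, using 0.0 for empty denominators."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def segment_metrics(ref: Sequence[bool], est: Sequence[bool], label: str = "Music") -> Dict[str, float]:
    """Frame-wise Accuracy plus <class>_P/_R/_F for the class and its complement."""
    if len(ref) != len(est):
        raise ValueError("reference and estimate must have the same number of frames")
    tp = sum(1 for r, e in zip(ref, est) if r and e)
    fp = sum(1 for r, e in zip(ref, est) if e and not r)
    fn = sum(1 for r, e in zip(ref, est) if r and not e)
    tn = len(ref) - tp - fp - fn
    out = {"Accuracy": (tp + tn) / len(ref)}
    # For the complement class the error types swap roles: a missed class
    # frame (fn) is a false alarm for No-<label>, and a false alarm (fp) is a miss.
    for name, counts in ((label, (tp, fp, fn)), (f"No-{label}", (tn, fn, fp))):
        p, r, f = prf(*counts)
        out[f"{name}_P"], out[f"{name}_R"], out[f"{name}_F"] = p, r, f
    return out


if __name__ == "__main__":
    ref = rasterize([(0.0, 4.0), (6.0, 10.0)], duration=10.0)
    est = rasterize([(0.0, 5.0), (7.0, 10.0)], duration=10.0)
    for key, value in segment_metrics(ref, est).items():
        print(f"{key}: {value:.4f}")
```

With these toy annotations the sketch reports an Accuracy of 0.8000, Music_P, Music_R and Music_F of 0.8750, and No-Music_P, No-Music_R and No-Music_F of 0.5000. Calling `segment_metrics` once per class with `label="Speech"` gives the Speech/No-Speech columns in the same way.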
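Datasets description
[Dataset description](https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset)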
Task 1: Music Detection
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Music_P | Music_R | Music_F | No-Music_P | No-Music_R | No-Music_F |
---|---|---|---|---|---|---|---|
DD1 | 0.6860 | 0.905 | 0.3873 | 0.5424 | 0.6294 | 0.9624 | 0.7611 |
JHKK1 | 0.7798 | 0.9564 | 0.5675 | 0.7123 | 0.7092 | 0.9761 | 0.8215 |
JHKK2 | 0.8005 | 0.9824 | 0.5955 | 0.7415 | 0.7256 | 0.9902 | 0.8375 |
LN1(GAFMFSF) | 0.6251 | 0.6915 | 0.3943 | 0.5022 | 0.5988 | 0.8385 | 0.6987 |
MM1 | 0.6135 | 0.8072 | 0.257 | 0.3899 | 0.5786 | 0.9432 | 0.7172 |
MM2 | 0.6807 | 0.857 | 0.4026 | 0.5478 | 0.6292 | 0.938 | 0.7531 |
MM3 | 0.6075 | 0.9873 | 0.1856 | 0.3124 | 0.5698 | 0.9978 | 0.7254 |
MMG1 | 0.9049 | 0.9131 | 0.8865 | 0.8996 | 0.8978 | 0.9219 | 0.9097 |
MMG3 | 0.8506 | 0.967 | 0.7134 | 0.8211 | 0.7866 | 0.9775 | 0.8717 |
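Assuming the _F columns are the balanced F-measure, i.e. the harmonic mean of the _P and _R values in the same row (the published rows agree with this up to rounding), any one of the three values can be recovered from the other two. The sketch below, with hypothetical function names, applies this to two rows of the table above; for JHKK1 it gives a No-Music_R of about 0.9761.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from(precision: float, f: float) -> float:
    """Solve F = 2PR / (P + R) for R, which gives R = F * P / (2P - F)."""
    return f * precision / (2 * precision - f)


if __name__ == "__main__":
    # JHKK1, Dataset 1: No-Music_P = 0.7092 and No-Music_F = 0.8215.
    print(round(recall_from(0.7092, 0.8215), 4))  # 0.9761
    # DD1, Dataset 1: Music_P = 0.905 and Music_R = 0.3873 (published Music_F = 0.5424).
    print(round(f_measure(0.905, 0.3873), 3))  # 0.542
```

Because the published P and R are themselves rounded to four decimals, a recomputed F can differ from the table in the last digit, so comparisons should allow a tolerance of about 1e-3.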
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff |
---|---|---|---|---|
DD1 | 0.2877 | 0.093 | 0.312 | 0.1142 |
JHKK1 | 0.2303 | 0.0765 | 0.294 | 0.1173 |
JHKK2 | 0.2522 | 0.0931 | 0.3245 | 0.1389 |
LN1(GAFMFSF) | 0.1348 | 0.0139 | 0.1704 | 0.0231 |
MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831 |
MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049 |
MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676 |
MMG1 | 0.5177 | 0.2693 | 0.5813 | 0.3502 |
MMG3 | 0.4403 | 0.1991 | 0.4973 | 0.2788 |
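The event-level columns compare detected segments with reference segments as discrete events rather than frame by frame. A minimal sketch, assuming that an estimated event matches an unmatched reference event when their onsets differ by at most the tolerance (500 ms or 1000 ms), and for the _onoff variants when their offsets do as well, with each reference used at most once. The greedy closest-onset matching and the function name are assumptions; the official evaluation code may use optimal matching or a different offset criterion, so this sketch is not expected to reproduce the table.

```python
from typing import Optional, Sequence, Set, Tuple

Event = Tuple[float, float]  # (onset, offset) in seconds


def event_f_measure(ref: Sequence[Event], est: Sequence[Event],
                    tolerance: float = 0.5, use_offset: bool = False) -> float:
    """Event-level F-measure with greedy one-to-one matching (illustrative only)."""
    matched: Set[int] = set()
    tp = 0
    for est_on, est_off in sorted(est):
        best_j: Optional[int] = None
        best_dist = float("inf")
        for j, (ref_on, ref_off) in enumerate(ref):
            if j in matched:
                continue  # each reference event may be matched only once
            dist = abs(est_on - ref_on)
            if dist > tolerance:
                continue  # onset outside the tolerance window
            if use_offset and abs(est_off - ref_off) > tolerance:
                continue  # offset outside the tolerance window (_onoff only)
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(est) if est else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    ref = [(0.0, 4.0), (6.0, 10.0)]
    est = [(0.2, 5.5), (6.8, 10.1)]
    for tol in (0.5, 1.0):
        for use_offset, suffix in ((False, "on"), (True, "onoff")):
            score = event_f_measure(ref, est, tol, use_offset)
            print(f"F_{int(tol * 1000)}_{suffix}: {score:.2f}")
```

For this toy input the sketch prints 0.50, 0.00, 1.00 and 0.50 for F_500_on, F_500_onoff, F_1000_on and F_1000_onoff. Requiring both boundaries to fall within the tolerance is a stricter criterion, which fits the _onoff columns being consistently lower than the corresponding _on columns in the tables.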
Dataset 2
Segment-level Evaluation
Sub code | Accuracy | Music_P | Music_R | Music_F | No-Music_P | No-Music_R | No-Music_F |
---|---|---|---|---|---|---|---|
DD1 | 0.9257 | 0.9751 | 0.8950 | 0.9334 | 0.8694 | 0.9683 | 0.9162 |
JHKK1 | 0.9415 | 0.9665 | 0.9315 | 0.9487 | 0.9094 | 0.9553 | 0.9318 |
JHKK2 | 0.9153 | 0.885 | 0.9817 | 0.9309 | 0.97 | 0.8233 | 0.8907 |
LN1(GAFMFSF) | 0.7814 | 0.8319 | 0.7804 | 0.8053 | 0.7196 | 0.7828 | 0.7499 |
LN1(GAFMF) | 0.7751 | 0.8481 | 0.7456 | 0.7936 | 0.6978 | 0.8161 | 0.7523 |
LN1(GAFSF) | 0.7996 | 0.836 | 0.8137 | 0.8247 | 0.7507 | 0.78 | 0.7651 |
MM1 | 0.915 | 0.9765 | 0.8747 | 0.9228 | 0.8483 | 0.9708 | 0.9054 |
MM2 | 0.9032 | 0.9246 | 0.9072 | 0.9158 | 0.8745 | 0.8977 | 0.8859 |
MM3 | 0.8725 | 0.9794 | 0.7973 | 0.8791 | 0.7764 | 0.9769 | 0.8652 |
MMG1 | 0.9025 | 0.8586 | 0.9961 | 0.9223 | 0.9931 | 0.7726 | 0.8691 |
MMG3 | 0.949 | 0.9299 | 0.9865 | 0.9574 | 0.9795 | 0.8969 | 0.9364 |
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff |
---|---|---|---|---|
DD1 | 0.4089 | 0.2235 | 0.4402 | 0.248 |
JHKK1 | 0.1659 | 0.0347 | 0.2334 | 0.0636 |
JHKK2 | 0.167 | 0.029 | 0.2015 | 0.0599 |
LN1(GAFMFSF) | 0.0991 | 0.0228 | 0.1319 | 0.0428 |
LN1(GAFMF) | 0.1037 | 0.0257 | 0.139 | 0.0449 |
LN1(GAFSF) | 0.1026 | 0.0249 | 0.1385 | 0.0425 |
MM1 | 0.1412 | 0.0159 | 0.1843 | 0.0392 |
MM2 | 0.1540 | 0.0312 | 0.231 | 0.0791 |
MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535 |
MMG1 | 0.1358 | 0.0173 | 0.1936 | 0.0347 |
MMG3 | 0.1785 | 0.0298 | 0.2645 | 0.0595 |
Task 2: Speech Detection
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Speech_P | Speech_R | Speech_F | No-Speech_P | No-Speech_R | No-Speech_F |
---|---|---|---|---|---|---|---|
DD1 | 0.877 | 0.909 | 0.9285 | 0.9186 | 0.7751 | 0.7251 | 0.7493 |
JHKK3 | 0.8307 | 0.9379 | 0.8279 | 0.8795 | 0.6219 | 0.839 | 0.7143 |
LN1(GAFMFSF) | 0.6908 | 0.9579 | 0.6125 | 0.7472 | 0.4457 | 0.9213 | 0.6007 |
MM1 | 0.8626 | 0.8795 | 0.946 | 0.9115 | 0.7953 | 0.6169 | 0.6948 |
MM2 | 0.8619 | 0.8945 | 0.9241 | 0.909 | 0.7516 | 0.6782 | 0.713 |
MM3 | 0.8508 | 0.8383 | 0.9917 | 0.9086 | 0.9458 | 0.4357 | 0.5966 |
Event-level Evaluation
Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
---|---|---|---|---|
DD1 | 0.415 | 0.1603 | 0.4477 | 0.2122 |
JHKK3 | 0.2882 | 0.0777 | 0.3289 | 0.0962 |
LN1 | 0.2686 | 0.0529 | 0.3484 | 0.0883 |
MM1 | 0.4607 | 0.2068 | 0.4898 | 0.2336 |
MM2 | 0.4422 | 0.1999 | 0.5093 | 0.266 |
MM3 | 0.4439 | 0.1775 | 0.4879 | 0.2122 |
Dataset 2
Segment-level Evaluation
Sub code | Accuracy | Speech_P | Speech_R | Speech_F | No-Speech_P | No-Speech_R | No-Speech_F |
---|---|---|---|---|---|---|---|
DD1 | 0.9617 | 0.9603 | 0.9564 | 0.9583 | 0.9633 | 0.9662 | 0.9648 |
JHKK3 | 0.8575 | 0.9125 | 0.7619 | 0.8305 | 0.8222 | 0.9384 | 0.8765 |
LN1(GAFMFSF) | 0.8636 | 0.9587 | 0.7339 | 0.8314 | 0.8113 | 0.9733 | 0.885 |
LN1(GAFMF) | 0.8754 | 0.9591 | 0.7604 | 0.8483 | 0.8267 | 0.9726 | 0.8937 |
LN1(GAFSF) | 0.8597 | 0.959 | 0.7249 | 0.8256 | 0.8062 | 0.9739 | 0.8821 |
MM1 | 0.9367 | 0.9134 | 0.9526 | 0.9326 | 0.9585 | 0.9232 | 0.9405 |
MM2 | 0.9226 | 0.9328 | 0.8959 | 0.914 | 0.9147 | 0.9451 | 0.9296 |
MM3 | 0.8973 | 0.8289 | 0.9781 | 0.8973 | 0.978 | 0.829 | 0.8974 |
Event-level Evaluation
Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
---|---|---|---|---|
DD1 | 0.6037 | 0.4139 | 0.6318 | 0.435 |
JHKK3 | 0.1585 | 0.0405 | 0.2095 | 0.0563 |
LN1(GAFMFSF) | 0.1775 | 0.0399 | 0.2426 | 0.0738 |
LN1(GAFMF) | 0.1903 | 0.0548 | 0.2606 | 0.0918 |
LN1(GAFSF) | 0.1839 | 0.0452 | 0.2446 | 0.0731 |
MM1 | 0.0632 | 0.0015 | 0.0947 | 0.0150 |
MM2 | 0.1162 | 0.0211 | 0.1737 | 0.0469 |
MM3 | 0.0796 | 0.0152 | 0.123 | 0.0281 |
Task 3: Music and Speech Detection
Dataset 1
Segment-level Evaluation
Sub code | Music_P | Music_R | Music_F | Speech_P | Speech_R | Speech_F |
---|---|---|---|---|---|---|
LN1(GAFMFSF) | 0.624 | 0.4082 | 0.4936 | 0.9683 | 0.6415 | 0.7718 |
MM1 | 0.8072 | 0.257 | 0.3899 | 0.8795 | 0.946 | 0.9115 |
MM2 | 0.857 | 0.4026 | 0.5478 | 0.8945 | 0.9241 | 0.909 |
MM3 | 0.9873 | 0.1856 | 0.3124 | 0.8383 | 0.9917 | 0.9086 |
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
---|---|---|---|---|---|---|---|---|
LN1(GAFMFSF) | 0.1116 | 0.0088 | 0.1459 | 0.0186 | 0.2645 | 0.0462 | 0.348 | 0.0786 |
MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831 | 0.4607 | 0.2068 | 0.4898 | 0.2336 |
MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049 | 0.4422 | 0.1999 | 0.5093 | 0.266 |
MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676 | 0.4439 | 0.1775 | 0.4879 | 0.2122 |
Dataset 2
Segment-level Evaluation
Sub code | Music_P | Music_R | Music_F | Speech_P | Speech_R | Speech_F |
---|---|---|---|---|---|---|
LN1(GAFMFSF) | 0.813 | 0.7599 | 0.7855 | 0.9671 | 0.7511 | 0.8455 |
LN1(GAFMF) | 0.7682 | 0.7504 | 0.7592 | 0.9747 | 0.6625 | 0.7888 |
LN1(GAFSF) | 0.797 | 0.7965 | 0.7968 | 0.9637 | 0.7178 | 0.8227 |
MM1 | 0.9765 | 0.8747 | 0.9228 | 0.9134 | 0.9526 | 0.9326 |
MM2 | 0.9246 | 0.9072 | 0.9158 | 0.9328 | 0.8959 | 0.914 |
MM3 | 0.9794 | 0.7973 | 0.8791 | 0.8289 | 0.9781 | 0.8973 |
Event-level Evaluation
Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
---|---|---|---|---|---|---|---|---|
LN1(GAFMFSF) | 0.087 | 0.0232 | 0.1133 | 0.0375 | 0.2233 | 0.0766 | 0.3148 | 0.1277 |
LN1(GAFMF) | 0.0727 | 0.0197 | 0.0965 | 0.031 | 0.1918 | 0.0505 | 0.2637 | 0.0889 |
LN1(GAFSF) | 0.0677 | 0.0145 | 0.0977 | 0.0266 | 0.2063 | 0.0524 | 0.2804 | 0.092 |
MM1 | 0.1412 | 0.0157 | 0.1843 | 0.0392 | 0.0632 | 0.0015 | 0.0947 | 0.015 |
MM2 | 0.154 | 0.0312 | 0.231 | 0.0791 | 0.1162 | 0.0211 | 0.1737 | 0.0469 |
MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535 | 0.0796 | 0.0152 | 0.123 | 0.0281 |
Task 4: Music Relative Loudness Estimation
Dataset 1
Segment-level Evaluation
Sub code | Accuracy | Fg-Music_P | Fg-Music_R | Fg-Music_F | Bg-Music_P | Bg-Music_R | Bg-Music_F | No-Music_P | No-Music_R | No-Music_F |
---|---|---|---|---|---|---|---|---|---|---|
MMG2 | 0.8615 | 0.8025 | 0.774 | 0.788 | 0.8211 | 0.821 | 0.821 | 0.9026 | 0.9103 | 0.9064 |
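Task 4 scores three mutually exclusive classes instead of a class and its complement. A minimal sketch, assuming frame-level labels, that Accuracy counts frames whose estimated label matches the reference exactly, and that each class's P, R and F are computed one-vs-rest. The label strings mirror the column prefixes above; they and the function name are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence

# Hypothetical label strings mirroring the column prefixes of the Task 4 table.
CLASSES = ("Fg-Music", "Bg-Music", "No-Music")


def multiclass_segment_metrics(ref: Sequence[str], est: Sequence[str]) -> Dict[str, float]:
    """Frame-wise accuracy plus one-vs-rest precision, recall and F for each class."""
    if len(ref) != len(est) or not ref:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = Counter(zip(ref, est))  # (reference label, estimated label) -> frame count
    out = {"Accuracy": sum(pairs[(c, c)] for c in CLASSES) / len(ref)}
    for c in CLASSES:
        tp = pairs[(c, c)]
        n_est = sum(n for (_, e), n in pairs.items() if e == c)  # frames predicted as c
        n_ref = sum(n for (r, _), n in pairs.items() if r == c)  # frames annotated as c
        precision = tp / n_est if n_est else 0.0
        recall = tp / n_ref if n_ref else 0.0
        f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        out[f"{c}_P"], out[f"{c}_R"], out[f"{c}_F"] = precision, recall, f
    return out


if __name__ == "__main__":
    ref = ["Fg-Music"] * 4 + ["Bg-Music"] * 4 + ["No-Music"] * 2
    est = ["Fg-Music"] * 3 + ["Bg-Music"] * 5 + ["No-Music"] * 2
    for key, value in multiclass_segment_metrics(ref, est).items():
        print(f"{key}: {value:.4f}")
```

In this toy example one foreground frame is labelled as background, so Accuracy is 0.9, Fg-Music_R drops to 0.75 and Bg-Music_P to 0.8, while the No-Music scores stay at 1.0. A foreground/background confusion therefore lowers one class's recall and the other class's precision at the same time.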
Event-level Evaluation
Sub code | Fg-Music_F_500_on | Fg-Music_F_500_onoff | Fg-Music_F_1000_on | Fg-Music_F_1000_onoff | Bg-Music_F_500_on | Bg-Music_F_500_onoff | Bg-Music_F_1000_on | Bg-Music_F_1000_onoff | No-Music_F_500_on | No-Music_F_500_onoff | No-Music_F_1000_on | No-Music_F_1000_onoff |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MMG2 | 0.3298 | 0.1775 | 0.4106 | 0.2742 | 0.3853 | 0.1388 | 0.4463 | 0.2024 | 0.5254 | 0.3123 | 0.5927 | 0.3925 |