2018:Music and/or Speech Detection Results
From MIREX Wiki
Revision as of 03:34, 19 September 2018
Introduction
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the 2018:Music and/or Speech Detection page.
General Legend
| Sub code | Abstract | Contributors |
|---|---|---|
| DD1 | | David Doukhan |
| JHKK1 | | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon |
| JHKK2 | | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon |
| JHKK3 | | Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon |
| LN1 | | Minsuk Choi, Jongpil Lee, Juhan Nam |
| MM1 | | Matija Marolt |
| MM2 | | Matija Marolt |
| MM3 | | Matija Marolt |
| MMG1 | | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez |
| MMG2 | | Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez |
Statistics notation
<class>_F = segment-level F-measure for the class <class>
<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>
<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>
<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
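The notation above distinguishes segment-level scores from event-level scores that match either onsets only or both onsets and offsets within a fixed tolerance. The Python sketch below illustrates how such scores can be computed; it is not the MIREX evaluation code. It assumes events are (onset, offset, label) tuples in seconds, that segments are fixed hop-sized frames labelled at their midpoint, that event matching is greedy and one-to-one, and that the offset tolerance reuses the same fixed tolerance as the onset (so "500 ms tolerance" corresponds to `collar=0.5`). The function names `f_measure`, `activity`, `segment_scores` and `event_f` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation of an annotated or estimated event.
Event = Tuple[float, float, str]  # (onset_s, offset_s, label)


def f_measure(tp: int, n_est: int, n_ref: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if tp == 0:
        return 0.0
    precision = tp / n_est
    recall = tp / n_ref
    return 2 * precision * recall / (precision + recall)


def activity(events: List[Event], label: str, duration: float, hop: float = 0.01) -> List[bool]:
    """Mark each hop-sized segment as active if its midpoint lies inside an event of `label`."""
    n = int(round(duration / hop))
    mids = [(i + 0.5) * hop for i in range(n)]
    spans = [(on, off) for on, off, lab in events if lab == label]
    return [any(on <= t < off for on, off in spans) for t in mids]


def segment_scores(ref: List[bool], est: List[bool]) -> Tuple[float, float, float]:
    """Return (accuracy, class F, complement-class F), e.g. (Accuracy, Music_F, No-Music_F)."""
    n = len(ref)
    acc = sum(r == e for r, e in zip(ref, est)) / n
    tp_pos = sum(r and e for r, e in zip(ref, est))
    tp_neg = sum((not r) and (not e) for r, e in zip(ref, est))
    f_pos = f_measure(tp_pos, sum(est), sum(ref))
    f_neg = f_measure(tp_neg, n - sum(est), n - sum(ref))
    return acc, f_pos, f_neg


def event_f(ref: List[Event], est: List[Event], label: str,
            collar: float = 0.5, use_offset: bool = False) -> float:
    """Event-level F-measure: greedy one-to-one matching within `collar` seconds.

    use_offset=False gives an onset-only score; True also requires the offsets to match.
    """
    r = [e for e in ref if e[2] == label]
    s = [e for e in est if e[2] == label]
    used = [False] * len(s)
    tp = 0
    for r_on, r_off, _ in r:
        for j, (s_on, s_off, _) in enumerate(s):
            if used[j] or abs(s_on - r_on) > collar:
                continue
            if use_offset and abs(s_off - r_off) > collar:
                continue
            used[j] = True  # each estimated event may match at most one reference event
            tp += 1
            break
    return f_measure(tp, len(s), len(r))


if __name__ == "__main__":
    ref = [(0.0, 4.0, "music"), (6.0, 10.0, "music")]
    est = [(0.3, 4.0, "music"), (7.0, 8.5, "music")]
    acc, f_m, f_nm = segment_scores(activity(ref, "music", 10.0, hop=0.5),
                                    activity(est, "music", 10.0, hop=0.5))
    print(f"{acc:.4f} {f_m:.4f} {f_nm:.4f}")  # 0.7000 0.7692 0.5714
    print(f"{event_f(ref, est, 'music', collar=0.5):.4f}")  # 0.5000 (onset-only, 500 ms)
    print(f"{event_f(ref, est, 'music', collar=1.0):.4f}")  # 1.0000 (onset-only, 1000 ms)
    print(f"{event_f(ref, est, 'music', collar=1.0, use_offset=True):.4f}")  # 0.5000 (onset-offset)
```

The example shows why onset-offset scores tend to sit below onset-only scores in the tables: the second estimated event starts within 1000 ms of the reference but ends 1.5 s early, so it counts only when offsets are ignored. A greedy matcher can also miss pairings that an optimal one-to-one assignment would find, so this sketch may report slightly lower scores than a toolkit that uses optimal matching.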
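Datasets description
The evaluation datasets are described on the task page: https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset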
Task 1: Music Detection
Dataset 1
Segment-level Evaluation
| Sub code | Accuracy | Music_F | No-Music_F |
|---|---|---|---|
| DD1 | 0.6860 | 0.5424 | 0.7611 |
| JHKK1 | 0.7798 | 0.7123 | 0.8215 |
| JHKK2 | 0.8005 | 0.7415 | 0.8375 |
| LN1 | 0.6251 | 0.5022 | 0.6987 |
| MM1 | 0.6135 | 0.3899 | 0.7172 |
| MM2 | 0.6807 | 0.5478 | 0.7531 |
| MM3 | 0.6075 | 0.3124 | 0.7254 |
| MMG1 | 0.9049 | 0.8996 | 0.9097 |
Event-level Evaluation
| Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff |
|---|---|---|---|---|
| DD1 | 0.2877 | 0.093 | 0.312 | 0.1142 |
| JHKK1 | 0.2303 | 0.0765 | 0.294 | 0.1173 |
| JHKK2 | 0.2522 | 0.0931 | 0.3245 | 0.1389 |
| LN1 | 0.1348 | 0.0139 | 0.1704 | 0.0231 |
| MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831 |
| MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049 |
| MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676 |
| MMG1 | 0.5177 | 0.2693 | 0.5813 | 0.3502 |
Dataset 2
Segment-level Evaluation
| Sub code | Accuracy | Music_F | No-Music_F |
|---|---|---|---|
| DD1 | 0.9257 | 0.9334 | 0.9162 |
| JHKK1 | 0.9415 | 0.9487 | 0.9318 |
| JHKK2 | 0.9153 | 0.9309 | 0.8907 |
| LN1 | 0.7814 | 0.8053 | 0.7499 |
| MM1 | 0.915 | 0.9228 | 0.9054 |
| MM2 | 0.9032 | 0.9158 | 0.8859 |
| MM3 | 0.8725 | 0.8791 | 0.8652 |
| MMG1 | 0.9025 | 0.9223 | 0.8691 |
Event-level Evaluation
| Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff |
|---|---|---|---|---|
| DD1 | 0.4089 | 0.2235 | 0.4402 | 0.248 |
| JHKK1 | 0.1659 | 0.0347 | 0.2334 | 0.0636 |
| JHKK2 | 0.167 | 0.029 | 0.2015 | 0.0599 |
| LN1 | 0.0991 | 0.0228 | 0.1319 | 0.0428 |
| MM1 | 0.1412 | 0.0159 | 0.1843 | 0.0392 |
| MM2 | 0.1540 | 0.0312 | 0.231 | 0.0791 |
| MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535 |
| MMG1 | 0.1358 | 0.0173 | 0.1936 | 0.0347 |
Task 2: Speech Detection
Dataset 1
Segment-level Evaluation
| Sub code | Accuracy | Speech_F | No-Speech_F |
|---|---|---|---|
| DD1 | 0.877 | 0.9186 | 0.7493 |
| JHKK3 | 0.8307 | 0.8795 | 0.7143 |
| LN1 | 0.6908 | 0.7472 | 0.6007 |
| MM1 | 0.8626 | 0.9115 | 0.6948 |
| MM2 | 0.8619 | 0.909 | 0.713 |
| MM3 | 0.8508 | 0.9086 | 0.5966 |
Event-level Evaluation
| Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
|---|---|---|---|---|
| DD1 | 0.415 | 0.1603 | 0.4477 | 0.2122 |
| JHKK3 | 0.2882 | 0.0777 | 0.3289 | 0.0962 |
| LN1 | 0.2686 | 0.0529 | 0.3484 | 0.0883 |
| MM1 | 0.4607 | 0.2068 | 0.4898 | 0.2336 |
| MM2 | 0.4422 | 0.1999 | 0.5093 | 0.266 |
| MM3 | 0.4439 | 0.1775 | 0.4879 | 0.2122 |
Dataset 2
Segment-level Evaluation
| Sub code | Accuracy | Speech_F | No-Speech_F |
|---|---|---|---|
| DD1 | 0.9617 | 0.9583 | 0.9648 |
| JHKK3 | 0.8575 | 0.8305 | 0.8765 |
| LN1 | 0.8636 | 0.8314 | 0.885 |
| MM1 | 0.9367 | 0.9326 | 0.9405 |
| MM2 | 0.9226 | 0.914 | 0.9296 |
| MM3 | 0.8973 | 0.8973 | 0.8974 |
Event-level Evaluation
| Sub code | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
|---|---|---|---|---|
| DD1 | 0.6037 | 0.4139 | 0.6318 | 0.435 |
| JHKK3 | 0.1585 | 0.0405 | 0.2095 | 0.0563 |
| LN1 | 0.1775 | 0.0399 | 0.2426 | 0.0738 |
| MM1 | 0.0632 | 0.0015 | 0.0947 | 0.0150 |
| MM2 | 0.1162 | 0.0211 | 0.1737 | - |
| MM3 | 0.0796 | 0.0152 | 0.123 | 0.0281 |
Task 3: Music and Speech Detection
Dataset 1
Segment-level Evaluation
| Sub code | Music_F | Speech_F |
|---|---|---|
| LN1 | 0.4936 | 0.7718 |
| MM1 | 0.3899 | 0.9115 |
| MM2 | 0.5478 | 0.909 |
| MM3 | 0.3124 | 0.9086 |
Event-level Evaluation
| Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
|---|---|---|---|---|---|---|---|---|
| LN1 | 0.1116 | 0.0088 | 0.1459 | 0.0186 | 0.2645 | 0.0462 | 0.348 | 0.0786 |
| MM1 | 0.2044 | 0.0662 | 0.2137 | 0.0831 | 0.4607 | 0.2068 | 0.4898 | 0.2336 |
| MM2 | 0.2464 | 0.0817 | 0.2736 | 0.1049 | 0.4422 | 0.1999 | 0.5093 | 0.266 |
| MM3 | 0.1379 | 0.0525 | 0.1619 | 0.0676 | 0.4439 | 0.1775 | 0.4879 | 0.2122 |
Dataset 2
Segment-level Evaluation
| Sub code | Music_F | Speech_F |
|---|---|---|
| LN1 | 0.7855 | 0.8455 |
| MM1 | 0.9228 | 0.9326 |
| MM2 | 0.9158 | 0.914 |
| MM3 | 0.8791 | 0.8973 |
Event-level Evaluation
| Sub code | Music_F_500_on | Music_F_500_onoff | Music_F_1000_on | Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
|---|---|---|---|---|---|---|---|---|
| LN1 | 0.087 | 0.0232 | 0.1133 | 0.0375 | 0.2233 | 0.0766 | 0.3148 | 0.1277 |
| MM1 | 0.1412 | 0.0157 | 0.1843 | 0.0392 | 0.0632 | 0.0015 | 0.0947 | 0.015 |
| MM2 | 0.154 | 0.0312 | 0.231 | 0.0791 | 0.1162 | 0.0211 | 0.1737 | 0.0469 |
| MM3 | 0.1516 | 0.0223 | 0.1962 | 0.0535 | 0.0796 | 0.0152 | 0.123 | 0.0281 |
Task 4: Music Relative Loudness Estimation
Dataset 1
Segment-level Evaluation
| Sub code | Accuracy | Fg-Music_F | Bg-Music_F | No-Music_F |
|---|---|---|---|---|
| MMG2 | 0.8615 | 0.788 | 0.821 | 0.9064 |
Event-level Evaluation
| Sub code | Fg-Music_F_500_on | Fg-Music_F_500_onoff | Fg-Music_F_1000_on | Fg-Music_F_1000_onoff | Bg-Music_F_500_on | Bg-Music_F_500_onoff | Bg-Music_F_1000_on | Bg-Music_F_1000_onoff | Speech_F_500_on | Speech_F_500_onoff | Speech_F_1000_on | Speech_F_1000_onoff |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MMG2 | 0.3298 | 0.1775 | 0.4106 | 0.2742 | 0.3853 | 0.1388 | 0.4463 | 0.2024 | 0.5254 | 0.3123 | 0.5927 | 0.3925 |