2005:Audio Genre Classification Results
From MIREX Wiki
Latest revision as of 10:41, 2 August 2010
Introduction
Goal
To classify polyphonic music audio (in PCM format) into genre categories.
Dataset
Two datasets were used: Magnatune and USPOP. The Magnatune dataset has a hierarchical genre taxonomy, while the USPOP categories are at a single level. The audio was sampled at either 44.1 kHz or 22.05 kHz (mono). Further details are given in the following table:
Dataset | Size (@ 44.1 kHz) | Number of Training Files | Number of Testing Files |
---|---|---|---|
Magnatune | 34.3 GB | 1005 | 510 |
USPOP | 28.4 GB | 940 | 474 |
Results
Overall
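Participant abstracts are hosted under https://www.music-ir.org/mirex/abstracts/2005/ (for example, bergstra.pdf).
Rank | Participant | Mean of Magnatune Hierarchical Classification Accuracy and USPOP Raw Classification Accuracy |
---|---|---|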
1 | Bergstra, Casagrande & Eck (2) | 82.34% |
2 | Bergstra, Casagrande & Eck (1) | 81.77% |
3 | Mandel & Ellis | 78.81% |
4 | West, K. | 75.29% |
5 | Lidy & Rauber (SSD+RH) | 75.27% |
6 | Pampalk, E. | 75.14% |
7 | Lidy & Rauber (RP+SSD) | 74.78% |
8 | Lidy & Rauber (RP+SSD+RH) | 74.58% |
9 | Scaringella, N. | 73.11% |
10 | Ahrendt, P. | 71.55% |
11 | Burred, J. | 62.63% |
12 | Soares, V. | 60.98% |
13 | Tzanetakis, G. | 60.72% |
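The overall score is described as the mean of the Magnatune hierarchical classification accuracy and the USPOP raw classification accuracy. The following is a minimal sketch of that arithmetic for the top four entries, using values copied from the Magnatune and USPOP tables on this page. The function name `overall_score` and the round-half-up rule are assumptions made for illustration; the page does not state how the published figures were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP

# Accuracies in percent, copied from the Magnatune and USPOP tables on this page.
magnatune_hierarchical = {
    "Bergstra, Casagrande & Eck (2)": "77.75",
    "Bergstra, Casagrande & Eck (1)": "77.25",
    "Mandel & Ellis": "71.96",
    "West, K.": "71.67",
}
uspop_raw = {
    "Bergstra, Casagrande & Eck (2)": "86.92",
    "Bergstra, Casagrande & Eck (1)": "86.29",
    "Mandel & Ellis": "85.65",
    "West, K.": "78.90",
}


def overall_score(magnatune: str, uspop: str) -> Decimal:
    """Mean of the two accuracies, rounded half up to two decimals.

    The rounding rule is an assumption; Decimal is used so that values
    such as 82.335 are not distorted by binary floating point.
    """
    mean = (Decimal(magnatune) + Decimal(uspop)) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical helper dictionary: participant name -> overall score.
overall = {
    name: overall_score(magnatune_hierarchical[name], uspop_raw[name])
    for name in magnatune_hierarchical
}

for name, score in overall.items():
    print(f"{name}: {score}%")
```

Under these assumptions the sketch yields 82.34%, 81.77%, 78.81% and 75.29%, which agree with the first four rows of the Overall table.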
Magnatune Dataset
Confusion matrix files are hosted under https://www.music-ir.org/mirex/results/2005/audio-genre/ (for example, BCE_2_MTeval.txt).
Rank | Participant | Hierarchical Classification Accuracy | Normalized Hierarchical Classification Accuracy | Raw Classification Accuracy | Normalized Raw Classification Accuracy | Runtime (s) | Machine | Confusion Matrix Files |
---|---|---|---|---|---|---|---|---|
1 | Bergstra, Casagrande & Eck (2) | 77.75% | 73.04% | 75.10% | 69.49% | -- | -- | BCE_2_MTeval.txt |
2 | Bergstra, Casagrande & Eck (1) | 77.25% | 72.13% | 74.71% | 68.73% | 23400 | B0 | BCE_1_MTeval.txt |
3 | Mandel & Ellis | 71.96% | 69.63% | 67.65% | 63.99% | 8729 | R | ME_MTeval.txt |
4 | West, K. | 71.67% | 68.33% | 68.43% | 63.87% | 43327 | B4 | W_MTeval.txt |
5 | Lidy & Rauber (RP+SSD) | 71.08% | 70.90% | 67.65% | 66.85% | 6372 | B1 | LR_RP+SSD_MTeval.txt |
6 | Lidy & Rauber (RP+SSD+RH) | 70.88% | 70.52% | 67.25% | 66.27% | 6372 | B1 | LR_RP+SSD+RH_MTeval.txt |
7 | Lidy & Rauber (SSD+RH) | 70.78% | 69.31% | 67.65% | 65.54% | 6372 | B1 | LR_SSD+RH_MTeval.txt |
8 | Scaringella, N. | 70.47% | 72.30% | 66.14% | 67.12% | 22740 | G | SN_MTeval.txt |
9 | Pampalk, E. | 69.90% | 70.91% | 66.47% | 66.26% | 3312 | B0 | P_MTeval.txt |
10 | Ahrendt, P. | 64.61% | 61.40% | 60.98% | 57.15% | 4920 | B1 | A_MTeval.txt |
11 | Burred, J. | 59.22% | 61.96% | 54.12% | 55.68% | 12483 | B2 | B_MTeval.txt |
12 | Tzanetakis, G. | 58.14% | 53.47% | 55.49% | 50.39% | 1312 | B0 | T_MTeval.txt |
13 | Soares, V. | 55.29% | 60.73% | 49.41% | 53.54% | 23880 | Y | SV_MTeval.txt |
14 | Li, M. | TO * | -- | -- | -- | -- | -- | -- |
15 | Chen & Gao | DNC * | -- | -- | -- | -- | -- | -- |
USPOP Dataset
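Confusion matrix files are hosted under https://www.music-ir.org/mirex/results/2005/audio-genre/ (for example, BCE_2_USeval.txt).
Rank | Participant | Raw Classification Accuracy | Normalized Raw Classification Accuracy | Runtime (s) | Machine | Confusion Matrix Files |
---|---|---|---|---|---|---|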
1 | Bergstra, Casagrande & Eck (2) | 86.92% | 82.91% | -- | -- | BCE_2_USeval.txt |
2 | Bergstra, Casagrande & Eck (1) | 86.29% | 82.50% | 23400 | B0 | BCE_1_USeval.txt |
3 | Mandel & Ellis | 85.65% | 76.91% | 7856 | R | ME_USeval.txt |
4 | Pampalk, E. | 80.38% | 78.74% | 3090 | B0 | P_USeval.txt |
5 | Lidy & Rauber (SSD+RH) | 79.75% | 75.45% | 5164 | B1 | LR_SSD+RH_USeval.txt |
6 | West, K. | 78.90% | 74.67% | 18557 | B4 | W_USeval.txt |
7 | Lidy & Rauber (RP+SSD) | 78.48% | 77.62% | 5164 | B1 | LR_RP+SSD_USeval.txt |
8 | Ahrendt, P. | 78.48% | 73.23% | 9702 | B1 | A_USeval.txt |
9 | Lidy & Rauber (RP+SSD+RH) | 78.27% | 76.84% | 5194 | B1 | LR_RP+SSD+RH_USeval.txt |
10 | Scaringella, N. | 75.74% | 77.67% | 24606 | G | SN_USeval.txt |
11 | Soares, V. | 66.67% | 67.28% | 14369 | Y | SV_USeval.txt |
12 | Burred, J. | 66.03% | 72.50% | 9233 | B2 | B_USeval.txt |
13 | Tzanetakis, G. | 63.29% | 50.19% | 1320 | B0 | T_USeval.txt |
14 | Chen & Gao | 22.93% | 17.96% | N/A | Y | CG_USeval.txt |
15 | Li, M. | TO * | -- | -- | -- | -- |