2005:Audio Genre Classification Results

Introduction

Goal

To classify polyphonic music audio (in PCM format) into genre categories.

Dataset

Two datasets were used: Magnatune and USPOP. The Magnatune dataset has a hierarchical genre taxonomy, while the USPOP categories are at a single level. The audio sampling rates used were either 44.1 kHz or 22.05 kHz (mono). The following table summarizes each dataset:

Dataset	Size (at 44.1 kHz)	Number of Training Files	Number of Testing Files
Magnatune	34.3 GB	1005	510
USPOP	28.4 GB	940	474

Results

Overall

Rank	Participant	Mean of Magnatune Hierarchical Classification Accuracy and USPOP Raw Classification Accuracy
1	Bergstra, Casagrande & Eck (2)	82.34%
2	Bergstra, Casagrande & Eck (1)	81.77%
3	Mandel & Ellis	78.81%
4	West, K.	75.29%
5	Lidy & Rauber (SSD+RH)	75.27%
6	Pampalk, E.	75.14%
7	Lidy & Rauber (RP+SSD)	74.78%
8	Lidy & Rauber (RP+SSD+RH)	74.58%
9	Scaringella, N.	73.11%
10	Ahrendt, P.	71.55%
11	Burred, J.	62.63%
12	Soares, V.	60.98%
13	Tzanetakis, G.	60.72%
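The overall score is described as the mean of each entry's Magnatune hierarchical accuracy and its USPOP raw accuracy. The following is a minimal sketch of that arithmetic, using three rows copied from the per-dataset tables on this page. The function name `overall_score` and the half-up rounding to two decimals are assumptions made for illustration; the page does not state how the published figures were rounded.

```python
from decimal import Decimal, ROUND_HALF_UP


def overall_score(magnatune_hier: str, uspop_raw: str) -> Decimal:
    """Mean of two percentage strings, rounded half-up to two decimals.

    Decimal is used instead of float so that values ending in ...5
    round predictably (an assumption about the rounding rule).
    """
    mean = (Decimal(magnatune_hier) + Decimal(uspop_raw)) / 2
    return mean.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# (Magnatune hierarchical accuracy, USPOP raw accuracy), in percent,
# copied from the Magnatune and USPOP tables on this page.
scores = {
    "Bergstra, Casagrande & Eck (2)": ("77.75", "86.92"),
    "Mandel & Ellis": ("71.96", "85.65"),
    "Tzanetakis, G.": ("58.14", "63.29"),
}

overall = {name: overall_score(m, u) for name, (m, u) in scores.items()}
for name, value in overall.items():
    print(f"{name}: {value}%")
```

Under these assumptions the three computed values agree with the corresponding rows of the overall table.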

Magnatune Dataset

Rank	Participant	Hierarchical Classification Accuracy	Normalized Hierarchical Classification Accuracy	Raw Classification Accuracy	Normalized Raw Classification Accuracy	Runtime (s)	Machine	Confusion Matrix Files
1	Bergstra, Casagrande & Eck (2)	77.75%	73.04%	75.10%	69.49%	--	--	[1]
2	Bergstra, Casagrande & Eck (1)	77.25%	72.13%	74.71%	68.73%	23400	B0	BCE_1_MTeval.txt
3	Mandel & Ellis	71.96%	69.63%	67.65%	63.99%	8729	R	ME_MTeval.txt
4	West, K.	71.67%	68.33%	68.43%	63.87%	43327	B4	W_MTeval.txt
5	Lidy & Rauber (RP+SSD)	71.08%	70.90%	67.65%	66.85%	6372	B1	LR_RP+SSD_MTeval.txt
6	Lidy & Rauber (RP+SSD+RH)	70.88%	70.52%	67.25%	66.27%	6372	B1	LR_RP+SSD+RH_MTeval.txt
7	Lidy & Rauber (SSD+RH)	70.78%	69.31%	67.65%	65.54%	6372	B1	LR_SSD+RH_MTeval.txt
8	Scaringella, N.	70.47%	72.30%	66.14%	67.12%	22740	G	SN_MTeval.txt
9	Pampalk, E.	69.90%	70.91%	66.47%	66.26%	3312	B0	P_MTeval.txt
10	Ahrendt, P.	64.61%	61.40%	60.98%	57.15%	4920	B1	A_MTeval.txt
11	Burred, J.	59.22%	61.96%	54.12%	55.68%	12483	B2	B_MTeval.txt
12	Tzanetakis, G.	58.14%	53.47%	55.49%	50.39%	1312	B0	T_MTeval.txt
13	Soares, V.	55.29%	60.73%	49.41%	53.54%	23880	Y	SV_MTeval.txt
14	Li, M.	TO *	--	--	--	--	--	--
15	Chen & Gao	DNC *	--	--	--	--	--	--

USPOP Dataset

Rank	Participant	Raw Classification Accuracy	Normalized Raw Classification Accuracy	Runtime (s)	Machine	Confusion Matrix Files
1	Bergstra, Casagrande & Eck (2)	86.92%	82.91%	--	--	BCE_2_USeval.txt
2	Bergstra, Casagrande & Eck (1)	86.29%	82.50%	23400	B0	BCE_1_USeval.txt
3	Mandel & Ellis	85.65%	76.91%	7856	R	ME_USeval.txt
4	Pampalk, E.	80.38%	78.74%	3090	B0	P_USeval.txt
5	Lidy & Rauber (SSD+RH)	79.75%	75.45%	5164	B1	LR_SSD+RH_USeval.txt
6	West, K.	78.90%	74.67%	18557	B4	W_USeval.txt
7	Lidy & Rauber (RP+SSD)	78.48%	77.62%	5164	B1	LR_RP+SSD_USeval.txt
8	Ahrendt, P.	78.48%	73.23%	9702	B1	A_USeval.txt
9	Lidy & Rauber (RP+SSD+RH)	78.27%	76.84%	5194	B1	LR_RP+SSD+RH_USeval.txt
10	Scaringella, N.	75.74%	77.67%	24606	G	SN_USeval.txt
11	Soares, V.	66.67%	67.28%	14369	Y	SV_USeval.txt
12	Burred, J.	66.03%	72.50%	9233	B2	B_USeval.txt
13	Tzanetakis, G.	63.29%	50.19%	1320	B0	T_USeval.txt
14	Chen & Gao	22.93%	17.96%	N/A	Y	CG_USeval.txt
15	Li, M.	TO *	--	--	--	--
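The tables report both raw and normalized classification accuracy, but this page does not define the normalization. One common reading, assumed here, is that raw accuracy is the fraction of all test files classified correctly, while normalized accuracy averages the per-class accuracies so that each genre counts equally regardless of size. The sketch below illustrates that reading on a hypothetical two-class confusion matrix; the matrix values, the row/column convention, and the function names are illustrative and are not taken from the linked confusion matrix files.

```python
def raw_accuracy(confusion):
    """Fraction of all items on the diagonal (correctly classified)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


def normalized_accuracy(confusion):
    """Mean of per-class accuracies; classes with no items are skipped."""
    per_class = [
        row[i] / sum(row)
        for i, row in enumerate(confusion)
        if sum(row) > 0
    ]
    return sum(per_class) / len(per_class)


# Hypothetical data: rows are true classes, columns are predicted classes.
# Class 0 has 100 test items, class 1 has only 10.
confusion = [
    [90, 10],
    [5, 5],
]

raw = raw_accuracy(confusion)
norm = normalized_accuracy(confusion)
print(f"raw: {raw:.2%}, normalized: {norm:.2%}")
```

In this toy case the large class dominates the raw figure, so raw accuracy is higher than the class-averaged figure. With unbalanced test sets, that is one way the two columns can diverge.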
