2025:Symbolic Music Generation Results

Revision as of 23:43, 12 September 2025

Submissions

Team	Extended Abstract	Methods
RWKV (Zhou-Zheng et al.)	[1]	RWKV
PixelGen	[2]	Hierarchical Transformer
MuseCoco (BL-1)	[3]	Transformer
Anticipatory Music Transformer (BL-2)	[4]	Transformer

Results

Team	Subjective Evaluation
Team	Coherency ↑	Structure ↑	Creativity ↑	Musicality ↑
RWKV (Zhou-Zheng et al.)	3.57 ± 0.10^a	3.58 ± 0.10^a	3.26 ± 0.10^a	3.50 ± 0.10^a
PixelGen	2.39 ± 0.10^c	2.37 ± 0.09^c	2.85 ± 0.09^b	2.48 ± 0.09^c
MuseCoco (BL-1)	3.11 ± 0.10^b	3.07 ± 0.09^b	3.08 ± 0.09^ab	2.95 ± 0.09^b
Anticipatory Music Transformer (BL-2)	3.70 ± 0.10^a	3.69 ± 0.09^a	3.30 ± 0.10^a	3.45 ± 0.10^a

Evaluation Results

Results are reported as mean ± SEM^s, where SEM is the standard error of the mean and the superscript s is a group letter. Within a column, two models differ significantly (p < 0.05) if they share no letter, based on a Wilcoxon signed-rank test with Holm-Bonferroni correction.
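The statistics named above can be sketched in plain Python. This is a minimal illustration, not the organisers' analysis code. It assumes paired ratings, meaning the same participants rated two models on the same items. It uses the large-sample normal approximation of the Wilcoxon signed-rank test, dropping zero differences and applying a tie correction but no continuity correction. That choice may differ from the exact test or library, such as SciPy's `scipy.stats.wilcoxon`, that was actually used. The function names `mean_sem`, `wilcoxon_signed_rank`, `holm_adjust` and `differ_significantly` are hypothetical.

```python
import math
import statistics


def mean_sem(ratings):
    """Return (mean, standard error of the mean); needs at least 2 ratings."""
    # SEM = sample standard deviation / sqrt(n)
    return statistics.mean(ratings), statistics.stdev(ratings) / math.sqrt(len(ratings))


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test for paired ratings x, y.

    Uses the large-sample normal approximation without continuity correction.
    Zero differences are dropped; tied |differences| receive average ranks.
    """
    if len(x) != len(y):
        raise ValueError("x and y must be paired (same length)")
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 1.0  # no non-zero differences: no evidence of a difference
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over groups of tied |differences|
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of 1-based ranks i+1..j+1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # Multiply the rank-th smallest p-value by (m - rank), cap at 1,
        # and keep adjusted values non-decreasing.
        running_max = max(running_max, min(1.0, (m - rank) * p_values[i]))
        adjusted[i] = running_max
    return adjusted


def differ_significantly(letters_a, letters_b):
    """Compact letter display: two models differ only if they share no letter."""
    return not set(letters_a) & set(letters_b)
```

In use, one would run `wilcoxon_signed_rank` for every pair of models within a criterion. The correction family is assumed here to be one criterion. The resulting p-values would then go to `holm_adjust`, and pairs with an adjusted p-value below 0.05 would count as significantly different. Applying `differ_significantly` to the letters in the table gives two examples. PixelGen (b) and MuseCoco (ab) are not significantly different on Creativity. PixelGen (c) and RWKV (a) are significantly different on Coherency.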

Baseline Models

For MuseCoco, we use the *xlarge* model variant with 1.2B learnable parameters. For the Anticipatory Music Transformer, we use the *Large* model variant with 780M learnable parameters.

Subjective Evaluation Details

A double-blind online survey was conducted to assess the quality of the generated music. All models were anonymised, and for each test prompt, each model's sample was cherry-picked from 8 generated candidates. A total of 8 prompts in varied styles (pop, classical, and jazz) were tested, resulting in an 8-page survey with one page per prompt. The page order and the order of samples within each page were both randomised.
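The randomisation described above can be sketched as follows. This is a minimal, hypothetical illustration, not the organisers' survey code. `build_survey`, the `samples_by_prompt` layout and the per-participant seed are all assumptions. The sketch also assumes that cherry-picking one sample per model from the 8 candidates has already been done.

```python
import random


def build_survey(samples_by_prompt, seed=None):
    """Build a randomised, anonymised survey (hypothetical helper).

    samples_by_prompt maps each prompt ID to {model_name: sample_path}, where each
    sample is the one already cherry-picked for that model.
    Returns (pages, key): pages lists (prompt_id, [sample_path, ...]) in the order
    shown to a participant; key maps (prompt_id, position) to the hidden model name.
    """
    rng = random.Random(seed)
    prompt_ids = list(samples_by_prompt)
    rng.shuffle(prompt_ids)  # randomise page order
    pages, key = [], {}
    for prompt_id in prompt_ids:
        items = list(samples_by_prompt[prompt_id].items())
        rng.shuffle(items)  # randomise sample order within the page
        for position, (model, _) in enumerate(items):
            key[(prompt_id, position)] = model  # never shown to participants
        pages.append((prompt_id, [path for _, path in items]))
    return pages, key
```

Seeding per participant, for example from a participant ID, would give each respondent an independent ordering that can still be reproduced later. The `key` mapping stays hidden from participants so that they do not know which model produced which sample.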

Responses were collected from 20 participants with diverse musical backgrounds. Of these, 14 completed all 8 pages, with an average completion time of 32 minutes.

