2025:Symbolic Music Generation Results

From MIREX Wiki
Revision as of 00:43, 13 September 2025 by Zhaojw1998 (talk | contribs) (Results)

Submissions

Team                                  | Extended Abstract | Methods
RWKV (Zhou-Zheng et al.)              | [1]               | RWKV
PixelGen                              | [2]               | Hierarchical Transformer
MuseCoco (BL-1)                       | [3]               | Transformer
Anticipatory Music Transformer (BL-2) | [4]               | Transformer

Results

Subjective Evaluation

Team                                  | Coherency ↑   | Structure ↑   | Creativity ↑  | Musicality ↑
RWKV (Zhou-Zheng et al.)              | 3.57 ± 0.10a  | 3.58 ± 0.10a  | 3.26 ± 0.10a  | 3.50 ± 0.10a
PixelGen                              | 2.39 ± 0.10c  | 2.37 ± 0.09c  | 2.85 ± 0.09b  | 2.48 ± 0.09c
MuseCoco (BL-1)                       | 3.11 ± 0.10b  | 3.07 ± 0.09b  | 3.08 ± 0.09ab | 2.95 ± 0.09b
Anticipatory Music Transformer (BL-2) | 3.70 ± 0.10a  | 3.69 ± 0.09a  | 3.30 ± 0.10a  | 3.45 ± 0.10a


Evaluation Results

Results are reported as mean ± SEM followed by one or more letters, where SEM is the standard error of the mean. Within a column, systems that do not share a letter differ significantly (p < 0.05) according to a Wilcoxon signed-rank test with Holm-Bonferroni correction.
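The reporting convention above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the organisers' analysis script: the ratings and raw p-values are made up, the function names `mean_sem` and `holm_adjust` are hypothetical, and the raw p-values are assumed to come from pairwise Wilcoxon signed-rank tests computed elsewhere (for instance with `scipy.stats.wilcoxon`).

```python
import math
import statistics


def mean_sem(ratings):
    """Return (mean, standard error of the mean) for a list of ratings."""
    # SEM = sample standard deviation / sqrt(n)
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return statistics.mean(ratings), sem


def holm_adjust(p_values):
    """Holm-Bonferroni step-down adjustment; results are in input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is scaled by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Adjusted p-values must not decrease as the raw p-values increase.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Made-up 1-5 ratings for one system on one criterion.
ratings = [4, 3, 5, 4, 3, 4, 2, 4]
mean, sem = mean_sem(ratings)
print(f"{mean:.2f} ± {sem:.2f}")

# Made-up raw p-values for the 6 pairwise comparisons among 4 systems.
raw_p = [0.001, 0.012, 0.030, 0.040, 0.200, 0.650]
adjusted_p = holm_adjust(raw_p)
significant = [p < 0.05 for p in adjusted_p]
print(adjusted_p, significant)
```

With four systems there are six pairwise comparisons per criterion, so the smallest raw p-value is multiplied by 6, the next by 5, and so on. Pairs whose adjusted p-value stays below 0.05 would receive different letters in the table.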


Baseline Models

For MuseCoco, we use the *xlarge* model variant with 1.2B learnable parameters. For Anticipatory Music Transformer, we use the *Large* model variant with 780M learnable parameters.


Subjective Evaluation Details

A double-blind online survey was conducted to assess the quality of the generated music. Each model was anonymised, and for each test prompt, one sample per model was cherry-picked from 8 generated candidates. A total of 8 prompts of varied styles (pop, classical, and jazzy) were tested, resulting in an 8-page survey with one page per prompt. The page order and the sample order within each page were both randomised.
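The two levels of randomisation described above could be implemented roughly as follows. This is only a sketch: the function `build_survey`, the per-participant seed, and the assumption that each page shows one sample from each of the four systems are illustrative choices, not details confirmed by the survey description.

```python
import random

# System names taken from the results table; participants would see only
# anonymous labels, never these names.
SYSTEMS = ["RWKV", "PixelGen", "MuseCoco", "Anticipatory Music Transformer"]
NUM_PROMPTS = 8


def build_survey(seed):
    """Return a list of pages, each with a prompt id and a shuffled system order."""
    rng = random.Random(seed)  # hypothetical per-participant seed
    prompt_ids = list(range(NUM_PROMPTS))
    rng.shuffle(prompt_ids)  # randomise the page order
    pages = []
    for prompt_id in prompt_ids:
        order = list(SYSTEMS)
        rng.shuffle(order)  # randomise the sample order within the page
        pages.append({"prompt": prompt_id, "order": order})
    return pages


survey = build_survey(seed=0)
for page in survey:
    print(page["prompt"], page["order"])
```

Seeding the generator makes each layout reproducible, which helps when mapping anonymised responses back to systems during analysis.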

Responses were collected from 20 participants with diverse musical backgrounds. Of these, 14 completed all 8 pages, with an average completion time of 32 minutes.