2025:Symbolic Music Generation Results
Submissions
Team | Extended Abstract | Methods |
---|---|---|
RWKV (Zhou-Zheng et al.) | [1] | RWKV |
PixelGen | [2] | Hierarchical Transformer |
MuseCoco (BL-1) | [3] | Transformer |
Anticipatory Music Transformer (BL-2) | [4] | Transformer |
Results
Subjective Evaluation
Team | Coherency ↑ | Structure ↑ | Creativity ↑ | Musicality ↑ |
---|---|---|---|---|
RWKV (Zhou-Zheng et al.) | 3.57 ± 0.10ᵃ | 3.58 ± 0.10ᵃ | 3.26 ± 0.10ᵃ | 3.50 ± 0.10ᵃ |
PixelGen | 2.39 ± 0.10ᶜ | 2.37 ± 0.09ᶜ | 2.85 ± 0.09ᵇ | 2.48 ± 0.09ᶜ |
MuseCoco (BL-1) | 3.11 ± 0.10ᵇ | 3.07 ± 0.09ᵇ | 3.08 ± 0.09ᵃᵇ | 2.95 ± 0.09ᵇ |
Anticipatory Music Transformer (BL-2) | 3.70 ± 0.10ᵃ | 3.69 ± 0.09ᵃ | 3.30 ± 0.10ᵃ | 3.45 ± 0.10ᵃ |
Evaluation Results
Results are reported as mean ± SEM followed by a group letter, where SEM is the standard error of the mean. Within a column, entries with different letters differ significantly (p < 0.05) according to a Wilcoxon signed-rank test with Holm-Bonferroni correction; entries that share a letter (e.g. a and ab) do not.
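The statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used for these results: `mean_sem` and `holm_bonferroni` are hypothetical helper names, ratings are assumed to be plain lists of scores, and the pairwise Wilcoxon signed-rank p-values are assumed to be computed elsewhere (for example with SciPy's `scipy.stats.wilcoxon`). The usage lines assume the correction is applied over the 6 pairwise comparisons among the four systems in one column, and the p-values there are made up for illustration; they are not results from this survey.

```python
import math
from statistics import mean, stdev


def mean_sem(ratings):
    """Return (mean, standard error of the mean) for a list of ratings."""
    n = len(ratings)
    # SEM = sample standard deviation / sqrt(n)
    return mean(ratings), stdev(ratings) / math.sqrt(n)


def holm_bonferroni(p_values, alpha=0.05):
    """Holm-Bonferroni step-down procedure.

    Returns a list of booleans in the same order as p_values:
    True where the null hypothesis is rejected at family-wise level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - rank)
        if p_values[i] < alpha / (m - rank):
            reject[i] = True
        else:
            # Stop at the first non-rejection; all larger p-values are retained
            break
    return reject


# Hypothetical p-values for the 6 pairwise comparisons among 4 systems
example_p = [0.001, 0.04, 0.012, 0.30, 0.008, 0.02]
print(holm_bonferroni(example_p))  # [True, False, True, False, True, False]
```

Holm's procedure tests the smallest p-value against α/m, the next against α/(m−1), and so on, stopping at the first non-rejection. It controls the family-wise error rate like the plain Bonferroni correction while rejecting at least as many hypotheses.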
Baseline Models
For MuseCoco, we use the xlarge variant with 1.2B learnable parameters; for the Anticipatory Music Transformer, we use the Large variant with 780M learnable parameters.
Subjective Evaluation Details
A double-blind online survey was conducted to assess music quality. All models were anonymised, and for each test prompt, each model's sample was cherry-picked from 8 generated candidates. A total of 8 prompts in varied styles (pop, classical, and jazz) were tested, one page per prompt, giving an 8-page survey. Both the page order and the order of samples within each page were randomised.
Responses were collected from 20 participants with diverse musical backgrounds, 14 of whom completed all 8 pages, with an average completion time of 32 minutes.
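The survey randomisation described above can be sketched as follows. This is a hypothetical illustration, not the survey software actually used: `build_survey`, the `select_sample` callback, and the sample file names are assumptions, and `select_sample` merely stands in for the manual cherry-picking of one sample from the generated candidates. Page order and within-page sample order are shuffled with a seeded `random.Random`, and systems appear only under anonymous labels, with the label-to-model key kept separately.

```python
import random


def build_survey(prompts, models, select_sample, seed=None):
    """Build a randomised, anonymised listening-test layout (hypothetical helper).

    prompts: prompt identifiers, one survey page per prompt.
    models: system names, hidden from participants.
    select_sample(prompt, model): returns the sample chosen for that pair.
    Returns (pages, key): pages shown to participants, and a private key
    mapping (page index, anonymous label) -> model name.
    """
    rng = random.Random(seed)
    page_order = prompts[:]  # copy, so the caller's list is untouched
    rng.shuffle(page_order)  # randomise the page order
    pages, key = [], {}
    for page_idx, prompt in enumerate(page_order):
        entries = [(m, select_sample(prompt, m)) for m in models]
        rng.shuffle(entries)  # randomise the sample order within the page
        page = []
        for pos, (model, sample) in enumerate(entries):
            label = f"Sample {chr(ord('A') + pos)}"  # anonymous label shown to raters
            key[(page_idx, label)] = model
            page.append((label, sample))
        pages.append((prompt, page))
    return pages, key


# Usage with hypothetical prompt IDs and file names
prompts = [f"prompt-{i}" for i in range(8)]
models = ["RWKV", "PixelGen", "MuseCoco", "Anticipatory Music Transformer"]
pages, key = build_survey(prompts, models, lambda p, m: f"{m}:{p}.mid", seed=42)
```

Keeping the key apart from the pages lets the labels stay anonymous to participants while the ratings can still be mapped back to the systems afterwards.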