Difference between revisions of "2025:Symbolic Music Generation Results"
Zhaojw1998 (talk | contribs) (→Results) |
Revision as of 00:43, 13 September 2025
Submissions
Team | Extended Abstract | Methods |
---|---|---|
RWKV (Zhou-Zheng et al.) | [1] | RWKV |
PixelGen | [2] | Hierarchical Transformer |
MuseCoco (BL-1) | [3] | Transformer |
Anticipatory Music Transformer (BL-2) | [4] | Transformer |
Results
Subjective Evaluation
Team | Coherency ↑ | Structure ↑ | Creativity ↑ | Musicality ↑ |
---|---|---|---|---|
RWKV (Zhou-Zheng et al.) | 3.57 ± 0.10<sup>a</sup> | 3.58 ± 0.10<sup>a</sup> | 3.26 ± 0.10<sup>a</sup> | 3.50 ± 0.10<sup>a</sup> |
PixelGen | 2.39 ± 0.10<sup>c</sup> | 2.37 ± 0.09<sup>c</sup> | 2.85 ± 0.09<sup>b</sup> | 2.48 ± 0.09<sup>c</sup> |
MuseCoco (BL-1) | 3.11 ± 0.10<sup>b</sup> | 3.07 ± 0.09<sup>b</sup> | 3.08 ± 0.09<sup>ab</sup> | 2.95 ± 0.09<sup>b</sup> |
Anticipatory Music Transformer (BL-2) | 3.70 ± 0.10<sup>a</sup> | 3.69 ± 0.09<sup>a</sup> | 3.30 ± 0.10<sup>a</sup> | 3.45 ± 0.10<sup>a</sup> |
Evaluation Results
Results are reported in the form of mean ± sem<sup>s</sup>, where sem is the standard error of the mean and s is a letter. Different letters within a column indicate significant differences (p < 0.05) based on a Wilcoxon signed-rank test with Holm-Bonferroni correction.
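The reporting convention above can be illustrated with a minimal sketch. It assumes the sem is the sample standard deviation divided by the square root of the number of ratings, and that the Holm-Bonferroni step-down adjustment is applied to the six pairwise comparisons among the four systems in one column; how the comparisons were actually grouped is not stated here. The ratings and raw p-values below are hypothetical, not the survey data, and the function names `mean_sem` and `holm_bonferroni` are illustrative. In practice the raw p-values would come from a paired test such as `scipy.stats.wilcoxon`.

```python
import math
import statistics


def mean_sem(ratings):
    """Return (mean, standard error of the mean) for a list of ratings."""
    mean = statistics.fmean(ratings)
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, sem


def holm_bonferroni(p_values, alpha=0.05):
    """Holm step-down adjustment.

    Returns (adjusted p-values, reject flags), both in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity of the adjusted p-values.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p < alpha for p in adjusted]
    return adjusted, reject


# Hypothetical 1-5 ratings for one system on one criterion.
ratings = [4, 3, 5, 4, 3, 4, 2, 5]
mean, sem = mean_sem(ratings)
print(f"{mean:.2f} ± {sem:.2f}")

# Hypothetical raw p-values for the 6 pairwise comparisons in one column.
raw_p = [0.001, 0.04, 0.03, 0.20, 0.012, 0.60]
adjusted, reject = holm_bonferroni(raw_p)
print(adjusted, reject)
```

Systems whose pairwise comparison is not rejected after correction would share a letter; deriving the letter groups themselves is left out of this sketch.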
Baseline Models
For MuseCoco, we use the *xlarge* model variant with 1.2 billion learnable parameters. For Anticipatory Music Transformer, we use the *Large* model variant with 780 million learnable parameters.
Subjective Evaluation Details
A double-blind online survey was conducted to assess music quality. Each model was anonymised, and for each test prompt, one sample per model was cherry-picked from 8 generated candidates. A total of 8 prompts in varied styles (pop, classical, and jazz) were tested, resulting in an 8-page survey. Both the page order and the sample order within each page were randomised.
Responses were collected from 20 participants with diverse music backgrounds. Fourteen participants completed all 8 pages, with an average completion time of 32 minutes.
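The randomisation of page order and of sample order within each page can be sketched as follows. This is only an illustration under stated assumptions: one page per prompt, one sample per system on each page, and a fresh shuffle per survey instance. The labels, the `build_survey` function, and the seed are hypothetical; the actual survey tooling is not described here.

```python
import random


def build_survey(prompts, models, seed=None):
    """Return a list of (prompt, shuffled model labels) pages in random order."""
    rng = random.Random(seed)
    pages = []
    # rng.sample with k=len(...) returns a shuffled copy, leaving the input intact.
    for prompt in rng.sample(prompts, k=len(prompts)):
        samples = rng.sample(models, k=len(models))
        pages.append((prompt, samples))
    return pages


# Hypothetical labels: 8 prompts and 4 anonymised systems.
prompts = [f"prompt_{i}" for i in range(1, 9)]
models = ["model_A", "model_B", "model_C", "model_D"]

survey = build_survey(prompts, models, seed=0)
for prompt, samples in survey:
    print(prompt, samples)
```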