2025:Symbolic Music Generation Results
From MIREX Wiki
Revision as of 06:51, 11 September 2025
Submissions
| Team | Extended Abstract | Methods |
|---|---|---|
| RWKV (Zhou-Zheng et al.) | [1] | RWKV |
| PixelGen | [2] | Hierarchical Transformer |
| MuseCoco (BL-1) | [3] | Transformer |
| Anticipatory Music Transformer (BL-2) | [4] | Transformer |
Results
| Team | Subjective Evaluation | | | |
|---|---|---|---|---|
| | Coherence ↑ | Structure ↑ | Creativity ↑ | Musicality ↑ |
| RWKV (Zhou-Zheng et al.) | 3.57 ± 0.10<sup>a</sup> | 3.58 ± 0.10<sup>a</sup> | 3.26 ± 0.10<sup>a</sup> | 3.5 ± 0.10<sup>a</sup> |
| PixelGen | 2.39 ± 0.10<sup>c</sup> | 2.37 ± 0.09<sup>c</sup> | 2.85 ± 0.09<sup>b</sup> | 2.48 ± 0.09<sup>c</sup> |
| MuseCoco (BL-1) | 3.11 ± 0.10<sup>b</sup> | 3.07 ± 0.09<sup>b</sup> | 3.08 ± 0.09<sup>ab</sup> | 2.95 ± 0.09<sup>b</sup> |
| Anticipatory Music Transformer (BL-2) | 3.70 ± 0.10<sup>c</sup> | 3.69 ± 0.09<sup>b</sup> | 3.30 ± 0.10<sup>b</sup> | 3.45 ± 0.10<sup>b</sup> |
Note: Results are reported as mean ± sem<sup>s</sup>, where sem is the standard error of the mean and s is a letter. Different letters within a column indicate significant differences (p < 0.05) based on a Wilcoxon signed-rank test.
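As an illustration of the statistics named in the note above, the following is a minimal sketch of how a mean ± sem value and a paired Wilcoxon signed-rank comparison could be computed. It is not the organizers' evaluation script. The function names `mean_sem` and `wilcoxon_signed_rank` and the rating lists are hypothetical, and the p-value uses a plain normal approximation without tie or continuity correction, so a statistics library should be preferred for real analyses.

```python
import math
import statistics


def mean_sem(ratings):
    """Return (mean, standard error of the mean) for a list of ratings."""
    mean = statistics.fmean(ratings)
    # SEM = sample standard deviation (n - 1 denominator) / sqrt(n)
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, sem


def wilcoxon_signed_rank(x, y):
    """Two-sided Wilcoxon signed-rank test on paired samples x and y.

    Assumptions of this sketch: zero differences are dropped, tied
    absolute differences receive average ranks, and the p-value comes
    from a normal approximation with no tie or continuity correction.
    Returns (W+, p) where W+ is the sum of ranks of positive differences.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences, averaging ranks within tie groups.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return w_plus, p


# Hypothetical paired ratings (same raters, two systems); not MIREX data.
system_a = [4, 3, 5, 4, 4, 3, 5, 4]
system_b = [2, 3, 3, 2, 3, 2, 4, 2]

m, s = mean_sem(system_a)
w, p = wilcoxon_signed_rank(system_a, system_b)
print(f"system A: {m:.2f} ± {s:.2f}")
print(f"W+ = {w}, p = {p:.3f}, significant at 0.05: {p < 0.05}")
```

In a table like the one above, two systems whose pairwise comparison is not significant would share a letter, while a significant difference would give them different letters.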
Subjective Evaluation Details: For each test piece, one piece was cherry-picked from 8 samples, resulting in 6 pages of questions. We collected responses from 22 participants (18 complete submissions and 4 partial submissions). For complete submissions, the average completion time was 16 min 59 s.