2024:Symbolic Music Generation Results
From MIREX Wiki
Latest revision as of 02:28, 12 November 2024
Submissions
Team | Extended Abstract | Methods | Methodology |
---|---|---|---|
Chart-Accompaniment | [PDF](https://futuremirex.com/portal/wp-content/uploads/2024/11/chart_accomp_2024_ISMIR_LBD.pdf) | BART | A BART model leveraging pre-trained Transformer encoders for piano accompaniment generation. |
AccoMontage (BL-1) | | Style Transfer | A hybrid algorithm generating piano accompaniments by rule-based search and music representation learning. |
Whole-Song-Gen (BL-2) | | DDPM | A denoising diffusion probabilistic model (DDPM) generating piano accompaniments as piano-roll images. |
Compose-&-Embellish (BL-3) | | Transformer | A Transformer-based architecture generating piano performances in beat-based event sequences. |
Results
Team | Coherency ↑ (subjective) | Naturalness ↑ (subjective) | Creativity ↑ (subjective) | Musicality ↑ (subjective) | NLL ↓ (objective) |
---|---|---|---|---|---|
Chart-Accompaniment | 1.92 ± 0.11<sup>d</sup> | 1.87 ± 0.10<sup>c</sup> | 2.62 ± 0.13<sup>c</sup> | 2.01 ± 0.11<sup>c</sup> | 4.12 ± 0.12<sup>c</sup> |
AccoMontage (BL-1) | **3.77 ± 0.11<sup>a</sup>** | **3.59 ± 0.11<sup>a</sup>** | **3.65 ± 0.11<sup>a</sup>** | **3.63 ± 0.12<sup>a</sup>** | **2.48 ± 0.07<sup>a</sup>** |
Whole-Song-Gen (BL-2) | 3.59 ± 0.11<sup>b</sup> | 3.24 ± 0.11<sup>b</sup> | **3.66 ± 0.10<sup>a</sup>** | 3.47 ± 0.13<sup>b</sup> | 2.87 ± 0.08<sup>b</sup> |
Compose-&-Embellish (BL-3) | 3.39 ± 0.10<sup>c</sup> | 3.38 ± 0.12<sup>b</sup> | 3.13 ± 0.10<sup>b</sup> | 3.36 ± 0.11<sup>b</sup> | 7.41 ± 0.07<sup>d</sup> |
Note: Results are reported as mean ± sem<sup>s</sup>, where sem is the standard error of the mean and the superscript s is a letter. Different letters within a column indicate significant differences (p < 0.05) based on a Wilcoxon signed-rank test.
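The reporting convention in the note can be illustrated with a minimal sketch. It assumes each model's ratings are available as a plain list of numbers and that two models are compared on paired ratings; the function names `mean_sem` and `signed_rank_sums` are hypothetical, and the page does not state how the pairing or the p-value computation was actually done. The sketch computes only the mean, the standard error of the mean, and the signed-rank sums, not a p-value.

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, standard error of the mean) for a list of ratings."""
    mean = statistics.fmean(values)
    # SEM = sample standard deviation / sqrt(n)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, sem


def signed_rank_sums(x, y):
    """Return the Wilcoxon signed-rank sums (W+, W-) for paired samples."""
    # Pairs with zero difference are dropped before ranking.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Tied absolute differences share the average of their ranks.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus
```

In practice the p-value would usually come from a statistics library routine such as `scipy.stats.wilcoxon` rather than from hand-computed rank sums.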
Objective Evaluation Details: Each model generates 16 samples for each of the 6 test pieces. Negative log likelihood (NLL) is computed by inputting the melody and accompaniment into the MuseCoco 1B model.
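As a rough illustration of the objective metric, the sketch below computes a per-token average NLL from the probabilities a language model assigns to the tokens of each sample, then averages over all samples. This is an assumption-laden sketch: the page does not say whether NLL is averaged per token or summed, nor which log base is used, and the function names `sequence_nll` and `model_nll` are hypothetical. No MuseCoco interface is shown; the token probabilities are taken as given.

```python
import math


def sequence_nll(token_probs):
    """Average negative log likelihood (in nats) of one token sequence.

    token_probs holds the probability the scoring model assigned to each
    token that actually occurs in the sequence (assumed input format).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def model_nll(samples_per_piece):
    """Return (mean NLL, sample count) over all samples of all test pieces.

    samples_per_piece is a list of test pieces, each a list of samples,
    each sample being a list of token probabilities.
    """
    nlls = [sequence_nll(probs) for piece in samples_per_piece for probs in piece]
    return sum(nlls) / len(nlls), len(nlls)
```

With 16 samples for each of 6 test pieces, `model_nll` would average over 96 per-sample values for each model.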
Subjective Evaluation Details: For each test piece, one sample is cherry-picked from the 16 generated samples, resulting in 6 pages of questions. We collected responses from 22 participants (18 complete submissions and 4 partial submissions). For complete submissions, the average completion time was 16 min 59 s.