2024:Symbolic Music Generation Results
From MIREX Wiki
Latest revision as of 02:28, 12 November 2024
Submissions
Team | Extended Abstract | Methods | Methodology |
---|---|---|---|
Chart-Accompaniment | [https://futuremirex.com/portal/wp-content/uploads/2024/11/chart_accomp_2024_ISMIR_LBD.pdf PDF] | BART | A BART model leveraging pre-trained Transformer encoders for piano accompaniment generation. |
AccoMontage (BL-1) | [https://arxiv.org/abs/2108.11213 PDF] | Style Transfer | A hybrid algorithm generating piano accompaniments by rule-based search and music representation learning. |
Whole-Song-Gen (BL-2) | [https://arxiv.org/abs/2405.09901 PDF] | DDPM | A denoising diffusion probabilistic model (DDPM) generating piano accompaniments as piano-roll images. |
Compose-&-Embellish (BL-3) | [https://arxiv.org/abs/2209.08212 PDF] | Transformer | A Transformer-based architecture generating piano performances in beat-based event sequences. |
Results
Team | Coherency ↑ (subjective) | Naturalness ↑ (subjective) | Creativity ↑ (subjective) | Musicality ↑ (subjective) | NLL ↓ (objective) |
---|---|---|---|---|---|
Chart-Accompaniment | 1.92 ± 0.11<sup>d</sup> | 1.87 ± 0.10<sup>c</sup> | 2.62 ± 0.13<sup>c</sup> | 2.01 ± 0.11<sup>c</sup> | 4.12 ± 0.12<sup>c</sup> |
AccoMontage (BL-1) | 3.77 ± 0.11<sup>a</sup> | 3.59 ± 0.11<sup>a</sup> | 3.65 ± 0.11<sup>a</sup> | 3.63 ± 0.12<sup>a</sup> | 2.48 ± 0.07<sup>a</sup> |
Whole-Song-Gen (BL-2) | 3.59 ± 0.11<sup>b</sup> | 3.24 ± 0.11<sup>b</sup> | 3.66 ± 0.10<sup>a</sup> | 3.47 ± 0.13<sup>b</sup> | 2.87 ± 0.08<sup>b</sup> |
Compose-&-Embellish (BL-3) | 3.39 ± 0.10<sup>c</sup> | 3.38 ± 0.12<sup>b</sup> | 3.13 ± 0.10<sup>b</sup> | 3.36 ± 0.11<sup>b</sup> | 7.41 ± 0.07<sup>d</sup> |
Note: Results are reported in the form mean ± sem<sup>s</sup>, where sem is the standard error of the mean and s is a letter. Different letters within a column indicate significant differences (p < 0.05) based on a Wilcoxon signed-rank test.
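The reporting convention in the note can be illustrated with a minimal sketch. It assumes paired integer ratings from the same raters for two systems; the ratings below are made-up placeholders, and the page does not state which implementation or test variant was used. The test here is an exact two-sided signed-rank test written with the standard library only, and the helper names `mean_sem` and `wilcoxon_signed_rank` are hypothetical.

```python
import math
import statistics
from itertools import product


def mean_sem(scores):
    """Return (mean, standard error of the mean) for a list of ratings."""
    sem = statistics.stdev(scores) / math.sqrt(len(scores))
    return statistics.mean(scores), sem


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples.

    Zero differences are dropped and tied |differences| share their
    average rank. All sign patterns are enumerated, so this is only
    practical for small n (roughly n < 20). Returns (W_plus, p_value).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0
    # Rank the absolute differences, averaging ranks over ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2
    observed_dev = abs(w_plus - expected)
    # Count sign patterns at least as extreme as the observed one.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed_dev - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** n


# Hypothetical ratings (1-5 scale) from eight raters for two systems.
system_a = [4, 5, 3, 4, 5, 4, 3, 5]
system_b = [2, 3, 3, 2, 4, 2, 1, 3]
mean_a, sem_a = mean_sem(system_a)
w_plus, p_value = wilcoxon_signed_rank(system_a, system_b)
print(f"{mean_a:.2f} ± {sem_a:.2f}, W+ = {w_plus}, p = {p_value:.4f}")
```

Under this convention, two systems would receive different letters in a column when their pairwise p-value falls below 0.05.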
Objective Evaluation Details: Each model generates 16 samples for each of 6 test pieces (96 samples per model). Negative Log Likelihood (NLL) is computed by feeding the melody and accompaniment into the MuseCoco 1B model.
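As a rough illustration of how an NLL score of this kind can be aggregated, the sketch below assumes that a language model has already returned natural-log probabilities for each token of a sample, and that NLL is averaged per token and then across samples. The page does not specify the normalization or the tokenization used, so both are assumptions, and `sequence_nll` and `model_nll` are hypothetical helper names rather than part of MuseCoco.

```python
import math
import statistics


def sequence_nll(token_log_probs):
    """Mean negative log-likelihood per token (natural-log probabilities)."""
    return -sum(token_log_probs) / len(token_log_probs)


def model_nll(samples_log_probs):
    """Aggregate per-sample NLLs into (mean, sem) over all generated samples."""
    nlls = [sequence_nll(lp) for lp in samples_log_probs]
    sem = statistics.stdev(nlls) / math.sqrt(len(nlls))
    return statistics.mean(nlls), sem


# Placeholder log-probabilities for two short samples; a real evaluation
# would use one list per generated sample (16 samples x 6 pieces here).
samples = [
    [math.log(0.5)] * 4,
    [math.log(0.25)] * 2,
]
mean_nll, sem_nll = model_nll(samples)
print(f"NLL = {mean_nll:.2f} ± {sem_nll:.2f}")
```

Lower values mean the scoring model finds the melody and accompaniment more probable, which is why the column is marked with ↓.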
Subjective Evaluation Details: One sample is cherry-picked from the 16 samples generated for each test piece, resulting in 6 pages of questions. We collected responses from 22 participants (18 complete submissions and 4 partial submissions). For complete submissions, the average completion time was 16 min 59 s.