2024:Symbolic Music Generation Results

Latest revision as of 02:28, 12 November 2024

= Submissions =

{| class="wikitable"
|- style="font-weight:bold;"
! Team
! Extended Abstract
! Methods
! Methodology
|-
| Chart-Accompaniment
| [https://futuremirex.com/portal/wp-content/uploads/2024/11/chart_accomp_2024_ISMIR_LBD.pdf PDF]
| BART
| A BART model leveraging pre-trained Transformer encoders for piano accompaniment generation.
|-
| AccoMontage (BL-1)
| [https://arxiv.org/abs/2108.11213 PDF]
| Style Transfer
| A hybrid algorithm generating piano accompaniments by rule-based search and music representation learning.
|-
| Whole-Song-Gen (BL-2)
| [https://arxiv.org/abs/2405.09901 PDF]
| DDPM
| A denoising diffusion probabilistic model (DDPM) generating piano accompaniments as piano-roll images.
|-
| Compose-&-Embellish (BL-3)
| [https://arxiv.org/abs/2209.08212 PDF]
| Transformer
| A Transformer-based architecture generating piano performances in beat-based event sequences.
|}

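= Results =

{| class="wikitable" style="text-align:center;"
|-
! rowspan="2" | Team
! colspan="4" | Subjective Evaluation
! Objective Evaluation
|-
! Coherency ↑
! Naturalness ↑
! Creativity ↑
! Musicality ↑
! NLL ↓
|-
| Chart-Accompaniment
| 1.92 ± 0.11<sup>d</sup>
| 1.87 ± 0.10<sup>c</sup>
| 2.62 ± 0.13<sup>c</sup>
| 2.01 ± 0.11<sup>c</sup>
| 4.12 ± 0.12<sup>c</sup>
|-
| AccoMontage (BL-1)
| '''3.77 ± 0.11<sup>a</sup>'''
| '''3.59 ± 0.11<sup>a</sup>'''
| '''3.65 ± 0.11<sup>a</sup>'''
| '''3.63 ± 0.12<sup>a</sup>'''
| '''2.48 ± 0.07<sup>a</sup>'''
|-
| Whole-Song-Gen (BL-2)
| 3.59 ± 0.11<sup>b</sup>
| 3.24 ± 0.11<sup>b</sup>
| '''3.66 ± 0.10<sup>a</sup>'''
| 3.47 ± 0.13<sup>b</sup>
| 2.87 ± 0.08<sup>b</sup>
|-
| Compose-&-Embellish (BL-3)
| 3.39 ± 0.10<sup>c</sup>
| 3.38 ± 0.12<sup>b</sup>
| 3.13 ± 0.10<sup>b</sup>
| 3.36 ± 0.11<sup>b</sup>
| 7.41 ± 0.07<sup>d</sup>
|}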

'''Note''': Results are reported as mean ± sem<sup>s</sup>, where sem is the standard error of the mean and s is a letter. Different letters within a column indicate significant differences (p < 0.05) based on a Wilcoxon signed-rank test.
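As a rough illustration of the quantities in the note, the sketch below computes a mean ± sem string and the signed-rank sums that a Wilcoxon signed-rank test is built on. It is not the evaluation script used for this page: the function names, the use of the sample standard deviation, and the example ratings are all assumptions made for illustration, and no p-value is computed.

```python
import math
import statistics


def mean_sem(ratings):
    """Return (mean, standard error of the mean) for a list of ratings."""
    n = len(ratings)
    if n < 2:
        raise ValueError("need at least two ratings to estimate the SEM")
    mean = statistics.fmean(ratings)
    # Assumption: SEM = sample standard deviation (n - 1) / sqrt(n).
    sem = statistics.stdev(ratings) / math.sqrt(n)
    return mean, sem


def format_mean_sem(ratings):
    """Format ratings in the 'mean ± sem' style used in the results table."""
    mean, sem = mean_sem(ratings)
    return f"{mean:.2f} ± {sem:.2f}"


def signed_rank_sums(x, y):
    """Return (W_plus, W_minus) for paired samples x and y.

    Zero differences are dropped and tied absolute differences share
    their average rank, as in the usual signed-rank procedure.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical ratings, not data from this evaluation.
example_ratings = [4, 3, 5, 4, 2, 4, 3, 5]
model_a = [4, 5, 3, 4, 5]
model_b = [3, 3, 3, 5, 2]
summary = format_mean_sem(example_ratings)
w_plus, w_minus = signed_rank_sums(model_a, model_b)
```

The smaller of the two rank sums is the usual test statistic; turning it into a p-value requires the signed-rank null distribution, which a statistics library would normally supply.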

'''Objective Evaluation Details''': Each model generates 16 samples for each of 6 test pieces. Negative Log Likelihood (NLL) is computed by inputting the melody and accompaniment into the MuseCoco 1B model.
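To make the metric concrete, the following sketch shows one common way an NLL score can be derived from a language model's per-token log-probabilities and then averaged over samples. It does not call MuseCoco; the per-token averaging, the function names, and the toy log-probabilities are assumptions, and the page does not state whether NLL was averaged per token or summed per sequence.

```python
import math
import statistics

# From the text: 16 samples for each of 6 test pieces per model.
SAMPLES_PER_PIECE = 16
TEST_PIECES = 6
TOTAL_SAMPLES = SAMPLES_PER_PIECE * TEST_PIECES


def sequence_nll(token_log_probs):
    """Average negative log-likelihood per token (natural log).

    token_log_probs holds log p(token_t | tokens_<t) for each token,
    as a scoring model would assign them (hypothetical input here).
    """
    if not token_log_probs:
        raise ValueError("empty sequence")
    return -sum(token_log_probs) / len(token_log_probs)


def summarize_nll(per_sample_nll):
    """Return (mean, standard error of the mean) over per-sample NLLs."""
    mean = statistics.fmean(per_sample_nll)
    sem = statistics.stdev(per_sample_nll) / math.sqrt(len(per_sample_nll))
    return mean, sem


# Toy log-probabilities for a three-token sequence, not real model output.
toy_log_probs = [math.log(0.5), math.log(0.25), math.log(0.125)]
toy_nll = sequence_nll(toy_log_probs)
```

Lower values mean the scoring model finds the melody and accompaniment sequence more probable, which is why the NLL column is marked with a downward arrow.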

'''Subjective Evaluation Details''': One sample is cherry-picked from the 16 samples of each test piece, resulting in 6 pages of questions. We collected responses from 22 participants (18 complete submissions and 4 partial submissions). For complete submissions, the average completion time is 16 min 59 s.