Difference between revisions of "2024:Symbolic Music Generation"

From MIREX Wiki

Revision as of 04:48, 15 September 2024

Description

Symbolic music generation is a broad topic. It covers a wide range of tasks, including generation, harmonization, arrangement, instrumentation, and more. We have multiple ways to represent music data, and the evaluation metrics also vary. To define a MIREX challenge within this topic, we need to narrow our focus to specific subtasks that are both relevant to the community and feasible to evaluate effectively.

This year, the task is piano accompaniment arrangement from a lead sheet. The lead sheet provides the melody, the chord progression, and optional phrase labels. The goal is to generate a piano accompaniment that complements the lead melody. The music data consists of 8-measure segments in 4/4 meter, quantized to a sixteenth-note resolution. A more detailed description of the data structure is provided in the data format section. The genre of the lead sheets is broadly Western pop music (refer to the music examples for more detail).

Data Format

The input lead sheet consists of 8 bars of melody and harmony, preceded by a mandatory pickup measure (left blank if not used). The data is provided in JSON format with two properties, melody and chords:

  • melody: a list of notes. Each note contains properties of start, pitch, and duration.
  • chords: a list of chords. Each chord contains properties of start, symbol, and duration.

The generated output should also be in JSON format, containing a single property, acc:

  • acc: a list of notes. Each note contains properties of start, pitch, and duration.

Detailed explanation of start and duration attributes.

  1. The data is assumed to be in 4/4 meter, quantized to a sixteenth-note resolution. For both melody and chords, onsets and durations are counted in sixteenth notes.
  2. Both onsets and durations are integers ranging from 0 to 9 * 16 - 1 = 143. Notes that extend past the end of the ninth measure (i.e., time step 9 * 16 = 144) will be truncated at the end of the ninth measure.
  3. Melody notes are not allowed to overlap with one another.
  4. There should be no gaps or overlaps between chords. Chords must follow one another directly. If there is a blank space where no chord is played, it must be filled with the N chord.
  5. The accompaniment for the pickup measure should be blank.
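
The timing rules above can be checked mechanically before a lead sheet is submitted. The following is a minimal sketch, not an official validator: the function name validate_lead_sheet and the error messages are hypothetical, and it covers only the start range, melody overlaps (rule 3), and chord contiguity (rule 4).

```python
TOTAL_STEPS = 9 * 16  # pickup measure + 8 measures, counted in sixteenth notes


def validate_lead_sheet(sheet):
    """Return a list of rule violations (empty if none were found)."""
    errors = []
    for key in ("melody", "chords"):
        for ev in sheet[key]:
            start, dur = ev["start"], ev["duration"]
            if not (isinstance(start, int) and isinstance(dur, int)):
                errors.append(f"{key}: non-integer start/duration in {ev}")
            elif not 0 <= start < TOTAL_STEPS:
                errors.append(f"{key}: start {start} outside 0..{TOTAL_STEPS - 1}")

    # Rule 3: melody notes must not overlap one another.
    melody = sorted(sheet["melody"], key=lambda n: n["start"])
    for prev, cur in zip(melody, melody[1:]):
        if prev["start"] + prev["duration"] > cur["start"]:
            errors.append(f"melody: overlap at step {cur['start']}")

    # Rule 4: each chord must begin exactly where the previous one ends.
    chords = sorted(sheet["chords"], key=lambda c: c["start"])
    for prev, cur in zip(chords, chords[1:]):
        if prev["start"] + prev["duration"] != cur["start"]:
            errors.append(f"chords: gap or overlap at step {cur['start']}")
    return errors


# Small hand-written input for illustration.
sheet = {
    "melody": [
        {"start": 12, "pitch": 72, "duration": 4},
        {"start": 16, "pitch": 69, "duration": 8},
    ],
    "chords": [
        {"start": 0, "symbol": "N", "duration": 16},
        {"start": 16, "symbol": "F", "duration": 16},
    ],
}
print(validate_lead_sheet(sheet))
```

Returning a list of messages rather than raising on the first problem makes it easier to fix a whole lead sheet in one pass.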

Detailed explanation of the pitch attribute.

  1. The pitch property of a note should be an integer ranging from 0 to 127, corresponding to the MIDI pitch number.

Detailed explanation of the chord symbol attribute.

  1. The symbol property of a chord should be a string based on the syntax of (Harte, 2010). In other words, each chord string should be able to be passed as a parameter to mir_eval.chord.encode() without causing an error.

Data Example

Below is an example of the input lead sheet in the format given above. The lead sheet is the melody of the first phrase of Hey Jude by The Beatles.

{
  "melody": [
    {"start": 12, "pitch": 72, "duration": 4},
    {"start": 16, "pitch": 69, "duration": 8},
    ...
  ],
  "chords": [
    {"start": 0, "symbol": "N", "duration": 16},
    {"start": 16, "symbol": "F", "duration": 16},
    ...
  ]
}


This is an example of the generated accompaniment. The accompaniment is generated using the baseline method WholeSongGen introduced below. Note that the generation starts from the second measure (time step 16).

{
  "acc": [
    {"start": 16, "pitch": 41, "duration": 12},
    {"start": 16, "pitch": 65, "duration": 5},
    ...
  ]
}
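
A generated accompaniment can be post-processed so that it respects the constraints in the data format section. The sketch below is one possible approach under stated assumptions: the function name clean_accompaniment is hypothetical, and dropping (rather than shifting) notes that start in the pickup measure or have an out-of-range pitch is a choice made for this example, not something the task prescribes.

```python
PICKUP_STEPS = 16      # the first measure is the pickup measure
TOTAL_STEPS = 9 * 16   # end of the ninth measure, in sixteenth notes


def clean_accompaniment(acc):
    """Return a copy of the note list that satisfies the format rules."""
    cleaned = []
    for note in acc:
        start, pitch, dur = note["start"], note["pitch"], note["duration"]
        # Assumption: notes starting in the pickup measure are dropped,
        # since the accompaniment there should be blank.
        if start < PICKUP_STEPS or start >= TOTAL_STEPS:
            continue
        # Assumption: notes outside the MIDI pitch range are dropped.
        if not 0 <= pitch <= 127:
            continue
        # Truncate notes that extend past the end of the ninth measure.
        end = min(start + dur, TOTAL_STEPS)
        if end > start:
            cleaned.append({"start": start, "pitch": pitch, "duration": end - start})
    return cleaned


# Hand-written notes for illustration only.
acc = [
    {"start": 16, "pitch": 41, "duration": 12},
    {"start": 0, "pitch": 60, "duration": 4},    # in the pickup measure
    {"start": 140, "pitch": 65, "duration": 8},  # runs past step 144
]
print(clean_accompaniment(acc))
```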

Full data examples can be accessed in this code repository. MIDI conversion code and MIDI demos are also provided there.

Evaluation

We will evaluate the submitted algorithms through an online subjective double-blind test. The evaluation format differs from conventional tasks in the following aspects:

  • We use a "potluck" test set. Before submitting the algorithm, each team is required to submit two lead sheets. The organizer team will supplement the test set with additional lead sheets if necessary.
  • There will be no live ranking because the subjective test will be done after the algorithm submission deadline.
  • To better handle randomness in the generation algorithm, we allow cherry-picking from a fixed number of generated samples.
  • We hope to compute some objective measurements as well, but these will only be reported as a reference.


Subjective Evaluation Format

  • After each team submits the algorithm, the organizer team will use the algorithm to generate 16 arrangements for each test sample. The generated results will be returned to each team for cherry-picking.
  • Only a subset of the test set will be used for subjective evaluation.
  • In the subjective evaluation, subjects will first listen to the lead melody with chords and then listen to the generated samples in randomized order.
  • The subject will be asked to rate each arrangement based on the following criteria:
    • Harmony correctness (5-point scale)
    • Creativity (5-point scale)
    • Naturalness (5-point scale)
    • Overall musicality (5-point scale)

Objective Measurements

  • Objective measurements will be reported only as a reference. We will also measure the correlation between the subjective and objective scores.
  • The current plan is to compute the negative log-likelihood of the generated samples under a large music language model (e.g., Lu et al., 2023).
  • We welcome proposals for additional objective measurements.
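
As a rough illustration of the two quantities mentioned above, the following sketch computes an average negative log-likelihood from per-token probabilities and a Pearson correlation between two score lists. It assumes the per-token probabilities have already been obtained from some language model; the function names and all numbers are made-up placeholders, not task results, and the organizers may use a different correlation measure.

```python
import math


def mean_nll(token_probs):
    """Average negative log-likelihood (in nats) of one token sequence."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


# Placeholder values for illustration only.
nll_scores = [mean_nll([0.5, 0.25]), mean_nll([0.9, 0.8]), mean_nll([0.1, 0.2])]
subjective_scores = [3.0, 4.5, 2.0]
print(pearson(nll_scores, subjective_scores))
```

A lower negative log-likelihood means the model finds the sample more probable, so a negative correlation with subjective ratings would indicate agreement between the two.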

Submission

Important Dates

The submission process is tentative.

  • Oct. 8, 2024: submission of two lead sheets as a part of the test set. This is also a confirmation of participation.
  • Oct. 15, 2024: submission of the algorithm as a Docker image.
  • Oct. 22, 2024: return of the generated samples. Start of the cherry-picking phase.
  • Oct. 25, 2024: submission of the cherry-picked sample ids.
  • Oct. 31 - Nov. 3, 2024: subjective test.
  • Nov. 5, 2024: announcement of the final result.

I/O Format

Participants must include a batch_acc_gen.sh script in their submission. The task captain will use the script to generate output files according to the following format:

Usage

acc_gen.sh "/path/to/input.json" "/path/to/output_folder" n_sample
  • Input File: Path to the input .json file.
  • Output Folder: Path to the folder where the generated output files will be saved.
  • n_sample: Number of samples to generate.

Output

  • The script should generate n_sample output files in the specified output folder.
  • Output files should be named sequentially as sample_01.json, sample_02.json, ..., up to sample_n_sample.json.

Participants are free to implement the internal logic of the script, but it must adhere to this format for proper execution during the evaluation process.
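
The internal logic is up to each team. As one possible layout, the shell script could simply forward its three arguments to a Python entry point such as the sketch below. This is only an illustration: generate_accompaniment is a hypothetical placeholder for a team's model, and the file would be called from the script with the input path, the output folder, and n_sample.

```python
import json
import sys
from pathlib import Path


def generate_accompaniment(lead_sheet):
    """Hypothetical placeholder for a model; returns an empty accompaniment."""
    return {"acc": []}


def main(input_path, output_folder, n_sample):
    """Read one lead sheet and write n_sample numbered output files."""
    lead_sheet = json.loads(Path(input_path).read_text())
    out_dir = Path(output_folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    for i in range(1, n_sample + 1):
        result = generate_accompaniment(lead_sheet)
        # Files are named sample_01.json, sample_02.json, ...
        (out_dir / f"sample_{i:02d}.json").write_text(json.dumps(result))


# Run only when invoked with exactly three command-line arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv[1], sys.argv[2], int(sys.argv[3]))
```

Creating the output folder when it does not exist is a convenience added in this sketch; the task description does not say whether the folder is created in advance.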

Packaging Submissions

  • Every submission must be packed into a Docker image.
  • Every submission will be deployed and evaluated automatically with docker run.

Accepted submission forms:

  • Link to a public or private GitHub repository
  • Link to a public or private Docker Hub repository
  • Shared Google Drive link
  • If the repository is private, an access token is also required

Baselines

TBD

References

  • Harte, C. (2010). Towards automatic extraction of harmony information from music signals. PhD thesis, Queen Mary University of London.
  • Lu, P., Xu, X., Kang, C., Yu, B., Xing, C., Tan, X., & Bian, J. (2023). MuseCoco: Generating symbolic music from text. arXiv preprint arXiv:2306.00110.