<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Tom+Collins</id>
	<title>MIREX Wiki - User contributions [en]</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/api.php?action=feedcontributions&amp;feedformat=atom&amp;user=Tom+Collins"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/wiki/Special:Contributions/Tom_Collins"/>
	<updated>2026-04-13T18:48:20Z</updated>
	<subtitle>User contributions</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13128</id>
		<title>2019:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13128"/>
		<updated>2019-11-05T13:40:16Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Discussion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature is the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm should generate the next N musical events, covering up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2019:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
&lt;br /&gt;
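The five-column event format above can be read with a few lines of Python. The following is an illustrative sketch only; the function names and the bar/beat helper are our own, not part of the dataset specification:

```python
import csv
from io import StringIO

# The example prime excerpt from the text, in the five-column format:
# ontime, MIDI note number, morphetic pitch number, duration, channel
PRIME_CSV = """20,64,62,0.5,0
20.66667,65,63,0.25,0
21,67,64,0.5,0"""

def read_events(text):
    """Parse rows into (ontime, mnn, mpn, duration, channel) tuples."""
    events = []
    for row in csv.reader(StringIO(text)):
        ontime, mnn, mpn, dur, chan = row
        events.append((float(ontime), int(mnn), int(mpn), float(dur), int(chan)))
    return events

def ontime_to_bar_beat(ontime, beats_per_bar=4):
    """Convert an ontime in quarter-note beats to a 1-indexed (bar, beat)."""
    bar = int(ontime // beats_per_bar) + 1
    beat = ontime - (bar - 1) * beats_per_bar + 1
    return bar, beat

events = read_events(PRIME_CSV)
print(events[0])                 # (20.0, 64, 62, 0.5, 0)
print(ontime_to_bar_beat(20.0))  # (6, 1.0) -> bar 6, beat 1 in 4-4
```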
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but, for those concerned about overfitting, it is similar in character to LMD while not being drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of beat position within the measure and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and it can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to the submitted algorithms in the explicit task.&lt;br /&gt;
&lt;br /&gt;
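As a rough illustration of the kind of baseline described above, here is a minimal first-order Markov sketch in Python. The exact state encoding, training data, and any smoothing used by the task captains are not documented here, so the (beat, key-centered pitch) states and the toy sequences below are assumptions for illustration only:

```python
from collections import defaultdict
import random

def train_markov(sequences):
    """Count first-order transitions between states.
    A state here is an assumed (beat_in_measure, key-centered MNN) pair."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def next_state(counts, state, rng=None):
    """Sample a successor state proportionally to observed transition counts."""
    rng = rng or random.Random(0)
    succ = counts.get(state)
    if not succ:
        return None  # unseen state: no prediction
    states, weights = zip(*succ.items())
    return rng.choices(states, weights=weights)[0]

# Toy training data: (beat in measure, MNN relative to the tonic)
seqs = [[(1, 0), (2, 2), (3, 4), (4, 5), (1, 7)],
        [(1, 0), (2, 2), (3, 4), (4, 2), (1, 0)]]
model = train_markov(seqs)
print(next_state(model, (1, 0)))  # (2, 2) -- the only observed successor
```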
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB2&lt;br /&gt;
	| MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB5&lt;br /&gt;
        | MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2019.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the performance of an algorithm to 1) predict a continuation, given a prime (explicit task), and 2) decide which of two candidate continuations is the true one and which the foil, given a prime (implicit task). To evaluate performance at the explicit task, we compare the true continuation to the generated continuation, and measure how many pitch-onset pairs are correctly predicted at various time intervals after the last note of the prime. To evaluate performance at the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2019:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
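The explicit-task scoring idea can be sketched as precision/recall/F1 over (onset, pitch) pairs. This is a simplified illustration using exact matching and made-up continuations; the official metric (including any time binning and tolerances) is defined on the evaluation page linked above:

```python
def cardinality_scores(true_events, pred_events):
    """Precision, recall, and F1 over exact (onset, pitch) pair matches.
    Simplified sketch; the official cardinality score may bin by time
    from prediction start and apply tolerances."""
    true_set, pred_set = set(true_events), set(pred_events)
    tp = len(true_set & pred_set)
    precision = tp / len(pred_set) if pred_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical continuations: onsets in quarter-note beats, MIDI pitches
true_cont = [(0.0, 64), (0.5, 65), (1.0, 67), (1.5, 69)]
pred_cont = [(0.0, 64), (0.5, 65), (1.0, 60)]
print(cardinality_scores(true_cont, pred_cont))  # precision 2/3, recall 0.5, F1 4/7
```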
For reference purposes, we also include results of the [https://www.music-ir.org/mirex/wiki/2018:Patterns_for_Prediction_Results#Datasets_and_Algorithms submissions from 2018] (BachProp and Seq2SeqP4P).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Precision, recall and F1 (cardinality score) in quarter note onsets from prediction start.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
[[File:2019_poly_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Precision, recall and F1 (cardinality score) in quarter note onsets from prediction start.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_poly_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Modulo12Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.502&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.516&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.219&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.596&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.612&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.292&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.583&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.608&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.195&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Seq2SeqP4P&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.087&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.121&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 2.''' Pitch overlap of the algorithmic continuations with the true continuation - mean, median and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.455&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.466&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.139&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.594&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.598&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.238&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.506&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.508&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.176&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 3.''' Pitch overlap of the algorithmic continuations with the true continuation - mean, median and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: discriminate true and foil continuation====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr style=&amp;quot;text-align: right;&amp;quot;&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Observations&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Accuracy&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Mean Probability&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Variance&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;data&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.844&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.498&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.998&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.992&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.002&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.916&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.519&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(2)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.703&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.527&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.004&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(5)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.731&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.554&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.012&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 4.''' Discrimination scores of the submitted algorithms.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
CopyForward (a method based on geometric pattern discovery) significantly outperforms the baseline Markov model and BachProp (a method based on recurrent neural networks) on the explicit subtask. The abstract for CopyForward suggests this relatively high level of performance is due to successful modeling of dynamic expectancies – that is, it uses information from the prime exclusively (no training data) to predict the content of the continuation. BachProp, on the other hand, attempts to model schematic (or corpus-based) expectancies too, combining information from the prime with information from a training process. It remains to be seen whether BachProp (or deep-learning approaches in general) could improve upon CopyForward's continuations when the latter's dependence on modeling of dynamic expectancies leads to unsuccessful predictions.&lt;br /&gt;
&lt;br /&gt;
All algorithms submitted to the implicit subtask this year achieved performance significantly above chance. MLM employed an LSTM and GenDetect employed a gradient-boosting classifier. GenDetect performed significantly better than previous and current submissions on this task, discriminating between all monophonic true and foil continuations correctly, and all but one of the polyphonic pairs. If we run this subtask again next year, we probably ought to identify a different, more sophisticated algorithm for generating the foils.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13127</id>
		<title>2019:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13127"/>
		<updated>2019-11-05T13:39:57Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Discussion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next N musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2019:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it shares simiarlity with LMD and not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB2&lt;br /&gt;
	| MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB5&lt;br /&gt;
        | MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2019.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the ability of an algorithm to 1) predict a continuation, given a prime (explicit task), and 2) decide which of two candidate continuations is the true one and which the foil, given a prime (implicit task). To evaluate performance on the explicit task, we compare the true continuation to the generated continuation, and measure how many pitch-onset pairs are correctly predicted at various time intervals after the last note of the prime. To evaluate performance on the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2019:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
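In outline, the two measurements can be sketched as below. This is a simplified illustration, not the official evaluation code: it matches (onset, pitch) pairs exactly within a single interval, whereas the official cardinality score is reported at increasing time intervals after the prime.

```python
def cardinality_score(true_events, generated_events):
    """Precision, recall, and F1 over matching (onset, pitch) pairs.

    Events are (onset_in_quarter_notes, midi_note_number) tuples; this
    sketch scores one interval rather than a series of them.
    """
    true_set, gen_set = set(true_events), set(generated_events)
    hits = len(true_set & gen_set)
    precision = hits / len(gen_set) if gen_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if hits else 0.0
    return precision, recall, f1

def discrimination_accuracy(decisions):
    """Implicit task: fraction of correct true-vs-foil decisions."""
    return sum(decisions) / len(decisions)
```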
&lt;br /&gt;
For reference purposes, we also include results of the [https://www.music-ir.org/mirex/wiki/2018:Patterns_for_Prediction_Results#Datasets_and_Algorithms submissions from 2018] (BachProp and Seq2SeqP4P).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Precision, recall, and F1 (cardinality score) as a function of quarter-note onsets from the start of the prediction.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
[[File:2019_poly_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Precision, recall, and F1 (cardinality score) as a function of quarter-note onsets from the start of the prediction.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_poly_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Modulo12Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.502&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.516&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.219&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.596&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.612&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.292&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.583&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.608&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.195&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Seq2SeqP4P&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.087&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.121&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 2.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.455&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.466&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.139&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.594&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.598&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.238&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.506&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.508&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.176&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 3.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: discriminate true and foil continuation====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr style=&amp;quot;text-align: right;&amp;quot;&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Observations&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Accuracy&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Mean Probability&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Variance&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;data&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.844&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.498&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.998&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.992&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.002&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.916&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.519&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(2)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.703&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.527&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.004&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(5)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.731&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.554&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.012&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 4.''' Discrimination scores of the submitted algorithms.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
CopyForward (a method based on geometric pattern discovery) significantly outperforms the baseline Markov model and BachProp (a method based on recurrent neural networks) on the explicit subtask. The abstract for CopyForward suggests this relatively high level of performance is due to successful modeling of dynamic expectancies – that is, it uses information from the prime exclusively (no training data) to predict the content of the continuation. BachProp, on the other hand, attempts to model schematic (or corpus-based) expectancies too, combining information from the prime with information from a training process. It remains to be seen whether BachProp (or deep-learning approaches in general) could improve upon CopyForward's continuations in cases where the latter's dependence on modeling of dynamic expectancies leads to unsuccessful predictions.&lt;br /&gt;
&lt;br /&gt;
All algorithms submitted to the implicit subtask this year achieved significantly above-chance performance. MLM employed an LSTM, and GenDetect employed a gradient-boosting classifier. GenDetect performed significantly better than previous and current submissions on this task, discriminating correctly between all of the monophonic true and foil continuations, and all but one of the polyphonic pairs. If we run this subtask again next year, we probably ought to identify a different, more sophisticated algorithm for generating the foils.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13126</id>
		<title>2019:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13126"/>
		<updated>2019-11-05T13:38:27Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Discussion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next N musical events, covering the subsequent 10 quarter-note beats, and we quantitatively evaluate the extent to which the algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets also contain continuations of the prime, one of which is genuine; the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again is evaluated quantitatively.&lt;br /&gt;
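One simple way to approach subtask (2) is to score each candidate continuation under a sequence model trained on the development data and rate the higher-scoring candidate as genuine. The sketch below is illustrative only and does not reflect any particular submission: states are reduced to plain MIDI note numbers for brevity, and the smoothing floor is an assumption.

```python
import math
from collections import Counter, defaultdict

def transition_log_probs(pieces, smoothing=1e-6):
    """Estimate log transition probabilities between symbolic states.

    Returns the table plus a log-probability floor for unseen transitions.
    """
    counts = defaultdict(Counter)
    for events in pieces:
        for prev, nxt in zip(events, events[1:]):
            counts[prev][nxt] += 1
    log_probs = {}
    for prev, nxts in counts.items():
        total = sum(nxts.values())
        for nxt, c in nxts.items():
            log_probs[(prev, nxt)] = math.log(c / total)
    return log_probs, math.log(smoothing)

def continuation_score(log_probs, floor, prime, continuation):
    """Mean log-likelihood of a continuation, conditioned on the prime's end."""
    seq = prime[-1:] + continuation
    lps = [log_probs.get(pair, floor) for pair in zip(seq, seq[1:])]
    return sum(lps) / len(lps)
```

A continuation whose transitions resemble the training corpus (and the prime's own material) receives a higher mean log-likelihood than a foil.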
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2019:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
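Reading this CSV format is straightforward; the snippet below is a hypothetical illustration (field names are our own) using the five columns described above: ontime in quarter-note beats, MIDI note number, estimated morphetic pitch number, duration in quarter-note beats, and channel.

```python
import csv
import io

# The three example rows quoted above, verbatim.
PRIME_CSV = """20,64,62,0.5,0
20.66667,65,63,0.25,0
21,67,64,0.5,0
"""

def load_events(text):
    """Parse PPDD symbolic CSV text into a list of event dictionaries."""
    events = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        ontime, mnn, mpn, dur, channel = row
        events.append({
            "ontime": float(ontime),
            "mnn": int(mnn),
            "mpn": int(mpn),
            "duration": float(dur),
            "channel": int(channel),
        })
    return events

events = load_events(PRIME_CSV)
# events[0] is the event at ontime 20 (bar 6, beat 1 in 4-4) with MNN 64.
```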
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but, for those concerned about overfitting, it is similar in character to LMD while not being drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the task are listed in Table 1. There were no submissions to the audMono or audPoly variants this year. The task captains prepared a first-order Markov model (MM) over a state space of beat of the measure and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and it can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to the other submissions in the explicit task.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB2&lt;br /&gt;
	| MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB5&lt;br /&gt;
        | MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2019.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the ability of an algorithm to 1) predict a continuation, given a prime (explicit task), and 2) decide which of two candidate continuations is the true one and which the foil, given a prime (implicit task). To evaluate performance on the explicit task, we compare the true continuation to the generated continuation, and measure how many pitch-onset pairs are correctly predicted at various time intervals after the last note of the prime. To evaluate performance on the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2019:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
For reference purposes, we also include results of the [https://www.music-ir.org/mirex/wiki/2018:Patterns_for_Prediction_Results#Datasets_and_Algorithms submissions from 2018] (BachProp and Seq2SeqP4P).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Precision, recall, and F1 (cardinality score) as a function of quarter-note onsets from the start of the prediction.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
[[File:2019_poly_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Precision, recall, and F1 (cardinality score) as a function of quarter-note onsets from the start of the prediction.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_poly_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Modulo12Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.502&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.516&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.219&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.596&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.612&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.292&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.583&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.608&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.195&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Seq2SeqP4P&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.087&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.121&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 2.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.455&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.466&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.139&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.594&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.598&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.238&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.506&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.508&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.176&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 3.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: discriminate true and foil continuation====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr style=&amp;quot;text-align: right;&amp;quot;&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Observations&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Accuracy&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Mean Probability&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Variance&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;data&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.844&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.498&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.998&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.992&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.002&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.916&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.519&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(2)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.703&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.527&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.004&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(5)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.731&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.554&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.012&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 4.''' Discrimination scores of the submitted algorithms.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
CopyForward (a method based on geometric pattern discovery) significantly outperforms the baseline Markov model and BachProp (a method based on recurrent neural networks) on the explicit subtask. The abstract for CopyForward suggests this relatively high level of performance is due to successful modeling of dynamic expectancies – that is, it uses information from the prime exclusively (no training data) to predict the content of the continuation. BachProp, on the other hand, attempts to model schematic (or corpus-based) expectancies too, combining information from the prime with information from a training process. It remains to be seen whether BachProp (or deep-learning approaches in general) could improve upon CopyForward's continuations in cases where CopyForward's dependence on modeling of dynamic expectancies leads to unsuccessful predictions.&lt;br /&gt;
&lt;br /&gt;
All algorithms submitted this year to the implicit subtask achieved significantly above-chance performance. MLM employed an LSTM and GenDetect employed a gradient boosting classifier. GenDetect performed significantly better than previous and current submissions on this task, discriminating correctly between all monophonic true and foil continuation pairs, and all but one of the polyphonic pairs. If we run this subtask again next year, we probably ought to identify a different, more sophisticated algorithm for generating the foils.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13125</id>
		<title>2019:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13125"/>
		<updated>2019-11-05T13:37:12Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Discussion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next N musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2019:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
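As an illustration of this five-column format, here is a minimal parsing sketch; the function name and dictionary keys are our own illustrative choices, not part of any official tooling.

```python
import csv
from io import StringIO

def parse_prime(csv_text):
    """Parse a prime in the five-column CSV format described above:
    ontime, MIDI note number, morphetic pitch number, duration, channel."""
    events = []
    for row in csv.reader(StringIO(csv_text)):
        if not row:
            continue
        ontime, mnn, mpn, dur, channel = row
        events.append({
            "ontime": float(ontime),    # in quarter-note beats
            "mnn": int(mnn),            # MIDI note number
            "mpn": int(mpn),            # estimated morphetic pitch number
            "duration": float(dur),     # in quarter-note beats
            "channel": int(channel),
        })
    return events

example = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"
events = parse_prime(example)
```

The same rows shown above parse to three events, the first with ontime 20.0 and MNN 64.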
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but, in case you are concerned about overfitting, it is similar to LMD while not being drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
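The task captains' exact implementation is not shown here, but a first-order Markov baseline of this general shape can be sketched as follows. The state encoding as (measure beat, key-centralized MIDI note number) tuples follows the description above; the toy training sequence, function names, and dead-end handling are illustrative assumptions.

```python
import random
from collections import defaultdict

def train_markov(state_sequence):
    """Collect first-order transitions: for each state, the list of
    states observed to follow it in the training sequence."""
    transitions = defaultdict(list)
    for prev, nxt in zip(state_sequence, state_sequence[1:]):
        transitions[prev].append(nxt)
    return transitions

def continue_sequence(transitions, last_state, n_events, rng=random):
    """Sample a continuation of n_events states from the learned model."""
    states = []
    state = last_state
    for _ in range(n_events):
        options = transitions.get(state)
        if not options:
            break  # unseen state: a real model would need a fallback
        state = rng.choice(options)
        states.append(state)
    return states

# Toy usage: states are (beat in a 4/4 measure, MNN relative to the tonic).
seq = [(1, 0), (2, 2), (3, 4), (4, 5), (1, 7), (2, 5), (3, 4), (4, 2), (1, 0)]
model = train_markov(seq)
cont = continue_sequence(model, (1, 0), 4)
```

Note that a model trained this way on the full song, as the baseline was, sees the continuation's own transitions, which is the advantage mentioned above.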
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB2&lt;br /&gt;
	| MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB5&lt;br /&gt;
        | MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2019.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the performance of an algorithm to 1) predict a continuation, given a prime (explicit task), and 2) decide which of two candidate continuations is the true one and which the foil, given a prime (implicit task). To evaluate performance at the explicit task, we compare the true continuation to the generated continuation, and measure how many pitch-onset pairs are correctly predicted at various time intervals after the last note of the prime. To evaluate performance at the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2019:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
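For illustration only (the linked page gives the official definitions), the two kinds of measure can be sketched roughly as precision/recall/F1 over matched (onset, pitch) pairs for the explicit task, and plain accuracy for the implicit task.

```python
def prf(predicted_pairs, true_pairs):
    """Cardinality-score-style precision, recall, and F1 over
    (onset, pitch) pairs, treating each set of pairs as a point set."""
    predicted, truth = set(predicted_pairs), set(true_pairs)
    hits = len(predicted.intersection(truth))
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def accuracy(decisions):
    """decisions: booleans, True where the algorithm rated the genuine
    continuation above the foil."""
    decisions = list(decisions)
    return sum(decisions) / len(decisions)

# Two of three predicted pitch-onset pairs match the true continuation.
p, r, f = prf([(0.0, 60), (0.5, 62), (1.0, 64)],
              [(0.0, 60), (0.5, 62), (1.0, 65)])
acc = accuracy([True, True, False, True])
```

In this toy case precision, recall, and F1 are all 2/3, and the accuracy over four decisions is 0.75.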
&lt;br /&gt;
For reference purposes, we also include results of the [https://www.music-ir.org/mirex/wiki/2018:Patterns_for_Prediction_Results#Datasets_and_Algorithms submissions from 2018] (BachProp and Seq2SeqP4P).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Precision, recall, and F1 (cardinality score) as a function of quarter-note onsets from the start of the prediction.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
[[File:2019_poly_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Precision, recall, and F1 (cardinality score) as a function of quarter-note onsets from the start of the prediction.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_poly_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Modulo12Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.502&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.516&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.219&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.596&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.612&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.292&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.583&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.608&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.195&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Seq2SeqP4P&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.087&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.121&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 2.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.455&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.466&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.139&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.594&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.598&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.238&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.506&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.508&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.176&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 3.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: discriminate true and foil continuation====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr style=&amp;quot;text-align: right;&amp;quot;&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Observations&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Accuracy&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Mean Probability&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Variance&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;data&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.844&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.498&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.998&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.992&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.002&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.916&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.519&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(2)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.703&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.527&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.004&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(5)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.731&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.554&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.012&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 4.''' Discrimination scores of the submitted algorithms.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
CopyForward (a method based on geometric pattern discovery) significantly outperforms the baseline Markov model and BachProp (a method based on recurrent neural networks) on the explicit subtask. The abstract for CopyForward suggests this relatively high level of performance is due to successful modeling of dynamic expectancies – that is, it uses information from the prime exclusively (not any training data) to predict the content of the continuation. BachProp, on the other hand, attempts to model schematic (or corpus-based) expectancies too, combining information from the prime with information from a training process. It remains to be seen whether BachProp (or deep-learning approaches in general) could improve upon CopyForward's continuations in cases where CopyForward's dependence on modeling of dynamic expectancies leads to unsuccessful predictions.&lt;br /&gt;
&lt;br /&gt;
All algorithms submitted this year to the implicit subtask achieved significantly above-chance performance. MLM employed an LSTM and GenDetect employed a gradient boosting classifier. GenDetect performed significantly better than previous and current submissions on this task, discriminating correctly between all monophonic true and foil continuation pairs, and all but one of the polyphonic pairs. If we run this subtask again next year, we probably ought to identify a different, more sophisticated algorithm for generating the foils.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13123</id>
		<title>2019:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13123"/>
		<updated>2019-11-05T13:13:36Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next N musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2019:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
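For concreteness, note events in this format can be parsed with a few lines of Python. This is a hedged sketch: the field names (ontime, mnn, mpn, dur, channel) are our own labels for the five columns described above, not an official schema.&lt;br /&gt;

```python
import csv
from io import StringIO

# The three example rows from above: ontime (quarter-note beats),
# MIDI note number, estimated morphetic pitch number, duration
# (quarter-note beats), channel.
PRIME_CSV = """20,64,62,0.5,0
20.66667,65,63,0.25,0
21,67,64,0.5,0
"""

def read_prime(handle):
    """Return a list of note-event dicts from a PPDD-style CSV handle."""
    events = []
    for ontime, mnn, mpn, dur, channel in csv.reader(handle):
        events.append({
            "ontime": float(ontime),   # onset in quarter-note beats
            "mnn": int(mnn),           # MIDI note number
            "mpn": int(mpn),           # estimated morphetic pitch number
            "dur": float(dur),         # duration in quarter-note beats
            "channel": int(channel),
        })
    return events

events = read_prime(StringIO(PRIME_CSV))
```

In practice one would pass a file object opened on a prime CSV from the dataset instead of the inline string used here.&lt;br /&gt;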
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but for those concerned about overfitting: it is similar in character to LMD while not being drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
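As an illustration of the baseline's structure, the following is a minimal sketch of a first-order Markov model over (beat-of-the-measure, key-centralized MIDI note number) states. The 4/4 default, the function names, and the sampling scheme are our assumptions for illustration; the task captains' implementation may differ.&lt;br /&gt;

```python
from collections import defaultdict
import random

def train(events, tonic_mnn, beats_per_measure=4):
    """Build first-order transition lists from (ontime, mnn) pairs.

    A state is (ontime modulo beats_per_measure, mnn minus tonic_mnn),
    i.e. beat of the measure and key-centralized MIDI note number.
    """
    states = [(ontime % beats_per_measure, mnn - tonic_mnn)
              for ontime, mnn in events]
    transitions = defaultdict(list)
    for prev, nxt in zip(states, states[1:]):
        transitions[prev].append(nxt)
    return transitions

def generate(transitions, start, n, seed=0):
    """Sample up to n successor states from the trained model."""
    random.seed(seed)
    out, state = [], start
    for _ in range(n):
        choices = transitions.get(state)
        if not choices:
            break  # unseen state: no continuation available
        state = random.choice(choices)
        out.append(state)
    return out
```

Note that, as stated above, the actual baseline was trained on the full song/piece rather than on the prime alone.&lt;br /&gt;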
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB2&lt;br /&gt;
	| MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB5&lt;br /&gt;
        | MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2019.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the performance of an algorithm at 1) predicting a continuation, given a prime (the explicit task), and 2) deciding which of two candidate continuations is the true one and which the foil, given a prime (the implicit task). To evaluate performance at the explicit task, we compare the true continuation to the generated continuation and measure how many pitch-onset pairs are correctly predicted at various time intervals after the last note of the prime. To evaluate performance at the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2019:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
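To make the explicit-task scoring idea concrete, here is a hedged simplification that treats a continuation as a set of (onset, pitch) pairs and computes precision, recall, and F1 (the cardinality score). The official definitions linked above govern onset tolerances and the per-time-interval breakdown, which this sketch omits.&lt;br /&gt;

```python
def cardinality_score(predicted, true):
    """Precision, recall, and F1 over exact (onset, pitch) matches.

    predicted, true: iterables of (onset, pitch) pairs, e.g. (20.0, 64).
    """
    pred, ref = set(predicted), set(true)
    hits = len(pred.intersection(ref))
    precision = hits / len(pred) if pred else 0.0
    recall = hits / len(ref) if ref else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Implicit-task accuracy is simpler still: the count of correct true-vs-foil decisions divided by the number of decisions made.&lt;br /&gt;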
&lt;br /&gt;
For reference purposes, we also include results of the [https://www.music-ir.org/mirex/wiki/2018:Patterns_for_Prediction_Results#Datasets_and_Algorithms submissions from 2018] (BachProp and Seq2SeqP4P).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Precision, recall and F1 (cardinality score) in quarter note onsets from prediction start.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
[[File:2019_poly_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Precision, recall and F1 (cardinality score) in quarter note onsets from prediction start.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_poly_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Modulo12Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.502&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.516&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.219&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.596&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.612&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.292&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.583&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.608&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.195&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Seq2SeqP4P&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.087&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.121&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 2.''' Pitch overlap of the algorithmic continuations with the true continuation - mean, median and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.455&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.466&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.139&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.594&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.598&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.238&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.506&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.508&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.176&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 3.''' Pitch overlap of the algorithmic continuations with the true continuation - mean, median and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: discriminate true and foil continuation====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr style=&amp;quot;text-align: right;&amp;quot;&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Observations&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Accuracy&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Mean Probability&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Variance&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;data&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.844&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.498&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.998&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.992&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.002&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.916&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.519&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(2)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.703&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.527&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.004&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(5)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.731&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.554&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.012&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 4.''' Discrimination scores of the submitted algorithms.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;br /&gt;
&lt;br /&gt;
CopyForward (a method based on geometric pattern discovery) significantly outperforms the baseline Markov model and BachProp (a method based on recurrent neural networks) on the explicit subtask. Judging by the abstract for CopyForward, this relatively high level of performance appears to be due to successful modeling of dynamic expectancies – that is, it uses information from the prime exclusively (not any training data) to predict the content of the continuation. BachProp, on the other hand, attempts to model schematic (or corpus-based) expectancies too, combining information from the prime with information from a training process. It remains to be seen whether BachProp (or deep-learning approaches in general) could improve upon CopyForward's continuations in cases where CopyForward's exclusive dependence on dynamic expectancies leads to unsuccessful predictions.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13121</id>
		<title>2019:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction_Results&amp;diff=13121"/>
		<updated>2019-11-05T09:14:43Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up to a cut-off point, after which the algorithm should generate the next N musical events, covering up to 10 quarter-note beats beyond the cut-off; we will quantitatively evaluate the extent to which the algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine; the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2019:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but for those concerned about overfitting: it is similar in character to LMD while not being drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly - Task 1&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! TD1&lt;br /&gt;
	| CopyForward  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/TD1.pdf PDF] || [https://mcgill.ca/music/timothy-de-reuse Timothy de Reuse]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono - Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly Task 2&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! EP1&lt;br /&gt;
	| GenDetect  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/EP1.pdf PDF] || [http://metacreation.net/members/jeff-ens/ Jeff Ens], [http://philippepasquier.com/publications Philippe Pasquier]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB2&lt;br /&gt;
	| MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
	|-&lt;br /&gt;
        ! YB5&lt;br /&gt;
        | MLM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2019/YB2.pdf PDF] || [http://www.eecs.qmul.ac.uk/~ay304/ Adrien Ycart], [http://www.eecs.qmul.ac.uk/profiles/benetosemmanouil.html Emmanouil Benetos]&lt;br /&gt;
&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2019.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the performance of an algorithm at 1) predicting a continuation, given a prime (the explicit task), and 2) deciding which of two candidate continuations is the true one and which the foil, given a prime (the implicit task). To evaluate performance at the explicit task, we compare the true continuation to the generated continuation and measure how many pitch-onset pairs are correctly predicted at various time intervals after the last note of the prime. To evaluate performance at the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2019:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
For reference purposes, we also include results of the [https://www.music-ir.org/mirex/wiki/2018:Patterns_for_Prediction_Results#Datasets_and_Algorithms submissions from 2018] (BachProp and Seq2SeqP4P).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Precision, recall and F1 (cardinality score) in quarter note onsets from prediction start.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_mono_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
[[File:2019_poly_cs.png|1000px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Precision, recall and F1 (cardinality score) in quarter note onsets from prediction start.&lt;br /&gt;
&lt;br /&gt;
[[File:2019_poly_pitch.png|800px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Pitch overlap of the algorithmically generated continuations with the true continuation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Modulo12Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.502&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.516&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.219&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.596&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.612&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.292&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.583&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.608&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.195&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Seq2SeqP4P&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.087&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.121&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 2.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th colspan=&amp;quot;3&amp;quot; halign=&amp;quot;left&amp;quot;&amp;gt;Pitch&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mean&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;median&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;std&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.455&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.466&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.139&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;CopyForward&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.594&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.598&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.238&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Markov&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.506&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.508&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.176&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 3.''' Pitch overlap of the algorithmic continuations with the true continuation: mean, median, and standard deviation.&lt;br /&gt;
&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: discriminate true and foil continuation====&lt;br /&gt;
&amp;lt;table border=&amp;quot;1&amp;quot; class=&amp;quot;dataframe&amp;quot;&amp;gt;&lt;br /&gt;
    &amp;lt;tr style=&amp;quot;text-align: right;&amp;quot;&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Observations&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Accuracy&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Mean Probability&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;Variance&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;model&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;data&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;&amp;lt;/th&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;1.000&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.000&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;mono&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.844&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.498&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;GenDetect&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;500&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.998&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.992&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.002&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;BachProp&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.916&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.519&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.006&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(2)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.703&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.527&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.004&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
    &amp;lt;tr&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;MLM(5)&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;th&amp;gt;poly&amp;lt;/th&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;499&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.731&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.554&amp;lt;/td&amp;gt;&lt;br /&gt;
      &amp;lt;td&amp;gt;0.012&amp;lt;/td&amp;gt;&lt;br /&gt;
    &amp;lt;/tr&amp;gt;&lt;br /&gt;
&amp;lt;/table&amp;gt;&lt;br /&gt;
&lt;br /&gt;
'''Table 4.''' Discrimination scores of the submitted algorithms.&lt;br /&gt;
&lt;br /&gt;
== Discussion ==&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=13120</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=13120"/>
		<updated>2019-11-05T09:13:02Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next N musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
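&lt;br /&gt;
As an illustration only (this sketch is ours and is not part of the task infrastructure), the five-column rows described above could be parsed into named event fields like so:&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical sketch: parse PPDD-style CSV rows into named event fields.
# Column order as described above: ontime (quarter-note beats), MIDI note
# number (MNN), estimated morphetic pitch number (MPN), duration, channel.
import csv
from io import StringIO

EXCERPT = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"

def parse_ppdd(text):
    events = []
    for ontime, mnn, mpn, dur, channel in csv.reader(StringIO(text)):
        events.append({
            "ontime": float(ontime),   # quarter-note beats from piece start
            "mnn": int(mnn),           # MIDI note number
            "mpn": int(mpn),           # estimated morphetic pitch number
            "dur": float(dur),         # duration in quarter-note beats
            "channel": int(channel),
        })
    return events

events = parse_ppdd(EXCERPT)
```
&lt;br /&gt;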
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is similar in style to LMD while not drawn from LMD, in case you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Seq2SeqP4P  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| BachProp  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM1&lt;br /&gt;
	| First-order Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | Task captains || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| BachProp  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM1&lt;br /&gt;
	| First-order Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | Task captains || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2018. Seq2SeqP4P and BachProp are models based on LSTM networks.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure an algorithm's ability to 1) predict a continuation, given a prime (the explicit task), and 2) decide which of two given continuations is the true one rather than a foil, given a prime (the implicit task). To evaluate performance at the explicit task, we compare the true continuation to the generated continuation, and measure how many pitches and inter-onset intervals (relative to the last onset of the prime) are correctly predicted at various time intervals after the last note of the prime. To evaluate performance at the implicit task, we measure accuracy as the number of correct decisions divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
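&lt;br /&gt;
The implicit-task accuracy just described can be sketched as follows (a hypothetical interface, assuming each algorithm outputs one likelihood per candidate continuation; this is not the official evaluation code):&lt;br /&gt;
&lt;br /&gt;
```python
# Minimal sketch of implicit-task accuracy: the proportion of prime/
# continuation pairs for which the true continuation is rated more
# likely than the foil.
def implicit_accuracy(true_likelihoods, foil_likelihoods):
    # A decision is correct when the true continuation gets the
    # higher likelihood rating.
    correct = sum(t > f for t, f in zip(true_likelihoods, foil_likelihoods))
    return correct / len(true_likelihoods)

# E.g., correct on the first two decisions, wrong on the third:
acc = implicit_accuracy([0.9, 0.6, 0.4], [0.1, 0.5, 0.7])  # 2/3
```
&lt;br /&gt;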
&lt;br /&gt;
===symMono===&lt;br /&gt;
For the explicit task, the two LSTM-based models make their best predictions close to the cut-off point (i.e., the last event of the prime). As the onset time after the cut-off point increases, the Markov model outperforms the LSTM-based models, with the exception of recall of inter-onset intervals (cf. Figure 4), where FC1 consistently performs better than the Markov model. Possibly as a consequence of the poorer pitch performance, the LSTM models also detect fewer relevant pitch-ioi pairs as onset time after the cut-off point increases.&lt;br /&gt;
&lt;br /&gt;
For the implicit task, only FC1 was submitted. It significantly outperforms chance level, with an accuracy of 0.87, i.e., picking the correct continuation in almost 90% of cases (see the table below).&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
Only one LSTM model was submitted to symPoly (FC1), with results comparable to symMono (see Figures 10-18 and the table below).&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_R_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Recall of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_P_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Precision of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_F1_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' F1 measure of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_R_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Recall of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_P_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Precision of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_F1_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' F1 measure of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_R_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Recall of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_P_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Precision of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_F1_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' F1 measure of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_R_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Recall of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_P_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Precision of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_F1_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' F1 measure of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_R_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Recall of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_P_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Precision of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_F1_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' F1 measure of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_R_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Recall of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_P_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Precision of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_F1_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' F1 measure of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===symMono/symPoly===&lt;br /&gt;
====Implicit task: decide which of two continuations is the true one, given the prime====&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 280px;&amp;quot;&lt;br /&gt;
	|-&lt;br /&gt;
	! width=&amp;quot;120&amp;quot; | Algorithm&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Monophonic&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Polyphonic&lt;br /&gt;
       |-&lt;br /&gt;
       |-&lt;br /&gt;
       ! FC1 &lt;br /&gt;
       | 0.87 || 0.92&lt;br /&gt;
       |- &lt;br /&gt;
       |-&lt;br /&gt;
       ! Sig. &amp;gt; chance &lt;br /&gt;
       | 0.54 || 0.54&lt;br /&gt;
       |-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=13119</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=13119"/>
		<updated>2019-11-05T09:11:20Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Implicit Task: Polyphonic */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
'''In brief''': &lt;br /&gt;
&lt;br /&gt;
We look for &lt;br /&gt;
&lt;br /&gt;
(1) Algorithms that take an excerpt of music as input (the prime), and output a predicted continuation of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next N musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; MIREX Discovery of Repeated Themes &amp;amp; Sections task; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt (Meredith, 2013). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is similar in style to LMD while not drawn from LMD, in case you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the tasks are listed in Table 1. There were no submissions to the audMono or audPoly variants of the tasks this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Seq2SeqP4P  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| BachProp  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM1&lt;br /&gt;
	| First-order Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | Task captains || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| BachProp  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM1&lt;br /&gt;
	| First-order Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Task captains (for purposes of comparison)&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1. Algorithms submitted to Patterns for Prediction 2018. Seq2SeqP4P and BachProp are models based on LSTM networks.'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
We measure the performance of an algorithm at (1) predicting a continuation, given a prime (the explicit task), and (2) deciding which of two candidate continuations is the true one, given a prime (the implicit task). To evaluate performance at the explicit task, we compare the true continuation to the generated continuation, and measure how many pitches and inter-onset intervals (relative to the last onset of the prime) are correctly predicted at various time intervals after the last note of the prime. To evaluate performance at the implicit task, we measure accuracy as the number of correct decisions, divided by the total number of decisions. (For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
For the explicit task, the two LSTM-based models make their best predictions close to the cut-off point (i.e., the last event of the prime). As onset time after the cut-off point increases, the Markov model outperforms the LSTM-based models, with the exception of recall of inter-onset intervals (cf. Figure 4), where FC1 consistently performs better than the Markov model. Possibly as a consequence of the poorer pitch performance, the LSTM models also retrieve fewer relevant pitch-ioi pairs as onset time after the cut-off point increases.&lt;br /&gt;
&lt;br /&gt;
For the implicit task, only FC1 was submitted. Its accuracy of 0.87 is significantly above chance level, i.e., it picks the correct continuation in almost 90% of cases (see Table 1).&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
Only one LSTM model was submitted to SymPoly (FC1), with results comparable to SymMono (see Figures 10-18, Table 1).&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===symMono===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_R_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Recall of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_P_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Precision of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_F1_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' F1 measure of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_R_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Recall of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_P_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Precision of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_F1_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' F1 measure of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_R_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Recall of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_P_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Precision of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_mono_F1_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' F1 measure of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
===symPoly===&lt;br /&gt;
====Explicit task: generate music given a prime====&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_R_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Recall of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_P_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Precision of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_F1_pitch.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' F1 measure of generated pitches after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_R_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Recall of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_P_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Precision of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_F1_ioi.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' F1 measure of generated inter-onset intervals after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_R_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Recall of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_P_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Precision of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
[[File:2018_poly_F1_pairs.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' F1 measure of generated pitch-ioi pairs after cutoff point.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
====Implicit task: decide which of two continuations is the true one, given the prime====&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 280px;&amp;quot;&lt;br /&gt;
	|-&lt;br /&gt;
	! width=&amp;quot;120&amp;quot; | Algorithm&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Monophonic&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Polyphonic&lt;br /&gt;
       |-&lt;br /&gt;
       ! FC1 &lt;br /&gt;
       | 0.87 || 0.92&lt;br /&gt;
       |-&lt;br /&gt;
       ! Sig. &amp;gt; chance &lt;br /&gt;
       | 0.54 || 0.54&lt;br /&gt;
       |-&lt;br /&gt;
|}&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12991</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12991"/>
		<updated>2019-08-01T05:30:40Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Evaluation Procedure */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), [https://sites.google.com/view/iyr/home Iris YuPing Ren] (yuping.ren.iris), [https://jamesowers.github.io/ James Owers] (james.f.owers), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins; all at gmail.com). Please copy in all four of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is Monday September 9th, 2019 (any time as long as it's Sep 9th somewhere on Earth!).&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
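These event rows can be parsed into note events with a few lines of Python. The following is a minimal sketch (the function name load_prime is ours for illustration, not part of the task infrastructure):&lt;br /&gt;
&lt;br /&gt;
```python
import csv

def load_prime(path):
    """Parse a PPDD symbolic CSV into a list of note events.

    Columns: ontime (quarter-note beats), MIDI note number,
    morphetic pitch number, duration (quarter-note beats), channel.
    """
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            ontime, mnn, mpn, dur, channel = row
            events.append({
                "ontime": float(ontime),
                "mnn": int(mnn),
                "mpn": int(mpn),
                "dur": float(dur),
                "channel": int(channel),
            })
    return events
```

For the example above, the first event would load as {'ontime': 20.0, 'mnn': 64, 'mpn': 62, 'dur': 0.5, 'channel': 0}.&lt;br /&gt;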
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; in case you are concerned about overfitting, however, we note that it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has occurred before (see the bars 17-18 transition in the piano roll). Therefore we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
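The &amp;quot;skyline&amp;quot; step above can be sketched as follows (our illustrative code, not the exact preparation script): for each ontime, keep only the event with the maximum MNN.&lt;br /&gt;
&lt;br /&gt;
```python
def skyline(events):
    """Reduce a channel to at most one event per ontime, keeping the
    maximum MIDI note number when several events clash. `events` is a
    list of (ontime, mnn) pairs; returns pairs sorted by ontime."""
    best = {}
    for ontime, mnn in events:
        if ontime not in best or mnn > best[ontime]:
            best[ontime] = mnn
    return sorted(best.items())
```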
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
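As an illustration of the general idea only (the actual state space and transition matrix follow Collins and Laney, 2017, and are richer than this sketch), a first-order Markov generator counts transitions between consecutive states and then random-walks from a seed state:&lt;br /&gt;
&lt;br /&gt;
```python
import random
from collections import defaultdict

def train_markov(states):
    """Collect first-order transition counts from a state sequence.
    A state might be, e.g., a (MIDI note number, inter-onset interval)
    pair; here it can be any hashable value."""
    transitions = defaultdict(list)
    for current, following in zip(states, states[1:]):
        transitions[current].append(following)
    return transitions

def generate(transitions, start, n, rng=random):
    """Generate up to n states by sampling successors of the current
    state, stopping early if a state has no recorded successor."""
    output, state = [], start
    for _ in range(n):
        successors = transitions.get(state)
        if not successors:
            break
        state = rng.choice(successors)
        output.append(state)
    return output
```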
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of whether continuation &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot; of the two presented continuations is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
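For example, the subtask (2) results file could be written as below. This is a sketch; the inclusion of a header row is our assumption, so please check the exact requirement with the task captains.&lt;br /&gt;
&lt;br /&gt;
```python
import csv

def write_subtask2_results(path, judgments):
    """Write one row per prime-continuation pair: the file id, then
    likelihoods in [0, 1] that the continuations in folders A and B,
    respectively, are the genuine extension of the prime."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "A", "B"])
        for file_id, likelihood_a, likelihood_b in judgments:
            writer.writerow([file_id, likelihood_a, likelihood_b])
```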
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019 GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is pitch, represented by MNN. &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, and &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1), (x_2, y_2),..., (x_n, y_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are non-decreasing, that is &amp;lt;math&amp;gt;x_0 \leq \hat{x}_1 \leq \cdots \leq \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 \leq x_1 \leq \cdots \leq x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Subtask 1===&lt;br /&gt;
We represent each note in the true and algorithmic continuation as a point in a two-dimensional space of onset and pitch, giving the point-set &amp;lt;math&amp;gt;\mathbf{P} = \{ (x_1, y_1), (x_2, y_2),..., (x_n, y_n) \}&amp;lt;/math&amp;gt; for the true continuation, and &amp;lt;math&amp;gt;\mathbf{Q} = \{ (\hat{x}_1, \hat{y}_1), (\hat{x}_2, \hat{y}_2),..., (\hat{x}_{n^\prime}, \hat{y}_{n^\prime}) \}&amp;lt;/math&amp;gt; for the algorithmic continuation. We calculate differences between all points &amp;lt;math&amp;gt;p = (x_i, y_i)&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;q = (\hat{x}_j, \hat{y}_j)&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, which represent the translation vectors &amp;lt;math&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; that transform a given algorithmically generated note into a note from the true continuation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\mathbf{T} = \left\{p - q \; \forall \; p \in \mathbf{P},\, q \in \mathbf{Q}\right\}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
From this, we find the &amp;lt;math&amp;gt;t \in \mathbf{T}&amp;lt;/math&amp;gt; which maximises the cardinality of the set of notes which now overlap under this translation. The maximum cardinality is the Cardinality Score (CS):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{cs}(\mathbf{P},\mathbf{Q}) =  \max_{t \in \mathbf{T}} \left|\left\{q \; \forall \; q \in \mathbf{Q} \;\, | \;\, (q + t) \in \mathbf{P}\right\}\right|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define recall as the number of correctly predicted notes, divided by the cardinality of the true continuation point set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;. Since a single point in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; can always be translated by some vector onto a point in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; (so the cardinality score is at least &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;), we subtract &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; from numerator and denominator to scale to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Rec} = (\text{cs}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{P}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Precision is the number of correctly predicted notes, divided by the cardinality of the point set of the algorithmic continuation &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, scaled in the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Prec} = (\text{cs}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{Q}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
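The reference implementation of these metrics is in the GitHub repository linked above. The following self-contained sketch should agree with the definitions, assuming &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; each contain at least two distinct points (so the denominators are positive):&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter

def cardinality_score(P, Q):
    """Maximum number of points of Q that coincide with points of P
    under a single translation t = p - q in (ontime, pitch) space.
    P and Q are iterables of (ontime, mnn) tuples."""
    counts = Counter(
        (px - qx, py - qy) for (px, py) in set(P) for (qx, qy) in set(Q)
    )
    return max(counts.values())

def recall(P, Q):
    return (cardinality_score(P, Q) - 1) / (len(set(P)) - 1)

def precision(P, Q):
    return (cardinality_score(P, Q) - 1) / (len(set(Q)) - 1)
```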
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
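For instance, the beat-time computation just described can be sketched as:&lt;br /&gt;
&lt;br /&gt;
```python
def beat_times(bpm, n_beats):
    """Times in seconds of the first n_beats quarter-note beats of the
    WAV file, given the bpm value u from the descriptor file:
    0, 60/u, 2*60/u, ... sec."""
    return [i * 60.0 / bpm for i in range(n_beats)]
```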
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Morgan, E., Fogel, A., Nair, A., &amp;amp; Patel, A. D. (2019). Statistical learning and Gestalt-like principles predict melodic expectations. ''Cognition, 189'', 23-34.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music  Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12986</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12986"/>
		<updated>2019-07-26T02:28:35Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* References */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), [https://sites.google.com/view/iyr/home Iris YuPing Ren] (yuping.ren.iris), [https://jamesowers.github.io/ James Owers] (james.f.owers), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins, all at gmail.com). Please copy in all four of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is Monday September 9th, 2019 (any time as long as it's Sep 9th somewhere on Earth!).&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2017), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, covering the subsequent 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2017). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, corresponding audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
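&lt;br /&gt;
As a minimal sketch (the helper name and field names below are ours, not part of the dataset specification), rows in this CSV format could be parsed as follows:&lt;br /&gt;
&lt;br /&gt;
```python
import csv
from io import StringIO

# Hypothetical helper: parse PPDD-style symbolic CSV text into note dicts.
# Columns, per the description above: ontime (quarter-note beats), MIDI note
# number (MNN), morphetic pitch number (MPN), duration (quarter-note beats),
# and channel.
def parse_prime(csv_text):
    fields = ["ontime", "mnn", "mpn", "duration", "channel"]
    casts = [float, int, int, float, int]
    notes = []
    for row in csv.reader(StringIO(csv_text)):
        notes.append({f: cast(v) for f, cast, v in zip(fields, casts, row)})
    return notes

example = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0"
notes = parse_prime(example)  # notes[0]["mnn"] is 64
```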
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed but, in case you are concerned about overfitting, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
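&lt;br /&gt;
The &amp;quot;skylining&amp;quot; step above can be sketched as follows (an illustrative reconstruction, not the organisers' actual code); events are (ontime, MNN) pairs, and the maximum MNN wins in the event of a clash:&lt;br /&gt;
&lt;br /&gt;
```python
def skyline(notes):
    """Keep at most one event per start time, choosing the highest MNN.

    notes: list of (ontime, mnn) pairs. Returns a monophonic list sorted
    by ontime, mirroring the skylining step described above.
    """
    best = {}
    for ontime, mnn in notes:
        best[ontime] = max(best.get(ontime, mnn), mnn)
    return sorted(best.items())

# Two events share ontime 21, so the higher MNN (67) survives.
reduced = skyline([(20, 64), (21, 60), (21, 67)])
```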
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
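&lt;br /&gt;
To illustrate the general idea only (the actual state space and transition matrix follow Collins and Laney, 2017, and are richer than the bare MNN states used here), a first-order Markov generator looks like this:&lt;br /&gt;
&lt;br /&gt;
```python
import random
from collections import defaultdict

# Illustrative first-order Markov sketch; states here are bare MNNs,
# whereas the real foil generator uses the state space of Collins and
# Laney (2017).
def train_transitions(sequence):
    trans = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        trans[a].append(b)
    return trans

def generate(trans, start, n, seed=0):
    rng = random.Random(seed)
    out = [start]
    for _ in range(n):
        choices = trans.get(out[-1])
        if not choices:
            break
        out.append(rng.choice(choices))
    return out

trans = train_transitions([60, 62, 64, 62, 60, 62, 64])
foil = generate(trans, 60, 5)  # every step is a transition seen in training
```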
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should indicate which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
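&lt;br /&gt;
For subtask (2), for instance, the required CSV could be written as below (the output file name, ids, and likelihood values are invented for illustration):&lt;br /&gt;
&lt;br /&gt;
```python
import csv

# Invented example rows: one per prime-continuation pair, with likelihoods
# in [0, 1] for the continuations in folders A and B respectively.
rows = [
    {"id": "0a983538", "A": 0.91, "B": 0.09},
    {"id": "001f5992", "A": 0.30, "B": 0.70},
]

with open("subtask2_output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "A", "B"])
    writer.writeheader()
    writer.writerows(rows)
```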
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019 GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is pitch, represented by MNN. &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, and &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1), (x_2, y_2),..., (x_n, y_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Subtask 1===&lt;br /&gt;
We represent each note in the true and algorithmic continuation as a point in a two-dimensional space of onset and pitch, giving the point-set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; for the true continuation, and &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; for the algorithmic continuation. We calculate differences between all points &amp;lt;math&amp;gt;p_i&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;q_j&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, which represent the translation vectors &amp;lt;math&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; to transform a given algorithmically generated note into a note from the true continuation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{cp}(\mathbf{P},\mathbf{Q}) =  \max_\mathbf{T} |\{q_j | q_j \in \mathbf{Q} \wedge q_j + \mathbf{T} \in \mathbf{P}\}|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define recall as the number of correctly predicted notes, divided by the cardinality of the true continuation point set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;. Since at least one point in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; can always be translated by some vector to a point in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; (i.e., &amp;lt;math&amp;gt;\text{cp}(\mathbf{P},\mathbf{Q}) \geq 1&amp;lt;/math&amp;gt;), we subtract &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; from numerator and denominator to scale to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Rec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{P}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Precision is the number of correctly predicted notes, divided by the cardinality of the point set of the algorithmic continuation &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, scaled in the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Prec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{Q}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
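&lt;br /&gt;
The official implementation is in the GitHub repository linked above; purely as a compact sketch of these formulas, with notes as (ontime, MNN) pairs:&lt;br /&gt;
&lt;br /&gt;
```python
from collections import Counter

def cardinality_score(P, Q):
    """cp(P, Q): the maximal number of points of Q that a single
    translation vector maps onto points of P."""
    diffs = Counter()
    for px, py in P:
        for qx, qy in Q:
            diffs[(px - qx, py - qy)] += 1
    return max(diffs.values()) if diffs else 0

def recall(P, Q):
    # (cp - 1) / (|P| - 1), as defined above
    return (cardinality_score(P, Q) - 1) / (len(P) - 1)

def precision(P, Q):
    # (cp - 1) / (|Q| - 1), as defined above
    return (cardinality_score(P, Q) - 1) / (len(Q) - 1)

P = [(0, 60), (1, 62), (2, 64)]  # true continuation
Q = [(0, 60), (1, 62), (2, 65)]  # prediction: two of three notes in place
```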
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
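&lt;br /&gt;
The beat-location computation just described can be sketched as:&lt;br /&gt;
&lt;br /&gt;
```python
def beat_times(bpm, n_beats):
    """Times in seconds of the first n_beats main beats of a WAV file,
    given the descriptor's tempo u = bpm: 0, 60/u, 2*60/u, ..."""
    u = float(bpm)
    return [i * 60.0 / u for i in range(n_beats)]

# At 120 BPM the main beats fall every half second.
```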
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Morgan, E., Fogel, A., Nair, A., &amp;amp; Patel, A. D. (2019). Statistical learning and Gestalt-like principles predict melodic expectations. ''Cognition, 189'', 23-34.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12970</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12970"/>
		<updated>2019-07-12T20:07:38Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), [https://sites.google.com/view/iyr/home Iris YuPing Ren] (yuping.ren.iris all at gmail.com), [https://jamesowers.github.io/ James Owers] (james.f.owers), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins, all at gmail.com). Please copy in all four of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is Monday September 9th, 2019 (any time as long as it's Sep 9th somewhere on Earth!).&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm should generate the next ''N'' musical events, covering the subsequent 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
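&lt;br /&gt;
A few lines of Python suffice to read this format (a sketch; the field names are descriptive labels based on the column layout just described, not part of the dataset):&lt;br /&gt;

```python
import csv

def load_prime(path):
    # Each row: ontime (quarter-note beats), MIDI note number,
    # estimated morphetic pitch number, duration (quarter-note beats), channel.
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            ontime, mnn, mpn, dur, ch = row
            events.append({
                "ontime": float(ontime),
                "mnn": int(mnn),
                "mpn": int(mpn),
                "dur": float(dur),
                "channel": int(ch),
            })
    return events
```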
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see bars 17-18 transition in the piano roll). Therefore, we (and/or an algorithm) might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Böck, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (the maximum MNN was chosen in the event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
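&lt;br /&gt;
The skylining step above can be sketched as follows (a minimal version, assuming each event is an (ontime, MNN, duration) triple; on a clash the maximum MNN wins, with MNN ties broken by duration):&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    # events: list of (ontime, mnn, dur) tuples for one channel.
    # Keep exactly one event per ontime, choosing the maximum MNN on a clash.
    by_ontime = defaultdict(list)
    for ontime, mnn, dur in events:
        by_ontime[ontime].append((mnn, dur))
    return sorted(
        (ontime, *max(pitches)) for ontime, pitches in by_ontime.items()
    )
```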
&lt;br /&gt;
For the polyphonic data, we applied the same minimum event counts (at least 20 events in the prime and 10 in the continuation) and the same maximum prime inter-ontime interval of 8 quarter-note beats, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
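&lt;br /&gt;
For illustration, an order-1 Markov generator of this general kind might look as follows (a toy sketch over MNN states only; the actual state space and transition matrix are as described in Collins and Laney, 2017):&lt;br /&gt;

```python
import random
from collections import defaultdict

def order1_foil(states, length, seed=0):
    # states: the prime as a sequence of hashable states (here, MNNs).
    # Build an order-1 transition table and sample a foil continuation.
    rng = random.Random(seed)
    table = defaultdict(list)
    for a, b in zip(states, states[1:]):
        table[a].append(b)
    current = states[-1]
    out = []
    for _ in range(length):
        # Fall back on the full prime if the current state was never a source.
        choices = table.get(current) or states
        current = rng.choice(choices)
        out.append(current)
    return out
```

Because this toy model conditions only on the previous state, it reproduces local transitions but no repetitive or phrasal structure, which is exactly why such continuations serve as foils.&lt;br /&gt;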
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should indicate which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
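&lt;br /&gt;
For subtask (2), writing the required one-file-per-dataset output might look like this (a sketch; the helper name is hypothetical, and the header row is an assumption for readability, since the task description only names the columns):&lt;br /&gt;

```python
import csv

def write_subtask2(rows, path):
    # rows: (file_id, likelihood_A, likelihood_B) triples; likelihoods in [0, 1].
    # The header row is an assumption; the task page names columns "id", "A", "B".
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "A", "B"])
        for file_id, like_a, like_b in rows:
            writer.writerow([file_id, like_a, like_b])
```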
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019 GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is pitch, represented by MNN. &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, and &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1), (x_2, y_2),..., (x_n, y_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Subtask 1===&lt;br /&gt;
We represent each note in the true and algorithmic continuation as a point in a two-dimensional space of onset and pitch, giving the point-set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; for the true continuation, and &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; for the algorithmic continuation. We calculate differences between all points &amp;lt;math&amp;gt;p_i&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;q_j&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, which represent the translation vectors &amp;lt;math&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; to transform a given algorithmically generated note into a note from the true continuation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{cp}(\mathbf{P},\mathbf{Q}) =  \max_\mathbf{T} |\{q_j | q_j \in \mathbf{Q} \wedge q_j + \mathbf{T} \in \mathbf{P}\}|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define recall as the number of correctly predicted notes, divided by the cardinality of the true continuation point set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;. Since some point in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; can always be translated by some vector onto a point in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;, so that &amp;lt;math&amp;gt;\text{cp}(\mathbf{P},\mathbf{Q}) \geq 1&amp;lt;/math&amp;gt;, we subtract &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; from numerator and denominator to scale to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Rec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{P}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Precision is the number of correctly predicted notes, divided by the cardinality of the point set of the algorithmic continuation &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, scaled in the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Prec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{Q}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
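&lt;br /&gt;
The official implementation of these metrics is linked in the GitHub repository above; purely as an illustration, they can be sketched as follows (assuming point sets of exact (ontime, MNN) pairs; cardinality_score is a hypothetical helper name for cp):&lt;br /&gt;

```python
from collections import Counter

def cardinality_score(P, Q):
    # cp(P, Q): the largest number of points of Q that a single
    # translation vector T maps onto points of P.
    vectors = Counter(
        (px - qx, py - qy) for (px, py) in P for (qx, qy) in Q
    )
    return max(vectors.values())

def recall(P, Q):
    # (cp(P, Q) - 1) / (|P| - 1)
    return (cardinality_score(P, Q) - 1) / (len(P) - 1)

def precision(P, Q):
    # (cp(P, Q) - 1) / (|Q| - 1)
    return (cardinality_score(P, Q) - 1) / (len(Q) - 1)
```

Counting translation vectors as pairwise differences is equivalent to the set definition of cp above, provided the points of each set are distinct.&lt;br /&gt;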
&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these alternatives, but opted not to for the sake of simplicity. Furthermore, they do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
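&lt;br /&gt;
For example (a small sketch), the beat locations described in this answer can be computed as:&lt;br /&gt;

```python
def beat_times(bpm, n_beats):
    # Main beats of the synthesized WAV fall at 0, 60/u, 2*60/u, ... seconds,
    # where u is the bpm value read from the descriptor file.
    return [i * 60.0 / bpm for i in range(n_beats)]
```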
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). ''Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching''. PhD thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=MIREX_HOME&amp;diff=12969</id>
		<title>MIREX HOME</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=MIREX_HOME&amp;diff=12969"/>
		<updated>2019-07-12T19:57:29Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* MIREX 2019 Deadline Dates */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Welcome to MIREX 2019==&lt;br /&gt;
&lt;br /&gt;
This is the main page for the 15th running of the Music Information Retrieval Evaluation eXchange (MIREX 2019). The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) at the [https://ischool.illinois.edu School of Information Sciences], University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]), is the principal organizer of MIREX 2019. &lt;br /&gt;
&lt;br /&gt;
The MIREX 2019 community will hold its annual meeting as part of [http://ismir2019.ewi.tudelft.nl The 20th International Society for Music Information Retrieval Conference], ISMIR 2019, which will be held in Delft, The Netherlands, November 4-8, 2019.&lt;br /&gt;
&lt;br /&gt;
J. Stephen Downie&amp;lt;br&amp;gt;&lt;br /&gt;
Director, IMIRSEL&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Task Leadership Model==&lt;br /&gt;
&lt;br /&gt;
As in previous years, we aim to improve the distribution of tasks for the upcoming MIREX 2019. To do so, we need leaders to help us organize and run each task.&lt;br /&gt;
&lt;br /&gt;
To volunteer to lead a task, please complete the form [TBD]. Current information about task captains can be found on the [[2019:Task Captains]] page. Please direct any communication to the [https://lists.ischool.illinois.edu/lists/admin/evalfest EvalFest] mailing list.&lt;br /&gt;
&lt;br /&gt;
What does it mean to lead a task?&lt;br /&gt;
* Update wiki pages as needed&lt;br /&gt;
* Communicate with submitters and troubleshoot submissions&lt;br /&gt;
* Execute and evaluate submissions&lt;br /&gt;
* Publishing final results&lt;br /&gt;
&lt;br /&gt;
Due to the proprietary nature of much of the data, the submission system, evaluation framework, and most of the datasets will continue to be hosted by IMIRSEL. However, we are prepared to provide access to task organizers to manage and run submissions on the IMIRSEL systems.&lt;br /&gt;
&lt;br /&gt;
We really need leaders to help us this year!&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Deadline Dates==&lt;br /&gt;
* '''September 2nd 2019'''&lt;br /&gt;
** [[2019:Audio Fingerprinting]] &amp;lt;TC: Chung-Che Wang&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''September 9th 2019'''&lt;br /&gt;
** [[2019:Audio Classification (Train/Test) Tasks]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;, including&lt;br /&gt;
*** Audio US Pop Genre Classification&lt;br /&gt;
*** Audio Latin Genre Classification&lt;br /&gt;
*** Audio Music Mood Classification&lt;br /&gt;
*** Audio Classical Composer Identification&lt;br /&gt;
** [[2019:Audio K-POP Mood Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2019:Audio K-POP Genre Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2019:Patterns for Prediction]] (offshoot of [[2017:Discovery of Repeated Themes &amp;amp; Sections]]) &amp;lt;TC: Iris Ren, Berit Janssen, James Owers, and Tom Collins&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''September 23rd 2019'''&lt;br /&gt;
** [[2019:Audio Cover Song Identification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2019:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''TBD very soon'''&lt;br /&gt;
** [[2019:Audio Beat Tracking]] &amp;lt;TC: Aggelos Gkiokas&amp;gt;&lt;br /&gt;
** [[2019:Audio Chord Estimation]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
** [[2019:Audio Downbeat Estimation]] &amp;lt;TC: Mickaël Zehren&amp;gt;&lt;br /&gt;
** [[2019:Audio Key Detection]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
** [[2019:Audio Onset Detection]] &amp;lt;TC: Sebastian Böck&amp;gt;&lt;br /&gt;
** [[2019:Audio Tempo Estimation]] &amp;lt;TC: Aggelos Gkiokas, Hendrik Schreiber&amp;gt;&lt;br /&gt;
** [[2019:Automatic Lyrics-to-Audio Alignment]] &amp;lt;TC: Georgi Dzhambazov, Daniel Stoller&amp;gt;&lt;br /&gt;
** [[2019:Set List Identification]] &amp;lt;TC: Ming-Chi Yen&amp;gt;&lt;br /&gt;
** [[2019:Music Detection]] &amp;lt;TC: Blai Meléndez-Catalán&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Possible Evaluation Tasks==&lt;br /&gt;
* [[2019:Audio Classification (Train/Test) Tasks]], incorporating:&lt;br /&gt;
** Audio US Pop Genre Classification&lt;br /&gt;
** Audio Latin Genre Classification&lt;br /&gt;
** Audio Music Mood Classification&lt;br /&gt;
** Audio Classical Composer Identification&lt;br /&gt;
** [[2019:Audio K-POP Mood Classification]]&lt;br /&gt;
** [[2019:Audio K-POP Genre Classification]]&lt;br /&gt;
* [[2019:Audio Beat Tracking]]&lt;br /&gt;
* [[2019:Audio Chord Estimation]]&lt;br /&gt;
* [[2019:Audio Cover Song Identification]]&lt;br /&gt;
* [[2019:Audio Downbeat Estimation]]&lt;br /&gt;
* [[2019:Audio Key Detection]]&lt;br /&gt;
* [[2019:Audio Onset Detection]]&lt;br /&gt;
* [[2019:Audio Tempo Estimation]]&lt;br /&gt;
* [[2019:Patterns for Prediction]] (offshoot of Discovery of Repeated Themes &amp;amp; Sections from previous years)&lt;br /&gt;
* [[2019:Automatic Lyrics-to-Audio Alignment]]&lt;br /&gt;
* [[2019:Drum Transcription]]&lt;br /&gt;
* [[2019:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]]&lt;br /&gt;
* [[2019:Real-time Audio to Score Alignment (a.k.a Score Following)]]&lt;br /&gt;
* [[2019:Structural Segmentation]]&lt;br /&gt;
* [[2019:Audio Fingerprinting]]&lt;br /&gt;
* [[2019:Set List Identification]]&lt;br /&gt;
* [[2019:Query by Singing/Humming]]&lt;br /&gt;
* [[2019:Singing Voice Separation]] &lt;br /&gt;
* [[2019:Audio Tag Classification]]  &lt;br /&gt;
* [[2019:Audio Music Similarity and Retrieval]] &lt;br /&gt;
* [[2019:Symbolic Melodic Similarity]] &lt;br /&gt;
* [[2019:Audio Melody Extraction]] &lt;br /&gt;
* [[2019:Query by Tapping]]&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Submission Instructions==&lt;br /&gt;
* Be sure to read through the rest of this page&lt;br /&gt;
* Be sure to read through the task pages for which you are submitting&lt;br /&gt;
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]&lt;br /&gt;
* Be sure to follow the  [[MIREX 2019 Submission Instructions]] including both the tutorial video and the text&lt;br /&gt;
* The MIREX 2019 Submission System is coming soon at https://www.music-ir.org/mirex/sub/&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Evaluation==&lt;br /&gt;
&lt;br /&gt;
===Note to New Participants===&lt;br /&gt;
Please take the time to read the following review articles that explain the history and structure of MIREX.&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007):&amp;lt;br&amp;gt;&lt;br /&gt;
A window into music information retrieval research. ''Acoustical Science and Technology 29''(4): 247-255. &amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen, Andreas F. Ehmann, Mert Bay and M. Cameron Jones. (2010).&amp;lt;br&amp;gt;&lt;br /&gt;
The Music Information Retrieval Evaluation eXchange: Some Observations and Insights.&amp;lt;br&amp;gt;&lt;br /&gt;
''Advances in Music Information Retrieval'' Vol. 274, pp. 93-115&amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://bit.ly/KpM5u5 http://bit.ly/KpM5u5]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Runtime Limits===&lt;br /&gt;
&lt;br /&gt;
We reserve the right to stop any process that exceeds runtime limits for each task.  We will do our best to notify you in enough time to allow revisions, but this may not be possible in some cases. Please respect the published runtime limits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Note to All Participants===&lt;br /&gt;
&lt;br /&gt;
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:&lt;br /&gt;
&lt;br /&gt;
# submit a DRAFT 2-3 page extended abstract PDF in the ISMIR format, describing the submitted program(s), to help us and the community better understand how the algorithm works&lt;br /&gt;
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2019 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)&lt;br /&gt;
# present a poster at the MIREX 2019 poster session at ISMIR 2019&lt;br /&gt;
&lt;br /&gt;
===Software Dependency Requests===&lt;br /&gt;
If you have not submitted to MIREX before or are unsure whether IMIRSEL currently supports some of the software/architecture dependencies for your submission a [https://goo.gl/forms/96Wndw9j9dzv4x3c2 dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you. &lt;br /&gt;
&lt;br /&gt;
Due to the high volume of submissions expected at MIREX 2019, submissions with difficult-to-satisfy dependencies, of which the team has not been given sufficient notice, may be rejected.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.&lt;br /&gt;
&lt;br /&gt;
==Getting Involved in MIREX 2019==&lt;br /&gt;
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2019 the best yet.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Mailing List Participation===&lt;br /&gt;
If you are interested in formal MIR evaluation, you should also subscribe to the &amp;quot;MIREX&amp;quot; (aka &amp;quot;EvalFest&amp;quot;) mail list and participate in the community discussions about defining and running MIREX 2019 tasks. Subscription information at: &lt;br /&gt;
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central]. &lt;br /&gt;
&lt;br /&gt;
If you are participating in MIREX 2019, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2019 wiki) will be used to embody and disseminate task proposals; however, task-related discussions should be conducted on the EvalFest mailing list rather than on this wiki, and summarized here. &lt;br /&gt;
&lt;br /&gt;
Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team, who will embody them in software as part of the NEMA analytics framework. This framework will be released to the community at or before ISMIR 2019, providing a standardised set of interfaces and output for disciplined evaluation procedures across a great many MIR tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Wiki Participation===&lt;br /&gt;
If you find that you cannot edit a MIREX wiki page, you will need to create a new account via: [[Special:Userlogin]].&lt;br /&gt;
&lt;br /&gt;
Please note that because of &amp;quot;spam-bots&amp;quot;, MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).&lt;br /&gt;
&lt;br /&gt;
==MIREX 2005 - 2018 Wikis==&lt;br /&gt;
Content from MIREX 2005 - 2018 is available at:&lt;br /&gt;
'''[[2018:Main_Page|MIREX 2018]]'''&lt;br /&gt;
'''[[2017:Main_Page|MIREX 2017]]''' &lt;br /&gt;
'''[[2016:Main_Page|MIREX 2016]]''' &lt;br /&gt;
'''[[2015:Main_Page|MIREX 2015]]''' &lt;br /&gt;
'''[[2014:Main_Page|MIREX 2014]]''' &lt;br /&gt;
'''[[2013:Main_Page|MIREX 2013]]''' &lt;br /&gt;
'''[[2012:Main_Page|MIREX 2012]]''' &lt;br /&gt;
'''[[2011:Main_Page|MIREX 2011]]''' &lt;br /&gt;
'''[[2010:Main_Page|MIREX 2010]]''' &lt;br /&gt;
'''[[2009:Main_Page|MIREX 2009]]''' &lt;br /&gt;
'''[[2008:Main_Page|MIREX 2008]]''' &lt;br /&gt;
'''[[2007:Main_Page|MIREX 2007]]''' &lt;br /&gt;
'''[[2006:Main_Page|MIREX 2006]]''' &lt;br /&gt;
'''[[2005:Main_Page|MIREX 2005]]'''&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12962</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12962"/>
		<updated>2019-07-09T14:48:29Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), [https://sites.google.com/view/iyr/home Iris YuPing Ren] (yuping.ren.iris all at gmail.com), [https://jamesowers.github.io/ James Owers] (james.f.owers), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins, all at gmail.com). Please copy in all four of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm should generate the next ''N'' musical events, up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
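&lt;br /&gt;
As a rough illustration of working with this format, the following Python sketch reads one such CSV into a list of note tuples (the function and field names are our own, not part of the task specification):&lt;br /&gt;

```python
import csv
from collections import namedtuple

# Columns as described above: ontime (quarter-note beats), MIDI note number,
# morphetic pitch number, duration (quarter-note beats), channel.
Note = namedtuple("Note", ["ontime", "mnn", "mpn", "duration", "channel"])

def load_notes(path):
    """Read one prime or continuation CSV into a list of Note tuples."""
    notes = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue  # skip any blank lines
            notes.append(Note(float(row[0]), int(row[1]),
                              int(row[2]), float(row[3]), int(row[4])))
    return notes
```

For instance, the three example rows above parse to notes with ontimes 20, 20.66667, and 21.&lt;br /&gt;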
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; in case you are concerned about overfitting, however, we can say that it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see bars 17-18 transition in the piano roll). Therefore we (and/or an algorithm) might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we additionally filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats.&lt;br /&gt;
We then &amp;quot;skylined&amp;quot; these channels independently, so that no two events had the same start time (the maximum MNN was chosen in the event of a clash), and double-checked that each still contained at least 20 events. One suitable channel was then selected at random, and the resulting prime appears in the dataset only if its continuation contained at least 10 events. If any of the above criteria could not be satisfied for a given input, we skipped that MIDI file.&lt;br /&gt;
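&lt;br /&gt;
The skylining step and a simplified monophony check can be sketched as follows (an illustration only, not the actual preparation code; in particular, the monophony measure here treats each distinct ontime as a segment, rather than using the full segmentation of Pardo &amp;amp; Birmingham, 2002):&lt;br /&gt;

```python
from collections import defaultdict

def skyline(notes):
    """Keep one event per ontime: the one with the highest MIDI note number.
    `notes` is a list of (ontime, mnn, duration) tuples."""
    by_onset = defaultdict(list)
    for n in notes:
        by_onset[n[0]].append(n)
    # For each onset, resolve clashes by taking the maximum MNN.
    return [max(group, key=lambda n: n[1])
            for _, group in sorted(by_onset.items())]

def monophony_rate(notes):
    """Fraction of onset segments containing no more than one event
    (a naive stand-in for the segment-based measure in the text)."""
    counts = defaultdict(int)
    for n in notes:
        counts[n[0]] += 1
    return sum(1 for c in counts.values() if c == 1) / len(counts)
```

A channel would then pass the monophony filter when `monophony_rate` returns at least 0.8.&lt;br /&gt;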
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
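&lt;br /&gt;
For illustration, an order-1 Markov foil generator over a single sequence of MIDI note numbers might look as follows (a toy sketch only; the actual state space and transition matrix follow Collins and Laney, 2017, and cover rhythm and texture as well as pitch):&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_order1(sequence):
    """Collect successor lists for a first-order Markov model over a
    symbol sequence (here, MIDI note numbers from one channel)."""
    trans = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        trans[a].append(b)
    return trans

def generate_foil(trans, start, length, rng=random):
    """Random walk of up to `length` symbols from `start`.
    Sampling from the successor lists reproduces the transition
    probabilities implied by the counts."""
    out, state = [], start
    for _ in range(length):
        choices = trans.get(state)
        if not choices:
            break  # dead end: state was never observed with a successor
        state = rng.choice(choices)
        out.append(state)
    return out
```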
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
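&lt;br /&gt;
A minimal sketch of writing the subtask (2) output file described above (the specification does not state whether a header row is expected, so treat its inclusion here as an assumption):&lt;br /&gt;

```python
import csv

def write_subtask2_output(path, ratings):
    """Write the subtask (2) results file: one row per prime-continuation
    pair, with likelihoods in [0, 1] for the continuations in folders A
    and B. `ratings` maps a file id to a (likelihood_A, likelihood_B) pair."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "A", "B"])  # header row: our assumption
        for file_id in sorted(ratings):
            a, b = ratings[file_id]
            writer.writerow([file_id, a, b])
```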
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2019:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019 GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
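&lt;br /&gt;
For subtask (2), the headline score is then straightforward to compute; a sketch (how ties between the two likelihoods are resolved is not specified above, so here they simply count as incorrect):&lt;br /&gt;

```python
def subtask2_score(rows):
    """Fraction of pairs for which the algorithm rated the genuine
    continuation as more likely. Each row is a
    (likelihood_genuine, likelihood_foil) pair."""
    correct = sum(1 for genuine, foil in rows if genuine > foil)
    return correct / len(rows)
```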
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is pitch, represented by MNN. &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, and &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1), (x_2, y_2),..., (x_n, y_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Subtask 1===&lt;br /&gt;
We represent each note in the true and algorithmic continuation as a point in a two-dimensional space of onset and pitch, giving the point-set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; for the true continuation, and &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; for the algorithmic continuation. We calculate differences between all points &amp;lt;math&amp;gt;p_i&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;q_j&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, which represent the translation vectors &amp;lt;math&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; to transform a given algorithmically generated note into a note from the true continuation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{cp}(\mathbf{P},\mathbf{Q}) =  \max_\mathbf{T} |\{q_j | q_j \in \mathbf{Q} \wedge q_j + \mathbf{T} \in \mathbf{P}\}|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define recall as the number of correctly predicted notes, divided by the cardinality of the true continuation point set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;. Since any point in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; can be translated by some vector onto a point in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;, the score &amp;lt;math&amp;gt;\text{cp}&amp;lt;/math&amp;gt; is always at least &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;; we therefore subtract &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; from numerator and denominator to scale the measure to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Rec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{P}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Precision is the number of correctly predicted notes, divided by the cardinality of the point set of the algorithmic continuation &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, scaled in the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Prec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{Q}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
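As a minimal sketch (not the official scoring code, which is in the GitHub repository linked above), the cardinality score and the scaled recall/precision can be computed as follows, assuming notes are given as (ontime, MNN) tuples:

```python
from collections import Counter

def cardinality_score(P, Q):
    # For every pair (p, q), compute the translation vector p - q; the
    # score is the largest number of points of Q that a single vector
    # maps onto points of P.
    diffs = Counter(
        (px - qx, py - qy) for (px, py) in P for (qx, qy) in Q
    )
    return max(diffs.values()) if diffs else 0

def recall_precision(P, Q):
    # Subtract 1 from numerator and denominator, as in the definitions
    # of Rec and Prec above, to scale the score to [0, 1].
    cp = cardinality_score(P, Q)
    return (cp - 1) / (len(P) - 1), (cp - 1) / (len(Q) - 1)
```

For example, a predicted continuation that is an exact transposition and/or time-shift of the true one obtains recall and precision 1, matching the translation-invariant definition of cp above.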
&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
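The beat-location arithmetic in the answer above can be sketched as a trivial helper (the function name and duration handling are illustrative assumptions):

```python
def beat_times(bpm, duration_sec):
    # Main beats fall at 0, 60/u, 2*60/u, ... seconds, where u is the
    # BPM value taken from the descriptor file.
    period = 60.0 / bpm
    n = int(duration_sec / period) + 1
    return [k * period for k in range(n)]
```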
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task is always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12961</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12961"/>
		<updated>2019-07-09T14:47:59Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), [https://sites.google.com/view/iyr/home Iris Yuping Ren] (yuping.ren.iris), [https://jamesowers.github.io/ James Owers] (james.f.owers), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), all at gmail.com. Please copy in all four of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm should generate the next ''N'' musical events, covering the 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
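For reference, here is a minimal sketch of loading one of these symbolic CSV files into Python; the function name and dictionary keys are illustrative, not part of the dataset specification.

```python
import csv

def read_prime(path):
    # Columns, per the description above: ontime (quarter-note beats),
    # MIDI note number, estimated morphetic pitch number, duration
    # (quarter-note beats), and channel.
    notes = []
    with open(path, newline="") as f:
        for ontime, mnn, mpn, dur, channel in csv.reader(f):
            notes.append({
                "ontime": float(ontime),
                "mnn": int(mnn),
                "mpn": int(mpn),
                "duration": float(dur),
                "channel": int(channel),
            })
    return notes
```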
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see bars 17-18 transition in the piano roll). Therefore, we (and/or an algorithm) might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
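The skylining step described above can be sketched as follows. This is an illustration only: note tuples are assumed here to be (ontime, MNN, duration), and the actual preparation code may differ.

```python
def skyline(notes):
    # Keep at most one note per ontime; when two or more notes share a
    # start time, retain the one with the maximum MIDI note number.
    best = {}
    for note in notes:
        ontime, mnn = note[0], note[1]
        if ontime not in best or mnn > best[ontime][1]:
            best[ontime] = note
    return [best[t] for t in sorted(best)]
```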
&lt;br /&gt;
For the polyphonic data, we applied the minimum note-count criteria of 20 events in the prime and 10 in the continuation, as well as the maximum prime inter-ontime interval of 8 quarter-note beats, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
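A first-order generation process of this general kind can be sketched as below. States here are bare MIDI note numbers for brevity; the actual state space and transition matrix follow Collins and Laney (2017), and the function names are illustrative.

```python
import random
from collections import defaultdict

def train_order1(sequences):
    # Record, for each state, the list of states observed to follow it
    # (duplicates preserved, so sampling respects transition frequencies).
    table = defaultdict(list)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            table[a].append(b)
    return table

def generate(table, start, n, rng=random):
    # Random walk of up to n states, stopping early at a dead end.
    state, out = start, [start]
    for _ in range(n - 1):
        successors = table.get(state)
        if not successors:
            break
        state = rng.choice(successors)
        out.append(state)
    return out
```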
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019 GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is pitch, represented by MNN. &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, and &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1), (x_2, y_2),..., (x_n, y_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Subtask 1===&lt;br /&gt;
We represent each note in the true and algorithmic continuation as a point in a two-dimensional space of onset and pitch, giving the point-set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; for the true continuation, and &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; for the algorithmic continuation. We calculate differences between all points &amp;lt;math&amp;gt;p_i&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;q_j&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, which represent the translation vectors &amp;lt;math&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; to transform a given algorithmically generated note into a note from the true continuation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{cp}(\mathbf{P},\mathbf{Q}) = \max_{t \in \mathbf{T}} |\{q_j \mid q_j \in \mathbf{Q} \wedge q_j + t \in \mathbf{P}\}|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define recall as the number of correctly predicted notes, divided by the cardinality of the true continuation point set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;. Since any single point in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; can always be translated onto some point in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\text{cp}&amp;lt;/math&amp;gt; is at least &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;; we therefore subtract &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; from numerator and denominator to scale the measure to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Rec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{P}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Precision is the number of correctly predicted notes, divided by the cardinality of the point set of the algorithmic continuation &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, scaled in the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Prec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{Q}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
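As a concrete illustration, the &amp;lt;math&amp;gt;\text{cp}&amp;lt;/math&amp;gt; function and the two scaled measures can be sketched in Python as follows. This is an illustrative sketch, not the official evaluation code, and it assumes the point sets contain no duplicate points:&lt;br /&gt;

```python
from collections import Counter

def cp(P, Q):
    """Largest number of points of Q carried onto points of P by a
    single translation vector (assumes no duplicate points in P)."""
    # Count every difference vector p - q over all point pairs; the
    # most frequent vector is the best single translation.
    vectors = Counter((px - qx, py - qy) for (px, py) in P for (qx, qy) in Q)
    return max(vectors.values()) if vectors else 0

def rec_prec(P, Q):
    """Scaled recall and precision, as defined above. Each point set
    must contain at least two points for the scaling to be defined."""
    c = cp(P, Q)
    return (c - 1) / (len(P) - 1), (c - 1) / (len(Q) - 1)
```

For example, if &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; reproduces &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; exactly (up to a single translation), both measures equal 1.&lt;br /&gt;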
&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
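A minimal sketch of that beat-location computation (the function name is ours):&lt;br /&gt;

```python
def beat_times(u, n_beats):
    """Times (in seconds) of the first n_beats main beats of a WAV file
    rendered at u beats per minute: 0, 60/u, 2 * 60/u, ..."""
    return [i * 60.0 / u for i in range(n_beats)]
```

For example, at 120 BPM the first four beats fall at 0, 0.5, 1, and 1.5 sec.&lt;br /&gt;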
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task is always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12960</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12960"/>
		<updated>2019-07-09T14:45:08Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Description */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), Iris Yuping Ren (yuping.ren.iris all at gmail.com), James Owers (james.f.owers), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins, all at gmail.com). Please copy in all four of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, covering up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
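A minimal sketch of reading such a file in Python (the function name and dictionary keys are our own; the column order follows the description above):&lt;br /&gt;

```python
import csv

def read_events(path):
    """Read a PPDD symbolic CSV file into a list of note-event dicts.
    Columns: ontime (quarter-note beats), MIDI note number, estimated
    morphetic pitch number, duration (quarter-note beats), channel."""
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            ontime, mnn, mpn, dur, channel = row
            events.append({
                "ontime": float(ontime),
                "mnn": int(mnn),
                "mpn": int(mpn),
                "dur": float(dur),
                "channel": int(channel),
            })
    return events
```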
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but, in case you are concerned about overfitting, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
&lt;br /&gt;
For the polyphonic data, we applied the same minimum note-count criteria (at least 20 events in the prime and 10 in the continuation), as well as the maximum inter-ontime interval of 8 quarter-note beats in the prime, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
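For illustration only, the following sketch generates a continuation with an order-1 Markov random walk. The state space used here (pairs of MNN and inter-ontime interval, estimated from a single monophonic sequence) is an assumption made for the sketch, not the state space of Collins and Laney (2017):&lt;br /&gt;

```python
import random
from collections import defaultdict

def order1_foil(events, n_out, seed=0):
    """Generate a foil continuation by an order-1 Markov random walk.
    events: list of (ontime, mnn) pairs for a monophonic prime, in
    temporal order. A state is a (mnn, inter-ontime interval) pair --
    an illustrative assumption, not the published state space."""
    rng = random.Random(seed)
    states = [
        (events[i][1], events[i][0] - events[i - 1][0])
        for i in range(1, len(events))
    ]
    # Transition table: state -> list of observed successor states.
    trans = defaultdict(list)
    for a, b in zip(states, states[1:]):
        trans[a].append(b)
    out, t, state = [], events[-1][0], states[-1]
    for _ in range(n_out):
        # Fall back to any observed state if the walk hits a dead end.
        state = rng.choice(trans.get(state) or states)
        t = t + state[1]
        out.append((t, state[0]))
    return out
```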
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
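To illustrate the subtask (2) file layout described above, a minimal writer sketch (the header row and function name are our own assumptions; only the column order is specified):&lt;br /&gt;

```python
import csv

def write_subtask2_output(path, ratings):
    """Write one results row per prime-continuation pair.
    ratings: list of (file_id, likelihood_A, likelihood_B) tuples,
    each likelihood in [0, 1]."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "A", "B"])  # header row: our assumption
        for file_id, like_a, like_b in ratings:
            writer.writerow([file_id, like_a, like_b])
```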
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/tree/mirex2019 GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is the ontime (start time, measured in quarter-note beats, starting with 0 for bar 1 beat 1) and &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is the pitch, represented as a MIDI note number (MNN).&lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuation as a sequence of note events: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, and &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs. The true continuation is denoted &amp;lt;math&amp;gt;(x_1, y_1), (x_2, y_2),..., (x_n, y_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Subtask 1===&lt;br /&gt;
We represent each note in the true and algorithmic continuation as a point in a two-dimensional space of onset and pitch, giving the point-set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; for the true continuation, and &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; for the algorithmic continuation. We calculate differences between all points &amp;lt;math&amp;gt;p_i&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;q_j&amp;lt;/math&amp;gt; in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, which represent the translation vectors &amp;lt;math&amp;gt;\mathbf{T}&amp;lt;/math&amp;gt; to transform a given algorithmically generated note into a note from the true continuation:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
\text{cp}(\mathbf{P},\mathbf{Q}) = \max_{t \in \mathbf{T}} |\{q_j \mid q_j \in \mathbf{Q} \wedge q_j + t \in \mathbf{P}\}|&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
We define recall as the number of correctly predicted notes, divided by the cardinality of the true continuation point set &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;. Since any single point in &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt; can always be translated onto some point in &amp;lt;math&amp;gt;\mathbf{P}&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\text{cp}&amp;lt;/math&amp;gt; is at least &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt;; we therefore subtract &amp;lt;math&amp;gt;1&amp;lt;/math&amp;gt; from numerator and denominator to scale the measure to &amp;lt;math&amp;gt;[0,1]&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Rec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{P}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Precision is the number of correctly predicted notes, divided by the cardinality of the point set of the algorithmic continuation &amp;lt;math&amp;gt;\mathbf{Q}&amp;lt;/math&amp;gt;, scaled in the same way:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;&lt;br /&gt;
    \text{Prec} = (\text{cp}(\mathbf{P},\mathbf{Q}) - 1) / (|\mathbf{Q}| - 1)&lt;br /&gt;
&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but where everything had been squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
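The beat-location calculation mentioned above could be sketched like this (a minimal illustration; the function name is ours, not part of any MIREX tooling):

```python
# Given the descriptor file's BPM value u, the main beats in the WAV
# fall at 0, 60/u, 2*60/u, ... seconds.
def beat_times(u, n_beats):
    """Times (in sec) of the first n_beats main beats at u BPM."""
    return [i * 60.0 / u for i in range(n_beats)]

beat_times(120, 4)  # beats at 0.0, 0.5, 1.0, 1.5 sec
```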
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Main_Page&amp;diff=12959</id>
		<title>2019:Main Page</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Main_Page&amp;diff=12959"/>
		<updated>2019-07-09T14:43:01Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* MIREX 2019 Deadline Dates */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Welcome to MIREX 2019==&lt;br /&gt;
&lt;br /&gt;
This is the main page for the 15th running of the Music Information Retrieval Evaluation eXchange (MIREX 2019). The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) at [https://ischool.illinois.edu School of Information Sciences], University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2019. &lt;br /&gt;
&lt;br /&gt;
The MIREX 2019 community will hold its annual meeting as part of [http://ismir2019.ewi.tudelft.nl The 20th International Society for Music Information Retrieval Conference], ISMIR 2019, which will be held in Delft, The Netherlands, November 4-8, 2019.&lt;br /&gt;
&lt;br /&gt;
J. Stephen Downie&amp;lt;br&amp;gt;&lt;br /&gt;
Director, IMIRSEL&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Task Leadership Model==&lt;br /&gt;
&lt;br /&gt;
As in previous years, we aim to improve how tasks are distributed for MIREX 2019. To do so, we need leaders to help us organize and run each task.&lt;br /&gt;
&lt;br /&gt;
To volunteer to lead a task, please complete the form [TBD]. Current information about task captains can be found on the [[2019:Task Captains]] page. Please direct any communication to the [https://lists.ischool.illinois.edu/lists/admin/evalfest EvalFest] mailing list.&lt;br /&gt;
&lt;br /&gt;
What does it mean to lead a task?&lt;br /&gt;
* Updating wiki pages as needed&lt;br /&gt;
* Communicating with submitters and troubleshooting submissions&lt;br /&gt;
* Executing and evaluating submissions&lt;br /&gt;
* Publishing final results&lt;br /&gt;
&lt;br /&gt;
Due to the proprietary nature of much of the data, the submission system, evaluation framework, and most of the datasets will continue to be hosted by IMIRSEL. However, we are prepared to provide access to task organizers to manage and run submissions on the IMIRSEL systems.&lt;br /&gt;
&lt;br /&gt;
We really need leaders to help us this year!&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Deadline Dates==&lt;br /&gt;
* '''September 2nd 2019'''&lt;br /&gt;
** [[2019:Audio Fingerprinting]] &amp;lt;TC: Chung-Che Wang&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''September 9th 2019'''&lt;br /&gt;
** [[2019:Audio Classification (Train/Test) Tasks]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;, including&lt;br /&gt;
*** Audio US Pop Genre Classification&lt;br /&gt;
*** Audio Latin Genre Classification&lt;br /&gt;
*** Audio Music Mood Classification&lt;br /&gt;
*** Audio Classical Composer Identification&lt;br /&gt;
** [[2019:Audio K-POP Mood Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2019:Audio K-POP Genre Classification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''September 23rd 2019'''&lt;br /&gt;
** [[2019:Audio Cover Song Identification]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
** [[2019:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]] &amp;lt;TC: Yun Hao (IMIRSEL)&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* '''TBD by July 12th 2019'''&lt;br /&gt;
** [[2019:Audio Beat Tracking]] &amp;lt;TC: Aggelos Gkiokas&amp;gt;&lt;br /&gt;
** [[2019:Audio Chord Estimation]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
** [[2019:Audio Downbeat Estimation]] &amp;lt;TC: Mickaël Zehren&amp;gt;&lt;br /&gt;
** [[2019:Audio Key Detection]] &amp;lt;TC: Johan Pauwels&amp;gt;&lt;br /&gt;
** [[2019:Audio Onset Detection]] &amp;lt;TC: Sebastian Böck&amp;gt;&lt;br /&gt;
** [[2019:Audio Tempo Estimation]] &amp;lt;TC: Aggelos Gkiokas, Hendrik Schreiber&amp;gt;&lt;br /&gt;
** [[2019:Automatic Lyrics-to-Audio Alignment]] &amp;lt;TC: Georgi Dzhambazov, Daniel Stoller&amp;gt;&lt;br /&gt;
** [[2019:Patterns for Prediction]] (offshoot of [[2017:Discovery of Repeated Themes &amp;amp; Sections]]) &amp;lt;TC: Iris Ren, Berit Janssen, James Owers, and Tom Collins&amp;gt;&lt;br /&gt;
** [[2019:Set List Identification]] &amp;lt;TC: Ming-Chi Yen&amp;gt;&lt;br /&gt;
** [[2019:Music Detection]] &amp;lt;TC: Blai Meléndez-Catalán&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Possible Evaluation Tasks==&lt;br /&gt;
* [[2019:Audio Classification (Train/Test) Tasks]], incorporating:&lt;br /&gt;
** Audio US Pop Genre Classification&lt;br /&gt;
** Audio Latin Genre Classification&lt;br /&gt;
** Audio Music Mood Classification&lt;br /&gt;
** Audio Classical Composer Identification&lt;br /&gt;
** [[2019:Audio K-POP Mood Classification]]&lt;br /&gt;
** [[2019:Audio K-POP Genre Classification]]&lt;br /&gt;
* [[2019:Audio Beat Tracking]]&lt;br /&gt;
* [[2019:Audio Chord Estimation]]&lt;br /&gt;
* [[2019:Audio Cover Song Identification]]&lt;br /&gt;
* [[2019:Audio Downbeat Estimation]]&lt;br /&gt;
* [[2019:Audio Key Detection]]&lt;br /&gt;
* [[2019:Audio Onset Detection]]&lt;br /&gt;
* [[2019:Audio Tempo Estimation]]&lt;br /&gt;
* [[2019:Patterns for Prediction]] (offshoot of Discovery of Repeated Themes &amp;amp; Sections from previous years)&lt;br /&gt;
* [[2019:Automatic Lyrics-to-Audio Alignment]]&lt;br /&gt;
* [[2019:Drum Transcription]]&lt;br /&gt;
* [[2019:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]]&lt;br /&gt;
* [[2019:Real-time Audio to Score Alignment (a.k.a Score Following)]]&lt;br /&gt;
* [[2019:Structural Segmentation]]&lt;br /&gt;
* [[2019:Discovery of Repeated Themes &amp;amp; Sections]]&lt;br /&gt;
* [[2019:Audio Fingerprinting]]&lt;br /&gt;
* [[2019:Set List Identification]]&lt;br /&gt;
* [[2019:Query by Singing/Humming]]&lt;br /&gt;
* [[2019:Singing Voice Separation]] &lt;br /&gt;
* [[2019:Audio Tag Classification]]  &lt;br /&gt;
* [[2019:Audio Music Similarity and Retrieval]] &lt;br /&gt;
* [[2019:Symbolic Melodic Similarity]] &lt;br /&gt;
* [[2019:Audio Melody Extraction]] &lt;br /&gt;
* [[2019:Query by Tapping]]&lt;br /&gt;
* [[2019:Music Detection]]&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Submission Instructions==&lt;br /&gt;
* Be sure to read through the rest of this page&lt;br /&gt;
* Be sure to read through the task pages for which you are submitting&lt;br /&gt;
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]&lt;br /&gt;
* Be sure to follow the  [[MIREX 2019 Submission Instructions]] including both the tutorial video and the text&lt;br /&gt;
* The MIREX 2019 Submission System is coming soon at https://www.music-ir.org/mirex/sub/.&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Evaluation==&lt;br /&gt;
&lt;br /&gt;
===Note to New Participants===&lt;br /&gt;
Please take the time to read the following review articles that explain the history and structure of MIREX.&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007):&amp;lt;br&amp;gt;&lt;br /&gt;
A window into music information retrieval research. ''Acoustical Science and Technology 29'' (4): 247-255. &amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen, Andreas F. Ehmann, Mert Bay and M. Cameron Jones. (2010).&amp;lt;br&amp;gt;&lt;br /&gt;
The Music Information Retrieval Evaluation eXchange: Some Observations and Insights.&amp;lt;br&amp;gt;&lt;br /&gt;
''Advances in Music Information Retrieval'' Vol. 274, pp. 93-115&amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://bit.ly/KpM5u5 http://bit.ly/KpM5u5]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Runtime Limits===&lt;br /&gt;
&lt;br /&gt;
We reserve the right to stop any process that exceeds runtime limits for each task.  We will do our best to notify you in enough time to allow revisions, but this may not be possible in some cases. Please respect the published runtime limits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Note to All Participants===&lt;br /&gt;
&lt;br /&gt;
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:&lt;br /&gt;
&lt;br /&gt;
# submit a DRAFT 2-3 page extended abstract PDF in the ISMIR format when submitting their program(s), to help us and the community better understand how the algorithm works.&lt;br /&gt;
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2019 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)&lt;br /&gt;
# present a poster at the MIREX 2019 poster session at ISMIR 2019&lt;br /&gt;
&lt;br /&gt;
===Software Dependency Requests===&lt;br /&gt;
If you have not submitted to MIREX before or are unsure whether IMIRSEL currently supports some of the software/architecture dependencies for your submission a [https://goo.gl/forms/96Wndw9j9dzv4x3c2 dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you. &lt;br /&gt;
&lt;br /&gt;
Due to the high volume of submissions expected at MIREX 2019, submissions with difficult-to-satisfy dependencies of which the team has not been given sufficient notice may be rejected.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.&lt;br /&gt;
&lt;br /&gt;
==Getting Involved in MIREX 2019==&lt;br /&gt;
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2019 the best yet.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Mailing List Participation===&lt;br /&gt;
If you are interested in formal MIR evaluation, you should also subscribe to the &amp;quot;MIREX&amp;quot; (aka &amp;quot;EvalFest&amp;quot;) mail list and participate in the community discussions about defining and running MIREX 2019 tasks. Subscription information at: &lt;br /&gt;
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central]. &lt;br /&gt;
&lt;br /&gt;
If you are participating in MIREX 2019, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2019 wiki) will be used to embody and disseminate task proposals; however, task-related discussions should be conducted on the EvalFest mailing list rather than on this wiki, and summarized here. &lt;br /&gt;
&lt;br /&gt;
Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team, who will embody them in software as part of the NEMA analytics framework. This framework will be released to the community at or before ISMIR 2019, providing a standardised set of interfaces and outputs for disciplined evaluation procedures across a great many MIR tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Wiki Participation===&lt;br /&gt;
If you find that you cannot edit a MIREX wiki page, you will need to create a new account via: [[Special:Userlogin]].&lt;br /&gt;
&lt;br /&gt;
Please note that because of &amp;quot;spam-bots&amp;quot;, MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).&lt;br /&gt;
&lt;br /&gt;
==MIREX 2005 - 2018 Wikis==&lt;br /&gt;
Content from MIREX 2005 - 2018 is available at:&lt;br /&gt;
'''[[2018:Main_Page|MIREX 2018]]'''&lt;br /&gt;
'''[[2017:Main_Page|MIREX 2017]]''' &lt;br /&gt;
'''[[2016:Main_Page|MIREX 2016]]''' &lt;br /&gt;
'''[[2015:Main_Page|MIREX 2015]]''' &lt;br /&gt;
'''[[2014:Main_Page|MIREX 2014]]''' &lt;br /&gt;
'''[[2013:Main_Page|MIREX 2013]]''' &lt;br /&gt;
'''[[2012:Main_Page|MIREX 2012]]''' &lt;br /&gt;
'''[[2011:Main_Page|MIREX 2011]]''' &lt;br /&gt;
'''[[2010:Main_Page|MIREX 2010]]''' &lt;br /&gt;
'''[[2009:Main_Page|MIREX 2009]]''' &lt;br /&gt;
'''[[2008:Main_Page|MIREX 2008]]''' &lt;br /&gt;
'''[[2007:Main_Page|MIREX 2007]]''' &lt;br /&gt;
'''[[2006:Main_Page|MIREX 2006]]''' &lt;br /&gt;
'''[[2005:Main_Page|MIREX 2005]]'''&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12933</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12933"/>
		<updated>2019-06-14T18:45:34Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Some examples */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen),  [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), and Iris Yuping Ren (yuping.ren.iris all at gmail.com). Please copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
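The five-column format described above could be parsed as follows. This is a hedged sketch only: the function name, field names, and file path are ours, not part of the dataset's tooling.

```python
# Illustrative parser for one prime CSV in the format
# (ontime, MIDI note number, morphetic pitch number, duration, channel).
import csv

def load_prime(path):
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            ontime, mnn, mpn, dur, chan = row
            events.append({
                "ontime": float(ontime),   # quarter-note beats
                "mnn": int(mnn),           # MIDI note number
                "mpn": int(mpn),           # estimated morphetic pitch
                "dur": float(dur),         # duration in beats
                "channel": int(chan),
            })
    return events
```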
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; if you are concerned about overfitting, however, note that it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has occurred before (see the bars 17-18 transition in the piano roll). Therefore, we, or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
(These examples are from an earlier version of the dataset, PPDD-Jul2018, but the above observations apply also to the current version of the dataset.)&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets. For both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above criteria could not be satisfied for a given input, we skipped that MIDI file.&lt;br /&gt;
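The skylining step described above can be sketched as follows. This is a minimal illustration, not the organisers' implementation; the (ontime, MNN, duration) tuple format is an assumption based on the CSV layout described under Data.&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    # events: list of (ontime, MNN, duration) tuples for one channel.
    # For each distinct ontime, keep only the event with the maximum
    # MIDI note number, so that no two events share a start time.
    by_ontime = defaultdict(list)
    for event in events:
        by_ontime[event[0]].append(event)
    kept = [max(group, key=lambda e: e[1]) for group in by_ontime.values()]
    return sorted(kept)
```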
&lt;br /&gt;
For the polyphonic data, we applied the minimum event counts of 20 in the prime and 10 in the continuation, as well as the maximum inter-ontime interval of 8 quarter-note beats in the prime, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
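The Farey-based quantisation mentioned above can be illustrated as follows. This sketch builds the full Farey sequence of order 6 and snaps the fractional part of each ontime to it; the actual preparation used only a subset of the sequence (Collins, Krebs, et al., 2014), so treat this as indicative rather than definitive.&lt;br /&gt;

```python
from fractions import Fraction

def farey(order):
    # All reduced fractions in [0, 1] with denominator at most the given order.
    grid = {Fraction(num, den)
            for den in range(1, order + 1)
            for num in range(den + 1)}
    return sorted(grid)

def quantise(ontime, grid):
    # Snap the fractional part of an ontime (in quarter-note beats)
    # to the nearest grid point.
    whole = int(ontime)
    frac = ontime - whole
    best = min(grid, key=lambda g: abs(g - frac))
    return whole + float(best)
```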
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
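To illustrate the idea only (this is not the actual state space or transition matrix, for which see Collins and Laney, 2017), an order-1 Markov generator over a sequence of states might look like:&lt;br /&gt;

```python
import random

def train_markov(sequence):
    # Order-1 transition table: each state maps to the list of
    # successors observed immediately after it.
    table = {}
    for a, b in zip(sequence, sequence[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, seed=0):
    # Random walk over the transition table from a starting state;
    # sampling a list with repeats reproduces the empirical probabilities.
    rng = random.Random(seed)
    state = start
    out = [state]
    for _ in range(length):
        successors = table.get(state)
        if not successors:
            break
        state = rng.choice(successors)
        out.append(state)
    return out
```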
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
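As an illustration of the subtask (2) output format, the following sketch writes such a file. The function name and input structure are our own, not part of the specification.&lt;br /&gt;

```python
import csv

def write_subtask2_csv(path, judgements):
    # judgements: list of (file_id, likelihood_A, likelihood_B) triples,
    # each likelihood in [0, 1], one row per prime-continuation pair.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'A', 'B'])
        for row in judgements:
            writer.writerow(row)
```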
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct. Pitch takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI}_4 = 2\,\mathrm{Prec}(P, Q)\,\mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
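The precision, recall, and F1 computation can be sketched as follows. This version omits the &amp;quot;up to translation&amp;quot; refinement, and the evaluation script linked under Evaluation Procedure is the official implementation; the same function serves Pitch_4/Pitch_10 (MNN sets) and Combo_4/Combo_10 ((ontime, MNN) pair sets) by passing different sets.&lt;br /&gt;

```python
def f1_score(true_set, pred_set):
    # F1 over sets of true vs predicted values: ontimes for IOI_4,
    # MNNs for Pitch_4, (ontime, MNN) pairs for Combo_4.
    if not true_set or not pred_set:
        return 0.0
    hits = len(true_set.intersection(pred_set))
    if hits == 0:
        return 0.0
    prec = hits / len(pred_set)
    rec = hits / len(true_set)
    return 2 * prec * rec / (prec + rec)
```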
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and on events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
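To make the beat-location arithmetic above concrete (the function name is ours; it assumes the bpm value from the descriptor file and a first beat at time 0):&lt;br /&gt;

```python
def beat_times(bpm, n_beats):
    # Main beat locations (in seconds) in the rendered WAV:
    # 0, 60/bpm, 2 * 60/bpm, and so on.
    period = 60.0 / bpm
    return [i * period for i in range(n_beats)]
```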
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12932</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12932"/>
		<updated>2019-06-14T18:42:06Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen),  [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), and Iris Yuping Ren (yuping.ren.iris all at gmail.com). Please copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, for a given texture and size, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. Variants of different textures or sizes contain different pieces, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
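A minimal parser for this CSV layout (the function name is ours; the column order is as described above):&lt;br /&gt;

```python
def parse_prime_csv(text):
    # Each line of the symbolic format is:
    # ontime, MIDI note number, morphetic pitch number, duration, channel.
    events = []
    for line in text.strip().splitlines():
        on, mnn, mpn, dur, ch = line.split(',')
        events.append((float(on), int(mnn), int(mpn), float(dur), int(ch)))
    return events
```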
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but, for those concerned about overfitting, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-Sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed via &amp;quot;brew install p7zip&amp;quot; on Mac.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bar 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
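The &amp;quot;skylining&amp;quot; step above can be sketched as follows (a minimal illustration, assuming events are reduced to (ontime, MNN) pairs; the actual pipeline operates on fuller event tuples):&lt;br /&gt;

```python
from itertools import groupby

def skyline(events):
    # events: list of (ontime, mnn) pairs for one channel.
    # For each distinct ontime, keep only the event with the maximum MNN,
    # so that no two surviving events share a start time.
    ordered = sorted(events)
    return [max(group, key=lambda e: e[1])
            for _, group in groupby(ordered, key=lambda e: e[0])]
```

For example, skyline applied to a channel containing two events at the same ontime retains only the higher of the two MNNs.&lt;br /&gt;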
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
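The foil-generation idea can be sketched as a first-order Markov walk (the real state space and transition matrix follow Collins and Laney (2017); treating bare MNNs as states, as below, is a simplifying assumption for illustration):&lt;br /&gt;

```python
import random

def train_transitions(sequence):
    # Count order-1 transitions between successive states (here, bare MNNs).
    table = {}
    for a, b in zip(sequence, sequence[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate_foil(table, start, length, rng=random):
    # Walk the transition table to produce a foil of the requested length,
    # falling back to a random seen state if a state has no continuation.
    out = [start]
    states = list(table)
    for _ in range(length - 1):
        nxt = table.get(out[-1])
        out.append(rng.choice(nxt) if nxt else rng.choice(states))
    return out
```

By construction, such a foil respects local transition statistics but, as noted above, has no repetitive or phrasal structure.&lt;br /&gt;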
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should indicate which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
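The two output formats can be illustrated with the following sketch (function names and file paths here are hypothetical; only the column layout is taken from the description above):&lt;br /&gt;

```python
import csv

def write_continuation(path, events):
    # Subtask (1): one CSV per prime, with ontime and MNN in the first
    # two columns, respectively (further columns are permitted).
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        for ontime, mnn in events:
            writer.writerow([ontime, mnn])

def write_likelihoods(path, rows):
    # Subtask (2): one CSV for an entire dataset; each row holds the file
    # name of a prime-continuation pair and likelihood values in [0, 1]
    # for the continuations in folders A and B.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', 'A', 'B'])
        for file_id, like_a, like_b in rows:
            writer.writerow([file_id, like_a, like_b])
```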
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction for the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) of the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct. It takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score, the harmonic mean of precision and recall: IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
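The precision/recall/F1 machinery is shared by IOI_4/IOI_10, Pitch_4/Pitch_10, and Combo_4/Combo_10; only the sets change. A minimal sketch, ignoring the possible &amp;quot;up to translation&amp;quot; matching mentioned above:&lt;br /&gt;

```python
def f1_score(p, q):
    # p: set of true values; q: set of predicted values.
    # Prec = |p intersect q| / |q|, Rec = |p intersect q| / |p|;
    # F1 is the harmonic mean of the two.
    if not p or not q:
        return 0.0
    hits = len(p.intersection(q))
    if hits == 0:
        return 0.0
    prec = hits / len(q)
    rec = hits / len(p)
    return 2 * prec * rec / (prec + rec)

def within_window(ontimes, origin, window):
    # Keep values strictly after the prime's final ontime (origin)
    # and at most window quarter-note beats later.
    limit = origin + window
    return {x for x in ontimes if x > origin and limit >= x}
```

IOI_4, for example, would then be f1_score(within_window(true_ontimes, x_0, 4), within_window(predicted_ontimes, x_0, 4)).&lt;br /&gt;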
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this also, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
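The beat-location calculation in this answer can be written out as a short sketch:&lt;br /&gt;

```python
def beat_times(bpm, n_beats):
    # Main beats in the WAV file fall at 0, 60/u, 2*60/u, ... seconds,
    # where u is the bpm value from the descriptor file.
    return [i * 60.0 / bpm for i in range(n_beats)]
```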
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). Deepbach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12931</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12931"/>
		<updated>2019-06-14T18:38:42Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: &lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen), [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), and Iris Yuping Ren (yuping.ren.iris; all at gmail.com). Please copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Sep2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
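A minimal reader for this five-column CSV format might look as follows (the field names are illustrative; the column order is taken from the example above):&lt;br /&gt;

```python
import csv
from collections import namedtuple

# ontime and duration are in quarter-note beats; mnn is a MIDI note
# number and mpn an estimated morphetic pitch number.
Event = namedtuple('Event', ['ontime', 'mnn', 'mpn', 'duration', 'channel'])

def read_prime(path):
    with open(path, newline='') as f:
        return [Event(float(t), int(p), int(m), float(d), int(c))
                for t, p, m, d, c in csv.reader(f)]
```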
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Sep2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-sep2018/PPDD-sep2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bar 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
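To make the skylining step concrete, here is a minimal sketch (our own illustration, not the organisers' preparation code): events sharing an ontime are reduced to the one with the maximum MNN.&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    # Each event is a tuple (ontime, mnn, duration).
    # Group events by ontime, then keep only the event with the
    # maximum MIDI note number (MNN) in the event of a clash.
    by_ontime = defaultdict(list)
    for ev in events:
        by_ontime[ev[0]].append(ev)
    result = [max(evs, key=lambda e: e[1]) for evs in by_ontime.values()]
    return sorted(result)

# Two events share ontime 0; the higher MNN (67) survives.
print(skyline([(0, 60, 1.0), (0, 67, 1.0), (1, 64, 0.5)]))
```

After skylining, no two events share a start time, so each channel is strictly monophonic.&lt;br /&gt;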
&lt;br /&gt;
For the polyphonic data, we applied the same minimum event criteria (at least 20 events in the prime and at least 10 in the continuation), as well as the maximum inter-ontime interval of 8 quarter-note beats in the prime, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
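Though the actual state space and transition matrix differ (see Collins and Laney, 2017), the general shape of an order-1 Markov generator can be sketched as follows; the MNN state sequence here is hypothetical:&lt;br /&gt;

```python
import random

def markov_continuation(states, length, seed=0):
    # Build an order-1 transition table from the prime's state sequence,
    # then sample a continuation starting from the prime's final state.
    rng = random.Random(seed)
    table = {}
    for a, b in zip(states, states[1:]):
        table.setdefault(a, []).append(b)
    current = states[-1]
    out = []
    for _ in range(length):
        # Fall back to the full state list if the current state was never
        # followed by anything in the prime.
        current = rng.choice(table.get(current, states))
        out.append(current)
    return out

print(markov_continuation([60, 62, 64, 62, 60, 62], 4))
```

By construction, such a foil reproduces local transition statistics but carries no longer-range repetitive or phrasal structure.&lt;br /&gt;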
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should indicate which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and third column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
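As an illustrative sketch (file names and likelihood values are hypothetical), a subtask (2) output file with columns &amp;quot;id&amp;quot;, &amp;quot;A&amp;quot;, and &amp;quot;B&amp;quot; could be written with Python's csv module:&lt;br /&gt;

```python
import csv
import io

# Hypothetical likelihoods for two prime-continuation pairs.
rows = [
    {'id': 'fileA1', 'A': 0.91, 'B': 0.09},
    {'id': 'fileA2', 'A': 0.24, 'B': 0.76},
]
buf = io.StringIO()  # in practice, open a file in the output folder instead
writer = csv.DictWriter(buf, fieldnames=['id', 'A', 'B'])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Here the first row says the algorithm judges the continuation in folder A to be the genuine one for the pair &amp;quot;fileA1&amp;quot;.&lt;br /&gt;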
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file containing the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
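For illustration only (not a required structure), a submission's entry script might parse these flags as follows; the folder names below are hypothetical:&lt;br /&gt;

```python
import argparse

# Minimal skeleton matching the calling format above: -i input, -o output.
parser = argparse.ArgumentParser(description='Patterns for Prediction submission')
parser.add_argument('-i', dest='input_folder', required=True)
parser.add_argument('-o', dest='output_folder', required=True)

# Simulate a command line for demonstration; normally call parse_args() with
# no arguments so the real command line is read.
args = parser.parse_args(['-i', 'primes', '-o', 'predictions'])
print(args.input_folder, args.output_folder)
```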
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
IOI stands for inter-ontime interval. This metric evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) of the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct: it takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI}_4 = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q) / (\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
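As an illustrative sketch of how precision, recall, and F1 combine (our own example on hypothetical ontime sets, without the &amp;quot;up to translation&amp;quot; matching):&lt;br /&gt;

```python
def f1_score(true_set, pred_set):
    # Precision, recall, and F1 over sets of ontimes (the same function
    # applies to sets of MNNs, or of ontime-MNN pairs).
    inter = len(true_set.intersection(pred_set))
    if inter == 0:
        return 0.0
    prec = inter / len(pred_set)
    rec = inter / len(true_set)
    return 2 * prec * rec / (prec + rec)

# Three of four predicted ontimes match the true ones, so Prec = Rec = 0.75
# and the F1 score is also 0.75.
print(f1_score({0.5, 1.0, 2.0, 3.5}, {0.5, 1.0, 2.5, 3.5}))
```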
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and on events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
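For example, the beat-location arithmetic just described might be computed as follows (the bpm value is hypothetical):&lt;br /&gt;

```python
u = 120  # bpm value read from the descriptor file (hypothetical)

# Main beats in the WAV file fall at 0, 60/u, 2*60/u, ... seconds.
beats = [i * 60 / u for i in range(5)]
print(beats)  # first five beat times in seconds
```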
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12923</id>
		<title>2019:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2019:Patterns_for_Prediction&amp;diff=12923"/>
		<updated>2019-04-26T19:20:02Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: Created page with &amp;quot;== Description == '''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.  (2) Additiona...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are [http://beritjanssen.com/ Berit Janssen] (berit.janssen),  [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), and Iris Yuping Ren (yuping.ren.iris all at gmail.com). Please copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is '''TO BE DETERMINED'''.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on the given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up to a cut-off point, after which the algorithm should generate the next ''N'' musical events, extending up to 10 quarter-note beats beyond the cut-off; we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
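A minimal sketch (our own illustration) of parsing this CSV format into event tuples of ontime, MNN, morphetic pitch number, duration, and channel:&lt;br /&gt;

```python
import csv
import io

# The first three rows of the example prime above, as raw CSV text.
raw = '20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n'

events = []
for row in csv.reader(io.StringIO(raw)):
    ontime, mnn, mpn, dur, channel = row
    # Ontimes and durations are in quarter-note beats; MNN, MPN, and
    # channel are integers.
    events.append((float(ontime), int(mnn), int(mpn), float(dur), int(channel)))

print(events[0])
```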
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
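The skylining step described above can be sketched as follows. This is a minimal illustration assuming events are (ontime, MNN, duration) triples; it is not the exact implementation used to prepare the datasets.&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    """Keep one event per ontime: when several events share a start
    time, retain the one with the maximum MIDI note number (MNN).
    Each event is an (ontime, mnn, duration) triple."""
    by_ontime = defaultdict(list)
    for ev in events:
        by_ontime[ev[0]].append(ev)
    # For each ontime (in order), keep the event with the highest MNN.
    return [max(by_ontime[t], key=lambda e: e[1])
            for t in sorted(by_ontime)]
```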
&lt;br /&gt;
For the polyphonic data, we applied the minimum event criteria of 20 in the prime and 10 in the continuation, as well as the maximum inter-ontime interval of 8 quarter-note beats in the prime, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
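As a rough illustration of the foil-generation idea, an order-1 Markov model can be trained and sampled as below. This is a generic sketch over arbitrary states; the actual state space and transition matrix follow Collins and Laney (2017) and differ in detail.&lt;br /&gt;

```python
import random

def train_order1(states):
    """Build an order-1 transition table from a state sequence:
    maps each state to the list of states observed to follow it."""
    table = {}
    for a, b in zip(states, states[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate(table, start, n, rng=random.Random(0)):
    """Random-walk n states from the table, falling back to the
    start state if the walk reaches a state with no continuations."""
    out, current = [start], start
    for _ in range(n - 1):
        current = rng.choice(table.get(current, [start]))
        out.append(current)
    return out
```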
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should indicate which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
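The subtask (2) output file can be produced as sketched below; the ids and likelihood values here are purely illustrative.&lt;br /&gt;

```python
import csv
import io

# One row per prime-continuation pair: (id, likelihood that the
# continuation in folder A is genuine, likelihood for folder B).
# These likelihood values are made up for illustration.
rows = [
    ("0a983538-61b5-4b9d-9ad9-23e05f548e5c", 0.91, 0.09),
    ("001f5992-527d-4e04-8869-afa7cbb74cd0", 0.35, 0.65),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "A", "B"])  # required column headers
writer.writerows(rows)
print(buf.getvalue())
```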
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for inter-ontime interval (here, the first one). It evaluates whether the algorithm's prediction of the time between the end of the prime (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the start of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) of the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct: it takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI}_4 = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
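The precision/recall/F1 computation can be sketched as below. This simplified set-based version omits the &amp;quot;up to translation&amp;quot; matching; see the GitHub implementation linked under Evaluation Procedure for the authoritative version.&lt;br /&gt;

```python
def f1_sets(true_vals, predicted):
    """Set-based precision, recall and F1, as used for IOI_4:
    duplicates are removed before computing intersections, and
    degenerate cases (empty sets, zero precision and recall)
    score 0."""
    p, q = set(true_vals), set(predicted)
    if not p or not q:
        return 0.0
    inter = len(p & q)
    prec = inter / len(q)
    rec = inter / len(p)
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)
```

Because the data were quantised during preparation, exact set membership of ontimes is well defined.&lt;br /&gt;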
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
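The beat-location calculation just described, as a minimal sketch:&lt;br /&gt;

```python
def beat_times(bpm, n_beats):
    """Main beat locations (in seconds) in the WAV file, given the
    descriptor file's bpm value: 0, 60/bpm, 2*60/bpm, ..."""
    return [i * 60.0 / bpm for i in range(n_beats)]
```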
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music  Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I.Y., Koops, H.V, Volk, A., Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=MIREX_HOME&amp;diff=12922</id>
		<title>MIREX HOME</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=MIREX_HOME&amp;diff=12922"/>
		<updated>2019-04-26T19:17:47Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* MIREX 2019 Possible Evaluation Tasks */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Welcome to MIREX 2019==&lt;br /&gt;
&lt;br /&gt;
This is the main page for the 15th running of the Music Information Retrieval Evaluation eXchange (MIREX 2019). The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) at [https://ischool.illinois.edu School of Information Sciences], University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2019. &lt;br /&gt;
&lt;br /&gt;
The MIREX 2019 community will hold its annual meeting as part of [http://ismir2019.ewi.tudelft.nl The 20th International Society for Music Information Retrieval Conference], ISMIR 2019, which will be held in Delft, The Netherlands, November 4-8, 2019.&lt;br /&gt;
&lt;br /&gt;
J. Stephen Downie&amp;lt;br&amp;gt;&lt;br /&gt;
Director, IMIRSEL&amp;lt;br&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Task Leadership Model==&lt;br /&gt;
&lt;br /&gt;
As in previous years, we aim to improve the distribution of tasks for MIREX 2019. To do so, we need leaders to help us organize and run each task.&lt;br /&gt;
&lt;br /&gt;
To volunteer to lead a task, please complete the form [TBD]. Current information about task captains can be found on the [[2019:Task Captains]] page. Please direct any communication to the [https://lists.ischool.illinois.edu/lists/admin/evalfest EvalFest] mailing list.&lt;br /&gt;
&lt;br /&gt;
What does it mean to lead a task?&lt;br /&gt;
* Update wiki pages as needed&lt;br /&gt;
* Communicate with submitters and troubleshoot submissions&lt;br /&gt;
* Execute and evaluate submissions&lt;br /&gt;
* Publish final results&lt;br /&gt;
&lt;br /&gt;
Due to the proprietary nature of much of the data, the submission system, evaluation framework, and most of the datasets will continue to be hosted by IMIRSEL. However, we are prepared to provide access to task organizers to manage and run submissions on the IMIRSEL systems.&lt;br /&gt;
&lt;br /&gt;
We really need leaders to help us this year!&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Deadline Dates==&lt;br /&gt;
* TBD&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Possible Evaluation Tasks==&lt;br /&gt;
* [[2019:Audio Classification (Train/Test) Tasks]], incorporating:&lt;br /&gt;
** Audio US Pop Genre Classification&lt;br /&gt;
** Audio Latin Genre Classification&lt;br /&gt;
** Audio Music Mood Classification&lt;br /&gt;
** Audio Classical Composer Identification&lt;br /&gt;
** [[2019:Audio K-POP Mood Classification]]&lt;br /&gt;
** [[2019:Audio K-POP Genre Classification]]&lt;br /&gt;
* [[2019:Audio Beat Tracking]]&lt;br /&gt;
* [[2019:Audio Chord Estimation]]&lt;br /&gt;
* [[2019:Audio Cover Song Identification]]&lt;br /&gt;
* [[2019:Audio Downbeat Estimation]]&lt;br /&gt;
* [[2019:Audio Key Detection]]&lt;br /&gt;
* [[2019:Audio Onset Detection]]&lt;br /&gt;
* [[2019:Audio Tempo Estimation]]&lt;br /&gt;
* [[2019:Patterns for Prediction]] (offshoot of Discovery of Repeated Themes &amp;amp; Sections from previous years)&lt;br /&gt;
* [[2019:Automatic Lyrics-to-Audio Alignment]]&lt;br /&gt;
* [[2019:Drum Transcription]]&lt;br /&gt;
* [[2019:Multiple Fundamental Frequency Estimation &amp;amp; Tracking]]&lt;br /&gt;
* [[2019:Real-time Audio to Score Alignment (a.k.a Score Following)]]&lt;br /&gt;
* [[2019:Structural Segmentation]]&lt;br /&gt;
* [[2019:Audio Fingerprinting]]&lt;br /&gt;
* [[2019:Set List Identification]]&lt;br /&gt;
* [[2019:Query by Singing/Humming]]&lt;br /&gt;
* [[2019:Singing Voice Separation]] &lt;br /&gt;
* [[2019:Audio Tag Classification]]  &lt;br /&gt;
* [[2019:Audio Music Similarity and Retrieval]] &lt;br /&gt;
* [[2019:Symbolic Melodic Similarity]] &lt;br /&gt;
* [[2019:Audio Melody Extraction]] &lt;br /&gt;
* [[2019:Query by Tapping]]&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Submission Instructions==&lt;br /&gt;
* Be sure to read through the rest of this page&lt;br /&gt;
* Be sure to read through the task pages for which you are submitting&lt;br /&gt;
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]&lt;br /&gt;
* Be sure to follow the  [[MIREX 2019 Submission Instructions]] including both the tutorial video and the text&lt;br /&gt;
* The MIREX 2019 Submission System is coming soon at: https://www.music-ir.org/mirex/sub/ .&lt;br /&gt;
&lt;br /&gt;
==MIREX 2019 Evaluation==&lt;br /&gt;
&lt;br /&gt;
===Note to New Participants===&lt;br /&gt;
Please take the time to read the following review articles that explain the history and structure of MIREX.&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007):&amp;lt;br&amp;gt;&lt;br /&gt;
A window into music information retrieval research. ''Acoustical Science and Technology 29'' (4): 247-255. &amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]&lt;br /&gt;
&lt;br /&gt;
Downie, J. Stephen, Andreas F. Ehmann, Mert Bay and M. Cameron Jones. (2010).&amp;lt;br&amp;gt;&lt;br /&gt;
The Music Information Retrieval Evaluation eXchange: Some Observations and Insights.&amp;lt;br&amp;gt;&lt;br /&gt;
''Advances in Music Information Retrieval'' Vol. 274, pp. 93-115&amp;lt;br&amp;gt;&lt;br /&gt;
Available at: [http://bit.ly/KpM5u5 http://bit.ly/KpM5u5]&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Runtime Limits===&lt;br /&gt;
&lt;br /&gt;
We reserve the right to stop any process that exceeds runtime limits for each task.  We will do our best to notify you in enough time to allow revisions, but this may not be possible in some cases. Please respect the published runtime limits.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Note to All Participants===&lt;br /&gt;
&lt;br /&gt;
Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:&lt;br /&gt;
&lt;br /&gt;
# submit a DRAFT 2-3 page extended abstract PDF in the ISMIR format about the submitted program(s), to help us and the community better understand how the algorithm works.&lt;br /&gt;
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2019 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)&lt;br /&gt;
# present a poster at the MIREX 2019 poster session at ISMIR 2019&lt;br /&gt;
&lt;br /&gt;
===Software Dependency Requests===&lt;br /&gt;
If you have not submitted to MIREX before or are unsure whether IMIRSEL currently supports some of the software/architecture dependencies for your submission a [https://goo.gl/forms/96Wndw9j9dzv4x3c2 dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you. &lt;br /&gt;
&lt;br /&gt;
Due to the high volume of submissions expected at MIREX 2019, submissions with difficult-to-satisfy dependencies of which the team has not been given sufficient notice may be rejected.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.&lt;br /&gt;
&lt;br /&gt;
==Getting Involved in MIREX 2019==&lt;br /&gt;
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2019 the best yet.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Mailing List Participation===&lt;br /&gt;
If you are interested in formal MIR evaluation, you should also subscribe to the &amp;quot;MIREX&amp;quot; (aka &amp;quot;EvalFest&amp;quot;) mail list and participate in the community discussions about defining and running MIREX 2019 tasks. Subscription information at: &lt;br /&gt;
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central]. &lt;br /&gt;
&lt;br /&gt;
If you are participating in MIREX 2019, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (the MIREX 2019 wiki) will be used to embody and disseminate task proposals; however, task-related discussions should be conducted on the EvalFest mailing list rather than on this wiki, and summarized here. &lt;br /&gt;
&lt;br /&gt;
Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team, who will embody them in software as part of the NEMA analytics framework. The framework will be released to the community at or before ISMIR 2019, providing a standardised set of interfaces and outputs for disciplined evaluation procedures across a great many MIR tasks.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===Wiki Participation===&lt;br /&gt;
If you find that you cannot edit a MIREX wiki page, you will need to create a new account via: [[Special:Userlogin]].&lt;br /&gt;
&lt;br /&gt;
Please note that because of &amp;quot;spam-bots&amp;quot;, MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).&lt;br /&gt;
&lt;br /&gt;
==MIREX 2005 - 2018 Wikis==&lt;br /&gt;
Content from MIREX 2005 - 2018 is available at:&lt;br /&gt;
'''[[2018:Main_Page|MIREX 2018]]'''&lt;br /&gt;
'''[[2017:Main_Page|MIREX 2017]]''' &lt;br /&gt;
'''[[2016:Main_Page|MIREX 2016]]''' &lt;br /&gt;
'''[[2015:Main_Page|MIREX 2015]]''' &lt;br /&gt;
'''[[2014:Main_Page|MIREX 2014]]''' &lt;br /&gt;
'''[[2013:Main_Page|MIREX 2013]]''' &lt;br /&gt;
'''[[2012:Main_Page|MIREX 2012]]''' &lt;br /&gt;
'''[[2011:Main_Page|MIREX 2011]]''' &lt;br /&gt;
'''[[2010:Main_Page|MIREX 2010]]''' &lt;br /&gt;
'''[[2009:Main_Page|MIREX 2009]]''' &lt;br /&gt;
'''[[2008:Main_Page|MIREX 2008]]''' &lt;br /&gt;
'''[[2007:Main_Page|MIREX 2007]]''' &lt;br /&gt;
'''[[2006:Main_Page|MIREX 2006]]''' &lt;br /&gt;
'''[[2005:Main_Page|MIREX 2005]]'''&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12736</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12736"/>
		<updated>2018-09-18T14:28:08Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Datasets and Algorithms */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The training datasets were...&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the task are listed in Table 1. There were no submissions to the audMono or audPoly variants of the task this year. The task captains prepared a first-order Markov model (MM) over a state space of beat of the measure and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and the model can also serve as a point of comparison on the explicit subtask. It should be noted, however, that this model had access to the full song/piece – '''not just the prime''' – so it has an advantage over EN1 and FC1 in the explicit subtask.&lt;br /&gt;
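As an illustrative sketch only (the state encoding, additive smoothing, and sampling scheme here are our assumptions, not necessarily the task captains' implementation), such a first-order model can be trained and queried as follows:&lt;br /&gt;

```python
# Illustrative first-order Markov model over (beat-in-measure,
# key-centralized MIDI note number) states. It can sample a continuation
# from a prime, and score a candidate continuation's log-likelihood,
# which supports both the explicit and implicit subtasks described above.
import random
from collections import defaultdict
from math import log

def train(state_sequences):
    """Count first-order transitions between consecutive states."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in state_sequences:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return counts

def continuation_loglik(counts, prime_last_state, continuation, alpha=1.0):
    """Additively smoothed log-likelihood of a candidate continuation."""
    ll, prev = 0.0, prime_last_state
    for state in continuation:
        row = counts[prev]
        total = sum(row.values())
        vocab = max(len(row), 1)
        ll += log((row[state] + alpha) / (total + alpha * vocab))
        prev = state
    return ll

def generate(counts, prime_last_state, n, rng=random.Random(0)):
    """Sample an n-state continuation by walking the transition table."""
    out, prev = [], prime_last_state
    for _ in range(n):
        row = counts[prev]
        if not row:
            break  # unseen state: no transitions to sample from
        states = list(row)
        weights = [row[s] for s in states]
        prev = rng.choices(states, weights=weights)[0]
        out.append(prev)
    return out
```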
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
'''(DESCRIBE SUBMISSIONS EN1 AND FC1 BRIEFLY HERE.)'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12735</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12735"/>
		<updated>2018-09-18T14:27:15Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Datasets and Algorithms */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The training datasets were...&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the task are listed in Table 1. There were no submissions to the audMono or audPoly variants of the task this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – not just the prime – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
'''(DESCRIBE SUBMISSIONS EN1 AND FC1 BRIEFLY HERE.)'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12734</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12734"/>
		<updated>2018-09-18T14:27:04Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Datasets and Algorithms */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
The training datasets were...&lt;br /&gt;
&lt;br /&gt;
Submissions to the symMono and symPoly variants of the task are listed in Table 1. There were no submissions to the audMono or audPoly variants of the task this year. The task captains prepared a first-order Markov model (MM) over a state space of measure beat and key-centralized MIDI note number. This enabled evaluation of the implicit subtask, and can also serve as a point of comparison for the explicit task. It should be noted, however, that this model had access to the full song/piece – not just the prime – so it is at an advantage compared to EN1 and FC1 in the explicit task.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || For purposes of comparison&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
'''(DESCRIBE SUBMISSIONS EN1 AND FC1 BRIEFLY HERE.)'''&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12733</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12733"/>
		<updated>2018-09-18T14:17:57Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Training and Test Datasets */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Datasets and Algorithms ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
There were no submissions to the audMono or audPoly variants of the task this year.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12732</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12732"/>
		<updated>2018-09-18T14:16:25Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Figures */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12731</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12731"/>
		<updated>2018-09-18T14:15:55Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Discussion */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins.&lt;br /&gt;
&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
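The establishment scores of Figures 1-3 can be sketched in a few lines. This is an illustrative sketch only: it assumes patterns are represented as sets of (onset, pitch) pairs, and the function names and toy patterns are ours, not part of the official evaluator.

```python
# Illustrative sketch (assumption: a pattern is a set of (onset, pitch) pairs).
# Establishment recall/precision compare each ground-truth prototype with each
# algorithm-output pattern, then average the best match per prototype (recall)
# or per output pattern (precision).

def cardinality_score(p, q):
    """Default similarity: shared notes divided by the size of the larger pattern."""
    return len(p.intersection(q)) / max(len(p), len(q))

def establishment_recall(ground_truth, outputs):
    """For each ground-truth prototype, take the most similar output pattern."""
    scores = [max(cardinality_score(p, q) for q in outputs) for p in ground_truth]
    return sum(scores) / len(scores)

def establishment_precision(ground_truth, outputs):
    """For each output pattern, take the most similar ground-truth prototype."""
    scores = [max(cardinality_score(p, q) for p in ground_truth) for q in outputs]
    return sum(scores) / len(scores)

# Toy example: two ground-truth prototypes, one algorithm-output pattern.
gt = [{(0, 60), (1, 62), (2, 64)}, {(4, 67), (5, 65)}]
out = [{(0, 60), (1, 62), (2, 64)}]
r = establishment_recall(gt, out)    # (1.0 + 0.0) / 2 = 0.5
p = establishment_precision(gt, out) # 1.0
```

Establishment F1 (Figure 3) then combines these two scores.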
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
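The difference between the two similarity measures named in Figures 7-9 can be shown directly. Again an illustrative sketch with our assumed set-of-notes representation, not the official evaluator:

```python
# Illustrative sketch: the F1-style similarity used by the three-layer metrics,
# versus the max-denominator score used by the establishment metrics.

def max_denominator_score(p, q):
    """Establishment default: shared notes over the larger pattern."""
    return len(p.intersection(q)) / max(len(p), len(q))

def f1_style_score(p, q):
    """Three-layer variant: 2|P ∩ Q| / (|P| + |Q|), i.e. a Dice coefficient."""
    return 2 * len(p.intersection(q)) / (len(p) + len(q))

# A 4-note pattern sharing 2 notes with a 2-note pattern:
p = {(0, 60), (1, 62), (2, 64), (3, 65)}
q = {(0, 60), (1, 62)}
max_denominator_score(p, q)  # 2/4 = 0.5
f1_style_score(p, q)         # 4/6 = 0.667
```

The F1-style score penalizes a size mismatch between the two patterns less harshly than dividing by the larger cardinality.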
&lt;br /&gt;
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
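Coverage as defined above has a direct set-theoretic reading. A minimal sketch, assuming (as in the sketches above) that notes are (onset, pitch) pairs and each discovered occurrence is a subset of the piece:

```python
# Illustrative sketch of the coverage measure (assumption: a piece is a set of
# notes; each discovered pattern occurrence is a subset of those notes).

def coverage(piece_notes, pattern_occurrences):
    """Fraction of the piece's notes that fall inside some discovered occurrence."""
    covered = set()
    for occurrence in pattern_occurrences:
        covered.update(occurrence)
    return len(covered) / len(piece_notes)

piece = {(t, 60) for t in range(10)}             # a 10-note toy piece
occs = [{(0, 60), (1, 60)}, {(1, 60), (2, 60)}]  # two overlapping occurrences
coverage(piece, occs)  # 3 covered notes out of 10 = 0.3
```

Note that overlapping occurrences are counted once, since coverage is computed on the union of the covered notes.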
&lt;br /&gt;
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12730</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12730"/>
		<updated>2018-09-18T14:15:30Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Contribution */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The new compression evaluation measures are not highly correlated with the metrics measuring retrieval of annotated patterns. This may be because lossless compression is lower for algorithms that find overlapping patterns: human annotators, and some pattern discovery algorithms too, may find valid overlapping patterns, since patterns can be hierarchically layered (e.g., motifs that form part of themes). We will add new, prediction-based measures and new ground-truth pieces to the task next year.&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins, Anja Volk.&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12729</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12729"/>
		<updated>2018-09-18T14:13:45Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Results */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:mozartK282Mvt2.png|500px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Pattern discovery vs. segmentation. (A) Bars 1-12 of Mozart’s Piano Sonata in E-flat major K282, mvt. 2, showing some ground-truth themes and repeated sections; (B-D) three linear segmentations. Numbers below the staff in Fig. 1A and below the segmentation in Fig. 1D indicate crotchet beats, counting from zero at bar 1, beat 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
An intro spiel here...&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2018:Patterns_for_Prediction#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
Here are some results (cf. Figures 1-3), and some interpretation. Don't forget these as well (Figures 4-6), showing something.&lt;br /&gt;
&lt;br /&gt;
Remarks on runtime appropriate here too.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
And so on.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The new compression evaluation measures are not highly correlated with the metrics measuring retrieval of annotated patterns. This may be because lossless compression is lower for algorithms that find overlapping patterns: human annotators, and some pattern discovery algorithms too, may find valid overlapping patterns, since patterns can be hierarchically layered (e.g., motifs that form part of themes). We will add new, prediction-based measures and new ground-truth pieces to the task next year.&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins, Anja Volk.&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. Alongside the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12728</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12728"/>
		<updated>2018-09-18T14:11:09Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Training and Test Datasets */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:mozartK282Mvt2.png|500px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Pattern discovery vs. segmentation. (A) Bars 1-12 of Mozart’s Piano Sonata in E-flat major K282, mvt. 2, showing some ground-truth themes and repeated sections; (B-D) three linear segmentations. Numbers below the staff in Fig. 1A and below the segmentation in Fig. 1D indicate crotchet beats, counted from zero at bar 1, beat 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to Patterns for Prediction 2018.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
In addition to testing how well the different algorithms compare when measured with the metrics introduced in earlier editions of this track, the goal of this year's run of DRTS was also to investigate alternative evaluation measures. Alongside the establishment, occurrence, and three-layer measures, which quantify how successfully an algorithm finds the annotated patterns, we also evaluated coverage and lossless compression, i.e., to what extent a piece is covered by, or can be compressed in terms of, the discovered patterns.&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2017:Discovery_of_Repeated_Themes_&amp;amp;_Sections#Evaluation_Procedure]].)&lt;br /&gt;
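As an informal illustration (not the official evaluation code), the two note-set similarity measures that recur in the figure captions, and the new coverage measure, can be sketched in Python. This simplified sketch treats patterns as plain sets of (onset, pitch) points and ignores the translated-occurrence handling of the full metrics:&lt;br /&gt;

```python
# Simplified sketch of the similarity and coverage measures, assuming
# patterns are sets of (onset, pitch) points. Translated occurrences,
# which the full metrics account for, are ignored here for brevity.

def cardinality_score(p, q):
    """|P n Q| / max(|P|, |Q|): default similarity for the establishment measures."""
    return len(p & q) / max(len(p), len(q))

def f1_similarity(p, q):
    """2|P n Q| / (|P| + |Q|): the F1-style similarity used by the three-layer measures."""
    return 2 * len(p & q) / (len(p) + len(q))

def coverage(piece, patterns):
    """Fraction of the piece's notes covered by at least one discovered pattern."""
    covered = piece & set().union(*patterns) if patterns else set()
    return len(covered) / len(piece)

# Toy example: a 5-note piece, one output pattern, one ground-truth prototype.
piece = {(0, 60), (1, 62), (2, 64), (3, 65), (4, 67)}
P = {(0, 60), (1, 62), (2, 64)}           # algorithm-output pattern
Q = {(0, 60), (1, 62), (3, 65), (4, 67)}  # ground-truth prototype
print(cardinality_score(P, Q))  # 2/4 = 0.5
print(f1_similarity(P, Q))      # 4/7 = 0.571...
print(coverage(piece, [P]))     # 3/5 = 0.6
```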
&lt;br /&gt;
===SymMono===&lt;br /&gt;
VM1, successful in previous editions of DRTS, achieves the highest overall results on the establishment metrics (cf. Figures 1-3), i.e., it finds a large number of the annotated patterns. It is outperformed only on piece 2 of the ground truth, where DM1, DM3, and the new entry CS7 achieve higher establishment F1 scores. CS7 is successful overall at finding occurrences of patterns (cf. Figures 4-6), comparable to the best results of previous years. (NB: OL1 did not run on piece 5 of the ground truth, which is why those values are missing.) The three-layer measures (Figures 7-9) show varying degrees of agreement with the ground truth across the pieces, and again CS7 and VM1 compare favourably to previous submissions.&lt;br /&gt;
&lt;br /&gt;
The new coverage measure shows that VM1, DM1, and DM3 find patterns covering almost all notes of the ground-truth pieces (cf. Figure 10); other algorithms share this tendency for pieces 2 and 5, but in the remaining pieces, most notably piece 4, CS7 and several other algorithms cover the pieces only sparsely with patterns. An abundance of overlapping patterns leads to poor lossless compression scores, and CS7 appears to find few overlapping patterns, as its lossless compression score is the highest overall (cf. Figure 11), notably also for piece 2, where it achieved high coverage too: it found most notes of the piece as parts of patterns, and these patterns can be used to compress the piece very successfully.&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing runtimes of the new submissions, obtained on different machines, with runtimes from previous years would not have been very conclusive. The new submission, CS7, completed its analysis of the ground-truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
DM1, the most successful algorithm for polyphonic pattern discovery in previous years, again compares favourably on the establishment, occurrence, and three-layer metrics this year. The new submission, CS3, outperforms DM1-DM3 on piece 1 for the establishment measures (cf. Figures 12-14), and on piece 2 for occurrence precision (cf. Figure 15).&lt;br /&gt;
&lt;br /&gt;
Coverage again shows that DM1 and DM3 find patterns covering almost all notes of the ground-truth pieces (Figure 21). CS3, NF1, and DM2 (which was optimized for precision metrics) show lower coverage, with CS3 lowest overall. CS3 achieves the highest overall values for lossless compression (Figure 22).&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing runtimes of the new submissions, obtained on different machines, with runtimes from previous years would not have been very conclusive. The new submission, CS3, completed its analysis of the ground-truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The new compression-based evaluation measures are not highly correlated with the metrics measuring retrieval of annotated patterns. This may be because lossless compression is lower for algorithms that find overlapping patterns: human annotators, and also some pattern discovery algorithms, may find valid overlapping patterns, since patterns can be hierarchically layered (e.g., motifs that form parts of themes). We will add new, prediction-based measures and new ground-truth pieces to the task next year.&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins, Anja Volk.&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. Alongside the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of the patterns discovered by a given algorithm. Alongside the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12727</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12727"/>
		<updated>2018-09-18T14:10:24Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Training and Test Datasets */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:mozartK282Mvt2.png|500px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Pattern discovery vs. segmentation. (A) Bars 1-12 of Mozart’s Piano Sonata in E-flat major K282, mvt. 2, showing some ground-truth themes and repeated sections; (B-D) three linear segmentations. Numbers below the staff in Fig. 1A and below the segmentation in Fig. 1D indicate crotchet beats, counted from zero at bar 1, beat 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        ! MM&lt;br /&gt;
	| Markov model  ||  style=&amp;quot;text-align: center;&amp;quot; | N/A || Intended as 'baseline'&lt;br /&gt;
	|-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to DRTS.&lt;br /&gt;
&lt;br /&gt;
To compare these algorithms with the results of previous years, results of representative versions of the algorithms submitted for symbolic pattern discovery in previous years are presented as well. The following table shows which algorithms are compared against the new submissions.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors	&lt;br /&gt;
        |-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! DM1&lt;br /&gt;
	| SIATECCompress-TLF1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM1.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM2&lt;br /&gt;
	| SIATECCompress-TLP  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM2.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM3&lt;br /&gt;
	| SIATECCompress-TLR  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM3.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
        ! NF1&lt;br /&gt;
	| MotivesExtractor  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2014/NF1.pdf PDF] || [http://files.nyu.edu/onc202/public/ Oriol Nieto], [http://www.nyu.edu/projects/farbood/ Morwaread Farbood]&lt;br /&gt;
        |-&lt;br /&gt;
	! OL1'14&lt;br /&gt;
	| PatMinr  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2014/OL1.pdf PDF] || [http://scholar.google.com/citations?user=aiYUZV4AAAAJ&amp;amp;hl=da Olivier Lartillot]&lt;br /&gt;
        |-&lt;br /&gt;
	! PLM1&lt;br /&gt;
	| SYMCHM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/PLM1.pdf PDF] || [http://musiclab.si/ Matevž Pesek], Urša Medvešek, [http://www.cs.bham.ac.uk/~leonarda/ Aleš Leonardis], [http://www.fri.uni-lj.si/en/matija-marolt Matija Marolt]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
        |-&lt;br /&gt;
        ! DM1&lt;br /&gt;
	| SIATECCompress-TLF1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM1.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM2&lt;br /&gt;
	| SIATECCompress-TLP  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM2.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM3&lt;br /&gt;
	| SIATECCompress-TLR  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM3.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
|}&lt;br /&gt;
'''Table 2.''' Algorithms submitted to DRTS in previous years, evaluated for comparison.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
In addition to testing how well the different algorithms compare when measured with the metrics introduced in earlier editions of this track, the goal of this year's run of DRTS was also to investigate alternative evaluation measures. Alongside the establishment, occurrence, and three-layer measures, which quantify how successfully an algorithm finds the annotated patterns, we also evaluated coverage and lossless compression, i.e., to what extent a piece is covered by, or can be compressed in terms of, the discovered patterns.&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2017:Discovery_of_Repeated_Themes_&amp;amp;_Sections#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
VM1, successful in previous editions of DRTS, achieves the highest overall results on the establishment metrics (cf. Figures 1-3), i.e., it finds a large number of the annotated patterns. It is outperformed only on piece 2 of the ground truth, where DM1, DM3, and the new entry CS7 achieve higher establishment F1 scores. CS7 is successful overall at finding occurrences of patterns (cf. Figures 4-6), comparable to the best results of previous years. (NB: OL1 did not run on piece 5 of the ground truth, which is why those values are missing.) The three-layer measures (Figures 7-9) show varying degrees of agreement with the ground truth across the pieces, and again CS7 and VM1 compare favourably to previous submissions.&lt;br /&gt;
&lt;br /&gt;
The new coverage measure shows that VM1, DM1, and DM3 find patterns covering almost all notes of the ground-truth pieces (cf. Figure 10); other algorithms share this tendency for pieces 2 and 5, but in the remaining pieces, most notably piece 4, CS7 and several other algorithms cover the pieces only sparsely with patterns. An abundance of overlapping patterns leads to poor lossless compression scores, and CS7 appears to find few overlapping patterns, as its lossless compression score is the highest overall (cf. Figure 11), notably also for piece 2, where it achieved high coverage too: it found most notes of the piece as parts of patterns, and these patterns can be used to compress the piece very successfully.&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing runtimes of the new submissions, obtained on different machines, with runtimes from previous years would not have been very conclusive. The new submission, CS7, completed its analysis of the ground-truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
DM1, the most successful algorithm for polyphonic pattern discovery in previous years, again compares favourably on the establishment, occurrence, and three-layer metrics this year. The new submission, CS3, outperforms DM1-DM3 on piece 1 for the establishment measures (cf. Figures 12-14), and on piece 2 for occurrence precision (cf. Figure 15).&lt;br /&gt;
&lt;br /&gt;
Coverage again shows that DM1 and DM3 find patterns covering almost all notes of the ground-truth pieces (Figure 21). CS3, NF1, and DM2 (which was optimized for precision metrics) show lower coverage, with CS3 lowest overall. CS3 achieves the highest overall values for lossless compression (Figure 22).&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing runtimes of the new submissions, obtained on different machines, with runtimes from previous years would not have been very conclusive. The new submission, CS3, completed its analysis of the ground-truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The new compression-based evaluation measures are not highly correlated with the metrics measuring retrieval of annotated patterns. This may be because lossless compression is lower for algorithms that find overlapping patterns: human annotators, and also some pattern discovery algorithms, may find valid overlapping patterns, since patterns can be hierarchically layered (e.g., motifs that form parts of themes). We will add new, prediction-based measures and new ground-truth pieces to the task next year.&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins, Anja Volk.&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
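The two set-based similarity measures named in the captions above can be sketched as follows. This is an illustrative Python sketch only, not the official evaluation code; here a pattern is modeled simply as a set of (onset, pitch) pairs.

```python
# Illustrative sketch of the two similarity measures used by the
# establishment and three-layer metrics; NOT the official evaluation code.
# A pattern is modeled as a set of (onset, pitch) pairs.

def establishment_similarity(p, q):
    # |P n Q| / max(|P|, |Q|): the default measure for the
    # establishment metrics.
    return len(p.intersection(q)) / max(len(p), len(q))

def three_layer_similarity(p, q):
    # 2|P n Q| / (|P| + |Q|): the F1-style measure used by the
    # three-layer metrics.
    return 2 * len(p.intersection(q)) / (len(p) + len(q))

ground_truth = {(0, 60), (1, 62), (2, 64), (3, 65)}
algorithm_output = {(0, 60), (1, 62), (2, 64)}

print(establishment_similarity(ground_truth, algorithm_output))  # 0.75
print(three_layer_similarity(ground_truth, algorithm_output))    # ~0.857
```

On this toy pair the intersection has three notes, so the establishment measure gives 3/4 while the F1-style measure gives 6/7, illustrating how the latter rewards agreement relative to both set sizes.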
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
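Under the same toy note representation, coverage can be sketched as below. This is an illustrative sketch only (the note representation and function names are assumptions, not the official evaluation code).

```python
# Illustrative sketch of the coverage metric; NOT the official
# evaluation code. The piece and each discovered pattern occurrence
# are modeled as sets of (onset, pitch) pairs.

def coverage(piece, occurrences):
    # Fraction of the piece's notes covered by at least one
    # discovered pattern occurrence.
    covered = set()
    for occ in occurrences:
        covered.update(occ)
    return len(covered.intersection(piece)) / len(piece)

piece = {(0, 60), (1, 62), (2, 64), (3, 65), (4, 67)}
occurrences = [{(0, 60), (1, 62)}, {(1, 62), (2, 64)}]
print(coverage(piece, occurrences))  # 0.6
```

Here the two overlapping occurrences jointly cover three of the five notes, so coverage is 0.6 even though the occurrences contain four notes in total.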
&lt;br /&gt;
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12726</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12726"/>
		<updated>2018-09-18T14:08:42Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Training and Test Datasets */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:mozartK282Mvt2.png|500px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Pattern discovery v segmentation. (A) Bars 1-12 of Mozart’s Piano Sonata in E-flat major K282 mvt.2, showing some ground-truth themes and repeated sections; (B-D) Three linear segmentations. Numbers below the staff in Fig. 1A and below the segmentation in Fig. 1D indicate crotchet beats, from zero for bar 1 beat 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! FC1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2018/FC1.pdf PDF] || [https://scholar.google.com/citations?user=rpZVNKYAAAAJ&amp;amp;hl=en Florian Colombo]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! CS7&lt;br /&gt;
	| FindThemeAndSection_Poly  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2017/CS7.pdf PDF] || Tsung-Ping Chen&lt;br /&gt;
        |-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to DRTS.&lt;br /&gt;
&lt;br /&gt;
To compare these algorithms to the results of previous years, results from representative versions of algorithms submitted for symbolic pattern discovery in previous years are presented as well. The following table shows which algorithms are compared against the new submissions.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors	&lt;br /&gt;
        |-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! DM1&lt;br /&gt;
	| SIATECCompress-TLF1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM1.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM2&lt;br /&gt;
	| SIATECCompress-TLP  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM2.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM3&lt;br /&gt;
	| SIATECCompress-TLR  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM3.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
        ! NF1&lt;br /&gt;
	| MotivesExtractor  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2014/NF1.pdf PDF] || [http://files.nyu.edu/onc202/public/ Oriol Nieto], [http://www.nyu.edu/projects/farbood/ Morwaread Farbood]&lt;br /&gt;
        |-&lt;br /&gt;
	! OL1'14&lt;br /&gt;
	| PatMinr  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2014/OL1.pdf PDF] || [http://scholar.google.com/citations?user=aiYUZV4AAAAJ&amp;amp;hl=da Olivier Lartillot]&lt;br /&gt;
        |-&lt;br /&gt;
	! PLM1&lt;br /&gt;
	| SYMCHM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/PLM1.pdf PDF] || [http://musiclab.si/ Matevž Pesek], Urša Medvešek, [http://www.cs.bham.ac.uk/~leonarda/ Aleš Leonardis], [http://www.fri.uni-lj.si/en/matija-marolt Matija Marolt]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
        |-&lt;br /&gt;
        ! DM1&lt;br /&gt;
	| SIATECCompress-TLF1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM1.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM2&lt;br /&gt;
	| SIATECCompress-TLP  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM2.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM3&lt;br /&gt;
	| SIATECCompress-TLR  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM3.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
|}&lt;br /&gt;
'''Table 2.''' Algorithms submitted to DRTS in previous years, evaluated for comparison.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
In addition to testing how the different algorithms compare when measured with the metrics introduced in earlier editions of this track, the goal of this year's run of DRTS was also to investigate alternative evaluation measures. Besides the establishment, occurrence and three-layer measures, which determine an algorithm's success in finding annotated patterns, we also evaluated coverage and lossless compression, i.e., to what extent a piece is covered by, or can be compressed using, the discovered patterns.&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2017:Discovery_of_Repeated_Themes_&amp;amp;_Sections#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
VM1, successful in previous editions of DRTS, achieves the overall highest results for the establishment metrics (cf. Figures 1-3), i.e., it finds a large number of the annotated patterns. It is only outperformed on piece 2 of the ground truth, where DM1, DM3 and the new entry CS7 achieve higher establishment F1 scores. CS7 is overall successful with respect to finding occurrences of patterns (cf. Figures 4-6), comparable to successful results of previous years. (NB: OL1 did not run on piece 5 of the ground truth, which is why values are missing.) The three-layer measures (Figures 7-9) show varying degrees of agreement with the ground truth across pieces, and again CS7 and VM1 compare favourably to previous submissions.&lt;br /&gt;
&lt;br /&gt;
The new coverage measure shows that VM1, DM1 and DM3 find patterns in almost all notes of the ground truth pieces (cf. Figure 10); other algorithms share this tendency for pieces 2 and 5 of the ground truth, but in other pieces, most notably piece 4, CS7 and other algorithms cover the pieces with patterns only sparsely. An abundance of overlapping patterns leads to poor lossless compression scores, and CS7 seems to find few overlapping patterns, as its lossless compression score is the overall highest (cf. Figure 11), notably also for piece 2, where it also achieved high coverage: this means that it found most notes of the piece as patterns, and that these patterns can be used to compress the piece very successfully.&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing runtimes of the new submissions, measured on different machines, against runtimes from previous years would not have been conclusive. The new submission, CS7, completed analysis of the ground truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
DM1, in previous years the most successful algorithm for polyphonic discovery, again compares favourably on the establishment, occurrence and three-layer metrics this year. The new submission, CS3, outperforms DM1-DM3 on piece 1 for the establishment measures (cf. Figures 12-14), and on piece 2 for occurrence precision (cf. Figure 15).&lt;br /&gt;
&lt;br /&gt;
Coverage again shows that DM1 and DM3 find patterns in almost all notes of the ground truth pieces (Figure 21). CS3, NF1 and DM2 (which was optimized for precision metrics) show lower coverage, with CS3 lowest overall. CS3 achieves the overall highest values for lossless compression (Figure 22).&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing runtimes of the new submissions, measured on different machines, against runtimes from previous years would not have been conclusive. The new submission, CS3, completed analysis of the ground truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The new compression evaluation measures are not highly correlated with the metrics measuring retrieval of annotated patterns. This may be because lossless compression is lower for algorithms which find overlapping patterns: human annotators, and also some pattern discovery algorithms, may find valid overlapping patterns, as patterns may be hierarchically layered (e.g., motifs which are part of themes). We will add new prediction-based measures and new ground truth pieces to the task next year.&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins, Anja Volk.&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12725</id>
		<title>2018:Patterns for Prediction Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction_Results&amp;diff=12725"/>
		<updated>2018-09-18T14:06:02Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: Created page with &amp;quot;== Introduction ==  THIS PAGE IS UNDER CONSTRUCTION!  The task: ...  == Contribution ==  ...   500px  '''Figure 1.''' Pattern discovery v segmentation...&amp;quot;&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Introduction ==&lt;br /&gt;
&lt;br /&gt;
THIS PAGE IS UNDER CONSTRUCTION!&lt;br /&gt;
&lt;br /&gt;
The task: ...&lt;br /&gt;
&lt;br /&gt;
== Contribution ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
[[File:mozartK282Mvt2.png|500px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Pattern discovery v segmentation. (A) Bars 1-12 of Mozart’s Piano Sonata in E-flat major K282 mvt.2, showing some ground-truth themes and repeated sections; (B-D) Three linear segmentations. Numbers below the staff in Fig. 1A and below the segmentation in Fig. 1D indicate crotchet beats, from zero for bar 1 beat 1.&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
For a more detailed introduction to the task, please see [[2018:Patterns for Prediction]].&lt;br /&gt;
&lt;br /&gt;
== Training and Test Datasets ==&lt;br /&gt;
&lt;br /&gt;
...&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! EN_1&lt;br /&gt;
	| Algo name here  ||  style=&amp;quot;text-align: center;&amp;quot; |  [https://www.music-ir.org/mirex/abstracts/2018/EN1.pdf PDF] || [http://ericpnichols.com/ Eric Nichols]&lt;br /&gt;
        |-&lt;br /&gt;
	! VM1'14&lt;br /&gt;
	| VM1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/VM1.pdf PDF] || [http://personprofil.aau.dk/128103 Gissel Velarde], [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
	! CS7&lt;br /&gt;
	| FindThemeAndSection_Poly  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2017/CS7.pdf PDF] || Tsung-Ping Chen&lt;br /&gt;
        |-&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
'''Table 1.''' Algorithms submitted to DRTS.&lt;br /&gt;
&lt;br /&gt;
To compare these algorithms to the results of previous years, results from representative versions of algorithms submitted for symbolic pattern discovery in previous years are presented as well. The following table shows which algorithms are compared against the new submissions.&lt;br /&gt;
&lt;br /&gt;
{| border=&amp;quot;1&amp;quot; cellspacing=&amp;quot;0&amp;quot; style=&amp;quot;text-align: left; width: 800px;&amp;quot;&lt;br /&gt;
	|- style=&amp;quot;background: yellow;&amp;quot;&lt;br /&gt;
	! width=&amp;quot;80&amp;quot; | Sub code &lt;br /&gt;
	! width=&amp;quot;200&amp;quot; | Submission name &lt;br /&gt;
	! width=&amp;quot;80&amp;quot; style=&amp;quot;text-align: center;&amp;quot; | Abstract &lt;br /&gt;
	! width=&amp;quot;440&amp;quot; | Contributors	&lt;br /&gt;
        |-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symMono&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
	|-&lt;br /&gt;
        ! DM1&lt;br /&gt;
	| SIATECCompress-TLF1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM1.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM2&lt;br /&gt;
	| SIATECCompress-TLP  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM2.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM3&lt;br /&gt;
	| SIATECCompress-TLR  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM3.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
        ! NF1&lt;br /&gt;
	| MotivesExtractor  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2014/NF1.pdf PDF] || [http://files.nyu.edu/onc202/public/ Oriol Nieto], [http://www.nyu.edu/projects/farbood/ Morwaread Farbood]&lt;br /&gt;
        |-&lt;br /&gt;
	! OL1'14&lt;br /&gt;
	| PatMinr  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2014/OL1.pdf PDF] || [http://scholar.google.com/citations?user=aiYUZV4AAAAJ&amp;amp;hl=da Olivier Lartillot]&lt;br /&gt;
        |-&lt;br /&gt;
	! PLM1&lt;br /&gt;
	| SYMCHM  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/PLM1.pdf PDF] || [http://musiclab.si/ Matevž Pesek], Urša Medvešek, [http://www.cs.bham.ac.uk/~leonarda/ Aleš Leonardis], [http://www.fri.uni-lj.si/en/matija-marolt Matija Marolt]&lt;br /&gt;
	|-&lt;br /&gt;
        |- style=&amp;quot;background: green;&amp;quot;&lt;br /&gt;
        ! Task Version&lt;br /&gt;
	! symPoly&lt;br /&gt;
        !&lt;br /&gt;
        !&lt;br /&gt;
        |-&lt;br /&gt;
        ! DM1&lt;br /&gt;
	| SIATECCompress-TLF1  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM1.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM2&lt;br /&gt;
	| SIATECCompress-TLP  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM2.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
	! DM3&lt;br /&gt;
	| SIATECCompress-TLR  ||  style=&amp;quot;text-align: center;&amp;quot; | [https://www.music-ir.org/mirex/abstracts/2016/DM3.pdf PDF] || [http://www.titanmusic.com/ David Meredith]&lt;br /&gt;
        |-&lt;br /&gt;
|}&lt;br /&gt;
'''Table 2.''' Algorithms submitted to DRTS in previous years, evaluated for comparison.&lt;br /&gt;
&lt;br /&gt;
== Results ==&lt;br /&gt;
&lt;br /&gt;
Besides testing how well the different algorithms compare when measured with the metrics introduced in earlier editions of this track, the goal of this year's run of DRTS was to investigate alternative evaluation measures. In addition to the establishment, occurrence, and three-layer measures, which determine how successfully an algorithm finds annotated patterns, we also evaluated the coverage and lossless compression achieved by the algorithms, i.e., the extent to which a piece is covered by, or can be compressed using, the discovered patterns.&lt;br /&gt;
&lt;br /&gt;
(For mathematical definitions of the various metrics, please see [[2017:Discovery_of_Repeated_Themes_&amp;amp;_Sections#Evaluation_Procedure]].)&lt;br /&gt;
&lt;br /&gt;
===SymMono===&lt;br /&gt;
VM1, successful in previous editions of DRTS, achieves the highest results overall for the establishment metrics (cf. Figures 1-3), i.e., it finds a large number of the annotated patterns. It is outperformed by other algorithms only on piece 2 of the ground truth, where DM1, DM3, and the new entry CS7 achieve higher establishment F1 scores. CS7 is successful overall with respect to finding occurrences of patterns (cf. Figures 4-6), comparable to the successful results of previous years. (NB: OL1 did not run on piece 5 of the ground truth, which is why values are missing.) The three-layer measures (Figures 7-9) show varying degrees of agreement with the ground truth across the ground truth pieces, and again CS7 and VM1 compare favourably to previous submissions.&lt;br /&gt;
&lt;br /&gt;
The new coverage measure shows that VM1, DM1, and DM3 find patterns in almost all notes of the ground truth pieces (cf. Figure 10); other algorithms share this tendency for pieces 2 and 5 of the ground truth, but in other pieces, most notably piece 4, CS7 and other algorithms cover the pieces only sparsely with patterns. An abundance of overlapping patterns leads to poor lossless compression scores, and CS7 seems to find few overlapping patterns, as its lossless compression score is the highest overall (cf. Figure 11), notably also for piece 2, where it achieved high coverage too: this means that it found most notes of the piece as patterns, and that these patterns can be used to compress the piece very successfully.&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing the runtimes of new submissions, measured on different machines, with runtimes from previous years would not have been very conclusive. The new submission, CS7, completed analysis of the ground truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
DM1, the most successful algorithm for polyphonic discovery in previous years, again compares favourably on the establishment, occurrence, and three-layer metrics this year. The new submission, CS3, outperforms DM1-DM3 on piece 1 for the establishment measures (cf. Figures 12-14), and on piece 2 for occurrence precision (cf. Figure 15).&lt;br /&gt;
&lt;br /&gt;
Coverage again shows that DM1 and DM3 find patterns in almost all notes of the ground truth pieces (Figure 21). CS3, NF1, and DM2 (which was optimized for precision metrics) show lower coverage, with CS3 lowest overall. CS3 achieves the highest values overall in lossless compression (Figure 22).&lt;br /&gt;
&lt;br /&gt;
Runtimes were not evaluated this year, as comparing the runtimes of new submissions, measured on different machines, with runtimes from previous years would not have been very conclusive. The new submission, CS3, completed analysis of the ground truth pieces in a few minutes on a 2 GHz Intel Core i5 machine.&lt;br /&gt;
&lt;br /&gt;
==Discussion==&lt;br /&gt;
The new compression-based evaluation measures are not highly correlated with the metrics measuring retrieval of annotated patterns. This may be because lossless compression is lower for algorithms that find overlapping patterns: human annotators, and also some pattern discovery algorithms, may find valid overlapping patterns, as patterns may be hierarchically layered (e.g., motifs which are part of themes). We will add new, prediction-based measures, and new ground truth pieces, to the task next year.&lt;br /&gt;
&lt;br /&gt;
Berit Janssen, Iris Ren, Tom Collins, Anja Volk.&lt;br /&gt;
==Figures==&lt;br /&gt;
===SymMono===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 1.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 2.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 3.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 4.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 5.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_F1_occ75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 6.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 7.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 8.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Mono_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 9.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
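The two set-based similarity measures referenced in the captions above can be sketched in a few lines. This is a hypothetical illustration only (not the official evaluation code), with pattern occurrences P and Q represented as sets of (ontime, MNN) pairs:&lt;br /&gt;

```python
# Hedged sketch of the two similarity measures named in the captions.

def establishment_similarity(P, Q):
    # |P intersect Q| / max(|P|, |Q|): default for establishment recall/precision
    return len(P.intersection(Q)) / max(len(P), len(Q))

def three_layer_similarity(P, Q):
    # 2 * |P intersect Q| / (|P| + |Q|): an F1-like measure
    return 2 * len(P.intersection(Q)) / (len(P) + len(Q))

# Two hypothetical occurrences sharing two notes.
P = {(0, 60), (1, 62), (2, 64)}
Q = {(1, 62), (2, 64), (3, 65), (4, 67)}
print(establishment_similarity(P, Q))  # 0.5 (2 / max(3, 4))
print(three_layer_similarity(P, Q))    # about 0.571 (4 / 7)
```

Note that for occurrences of equal size the two measures coincide; they differ only when P and Q have different cardinalities.&lt;br /&gt;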
[[File:Mono_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 10.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
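The coverage measure just described can be sketched as follows. This is a hypothetical illustration rather than the official evaluation code; notes are represented as (ontime, MNN) pairs:&lt;br /&gt;

```python
# Hedged sketch of coverage: the fraction of a piece's notes that fall
# inside at least one discovered pattern occurrence.

def coverage(piece_notes, pattern_occurrences):
    """piece_notes: iterable of (ontime, mnn) tuples for the whole piece.
    pattern_occurrences: iterable of note sets, one per occurrence."""
    piece = set(piece_notes)
    covered = set()
    for occurrence in pattern_occurrences:
        covered.update(occurrence)
    return len(covered.intersection(piece)) / len(piece)

# Example: a 5-note piece where one 2-note pattern occurs twice,
# covering all but one note.
piece = [(0, 60), (1, 62), (2, 60), (3, 62), (4, 64)]
occurrences = [{(0, 60), (1, 62)}, {(2, 60), (3, 62)}]
print(coverage(piece, occurrences))  # 0.8
```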
[[File:2017_Mono_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 11.''' Lossless compression achieved by representing each piece/movement in terms of patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 12.''' Establishment recall averaged over each piece/movement. Establishment recall answers the following question. On average, how similar is the most similar algorithm-output pattern to a ground-truth pattern prototype?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 13.''' Establishment precision averaged over each piece/movement. Establishment precision answers the following question. On average, how similar is the most similar ground-truth pattern prototype to an algorithm-output pattern?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_est.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 14.''' Establishment F1 averaged over each piece/movement. Establishment F1 is an average of establishment precision and establishment recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 15.''' Occurrence recall (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence recall answers the following question. On average, how similar is the most similar set of algorithm-output pattern occurrences to a discovered ground-truth occurrence set?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 16.''' Occurrence precision (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence precision answers the following question. On average, how similar is the most similar discovered ground-truth occurrence set to a set of algorithm-output pattern occurrences?&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_F1_occ_75.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 17.''' Occurrence F1 (&amp;lt;math&amp;gt;c = .75&amp;lt;/math&amp;gt;) averaged over each piece/movement. Occurrence F1 is an average of occurrence precision and occurrence recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_R3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 18.''' Three-layer recall averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment recall), three-layer recall uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_P3.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 19.''' Three-layer precision averaged over each piece/movement. Rather than using &amp;lt;math&amp;gt;|P \cap Q|/\max\{|P|, |Q|\}&amp;lt;/math&amp;gt; as a similarity measure (which is the default for establishment precision), three-layer precision uses &amp;lt;math&amp;gt;2|P \cap Q|/(|P| + |Q|)&amp;lt;/math&amp;gt;, which is a kind of F1 measure.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_TLF1.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 20.''' Three-layer F1 (TLF) averaged over each piece/movement. TLF is an average of three-layer precision and three-layer recall.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_Coverage.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 21.''' Coverage of the discovered patterns of each piece/movement. Coverage measures the fraction of notes of a piece covered by discovered patterns.&lt;br /&gt;
&lt;br /&gt;
[[File:2017_Poly_LC.png|600px]]&lt;br /&gt;
&lt;br /&gt;
'''Figure 22.''' Lossless compression achieved by representing each piece/movement in terms of patterns discovered by a given algorithm. In addition to the patterns and their repetitions, the uncovered notes are also represented, so that the complete piece can be reconstructed from the compressed representation.&lt;br /&gt;
&lt;br /&gt;
==Tables==&lt;br /&gt;
===SymMono===&lt;br /&gt;
[https://www.dropbox.com/s/ajhvidekc4xl8vp/2017_drts_mono_patterns.csv?dl=0 Click to download SymMono pattern retrieval results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/ipt2j4jw0qmkvkh/2017_drts_mono_compression.csv?dl=0 Click to download SymMono compression results table]&lt;br /&gt;
&lt;br /&gt;
===SymPoly===&lt;br /&gt;
[https://www.dropbox.com/s/h8b731nlu8v1re0/2017_drts_poly_compression.csv?dl=0 Click to download SymPoly compression results table]&lt;br /&gt;
&lt;br /&gt;
[https://www.dropbox.com/s/2q0rj40szi2ybjn/2017_drts_poly_patterns.csv?dl=0 Click to download SymPoly pattern retrieval results table]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:MIREX2018_Results&amp;diff=12724</id>
		<title>2018:MIREX2018 Results</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:MIREX2018_Results&amp;diff=12724"/>
		<updated>2018-09-18T13:54:50Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Results by Task (More results are coming) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;==Overall Results Poster==&lt;br /&gt;
Coming soon.&lt;br /&gt;
&lt;br /&gt;
==Results by Task (More results are coming) ==&lt;br /&gt;
* Audio Melody Extraction Results&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/adc04/  ADC04 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/mrx05/ MIREX05 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/ind08/ INDIAN08 Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/mrx09_0db/ MIREX09 0dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/mrx09_m5db/ MIREX09 -5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/mrx09_p5db/ MIREX09 +5dB Dataset] &amp;amp;nbsp;&lt;br /&gt;
** [https://nema.lis.illinois.edu/nema_out/mirex2018/results/ame/orchset/ ORCHSET15 Dataset] &amp;amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
* [[2018:Audio Fingerprinting Results|Audio Fingerprinting Results]]&lt;br /&gt;
*Train-Test Task Set&lt;br /&gt;
** [https://www.music-ir.org/nema_out/mirex2018/results/act/mood_report/index.html Audio Music Mood Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp; &lt;br /&gt;
** [https://www.music-ir.org/nema_out/mirex2018/results/act/kmooda_report/ Audio KPOP Mood (Annotated by American Annotators) Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp;&lt;br /&gt;
** [https://www.music-ir.org/nema_out/mirex2018/results/act/kmoodk_report/ Audio KPOP Mood (Annotated by Korean Annotators) Classification Results ]&amp;amp;nbsp;&amp;amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
* [[2018:Automatic Lyrics-to-Audio Alignment Results | Automatic Lyrics-to-Audio Alignment Results]]&lt;br /&gt;
&lt;br /&gt;
* Multiple Fundamental Frequency Estimation &amp;amp; Tracking Results&lt;br /&gt;
** [[2018:Multiple_Fundamental_Frequency_Estimation_%26_Tracking_Results_%2D_MIREX_Dataset | MIREX Dataset]] &amp;amp;nbsp;&lt;br /&gt;
** [[2018:Multiple_Fundamental_Frequency_Estimation_%26_Tracking_Results_%2D_Su_Dataset | Su Dataset]] &amp;amp;nbsp;&lt;br /&gt;
&lt;br /&gt;
* [https://nema.lis.illinois.edu/nema_out/mirex2018/results/aod/ Audio Onset Detection Results] &amp;amp;nbsp;&lt;br /&gt;
* [[2018:Audio Cover Song Identification Results|Audio Cover Song Identification Results]]&lt;br /&gt;
&lt;br /&gt;
* [[2018:Patterns for Prediction Results|Patterns for Prediction Results]]&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12621</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12621"/>
		<updated>2018-07-31T13:51:38Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Submission Format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), all at gmail.com. Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, covering the subsequent 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
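As a sketch, the rows above could be parsed as follows. This is an illustrative reading of the CSV format, not official tooling:&lt;br /&gt;

```python
# Hedged sketch: parse symbolic-format rows into note-event dictionaries.
# Each row is: ontime, MIDI note number (MNN), morphetic pitch number (MPN),
# duration (quarter-note beats), channel.
import csv
from io import StringIO

rows = StringIO("20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n")
events = []
for ontime, mnn, mpn, dur, channel in csv.reader(rows):
    events.append({
        "ontime": float(ontime),
        "mnn": int(mnn),
        "mpn": int(mpn),
        "duration": float(dur),
        "channel": int(channel),
    })
print(events[0])  # {'ontime': 20.0, 'mnn': 64, 'mpn': 62, 'duration': 0.5, 'channel': 0}
```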
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset only if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
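The skylining step above can be sketched as follows. This is a hypothetical illustration of keeping the maximum-MNN event at each start time, not the actual preparation code:&lt;br /&gt;

```python
# Hedged sketch of "skylining": when two events in a channel share an
# ontime, keep only the one with the maximum MIDI note number (MNN).
# Events are (ontime, mnn, duration) tuples.

def skyline(events):
    best = {}
    for ontime, mnn, dur in events:
        if ontime not in best or mnn > best[ontime][1]:
            best[ontime] = (ontime, mnn, dur)
    return sorted(best.values())

# Two chords of two notes each reduce to their top notes.
events = [(0, 60, 1.0), (0, 67, 1.0), (1, 64, 0.5), (1, 62, 0.5)]
print(skyline(events))  # [(0, 67, 1.0), (1, 64, 0.5)]
```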
&lt;br /&gt;
For the polyphonic data, we applied the same minimum-event criteria of 20 in the prime and 10 in the continuation, as well as the maximum inter-ontime interval of 8 quarter-note beats in the prime, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
&lt;br /&gt;
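A much-simplified, hypothetical sketch of such an order-1 Markov generator is given below; the actual state space and transition matrix follow Collins and Laney (2017), whereas this toy version uses bare MNN-to-MNN transitions:&lt;br /&gt;

```python
# Hedged toy sketch of order-1 Markov foil generation over MNN sequences.
import random
from collections import defaultdict

def train_order1(pitches):
    # Record, for each pitch, the list of pitches that followed it.
    transitions = defaultdict(list)
    for a, b in zip(pitches, pitches[1:]):
        transitions[a].append(b)
    return transitions

def generate(transitions, start, n, seed=0):
    # Walk the chain n steps from the prime's final pitch.
    rng = random.Random(seed)
    out, current = [], start
    for _ in range(n):
        current = rng.choice(transitions[current])
        out.append(current)
    return out

prime = [60, 62, 64, 62, 60, 62, 64, 65, 64, 62]
model = train_order1(prime)
print(generate(model, prime[-1], 5))
```

As the task description notes, no attempt was made to nest this generation process in anything cognisant of repetitive or phrasal structure, which is what makes such continuations plausible foils.&lt;br /&gt;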
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;A&amp;quot; or &amp;quot;B&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;A&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder A, and column &amp;quot;B&amp;quot; similarly for the continuation in folder B.&lt;br /&gt;
&lt;br /&gt;
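A minimal sketch of writing the subtask (2) output file described above is given below; the ids and likelihood values here are placeholders, and only the id/A/B column layout follows the task description:&lt;br /&gt;

```python
# Hedged sketch of the subtask (2) output CSV: one row per
# prime-continuation pair, likelihoods in [0, 1] for folders A and B.
import csv
import io

# Placeholder judgements: (pair id, likelihood A genuine, likelihood B genuine).
judgements = [
    ("pair-001", 0.91, 0.09),
    ("pair-002", 0.34, 0.66),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "A", "B"])
writer.writerows(judgements)
print(buf.getvalue())
```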
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction for the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct: Pitch takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI}_4 = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
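The precision, recall, and F1 computation above can be sketched as follows (a minimal illustration only; the linked evaluate_prediction.py script is the authoritative implementation, and this sketch ignores the &amp;quot;up to translation&amp;quot; matching):&lt;br /&gt;

```python
def f1_over_sets(true_values, predicted_values):
    """Compute Prec, Rec, and F1 between a true set P and a predicted set Q."""
    P = set(true_values)       # true ontimes (or MNNs) in the evaluation window
    Q = set(predicted_values)  # predicted values in the same window
    if not P or not Q:
        return 0.0, 0.0, 0.0
    hits = len(P.intersection(Q))
    prec = hits / len(Q)
    rec = hits / len(P)
    f1 = 0.0 if hits == 0 else 2 * prec * rec / (prec + rec)
    return prec, rec, f1

# Example: true ontimes vs. predicted ontimes within the first four beats
prec, rec, f1 = f1_over_sets([36.0, 36.5, 37.0, 38.0], [36.0, 37.0, 39.0])
```

The same helper applies to Pitch_4 and Pitch_10 (over MNN sets) and, with (ontime, MNN) pairs as set elements, to Combo_4 and Combo_10.&lt;br /&gt;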
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
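The beat-location computation mentioned in this answer can be written out directly:&lt;br /&gt;

```python
def beat_times_sec(bpm, n_beats):
    """Main beat locations in a fixed-tempo WAV: 0, 60/bpm, 2 * 60/bpm, ... sec."""
    seconds_per_beat = 60.0 / bpm
    return [i * seconds_per_beat for i in range(n_beats)]

# At 120 BPM the first four beats fall at 0.0, 0.5, 1.0 and 1.5 seconds
assert beat_times_sec(120, 4) == [0.0, 0.5, 1.0, 1.5]
```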
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12620</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12620"/>
		<updated>2018-07-31T13:42:54Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Submission Format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
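A prime in this CSV format can be read into note-event tuples with a few lines of Python (a sketch only; the column order is as described above, and the helper name is our own):&lt;br /&gt;

```python
import csv

def load_prime(path):
    """Parse a symbolic prime CSV into (ontime, mnn, mpn, duration, channel) tuples."""
    events = []
    with open(path) as f:
        for row in csv.reader(f):
            ontime, mnn, mpn, dur, channel = row
            events.append((float(ontime), int(mnn), int(mpn), float(dur), int(channel)))
    return events
```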
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Böck, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
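The skylining step in the list above might be sketched as follows (a simplified illustration over (ontime, MNN) pairs; the actual pipeline operates on full note events):&lt;br /&gt;

```python
def skyline(events):
    """Keep one event per ontime, choosing the maximum MNN in the event of a clash.

    events: list of (ontime, mnn) pairs; returns a strictly monophonic list
    sorted by ontime.
    """
    best = {}
    for ontime, mnn in events:
        if ontime not in best or mnn > best[ontime]:
            best[ontime] = mnn
    return sorted(best.items())

# Two events share ontime 1.0; the higher MNN (67) survives
assert skyline([(0.0, 60), (1.0, 64), (1.0, 67)]) == [(0.0, 60), (1.0, 67)]
```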
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*input representation(s), should be 1 or more of &amp;quot;audio, monophonic&amp;quot;; &amp;quot;audio, polyphonic&amp;quot;; &amp;quot;symbolic, monophonic&amp;quot;; &amp;quot;symbolic, polyphonic&amp;quot;;&lt;br /&gt;
*subtasks you would like your algorithm to be evaluated on, should be &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;, or &amp;quot;1 and 2&amp;quot; (see first sentences of [[2018:Patterns_for_Prediction#Description]] for a reminder);&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
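A submission skeleton honouring this calling convention might look as follows; the prediction step itself is left as a stub, and the argument names beyond -i and -o are illustrative.&lt;br /&gt;

```python
import argparse
import os

def parse_args(argv=None):
    """Parse the -i/-o convention used in the example call above."""
    parser = argparse.ArgumentParser(
        description="Patterns for Prediction submission skeleton")
    parser.add_argument("-i", "--input_folder", required=True)
    parser.add_argument("-o", "--output_folder", required=True)
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    os.makedirs(args.output_folder, exist_ok=True)
    for name in sorted(os.listdir(args.input_folder)):
        if name.endswith(".csv"):
            # Read the prime CSV, predict a continuation, and write an
            # "ontime","MNN" CSV of the same name to args.output_folder.
            pass

# Call main() when run as a script.
```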
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction for the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct. It takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score (the harmonic mean of precision and recall), &amp;lt;math&amp;gt;\mathrm{IOI\_4} = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
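The precision/recall/F1 machinery shared by IOI_4/IOI_10, Pitch_4/Pitch_10, and Combo_4/Combo_10 can be sketched as follows. This is exact matching only; any &amp;quot;up to translation&amp;quot; tolerance mentioned above would be layered on top, and duplicates are removed as noted for the Pitch metrics.&lt;br /&gt;

```python
def set_f1(true_elems, predicted_elems):
    """Precision, recall, and F1 over unique elements, as used by
    IOI_4/IOI_10 (ontimes), Pitch_4/Pitch_10 (MNNs), and
    Combo_4/Combo_10 ((ontime, MNN) pairs)."""
    P, Q = set(true_elems), set(predicted_elems)
    if not P or not Q:
        return 0.0, 0.0, 0.0
    hits = len(P & Q)
    prec, rec = hits / len(Q), hits / len(P)
    f1 = 0.0 if hits == 0 else 2 * prec * rec / (prec + rec)
    return prec, rec, f1
```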
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
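The beat locations mentioned in this answer follow directly from the descriptor's bpm value, for example:&lt;br /&gt;

```python
def beat_times_sec(bpm, n_beats):
    """Main-beat onsets in the rendered WAV, in seconds: 0, 60/u,
    2*60/u, ..., where u is the bpm value from the descriptor file."""
    return [i * 60.0 / bpm for i in range(n_beats)]
```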
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12619</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12619"/>
		<updated>2018-07-31T13:39:10Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Submission Format */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
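A minimal parser for this row format (the function name is illustrative; column order is as described above):&lt;br /&gt;

```python
import csv

def read_prime_csv(path):
    """Read (ontime, MNN, morphetic pitch, duration, channel) rows from
    a prime or continuation CSV in the PPDD format."""
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            ontime, mnn, mpn, dur, channel = row[:5]
            events.append((float(ontime), int(mnn), int(mpn),
                           float(dur), int(channel)))
    return events
```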
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats.&lt;br /&gt;
We then &amp;quot;skylined&amp;quot; the remaining channels (independently) so that no two events had the same start time (the maximum MNN was chosen in the event of a clash), and double-checked that they still contained at least 20 events. One suitable channel was then selected at random, and the prime appears in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
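The skylining step can be sketched as follows; this is a hedged reconstruction of the description above (for each ontime, keep only the event with the maximum MNN), not the exact code used to prepare the dataset.&lt;br /&gt;

```python
def skyline(events):
    """events: iterable of (ontime, mnn, duration) tuples.
    Return one event per ontime, keeping the maximum MNN on a clash,
    sorted by ontime, so that no two events share a start time."""
    best = {}
    for ontime, mnn, dur in events:
        if ontime not in best or mnn > best[ontime][1]:
            best[ontime] = (ontime, mnn, dur)
    return [best[t] for t in sorted(best)]
```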
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
In terms of input representations, we will evaluate 4 largely independent versions of the task: audio, monophonic; audio, polyphonic; symbolic, monophonic; symbolic, polyphonic. Participants may submit algorithms to 1 or more of these versions, and should list these versions clearly in their readme. '''Irrespective of input representation''', all output for subtask (1) should be in &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV files. The CSV may contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively. All output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
IOI stands for the first inter-ontime interval. This metric evaluates whether the algorithm's prediction for the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct, taking the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score (the harmonic mean of precision and recall), IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
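The family of F1-based metrics above can be sketched in Python as follows. The example ontimes and the four-beat windowing helper are illustrative, and the sketch omits the possible &amp;quot;up to translation&amp;quot; matching:&lt;br /&gt;

```python
def f1_score(true_set, predicted_set):
    # Precision, recall, and F1 over sets, as in IOI_4, Pitch_4, Combo_4
    # and the 10-beat variants; duplicates are removed first, as noted
    # for the pitch metrics.
    p, q = set(true_set), set(predicted_set)
    if not p or not q:
        return 0.0
    hits = len(p.intersection(q))
    if hits == 0:
        return 0.0
    prec = hits / len(q)
    rec = hits / len(p)
    return 2 * prec * rec / (prec + rec)

# IOI_4-style example: the prime ends at ontime x0, and only continuation
# ontimes within the following four beats are scored.
x0 = 20
true_onts = [20.5, 21.0, 22.0, 23.5, 26.0]
pred_onts = [20.5, 21.0, 23.0]

def in_window(ontimes):
    return {x for x in ontimes if x > x0 and x0 + 4 >= x}

print(f1_score(in_window(true_onts), in_window(pred_onts)))  # 4/7, about 0.571
```

For Combo_4, the same function would be applied to sets of (ontime, MNN) pairs instead of bare ontimes.&lt;br /&gt;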
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to, for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
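A quick check of that beat arithmetic (the bpm value here is hypothetical):&lt;br /&gt;

```python
u = 120  # bpm read from a descriptor file (hypothetical value)
# Main beats in the WAV fall at 0, 60/u, 2*60/u, ... seconds.
beat_times = [k * 60 / u for k in range(8)]
print(beat_times)  # [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
```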
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12618</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12618"/>
		<updated>2018-07-31T13:36:46Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, covering up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
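As an illustration, the rows above can be read with a few lines of Python (the dictionary keys are illustrative labels, not part of the dataset specification):&lt;br /&gt;

```python
import csv
from io import StringIO

# The opening rows of the example prime, in the five-column format:
# ontime, MIDI note number, morphetic pitch number, duration, channel.
raw = """20,64,62,0.5,0
20.66667,65,63,0.25,0
21,67,64,0.5,0
"""

events = []
for ontime, mnn, mpn, dur, channel in csv.reader(StringIO(raw)):
    events.append({
        "ontime": float(ontime),
        "MNN": int(mnn),
        "MPN": int(mpn),
        "duration": float(dur),
        "channel": int(channel),
    })

print(events[0])  # {'ontime': 20.0, 'MNN': 64, 'MPN': 62, 'duration': 0.5, 'channel': 0}
```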
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bar 17-18 transition in the piano roll). Therefore we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
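The skylining step, for example, can be sketched as follows (a simplified reading of the procedure that ignores channels and treats events as (ontime, MNN, duration) triples):&lt;br /&gt;

```python
# "Skylining": when two events in a channel share an ontime, keep only
# the one with the maximum MIDI note number.
def skyline(events):
    best = {}
    for ontime, mnn, dur in events:
        kept = best.get(ontime)
        if kept is None or mnn > kept[1]:
            best[ontime] = (ontime, mnn, dur)
    return [best[t] for t in sorted(best)]

chord = [(0, 60, 1.0), (0, 64, 1.0), (1, 62, 0.5)]
print(skyline(chord))  # [(0, 64, 1.0), (1, 62, 0.5)]
```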
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for inter-ontime interval 1. It evaluates whether the algorithm's prediction for the time between the excerpt ending (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the continuation beginning (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI\_4} = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
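&lt;br /&gt;
As an illustration, the set-based F1 computation (ignoring the &amp;quot;up to translation&amp;quot; refinement) can be sketched in Python; the example ontimes below are hypothetical:&lt;br /&gt;

```python
def f1_sets(true_set, pred_set):
    # Precision, recall, and F1 for a set of predicted values
    # against the corresponding set of true values.
    if not pred_set or not true_set:
        return 0.0
    inter = len(true_set.intersection(pred_set))
    prec = inter / len(pred_set)
    rec = inter / len(true_set)
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)

# Hypothetical ontimes in the four beats after a prime ending at ontime 20:
P = {20.5, 21.0, 22.0, 23.5}   # true continuation ontimes
Q = {20.5, 21.0, 23.0}         # predicted ontimes
print(round(f1_sets(P, Q), 3))  # 0.571
```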
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = \mathrm{bpm}&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
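&lt;br /&gt;
For instance, the main-beat locations implied by the descriptor's bpm value could be computed as follows (a minimal sketch; the function and variable names are our own):&lt;br /&gt;

```python
def beat_times(bpm, n_beats):
    # Times in seconds of the first n_beats quarter-note beats,
    # assuming beat 1 falls at 0 sec and the tempo bpm is constant.
    period = 60.0 / bpm
    return [i * period for i in range(n_beats)]

print(beat_times(120, 4))  # [0.0, 0.5, 1.0, 1.5]
```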
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12617</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12617"/>
		<updated>2018-07-31T13:36:18Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmukler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
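&lt;br /&gt;
Rows in this format can be read into tuples with a few lines of Python (a sketch; the sample rows repeat the example above and the function name is ours):&lt;br /&gt;

```python
import csv
from io import StringIO

# The example prime rows from above, in the dataset's CSV format:
# ontime, MIDI note number, morphetic pitch number, duration, channel
sample = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"

def read_prime(text):
    # Parse CSV text into (ontime, MNN, MPN, duration, channel) tuples.
    rows = []
    for ontime, mnn, mpn, dur, channel in csv.reader(StringIO(text)):
        rows.append((float(ontime), int(mnn), int(mpn), float(dur), int(channel)))
    return rows

prime = read_prime(sample)
print(prime[0])  # (20.0, 64, 62, 0.5, 0)
```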
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the excerpt was included in the dataset only if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
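&lt;br /&gt;
The &amp;quot;skylining&amp;quot; step can be sketched as follows (our own minimal version, assuming events are (ontime, MNN, duration) tuples for one channel):&lt;br /&gt;

```python
def skyline(events):
    # Keep a single event per ontime, choosing the maximum MNN on a clash.
    best = {}
    for ontime, mnn, dur in events:
        kept = best.get(ontime)
        if kept is None or mnn > kept[1]:
            best[ontime] = (ontime, mnn, dur)
    # Return the surviving events in ontime order.
    return [best[t] for t in sorted(best)]

# Two events clash at ontime 0; the higher MNN (64) survives.
print(skyline([(0, 60, 1), (0, 64, 1), (1, 62, 0.5)]))
```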
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
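&lt;br /&gt;
As an illustration only (not the organisers' exact state space or transition matrix), an order-1 Markov generator over MNNs can be sketched as:&lt;br /&gt;

```python
import random

def markov_continuation(pitches, length, seed=0):
    # Learn order-1 transitions between consecutive MNNs in the prime,
    # then sample a continuation of the requested length.
    rng = random.Random(seed)
    trans = {}
    for a, b in zip(pitches, pitches[1:]):
        trans.setdefault(a, []).append(b)
    state = pitches[-1]
    out = []
    for _ in range(length):
        # Fall back to the full pitch list if the state has no transitions.
        state = rng.choice(trans.get(state, pitches))
        out.append(state)
    return out

print(markov_continuation([60, 62, 64, 62, 60], 5))
```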
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This metric stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
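&lt;br /&gt;
As a minimal sketch (the function names are our own, and note events are assumed to be (ontime, MNN, duration) triples), these two first-event metrics can be computed as follows:&lt;br /&gt;

```python
# First-event metrics: each note event is an (ontime, MNN, duration) triple,
# and each continuation is a list of such triples in increasing ontime order.

def ioi_metric(true_cont, pred_cont):
    # 1 if the predicted first ontime equals the true first ontime, else 0.
    return 1 if pred_cont[0][0] == true_cont[0][0] else 0

def pitch_metric(true_cont, pred_cont):
    # 1 if the predicted first MIDI note number equals the true one, else 0.
    return 1 if pred_cont[0][1] == true_cont[0][1] else 0
```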
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score, the harmonic mean of precision and recall: IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
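&lt;br /&gt;
A minimal Python sketch of these set-based F1 scores (exact-match intersections only; the possible &amp;quot;up to translation&amp;quot; allowance mentioned above is not implemented here, and the function names are our own):&lt;br /&gt;

```python
def f1_score(true_set, pred_set):
    # F1 of the exact-match intersection of two sets of values or tuples.
    true_set, pred_set = set(true_set), set(pred_set)
    if not true_set or not pred_set:
        return 0.0
    hits = len(true_set.intersection(pred_set))
    if hits == 0:
        return 0.0
    prec = hits / len(pred_set)
    rec = hits / len(true_set)
    return 2 * prec * rec / (prec + rec)

def ioi_window_score(x0, true_ontimes, pred_ontimes, window=4):
    # IOI_4-style score: F1 over ontimes within `window` beats of the
    # excerpt's final ontime x0 (window=10 gives IOI_10).
    within = lambda xs: set(x for x in xs if x - x0 <= window)
    return f1_score(within(true_ontimes), within(pred_ontimes))
```

For Pitch_4 and Pitch_10 the same f1_score function would be applied to the (deduplicated) MNN sets, and for Combo_4 and Combo_10 to sets of (ontime, MNN) pairs.&lt;br /&gt;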
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and on events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time-window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
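&lt;br /&gt;
For reference, the Shannon entropy of a discrete distribution over candidate continuations could be computed as follows (a generic sketch, not part of the proposed evaluation):&lt;br /&gt;

```python
import math

def entropy(probs):
    # Shannon entropy in bits of a discrete probability distribution,
    # skipping zero-probability outcomes (0 * log 0 is taken as 0).
    return -sum(p * math.log2(p) for p in probs if p > 0)
```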
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations. (You could do this by looking at the &amp;lt;math&amp;gt;u = bpm&amp;lt;/math&amp;gt; value, and then you would know that the main beats in the WAV file are at &amp;lt;math&amp;gt;0, 60/u, 2 \cdot 60/u,\ldots&amp;lt;/math&amp;gt; sec.)&lt;br /&gt;
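&lt;br /&gt;
For example, given the bpm value &amp;lt;math&amp;gt;u&amp;lt;/math&amp;gt; from the descriptor file, the beat locations could be computed with a couple of lines of Python (a sketch; the function name is our own):&lt;br /&gt;

```python
def beat_times(u, n_beats):
    # Times in seconds of the first n_beats main beats in the WAV file,
    # given a tempo of u beats per minute: 0, 60/u, 2*60/u, ...
    return [i * 60.0 / u for i in range(n_beats)]
```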
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12616</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12616"/>
		<updated>2018-07-31T13:31:55Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine. The algorithm should rate the likelihood that each continuation is the genuine extension of the prime, and these ratings again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
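&lt;br /&gt;
A prime or continuation in this CSV format can be read into a list of note-event tuples with a few lines of Python (a sketch; it assumes exactly the five columns described above, in the order ontime, MNN, morphetic pitch number, duration, channel):&lt;br /&gt;

```python
import csv

def read_prime(path):
    # Parse a PPDD prime/continuation CSV into a list of
    # (ontime, MNN, MPN, duration, channel) tuples.
    with open(path, newline='') as f:
        return [(float(t), int(mnn), int(mpn), float(dur), int(ch))
                for t, mnn, mpn, dur, ch in csv.reader(f)]
```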
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (the maximum MNN was chosen in the event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset only if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
&lt;br /&gt;
For the polyphonic data, we applied the minimum note-count criteria of 20 events in the prime and 10 in the continuation, as well as the maximum prime inter-ontime interval of 8 quarter-note beats, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
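&lt;br /&gt;
To give the flavour of such a generator, here is a much-simplified Python sketch over MNN states only (the actual state space and transition matrix follow Collins and Laney, 2017, and differ from this illustration):&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_order1(mnns):
    # Collect order-1 transition lists from a sequence of MIDI note numbers;
    # repeated successors are kept so that choices follow the empirical counts.
    trans = defaultdict(list)
    for a, b in zip(mnns, mnns[1:]):
        trans[a].append(b)
    return trans

def generate_foil(trans, start, n_events, seed=0):
    # Random-walk up to n_events states from the transition table,
    # stopping early if a state with no observed successor is reached.
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(n_events):
        choices = trans.get(state)
        if not choices:
            break
        state = rng.choice(choices)
        out.append(state)
    return out
```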
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask (1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask (2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, the algorithm judges to be genuine. This should be one CSV file for an entire dataset, with a first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, a second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and a column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
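&lt;br /&gt;
A sketch of writing the subtask (2) output file in Python (the header row and function name are our own assumptions, and the likelihood values below are placeholders):&lt;br /&gt;

```python
import csv

def write_likelihoods(path, rows):
    # rows: list of (id, likelihood_cont_1, likelihood_cont_2) triples,
    # one per prime-continuation pair in the dataset.
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['id', '1', '2'])
        for file_id, p1, p2 in rows:
            writer.writerow([file_id, p1, p2])
```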
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
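&lt;br /&gt;
A minimal Python entry point matching this calling format might look as follows (a sketch; the prediction logic itself is left as a stub):&lt;br /&gt;

```python
import argparse
import os

def parse_args(argv=None):
    # -i and -o match the calling format above.
    parser = argparse.ArgumentParser(description='Patterns for Prediction submission.')
    parser.add_argument('-i', '--input_folder', required=True)
    parser.add_argument('-o', '--output_folder', required=True)
    return parser.parse_args(argv)

def main(argv=None):
    args = parse_args(argv)
    os.makedirs(args.output_folder, exist_ok=True)
    # For each prime CSV in args.input_folder, write a predicted
    # continuation CSV (ontime, MNN in the first two columns) to
    # args.output_folder here.
```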
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This metric stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score, the harmonic mean of precision and recall: IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[2018:Patterns_for_Prediction#Submission_Format]], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet Anticipation: Music and the Psychology of Expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12615</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12615"/>
		<updated>2018-07-31T13:31:28Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
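As an illustration, rows in this format can be parsed with the Python standard library alone. This is a sketch rather than provided tooling; the column order is as described above (ontime, MNN, morphetic pitch number, duration, channel).&lt;br /&gt;

```python
import csv
from io import StringIO

# The three rows shown above; columns: ontime, MIDI note number,
# morphetic pitch number, duration (quarter-note beats), channel.
PRIME_CSV = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"

def read_prime(text):
    """Parse prime/continuation CSV text into typed event tuples."""
    events = []
    for ontime, mnn, mpn, dur, chan in csv.reader(StringIO(text)):
        events.append((float(ontime), int(mnn), int(mpn), float(dur), int(chan)))
    return events

prime = read_prime(PRIME_CSV)  # three events; prime[0][0] is the ontime 20.0
```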
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we applied the following filters and steps:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats.&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
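The skylining step can be sketched as follows. This is a hypothetical helper, not the organisers' preprocessing code: for each ontime, keep only the event with the maximum MNN.&lt;br /&gt;

```python
def skyline(events):
    """Keep one event per ontime, choosing the maximum MNN on a clash.

    events: list of (ontime, mnn, duration) tuples. A sketch of the
    skylining step described above, not the actual preprocessing code.
    """
    best = {}
    for ontime, mnn, dur in events:
        kept = best.get(ontime)
        if kept is None or mnn > kept[1]:
            best[ontime] = (ontime, mnn, dur)
    return [best[t] for t in sorted(best)]

# Two events clash at ontime 0; the higher MNN (64) survives.
mono = skyline([(0, 60, 1.0), (0, 64, 1.0), (1, 62, 0.5)])
```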
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
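For illustration only, an order-1 Markov generator of this general kind can be sketched as below. The states here are bare MIDI note numbers, which is a simplification; the actual state space and transition matrix used for the foils are those described in Collins and Laney (2017).&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_order1(sequence):
    """Count transitions between consecutive states (order-1 model)."""
    transitions = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        transitions[a].append(b)
    return transitions

def generate(transitions, start, n, seed=0):
    """Random walk of up to n states from the trained transition lists."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(n):
        nexts = transitions.get(state)
        if not nexts:
            break  # dead end: no observed continuation from this state
        state = rng.choice(nexts)
        out.append(state)
    return out

# Illustrative: states are bare MNNs here; the real foils use a richer state.
model = train_order1([60, 62, 64, 62, 60, 62, 64])
foil = generate(model, start=60, n=5)
```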
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, the algorithm judges to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
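For example, the subtask 2) output CSV could be written as follows. The ids and likelihood values below are invented, and the header row is included for readability only.&lt;br /&gt;

```python
import csv

# Invented ids and likelihoods for two prime-continuation pairs; columns
# as specified: id, likelihood for continuation 1, likelihood for continuation 2.
rows = [
    ("pair_0001", 0.91, 0.09),
    ("pair_0002", 0.34, 0.66),
]

with open("subtask2_output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "1", "2"])  # header row, added for readability
    writer.writerows(rows)
```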
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score, the harmonic mean of precision and recall: IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [[Patterns_for_Prediction#Submission_Format]], that submissions in one or more of the four representational categories (audio/symbolic crossed with monophonic/polyphonic) are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12614</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12614"/>
		<updated>2018-07-31T13:30:46Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins), all at gmail.com. Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
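&lt;br /&gt;
For concreteness, here is a minimal Python sketch of reading a prime in this five-column format into event tuples; the tuple layout mirrors the column order described above, and the variable names are our own.&lt;br /&gt;

```python
import csv
import io

# The opening rows of the example prime above:
# ontime, MIDI note number, morphetic pitch number, duration, channel.
raw = """20,64,62,0.5,0
20.66667,65,63,0.25,0
21,67,64,0.5,0
"""

events = []
for row in csv.reader(io.StringIO(raw)):
    ontime, mnn, mpn, dur, chan = row
    events.append((float(ontime), int(mnn), int(mpn), float(dur), int(chan)))

print(events[0])  # (20.0, 64, 62, 0.5, 0)
```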
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; however, in case you are concerned about overfitting, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
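&lt;br /&gt;
The &amp;quot;skylining&amp;quot; step above can be sketched as follows; this is a minimal version of the standard skyline idea (for each ontime, keep only the event with the maximum MNN), and the function and variable names are our own, not the organisers' implementation.&lt;br /&gt;

```python
def skyline(events):
    # Keep one event per ontime: the one with the highest MIDI note
    # number. Each event is an (ontime, mnn, duration) tuple.
    best = {}
    for ontime, mnn, dur in events:
        kept = best.get(ontime)
        if kept is None or mnn > kept[1]:
            best[ontime] = (ontime, mnn, dur)
    return [best[t] for t in sorted(best)]

# Two simultaneous notes at ontime 0.0: only the higher (MNN 64) survives.
events = [(0.0, 60, 1.0), (0.0, 64, 1.0), (1.0, 62, 0.5)]
print(skyline(events))  # [(0.0, 64, 1.0), (1.0, 62, 0.5)]
```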
&lt;br /&gt;
For the polyphonic data, we applied the minimum note-count criteria of 20 events in the prime and 10 in the continuation, as well as the maximum inter-ontime interval of 8 quarter-note beats in the prime, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
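&lt;br /&gt;
For intuition, here is a toy sketch of order-1 Markov generation over a state sequence. The real foil generator uses the state space and transition matrix of Collins and Laney (2017); the simplified state encoding and names below are ours, for illustration only.&lt;br /&gt;

```python
import random

def train_order1(sequence):
    # Record, for each state, the list of states that followed it.
    table = {}
    for a, b in zip(sequence, sequence[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, rng):
    # Random walk over the transition table, starting from `start`;
    # stops early if a state with no recorded successor is reached.
    out = [start]
    for _ in range(length - 1):
        successors = table.get(out[-1])
        if not successors:
            break
        out.append(rng.choice(successors))
    return out

# Toy state sequence: (MNN, duration) pairs standing in for a melody.
prime = [(60, 1.0), (62, 0.5), (64, 0.5), (62, 0.5), (60, 1.0), (62, 0.5)]
table = train_order1(prime)
foil = generate(table, prime[-1], 5, random.Random(0))
print(foil)
```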
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and third column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
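&lt;br /&gt;
A short Python sketch of producing the subtask 2) output CSV in the shape described above; the pair ids and likelihood values here are invented purely for illustration.&lt;br /&gt;

```python
import csv

# Each row: prime-continuation pair id, likelihood (in [0, 1]) that the
# continuation in folder 1 is genuine, and likewise for folder 2.
# The ids and values below are invented for illustration.
rows = [
    ("pair_000123", 0.91, 0.09),
    ("pair_000124", 0.34, 0.66),
]

with open("subtask2_output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "1", "2"])  # header as described above
    writer.writerows(rows)
```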
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for inter-ontime interval (here, the first such interval of the continuation). It evaluates whether the algorithm's prediction for the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score (the harmonic mean of precision and recall), IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalized.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We have clarified, at the top of [Submission Format], that submissions in 1-4 representational categories are acceptable. It's also OK, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music  Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I.Y., Koops, H.V, Volk, A., Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12613</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12613"/>
		<updated>2018-07-31T13:15:06Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files. Therefore, the audio and symbolic variants are identical to one another, apart from the presence of WAV files. All other variants are non-identical, although there may be some overlap, as they were all chosen from LMD originally.&lt;br /&gt;
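A minimal reader for this five-column format can be sketched as follows. This is our own illustration, not part of the dataset tooling; the field names are our invention, and the sample rows are the three shown above.&lt;br /&gt;

```python
import csv
from io import StringIO

# Hypothetical parser for the symbolic CSV format described above:
# ontime, MIDI note number, morphetic pitch number, duration, channel.
def read_events(csv_text):
    events = []
    for row in csv.reader(StringIO(csv_text)):
        ontime, mnn, mpn, dur, ch = row
        events.append({
            "ontime": float(ontime),  # quarter-note beats from bar 1 beat 1
            "mnn": int(mnn),          # MIDI note number
            "mpn": int(mpn),          # estimated morphetic pitch number
            "dur": float(dur),        # duration in quarter-note beats
            "channel": int(ch),
        })
    return events

sample = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"
events = read_events(sample)  # three note events
```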
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
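The &amp;quot;skylining&amp;quot; step above can be sketched as follows (our own minimal illustration with invented events, not the actual preparation code): for events sharing a start time, keep only the one with the maximum MNN.&lt;br /&gt;

```python
# Sketch of skylining: collapse simultaneous events to the highest MNN.
# Events are (ontime, mnn, dur) tuples; data below are invented.
def skyline(events):
    by_ontime = {}
    for ontime, mnn, dur in events:
        best = by_ontime.get(ontime)
        if best is None or mnn > best[1]:
            by_ontime[ontime] = (ontime, mnn, dur)  # keep max MNN per ontime
    return [by_ontime[t] for t in sorted(by_ontime)]

notes = [(0.0, 60, 1.0), (0.0, 64, 1.0), (1.0, 62, 0.5)]
mono = skyline(notes)  # no two events now share a start time
```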
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
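An order-1 Markov generator of the kind described above can be sketched as follows. This is a toy illustration over bare MNNs with invented data; the actual state space and transition matrix follow Collins and Laney (2017).&lt;br /&gt;

```python
import random
from collections import defaultdict

# Illustrative order-1 Markov chain over MIDI note numbers, in the spirit
# of the foil-generation process described above (toy version, not the
# actual generation code).
def train(sequence):
    """Map each state to the list of states that followed it."""
    transitions = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        transitions[a].append(b)
    return transitions

def generate(transitions, start, n, rng):
    """Random-walk n states from `start`, stopping at any dead end."""
    state, out = start, []
    for _ in range(n):
        successors = transitions.get(state)
        if not successors:
            break
        state = rng.choice(successors)
        out.append(state)
    return out

prime = [60, 62, 64, 62, 60, 62, 64]   # invented prime MNNs
model = train(prime)
foil = generate(model, prime[-1], 5, random.Random(0))  # 5 foil MNNs
```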
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
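The subtask 2) output format above can be written as in the following sketch. The file name, the helper name, and the example ids/likelihood values are our own assumptions, not prescribed by the task.&lt;br /&gt;

```python
import csv

# Hypothetical writer for the subtask (2) output CSV described above:
# one row per prime-continuation pair, with likelihoods in [0, 1] that
# the continuations in folders 1 and 2 are genuine.
def write_likelihoods(path, rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "1", "2"])
        for file_id, lik1, lik2 in rows:
            writer.writerow([file_id, f"{lik1:.4f}", f"{lik2:.4f}"])

# Example ids reuse the file names linked above; likelihoods are invented.
write_likelihoods("subtask2_output.csv", [
    ("0a983538-61b5-4b9d-9ad9-23e05f548e5c", 0.91, 0.09),
    ("001f5992-527d-4e04-8869-afa7cbb74cd0", 0.35, 0.65),
])
```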
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct. It takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score: IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
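Ignoring the optional translation-invariance, IOI_4 reduces to an F1 computation over ontime sets. A minimal sketch of our own (not the linked evaluation script) follows; treating the window as the half-open interval (x_0, x_0 + 4] is our assumption, and the ontimes are invented.&lt;br /&gt;

```python
# Sketch of IOI_4 without translation-invariance: F1 between the true and
# predicted ontime sets within four beats of the end of the prime.
def ioi_4(x0, true_ontimes, pred_ontimes):
    window = lambda xs: {x for x in xs if x0 < x <= x0 + 4}
    P, Q = window(true_ontimes), window(pred_ontimes)
    if not P or not Q:
        return 0.0
    hits = len(P & Q)
    prec, rec = hits / len(Q), hits / len(P)
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

# Prime ends at beat 35; two of three predicted ontimes are correct.
score = ioi_4(35.0, [36.0, 37.0, 38.5], [36.0, 37.5, 38.5])
```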
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same way as IOI_4 and IOI_10, respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and on events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time-window length, to pinpoint changes in performance more accurately.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We are happy to receive submissions of algorithms that process audio or symbolic data, but not both. That said, it's fine, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*Further suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). Deepbach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music  Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12612</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12612"/>
		<updated>2018-07-30T19:38:50Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, covering the 10 quarter-note beats after the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
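For illustration, here is a minimal reader for this five-column CSV format; the field names are my own labels for the columns described above, not part of the dataset specification.

```python
import csv
from collections import namedtuple

# Columns: ontime (quarter-note beats), MIDI note number, morphetic
# pitch number, duration (quarter-note beats), channel.
Event = namedtuple("Event", ["ontime", "mnn", "mpn", "duration", "channel"])

def read_prime(path):
    """Parse a prime (or continuation) CSV into a list of note events."""
    with open(path, newline="") as f:
        return [
            Event(float(r[0]), int(r[1]), int(r[2]), float(r[3]), int(r[4]))
            for r in csv.reader(f) if r
        ]
```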
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
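The skylining step above (keeping, for each start time, only the highest MNN) could be sketched as follows; this is an illustrative reconstruction, not the organisers' actual preparation script, and the tuple layout is an assumption.

```python
def skyline(events):
    """Keep, for each ontime, only the event with the maximum MIDI note
    number. Events are (ontime, mnn, duration) tuples."""
    best = {}
    for ontime, mnn, dur in events:
        # On a clash at the same ontime, the higher MNN wins.
        if ontime not in best or mnn > best[ontime][1]:
            best[ontime] = (ontime, mnn, dur)
    return [best[t] for t in sorted(best)]

# Two events clash at ontime 1.0; the higher MNN (67) survives:
# skyline([(0.0, 60, 1.0), (1.0, 64, 0.5), (1.0, 67, 0.5)])
# -> [(0.0, 60, 1.0), (1.0, 67, 0.5)]
```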
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
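A toy version of such an order-1 generation process is sketched below. The actual state space and transition matrix follow Collins and Laney (2017); this sketch uses bare MNNs as states purely for illustration.

```python
import random
from collections import defaultdict

def train_order1(sequence):
    """Record successors of each state (here, plain MNNs) in a
    first-order transition table."""
    transitions = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        transitions[a].append(b)
    return transitions

def generate_foil(transitions, start, n_events, seed=0):
    """Random-walk continuation of up to n_events states from `start`."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(n_events):
        choices = transitions.get(state)
        if not choices:
            break  # dead end: state never observed with a successor
        state = rng.choice(choices)
        out.append(state)
    return out
```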
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
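The subtask 2 output CSV described above might be written as follows; the function name is my own, and the likelihood values in the comment are placeholders.

```python
import csv

def write_likelihoods(rows, path):
    """Write the subtask 2 results CSV.
    rows: list of (pair_id, p_continuation_1, p_continuation_2) tuples,
    with each likelihood in [0, 1]."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "1", "2"])  # header per the task description
        writer.writerows(rows)

# e.g. write_likelihoods([("somepairid", 0.9, 0.1)], "results.csv")
```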
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm correctly predicts the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;). The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
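A direct, exact-match reading of this definition can be sketched as below. This is illustrative only: it does not implement the possible &quot;up to translation&quot; matching mentioned above, and the official code lives in the GitHub repository linked in the evaluation section.

```python
def ioi_f1(true_ontimes, predicted_ontimes, window, x0=0.0):
    """IOI_window: F1 between the sets of true and predicted ontimes
    falling within `window` beats after the end of the prime (x0).
    Exact set matching only (no translation invariance)."""
    p = {x for x in true_ontimes if x0 < x <= x0 + window}
    q = {x for x in predicted_ontimes if x0 < x <= x0 + window}
    if not p or not q:
        return 0.0  # assumption: nothing to match scores 0
    prec = len(p & q) / len(q)
    rec = len(p & q) / len(p)
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)
```

With `window=4` this plays the role of IOI_4, and with `window=10` that of IOI_10.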
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time-window length, to pinpoint changes in performance more accurately.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered this too, but opted not to include it for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. Would any of your participants be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We are happy to receive submissions of algorithms that process audio or symbolic data, but not both. That said, it's fine, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12611</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12611"/>
		<updated>2018-07-30T19:33:55Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions (Q), Answers (A), and Comments (C) */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and the datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2016), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2016). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
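&lt;br /&gt;
As an illustration, the following Python sketch parses rows in this five-column format into note events. It is not part of the official tooling, and the field names are our own labels.&lt;br /&gt;

```python
import csv
from io import StringIO

# The three example rows above, in the PPDD five-column CSV format:
# ontime, MIDI note number (MNN), morphetic pitch number (MPN), duration, channel
raw = """20,64,62,0.5,0
20.66667,65,63,0.25,0
21,67,64,0.5,0"""

def parse_prime(text):
    """Parse PPDD-style CSV text into a list of note-event dicts."""
    events = []
    for ontime, mnn, mpn, dur, channel in csv.reader(StringIO(text)):
        events.append({
            "ontime": float(ontime),   # quarter-note beats
            "mnn": int(mnn),           # MIDI note number
            "mpn": int(mpn),           # estimated morphetic pitch number
            "dur": float(dur),         # duration in quarter-note beats
            "channel": int(channel),
        })
    return events

events = parse_prime(raw)
print(events[0]["ontime"], events[0]["mnn"])  # 20.0 64
```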
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but, in case you are concerned about overfitting, it is not drawn from the LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
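&lt;br /&gt;
The &amp;quot;skylining&amp;quot; step can be sketched as below; this is a minimal reconstruction of the idea (keep the maximum MNN at each ontime), not the exact script used to prepare the datasets.&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    """Keep only the highest MIDI note number at each ontime ('skylining')."""
    by_ontime = defaultdict(list)
    for ontime, mnn in events:
        by_ontime[ontime].append(mnn)
    # Maximum MNN is chosen in the event of a clash, as in the filtering step.
    return sorted((t, max(mnns)) for t, mnns in by_ontime.items())

# Invented (ontime, MNN) pairs with two clashes, at ontimes 0.0 and 1.0
notes = [(0.0, 60), (0.0, 64), (0.5, 62), (1.0, 59), (1.0, 67)]
print(skyline(notes))  # [(0.0, 64), (0.5, 62), (1.0, 67)]
```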
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
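&lt;br /&gt;
A first-order Markov foil generator of the kind described can be sketched as follows. This toy version uses MNNs alone as states; Collins and Laney (2017) define the state space and transition matrix actually used.&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_order1(sequence):
    """Count first-order transitions between states (here, MIDI note numbers)."""
    transitions = defaultdict(list)
    for a, b in zip(sequence, sequence[1:]):
        transitions[a].append(b)
    return transitions

def generate_foil(transitions, start, length, rng):
    """Random walk over the transition table; deliberately has no awareness
    of repetitive or phrasal structure, as with the foil continuations."""
    state, out = start, []
    for _ in range(length):
        successors = transitions.get(state)
        if not successors:
            break  # dead end: state never continued in the training data
        state = rng.choice(successors)
        out.append(state)
    return out

prime_mnns = [60, 62, 64, 62, 60, 62, 64, 65, 64]  # invented prime melody
model = train_order1(prime_mnns)
foil = generate_foil(model, start=prime_mnns[-1], length=10, rng=random.Random(0))
print(foil)
```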
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
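&lt;br /&gt;
To make the subtask 2) output format concrete, here is a short sketch that writes such a CSV; the ids and likelihood values are invented for illustration.&lt;br /&gt;

```python
import csv
from io import StringIO

# Hypothetical prime-continuation pair ids, with the likelihood that the
# continuation in folder 1 / folder 2 is the genuine one.
results = [
    ("0a983538", 0.91, 0.09),
    ("001f5992", 0.35, 0.65),
    ("fc2fda7c", 0.50, 0.50),
]

buf = StringIO()  # in a real submission, open a file in the output folder
writer = csv.writer(buf)
writer.writerow(["id", "1", "2"])
for pair_id, like1, like2 in results:
    writer.writerow([pair_id, like1, like2])

print(buf.getvalue().strip())
```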
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct. It takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI\_4} = 2\,\mathrm{Prec}(P, Q)\,\mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
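&lt;br /&gt;
A minimal sketch of the set-based precision/recall/F1 computation behind IOI_4, Pitch_4, and Combo_4 follows; it omits the probable &amp;quot;up to translation&amp;quot; matching, and the events are invented for illustration.&lt;br /&gt;

```python
def f1_score(true_items, pred_items):
    """Precision, recall, and F1 over unique elements, as in IOI_4/Pitch_4/Combo_4."""
    p, q = set(true_items), set(pred_items)
    if not p or not q:
        return 0.0
    inter = len(p.intersection(q))
    prec = inter / len(q)   # |P n Q| / |Q|
    rec = inter / len(p)    # |P n Q| / |P|
    if prec + rec == 0:
        return 0.0
    return 2 * prec * rec / (prec + rec)

# Invented true and predicted (ontime, MNN) events in the evaluation window
true_events = [(20.0, 64), (20.5, 65), (21.0, 67), (22.0, 64)]
pred_events = [(20.0, 64), (20.5, 64), (21.0, 67)]

ioi = f1_score([t for t, _ in true_events], [t for t, _ in pred_events])
pitch = f1_score([m for _, m in true_events], [m for _, m in pred_events])
combo = f1_score(true_events, pred_events)
print(round(ioi, 3), round(pitch, 3), round(combo, 3))  # 0.857 0.8 0.571
```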
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
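&lt;br /&gt;
For reference, the Shannon entropy of a distribution over candidate continuations can be computed as below; the two distributions are invented for illustration.&lt;br /&gt;

```python
import math

def entropy(probs):
    """Shannon entropy (in bits) of a discrete distribution over continuations."""
    return -sum(p * math.log2(p) for p in probs if p)

# Hypothetical listener-generated distribution over four candidate continuations
listener = [0.6, 0.2, 0.1, 0.1]
# Hypothetical algorithm distribution over the same four candidates
algorithm = [0.4, 0.3, 0.2, 0.1]

# A flatter distribution has higher entropy (less certainty about what comes next)
print(round(entropy(listener), 3), round(entropy(algorithm), 3))
```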
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including these as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
C. to_the_sun--at--gmx.com writes: &amp;quot;This is exactly what I'm interested in! I have an open-source project called The Amanuensis (https://github.com/to-the-sun/amanuensis) that uses an algorithm to predict where in the future beats are likely to fall.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;Amanuensis constructs a cohesive song structure, using the best of what you give it, looping around you and growing in real-time as you play. All you have to do is jam and fully written songs will flow out behind you wherever you go.&lt;br /&gt;
&lt;br /&gt;
&amp;quot;My algorithm right now is only rhythm-based and I'm sure it's not sophisticated enough to be entered into your contest, but I would be very interested in the possibility of using any of the algorithms that are, in place of mine in The Amanuensis. I wonder if any of your participants would be interested in some collaboration? What I can bring to the table would be a real-world application for these algorithms, already set for implementation.&amp;quot;&lt;br /&gt;
&lt;br /&gt;
Q. I'm interested in performing this task on the symbolic dataset, but I don't have an audio-based algorithm. It was unclear to me if the inputs are audio, symbolic, both, or either.&lt;br /&gt;
&lt;br /&gt;
A. We are happy to receive submissions of algorithms that process audio or symbolic data, but not both. That said, it's fine, say, for an audio-based algorithm to make use of the descriptor file in order to determine beat locations.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12610</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12610"/>
		<updated>2018-07-30T19:27:58Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Questions and Comments */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2017), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2017). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
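The CSV format above is straightforward to parse in most languages. As an unofficial illustration, here is a minimal Python sketch (the file name prime.csv is hypothetical) that reads one of these files into a list of note events:&lt;br /&gt;

```python
import csv

def read_events(path):
    """Parse a PPDD symbolic CSV file into a list of note events.

    Columns: ontime (quarter-note beats), MIDI note number (MNN),
    morphetic pitch number (MPN), duration (quarter-note beats), channel.
    """
    events = []
    with open(path, newline="") as f:
        for row in csv.reader(f):
            if not row:
                continue
            events.append({
                "ontime": float(row[0]),
                "mnn": int(float(row[1])),
                "mpn": int(float(row[2])),
                "duration": float(row[3]),
                "channel": int(float(row[4])),
            })
    return events
```

For example, read_events("prime.csv") applied to the rows above would give a first event with ontime 20.0, MNN 64, MPN 62, duration 0.5, and channel 0.&lt;br /&gt;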
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bar 17-18 transition in the piano roll). Therefore we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
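As a rough illustration of the kind of process involved, here is a minimal order-1 Markov sketch over (MNN, duration) states. This is '''not''' the exact Collins and Laney (2017) procedure (their state space and transition matrix are richer); it only shows the general mechanism:&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_order1(states):
    """Build an order-1 transition table from a sequence of states,
    e.g. (MNN, duration) tuples taken from a channel."""
    table = defaultdict(list)
    for a, b in zip(states, states[1:]):
        table[a].append(b)
    return table

def generate(table, seed_state, n, rng=None):
    """Random-walk n states from seed_state; on a dead end,
    restart from a randomly chosen known state."""
    rng = rng or random.Random(0)
    out, state = [], seed_state
    for _ in range(n):
        choices = table.get(state)
        if choices:
            state = rng.choice(choices)
        else:
            state = rng.choice(list(table.keys()))
        out.append(state)
    return out
```

Because the model has no notion of repetitive or phrasal structure, its continuations are locally plausible but structurally naive, which is exactly what makes them useful foils.&lt;br /&gt;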
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and third column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
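To illustrate the subtask 2) output format, here is an unofficial Python sketch that writes such a CSV file (the header row and the function name are our assumptions, not part of the specification):&lt;br /&gt;

```python
import csv

def write_subtask2(path, rows):
    """Write a subtask 2) results file.

    rows: iterable of (pair_id, likelihood_1, likelihood_2) tuples,
    where pair_id names a prime-continuation pair and each likelihood
    lies in [0, 1].
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "1", "2"])  # header row: an assumption
        for pair_id, p1, p2 in rows:
            writer.writerow([pair_id, p1, p2])
```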
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for inter-ontime interval 1. It evaluates whether the algorithm's prediction for the time between the excerpt ending (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the continuation beginning (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct. It takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score (harmonic mean) of precision and recall, IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
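Setting aside the possible up-to-translation matching (which is not implemented below), a minimal Python sketch of this precision/recall/F1 calculation is:&lt;br /&gt;

```python
def f1_metric(true_values, predicted_values):
    """F1 over sets of values: ontimes for IOI_4/IOI_10,
    MNNs for Pitch_4/Pitch_10.

    Plain exact matching only; up-to-translation matching
    is not implemented here.
    """
    P, Q = set(true_values), set(predicted_values)
    if not P or not Q:
        return 0.0
    inter = len(P.intersection(Q))
    if inter == 0:
        return 0.0
    prec = inter / len(Q)
    rec = inter / len(P)
    return 2 * prec * rec / (prec + rec)
```

For example, with true ontimes {20, 20.5, 21, 21.5} and predicted ontimes {20, 21, 22}, precision is 2/3, recall is 1/2, and F1 is 4/7.&lt;br /&gt;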
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions (Q), Answers (A), and Comments (C)==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this also, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12609</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12609"/>
		<updated>2018-07-30T18:58:53Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2017), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2017). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
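&lt;br /&gt;
To make this event format concrete, the following sketch (not part of the official task materials) parses such rows into Python dicts; the field names are our own illustrative labels, not mandated by the task:&lt;br /&gt;

```python
import csv
from io import StringIO

# Each row of a PPDD symbolic file holds, in order:
# ontime, MIDI note number, morphetic pitch number,
# duration (quarter-note beats), channel.
FIELDS = ("ontime", "mnn", "mpn", "dur", "channel")

def parse_events(text):
    """Parse PPDD-style CSV text into a list of event dicts (illustrative helper)."""
    events = []
    for row in csv.reader(StringIO(text)):
        if row:
            events.append(dict(zip(FIELDS, [float(v) for v in row])))
    return events

sample = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"
events = parse_events(sample)
# events[0] is the first prime event: ontime 20.0, MNN 64.0, MPN 62.0
```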
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed, but it is not from LMD, if you are concerned about overfitting.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has occurred before (see the bars 17-18 transition in the piano roll). Therefore, we (and/or an algorithm) might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Böck, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
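&lt;br /&gt;
The foil-generation idea can be illustrated with a toy first-order Markov chain over MIDI note numbers. This is only a sketch under simplified assumptions; the actual state space and transition matrix follow Collins and Laney (2017), not this code:&lt;br /&gt;

```python
import random

def train_markov(mnns):
    """Collect first-order transition lists from a sequence of MIDI note numbers."""
    table = {}
    for a, b in zip(mnns, mnns[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate_foil(table, start, n, seed=0):
    """Random-walk n states onward from `start` using the transition table."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(n):
        successors = table.get(state)
        if not successors:
            break  # dead end: no observed continuation from this state
        state = rng.choice(successors)
        out.append(state)
    return out

# Hypothetical prime MNNs; the foil continues the chain from the final note.
prime = [60, 62, 64, 62, 60, 62, 64, 65, 64, 62]
foil = generate_foil(train_markov(prime), prime[-1], 5)
```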
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, the algorithm judges to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for the first inter-ontime interval. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score, i.e., the harmonic mean of precision and recall: IOI_4 = 2*Prec(P, Q)*Rec(P, Q)/(Prec(P, Q) + Rec(P, Q)). These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
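&lt;br /&gt;
As an illustration, the cardinality-based F1 computation can be sketched as follows (exact matching only; the &amp;quot;up to translation&amp;quot; refinement is not implemented here, and the ontime values are hypothetical):&lt;br /&gt;

```python
def f1_score(true_set, pred_set):
    """Cardinality-based precision, recall, and F1 on ontime sets."""
    if not true_set or not pred_set:
        return 0.0
    hits = len(true_set.intersection(pred_set))
    if hits == 0:
        return 0.0
    prec = hits / len(pred_set)   # |P ∩ Q| / |Q|
    rec = hits / len(true_set)    # |P ∩ Q| / |P|
    return 2 * prec * rec / (prec + rec)

# Hypothetical ontimes within 4 beats of the end of the prime
P = {20.0, 20.5, 21.0, 22.0}   # true continuation ontimes
Q = {20.0, 20.5, 21.5, 22.0}   # algorithm's predicted ontimes
ioi_4 = f1_score(P, Q)  # 3 hits: precision 3/4, recall 3/4, F1 0.75
```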
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions and Comments==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., and Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., and Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., and Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12608</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12608"/>
		<updated>2018-07-30T18:57:55Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. Because the deadline is so close, '''we intend this task description and the datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2017), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2017). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
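The five-column event format above can be parsed in a few lines. The following sketch is our own illustration (the field names are labels we chose, not part of the official format):&lt;br /&gt;

```python
import csv
from io import StringIO

def read_prime(csv_text):
    """Parse the CSV representation of a prime into a list of event dicts.

    Columns, per the description above: ontime, MIDI note number (MNN),
    estimated morphetic pitch number, duration, channel.
    """
    events = []
    for row in csv.reader(StringIO(csv_text)):
        if not row:
            continue
        ontime, mnn, morph, dur, chan = row
        events.append({
            "ontime": float(ontime),    # quarter-note beats from bar 1 beat 1
            "mnn": int(mnn),            # MIDI note number
            "morphetic": int(morph),    # estimated morphetic pitch number
            "duration": float(dur),     # quarter-note beats
            "channel": int(chan),
        })
    return events

example = "20,64,62,0.5,0\n20.66667,65,63,0.25,0\n21,67,64,0.5,0\n"
prime = read_prime(example)
```

In practice one would read the file from disk rather than a string; the parsing logic is the same.&lt;br /&gt;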
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; in case you are concerned about overfitting, however, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_large.zip audio, polyphonic, large] (13.44 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(&amp;quot;Large&amp;quot; datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has occurred before (see the bars 17-18 transition in the piano roll). Therefore we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (the maximum MNN was chosen in the event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if the continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
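The skylining step described above can be sketched as follows. This is a minimal reading of the description (group events by ontime, keep the maximum MNN per ontime), not the organisers' actual implementation:&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    """Reduce a channel to at most one event per ontime, keeping the event
    with the maximum MIDI note number (MNN) when several events clash.

    Each event is an (ontime, mnn, duration) tuple.
    """
    by_ontime = defaultdict(list)
    for ev in events:
        by_ontime[ev[0]].append(ev)
    # For each ontime, keep the event with the highest MNN, then restore
    # temporal order.
    return sorted(max(group, key=lambda ev: ev[1])
                  for group in by_ontime.values())

notes = [(0, 60, 1.0), (0, 67, 1.0), (1, 64, 0.5)]
mono = skyline(notes)  # the clash at ontime 0 resolves to MNN 67
```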
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
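The flavour of such an order-1 generator can be sketched as below. For simplicity we use bare MNNs as states, which is an assumption for illustration only; the actual state space and transition matrix follow Collins and Laney (2017):&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_transitions(pitches):
    """Order-1 transition table: state -> list of observed next states.
    Duplicates are kept, so sampling respects the empirical frequencies."""
    table = defaultdict(list)
    for a, b in zip(pitches, pitches[1:]):
        table[a].append(b)
    return table

def generate_foil(table, start, n, rng=random.Random(0)):
    """Random walk of n steps from `start`. If a state has no observed
    successors we stay put, a crude stand-in for proper smoothing."""
    out, state = [], start
    for _ in range(n):
        successors = table[state]
        state = rng.choice(successors) if successors else state
        out.append(state)
    return out

melody = [60, 62, 64, 62, 60, 62, 64, 65, 64]
table = train_transitions(melody)
foil = generate_foil(table, start=melody[-1], n=10)
```

As the surrounding text notes, nothing in this process is aware of repetitive or phrasal structure, which is precisely what makes the foils distinguishable in principle from genuine continuations.&lt;br /&gt;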
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should be an indication of which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, the algorithm judges to be genuine. This should be one CSV file for an entire dataset, with a first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, a second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and a column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
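The subtask 2) output file described above might be assembled as in the following sketch. The ids and likelihood values are placeholders, and only the column layout (&amp;quot;id&amp;quot;, &amp;quot;1&amp;quot;, &amp;quot;2&amp;quot;) is taken from the description:&lt;br /&gt;

```python
import csv
import io

# Each row: prime-continuation pair id, likelihood that the continuation in
# folder 1 is genuine, likelihood for the continuation in folder 2.
rows = [
    ("0a983538-61b5-4b9d-9ad9-23e05f548e5c", 0.91, 0.09),
    ("001f5992-527d-4e04-8869-afa7cbb74cd0", 0.34, 0.66),
]

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["id", "1", "2"])   # one header row for the whole dataset
writer.writerows(rows)
output_csv = buf.getvalue()
# In a real submission, write output_csv to a file in the output folder.
```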
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for inter-ontime interval 1. It evaluates whether the algorithm's prediction of the time between the end of the excerpt (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct: it takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI}_4 = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q) / (\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
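Setting aside the &amp;quot;up to translation&amp;quot; refinement, the F1-style metrics above all reduce to set intersections. A minimal sketch, with illustrative (not dataset) values:&lt;br /&gt;

```python
def prf1(true_set, pred_set):
    """Precision, recall and F1 between true and predicted sets,
    following the Prec/Rec/F1 definitions in the text."""
    if not true_set or not pred_set:
        return 0.0, 0.0, 0.0
    inter = len(true_set.intersection(pred_set))
    prec = inter / len(pred_set)
    rec = inter / len(true_set)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# True vs. predicted (ontime, MNN) events within the evaluation window.
true_events = {(0.5, 67), (1.0, 69), (2.0, 67), (3.0, 64)}
pred_events = {(0.5, 67), (1.5, 69), (2.0, 67)}

ioi = prf1({x for x, _ in true_events},
           {x for x, _ in pred_events})[2]    # IOI_4-style: ontimes only
pitch = prf1({y for _, y in true_events},
             {y for _, y in pred_events})[2]  # Pitch_4-style: unique MNNs only
combo = prf1(true_events, pred_events)[2]     # Combo_4-style: (ontime, MNN) pairs
```

Note that using Python sets automatically discards repeated elements, matching the remark under Pitch_4 and Pitch_10.&lt;br /&gt;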
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions and Comments==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: a steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Melody: The influence of context and learning. ''Music  Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12607</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12607"/>
		<updated>2018-07-30T18:25:45Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. Because the deadline is so close, '''we intend this task description and the datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2017), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events up until 10 quarter-note beats, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2017). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
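To make this format concrete, here is a minimal sketch (in Python) of a reader for such CSV files; the Event record and function name are illustrative, not part of the dataset:&lt;br /&gt;

```python
import csv
from collections import namedtuple

# One row of a prime/continuation CSV: ontime (quarter-note beats),
# MIDI note number, morphetic pitch number, duration, channel.
Event = namedtuple("Event", ["ontime", "mnn", "mpn", "duration", "channel"])

def read_events(path):
    """Read a PPDD-style CSV file into a list of Event tuples."""
    with open(path, newline="") as f:
        return [Event(float(r[0]), int(r[1]), int(r[2]), float(r[3]), int(r[4]))
                for r in csv.reader(f) if r]
```

Applied to the rows above, the first event would parse to Event(ontime=20.0, mnn=64, mpn=62, duration=0.5, channel=0).&lt;br /&gt;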
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; for those concerned about overfitting, however, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*audio, polyphonic, large (MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_large.zip symbolic, polyphonic, large] (64 MB)&lt;br /&gt;
(“Large” datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;. The remaining “large” datasets should be available by Friday August 3rd.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we and/or an algorithm might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Böck, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime was included in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above criteria could not be satisfied for a given input, we skipped that MIDI file.&lt;br /&gt;
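As a rough illustration, the skylining step might be sketched as follows (a hypothetical helper, assuming each event is an (ontime, MNN, duration) tuple; the actual preparation code may differ):&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    """Keep at most one event per start time, choosing the maximum
    MIDI note number when two or more events clash.
    `events` is a list of (ontime, mnn, duration) tuples -- a
    simplified, hypothetical representation of a channel."""
    by_ontime = defaultdict(list)
    for ev in events:
        by_ontime[ev[0]].append(ev)
    # For each ontime, keep the event with the highest MNN,
    # then return the surviving events in temporal order.
    return sorted(max(evs, key=lambda e: e[1]) for evs in by_ontime.values())
```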
&lt;br /&gt;
For the polyphonic data, we applied the minimum note criteria of 20 in the prime and 10 in the continuation, as well as the prime maximum inter-ontime interval of 8, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
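For intuition, a toy order-1 Markov generator over bare MNN states is sketched below; the actual foils used the richer state space and transition matrix of Collins and Laney (2017), so this is an illustration only:&lt;br /&gt;

```python
import random
from collections import defaultdict

def generate_foil(pitches, length, seed=0):
    """Toy order-1 Markov sketch: learn transitions between successive
    MIDI note numbers in the prime, then random-walk from the prime's
    final pitch to produce a foil continuation of `length` events.
    (Illustrative only; not the state space used for the task.)"""
    rng = random.Random(seed)
    transitions = defaultdict(list)
    for a, b in zip(pitches, pitches[1:]):
        transitions[a].append(b)
    state = pitches[-1]
    foil = []
    for _ in range(length):
        # Dead end (no observed continuation): fall back to any prime pitch.
        choices = transitions.get(state) or pitches
        state = rng.choice(choices)
        foil.append(state)
    return foil
```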
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file including the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
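For example, the subtask 2 output CSV described above could be written as follows (a sketch; the function name and the header row are assumptions, since the task description specifies only the column order):&lt;br /&gt;

```python
import csv

def write_subtask2_output(results, path):
    """Write one CSV for a whole dataset. Each row gives a
    prime-continuation pair id and the likelihoods (in [0, 1]) that
    the continuations in folders 1 and 2 are genuine.
    `results` maps id -> (likelihood_1, likelihood_2)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "1", "2"])  # header row assumed, not mandated
        for file_id, (p1, p2) in results.items():
            writer.writerow([file_id, p1, p2])
```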
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuations: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuations are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
This stands for inter-ontime interval (here, of the first continuation event). It evaluates whether the algorithm's prediction for the time between the excerpt ending (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the continuation beginning (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. The metric IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct: Pitch takes the value 1 if &amp;lt;math&amp;gt;\hat{y}_1 = y_1&amp;lt;/math&amp;gt;, and takes the value 0 otherwise.&lt;br /&gt;
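Both first-event metrics reduce to simple equality checks, sketched here for illustration (the official implementation is the linked GitHub script):&lt;br /&gt;

```python
def ioi_metric(pred, true):
    """First-event ontime metric: 1 if the predicted ontime of the
    first continuation event equals the true one, else 0.
    `pred` and `true` are (ontime, mnn) pairs for the first event."""
    return int(pred[0] == true[0])

def pitch_metric(pred, true):
    """First-event pitch metric: 1 if the predicted MNN of the first
    continuation event equals the true one, else 0."""
    return int(pred[1] == true[1])
```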
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes falling in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 score, &amp;lt;math&amp;gt;2\,\mathrm{Prec}(P, Q)\,\mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be punished.&lt;br /&gt;
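Ignoring the up-to-translation refinement, the precision/recall/F1 computation can be sketched as:&lt;br /&gt;

```python
def f1_over_sets(true_set, pred_set):
    """Precision, recall, and F1 of a predicted set against the true
    set, as used for IOI_4/IOI_10 and (on MNNs or ontime-MNN pairs)
    the Pitch and Combo variants. Inputs are plain Python sets."""
    if not pred_set or not true_set:
        return 0.0, 0.0, 0.0
    hits = len(true_set & pred_set)
    prec = hits / len(pred_set)
    rec = hits / len(true_set)
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1
```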
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint ioi-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
&lt;br /&gt;
==Questions and Comments==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this as well, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377-405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12604</id>
		<title>2018:Patterns for Prediction</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2018:Patterns_for_Prediction&amp;diff=12604"/>
		<updated>2018-07-30T15:16:29Z</updated>

		<summary type="html">&lt;p&gt;Tom Collins: /* Data */&lt;/p&gt;
&lt;hr /&gt;
&lt;div&gt;== Description ==&lt;br /&gt;
'''In brief''': (1) Algorithms that take an excerpt of music as input (the ''prime''), and output a predicted ''continuation'' of the excerpt.&lt;br /&gt;
&lt;br /&gt;
(2) Additionally or alternatively, algorithms that take a prime and one or more continuations as input, and output the likelihood that each continuation is the genuine extension of the prime.&lt;br /&gt;
&lt;br /&gt;
Your task captains are Iris Yuping Ren (yuping.ren.iris), [http://beritjanssen.com/ Berit Janssen] (berit.janssen), and [http://tomcollinsresearch.net/ Tom Collins] (tomthecollins all at gmail.com). Feel free to copy in all three of us if you have questions/comments.&lt;br /&gt;
&lt;br /&gt;
The '''submission deadline''' is August 25th. With the deadline being so close, '''we intend this task description and the datasets provided below to help stimulate discourse''' that will lead to wide participation in 2019.&lt;br /&gt;
&lt;br /&gt;
'''Relation to the pattern discovery task''': The Patterns for Prediction task is an offshoot of the [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections Discovery of Repeated Themes &amp;amp; Sections task] (2013-2017). We hope to run the former (Patterns for Prediction) task and pause the latter (Discovery of Repeated Themes &amp;amp; Sections). In future years we may run both.&lt;br /&gt;
&lt;br /&gt;
'''In more detail''': One facet of human nature comprises the tendency to form predictions about what will happen in the future (Huron, 2006). Music, consisting of complex temporally extended sequences, provides an excellent setting for the study of prediction, and this topic has received attention from fields including but not limited to psychology (Collins, Tillmann, et al., 2014; Janssen, Burgoyne and Honing, 2017; Schellenberg, 1997; Schmuckler, 1989), neuroscience (Koelsch et al., 2005), music theory (Gjerdingen, 2007; Lerdahl &amp;amp; Jackendoff, 1983; Rohrmeier &amp;amp; Pearce, 2018), music informatics (Conklin &amp;amp; Witten, 1995; Cherla et al., 2013), and machine learning (Elmsley, Weyde, &amp;amp; Armstrong, 2017; Hadjeres, Pachet, &amp;amp; Nielsen, 2016; Gjerdingen, 1989; Roberts et al., 2018; Sturm et al., 2016). In particular, we are interested in the way exact and inexact repetition occurs over the short, medium, and long term in pieces of music (Margulis, 2014; Widmer, 2017), and how these repetitions may interact with &amp;quot;schematic, veridical, dynamic, and conscious&amp;quot; expectations (Huron, 2006) in order to form a basis for successful prediction.&lt;br /&gt;
&lt;br /&gt;
We call for algorithms that may model such expectations so as to predict the next musical events based on given, foregoing events (the prime). We invite contributions from all fields mentioned above (not just pattern discovery researchers), as different approaches may be complementary in terms of predicting correct continuations of a musical excerpt. We would like to explore these various approaches to music prediction in a MIREX task. For subtask (1) above (see &amp;quot;In brief&amp;quot;), the development and test datasets will contain an excerpt of a piece up until a cut-off point, after which the algorithm is supposed to generate the next ''N'' musical events, covering up to 10 quarter-note beats beyond the cut-off, and we will quantitatively evaluate the extent to which an algorithm's continuation corresponds to the genuine continuation of the piece. For subtask (2), in addition to containing a prime, the development and test datasets will also contain continuations of the prime, one of which will be genuine, and the algorithm should rate the likelihood that each continuation is the genuine extension of the prime, which again will be evaluated quantitatively.&lt;br /&gt;
&lt;br /&gt;
What is the relationship between pattern discovery and prediction? The last five years have seen an increasing interest in algorithms that discover or generate patterned data, leveraging methods beyond typical (e.g., Markovian) limits (Collins &amp;amp; Laney, 2017; [https://www.music-ir.org/mirex/wiki/2013:Discovery_of_Repeated_Themes_%26_Sections MIREX Discovery of Repeated Themes &amp;amp; Sections task]; Janssen, van Kranenburg and Volk, 2017; Ren et al., 2017; Widmer, 2017). One of the observations to emerge from the above-mentioned MIREX pattern discovery task is that an algorithm that is &amp;quot;good&amp;quot; at discovering patterns ought to be extendable to make &amp;quot;good&amp;quot; predictions for what will happen next in a given music excerpt ([https://www.music-ir.org/mirex/abstracts/2013/DM10.pdf Meredith, 2013]). Furthermore, evaluating the ability to predict may provide a stronger (or at least complementary) evaluation of an algorithm's pattern discovery capabilities, compared to evaluating its output against expert-annotated patterns, where the notion of &amp;quot;ground truth&amp;quot; has been debated (Meredith, 2013).&lt;br /&gt;
&lt;br /&gt;
==Data==&lt;br /&gt;
The Patterns for Prediction Development Dataset (PPDD-Jul2018) has been prepared by processing a randomly selected subset of the [http://colinraffel.com/projects/lmd/ Lakh MIDI Dataset] (LMD, Raffel, 2016). It has audio and symbolic versions crossed with monophonic and polyphonic versions. The audio is generated from the symbolic representation, so it is not &amp;quot;expressive&amp;quot;. The symbolic data is presented in CSV format. For example,&lt;br /&gt;
&lt;br /&gt;
 20,64,62,0.5,0&lt;br /&gt;
 20.66667,65,63,0.25,0&lt;br /&gt;
 21,67,64,0.5,0&lt;br /&gt;
 ...&lt;br /&gt;
&lt;br /&gt;
would be the start of a prime where the first event had ontime 20 (measured in quarter-note beats -- equivalent to bar 6 beat 1 if the time signature were 4-4), MIDI note number (MNN) 64, estimated morphetic pitch number 62 (see [http://tomcollinsresearch.net/research/data/mirex/ppdd/mnn_mpn.pdf p. 352] from Collins, 2011 for a diagrammatic explanation; for more details, see Meredith, 1999), duration 0.5 in quarter-note beats, and channel 0. Re-exports to MIDI are also provided, mainly for listening purposes. We also provide a descriptor file containing the original Lakh MIDI Dataset id, the BPM, time signature, and a key estimate. The audio dataset contains all these files, plus WAV files.&lt;br /&gt;
&lt;br /&gt;
The provenance of the Patterns for Prediction Test Dataset (PPTD) will '''not''' be disclosed; for those concerned about overfitting, however, it is not drawn from LMD.&lt;br /&gt;
&lt;br /&gt;
There are small (100 pieces), medium (1,000 pieces), and large (10,000 pieces) variants of each dataset, to cater to different approaches to the task (e.g., a point-set pattern discovery algorithm developer may not want/need as many training examples as a neural network researcher). Each prime lasts approximately 35 sec (according to the BPM value in the original MIDI file) and each continuation covers the subsequent 10 quarter-note beats. We would have liked to provide longer primes (as 35 sec affords investigation of medium- but not really long-term structure), but we have to strike a compromise between ideal and tractable scenarios.&lt;br /&gt;
&lt;br /&gt;
Here are the PPDD-Jul2018 variants for download:&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_small.zip audio, monophonic, small] (92 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_medium.zip audio, monophonic, medium] (850 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_mono_large.zip audio, monophonic, large] (8.46 GB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_small.zip audio, polyphonic, small] (137 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_aud_poly_medium.zip audio, polyphonic, medium] (1.35 GB)&lt;br /&gt;
*audio, polyphonic, large (MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_small.zip symbolic, monophonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_medium.zip symbolic, monophonic, medium] (3 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_mono_large.zip symbolic, monophonic, large] (32 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_small.zip symbolic, polyphonic, small] (&amp;lt; 1 MB)&lt;br /&gt;
*[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/PPDD-Jul2018_sym_poly_medium.zip symbolic, polyphonic, medium] (9 MB)&lt;br /&gt;
*symbolic, polyphonic, large (MB)&lt;br /&gt;
(“Large” datasets were compressed using the [https://www.mankier.com/1/7za p7zip] package, installed on Mac via &amp;quot;brew install p7zip&amp;quot;. The remaining “large” datasets should be available by Friday August 3rd.)&lt;br /&gt;
&lt;br /&gt;
===Some examples===&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.wav This prime] finishes with two G’s followed by a D above. Looking at the [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/0a983538-61b5-4b9d-9ad9-23e05f548e5c.png piano roll] or listening to the linked file, we can see/hear that this pitch pattern, in the exact same rhythm, has happened before (see the bars 17-18 transition in the piano roll). Therefore, we, and/or an algorithm, might predict that the first note of the continuation will follow the pattern established in the previous occurrence, returning to G 1.5 beats later.&lt;br /&gt;
&lt;br /&gt;
[http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/001f5992-527d-4e04-8869-afa7cbb74cd0.wav This] is another example where a previous occurrence of a pattern might help predict the contents of the continuation. Not all excerpts contain patterns (in fact, one of the motivations for running the task is to interrogate the idea that patterns are abundant in music and always informative in terms of predicting what comes next). [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/fc2fda7c-9f55-4bf3-8fa8-f337e35aa20f.wav This one], for instance, does not seem to contain many clues for what will come next. And finally, [http://tomcollinsresearch.net/research/data/mirex/ppdd/ppdd-jul2018/examples/b9261e74-125a-429e-ae27-5b51abdc7d81.wav this one] might not contain any obvious patterns, but other strategies (such as schematic or tonal expectations) might be recruited in order to predict the contents of the continuation.&lt;br /&gt;
&lt;br /&gt;
===Preparation of the data===&lt;br /&gt;
Preparation of the monophonic datasets was more involved than that of the polyphonic datasets: for both, we imported each MIDI file, quantised it using a subset of the Farey sequence of order 6 (Collins, Krebs, et al., 2014), and then excerpted a prime and continuation at a randomly selected time. For the monophonic datasets, we filtered for:&lt;br /&gt;
*channels that contained at least 20 events in the prime;&lt;br /&gt;
*channels that were at least 80% monophonic at the outset, meaning that at least 80% of their segments (Pardo &amp;amp; Birmingham, 2002) contained no more than one event;&lt;br /&gt;
*channels where the maximum inter-ontime interval in the prime was no more than 8 quarter-note beats;&lt;br /&gt;
*we then &amp;quot;skylined&amp;quot; these channels (independently) so that no two events had the same start time (maximum MNN chosen in the event of a clash), and double-checked that they still contained at least 20 events;&lt;br /&gt;
*one suitable channel was then selected at random, and the prime appears in the dataset only if its continuation contained at least 10 events.&lt;br /&gt;
If any of the above could not be satisfied for the given input, we skipped this MIDI file.&lt;br /&gt;
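&lt;br /&gt;
As a minimal sketch (not the exact implementation used in preparing the dataset), the skylining step described above amounts to keeping, for each ontime, the event with the maximum MNN:&lt;br /&gt;
&lt;br /&gt;

```python
from collections import defaultdict

def skyline(events):
    """Keep at most one event per ontime, choosing the maximum MIDI
    note number (MNN) in the event of a clash. Events are tuples of
    (ontime, MNN, ...), as in the CSV format."""
    by_ontime = defaultdict(list)
    for event in events:
        by_ontime[event[0]].append(event)
    # For each ontime, retain the event with the highest MNN, then
    # return the surviving events in temporal order.
    return sorted(max(group, key=lambda e: e[1])
                  for group in by_ontime.values())
```

&lt;br /&gt;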
&lt;br /&gt;
For the polyphonic data, we applied the minimum note-count criteria of 20 events in the prime and 10 in the continuation, as well as the maximum prime inter-ontime interval of 8 quarter-note beats, but it was not necessary to measure monophony or perform skylining.&lt;br /&gt;
&lt;br /&gt;
Audio files were generated by importing the corresponding CSV and descriptor files and using a sample bank of piano notes from the [https://magenta.tensorflow.org/datasets/nsynth Google Magenta NSynth dataset] (Engel et al., 2017) to construct and export the waveform.&lt;br /&gt;
&lt;br /&gt;
The foil continuations were generated using a Markov model of order 1 over the whole texture (polyphonic) or channel (monophonic) in question, and there was '''no''' attempt to nest this generation process in any other process cognisant of repetitive or phrasal structure. See Collins and Laney (2017) for details of the state space and transition matrix.&lt;br /&gt;
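&lt;br /&gt;
Collins and Laney (2017) specify the actual state space and transition matrix; purely as a generic illustration of order-1 Markov generation over observed states (all names below are illustrative), one might write:&lt;br /&gt;
&lt;br /&gt;

```python
import random
from collections import defaultdict

def train_order1(states):
    """Record, for each state, the successors observed after it."""
    transitions = defaultdict(list)
    for current, following in zip(states, states[1:]):
        transitions[current].append(following)
    return transitions

def generate(transitions, start, length, seed=0):
    """Sample a continuation of the given length by repeatedly
    drawing an observed successor of the current state."""
    rng = random.Random(seed)
    state, continuation = start, []
    for _ in range(length):
        successors = transitions.get(state)
        if not successors:
            break  # dead end: no observed successor
        state = rng.choice(successors)
        continuation.append(state)
    return continuation
```

&lt;br /&gt;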
&lt;br /&gt;
==Submission Format==&lt;br /&gt;
All submissions should be statically linked to all dependencies and include a README file containing the following information:&lt;br /&gt;
&lt;br /&gt;
*command line calling format for all executables and an example formatted set of commands;&lt;br /&gt;
*output for subtask 1) in the format of an &amp;quot;ontime&amp;quot;, &amp;quot;MNN&amp;quot; CSV file. The CSV may also contain other information, but &amp;quot;ontime&amp;quot; and &amp;quot;MNN&amp;quot; should be in the first two columns, respectively.&lt;br /&gt;
*output for subtask 2) should indicate which of the two presented continuations, &amp;quot;1&amp;quot; or &amp;quot;2&amp;quot;, is judged by the algorithm to be genuine. This should be one CSV file for an entire dataset, with first column &amp;quot;id&amp;quot; referring to the file name of a prime-continuation pair, second column &amp;quot;1&amp;quot; containing a likelihood value in [0, 1] for the genuineness of the continuation in folder 1, and column &amp;quot;2&amp;quot; similarly for the continuation in folder 2.&lt;br /&gt;
*number of threads/cores used or whether this should be specified on the command line;&lt;br /&gt;
*expected memory footprint;&lt;br /&gt;
*expected runtime;&lt;br /&gt;
*any required environments and versions, e.g. Python, Java, Bash, MATLAB.&lt;br /&gt;
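&lt;br /&gt;
A minimal sketch of writing the subtask 2) output file described above (the function name and ids are hypothetical, and the header row is an assumption rather than a stated requirement):&lt;br /&gt;
&lt;br /&gt;

```python
import csv

def write_subtask2(path, judgements):
    """Write one row per prime-continuation pair: an id and two
    likelihoods in [0, 1], for the continuations in folders 1 and 2.
    The header row is an assumption, not a stated requirement."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "1", "2"])
        for row in judgements:
            writer.writerow(row)
```

&lt;br /&gt;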
&lt;br /&gt;
===Example Command Line Calling Format===&lt;br /&gt;
&lt;br /&gt;
Python:&lt;br /&gt;
&lt;br /&gt;
 python &amp;lt;your_script_name.py&amp;gt; -i &amp;lt;input_folder&amp;gt; -o &amp;lt;output_folder&amp;gt;&lt;br /&gt;
&lt;br /&gt;
==Evaluation Procedure==&lt;br /&gt;
'''In brief''': For subtask (1), we match the algorithmic output with the original continuation and compute a match score (see implementation at [https://github.com/BeritJanssen/PatternsForPrediction/blob/evaluation/evaluate_prediction.py GitHub]). For subtask (2), we count up how many times an algorithm judged the genuine continuation as most likely.&lt;br /&gt;
&lt;br /&gt;
The input excerpt ends with a final note event: &amp;lt;math&amp;gt;(x_0, y_0, z_0)&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt; is ontime (start time measured in quarter-note beats starting with 0 for bar 1 beat 1), &amp;lt;math&amp;gt;y_0&amp;lt;/math&amp;gt; is MNN, and &amp;lt;math&amp;gt;z_0&amp;lt;/math&amp;gt; is duration (also measured in quarter-note beats). &lt;br /&gt;
&lt;br /&gt;
The algorithm predicts the continuation's events: &amp;lt;math&amp;gt;(\hat{x}_1, \hat{y}_1, \hat{z}_1)&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;(\hat{x}_2, \hat{y}_2, \hat{z}_2)&amp;lt;/math&amp;gt;, ..., &amp;lt;math&amp;gt;(\hat{x}_{n^\prime}, \hat{y}_{n^\prime}, \hat{z}_{n^\prime})&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{x}_i&amp;lt;/math&amp;gt; are predicted ontimes, &amp;lt;math&amp;gt;\hat{y}_i&amp;lt;/math&amp;gt; are predicted MNNs, and &amp;lt;math&amp;gt;\hat{z}_i&amp;lt;/math&amp;gt; are predicted durations. The true continuation events are notated &amp;lt;math&amp;gt;(x_1, y_1, z_1), (x_2, y_2, z_2),..., (x_n, y_n, z_n)&amp;lt;/math&amp;gt;. The predicted continuation ontimes are strictly increasing, that is &amp;lt;math&amp;gt;x_0 &amp;lt; \hat{x}_1 &amp;lt; \cdots &amp;lt; \hat{x}_{n^\prime}&amp;lt;/math&amp;gt;, and so are the true continuation ontimes, that is &amp;lt;math&amp;gt;x_0 &amp;lt; x_1 &amp;lt; \cdots &amp;lt; x_n&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===IOI===&lt;br /&gt;
IOI stands for inter-ontime interval. This metric evaluates whether the algorithm's prediction of the time between the end of the prime (&amp;lt;math&amp;gt;x_0&amp;lt;/math&amp;gt;) and the beginning of the continuation (&amp;lt;math&amp;gt;x_1&amp;lt;/math&amp;gt;) is correct. IOI takes the value 1 if &amp;lt;math&amp;gt;\hat{x}_1 = x_1&amp;lt;/math&amp;gt;, and the value 0 otherwise.&lt;br /&gt;
&lt;br /&gt;
===Pitch===&lt;br /&gt;
This metric evaluates whether the algorithm's prediction (&amp;lt;math&amp;gt;\hat{y}_1&amp;lt;/math&amp;gt;) for the continuation's first MNN (&amp;lt;math&amp;gt;y_1&amp;lt;/math&amp;gt;) is correct.&lt;br /&gt;
&lt;br /&gt;
===IOI_4===&lt;br /&gt;
Let &amp;lt;math&amp;gt;P = \{x_1,\ldots, x_n\}&amp;lt;/math&amp;gt; be the set of true continuation ontimes in the first four beats following the end of the excerpt, and &amp;lt;math&amp;gt;Q = \{\hat{x}_1,\ldots, \hat{x}_{n^\prime}\}&amp;lt;/math&amp;gt; be the corresponding set predicted by an algorithm. Then the precision of the algorithm is &amp;lt;math&amp;gt;\mathrm{Prec}(P, Q) = |P \cap Q|/|Q|&amp;lt;/math&amp;gt;, the recall of the algorithm is &amp;lt;math&amp;gt;\mathrm{Rec}(P, Q) = |P \cap Q|/|P|&amp;lt;/math&amp;gt;, and IOI_4 is defined as the usual F1 combination of precision and recall, &amp;lt;math&amp;gt;\mathrm{IOI}_4 = 2 \cdot \mathrm{Prec}(P, Q) \cdot \mathrm{Rec}(P, Q)/(\mathrm{Prec}(P, Q) + \mathrm{Rec}(P, Q))&amp;lt;/math&amp;gt;. These intersections will probably be calculated &amp;quot;up to translation&amp;quot;, meaning that a correct but time- or pitch-shifted solution would not be penalised.&lt;br /&gt;
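&lt;br /&gt;
Without the optional translation-invariant matching, the precision/recall/F1 computation above reduces to a simple set calculation, sketched here for illustration:&lt;br /&gt;
&lt;br /&gt;

```python
def f1_score(true_set, predicted_set):
    """F1 of a predicted set against the true set. For IOI_4, the
    sets contain ontimes in the first four beats after the prime."""
    true_set, predicted_set = set(true_set), set(predicted_set)
    if not true_set or not predicted_set:
        return 0.0
    hits = len(true_set.intersection(predicted_set))
    if hits == 0:
        return 0.0
    precision = hits / len(predicted_set)
    recall = hits / len(true_set)
    return 2 * precision * recall / (precision + recall)
```

&lt;br /&gt;
The same function applies to the Pitch_4/Pitch_10 MNN sets and to the Combo_4/Combo_10 (ontime, MNN) pair sets.&lt;br /&gt;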
&lt;br /&gt;
===IOI_10===&lt;br /&gt;
...is defined in exactly the same way as IOI_4, but for ten beats (or 2.5 measures in 4-4 time) following the end of the prime.&lt;br /&gt;
&lt;br /&gt;
===Pitch_4 and Pitch_10===&lt;br /&gt;
...are defined in the same ways as IOI_4 and IOI_10 respectively, but applied to the MNN sets &amp;lt;math&amp;gt;P = \{y_1,\ldots, y_n\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{\hat{y}_1,\ldots, \hat{y}_{n^\prime}\}&amp;lt;/math&amp;gt;. (Strictly speaking these may contain repeated elements, so the unique elements would be determined before calculating Prec, Rec, and F1.)&lt;br /&gt;
&lt;br /&gt;
===Combo_4 and Combo_10===&lt;br /&gt;
In addition to evaluating rhythmic and pitch capacities independently, the metrics Combo_4 and Combo_10 capture the joint IOI-pitch predictive capabilities of algorithms, by applying the above definitions to the sets &amp;lt;math&amp;gt;P = \{(x_1, y_1),\ldots, (x_n, y_n)\}&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q = \{(\hat{x}_1, \hat{y}_1),\ldots, (\hat{x}_{n^\prime}, \hat{y}_{n^\prime})\}&amp;lt;/math&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
===Polyphonic Version===&lt;br /&gt;
The polyphonic version of the task will be evaluated in the same way as the monophonic version of the task. Only the Pitch metric needs to change, because the true continuation's first event may consist of several MNNs, &amp;lt;math&amp;gt;P = \{y_{1,1},\ldots, y_{1,m}\}&amp;lt;/math&amp;gt;, as may the algorithm's prediction, &amp;lt;math&amp;gt;Q = \{\hat{y}_{1,1},\ldots, \hat{y}_{1,m^\prime}\}&amp;lt;/math&amp;gt;. We will apply the concepts of precision, recall, and F1 to &amp;lt;math&amp;gt;P&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;Q&amp;lt;/math&amp;gt; here, as above. While the above definitions have focused on the first predicted events and events in time windows of 4 and 10 quarter-note beats in length, we will probably also produce graphs with a sliding time window length, to more accurately pinpoint changes in performance.&lt;br /&gt;
&lt;br /&gt;
===Entropy===&lt;br /&gt;
Some existing work in this area (e.g., Conklin &amp;amp; Witten, 1995; Pearce &amp;amp; Wiggins, 2006; Temperley, 2007) evaluates algorithm performance in terms of entropy. If we have time to collect human listeners' judgments of likely (or not) continuations for given excerpts, then we will be in a position to compare the entropy of listener-generated distributions with the corresponding algorithm distributions. This would open up the possibility of entropy-based metrics, but we consider this of secondary importance to the metrics outlined above.&lt;br /&gt;
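&lt;br /&gt;
For reference, the Shannon entropy of such a distribution over candidate continuations would be computed as follows (a generic sketch, not a committed metric):&lt;br /&gt;
&lt;br /&gt;

```python
import math

def entropy(distribution):
    """Shannon entropy (in bits) of a probability distribution,
    given as an iterable of probabilities summing to 1. Zero
    probabilities are skipped, since lim p*log2(p) = 0."""
    return -sum(p * math.log2(p) for p in distribution if p)
```

&lt;br /&gt;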
&lt;br /&gt;
==Questions and Comments==&lt;br /&gt;
&lt;br /&gt;
Q. Instead of evaluating continuations, have you considered evaluating an algorithm's ability to predict content between two timepoints, or before a timepoint?&lt;br /&gt;
&lt;br /&gt;
A. Yes, we considered including this too, but opted not to for the sake of simplicity. Furthermore, these alternatives do not have the same intuitive appeal as predicting future events.&lt;br /&gt;
&lt;br /&gt;
Q. Why do some files sound like they contain a drum track rendered on piano?&lt;br /&gt;
&lt;br /&gt;
A. Some of the MIDI files import as a single channel, but upon listening to them it is evident that they contain multiple instruments. For the sake of simplicity, we removed percussion channels where possible, but if everything was squashed down into a single channel, there was not much we could do.&lt;br /&gt;
&lt;br /&gt;
==Time and Hardware Limits==&lt;br /&gt;
&lt;br /&gt;
A total runtime limit of 72 hours will be imposed on each submission.&lt;br /&gt;
&lt;br /&gt;
==Seeking Contributions==&lt;br /&gt;
&lt;br /&gt;
*We would like to evaluate against real (not just synthesized-from-MIDI) audio versions. If you have a good idea of how we might make this available to participants, let us know. We would be happy to acknowledge individuals and/or companies for helping out in this regard.&lt;br /&gt;
&lt;br /&gt;
*More suggestions/comments/ideas on the task are always welcome!&lt;br /&gt;
&lt;br /&gt;
==Acknowledgments==&lt;br /&gt;
&lt;br /&gt;
Thank you to Anja Volk, Darrell Conklin, Srikanth Cherla, David Meredith, Matevz Pesek, and Gissel Velarde for discussions!&lt;br /&gt;
&lt;br /&gt;
==References==&lt;br /&gt;
*Cherla, S., Weyde, T., Garcez, A., &amp;amp; Pearce, M. (2013). A distributed model for multiple-viewpoint melodic prediction. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 15-20). Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Collins, T. (2011). &amp;quot;[http://oro.open.ac.uk/30103/ Improved methods for pattern discovery in music, with applications in automated stylistic composition]&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Böck, S., Krebs, F., &amp;amp; Widmer, G. (2014). [http://tomcollinsresearch.net/pdf/collinsEtAlAES2014.pdf Bridging the audio-symbolic gap: The discovery of repeated note content directly from polyphonic music audio]. In ''Proceedings of the Audio Engineering Society's 53rd Conference on Semantic Audio''. London, UK.&lt;br /&gt;
&lt;br /&gt;
*Collins, T., Tillmann, B., Barrett, F. S., Delbé, C., &amp;amp; Janata, P. (2014). [http://psycnet.apa.org/journals/rev/121/1/33/ A combined model of sensory and cognitive representations underlying tonal expectations in music: From audio signals to behavior]. ''Psychological Review, 121''(1), 33-65.&lt;br /&gt;
&lt;br /&gt;
*Collins T., &amp;amp; Laney, R. (2017). [http://jcms.org.uk/issues/Vol1Issue2/computer-generated-stylistic-compositions/computer-generated-stylistic-compositions.html Computer-generated stylistic compositions with long-term repetitive and phrasal structure]. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Conklin, D., &amp;amp; Witten, I. H. (1995). Multiple viewpoint systems for music prediction. ''Journal of New Music Research, 24''(1), 51-73.&lt;br /&gt;
&lt;br /&gt;
*Elmsley, A., Weyde, T., &amp;amp; Armstrong, N. (2017). Generating time: Rhythmic perception, prediction and production with recurrent neural networks. ''Journal of Creative Music Systems, 1''(2).&lt;br /&gt;
&lt;br /&gt;
*Engel, J., Resnick, C., Roberts, A., Dieleman, S., Eck, D., Simonyan, K., &amp;amp; Norouzi, M. (2017). Neural audio synthesis of musical notes with WaveNet autoencoders. https://arxiv.org/abs/1704.01279&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. O. (1989). Using connectionist models to explore complex musical patterns. ''Computer Music Journal, 13''(3), 67-75.&lt;br /&gt;
&lt;br /&gt;
*Gjerdingen, R. (2007). ''Music in the galant style''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Hadjeres, G., Pachet, F., &amp;amp; Nielsen, F. (2016). DeepBach: A steerable model for Bach chorales generation. arXiv preprint arXiv:1612.01010.&lt;br /&gt;
&lt;br /&gt;
*Huron, D. (2006). ''Sweet anticipation: Music and the psychology of expectation''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., Burgoyne, J. A., &amp;amp; Honing, H. (2017). Predicting variation of folk songs: A corpus analysis study on the memorability of melodies. ''Frontiers in Psychology, 8'', 621.&lt;br /&gt;
&lt;br /&gt;
*Janssen, B., van Kranenburg, P., &amp;amp; Volk, A. (2017). Finding occurrences of melodic segments in folk songs employing symbolic similarity measures. ''Journal of New Music Research, 46''(2), 118-134.&lt;br /&gt;
&lt;br /&gt;
*Koelsch, S., Gunter, T. C., Wittfoth, M., &amp;amp; Sammler, D. (2005). Interaction between syntax processing in language and in music: an ERP study. ''Journal of Cognitive Neuroscience, 17''(10), 1565-1577.&lt;br /&gt;
&lt;br /&gt;
*Lerdahl, F., &amp;amp; Jackendoff, R. (1983). ''A generative theory of tonal music''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Margulis, E. H. (2014). ''On repeat: How music plays the mind''. New York, NY: Oxford University Press.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (1999). The computational representation of octave equivalence in the Western staff notation system. In ''Proceedings of the Cambridge Music Processing Colloquium''. Cambridge, UK.&lt;br /&gt;
&lt;br /&gt;
*Meredith, D. (2013). COSIATEC and SIATECCompress: Pattern discovery by geometric compression. In ''Proceedings of the 10th Annual Music Information Retrieval Evaluation eXchange (MIREX'13)''. Curitiba, Brazil.&lt;br /&gt;
&lt;br /&gt;
*Pardo, B., &amp;amp; Birmingham, W. P. (2002). Algorithms for chordal analysis. ''Computer Music Journal, 26''(2), 27-49.&lt;br /&gt;
&lt;br /&gt;
*Pearce, M. T., &amp;amp; Wiggins, G. A. (2006). Expectation in melody: The influence of context and learning. ''Music Perception, 23''(5), 377–405.&lt;br /&gt;
&lt;br /&gt;
*Raffel, C. (2016). &amp;quot;Learning-based methods for comparing sequences, with applications to audio-to-MIDI alignment and matching&amp;quot;. PhD Thesis.&lt;br /&gt;
&lt;br /&gt;
*Ren, I. Y., Koops, H. V., Volk, A., &amp;amp; Swierstra, W. (2017). In search of the consensus among musical pattern discovery algorithms. In ''Proceedings of the International Society for Music Information Retrieval Conference'' (pp. 671-678). Suzhou, China.&lt;br /&gt;
&lt;br /&gt;
*Roberts, A., Engel, J., Raffel, C., Hawthorne, C., &amp;amp; Eck, D. (2018). A hierarchical latent vector model for learning long-term structure in music. In ''Proceedings of the International Conference on Machine Learning'' (pp. 4361-4370). Stockholm, Sweden.&lt;br /&gt;
&lt;br /&gt;
*Rohrmeier, M., &amp;amp; Pearce, M. (2018). Musical syntax I: theoretical perspectives. In ''Springer Handbook of Systematic Musicology'' (pp. 473-486). Berlin, Germany: Springer.&lt;br /&gt;
&lt;br /&gt;
*Schellenberg, E. G. (1997). Simplifying the implication-realization model of melodic expectancy. ''Music Perception, 14''(3), 295-318.&lt;br /&gt;
&lt;br /&gt;
*Schmuckler, M. A. (1989). Expectation in music: Investigation of melodic and harmonic processes. ''Music Perception, 7''(2), 109-149.&lt;br /&gt;
&lt;br /&gt;
*Sturm, B. L., Santos, J. F., Ben-Tal, O., &amp;amp; Korshunova, I. (2016). Music transcription modelling and composition using deep learning. In ''Proceedings of the International Conference on Computer Simulation of Musical Creativity''. Huddersfield, UK.&lt;br /&gt;
&lt;br /&gt;
*Temperley, D. (2007). ''Music and probability''. Cambridge, MA: MIT Press.&lt;br /&gt;
&lt;br /&gt;
*Widmer, G. (2017). Getting closer to the essence of music: The con espressione manifesto. ''ACM Transactions on Intelligent Systems and Technology (TIST), 8''(2), 19.&lt;/div&gt;</summary>
		<author><name>Tom Collins</name></author>
		
	</entry>
</feed>