2013:Discovery of Repeated Themes & Sections

Description

In brief: algorithms that take a single piece of music as input, and output a list of patterns repeated within that piece. Also known as intra-opus discovery (Conklin & Anagnostopoulou, 2001).

We would be happy to receive ideas for improving aspects of this task. Researchers with wiki accounts are able to post comments below or to edit the relevant sections, and researchers without wiki accounts are welcome to email me directly: tom.collins(a)jku.at

In more detail: for understanding and interpreting a musical work, the discovery of repeated patterns within that piece is a crucial step. Meredith, Lemström, and Wiggins (2002) cite Schenker (1954) as claiming repetition to be 'the basis of music as an art' (p. 5), and also Lerdahl and Jackendoff (1983), who observe that 'the importance of parallelism [i.e., repetition] in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece' (p. 52).

On the very next page Lerdahl and Jackendoff (1983) acknowledge their 'failure to flesh out the notion of parallelism,' which is symptomatic of a more general failure in music psychology and music computing to address the discovery of repetition. Algorithms that take pieces of music as input, and output a list, visualisation, or summary of repeated patterns do exist (Chiu, Shan, Huang, & Li, 2009; Collins, Thurlow, Laney, Willis, & Garthwaite, 2010; Conklin & Anagnostopoulou, 2001; Forth & Wiggins, 2009; Hsu, Liu, & Chen, 2001; Knopke & Jürgensen, 2009; Lartillot, 2005; Meek & Birmingham, 2003; Meredith et al., 2002; Müller & Jiang, 2012; Nieto, Humphrey, & Bello, 2012; Peeters, 2007), but the pattern discovery task has received less attention than many other tasks in MIR. Until now!

What is a Pattern?

For the purposes of this task, a pattern is defined as a set of ontime-pitch pairs that occurs at least twice (i.e., is repeated at least once) in a piece of music. The second, third, etc. occurrences of the pattern will likely be shifted in time and perhaps also transposed, relative to the first occurrence. Ideally an algorithm would be able to discover all exact and inexact occurrences of a pattern within a piece, so in evaluating this task we are interested in both (1) whether an algorithm can discover one occurrence, up to time shift and transposition, and (2) to what extent it can find all occurrences. Lartillot and Toiviainen (2007), among others, have pointed out that besides ontime-pitch patterns there are various other types of repeating pattern (e.g., ontimes alone, duration, contour, harmony). For the sake of simplicity, the current task is restricted to ontime-pitch pairs.
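To make "up to time shift and transposition" concrete, here is a minimal Python sketch that tests whether two occurrences, given as sets of (ontime, MIDI note number) pairs, are translations of one another. The function names are hypothetical, and exact arithmetic on ontimes is assumed (floating-point ontimes that differ by rounding error would need to be quantised first).

```python
def normalise(occurrence):
    """Translate an occurrence so its lexicographically smallest point
    (earliest ontime, then lowest pitch) sits at (0, 0)."""
    points = sorted(set(occurrence))
    if not points:
        return tuple()
    t0, p0 = points[0]
    return tuple((t - t0, p - p0) for t, p in points)


def is_translation(occ_a, occ_b):
    """True if occ_b is occ_a shifted in time and/or transposed."""
    return normalise(occ_a) == normalise(occ_b)


motif = [(0, 60), (1, 62), (2, 64)]
later = [(8, 65), (9, 67), (10, 69)]  # same shape, 8 beats later, up 5 semitones
print(is_translation(motif, later))
```

Because translation preserves the ordering of points, equal canonical forms mean one occurrence is an exact time-shifted and/or transposed copy of the other. Inexact occurrences (variations) are not detected by this test.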

Some of the most recognisable riffs and motifs in music consist of as few as four ontime-pitch pairs (for example, the opening riff from 'Purple Haze' by Hendrix, or the opening of the first movement of Symphony no.5 in C minor by Beethoven). If, however, an algorithm returned all patterns consisting of four or more notes in a given piece, a lot of these patterns would not be perceptually salient or analytically interesting. Happily, solutions have been proposed for trying to determine which are the most noticeable and/or important patterns, which are of middling importance, and which have occurred by chance (Cambouropoulos, 2006; Conklin, 2010a, 2010b). Collins, Laney, Willis, & Garthwaite (2011) conducted a meta-analysis and experimental validation of many proposed solutions. More information about the differences between motif, theme, and repeated section can be found in answer to Question 6.6.

Data

My colleagues and I at the Department of Computational Perception, Johannes Kepler University, are compiling a database of classical music annotated with repeated themes and sections (mainly from KernScores; see also Flossmann, Goebl, Grachten, Niedermayer, & Widmer, 2010). To encourage participation in the pattern discovery task, we are offering a representative sample called the JKU Patterns Development Database (~340 MB, July 2013 version). (If you prefer, here is a smaller version with no audio, ~40 MB.) Symbolic and audio versions are crossed with monophonic and polyphonic versions, giving up to four versions of the task in total. Researchers are welcome to submit to more than one version of the task.

As a ground truth, we are basing motifs and themes on Barlow and Morgenstern's (1953) Dictionary of Musical Themes, Schoenberg's (1967) Fundamentals of Musical Composition, and Bruhn's (1993) J. S. Bach’s Well-Tempered Clavier: In-depth Analysis and Interpretation. Repeated sections are based on those marked by the composer. For one of the pieces we created our own annotation. A paper that describes our construction of the Development Database and use of the sources is currently under preparation. No ground truth is perfect: we have chosen the sources as being relatively uncontroversial and transparent, but we would welcome ideas and suggestions from other researchers. As a quick example, Figure 1 is an excerpt from Beethoven's op.2 no.1 mvt.3, with a ground-truth pattern marked as $P_{1}$ (first occurrence) and $P_{2}$ (second occurrence).

Submission Format

Symbolic Version

Participants are able to choose from a number of symbolic representations (MIDI, kern, csv with columns for ontime, MIDI note number, staff height, duration, and staff number), as there may be differing opinions about which aspects of a representation are most useful for discovering repeated patterns. This choice also reflects the importance of designing pattern discovery code that functions irrespective of the exact input format (Wiggins, 2007). For the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. For instance, the point-set representation for $P_{1}$ in Figure 1 is

$$\begin{aligned} P_1 = \{ &(-1, 60), (-1, 68), (0, 61), (0, 70), (1, 58), (1, 67), (2, 53), \\ &(3, 60), (3, 68), (4, 56), (4, 65), (5, 53), (5, 56), (5, 60), (5, 65), \\ &(6, 55), (6, 58), (6, 60), (6, 64), (7, 53), (7, 56), (7, 60), (7, 65), \\ &(8, 52), (8, 55), (8, 60), (8, 67), \\ &(9, 53), (9, 56), (9, 60), (9, 70), (10, 68) \}. \end{aligned}$$
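A point set like this can be read straight from a piece's csv file. The sketch below assumes comma-separated rows in the column order listed above (ontime, MIDI note number, staff height, duration, staff number) with no header row; the function name `csv_to_point_set` is hypothetical.

```python
import csv
import io


def csv_to_point_set(csv_text):
    """Return the set of (ontime, MIDI note number) pairs in a piece's csv.

    Assumes one note per row, columns ordered ontime, MIDI note number,
    staff height, duration, staff number, and no header row.
    """
    points = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        ontime, midi = float(row[0]), float(row[1])  # float() tolerates spaces
        points.add((ontime, midi))
    return points


example = "-1,60,59,1,0\n-1,68,64,1,0\n0,61,60,1,0\n"
print(sorted(csv_to_point_set(example)))
```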

Sectional repetitions are expanded in all pieces, i.e., as the piece would be heard in a performance. In the monophonic version, pieces consisting of voiced polyphony (e.g., a fugue or choral work) are unfolded, meaning each voice is extracted and re-encoded monophonically, one voice after another, from the highest staff to the lowest. For example, a fugue with upper, middle, and lower voices would be re-encoded with the upper voice heard first in isolation, followed by the middle voice, and then the lower voice. In the monophonic version, pieces consisting of unvoiced polyphony are converted to monophony using the clipped skyline approach.
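The clipped skyline approach is not specified further here, so the sketch below shows one simple variant, stated as an assumption rather than as the procedure used to build the database: at each distinct ontime only the highest-pitched note is kept, and each kept note is clipped so that it ends no later than the next kept ontime. Notes are assumed to be (ontime, MIDI note number, duration) tuples; the function name is hypothetical.

```python
def clipped_skyline(notes):
    """Reduce polyphony to monophony (one assumed skyline variant).

    notes: iterable of (ontime, midi, duration) tuples.
    Keeps the highest pitch at each distinct ontime, then clips each kept
    note's duration so it ends no later than the next kept ontime.
    """
    highest = {}
    for ontime, midi, duration in notes:
        if ontime not in highest or midi > highest[ontime][0]:
            highest[ontime] = (midi, duration)
    ontimes = sorted(highest)
    melody = []
    for k, ontime in enumerate(ontimes):
        midi, duration = highest[ontime]
        if k + 1 < len(ontimes):
            duration = min(duration, ontimes[k + 1] - ontime)
        melody.append((ontime, midi, duration))
    return melody


print(clipped_skyline([(0, 48, 2), (0, 64, 2), (1, 62, 1), (1.5, 50, 1)]))
```

This variant does not suppress a lower note that begins while a higher note is still sounding; other skyline variants do, so results may differ from the database's monophonic files.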

Audio Version

For the audio version of the task, participating algorithms will have to read audio in wav format (sample rate 44.1 kHz, 16-bit, mono). These wav files are rendered (synthesised) in a metronomically exact fashion from the corresponding symbolic data. The tempo in beats per minute (BPM) differs from piece to piece, but it is given in the corresponding kern file (e.g., '*MM192' in a kern file means 192 BPM).

As with the symbolic version of the task, for the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. Even if your algorithm only returns a time interval $[a, b]$ in seconds for an occurrence of a pattern, this conversion is straightforward: convert $[a, b]$ to an ontime interval $[c, d]$ using the BPM provided, and then use the csv file for the piece to determine which ontime-MIDI pairs are sounding in $[c, d]$. (A downside to this approach is that the evaluation metrics will be slightly punitive if not all ontime-pitch pairs sounding in $[c, d]$ are part of the ground-truth pattern.)
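The conversion just described can be sketched in three small steps: read the tempo from the kern file, map seconds to ontimes, and select the notes sounding in the resulting interval. Assumptions, all labelled: one ontime unit equals one beat of the '*MM' tempo; `first_ontime` (a hypothetical parameter) is the ontime heard at 0 s in the rendered audio, which matters for pieces with an anacrusis and negative ontimes; a note "sounds" in $[c, d]$ if it starts no later than $d$ and ends after $c$. All function names are hypothetical.

```python
import re


def bpm_from_kern(kern_text):
    """Return the first tempo in a kern file, e.g. '*MM192' -> 192.0."""
    match = re.search(r"\*MM(\d+(?:\.\d+)?)", kern_text)
    if match is None:
        raise ValueError("no *MM tempo indication found")
    return float(match.group(1))


def seconds_to_ontimes(a, b, bpm, first_ontime=0.0):
    """Map [a, b] in seconds to an ontime interval [c, d].

    Assumes one ontime unit is one beat and that first_ontime
    (hypothetical) is the ontime heard at 0 seconds.
    """
    beats_per_second = bpm / 60.0
    return (first_ontime + a * beats_per_second,
            first_ontime + b * beats_per_second)


def points_sounding(notes, c, d):
    """(ontime, midi) pairs of notes overlapping [c, d].

    notes: iterable of (ontime, midi, duration) tuples, e.g. from the csv.
    """
    return sorted((on, midi) for on, midi, dur in notes
                  if on <= d and on + dur > c)


bpm = bpm_from_kern("**kern\n*MM120\n4c\n")
c, d = seconds_to_ontimes(0.5, 1.0, bpm)
print(points_sounding([(0, 60, 1), (1, 62, 1), (2, 64, 1), (3, 65, 1)], c, d))
```

Whether a note that is merely held over into $[c, d]$ should count is a design choice; requiring the ontime itself to lie in $[c, d]$ would be a stricter alternative.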

Example Algorithm Output for a Ground-Truth Piece

Regardless of symbolic/audio and polyphonic/monophonic task version, the output of your pattern discovery algorithm for a given piece should adhere to the following text file format:

pattern1 
occurrence1 
7.00000, 45.00000 
7.00000, 48.00000 
... 
11.00000, 60.00000 
occurrence2 
31.00000, 57.00000 
31.00000, 60.00000 
... 
35.00000, 72.00000 
occurrence3 
59.00000, 57.00000 
59.00000, 60.00000 
... 
63.00000, 72.00000 
pattern2 
occurrence1 
7.00000, 45.00000 
7.00000, 48.00000 
... 
11.00000, 57.00000 
occurrence2 
27.00000, 48.00000 
27.00000, 52.00000 
... 
59.00000, 60.00000 
... 
patternM 
occurrence1 
9.00000, 58.00000 
9.50000, 52.00000 
... 
12.00000, 60.00000 
...
occurrencem 
100.00000, 62.00000 
100.50000, 55.00000 
...
103.00000, 61.00000

That is, ontimes are in the left-hand column and MIDI note numbers are in the right. Each occurrence of a discovered pattern is given before moving on to the next pattern. Occurrences do not have to be of the same length, nor do they have to be constrained to exact or transposed repetition (e.g., variations are permitted). Neither the patterns nor the occurrences of patterns need to be in temporal order: the evaluation metrics are robust to different orders.
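A minimal sketch of writing and reading this text format in Python, assuming each pattern is a list of occurrences and each occurrence a list of (ontime, MIDI) pairs, with occurrence1 placed first as the prototype. The function names are hypothetical and not part of the bundled evaluation code.

```python
def format_patterns(patterns):
    """Serialise patterns as pattern/occurrence headers plus 'ontime, midi' lines."""
    lines = []
    for i, pattern in enumerate(patterns, start=1):
        lines.append(f"pattern{i}")
        for j, occurrence in enumerate(pattern, start=1):
            lines.append(f"occurrence{j}")
            for ontime, midi in occurrence:
                lines.append(f"{ontime:.5f}, {midi:.5f}")
    return "\n".join(lines) + "\n"


def parse_patterns(text):
    """Read the text format back into nested lists."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("pattern"):
            patterns.append([])
        elif line.startswith("occurrence"):
            patterns[-1].append([])
        else:
            ontime, midi = (float(x) for x in line.split(","))
            patterns[-1][-1].append((ontime, midi))
    return patterns


text = format_patterns([[[(7, 45), (7, 48)], [(31, 57)]]])
print(text)
```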

Order does matter, however, in the following two respects: if possible, (1) place the patterns in decreasing order of predicted perceptual salience/musical importance; (2) define occurrence1 to be the prototypical occurrence of each pattern. Fulfilling point (1) is not essential (it could be deferred to future work), but it concerns an application of discovery algorithms in which a user browses the output patterns. It would be convenient for the user to be shown the most important patterns first, and one metric below (called first five target proportion) evaluates this aspect of algorithm performance. Fulfilling point (2) is important if your discovery method is capable of retrieving inexact occurrences. Some metrics below are designed for assessing the capability for retrieving inexact occurrences, but others are simply concerned with whether or not the prototypical occurrence is discovered. The evaluation code will consider occurrence1 to be the prototype.

Evaluation Procedure

In brief: an implementation of the evaluation metrics and example code are bundled with the Development Database, so that participants do not have to implement the evaluation metrics themselves. (If you do wish to write your own evaluation code, please note that the appropriate annotation folders for the polyphonic task are 'bruhn', 'barlowAndMorgensternRevised', 'sectionalRepetitions', 'schoenberg', and 'tomCollins'; for the monophonic task they are 'bruhn', 'barlowAndMorgenstern', 'barlowAndMorgensternRevised', 'sectionalRepetitions', 'schoenberg', and 'tomCollins'.) Participating algorithms will be evaluated against the following metrics:

precision, recall, and F1 score;
establishment precision, establishment recall, and establishment F1 score;
occurrence precision, occurrence recall, and occurrence F1 score;
runtime, fifth return time, and first five target proportion.

Precision, Recall, and F1 Score

In more detail: denote the $n_\mathcal{P}$ patterns in a ground truth by $\Pi = \{ \mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_{n_\mathcal{P}} \}$, and the $n_\mathcal{Q}$ patterns in an algorithm's output by $\Xi = \{ \mathcal{Q}_1, \mathcal{Q}_2, \ldots, \mathcal{Q}_{n_\mathcal{Q}} \}$.
If the algorithm discovers $k$ of the ground-truth patterns, up to translation, then the precision of the algorithm is defined as $P = k/n_\mathcal{Q}$, the recall as $R = k/n_\mathcal{P}$, and the F1 score as $F_1 = 2PR/(P + R)$.
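A minimal sketch of these strict metrics in Python. How "discovers ... up to translation" is tested is an assumption here: a ground-truth pattern counts as discovered if the prototype (occurrence1) of some output pattern is a translation of the ground-truth prototype. Patterns are lists of occurrences, occurrences are non-empty lists of (ontime, MIDI) pairs, and all names are hypothetical.

```python
def _normalise(occurrence):
    """Canonical form up to translation (occurrence assumed non-empty)."""
    points = sorted(set(occurrence))
    t0, p0 = points[0]
    return tuple((t - t0, p - p0) for t, p in points)


def standard_prf(ground_truth, output):
    """Return (P, R, F1) with P = k / n_Q and R = k / n_P."""
    found = {_normalise(q[0]) for q in output}
    # k counts ground-truth patterns matched by at least one output prototype.
    k = sum(1 for p in ground_truth if _normalise(p[0]) in found)
    precision = k / len(output) if output else 0.0
    recall = k / len(ground_truth) if ground_truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


gt = [[[(0, 60), (1, 62)]], [[(0, 50), (2, 55)]]]
out = [[[(4, 65), (5, 67)]], [[(0, 40), (1, 41)]], [[(9, 70), (10, 71), (11, 72)]]]
print(standard_prf(gt, out))
```

Here only the first output pattern matches (the first ground-truth pattern, transposed), so $k = 1$, $P = 1/3$, $R = 1/2$ and $F_1 = 0.4$.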

The above metrics, which were used by Collins et al. (2010) in one of the first evaluations of a pattern discovery task, are very strict: an output pattern $Q$ may differ by only one point from a large ground-truth pattern $P$, but this will not count as a successful discovery. Therefore, we propose the following new metrics, which are robust to slight differences between output and ground-truth patterns.

Robust Versions of Precision, Recall, and F1 score

Symbolic Musical Similarity and the Score Matrix

Suppose that in the ground truth there is a pattern $P$ with occurrences $\mathcal{P} = \{ P_1, P_2, \ldots, P_{m_P} \}$, and in an algorithm's output there is a pattern $Q$ with occurrences $\mathcal{Q} = \{ Q_1, Q_2, \ldots, Q_{m_Q} \}$. Central to evaluating an algorithm is measuring the extent to which $\mathcal{Q}$ constitutes the discovery of $\mathcal{P}$.
In order to measure this, we need to be able to compute the symbolic musical similarity of $P_i$ and $Q_j$. We can use the simple cardinality score for symbolic musical similarity,

$$s_c(P_i, Q_j) = |P_i \cap Q_j| / \max \{ |P_i|, |Q_j| \},$$
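A direct transcription of the cardinality score as written above, treating each occurrence as a set of (ontime, MIDI) pairs. This sketch applies the formula literally, with no search over translations; the function name is hypothetical.

```python
def cardinality_score(p_occ, q_occ):
    """s_c(P_i, Q_j) = |P_i intersect Q_j| / max(|P_i|, |Q_j|)."""
    p, q = set(p_occ), set(q_occ)
    if not p and not q:
        return 0.0  # avoid division by zero for two empty occurrences
    return len(p & q) / max(len(p), len(q))


theme = [(0, 60), (1, 62), (2, 64), (3, 65)]
missing_last_note = [(0, 60), (1, 62), (2, 64)]
print(cardinality_score(theme, missing_last_note))  # 3 shared points / 4
```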

or the slightly more involved normalised matching score $s_m(P_i, Q_j)$, after Arzt, Böck, and Widmer (2012). Some examples of cardinality and matching scores between original and mutant versions of the theme from Beethoven's op.2 no.2 mvt.3 are given in Figure 2.

Either of these similarity measures, denoted $s(P_i, Q_j)$, can be recorded in a so-called score matrix,

$$s(\mathcal{P}, \mathcal{Q}) = \left( \begin{array}{cccc} s(P_1, Q_1) & s(P_1, Q_2) & \cdots & s(P_1, Q_{m_Q}) \\ s(P_2, Q_1) & s(P_2, Q_2) & \cdots & s(P_2, Q_{m_Q}) \\ \vdots & \vdots & \ddots & \vdots \\ s(P_{m_P}, Q_1) & s(P_{m_P}, Q_2) & \cdots & s(P_{m_P}, Q_{m_Q}) \end{array} \right).$$

The score matrix shows how all occurrences $\mathcal{Q} = \{ Q_1, Q_2, \ldots, Q_{m_Q} \}$ of a pattern in an algorithm's output compare to all occurrences $\mathcal{P} = \{ P_1, P_2, \ldots, P_{m_P} \}$ of a ground-truth pattern.
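The score matrix can be built with a nested comprehension: row $i$ corresponds to ground-truth occurrence $P_i$ and column $j$ to output occurrence $Q_j$. This sketch uses the cardinality score as the default similarity; any function with the same signature (for instance an implementation of the matching score, not shown) could be passed instead. Names are hypothetical.

```python
def cardinality_score(p_occ, q_occ):
    """s_c(P_i, Q_j) on sets of (ontime, midi) pairs."""
    p, q = set(p_occ), set(q_occ)
    return len(p & q) / max(len(p), len(q)) if (p or q) else 0.0


def score_matrix(p_occurrences, q_occurrences, score=cardinality_score):
    """Return an m_P x m_Q list of lists with entry [i][j] = s(P_i, Q_j)."""
    return [[score(p, q) for q in q_occurrences] for p in p_occurrences]


P = [[(0, 60), (1, 62)], [(8, 60), (9, 62)]]
Q = [[(0, 60), (1, 62)], [(8, 60), (9, 63)], [(20, 0)]]
for row in score_matrix(P, Q):
    print(row)
```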

Establishment Precision, Establishment Recall, and Establishment F1 Score

Summaries of the score matrix will be necessary for evaluating all of an algorithm's output against the whole ground truth for a piece. For instance, we may be interested in whether an algorithm is capable of establishing that a pattern $P$ is repeated at least once during a piece, and less interested in whether the algorithm can retrieve all occurrences of $P$ (exact and inexact). In this case, the maximum entry in the score matrix, denoted $S(\mathcal{P}, \mathcal{Q})$, is the appropriate summary.
For a piece's ground truth $\Pi = \{ \mathcal{P}_1, \mathcal{P}_2, \ldots, \mathcal{P}_{n_\mathcal{P}} \}$, and an algorithm's entire output for that piece $\Xi = \{ \mathcal{Q}_1, \mathcal{Q}_2, \ldots, \mathcal{Q}_{n_\mathcal{Q}} \}$, it is now possible to record the algorithm's capability for establishing that patterns in $\Pi$ are repeated at least once during the piece, using the so-called establishment matrix,

$$S(\Pi, \Xi) = \left( \begin{array}{cccc} S(\mathcal{P}_1, \mathcal{Q}_1) & S(\mathcal{P}_1, \mathcal{Q}_2) & \cdots & S(\mathcal{P}_1, \mathcal{Q}_{n_\mathcal{Q}}) \\ S(\mathcal{P}_2, \mathcal{Q}_1) & S(\mathcal{P}_2, \mathcal{Q}_2) & \cdots & S(\mathcal{P}_2, \mathcal{Q}_{n_\mathcal{Q}}) \\ \vdots & \vdots & \ddots & \vdots \\ S(\mathcal{P}_{n_\mathcal{P}}, \mathcal{Q}_1) & S(\mathcal{P}_{n_\mathcal{P}}, \mathcal{Q}_2) & \cdots & S(\mathcal{P}_{n_\mathcal{P}}, \mathcal{Q}_{n_\mathcal{Q}}) \end{array} \right).$$
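A sketch of building the establishment matrix: each entry $S(\mathcal{P}_i, \mathcal{Q}_j)$ is the maximum entry of the score matrix between ground-truth pattern $i$ and output pattern $j$. Patterns are lists of occurrences; the cardinality score is used as the similarity, and all names are hypothetical.

```python
def cardinality_score(p_occ, q_occ):
    """s_c(P_i, Q_j) on sets of (ontime, midi) pairs."""
    p, q = set(p_occ), set(q_occ)
    return len(p & q) / max(len(p), len(q)) if (p or q) else 0.0


def pattern_similarity(p_pattern, q_pattern, score=cardinality_score):
    """S(P, Q): the maximum entry of the score matrix s(P, Q)."""
    return max(score(p, q) for p in p_pattern for q in q_pattern)


def establishment_matrix(ground_truth, output, score=cardinality_score):
    """n_P x n_Q list of lists with entry [i][j] = S(P_i, Q_j)."""
    return [[pattern_similarity(p, q, score) for q in output]
            for p in ground_truth]


gt = [[[(0, 60), (1, 62)], [(8, 60), (9, 62)]]]       # one pattern, two occurrences
out = [[[(8, 60), (9, 62)]], [[(8, 60), (9, 63)]]]    # two output patterns
print(establishment_matrix(gt, out))
```

The first output pattern exactly matches the second ground-truth occurrence, so its entry is 1.0; the second shares one of two points, giving 0.5.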

The establishment precision can then be calculated according to

$$P_{\text{est}} = \frac{1}{n_\mathcal{Q}} \sum_{j = 1}^{n_\mathcal{Q}} \max \{ S(\mathcal{P}_i, \mathcal{Q}_j) \mid i = 1, \ldots, n_\mathcal{P} \}.$$

If an algorithm discovers $k$ of the ground-truth patterns exactly, and each of its remaining $n_\mathcal{Q} - k$ output patterns has zero similarity to every ground-truth pattern, then the establishment precision equals the standard precision ($= k/n_\mathcal{Q}$). The establishment recall is defined as

$$R_{\text{est}} = \frac{1}{n_\mathcal{P}} \sum_{i = 1}^{n_\mathcal{P}} \max \{ S(\mathcal{P}_i, \mathcal{Q}_j) \mid j = 1, \ldots, n_\mathcal{Q} \}.$$

The establishment F1 score is defined as above, but replacing precision with establishment precision, and recall with establishment recall.
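Given an establishment matrix (rows are ground-truth patterns, columns are output patterns), the formulas for $P_{\text{est}}$ and $R_{\text{est}}$ reduce to averaging column maxima and row maxima respectively. A minimal sketch, assuming a non-empty matrix stored as a list of lists; the function name is hypothetical.

```python
def establishment_prf(S):
    """Return (P_est, R_est, F1_est) for an n_P x n_Q matrix S[i][j] = S(P_i, Q_j)."""
    n_p, n_q = len(S), len(S[0])
    # P_est: mean over output patterns (columns) of the best ground-truth match.
    precision = sum(max(S[i][j] for i in range(n_p)) for j in range(n_q)) / n_q
    # R_est: mean over ground-truth patterns (rows) of the best output match.
    recall = sum(max(row) for row in S) / n_p
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall > 0 else 0.0)
    return precision, recall, f1


# One exact discovery (k = 1) out of n_Q = 3 outputs, n_P = 2 ground-truth patterns.
print(establishment_prf([[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]))
```

With one exact match and all other entries zero, this reproduces the standard values $P = 1/3$ and $R = 1/2$, as the text above predicts.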

Occurrence Precision, Occurrence Recall, and Occurrence F1 Score

As mentioned above, there is a difference between a pattern discovery algorithm (or listener) being able to establish the existence of a repeated pattern, and being able to retrieve all occurrences. We showed how to measure the extent to which an algorithm is capable of establishing that a pattern Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P} is repeated at least once during a piece. Now we focus on an algorithm's ability to retrieve all occurrences of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P} (exact and inexact). These metrics will favour an algorithm that is strong at retrieving all occurrences of the patterns it discovers, even if the algorithm fails completely to discover many of the salient patterns in a piece.

The indices Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle I} of the estalishment matrix with values greater than or equal to some threshold (default value Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle c = .75} ) indicate which ground truth patterns an algorithm is considered to have discovered. We will focus on these indices to define a so-called occurrence matrix. Denoted Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle O(\Pi, \Xi)} , the occurrence matrix begins as an Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n_\mathcal{P} \times n_\mathcal{Q}} zero matrix. 
Then for each index pair Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (i, j) \in I} , we calculate the precision of the score matrix Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle s(\mathcal{P}_i, \mathcal{Q}_j)} , and record this scalar as element Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle (i, j)} of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle O(\Pi, \Xi)} . 
The precision of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle s(\mathcal{P}_i, \mathcal{Q}_j)} indicates the precision with which algorithm output Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{Q}_j} retrieved the ground truth item Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{P}_i} . The occurrence precision, denoted Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_{\text{occ}}} , is then defined as the precision of the occurrence matrix Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle O(\Pi, \Xi)} , with the sum taken over nonzero columns. The occurrence recall, denoted Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle R_{\text{occ}}} , is defined analogously, but replacing mentions of 'precision' and 'columns' above with 'recall' and 'rows.' 
The occurrence $F_1$ score can also be defined, as the harmonic mean of the two: $F_{\text{occ}} = 2 P_{\text{occ}} R_{\text{occ}} / (P_{\text{occ}} + R_{\text{occ}})$.
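The occurrence metrics can be sketched in a few lines. This is an illustration under stated assumptions, not the official evaluation code: the precision of a score matrix is assumed to be the mean of its column maxima (one column per output occurrence) and its recall the mean of its row maxima (one row per ground-truth occurrence), matching the "columns"/"rows" wording above. The index set $I$ is taken as an input rather than recomputed, and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Matrix = List[List[float]]  # rows: occurrences of P_i, columns: occurrences of Q_j

def matrix_precision(s: Matrix) -> float:
    # ASSUMPTION: mean over columns of each column's maximum.
    n_cols = len(s[0])
    return sum(max(row[col] for row in s) for col in range(n_cols)) / n_cols

def matrix_recall(s: Matrix) -> float:
    # ASSUMPTION: mean over rows of each row's maximum.
    return sum(max(row) for row in s) / len(s)

def _mean_nonzero(values: List[float]) -> float:
    # Average only over nonzero entries (the nonzero columns or rows of O).
    nonzero = [v for v in values if v > 0]
    return sum(nonzero) / len(nonzero) if nonzero else 0.0

def occurrence_scores(score_matrices: Dict[Tuple[int, int], Matrix],
                      index_set: Set[Tuple[int, int]],
                      m: int, n: int) -> Tuple[float, float, float]:
    """Return (P_occ, R_occ, F_occ) for m ground-truth and n output patterns."""
    o_prec = [[0.0] * n for _ in range(m)]  # occurrence matrix used for precision
    o_rec = [[0.0] * n for _ in range(m)]   # analogous matrix used for recall
    for i, j in index_set:
        o_prec[i][j] = matrix_precision(score_matrices[(i, j)])
        o_rec[i][j] = matrix_recall(score_matrices[(i, j)])
    p_occ = _mean_nonzero([max(o_prec[i][j] for i in range(m)) for j in range(n)])
    r_occ = _mean_nonzero([max(row) for row in o_rec])
    f_occ = 2 * p_occ * r_occ / (p_occ + r_occ) if p_occ + r_occ > 0 else 0.0
    return p_occ, r_occ, f_occ
```

For example, if ground-truth pattern 0 has three occurrences and output pattern 0 has two, a score matrix of `[[1.0, 0.5], [0.5, 1.0], [0.2, 0.2]]` gives perfect occurrence precision (both output occurrences are matched) but recall of about 0.73, because the third ground-truth occurrence is barely found.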

Runtime, Fifth Return Time, and First Five Target Proportion

Overall runtime is an important metric. Those wishing to develop pattern discovery algorithms for on-the-fly browsing, however, may find it more relevant to know how long an algorithm takes to return its first few patterns, since it can continue to discover further patterns while the user browses. Fifth return time (FRT) is the time taken for an algorithm to output its first five patterns. As these patterns are of little use if none of them is a ground-truth pattern, we counterbalance FRT with another metric, first five target proportion (FFTP): the proportion of pieces for which the first five output patterns contain at least one ground-truth pattern.
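A minimal sketch of both measures follows, with hypothetical function names. It assumes the algorithm exposes its output as a lazy generator (otherwise the timer only measures list iteration, not discovery), represents a pattern as a frozen set of (ontime, pitch) pairs, and counts an output as a "target" when it equals a ground-truth pattern up to time shift and transposition. That exact-match criterion is our simplification; the official evaluation may be more lenient.

```python
import time
from typing import FrozenSet, Iterable, List, Optional, Sequence, Tuple

Point = Tuple[float, int]        # (ontime, pitch)
Pattern = FrozenSet[Point]

def fifth_return_time(patterns: Iterable[Pattern]) -> Tuple[Optional[float], List[Pattern]]:
    """Seconds until the fifth pattern arrives (None if fewer than five), plus those patterns."""
    start = time.perf_counter()
    first_five: List[Pattern] = []
    for pattern in patterns:
        first_five.append(pattern)
        if len(first_five) == 5:
            return time.perf_counter() - start, first_five
    return None, first_five

def normalise(pattern: Pattern) -> Pattern:
    """Shift the lexicographically smallest point to the origin, so that two patterns
    are equal after normalising iff one is a time-shifted/transposed copy of the other."""
    t0, p0 = min(pattern)
    return frozenset((t - t0, p - p0) for t, p in pattern)

def first_five_target_proportion(
        runs: Sequence[Tuple[Sequence[Pattern], Sequence[Pattern]]]) -> float:
    """runs: one (first five output patterns, ground-truth patterns) pair per piece."""
    hits = 0
    for first_five, ground_truth in runs:
        targets = {normalise(g) for g in ground_truth}
        if any(normalise(q) in targets for q in first_five):
            hits += 1
    return hits / len(runs)
```

Reporting FRT alongside FFTP guards against an algorithm that returns five patterns very quickly simply because they are trivial.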

Friedman Tests for the Pattern Discovery Task

The Friedman test will be used to investigate whether any algorithms rank consistently higher or lower than the others, with regard to the F1 scores on individual pieces.
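For readers unfamiliar with the test, the sketch below computes the Friedman statistic from a table of per-piece F1 scores (rows are pieces, columns are algorithms), ranking the algorithms within each piece and giving tied scores their average rank. It is an illustration only: it omits the tie correction that library routines such as SciPy's `scipy.stats.friedmanchisquare` apply, and it does not compute a p-value. For a reasonably large number of pieces the statistic is conventionally compared with a chi-squared distribution on k - 1 degrees of freedom, where k is the number of algorithms. Function names are hypothetical.

```python
from typing import List, Sequence, Tuple

def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (lowest) upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # 0-based positions -> 1-based rank
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def friedman_statistic(scores: List[List[float]]) -> Tuple[float, List[float]]:
    """scores[piece][algorithm] = F1 score. Returns (Q, rank sum per algorithm)."""
    n_pieces, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for alg, r in enumerate(average_ranks(row)):
            rank_sums[alg] += r
    q = (12.0 / (n_pieces * k * (k + 1)) * sum(r * r for r in rank_sums)
         - 3.0 * n_pieces * (k + 1))
    return q, rank_sums
```

With four pieces on which the same algorithm always scores highest and another always lowest, the rank sums are 12, 8 and 4 and Q reaches its maximum of 8 for that table size; a large rank sum identifies an algorithm that ranks consistently higher than the others.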

Available Code

Entering an existing MIREX task, where results have been improving for up to 7 years, can be a daunting prospect. The pattern discovery task, on the other hand, is new, and so is a great opportunity for Master's and PhD students to make their mark in MIR. To this end, it should be noted that the following code is freely available, and that students/researchers are very welcome to define pattern discovery algorithms by altering/extending this code, or to use it as a point of comparison with their own algorithms. Please feel free to ask questions, either via this wiki, or by email to authors of the relevant papers.

Java implementation of algorithms from Meredith et al. (2002) and Meredith (2006).
Matlab implementation of algorithms from Collins et al. (2010). (Agree to GNU licence and then download Patterns-Aug2012.zip.)
If you would like to participate in the audio version but are missing an F0 estimator, then you could use the MELODIA plug-in as described by Salamon and Gómez (2012).
Please add links to more implementations here.
...

Questions and Comments

How Is Pattern Discovery Different to the 2012:Structural_Segmentation Task?

We expect structural segmentation algorithms to be adaptable to pattern discovery, so we would very much welcome submissions to the pattern discovery task from segmentation researchers as well. The two tasks differ as follows: structural segmentation results in a list of labelled time intervals that cover an entire piece of music, such as

  0.000    4.273     A
  4.273    8.469     A
  8.469   21.321     B
 21.321   25.734     A
    ...      ...     ...
175.012  179.108     A

The output of a pattern discovery algorithm will not necessarily cover an entire piece. A four-bar theme beginning in bar 1 might be the only output of a pattern discovery algorithm, even if the piece is much longer and contains other material.
Whereas the output of a structural segmentation algorithm is non-overlapping, the output of a pattern discovery algorithm might be overlapping or even nested (hierarchical). For instance, the four-bar theme mentioned above might be output, as well as a sectional repetition that lasts from bars 1-8.
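The two structural differences above can be stated as simple checks. In this hypothetical sketch, a segmentation must be a contiguous, non-overlapping cover of the piece, while pattern discovery output is free to overlap or nest, as with a bars 1-4 theme inside a bars 1-8 sectional repetition. Interval endpoints are seconds for the segmentation and bar numbers for the patterns; both are illustrative choices.

```python
from typing import List, Tuple

Interval = Tuple[float, float]

def is_segmentation(intervals: List[Interval], duration: float, tol: float = 1e-6) -> bool:
    """True if the intervals start at 0, end at `duration`, and abut without gaps or overlaps."""
    ivs = sorted(intervals)
    if not ivs or abs(ivs[0][0]) > tol or abs(ivs[-1][1] - duration) > tol:
        return False
    return all(abs(a[1] - b[0]) <= tol for a, b in zip(ivs, ivs[1:]))

def is_nested(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer` (e.g., a theme inside a repeated section)."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]
```

A truncated version of the segmentation above, ending at 21.321 s, passes the first check; the pattern output "bars 1-4" and "bars 1-8" fails it but is perfectly legitimate as pattern discovery output.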

How Is Pattern Discovery Different to Pattern Matching, Or the 2012:Symbolic_Melodic_Similarity Task?

In a typical pattern matching task, more or less exact instances of a given query are retrieved from some larger dataset, and ranked by an appropriate measure of relevance to the original query (e.g., Barton, Cambouropoulos, Iliopoulos, & Lipták, 2012). The setup of pattern discovery is fundamentally different: there are no queries given to begin with, just single pieces of music and the requirement to discover repeating patterns within each piece.
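The contrast can be made concrete on ontime-pitch pairs. The sketch below is simplified and uses hypothetical names: `match_query` is given a query and finds every translation of it contained in the piece (matching), whereas `maximal_translatable_patterns` receives no query and, in the spirit of the SIA algorithm of Meredith et al. (2002), groups points by the vectors between them, so that each vector yields the set of points that recurs when shifted by it (discovery). It is not a reproduction of any published implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, int]   # (ontime, pitch)
Vector = Tuple[float, int]  # (time shift, transposition)

def match_query(query: List[Point], piece: List[Point]) -> List[Vector]:
    """Matching: return every vector v such that query + v is contained in the piece."""
    piece_set = set(piece)
    t0, p0 = min(query)
    vectors = []
    for t, p in sorted(piece_set):
        v = (t - t0, p - p0)  # candidate: map the query's first point onto (t, p)
        if all((qt + v[0], qp + v[1]) in piece_set for qt, qp in query):
            vectors.append(v)
    return vectors

def maximal_translatable_patterns(piece: List[Point]) -> Dict[Vector, List[Point]]:
    """Discovery (SIA-style): for each vector v between two points, collect
    all points p of the piece such that p + v is also in the piece."""
    points = sorted(set(piece))
    mtps: Dict[Vector, List[Point]] = defaultdict(list)
    for i, p in enumerate(points):
        for q in points[i + 1:]:
            mtps[(q[0] - p[0], q[1] - p[1])].append(p)
    return dict(mtps)
```

On a toy piece in which a three-note figure is restated four beats later, matching needs the figure as input to return the shifts (0, 0) and (4, 0), whereas discovery finds the same figure unaided as the pattern associated with the vector (4, 0).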

The melodic similarity task fits the pattern matching paradigm, and so is also different to pattern discovery. In the melodic similarity task, algorithms are given a melodic query, and retrieve a supposedly relevant melody from the database. The similarity of the query and the algorithm's match is assessed by human listeners.

Why Not Just Use Optical Music Recognition to Detect Sectional Repetitions?

One could use optical music recognition instead, although what we are trying to understand and model is a listener's awareness of thematic material and sectional repetitions, which often exists without access to staff notation. It would also be interesting to apply pattern discovery to music for which there is no staff notation.

This Is Intra-Opus Discovery, But What About Inter-Opus Discovery?

Inter-opus discovery, the discovery of patterns that recur across multiple pieces of music (Conklin & Anagnostopoulou, 2001), is an interesting problem, and one that we would be interested to see cast as a MIREX task in future. Currently, lack of an appropriate ground truth is an issue here.

There Are Some Issues With the MIDI Files, Please Can You Clarify?

The MIDI files were created and are provided for the purposes of sonifying and checking the symbolic data, and are not intended to be used themselves for input to the pattern discovery algorithms (please see the folders called 'csv' and/or 'lisp' instead). They are not ideal for input for the following reasons: (1) correct pitch spelling is lost, whereas this is maintained by presenting MIDI note number and morphetic pitch number side by side in the 'csv' and 'lisp' folders; (2) each MIDI file is zeroed in the sense that it begins more or less immediately, even if the pattern occurrence it represents occurs halfway through a piece; (3) each MIDI file also contains one extra, very quiet, low note to avoid clipping in the sound file.
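Reading the symbolic data instead of the MIDI files might look like the sketch below. The column order it assumes (ontime, MIDI note number, morphetic pitch number, then any further columns, which are ignored) is our assumption, and `load_points` is a hypothetical helper; please check the files in the 'csv' folder before relying on it. Keeping both pitch numbers is what preserves pitch spelling: two enharmonically equivalent notes share a MIDI note number but differ in morphetic pitch number.

```python
import csv
import io
from typing import List, Tuple

def load_points(csv_text: str) -> List[Tuple[float, int, int]]:
    """Parse rows into (ontime, MIDI note number, morphetic pitch number).
    ASSUMPTION: these are the first three columns; extra columns are ignored."""
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        ontime = float(row[0])
        mnn = int(float(row[1]))
        mpn = int(float(row[2]))
        points.append((ontime, mnn, mpn))
    return points
```

Unlike the MIDI files, the ontimes read this way are not zeroed, so a pattern occurrence halfway through a piece keeps its true position.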

What is the Difference Between a Motif, a Theme, and a Repeated Section?

Dictionary definitions of motif, theme, and repeated section are given below. To make the definitions more concrete, I refer to the top system of Figure 2. In terms of ontime-pitch pairs, the motif here consists of {(2, C#5), (2.25, A4), (2.5, E5), (2.75, C#5), (3, A5)}, beginning on beat 3 of bar 2 and ending on beat 1 of bar 3. This is repeated an octave lower one bar later, and occurs with a slightly different intervallic configuration at the very beginning. The theme, according to Barlow and Morgenstern (1948), lasts from the upbeat of bar 1, to beat 2 of bar 4. Bars 5-8 are not shown in Figure 2, but there is a repeated section consisting of bars 1-8. So one might infer from this example that typically a motif lasts less than one bar, a theme 4-8 bars, and a repeated section 8+ bars.
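The statement that the motif "is repeated an octave lower one bar later" amounts to translating its ontime-pitch pairs by a single vector. The sketch below converts the pitch names to MIDI note numbers (middle C, C4, = 60) and applies a shift of one bar and -12 semitones. The bar length is an assumption: if one ontime unit is one beat, the positions given above (ontime 2 on beat 3 of bar 2, ontime 3 on beat 1 of bar 3) imply three beats per bar. The helper names are hypothetical.

```python
from typing import List, Tuple

PITCH_CLASS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def note_to_midi(name: str) -> int:
    """Convert a name such as 'C#5' or 'Bb4' to a MIDI note number (C4 = 60)."""
    letter, rest = name[0], name[1:]
    alter = 0
    while rest and rest[0] in "#b":
        alter += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    return 12 * (int(rest) + 1) + PITCH_CLASS[letter] + alter

def translate(pattern: List[Tuple[float, int]], dt: float, dp: int) -> List[Tuple[float, int]]:
    """Shift every (ontime, pitch) pair by the same time offset and transposition."""
    return [(t + dt, p + dp) for t, p in pattern]

motif = [(2, "C#5"), (2.25, "A4"), (2.5, "E5"), (2.75, "C#5"), (3, "A5")]
motif_mnn = [(t, note_to_midi(n)) for t, n in motif]
BEATS_PER_BAR = 3  # ASSUMPTION inferred from the beat positions quoted in the text
octave_lower = translate(motif_mnn, BEATS_PER_BAR, -12)
```

Under these assumptions `octave_lower` is `[(5, 61), (5.25, 57), (5.5, 64), (5.75, 61), (6, 69)]`. The occurrence at the very beginning of the piece, with its slightly different intervals, would not be found by an exact translation like this, which is why the task is also interested in inexact occurrences.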

According to Drabkin (2001a), a "motif may be of any size, and is most commonly regarded as the shortest subdivision of a theme or phrase that still maintains its identity as an idea." A theme is the "musical material on which part or all of a work is based, usually having a recognizable melody and sometimes perceivable as a complete musical expression in itself" (Drabkin, 2001b). A repeated section is the "restatement of a portion of a musical composition of any length from a single bar to a whole section, or occasionally the whole piece. Since the Classical period, repeated passages have not usually been written out; instead they are enclosed within the signs ||: and :||" (Tilmouth, 2001).

Time And Hardware Limits

Depending on the number of participating algorithms, we may place a limit on analysis times.

Potential Participants

Please add name and email here.
...

Acknowledgments

Thank you to the following for feedback on this task description: Ashley Burgoyne, Emilios Cambouropoulos, Darrell Conklin, Stephen Downie, Morwaread Farbood, Jamie Forth, Nanzhu Jiang, Ian Knopke, Olivier Lartillot, David Meredith, Oriol Nieto, Eleanor Selfridge-Field, and Geraint Wiggins.

References

Andreas Arzt, Sebastian Böck, and Gerhard Widmer. Fast identification of piece and score position via symbolic fingerprinting. In F. Gouyon, P. Herrera, L.G. Martins, and M. Müller (Eds), Proc ISMIR, pp. 433-438, Porto, 2012.

Harold Barlow and Sam Morgenstern. A dictionary of musical themes. Crown Publishers, New York, 1948.

Siglind Bruhn. J.S. Bach's Well-Tempered Clavier: in-depth analysis and interpretation. Mainer International, Hong Kong, 1993.

Carl Barton, Emilios Cambouropoulos, Costas S. Iliopoulos, and Zsuzsanna Lipták. Melodic string matching via interval consolidation and fragmentation. In L. Iliadis, I. Maglogiannis, H. Papadopoulos, K. Karatzas, and S. Sioutas (Eds), Artificial Intelligence Applications and Innovations, pp. 460-469. Springer, Berlin, 2012.

Emilios Cambouropoulos. Musical parallelism and melodic segmentation: a computational approach. Music Perception, 23(3):249-267, 2006.

Shih-Chuan Chiu, Man-Kwan Shan, Jiun-Long Huang, and Hua-Fu Li. Mining polyphonic repeating patterns from music data using bit-string based approaches. In R. Radhakrishnan and R. Yan (Eds), Proc IEEE International Conference on Multimedia and Expo, pp. 1170-1173, New York, 2009.

Tom Collins, Jeremy Thurlow, Robin Laney, Alistair Willis, and Paul H. Garthwaite. A comparative evaluation of algorithms for discovering translational patterns in Baroque keyboard works. In J.S. Downie and R. Veltkamp (Eds), Proc ISMIR, pp. 3-8, Utrecht, 2010.

Tom Collins, Robin Laney, Alistair Willis, and Paul H. Garthwaite. Modeling pattern importance in Chopin's mazurkas. Music Perception, 28(4):387-414, 2011.

Darrell Conklin. Discovery of distinctive patterns in music. Intelligent Data Analysis, 14(5):547-554, 2010a.

Darrell Conklin. Distinctive patterns in the first movement of Brahms' String Quartet in C minor. Journal of Mathematics and Music, 4(2):85-92, 2010b.

Darrell Conklin and Christina Anagnostopoulou. Representation and discovery of multiple viewpoint patterns. In A. Schloss, R. Dannenberg, and P. Driessen (Eds), Proc ICMC, pp. 479-485, Cuba, 2001.

William Drabkin. Motif. In S. Sadie and J. Tyrrell (Eds), "The new Grove dictionary of music and musicians". Macmillan, London, UK, 2nd edition, 2001a.

William Drabkin. Theme. In S. Sadie and J. Tyrrell (Eds), "The new Grove dictionary of music and musicians". Macmillan, London, UK, 2nd edition, 2001b.

Sebastian Flossmann, Werner Goebl, Maarten Grachten, Bernhard Niedermayer, and Gerhard Widmer. The Magaloff project: an interim report. Journal of New Music Research, 39(4):363-377, 2010.

Jamie Forth and Geraint A. Wiggins. An approach for identifying salient repetition in multidimensional representations of polyphonic music. In J. Chan, J. Daykin, and M.S. Rahman (Eds), London algorithmics 2008: Theory and practice, pp. 44-58. College Publications, London, 2009.

Jia-Lien Hsu, Chih-Chin Liu, and Arbee L.P. Chen. Discovering nontrivial repeating patterns in music data. IEEE Transactions on Multimedia, 3(3):311-325, 2001.

Ian Knopke and Frauke Jürgensen. A system for identifying common melodic phrases in the masses of Palestrina. Journal of New Music Research, 38(2):171-181, 2009.

Olivier Lartillot. Efficient extraction of closed motivic patterns in multidimensional symbolic representations of music. In J.D. Reiss and G.A. Wiggins (Eds), Proc ISMIR, pp. 191-198, London, 2005.

Olivier Lartillot and Petri Toiviainen. Motivic matching strategies for automated pattern extraction. Musicae Scientiae, Discussion Forum 4A:281-314, 2007.

Fred Lerdahl and Ray Jackendoff. A generative theory of tonal music. MIT Press, Cambridge, MA, 1983.

Colin Meek and William P. Birmingham. Automatic thematic extractor. Journal of Intelligent Information Systems, 21(1):9-33, 2003.

David Meredith, Kjell Lemström, and Geraint A. Wiggins. Algorithms for discovering repeated patterns in multidimensional representations of polyphonic music. Journal of New Music Research, 31(4):321-345, 2002.

David Meredith. Point-set algorithms for pattern discovery and pattern matching in music. In T. Crawford and R. Veltkamp (Eds), Proc Dagstuhl Seminar on Content-Based Retrieval, 23 pp., Dagstuhl, 2006.

Meinard Müller and Nanzhu Jiang. A scape plot representation for visualizing repetitive structures of music recordings. In F. Gouyon, P. Herrera, L.G. Martins, and M. Müller (Eds), Proc ISMIR, pp. 97-102, Porto, 2012.

Oriol Nieto, Eric J. Humphrey, and Juan Pablo Bello. Compressing music recordings into audio summaries. In F. Gouyon, P. Herrera, L.G. Martins, and M. Müller (Eds), Proc ISMIR, pp. 313-318, Porto, 2012.

Geoffroy Peeters. Sequence representation of music structure using higher-order similarity matrix and maximum-likelihood approach. In S. Dixon, D. Bainbridge, and R. Typke (Eds), Proc ISMIR, pp. 35-40, Vienna, 2007.

Justin Salamon and Emilia Gómez. Melody extraction from polyphonic music signals using pitch contour characteristics. IEEE Transactions on Audio, Speech and Language Processing, 20(6):1759-1770, 2012.

Heinrich Schenker. Harmony. University of Chicago Press, London, 1954. (Translated by Elisabeth Mann Borgese and edited by Oswald Jonas. Original work published 1906 by Cotta, Stuttgart).

Arnold Schoenberg. Fundamentals of Musical Composition. Faber and Faber, London, 1967.

Michael Tilmouth. Repeat. In S. Sadie and J. Tyrrell (Eds), "The new Grove dictionary of music and musicians". Macmillan, London, UK, 2nd edition, 2001.

Avery Wang. An industrial strength audio search algorithm. In H.H. Hoos and D. Bainbridge (Eds), Proc ISMIR, Baltimore, MD, 2003.

Geraint A. Wiggins. Computer-representation of music in the research environment. In T. Crawford and L. Gibson (Eds), Modern methods for musicology: prospects, proposals and realities, pp. 7-22. Ashgate, Oxford, UK, 2007.