2013:Discovery of Repeated Themes & Sections

From MIREX Wiki
Revision as of 07:41, 15 March 2013 by Tom Collins (talk | contribs) (Symbolic Version)

Description

In brief: algorithms that take a single piece of music as input, and output a list of patterns repeated within that piece. Also known as intra-opus discovery (Conklin & Anagnostopoulou, 2001).

We would be happy to receive ideas for improving aspects of this task. Researchers with wiki accounts are able to post comments below or to edit the relevant sections, and researchers without wiki accounts are welcome to email me directly: tom.collins(a)jku.at

In more detail: for understanding and interpreting a musical work, the discovery of repeated patterns within that piece is a crucial step. Meredith, Lemström, and Wiggins (2002) cite Schenker (1954) as claiming repetition to be "the basis of music as an art" (p. 5), and also Lerdahl and Jackendoff (1983), who observe that "the importance of parallelism [i.e., repetition] in musical structure cannot be overestimated. The more parallelism one can detect, the more internally coherent an analysis becomes, and the less independent information must be processed and retained in hearing or remembering a piece" (p. 52).

On the very next page Lerdahl and Jackendoff (1983) acknowledge their "failure to flesh out the notion of parallelism," which is symptomatic of a more general failure in music psychology and music computing to address the discovery of repetition. Algorithms that take pieces of music as input, and output a list, visualisation, or summary of repeated patterns do exist (Collins, Thurlow, Laney, Willis, & Garthwaite, 2010; Conklin & Anagnostopoulou, 2001; Forth & Wiggins, 2009; Knopke & Jürgensen, 2009; Lartillot, 2005; Meek & Birmingham, 2003; Meredith et al., 2002; Müller & Jiang, 2012; Nieto, Humphrey, & Bello, 2012; Peeters, 2007), but the pattern discovery task has received less attention than many other tasks in MIR. Until now!

What is a pattern?

For the purposes of this task, a pattern is defined as a set of ontime-pitch pairs that occurs at least twice (i.e., is repeated at least once) in a piece of music. The second, third, etc. occurrences of the pattern will likely be shifted in time and perhaps also transposed, relative to the first occurrence. Ideally an algorithm would be able to discover all exact and inexact occurrences of a pattern within a piece, so in evaluating this task we are interested in both (1) whether an algorithm can discover one occurrence, up to time shift and transposition, and (2) to what extent it can find all occurrences.

Some of the most recognisable riffs and motifs in music consist of as few as four ontime-pitch pairs (for example, the opening riff from 'Purple Haze' by Hendrix, or the opening of the first movement of Symphony no.5 in C minor by Beethoven). If, however, an algorithm returned all patterns consisting of four or more notes in a given piece, a lot of these patterns would not be perceptually salient or analytically interesting. Happily, solutions have been proposed for trying to determine which are the most noticeable and/or important patterns, which are of middling importance, and which have occurred by chance (Cambouropoulos, 2006; Conklin, 2010a, 2010b). Collins, Laney, Willis, & Garthwaite (2011) conducted a meta-analysis and experimental validation of many proposed solutions.

Data

My colleagues and I at the Department of Computational Perception, Johannes Kepler University, are compiling a database of classical music annotated with repeated themes and sections (mainly from KernScores; see also Flossmann, Goebl, Grachten, Niedemayer, & Widmer, 2010). To encourage participation in the pattern discovery task, we are offering a representative sample called the JKU Patterns Development Database (~600 MB). Symbolic and audio versions are crossed with monophonic and polyphonic versions, giving up to four versions of the task in total. Researchers are welcome to submit to more than one version of the task.

As a ground truth, we are basing motifs and themes on Barlow and Morgenstern's (1953) Dictionary of Musical Themes, Schoenberg's (1967) Fundamentals of Musical Composition, and Bruhn's (1993) J. S. Bach’s Well-Tempered Clavier: In-depth Analysis and Interpretation. Repeated sections are based on those marked by the composer. For one of the pieces we created our own annotation. A paper that describes our construction of the Development Database and use of the sources is currently under preparation. No ground truth is perfect: we have chosen the sources as being relatively uncontroversial and transparent, but we would welcome ideas and suggestions from other researchers. As a quick example, Figure 1 is an excerpt from Beethoven's op.2 no.1 mvt.3, with a ground-truth pattern marked as Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_1} (first occurrence) and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_2} (second occurrence).


Submission Format

Symbolic Version

Participants are able to choose from a number of symbolic representations (MIDI, kern, csv with columns for ontime, MIDI note number, staff height, duration, and staff number), as there may be differing opinions about which aspects of a representation are most useful for discovering repeated patterns. This choice also reflects the importance of designing pattern discovery code that functions irrespective of the exact input format (Wiggins, 2007). For the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. For instance, the point-set representation for Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_1} in Figure 1 is

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle {P_1 = \{(-1, 60),\ (-1, 65),\ (0, 61),\ (0, 66),\ (1, 59),\ (1, 64),\ (2, 56),}}
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle {\, (3, 60),\ (3, 65),\ (4, 58),\ (4, 63),\ (5, 56),\ (5, 58),\ (5, 60),\ (5, 63),}}
Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle {\, (6, 57),\ (6, 59),\ (6, 60),\ (6, 62),\ (7, 56),\ (7, 58),\ (7, 60),\ (7, 65),}}

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle {\, (9, 56),\ (9, 58),\ (9, 60),\ (9, 66),\ (10, 65)\}.}}

Repeated sections are expanded in all pieces, i.e. as the piece would be heard in a performance. In the monophonic version, each voice marked as independent in the score is extracted and re-encoded monophonically one after the other in the order highest to lowest. For example, a fugue with upper, middle, and lower voices would be re-encoded with the upper voice heard first in isolation, followed by the middle voice, and then lower voice. As for within-voice chords, only the top note is included.

Audio Version

For the audio version of the task, participating algorithms will have to read audio in wav format, sample rate 44.1 KHz, 16 bit, mono. These wav files are rendered (synthesised) in a metronomically exact fashion from the corresponding symbolic data. Beats per minute (BPM) are different for different pieces, but this information is located in the corresponding kern file (e.g., '*MM192' means 192 BPM).

As with the symbolic version of the task, for the purposes of standardised evaluation, participants will need to convert each occurrence of a discovered pattern to a point set consisting of event ontimes and MIDI note numbers. Even if your algorithm only returns a time interval Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle [a,\ b]} in seconds for an occurrence of a pattern, this conversion will be easy enough to do: convert Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle [a,\ b]} to an ontime interval Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle [c,\ d]} using the BPM provided, and then use the csv file for the piece to determine which ontime-MIDI pairs are sounding in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle [c,\ d]} . (A downside to this approach is that the evaluations metrics will be slightly be punitive if not all ontime-pitch pairs sounding in Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle [c,\ d]} are part of the ground truth pattern.)

Evaluation Procedure

An implementation of the evaluation metrics and example code are bundled with the Development Database, to save participants having to implement the evaluation metrics themselves.

Precision, Recall, and F1 Score

Denote the Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n_{\mathcal{P}}} patterns in a ground truth by Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Pi = \{ \mathcal{P}_1, \mathcal{P}_2,\ldots, \mathcal{P}_{n_\mathcal{P}} \}} , and the Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle n_{\mathcal{Q}}} patterns in an algorithm’s output by Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \Xi = \{ \mathcal{Q}_1, \mathcal{Q}_2,\ldots, \mathcal{Q}_{n_\mathcal{Q}} \}} . If the algorithm discovers Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle k} of the ground truth patterns, up to translation, then the precision of the algorithm is defined as Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P = k/n_{\mathcal{P}}} , the recall of the algorithm is defined as Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle R = k/n_{\mathcal{Q}}} , and the F1 score as Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle F_1 = 2PR/(P + R).\ }

The above metrics, which were used by Collins et al. (2010) in one of the first evaluations of a pattern discovery task, are very strict: an output pattern Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Q} may have only one point different from a large ground truth pattern Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P} , but this will not count as a successful discovery. Therefore, we propose the following new metrics, which are robust to slight differences between output and ground truth patterns.

Robust Versions of Precision, Recall, and F1 score

Suppose that in the ground truth there is a pattern Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P} with occurrences Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{P} = \{ P_1, P_2,\ldots, P_{m_P} \}} , and in an algorith's output there is a pattern Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Q} with occurrences Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{Q} = \{ Q_1, Q_2,\ldots, Q_{m_Q} \}} . Central to evaluating an algorithm is measuring the extent to which Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{Q}} constitutes the discovery of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle \mathcal{P}} . In order to measure this, we need to be able to compute the symbolic musical similarity of Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle P_i} and Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle Q_j} . We use both the cardinality score of symbolic musical similarity,

Failed to parse (MathML with SVG or PNG fallback (recommended for modern browsers and accessibility tools): Invalid response ("Math extension cannot connect to Restbase.") from server "https://wikimedia.org/api/rest_v1/":): {\displaystyle s_c(P_i, Q_j) = |P_i \cap Q_j|/ \max \{ |P_i|, |Q_j| \}, }

and the normalised matching score s_m(P_i, Q_j), after Arzt, B&oumlck, and Widmer (2012).

Either of these similarity measures (denoted \(s\) can be recorded in a so-called score matrix, $$\begin{equation}\label{eq:score-matrix} s(\mathcal{P}, \mathcal{Q}) = \left( \begin{array}{cccc} s(P_1, Q_1) & s(P_1, Q_2) & \cdots & s(P_1, Q_{m_Q}) \\ s(P_2, Q_1) & s(P_2, Q_2) & \cdots & s(P_2, Q_{m_Q}) \\ \vdots & \vdots & \ddots & \vdots \\ s(P_{m_P}, Q_1) & s(P_{m_P}, Q_2) & \cdots & s(P_{m_P}, Q_{m_Q}) \end{array} \right). \end{equation}$$ The score matrix shows how all occurrences \(\mathcal{Q} = \{ Q_1, Q_2,\ldots, Q_{m_Q} \}\) of a pattern in an algorithm's output compare to all occurrences \(\mathcal{P} = \{ P_1, P_2,\ldots, P_{m_P} \}\) of a ground truth pattern.