Difference between revisions of "2010:Harmonic Analysis"
|  (Initial edit) |  (→References) | ||
| (20 intermediate revisions by 2 users not shown) | |||
| Line 1: | Line 1: | ||
| == Description == | == Description == | ||
| − | This task is suggested for MIREX 2010.  | + | This task is suggested for MIREX 2010. The general goal is to estimate the latent harmonic progression from score-like music data, a process typically called harmonic analysis or functional analysis in musicology (although this task cannot be really called functional analysis, functional analysis can be used to obtain the required output data). This typically involves estimating the musical key of a given piece of music and movement of the tonal center (modulations and tonicizations), as well as the progression of chords. [http://supremedissertation.co.uk/ Dissertation Proposal] | 
| + | |||
| + | |||
| + | === Time units === | ||
| + | It is well known that harmony is strongly coupled with metric structure: progressing from one chord to another typically occurs at strong beats, while most non-chord tones are played on weak beats or off-beat. For this reason, knowing the temporal structure of the piece of music is important when performing harmonic analysis, and in this task the harmony is estimated on a beat-based time basis from score-like symbolic data (as opposed to performance data). | ||
| + | |||
| + | === Chord dictionary === | ||
| + | Harmonic analysis can be performed with varying degree of precision, depending on the goal at hand. One might only want to extract general progression patterns to be used in harmonic similarity or harmonic queries, for example check if a song at hand contains the so called 50s progression: I–vi–IV–V. Another one might be interested in full musicological analysis of a score, i.e. obtaining information about root position, chord type (quality), added tones, invertion, etc. This person would for example say the the second bar of Chopin's Nocturne Op. 9 No. 2 (in the key of E♭-major) starts in the tonic I (E♭, G, B♭), then it progresses to the dominant ninth chord in the third inversion with the root removed and the tonic added in bass (E♭, A♭, D, F). | ||
| + | |||
| + | [[File:Chopin 9 2.png||500px]] | ||
| + | |||
| + | The first person would therefore use a very small dictionary of chord labels with 48 different chords: 12 root positions × 4 basic chord types (major, minor, augmented, diminished). 48 or 24 dictionaries are common in chord recognition from audio data. On the other hand, the second person would use a very large chord label dictionary containing millions of labels. This is the case with the dataset to be used in this task. | ||
| == Data format ==   | == Data format ==   | ||
| Not much data exists that can be used as ground truth for harmonic analysis. Several researchers used Bach chorales ([6] and [8]) and some reported using Kostka-Payne corpus. Unfortunately, the latter is very small, too small in fact for proper model training and evaluation: it consists of 46 very short musical excerpts. | Not much data exists that can be used as ground truth for harmonic analysis. Several researchers used Bach chorales ([6] and [8]) and some reported using Kostka-Payne corpus. Unfortunately, the latter is very small, too small in fact for proper model training and evaluation: it consists of 46 very short musical excerpts. | ||
| + | |||
| + | === Input data === | ||
| We propose to use a dataset developed in cooperation of the members of Sagayama/Ono laboratory at the University of Tokyo and prof. Hitomi Kaneko from Toho Gakuen School of Music and her students. This dataset contains very detailed harmony labels for all of RWC database's classical pieces (almost 6 hours of music). This data will be coupled with quantized (score-like) set of MIDI files, and the latter will be converted to CSV files for easy input. The first line of such  CSV file will contain information about the number of data lines per downbeat and the number of lines per upbeat (Auftakt). The rest of the lines will hold lists of notes active in the consecutive divisions of downbeat expressed by MIDI note numbers. The first two bars of  Chopin's Nocturne Op. 9 No. 2 (see picture above) will therefore be encoded as: | We propose to use a dataset developed in cooperation of the members of Sagayama/Ono laboratory at the University of Tokyo and prof. Hitomi Kaneko from Toho Gakuen School of Music and her students. This dataset contains very detailed harmony labels for all of RWC database's classical pieces (almost 6 hours of music). This data will be coupled with quantized (score-like) set of MIDI files, and the latter will be converted to CSV files for easy input. The first line of such  CSV file will contain information about the number of data lines per downbeat and the number of lines per upbeat (Auftakt). The rest of the lines will hold lists of notes active in the consecutive divisions of downbeat expressed by MIDI note numbers. The first two bars of  Chopin's Nocturne Op. 9 No. 2 (see picture above) will therefore be encoded as: | ||
| Line 19: | Line 32: | ||
| In other words, there would be 4 lines per measure, each spanning 3 eighth notes and the first line will be a 3 eighth notes of upbeat.   | In other words, there would be 4 lines per measure, each spanning 3 eighth notes and the first line will be a 3 eighth notes of upbeat.   | ||
| − | = Ground truth = | + | === Ground truth === | 
| The ground truth for the same piece will be encoded as: | The ground truth for the same piece will be encoded as: | ||
| Line 29: | Line 42: | ||
|   ... |   ... | ||
| − | Each line will consist of harmony data corresponding to one line of the input data. Harmony is given by pitch class of the root tone followed by a list of all pitch classes belonging to the chord  | + | Each line will consist of harmony data corresponding to one line of the input data. Harmony is given by pitch class of the root tone, followed by a list of all pitch classes belonging to the chord. Pitch classes are labeled with capital Latin letters (A through G) with flats represented by minus signs and sharps by plus signs. Non-harmony (e.g. a rest or upbeat) is marked with a special symbol 'z'. So, in the example above, “E- E- G B-” means a chord with root in E♭ (which is the tonic) that consists of three notes: E♭, G and B♭, and so it is a major triad (in its root position, although the order of the pitch classes play no role — it is straightforward to extract that from the input data). | 
| − | The tonal center (and therefore the structural meaning of a given chord) is not explicitly given in order to simplify the task and allow more participants to compete. Nevertheless, the algorithms will benefit much from considering e.g. tonal center movement and degrees of scale. | + | The tonal center (and therefore the structural meaning of a given chord) is not explicitly given in order to simplify the task and allow more participants to compete. Nevertheless, the algorithms will benefit much from considering e.g. tonal center movement and degrees of scale (as in functional analysis). | 
| Labels for the entire RWC's classical portion will be made available for the MIREX participants (with some copying and usage restrictions). In order to allow cross-validation of the submitted algorithms, an independent set of labels will be created (to which the participants will have no access), and the algorithms will be executed on both sets. | Labels for the entire RWC's classical portion will be made available for the MIREX participants (with some copying and usage restrictions). In order to allow cross-validation of the submitted algorithms, an independent set of labels will be created (to which the participants will have no access), and the algorithms will be executed on both sets. | ||
| + | |||
| + | === Output data === | ||
| + | |||
| + | The output from the algorithms is expected to comply with the above description of the ground truth data format. For example, a common 24-chord dictionary algorithm would output a series of lines from the following set (12 major and 12 minor chords): | ||
| + | |||
| + |  C C E G | ||
| + |  C+ C+ E+ G+ | ||
| + |  D- D- F A- | ||
| + |  D D F+ A | ||
| + |  D+ D+ F A+ | ||
| + |  E- E- G B- | ||
| + |  E E G+ B | ||
| + |  F F A C | ||
| + |  F+ F+ A+ C+ | ||
| + |  G- G- B- D- | ||
| + |  G G B D | ||
| + |  G+ G+ B+ D+ | ||
| + |  A- A- C E- | ||
| + |  A A C+ E | ||
| + |  A+ A+ C E+ | ||
| + |  B- B- D F | ||
| + |  B B D+ F+ | ||
| + |  C C E- G | ||
| + |  C+ C+ E G+ | ||
| + |  D- D- F- A- | ||
| + |  D D F A | ||
| + |  D+ D+ F+ A+ | ||
| + |  E- E- G- B- | ||
| + |  E E G B | ||
| + |  F F A- C | ||
| + |  F+ F+ A C+ | ||
| + |  G- G- B D- | ||
| + |  G G B- D | ||
| + |  G+ G+ B D+ | ||
| + |  A- A- C- E- | ||
| + |  A A C E | ||
| + |  A+ A+ C+ E+ | ||
| + |  B- B- D- F | ||
| + |  B B D F+ | ||
| == Evaluation Measures == | == Evaluation Measures == | ||
| + | |||
| + | Evaluation is a difficult problem for 2 reasons: | ||
| + | # Ground truth is kind of an ambiguous thing in harmonic analysis, as sometimes multiple interpretations are possible. | ||
| + | # Algorithms can output partial harmonic data (as e.g. it is the case of the 24-chord dictionary algorithm mentioned earlier) and they should be awarded partial score. | ||
| + | |||
| + | Therefore, we propose to use a weighted chord distance measure along other, simpler evaluation measures (root identification accuracy, and pitch class precision, recall and F-measure). In this measure, mistaking the root will cost much more than e.g. overlooking an added ninth and mistaking major for augmented chord will be less costly than e.g. interpreting major as minor. This should enable comparison of algorithms that work with different chord dictionaries: using a simplified dictionary will decrease the estimation accuracy, but not significantly so, and an algorithm using an accurate 24-chord dictionary should outperform a poor algorithm that deals with accurate chord labels. | ||
| + | |||
| + | The following formula will be used to evaluate the performance of the submitted algorithms: | ||
| + | |||
| + | [[File:Evaluation.png||250px]] | ||
| + | |||
| + | where: | ||
| + | * ''t'' is the time. | ||
| + | * ''r'' is the root of a chord (0..11). | ||
| + | * ''N'' is the set of notes that belong to a chord. | ||
| + | * ''W'' is the 12-dimensional weighting vector. | ||
| + | * ''S'' is the score given obtained by an algorithm. | ||
| + | * ''A'' is calculated for ground truth and the hat over ''A'' means it is calculated for the estimated harmony. | ||
| + | |||
| + | The weighting vector is obtained from the circle of fifths: | ||
| + | |||
| + | [[File:Weights.png||200px]] | ||
| + | |||
| + | where ''D'' is the distance between its arguments on the circle of fifths (minimal number of steps 0..6) and ''r'' is any chord root (0..11). ''W'' is normalized to 1. | ||
| + | |||
| + | [[File:Weights plot.png||700px]] | ||
| + | |||
| + | The score will be equal to 1 if all the notes in the chord were identified correctly and between 0 and 1 for other cases (see pictures below). | ||
| + | |||
| + | [[File:Wrong_root.png||400px]] | ||
| + | |||
| + | [[File:Wrong_type.png||300px]] | ||
| + | [[File:Wrong added.png||300px]] | ||
| + | [[File:Wrong removed.png||300px]] | ||
| + | |||
| + | == Test data == | ||
| + | A dataset containing harmonic data for the entire RWC database's classical part will be made available to this task's participants. During the evaluation, this set will be coupled with a new set which is being currently developed by us and which will remain unavailable to participants in order enable assesment of the ability of the submitted algorithms to perform outside the training set. | ||
| == References == | == References == | ||
| − | #  | + | # | 
| # W. B. de Haas, R. C. Veltkamp, F. Wiering, [http://ismir2008.ismir.net/papers/ISMIR2008_252.pdf “Tonal pitch step distance: a similarity measure for chord progressions,”] Proc. of 9th ISMIR, 2008 | # W. B. de Haas, R. C. Veltkamp, F. Wiering, [http://ismir2008.ismir.net/papers/ISMIR2008_252.pdf “Tonal pitch step distance: a similarity measure for chord progressions,”] Proc. of 9th ISMIR, 2008 | ||
| # J.-F. Paiement, D. Eck, S. Bengio, [http://publications.idiap.ch/downloads/reports/2005/rr-05-58.pdf “Chord representations for probabilistic models,”] IDIAP Research report, 2005 | # J.-F. Paiement, D. Eck, S. Bengio, [http://publications.idiap.ch/downloads/reports/2005/rr-05-58.pdf “Chord representations for probabilistic models,”] IDIAP Research report, 2005 | ||
| Line 45: | Line 134: | ||
| # P. Kröger, A. Passos, M. Sampaio, G. de Cidra, [http://classes.berklee.edu/mbierylo/ICMC08/defevent/papers/cr1516.pdf “Rameau: a system for automatic harmonic analysis,”] Proc. of ICMC, 2008 | # P. Kröger, A. Passos, M. Sampaio, G. de Cidra, [http://classes.berklee.edu/mbierylo/ICMC08/defevent/papers/cr1516.pdf “Rameau: a system for automatic harmonic analysis,”] Proc. of ICMC, 2008 | ||
| # C. S. Sapp, [http://www.ccarh.org/publications/cm/vol/15/cm15-06-sapp.pdf “Computational chord-root identification in symbolic musical data: rationale, methods and applications,”] Computing in Musicology 15, 2007 | # C. S. Sapp, [http://www.ccarh.org/publications/cm/vol/15/cm15-06-sapp.pdf “Computational chord-root identification in symbolic musical data: rationale, methods and applications,”] Computing in Musicology 15, 2007 | ||
| − | # H. Taube, [http://www-camil.music.uiuc.edu/Software/cc/doc/cmj.pdf “Automatic tonal analysis: toward the implementation of a music theory workbench,”] Computer Music Journal vol. 23 nr. 4, 1999 | + | # H. Taube, [http://www-camil.music.uiuc.edu/Software/cc/doc/cmj.pdf “Automatic tonal analysis: toward the implementation of a music theory workbench,”] Computer Music Journal vol. 23 nr. 4, 1999 [http://www.dissertationwritinghelp.co.uk/ Dissertation Help] [http://www.customwrittenpaper.com/ College Essay Help] [http://power-scooters.com/ power scooters] | 
| − | + | [http://www.logoinn.com/ Logo Design] [http://www.logoinn.com/web-design/ Web Design] | |
| == Potential Participants == | == Potential Participants == | ||
| == Discussion for 2010 == | == Discussion for 2010 == | ||
Latest revision as of 06:58, 4 June 2011
Contents
Description
This task is suggested for MIREX 2010. The general goal is to estimate the latent harmonic progression from score-like music data, a process typically called harmonic analysis or functional analysis in musicology (although this task cannot be really called functional analysis, functional analysis can be used to obtain the required output data). This typically involves estimating the musical key of a given piece of music and movement of the tonal center (modulations and tonicizations), as well as the progression of chords. Dissertation Proposal
Time units
It is well known that harmony is strongly coupled with metric structure: progressing from one chord to another typically occurs at strong beats, while most non-chord tones are played on weak beats or off-beat. For this reason, knowing the temporal structure of the piece of music is important when performing harmonic analysis, and in this task the harmony is estimated on a beat-based time basis from score-like symbolic data (as opposed to performance data).
Chord dictionary
Harmonic analysis can be performed with varying degree of precision, depending on the goal at hand. One might only want to extract general progression patterns to be used in harmonic similarity or harmonic queries, for example check if a song at hand contains the so called 50s progression: I–vi–IV–V. Another one might be interested in full musicological analysis of a score, i.e. obtaining information about root position, chord type (quality), added tones, invertion, etc. This person would for example say the the second bar of Chopin's Nocturne Op. 9 No. 2 (in the key of E♭-major) starts in the tonic I (E♭, G, B♭), then it progresses to the dominant ninth chord in the third inversion with the root removed and the tonic added in bass (E♭, A♭, D, F).
The first person would therefore use a very small dictionary of chord labels with 48 different chords: 12 root positions × 4 basic chord types (major, minor, augmented, diminished). 48 or 24 dictionaries are common in chord recognition from audio data. On the other hand, the second person would use a very large chord label dictionary containing millions of labels. This is the case with the dataset to be used in this task.
Data format
Not much data exists that can be used as ground truth for harmonic analysis. Several researchers used Bach chorales ([6] and [8]) and some reported using Kostka-Payne corpus. Unfortunately, the latter is very small, too small in fact for proper model training and evaluation: it consists of 46 very short musical excerpts.
Input data
We propose to use a dataset developed in cooperation of the members of Sagayama/Ono laboratory at the University of Tokyo and prof. Hitomi Kaneko from Toho Gakuen School of Music and her students. This dataset contains very detailed harmony labels for all of RWC database's classical pieces (almost 6 hours of music). This data will be coupled with quantized (score-like) set of MIDI files, and the latter will be converted to CSV files for easy input. The first line of such CSV file will contain information about the number of data lines per downbeat and the number of lines per upbeat (Auftakt). The rest of the lines will hold lists of notes active in the consecutive divisions of downbeat expressed by MIDI note numbers. The first two bars of Chopin's Nocturne Op. 9 No. 2 (see picture above) will therefore be encoded as:
4 1 62 39 55 58 63 67 79 51 56 59 62 68 77 79 39 55 58 63 67 77 38 55 58 63 67 70 75 ...
In other words, there would be 4 lines per measure, each spanning 3 eighth notes and the first line will be a 3 eighth notes of upbeat.
Ground truth
The ground truth for the same piece will be encoded as:
z E- E- G B- B- E- A- D F E- E- G B- E- D E- G B- ...
Each line will consist of harmony data corresponding to one line of the input data. Harmony is given by pitch class of the root tone, followed by a list of all pitch classes belonging to the chord. Pitch classes are labeled with capital Latin letters (A through G) with flats represented by minus signs and sharps by plus signs. Non-harmony (e.g. a rest or upbeat) is marked with a special symbol 'z'. So, in the example above, “E- E- G B-” means a chord with root in E♭ (which is the tonic) that consists of three notes: E♭, G and B♭, and so it is a major triad (in its root position, although the order of the pitch classes play no role — it is straightforward to extract that from the input data).
The tonal center (and therefore the structural meaning of a given chord) is not explicitly given in order to simplify the task and allow more participants to compete. Nevertheless, the algorithms will benefit much from considering e.g. tonal center movement and degrees of scale (as in functional analysis).
Labels for the entire RWC's classical portion will be made available for the MIREX participants (with some copying and usage restrictions). In order to allow cross-validation of the submitted algorithms, an independent set of labels will be created (to which the participants will have no access), and the algorithms will be executed on both sets.
Output data
The output from the algorithms is expected to comply with the above description of the ground truth data format. For example, a common 24-chord dictionary algorithm would output a series of lines from the following set (12 major and 12 minor chords):
C C E G C+ C+ E+ G+ D- D- F A- D D F+ A D+ D+ F A+ E- E- G B- E E G+ B F F A C F+ F+ A+ C+ G- G- B- D- G G B D G+ G+ B+ D+ A- A- C E- A A C+ E A+ A+ C E+ B- B- D F B B D+ F+ C C E- G C+ C+ E G+ D- D- F- A- D D F A D+ D+ F+ A+ E- E- G- B- E E G B F F A- C F+ F+ A C+ G- G- B D- G G B- D G+ G+ B D+ A- A- C- E- A A C E A+ A+ C+ E+ B- B- D- F B B D F+
Evaluation Measures
Evaluation is a difficult problem for 2 reasons:
- Ground truth is kind of an ambiguous thing in harmonic analysis, as sometimes multiple interpretations are possible.
- Algorithms can output partial harmonic data (as e.g. it is the case of the 24-chord dictionary algorithm mentioned earlier) and they should be awarded partial score.
Therefore, we propose to use a weighted chord distance measure along other, simpler evaluation measures (root identification accuracy, and pitch class precision, recall and F-measure). In this measure, mistaking the root will cost much more than e.g. overlooking an added ninth and mistaking major for augmented chord will be less costly than e.g. interpreting major as minor. This should enable comparison of algorithms that work with different chord dictionaries: using a simplified dictionary will decrease the estimation accuracy, but not significantly so, and an algorithm using an accurate 24-chord dictionary should outperform a poor algorithm that deals with accurate chord labels.
The following formula will be used to evaluate the performance of the submitted algorithms:
where:
- t is the time.
- r is the root of a chord (0..11).
- N is the set of notes that belong to a chord.
- W is the 12-dimensional weighting vector.
- S is the score given obtained by an algorithm.
- A is calculated for ground truth and the hat over A means it is calculated for the estimated harmony.
The weighting vector is obtained from the circle of fifths:
where D is the distance between its arguments on the circle of fifths (minimal number of steps 0..6) and r is any chord root (0..11). W is normalized to 1.
The score will be equal to 1 if all the notes in the chord were identified correctly and between 0 and 1 for other cases (see pictures below).
Test data
A dataset containing harmonic data for the entire RWC database's classical part will be made available to this task's participants. During the evaluation, this set will be coupled with a new set which is being currently developed by us and which will remain unavailable to participants in order enable assesment of the ability of the submitted algorithms to perform outside the training set.
References
- W. B. de Haas, R. C. Veltkamp, F. Wiering, “Tonal pitch step distance: a similarity measure for chord progressions,” Proc. of 9th ISMIR, 2008
- J.-F. Paiement, D. Eck, S. Bengio, “Chord representations for probabilistic models,” IDIAP Research report, 2005
- J.-F. Paiement, D. Eck, S. Bengio, ”A probabilistic model for chord progressions,” Proc. of 6th ISMIR, 2005
- C. Raphael, J. Stoddard, “Harmonic analysis with probabilistic graphical models,” Proc. of 4th ISMIR, 2003
- P. Kröger, A. Passos, M. Sampaio, G. de Cidra, “Rameau: a system for automatic harmonic analysis,” Proc. of ICMC, 2008
- C. S. Sapp, “Computational chord-root identification in symbolic musical data: rationale, methods and applications,” Computing in Musicology 15, 2007
- H. Taube, “Automatic tonal analysis: toward the implementation of a music theory workbench,” Computer Music Journal vol. 23 nr. 4, 1999 Dissertation Help College Essay Help power scooters









