Difference between revisions of "2006:Score Following Proposal"

From MIREX Wiki
Latest revision as of 13:05, 13 May 2010

Results

Results are on the 2006:Score Following Results page.

Proposers

  • Arshia Cont (University of California in San Diego (UCSD) and Ircam - Realtime Applications Team, France) - cont@ircam.fr
  • Diemo Schwarz (Ircam - Realtime Applications Team, France) - schwarz@ircam.fr

Title

Score Following

Description

Score Following is the real-time alignment of an incoming music signal to the music score. The music signal can be symbolic (MIDI score following) or audio.

This page describes a proposal for evaluation of score following systems. Discussion of the evaluation procedures on the MIREX 06 "ScoreFollowing06" contest planning list will be documented on the 2006:Score Following page. A full digest of the discussions is available to subscribers from the MIREX 06 "ScoreFollowing06" contest planning list archives.

Submissions will be required to estimate alignment precision according to the indexed times, type of alignment (monophonic or polyphonic), type of training, and real-time performance. Given enough submissions, results will also be separated into two domains: symbolic and audio systems.

Status

Evaluation procedures

The evaluation procedure consists of running score followers on a database of audio aligned to scores. The database contains the score and performance audio (passed to the system call) and a reference alignment (used for evaluation); see below for details.

Suggested calling formats for submitted algorithms

During evaluation, each system will be called from the command line, probably with something like the following format:

<system-execution-file> <performance-audio-filename> <MIDI-score-filename> <result-filename> <log-filename>

N.B.: The previous calling format given here in July, <system-execution-file> <input-folder> <output-filename>, is discouraged by the IMIRSEL team.

Your submitted binaries should be able to read the given score and audio files, perform the score following task, and write the results to the given output file.

It is important that the output ASCII file can be created in a path other than the default.
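
As an illustration of the first calling format, here is a minimal Python sketch of a command-line entry point. The argument names and the functions build_parser and main are assumptions made for this example, not part of the proposal. The sketch only shows that the result and log files can be created at whatever path the evaluator passes in.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Parse the four positional arguments of the suggested call format."""
    parser = argparse.ArgumentParser(description="Score follower entry point (sketch)")
    parser.add_argument("performance_audio", type=Path)
    parser.add_argument("midi_score", type=Path)
    parser.add_argument("result_file", type=Path)
    parser.add_argument("log_file", type=Path)
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    args = build_parser().parse_args(argv)
    # Create the parent directories so the output can live in any path,
    # not only in a default location.
    for out in (args.result_file, args.log_file):
        out.parent.mkdir(parents=True, exist_ok=True)
    # A real system would run its score follower here. This sketch only
    # creates the output files to show where results would be written.
    args.result_file.write_text("")
    args.log_file.write_text("sketch: no score following performed\n")
    return 0
```

A wrapper script could call main() with no arguments to read the actual command line.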

In order to consider the issue of training and audio following, an alternative call format would be:

<system-execution-file> <performance-audio-filename> <MIDI-score-filename> <audio-score-filename> <reference-alignment-filename> <result-filename> <log-filename>

The reference alignment file links the MIDI to the audio score. The audio score can be used for training for non-audio score followers.

Input data

Each system will need an audio input as well as a score to follow (or align).

File formats

The scores used for this year's MIREX will be MIDI files. The audio format will be standard WAV or AIFF, containing performances of the given MIDI score.

Output data

File formats

Each score following system writes ASCII output as described below. Columns should be separated by white space and rows by a Unix newline '\n'.

Content

The result files represent the alignment found by a score following system between a MIDI score and a recording of a performance of it. They have one line per detected note with the columns:

  1. estimated note onset time in performance audio file (ms)
  2. detection time relative to performance audio file (ms)
  3. note start time in score (ms)
  4. MIDI note number in score (int)

Remarks: The third column, the detected note's start time in the score, serves as the unique identifier of a note (or chord, for polyphonic scores) and links it to the ground truth onset of that note in the reference alignment files. The fourth column, the MIDI note number, is there only for your convenience, to help you find your way around the result files if you know the melody in MIDI.
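
The four-column layout above could be written and read back along the following lines. This is a sketch only: the names ResultRow, format_row and parse_row are hypothetical, and printing times with three decimals is an assumption, since the proposal fixes the unit (ms) but not the numeric precision.

```python
from typing import NamedTuple


class ResultRow(NamedTuple):
    onset_ms: float      # column 1: estimated note onset in the performance audio
    detection_ms: float  # column 2: time at which the system reported the note
    score_ms: float      # column 3: note start time in the score (note identifier)
    midi_note: int       # column 4: MIDI note number, informational only


def format_row(row: ResultRow) -> str:
    """Render one result line: whitespace-separated columns, Unix newline."""
    return (f"{row.onset_ms:.3f} {row.detection_ms:.3f} "
            f"{row.score_ms:.3f} {row.midi_note}\n")


def parse_row(line: str) -> ResultRow:
    """Read one result line back into its four columns."""
    fields = line.split()
    return ResultRow(float(fields[0]), float(fields[1]),
                     float(fields[2]), int(fields[3]))
```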

Evaluation metrics

Evaluation is done by comparing the reference file with a result file for each audio file in the database. Here are the criteria used for this year's evaluation:

Missed Notes

Missed notes are:

  1. Notes that are reported in the reference file but not in the result file.
  2. Notes that are reported in both the reference and result files but with an offset greater than 2000 milliseconds.

False Positive

False positives are notes reported in both the reference and result files but with a delay (absolute value of the offset) greater than 2000 milliseconds. These are also counted among the missed notes.
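
The two missed-note conditions can be sketched as follows. The function name count_missed and the dictionary representation (score time in ms mapped to onset time in ms) are assumptions for this example. Matching notes by exact score time follows the remark that the score time identifies a note, but a real evaluator may need a more tolerant lookup.

```python
from typing import Dict

MISS_THRESHOLD_MS = 2000.0  # threshold stated in the criteria above


def count_missed(reference: Dict[float, float], result: Dict[float, float]) -> int:
    """Count missed notes.

    Both arguments map a note's score time (ms) to its onset time in the
    audio (ms): ground truth for `reference`, estimate for `result`.
    """
    missed = 0
    for score_ms, ref_onset in reference.items():
        est_onset = result.get(score_ms)
        if est_onset is None:
            missed += 1  # condition 1: note absent from the result file
        elif abs(est_onset - ref_onset) > MISS_THRESHOLD_MS:
            missed += 1  # condition 2: reported, but more than 2000 ms off
    return missed
```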

Offset

  • Average Offset between the detected note onsets and reference alignment.

Latency

  • Difference between detection time and the time the system sees the audio.

Metrics

All measures above are calculated both locally (i.e. for each sound file) and globally (over the whole database). This way we will have two precision rates:

  1. Piecewise precision rate: the average, over all pieces, of each piece's percentage of detected notes (total number of events - missed notes).
  2. Overall precision rate: the percentage of detected notes over the whole database (total number of events to detect - total number of missed notes).
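
The two rates differ whenever pieces have different numbers of events. The following sketch, with hypothetical function names and made-up counts, shows the difference: the piecewise rate weights every piece equally, while the overall rate weights every note equally.

```python
from typing import List, Tuple

# Each piece is given as (total number of events, number of missed notes).
Piece = Tuple[int, int]


def piecewise_precision(pieces: List[Piece]) -> float:
    """Average of the per-piece percentages of detected notes."""
    rates = [100.0 * (total - missed) / total for total, missed in pieces]
    return sum(rates) / len(rates)


def overall_precision(pieces: List[Piece]) -> float:
    """Percentage of detected notes over the whole database."""
    total = sum(t for t, _ in pieces)
    missed = sum(m for _, m in pieces)
    return 100.0 * (total - missed) / total
```

For two invented pieces with (100, 10) and (1000, 500), the piecewise rate is (90 + 50) / 2 = 70 percent, while the overall rate is 590 / 1100, about 53.6 percent.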

Besides these measures, the following measures are provided:

  • Average of absolute value of offset both piece-wise and global.
  • Mean of offset including negative and positive signs.
  • Standard deviation of offset
  • Average latency both piece-wise and global.
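
These summary measures could be computed as in the sketch below. Several points are assumptions, since the proposal does not fix them: the offset is taken as estimated onset minus reference onset, the standard deviation is the population form, and latency is taken as detection time minus estimated onset time (columns 2 and 1 of the result file). The function names are hypothetical.

```python
import statistics
from typing import Dict, List


def offset_statistics(offsets_ms: List[float]) -> Dict[str, float]:
    """Summaries of signed offsets (assumed: estimated onset - reference onset)."""
    return {
        "mean_abs_offset": statistics.mean([abs(o) for o in offsets_ms]),
        "mean_offset": statistics.mean(offsets_ms),
        # Population standard deviation is an assumption of this sketch.
        "std_offset": statistics.pstdev(offsets_ms),
    }


def average_latency(onsets_ms: List[float], detections_ms: List[float]) -> float:
    """Mean of detection time minus estimated onset time (assumed definition)."""
    return statistics.mean([d - o for o, d in zip(onsets_ms, detections_ms)])
```

Running the same functions on the offsets of one file gives the piece-wise values, and on the offsets of all files together gives the global ones.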

Evaluator pseudo-code

For each audio/system output/reference tuple from the database,

  • Compute the number of "missed notes" by comparing the output and the reference.
  • Compute the "offset" (the difference between the reference onset time and the score follower's output onset time) for each note.

Errors computed in this version are:

  • Missed note percentage
  • Average latency
  • Average offset
  • Offset statistics: mean and standard deviation

Reference Database

The reference database contains the score and performance audio (for the system call) and a reference alignment (for evaluation).

Contributions

  • Christopher Raphael:
    • Mozart Dorabella, voice
    • Mozart Clarinet Concerto K370, clarinet
  • Ircam (Arshia Cont and Diemo Schwarz)
    • Boulez ... Explosante-Fixe ..., flute (47 files, duration approx. 1 hour)
    • Bach Violin Sonatas, performed by Menuhin (Three movements)

All files were aligned by an external offline score alignment algorithm and corrected by hand.

Statistics

MIREX06 Score Following Reference Database

Piece name               | Composer      | Instrument | Number of Files | Number of score events | Total duration
... Explosante-Fixe...   | Pierre Boulez | Flute      | 47              | 2022                   | 17:10
Violin Sonatas (excerpt) | J.S. Bach     | Violin     | 3               | 3996                   | 13:50
Clarinet Quintet KV370   | Mozart        | Clarinet   | 4               | 2710                   | 14:44
Dorabella                | Mozart        | Voice      | 4               | 229                    | 01:44
Total                    |               |            | 58              | 8957                   | 47:38


Content Format

Score Files

Scores are in MIDI format.

Audio Files

Audio files will be either WAVE or AIFF and contain real performances of a given MIDI score.

Reference alignment

The reference files constitute a ground truth alignment between a MIDI score and a recording of it. They have one line per score note, with the columns:

  1. note onset time in reference audio file [ms]
  2. note start time in score [ms]
  3. MIDI note number in score [nn]

Note that more columns might be added to this definition in the future, e.g. to mark trills, so please program your reference file parser in a way that additional columns don't confuse the program and are gracefully ignored.
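
One way to write such a tolerant parser is to read only the first three whitespace-separated fields of each line. The sketch below assumes this approach. The names ReferenceRow and parse_reference are hypothetical, and skipping lines with fewer than three fields is a choice made for this example.

```python
from typing import List, NamedTuple


class ReferenceRow(NamedTuple):
    onset_ms: float  # column 1: note onset time in the reference audio file
    score_ms: float  # column 2: note start time in the score
    midi_note: int   # column 3: MIDI note number in the score


def parse_reference(text: str) -> List[ReferenceRow]:
    """Parse reference alignment text, ignoring any columns after the third."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or incomplete lines
        onset, score, note = fields[:3]  # extra trailing columns are dropped
        rows.append(ReferenceRow(float(onset), float(score), int(note)))
    return rows
```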

Example

For a sample of the database, see: http://crca.ucsd.edu/arshia/mirex06-scofo/

Potential Participants

  • Arshia Cont (UCSD / Ircam)
  • Roger Dannenberg (Carnegie Mellon University)
  • Christopher Raphael (Indiana University)
  • Diemo Schwarz (Ircam)
  • Miller Puckette (UCSD)
  • Ozgur Izmirli (Connolle)
  • Cort Lippe (University of Buffalo)
  • Frank Weinstock (TimeWarp Technologies)