Difference between revisions of "2015:Set List Identification"

From MIREX Wiki

Revision as of 08:11, 11 May 2015

Description

This task is new for 2015!

This task requires an algorithm to identify the set list (the ordered sequence of songs) of a live concert. (See Set list: http://en.wikipedia.org/wiki/Set_list)

Recently, more and more full-length live concert videos have become available on websites (e.g. YouTube). Most of them lack descriptive information such as the set list and the start/end time of each song. In this task, we collect the audio of live concerts and studio songs and apply music information retrieval techniques to answer two questions: which songs were performed in the concert, and when does each song start and end?

As a first step, we assume that the artist is known and that the performers play only their own studio songs in the live concert. The ultimate goal, however, is to find the set list and the start/end time of each song given a full-length live concert recording and a studio song database.

There are two sub tasks in this task:

Sub task 1: Set list identification

  • Identify the set list (song sequence only) of a live concert.

Given the audio of a live concert and a dataset of studio songs by a specific artist, and assuming that all songs in the live concert are included in the studio songs dataset, identify the set list of the live concert.

Sub task 2: Boundary identification

  • Identify the start/end time of each song in the set list.

Given the audio of a live concert, the set list of that concert, and a dataset of studio songs by a specific artist, identify the start time and end time of each song in the live concert.

Data

To satisfy the assumptions of this task, we pre-process all audio by removing any "out of artist" songs from it. (See the Description section.)

We provide two sets for this task. Participating algorithms will have to read audio in the following format:

  • Sample rate: 44.1 kHz
  • Sample size: 16 bit
  • Number of channels: 1 (mono)
  • Encoding: WAV

Development set

This set contains 3 artists and 7 live concerts. The following information will be released:

  • artist
  • live concert name and links
  • studio collection list
  • start/end time tags

Collection statistics:

  • 3 artists
  • 7 live concerts
  • 279 tracks

Testing set

This set contains 7 artists and 13 live concerts. No information will be released.

Collection statistics:

  • 7 artists
  • 13 live concerts
  • 873 tracks

Evaluation

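Sub task 1

  • Edit distance between the identified song ID sequence and the ground-truth set list (see Edit distance: http://en.wikipedia.org/wiki/Edit_distance)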
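
As an illustration of the Sub task 1 measure, the following is a minimal sketch that assumes the score is the plain Levenshtein edit distance (unit cost for insertion, deletion and substitution) between the ground-truth and identified song ID sequences. The function name and the example sequences are hypothetical and are not part of the official evaluation code.

```python
from typing import Sequence


def edit_distance(reference: Sequence[int], hypothesis: Sequence[int]) -> int:
    """Levenshtein distance between two song ID sequences (unit costs assumed)."""
    # previous[j] is the distance between the reference prefix processed so far
    # and the first j items of the hypothesis.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_id in enumerate(reference, start=1):
        current = [i]  # deleting i reference items to reach an empty hypothesis
        for j, hyp_id in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_id != hyp_id)
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            current.append(min(substitution, insertion, deletion))
        previous = current
    return previous[-1]


# Hypothetical example: one song missed (17) and one spurious song added (21).
ground_truth = [3, 17, 59, 8]
identified = [3, 59, 8, 21]
print(edit_distance(ground_truth, identified))
```

A lower distance means the identified sequence is closer to the true set list; a distance of 0 means the two sequences are identical.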

Sub task 2

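  • Average time boundary deviation

The set list contains <math>N</math> songs. For song <math>i</math> (<math>i = 1, \dots, N</math>):

Start time in the ground truth: <math>sBD_{GT,i}</math>

End time in the ground truth: <math>eBD_{GT,i}</math>

Start time in the identification result: <math>sBD_{ID,i}</math>

End time in the identification result: <math>eBD_{ID,i}</math>

The absolute deviations are summed over the <math>N</math> songs and divided by <math>N</math>:

<math>AVGsBD = \frac{1}{N}\sum_{i=1}^{N} |sBD_{GT,i} - sBD_{ID,i}|</math>,

<math>AVGeBD = \frac{1}{N}\sum_{i=1}^{N} |eBD_{GT,i} - eBD_{ID,i}|</math>.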
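
The average boundary deviation can be sketched as follows, assuming that the absolute differences between ground-truth and identified boundaries are summed over the N songs of the set list before dividing by N, and that all times are expressed in seconds. The function name and the example values are hypothetical.

```python
from typing import Sequence


def average_boundary_deviation(
    ground_truth: Sequence[float], identified: Sequence[float]
) -> float:
    """Mean absolute difference between matching boundaries, in seconds."""
    if not ground_truth or len(ground_truth) != len(identified):
        raise ValueError("both sequences must be non-empty and of equal length")
    total = sum(abs(gt - est) for gt, est in zip(ground_truth, identified))
    return total / len(ground_truth)


# Hypothetical start times (seconds) for a two-song set list.
start_ground_truth = [443.5, 843.0]
start_identified = [445.0, 842.5]
avg_start = average_boundary_deviation(start_ground_truth, start_identified)
print(avg_start)  # deviations of 1.5 s and 0.5 s average to 1.0 s
```

The same function applies to end times; AVGsBD and AVGeBD are then reported separately.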

Runtime performance

In addition, computation times for feature extraction and training/classification will be measured.

Submission Format

Submissions to this task must conform to the format detailed below. In the examples, \n denotes the end of a line.

Implementation details

We recommend structuring your submission folder as follows:

/root_folder/... all the code you submitted
/root_folder/extract_feature/... all the features you extracted
/root_folder/output/... the folder to save results

Sub task 1

Input file

The studio songs list file will be of the form:

/path/to/artist_1/studio/song/001.wav\n  1st
/path/to/artist_1/studio/song/002.wav\n  2nd
/path/to/artist_1/studio/song/003.wav\n  3rd
... 

The live concert list file will be of the form:

/path/to/artist_1/live/concert/001.wav\n

Output file

The output is a list file (song ID sequence). The song ID is the position of the song in the input studio songs list file, not the file name of the *.wav file.

3\n   <-- 003.wav is the first song of the identified set list
17\n
59\n
...
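
The list-file handling for Sub task 1 could be sketched as below. This is only an illustration under stated assumptions: song IDs are taken to be 1-based line positions in the studio songs list file, blank lines are skipped, and the function names, temporary file names and the example ID sequence are all hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Dict, List


def read_studio_list(list_file: Path) -> Dict[int, str]:
    """Map 1-based song IDs (line positions) to studio WAV paths."""
    paths = [line.strip() for line in list_file.read_text().splitlines() if line.strip()]
    return dict(enumerate(paths, start=1))


def write_set_list(output_file: Path, song_ids: List[int]) -> None:
    """Write one song ID per line, each terminated by a newline."""
    output_file.write_text("".join(f"{song_id}\n" for song_id in song_ids))


with tempfile.TemporaryDirectory() as tmp:
    studio_list = Path(tmp) / "studio_list.txt"
    studio_list.write_text(
        "/path/to/artist_1/studio/song/001.wav\n"
        "/path/to/artist_1/studio/song/002.wav\n"
        "/path/to/artist_1/studio/song/003.wav\n"
    )
    songs = read_studio_list(studio_list)
    print(songs[3])  # path of the third listed song

    result_file = Path(tmp) / "set_list.txt"
    write_set_list(result_file, [3, 1, 2])  # placeholder IDs, not a real result
    print(result_file.read_text())
```

Keeping the ID-to-path mapping in one place makes it harder to confuse a song ID with the number in a file name, which the output format explicitly warns against.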

Sub task 2

Input file

The input is a list of song IDs (song ID sequence). The song ID is the position of the song in the studio songs list file.

Your system should read the *.wav files in that order and find the time boundaries of each song.

3\n
17\n
59\n
...

Output file

The output time boundary list file will be of the form:

  • Please round each time boundary to the nearest millisecond.
  • \t is a tab character.

Start time                           End time
hours.minutes.seconds.milliseconds \t hours.minutes.seconds.milliseconds\n  ...for input song ID 3
hours.minutes.seconds.milliseconds \t hours.minutes.seconds.milliseconds\n  ...for input song ID 17
hours.minutes.seconds.milliseconds \t hours.minutes.seconds.milliseconds\n  ...for input song ID 59
...

Examples:

0.7.23.521    0.13.24.512
0.14.3.021    0.19.53.38
0.20.9.893    0.27.15.987
...
...
0.56.22.433    1.1.46.593
1.3.51.146    1.9.21.138
...
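
A possible way to produce and read the hours.minutes.seconds.milliseconds fields is sketched below. It assumes that hours, minutes and seconds are written without zero padding, that milliseconds are zero-padded to three digits (as in 0.14.3.021), and that the last field is an integer number of milliseconds when parsing. The helper names are hypothetical.

```python
def format_boundary(seconds: float) -> str:
    """Format a time in seconds as hours.minutes.seconds.milliseconds."""
    total_ms = round(seconds * 1000)  # round to the nearest millisecond
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours}.{minutes}.{secs}.{millis:03d}"


def parse_boundary(text: str) -> float:
    """Parse hours.minutes.seconds.milliseconds back into seconds.

    Assumption: the last field is an integer count of milliseconds.
    """
    hours, minutes, secs, millis = (int(part) for part in text.split("."))
    return hours * 3600 + minutes * 60 + secs + millis / 1000


def format_line(start_seconds: float, end_seconds: float) -> str:
    """One output line: start and end separated by a tab, ended by a newline."""
    return f"{format_boundary(start_seconds)}\t{format_boundary(end_seconds)}\n"


print(format_boundary(443.521))
print(repr(format_line(843.021, 1193.5)))
print(parse_boundary("1.3.51.146"))
```

Working in integer milliseconds before splitting into fields avoids rounding a value such as 59.9996 seconds into an invalid "60" seconds field.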

Packaging submissions

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed).

All submissions should include a README file containing the following information:

  • Which task you want to participate in (sub task 1, sub task 2 or both)
  • Command line calling format for all executables and an example formatted set of commands
  • Number of threads/cores used or whether this should be specified on the command line
  • Expected memory footprint
  • Expected runtime
  • Any required environments (and versions), e.g. python, java, bash, matlab.

Time and hardware limits

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.

A hard limit of 72 hours will be imposed on runs (total feature extraction and querying times). Submissions that exceed this runtime may not receive a result.

Potential Participants

name / email