2021:Automatic Lyrics Transcription - Revision history

Georgi Dzhambazov at 19:35, 12 March 2022

2022-03-12T19:35:31Z

Georgi Dzhambazov at 19:35, 12 March 2022

2022-03-12T19:35:03Z

Georgi Dzhambazov: /* Submission Format */

2021-10-29T14:35:15Z

Submission Format

Georgi Dzhambazov: /* Submission Format */

2021-10-29T14:34:49Z

Submission Format

Georgi Dzhambazov: /* Bibliography */

2021-10-27T13:23:06Z

Bibliography

Georgi Dzhambazov: /* Evaluation Datasets */

2021-10-27T13:16:32Z

Evaluation Datasets

Georgi Dzhambazov: /* DALI Dataset */

2021-10-27T13:15:52Z

DALI Dataset

Georgi Dzhambazov: /* DALI Dataset */

2021-10-27T13:15:04Z

DALI Dataset

Georgi Dzhambazov: /* Bibliography */

2021-10-27T13:13:29Z

Bibliography

Georgi Dzhambazov: /* DAMP dataset */

2021-10-27T13:09:24Z

DAMP dataset

@@ Line 148: / Line 148: @@
 * 5677 words annotated in total
-== Time and hardware limits ==
+= Time and hardware limits =
 Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions will be imposed.
 A hard limit of 24 hours will be imposed on analysis times. Submissions exceeding this limit may not receive a result. In addition, submissions that cannot run with the RAM and CPU resources specified in your instructions may not receive a result.

@@ Line 147: / Line 147: @@
 * file duration up to 4:43 (total time: 1h 12m)
 * 5677 words annotated in total
 = Submission closing dates =

@@ Line 37: / Line 37: @@
 = Submission Format =
-Submissions must be done through the MIREX system (info available [https://www.music-ir.org/mirex/wiki/2021:Main_Page#MIREX_2021_Submission_Instructions here] and should be packaged in a compressed file (.zip or .rar, etc.) which contains at least two files:
+Submissions must be done through the MIREX system (info available [https://www.music-ir.org/mirex/wiki/2021:Main_Page#MIREX_2021_Submission_Instructions here]) and should be packaged in a compressed file (.zip or .rar, etc.) which contains at least two files:
 === A) The main transcription script ===

@@ Line 37: / Line 37: @@
 = Submission Format =
-Submissions should be packaged in a compressed file (.zip or .rar, etc.) which contains at least two files:
+Submissions must be done through the MIREX system (info available [https://www.music-ir.org/mirex/wiki/2021:Main_Page#MIREX_2021_Submission_Instructions here] and should be packaged in a compressed file (.zip or .rar, etc.) which contains at least two files:
 === A) The main transcription script ===
@@ Line 82: / Line 82: @@
 Any submission that fails to meet the above requirements will not be considered in evaluation!
 = Training Datasets =

@@ Line 175: / Line 175: @@
 - G.R., Barker, J. (2019) Automatic Lyric Transcription from Karaoke Vocal Tracks: Resources and a Baseline System. Proc. Interspeech 2019, 579-583, doi: 10.21437/Interspeech.2019-2378
-- Demirel, E., Ahlbäck, S., & Dixon, S. (2020). Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention. In 2020 International Joint Conference on Neural Networks (IJCNN), 1-8. IEEE.
+- Demirel, E., Ahlbäck, S., & Dixon, S. (2020). Automatic Lyrics Transcription using Dilated Convolutional Neural Networks with Self-Attention. In IJCNN 2020, 1-8. IEEE.
-- Meseguer-Brocal, G., Cohen-Hadria, A., & Peeters, G. (2019). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm.
+- Meseguer-Brocal, G., Cohen-Hadria, A., & Peeters, G. (2019). DALI: A large dataset of synchronized audio, lyrics and notes, automatically created using teacher-student machine learning paradigm. In ISMIR 2018.
-- Hansen, J. K., & Fraunhofer, I. D. M. T. (2012). Recognition of phonemes in a-cappella recordings using temporal patterns and mel frequency cepstral coefficients. In 9th Sound and Music Computing Conference (SMC), 494-499.
+- Gupta, C., Yılmaz, E., & Li, H. (2020). Automatic lyrics alignment and transcription in polyphonic music: Does background music help?. In ICASSP 2020, 496-500. IEEE.
-- Mauch, M., Fujihara, H., & Goto, M. (2012). Integrating additional chord information into HMM-based lyrics-to-audio alignment. IEEE Transactions on Audio, Speech, and Language Processing, 20(1), 200-210.
+- Basak, S., Agarwal, S., Ganapathy, S., & Takahashi, N. (2021, June). End-to-End Lyrics Recognition with Voice to Singing Style Transfer. In ICASSP 2021, 266-270. IEEE.
-- Stoller, D. and Durand, S. and Ewert, S. (2019) End-to-end Lyrics Alignment for Polyphonic Music Using An Audio-to-Character Recognition Model. ICASSP 2019.
+- Demirel, E., Ahlbäck, S., & Dixon, S. (2021). MSTRE-Net: Multistreaming Acoustic Modeling for Automatic Lyrics Transcription. Proc. ISMIR 2021.

@@ Line 125: / Line 125: @@
 The audio has two versions: the original mix with instrumental accompaniment, and an a cappella version with singing voice only. An example song can be seen [https://www.dropbox.com/sh/wm6k4dqrww0fket/AAC1o1uRFxBPg9iAeSAd1Wxta?dl=0 here].
-You can read in detail about how the dataset was made here: [http://publica.fraunhofer.de/documents/N-345612.html (4)]. The recordings have been provided by Jens Kofod Hansen for public evaluation.
+You can read in detail about how the dataset was made here: [http://publica.fraunhofer.de/documents/N-345612.html (7)]. The recordings have been provided by Jens Kofod Hansen for public evaluation.
 * file duration up to 4:40 minutes (total time: 35:33 minutes)
@@ Line 135: / Line 135: @@
 The audio has instrumental accompaniment. An example song can be seen [https://www.dropbox.com/sh/8pp4u2xg93z36d4/AAAsCE2eYW68gxRhKiPH_VvFa?dl=0 here].
-You can read in detail about how the dataset was used for the first time here: [https://pdfs.semanticscholar.org/547d/7a5d105380562ca3543bf05b4d5f7a8bee66.pdf (5)] . The dataset has been provided by Sungkyun Chang.
+You can read in detail about how the dataset was used for the first time here: [https://pdfs.semanticscholar.org/547d/7a5d105380562ca3543bf05b4d5f7a8bee66.pdf (8)] . The dataset has been provided by Sungkyun Chang.
 * file duration up to 5:40 minutes (total time: 1h 19m)
@@ Line 144: / Line 144: @@
 This dataset contains 20 recordings with varying Western music genres, annotated with start-of-word timestamps. All songs have instrumental accompaniment.
-It is available online on [https://github.com/f90/jamendolyrics Github], although note that we do not allow tuning model parameters using this data, it can only be used to gain insight into the general structure of the test data. For more information also refer to [https://arxiv.org/abs/1902.06797 this paper (6)].
+It is available online on [https://github.com/f90/jamendolyrics Github], although note that we do not allow tuning model parameters using this data, it can only be used to gain insight into the general structure of the test data. For more information also refer to [https://arxiv.org/abs/1902.06797 this paper (9)].
 * file duration up to 4:43 (total time: 1h 12m)

@@ Line 104: / Line 104: @@
 === DALI Dataset ===
-DALI (a large '''D'''ataset of synchronised '''A'''udio, '''L'''yr'''I'''cs and notes) (3) is the benchmark dataset for building an acoustic model on polyphonic recordings (,) and it contains over 5000 songs with semi-automatically aligned lyrics annotations.
+DALI (a large '''D'''ataset of synchronised '''A'''udio, '''L'''yr'''I'''cs and notes) (3) is the benchmark dataset for building an acoustic model on polyphonic recordings (4,5,6) and it contains over 5000 songs with semi-automatically aligned lyrics annotations.
 The songs are commercial recordings in full-duration, whereas the lyrics are described according to different levels of granularity including words and notes (and syllables underlying a given note).

@@ Line 104: / Line 104: @@
 === DALI Dataset ===
-DALI (a large '''D'''ataset of synchronised '''A'''udio, '''L'''yr'''I'''cs and notes) is the benchmark dataset for building an acoustic model on polyphonic recordings and it contains over 5000 songs with semi-automatically aligned lyrics annotations.
+DALI (a large '''D'''ataset of synchronised '''A'''udio, '''L'''yr'''I'''cs and notes) (3) is the benchmark dataset for building an acoustic model on polyphonic recordings (,) and it contains over 5000 songs with semi-automatically aligned lyrics annotations.
 The songs are commercial recordings in full-duration, whereas the lyrics are described according to different levels of granularity including words and notes (and syllables underlying a given note).
@@ Line 110: / Line 110: @@
 For each song DALI provides a link to a matched youtube video for the audio retrieval.
-* For more details how, see its full description [https://github.com/gabolsgabs/DALI here]. Paper [https://arxiv.org/pdf/1906.10606.pdf here (3)].
+* For more details how, see its full description [https://github.com/gabolsgabs/DALI here]. Paper [https://arxiv.org/pdf/1906.10606.pdf here].
 = Evaluation Datasets =

@@ Line 99: / Line 99: @@
 * The audio can be downloaded from the [https://ccrma.stanford.edu/damp/ Smule web site]
-* Lyrics boundary annotations can be generated from raw annotations using [https://github.com/groadabike/Kaldi-Dsing-task this repository]. Paper [doi:10.21437/Interspeech.2019-2378 here (1)].
+* Lyrics boundary annotations can be generated from raw annotations using [https://github.com/groadabike/Kaldi-Dsing-task this repository]. Paper [https://isca-speech.org/archive/Interspeech_2019/pdfs/2378.pdf here (1)].
 * Or annotations can be directly retrieved in the Kaldi form [https://github.com/emirdemirel/ALTA/s5/data here] Paper [https://arxiv.org/pdf/2007.06486.pdf here (2)].