2014:Singing Voice Separation Results
Revision as of 21:27, 18 November 2015
Introduction
Description
These are the results for the 2014 run of the Singing Voice Separation task set. For more information about this task set, please refer to the 2014:Singing Voice Separation page.
Legend
Submission code | Submission name | Abstract PDF | Contributors |
---|---|---|---|
GW1 | Bayesian Singing-Voice Separation | | Guan-Xiang Wang, Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien |
HKHS1 | Singing-Voice Separation using Deep Recurrent Neural Networks | | Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis |
HKHS2 | Singing-Voice Separation using Deep Recurrent Neural Networks | | Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis |
HKHS3 | Singing-Voice Separation using Deep Recurrent Neural Networks | | Po-Sen Huang, Minje Kim, Mark Hasegawa-Johnson, Paris Smaragdis |
IIY1 | Singing Voice Separation and Vocal F0 Estimation based on Robust PCA and Subharmonic Summation | | Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii |
IIY2 | Singing Voice Separation and Vocal F0 Estimation based on Robust PCA and Subharmonic Summation | | Yukara Ikemiya, Katsutoshi Itoyama, Kazuyoshi Yoshii |
JL1 | Singing Voice Separation Based on Sparse Nature and Spectral/Temporal Discontinuity | | Il-Young Jeong, Kyogu Lee |
LFR1 | Kernel Additive Modelling with light models | http://dx.doi.org/10.1109/ICASSP.2015.7177935 | Antoine Liutkus, Derry Fitzgerald, Zafar Rafii |
RNA1 | Singing Voice Separation using Adaptive Window Harmonic Sinusoidal Modeling | | Preeti Rao, Nagesh Nayak, Sharath Adavanne |
RP1 | REPET-SIM for Singing Voice Separation | | Zafar Rafii, Bryan Pardo |
YC1 | MIREX 2014 Submission for Singing Voice Separation | | Frederick Yen, Tai-Shih Chi |
Evaluation Criteria
GNSDR = Global Normalized Signal-to-Distortion Ratio
NSDR = Normalized Signal-to-Distortion Ratio
SIR = Signal-to-Interference Ratio
SAR = Signal-to-Artifacts Ratio
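This page only expands the acronyms, so the following is a minimal sketch of how NSDR and GNSDR are commonly defined in the singing voice separation literature, not a copy of the evaluation code used here. It assumes that NSDR is the SDR of the separated signal minus the SDR of the unprocessed mixture (both measured against the same ground truth), and that GNSDR is the mean of per-clip NSDR values weighted by clip length. The function names and the example numbers are hypothetical, and the SDR values are taken as given rather than computed from audio.

```python
from typing import Sequence


def nsdr(sdr_estimate_db: float, sdr_mixture_db: float) -> float:
    """Assumed definition: SDR improvement of the estimate over the raw mixture."""
    return sdr_estimate_db - sdr_mixture_db


def gnsdr(nsdr_values_db: Sequence[float], clip_lengths: Sequence[float]) -> float:
    """Assumed definition: length-weighted mean of per-clip NSDR values."""
    if not nsdr_values_db or len(nsdr_values_db) != len(clip_lengths):
        raise ValueError("need one positive length per NSDR value")
    total_length = sum(clip_lengths)
    if total_length <= 0:
        raise ValueError("clip lengths must sum to a positive number")
    weighted = sum(n * w for n, w in zip(nsdr_values_db, clip_lengths))
    return weighted / total_length


# Hypothetical example: two clips of 30 s and 10 s.
clip_nsdr = [nsdr(6.0, 2.0), nsdr(3.0, 2.0)]  # 4.0 dB and 1.0 dB
print(gnsdr(clip_nsdr, [30.0, 10.0]))
```

Under this definition, GNSDR reduces to the plain mean of NSDR when all clips have the same length.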
Summary
Summary Results
Algorithm | Voice GNSDR (dB) | Music GNSDR (dB) | Runtime (hh) |
---|---|---|---|
GW1 | 2.8861 | 5.2549 | 24 |
HKHS1 | -1.3988 | 0.3483 | 06 |
HKHS2 | -1.9413 | 0.5239 | 06 |
HKHS3 | -2.4807 | 0.1414 | 06 |
IIY1 | 4.2190 | 7.7893 | 02 |
IIY2 | 4.4764 | 7.8661 | 02 |
JL1 | 4.1564 | 5.6304 | 01 |
LFR1 | 0.6499 | 3.0867 | 03 |
RNA1 | 3.6915 | 7.3153 | 06 |
RP1 | 2.8602 | 5.0306 | 01 |
YC1 | -0.8202 | -3.1150 | 13 |
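For readers who want to reorder the Summary Results table, the sketch below parses the rows exactly as they appear above and ranks the submissions by Voice GNSDR. The helper name `parse_summary` and the tuple layout are illustrative choices, not part of any MIREX tooling.

```python
SUMMARY = """\
GW1 | 2.8861 | 5.2549 | 24 |
HKHS1 | -1.3988 | 0.3483 | 06 |
HKHS2 | -1.9413 | 0.5239 | 06 |
HKHS3 | -2.4807 | 0.1414 | 06 |
IIY1 | 4.2190 | 7.7893 | 02 |
IIY2 | 4.4764 | 7.8661 | 02 |
JL1 | 4.1564 | 5.6304 | 01 |
LFR1 | 0.6499 | 3.0867 | 03 |
RNA1 | 3.6915 | 7.3153 | 06 |
RP1 | 2.8602 | 5.0306 | 01 |
YC1 | -0.8202 | -3.1150 | 13 |
"""


def parse_summary(text):
    """Return (code, voice_gnsdr_db, music_gnsdr_db, runtime_hours) tuples."""
    rows = []
    for line in text.strip().splitlines():
        # Split on the column separator and drop the empty trailing cell.
        cells = [cell.strip() for cell in line.split("|") if cell.strip()]
        code, voice, music, hours = cells
        rows.append((code, float(voice), float(music), int(hours)))
    return rows


rows = parse_summary(SUMMARY)
by_voice = sorted(rows, key=lambda row: row[1], reverse=True)
for code, voice, music, hours in by_voice:
    print(f"{code:6s} voice={voice:8.4f} dB  music={music:8.4f} dB  runtime={hours:2d} h")
```

Changing the sort key to `row[2]` ranks by Music GNSDR instead.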
NSDR
For the Singing Voice (dB)
Algorithm | Mean | SD | Min | Max | Median |
---|---|---|---|---|---|
GW1 | 2.8861 | 3.4543 | -7.1344 | 12.819 | 2.5745 |
HKHS1 | -1.3988 | 3.0574 | -9.4234 | 4.292 | -1.2971 |
HKHS2 | -1.9413 | 3.2899 | -11.309 | 7.2794 | -1.4234 |
HKHS3 | -2.4807 | 3.8173 | -12.272 | 9.7879 | -1.5772 |
IIY1 | 4.219 | 3.2378 | -3.4536 | 15.517 | 4.4345 |
IIY2 | 4.4764 | 3.0584 | -2.3763 | 16.212 | 4.2927 |
JL1 | 4.1564 | 3.9819 | -3.9431 | 15.822 | 3.7558 |
LFR1 | 0.64992 | 3.7455 | -9.6199 | 7.4555 | 0.97393 |
RNA1 | 3.6915 | 3.4319 | -1.8064 | 14.38 | 3.4024 |
RP1 | 2.8602 | 2.7926 | -3.771 | 12.105 | 2.4553 |
YC1 | -0.82015 | 3.4857 | -8.7424 | 7.9435 | -0.42864 |
For the Music Accompaniment (dB)
Algorithm | Mean | SD | Min | Max | Median |
---|---|---|---|---|---|
GW1 | 5.2549 | 4.0553 | -0.792 | 16.155 | 5.0222 |
HKHS1 | 0.34825 | 2.207 | -5.7359 | 6.1051 | 0.33855 |
HKHS2 | 0.52394 | 2.5029 | -6.1304 | 6.0994 | 0.90947 |
HKHS3 | 0.14144 | 2.3196 | -6.3693 | 5.8651 | 0.55883 |
IIY1 | 7.7893 | 3.0938 | -4.0068 | 13.949 | 8.1274 |
IIY2 | 7.8661 | 3.5329 | -2.4807 | 15.082 | 8.7023 |
JL1 | 5.6304 | 4.0732 | -0.91101 | 17.648 | 5.5284 |
LFR1 | 3.0867 | 2.6421 | -6.9241 | 10.887 | 2.9156 |
RNA1 | 7.3153 | 2.9143 | -5.9455 | 13.753 | 7.5214 |
RP1 | 5.0306 | 3.004 | -0.99542 | 15.424 | 4.9872 |
YC1 | -3.115 | 3.6797 | -12.229 | 3.5503 | -2.9997 |
Boxplots
SIR
For the Singing Voice (dB)
Algorithm | Mean | SD | Min | Max | Median |
---|---|---|---|---|---|
GW1 | 6.9844 | 9.43 | -26.961 | 18.215 | 8.8768 |
HKHS1 | 6.7499 | 10.673 | -30.345 | 23.034 | 7.058 |
HKHS2 | 8.3009 | 11.705 | -32.287 | 29.393 | 7.8647 |
HKHS3 | 7.7489 | 12.137 | -30.839 | 28.544 | 8.9649 |
IIY1 | 15.472 | 11.954 | -28.445 | 32.446 | 18.307 |
IIY2 | 13.267 | 11.466 | -30.369 | 30.901 | 16.314 |
JL1 | 9.6169 | 9.6173 | -24.122 | 24.341 | 11.755 |
LFR1 | 10.454 | 10.442 | -26.952 | 23.638 | 13.042 |
RNA1 | 16.323 | 10.951 | -24.713 | 34.263 | 18.799 |
RP1 | 7.2958 | 9.7631 | -28.981 | 20.303 | 9.7841 |
YC1 | 10.873 | 10.809 | -28.646 | 27.301 | 12.837 |
For the Music Accompaniment (dB)
Algorithm | Mean | SD | Min | Max | Median |
---|---|---|---|---|---|
GW1 | 6.96 | 13.076 | -12.643 | 42.653 | 4.3054 |
HKHS1 | 1.4953 | 10.084 | -13.232 | 38.909 | -1.4525 |
HKHS2 | 2.4162 | 10.465 | -9.9978 | 34.081 | -0.51852 |
HKHS3 | 0.90212 | 9.7862 | -11.02 | 34.345 | -0.66779 |
IIY1 | 12.44 | 8.1972 | -0.61968 | 41.502 | 11.163 |
IIY2 | 14.301 | 8.3307 | 0.49447 | 41.767 | 13.809 |
JL1 | 5.6509 | 10.636 | -13.16 | 39.5 | 4.3978 |
LFR1 | 4.4493 | 10.109 | -11.445 | 41.717 | 2.0394 |
RNA1 | 12.938 | 8.5096 | -1.3967 | 40.34 | 11.979 |
RP1 | 5.5158 | 10.417 | -11.092 | 44.235 | 4.6256 |
YC1 | 0.90846 | 8.4936 | -12.296 | 32.53 | -0.63057 |
Boxplots
SAR
For the Singing Voice (dB)
Algorithm | Mean | SD | Min | Max | Median |
---|---|---|---|---|---|
GW1 | 10.398 | 6.6431 | -13.219 | 19.227 | 11.757 |
HKHS1 | 4.4392 | 5.1179 | -16.316 | 17.676 | 5.1547 |
HKHS2 | 3.6845 | 6.1018 | -23.638 | 15.233 | 4.4692 |
HKHS3 | 3.6391 | 5.615 | -14.303 | 15.068 | 3.2243 |
IIY1 | 7.7078 | 7.4547 | -25.591 | 16.613 | 9.6827 |
IIY2 | 8.5817 | 7.2202 | -24.222 | 17.066 | 10.487 |
JL1 | 10.026 | 7.5205 | -16.962 | 21.028 | 11.47 |
LFR1 | 4.729 | 5.6625 | -23.426 | 12.721 | 5.5804 |
RNA1 | 6.662 | 7.3118 | -25.659 | 15.083 | 8.9188 |
RP1 | 9.8241 | 6.5477 | -13.189 | 24.156 | 11.033 |
YC1 | 2.9058 | 5.0893 | -21.403 | 8.872 | 4.0133 |
For the Music Accompaniment (dB)
Algorithm | Mean | SD | Min | Max | Median |
---|---|---|---|---|---|
GW1 | 8.7701 | 3.3088 | -2.6918 | 16.204 | 8.6529 |
HKHS1 | 4.4585 | 4.2757 | -12.382 | 16.774 | 4.7823 |
HKHS2 | 4.2321 | 4.22 | -8.0849 | 14.269 | 4.5204 |
HKHS3 | 5.3476 | 4.6397 | -12.255 | 14.438 | 5.794 |
IIY1 | 5.4262 | 3.1853 | -2.678 | 15.981 | 5.2362 |
IIY2 | 5.0379 | 3.325 | -3.1753 | 16.62 | 5.0873 |
JL1 | 9.6038 | 3.7963 | -3.8019 | 17.577 | 10.158 |
LFR1 | 4.8871 | 3.4349 | -12.787 | 10.789 | 5.0422 |
RNA1 | 4.7221 | 3.4545 | -1.6892 | 15.052 | 4.9501 |
RP1 | 7.6957 | 3.3901 | -7.2754 | 14.854 | 7.9782 |
YC1 | -1.9525 | 2.8357 | -12.203 | 6.2045 | -2.3271 |
Boxplots
Individual Spectrograms
As the MIREX test set is private, we use three other songs with similar characteristics to demonstrate the algorithms.
Labels
a = input mixture x
b = ground truth voice for x
c = extracted voice from x
d = input mixture y
e = ground truth voice for y
f = extracted voice from y
g = input mixture z
h = ground truth voice for z
i = extracted voice from z
Runtime Data
Submission Code | Runtime (hh) |
---|---|
GW1 | 24 |
HKHS1 | 06 |
HKHS2 | 06 |
HKHS3 | 06 |
IIY1 | 02 |
IIY2 | 02 |
JL1 | 01 |
LFR1 | 03 |
RNA1 | 06 |
RP1 | 01 |
YC1 | 13 |