MIREX Wiki - User contributions [en]

2019:Music Detection Results

2019-10-30T13:55:58Z

Blai Melendez-Catalan: /* Event-level Evaluation */

2019:Music Detection Results

2019-10-30T13:27:00Z

Blai Melendez-Catalan: /* Task 2: Relative Music Loudness Estimation */

2019:Music Detection Results

2019-10-30T13:26:01Z

Blai Melendez-Catalan: /* Event-level Evaluation */

2019:Music Detection Results

2019-10-30T13:23:27Z

Blai Melendez-Catalan: /* Event-level Evaluation */

2019:Music Detection Results

2019-10-30T12:50:36Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

2019:Music Detection Results

2019-10-30T12:43:33Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

2019:Music Detection Results

2019-10-30T12:27:16Z

Blai Melendez-Catalan:

2019:Music Detection Results

2019-10-30T10:37:36Z

Blai Melendez-Catalan: Created page with "==Introduction== These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the 20..."

2019:Music Detection

2019-08-30T08:10:20Z

Blai Melendez-Catalan: /* Datasets */

==Description==

Music detection refers to the task of finding music segments in an audio file. The two main applications of music detection algorithms are (1) the automatic indexing and retrieving of auditory information based on its audio content, and (2) the monitoring of music for copyright management. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes.

Regarding copyright management, the industry is increasingly interested not only in detecting the presence of music but also in estimating whether it appears in the foreground (as the main focus of attention) or in the background. In this scenario, the music detection task falls short, as we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. This is why we propose a second task that we name Music Relative Loudness Estimation. We define this second task as the task of finding music segments in an audio file and classifying them into foreground or background music.

==Tasks==

===Music Detection===

The music detection sub-task consists in finding segments of music in a signal. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: music (and non-music)

===Music Relative Loudness Estimation===

The music relative loudness estimation sub-task consists in finding segments of one of the following two classes: foreground music and background music. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: fg-music, bg-music (and non-music)

==Datasets==

===Available Training Datasets===

These resources may be a good starting point for participants.

GTZAN Speech and Music Dataset
http://opihi.cs.uvic.ca/sound/music_speech.tar.gz

Scheirer & Slaney Music Speech Corpus
http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html

MUSAN Corpus
http://www.openslr.org/17/

Muspeak Speech and Music Detection Dataset
http://mirg.city.ac.uk/datasets/muspeak/muspeak-mirex2015-detection-examples.zip

Music detection dataset:
www.seyerlehner.info/download/music_detection_dataset_dafx_07.zip
(Ask the author for the password)

Open Broadcast Media Audio from TV:
https://zenodo.org/record/3381249

===Evaluation Dataset===

====Content====

The evaluation dataset consists of 2987 1-minute, stereo excerpts at 22050 Hz extracted from programs from France (753), Germany (760), Spain (723) and the United States (751).

====Annotation====

The evaluation dataset has been cross-annotated by 3 annotators using a 6-class taxonomy: ''Music'', ''Foreground Music'', ''Similar'', ''Background Music'', ''Low Background Music'', and ''No Music'' as done in the OpenBMAT dataset, which can be used for training.

==Evaluation==

In the literature we find two ways of measuring the performance of an algorithm depending on the way we compare the ground truth with an algorithm's estimation: the segment-level evaluation and the event-level evaluation. We will report the statistics for each of these evaluations by file and for the whole dataset. We will do that for each algorithm and dataset.

===Segment-level evaluation:===

In the segment-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in segments of 10 ms. We first compute the intermediate statistics for each class C, which include:
* True Positives (TPc): ref segment’s class = C & est segment’s class = C
* False Positives (FPc): ref segment’s class != C & est segment’s class = C
* True Negatives (TNc): ref segment’s class != C & est segment’s class != C
* False Negatives (FNc): ref segment’s class = C & est segment’s class != C

Then we report class-wise Precision, Recall and F-measure.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)

We also report the overall Accuracy:
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* TN = sum(TNc), for every class c
* FN = sum(FNc), for every class c
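
The segment-level statistics above can be sketched in Python as follows. This is a minimal illustration, not the evaluation code used by the task: the helper names (`rasterize`, `segment_metrics`, `safe_div`), the `non-music` default label, the rounding of boundaries to the nearest segment and the convention of returning 0 when a denominator is 0 are all assumptions made for the example.

```python
SEGMENT_SECONDS = 0.01  # 10 ms, the segment length stated in the text


def rasterize(events, duration, default="non-music"):
    """Turn (onset, offset, label) rows into one label per 10 ms segment."""
    n = int(round(duration / SEGMENT_SECONDS))
    labels = [default] * n
    for onset, offset, label in events:
        start = max(0, int(round(onset / SEGMENT_SECONDS)))
        end = min(n, int(round(offset / SEGMENT_SECONDS)))
        if end > start:
            labels[start:end] = [label] * (end - start)
    return labels


def safe_div(num, den):
    """Assumed convention: a ratio with an empty denominator is 0."""
    return num / den if den else 0.0


def segment_metrics(ref, est, classes):
    """Class-wise P/R/F plus the overall accuracy over summed counts."""
    per_class = {}
    tp_all = fp_all = tn_all = fn_all = 0
    for c in classes:
        tp = sum(r == c and e == c for r, e in zip(ref, est))
        fp = sum(r != c and e == c for r, e in zip(ref, est))
        tn = sum(r != c and e != c for r, e in zip(ref, est))
        fn = sum(r == c and e != c for r, e in zip(ref, est))
        p = safe_div(tp, tp + fp)
        rec = safe_div(tp, tp + fn)
        per_class[c] = {"P": p, "R": rec, "F": safe_div(2 * p * rec, p + rec)}
        tp_all, fp_all, tn_all, fn_all = (
            tp_all + tp, fp_all + fp, tn_all + tn, fn_all + fn)
    accuracy = safe_div(tp_all + tn_all, tp_all + tn_all + fp_all + fn_all)
    return per_class, accuracy


# Example: 2 s of audio, reference music in [0, 1] s, estimate in [0.5, 1.5] s.
ref = rasterize([(0.0, 1.0, "music")], 2.0)
est = rasterize([(0.5, 1.5, "music")], 2.0)
per_class, accuracy = segment_metrics(ref, est, ["music", "non-music"])
```

In this example half of the estimated music segments overlap the reference, so precision, recall and F-measure for ''music'' are all 0.5, and so is the overall accuracy.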

===Event-level evaluation:===

In the event-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in terms of events. Each annotated segment of the ground truth is considered an event. We first compute the intermediate statistics for the onsets and offsets of each class C, which include:
* True Positives (TPc): an est event of class = C that starts and ends at the same temporal positions as a ref event of class = C, taking into account a tolerance time-window.
* False Positives (FPc): an est event of class = C that starts and ends at temporal positions where no ref event of class = C does, taking into account a tolerance time-window.
* False Negatives (FNc): a ref event of class = C that starts and ends at temporal positions where no est event of class = C does, taking into account a tolerance time-window.

Then we report class-wise Precision, Recall, F-measure, Deletion Rate, Insertion Rate and Error Rate.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)
* Deletion Rate (Dc) = FNc / Nc
* Insertion Rate (Ic) = FPc / Nc
* Error Rate (Ec) = Dc + Ic

Where:
* Nc is the number of ref events of class = C.

We also report the overall version of these statistics:
* Precision (P) = TP / (TP + FP)
* Recall (R) = TP / (TP + FN)
* F-measure (F) = 2 * P * R / (P + R)
* Deletion Rate (D) = FN / N
* Insertion Rate (I) = FP / N
* Error Rate (E) = D + I

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* FN = sum(FNc), for every class c
* N is the number of ref events.

Different tolerance time-windows will be used: +/- 1000 ms, +/- 500 ms, +/- 200 ms, +/- 100 ms.
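
The event-level statistics can be sketched in the same way. The text does not say how estimated events are paired with reference events, so the greedy one-to-one matching below is an assumption, as are the helper names (`count_matches`, `rates`, `event_metrics`) and the convention of returning 0 when a denominator is 0. Times are in seconds, so a tolerance of +/- 500 ms is written as `0.5`.

```python
def safe_div(num, den):
    """Assumed convention: a ratio with an empty denominator is 0."""
    return num / den if den else 0.0


def count_matches(ref, est, tolerance):
    """Greedy one-to-one matching of (onset, offset) pairs (an assumption).

    An est event is a true positive when both its onset and its offset lie
    within `tolerance` seconds of a still-unmatched ref event.
    """
    unmatched = list(ref)
    tp = 0
    for onset, offset in est:
        for cand in unmatched:
            if (abs(onset - cand[0]) <= tolerance
                    and abs(offset - cand[1]) <= tolerance):
                unmatched.remove(cand)
                tp += 1
                break
    return tp, len(est) - tp, len(ref) - tp  # TP, FP, FN


def rates(tp, fp, fn, n):
    """Precision, Recall, F-measure, Deletion, Insertion and Error rates."""
    p = safe_div(tp, tp + fp)
    r = safe_div(tp, tp + fn)
    d = safe_div(fn, n)
    i = safe_div(fp, n)
    return {"P": p, "R": r, "F": safe_div(2 * p * r, p + r),
            "D": d, "I": i, "E": d + i}


def event_metrics(ref_events, est_events, classes, tolerance):
    """Class-wise statistics plus the overall version over summed counts."""
    per_class = {}
    tp_all = fp_all = fn_all = n_all = 0
    for c in classes:
        ref = [(on, off) for on, off, label in ref_events if label == c]
        est = [(on, off) for on, off, label in est_events if label == c]
        tp, fp, fn = count_matches(ref, est, tolerance)
        per_class[c] = rates(tp, fp, fn, len(ref))
        tp_all, fp_all, fn_all = tp_all + tp, fp_all + fp, fn_all + fn
        n_all += len(ref)
    return per_class, rates(tp_all, fp_all, fn_all, n_all)


ref_events = [(0.0, 10.0, "music"), (20.0, 30.0, "music")]
est_events = [(0.3, 10.2, "music"), (21.5, 30.0, "music"), (40.0, 45.0, "music")]
per_class, overall = event_metrics(ref_events, est_events, ["music"], 0.5)
```

With a tolerance of 0.5 s only the first estimated event matches, giving TP = 1, FP = 2 and FN = 1, so the Deletion Rate is 0.5, the Insertion Rate is 1.0 and the Error Rate is 1.5. Note that the Error Rate can exceed 1 because insertions are not bounded by the number of reference events.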

===Other evaluated features===

The execution time of each algorithm will also be reported.

==Submission Format==

===Command line calling format===

Submissions must conform to the format specified below:

Music Detection: ''doMusicDetection path/to/file.wav path/to/output/file.mud ''

Music Relative Loudness Estimation: ''doMusicRelLoudEstimation path/to/file.wav path/to/output/file.mrle ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.*: Path to the output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.
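
As an illustration of the calling convention, a submission written in Python could expose an entry point along these lines. This is only a sketch: `detect_music` is a hypothetical placeholder for the participant's algorithm, and the three-decimal time formatting and the exit codes are assumptions, not requirements stated on this page.

```python
import sys


def detect_music(wav_path):
    """Hypothetical placeholder: return (onset, offset, label) rows in seconds."""
    return []


def main(argv, detector=detect_music):
    """Handle `doMusicDetection path/to/file.wav path/to/output/file.mud`."""
    if len(argv) != 3:
        print(f"usage: {argv[0]} path/to/file.wav path/to/output/file.mud",
              file=sys.stderr)
        return 2  # assumed exit code for a usage error
    wav_path, out_path = argv[1], argv[2]
    rows = sorted(detector(wav_path))  # rows ordered by onset time
    with open(out_path, "w", encoding="utf-8") as handle:
        for onset, offset, label in rows:
            handle.write(f"{onset:.3f}\t{offset:.3f}\t{label}\n")
    return 0

# A real executable would finish with: sys.exit(main(sys.argv))
```

Messages written to stderr, such as the usage line above, end up in the logs mentioned in the text.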

===I/O format===

For each detected segment, the file should include a row containing the onset (seconds), the offset (seconds) and the class, separated by tabs. Rows should be ordered by onset time:

''onset1 offset1 class1''
''onset2 offset2 class2''
''... ... ...''

(note that events in the case of music and speech detection can overlap)
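
A small Python check of an output file against this format might look like the following. It is a sketch under assumptions: the function name `parse_rows` is made up for the example, and the extra check that each offset is later than its onset is not stated on this page.

```python
def parse_rows(text):
    """Parse tab-separated `onset offset class` rows and check their order."""
    rows = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {number}: expected 3 tab-separated fields")
        onset, offset = float(fields[0]), float(fields[1])
        label = fields[2].strip()
        if offset <= onset:  # assumed sanity check, not part of the spec
            raise ValueError(f"line {number}: offset must be after onset")
        if rows and onset < rows[-1][0]:
            raise ValueError(f"line {number}: rows are not ordered by onset")
        rows.append((onset, offset, label))
    return rows


example = "0.000\t12.500\tfg-music\n12.500\t40.000\tbg-music\n"
rows = parse_rows(example)
```

Running the check before submitting catches rows that were separated by spaces instead of tabs or written out of onset order.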

===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed) and include a README file with the following information:
* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Potential Participants ==
name/email

Blai Meléndez-Catalán, bmelendez … bmat.com
----

==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

==Submission closing date==

September 30th 2019

==Task specific mailing list==

All discussions on this task will take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

2019:Music Detection

2019-08-26T14:37:22Z

Blai Melendez-Catalan: /* Submission closing date */


2019:Music Detection

2019-08-26T14:36:41Z

Blai Melendez-Catalan: /* References */


2019:Music Detection

2019-08-26T14:28:56Z

Blai Melendez-Catalan: /* Annotation */

==Description==

Music detection refers to the task of finding music segments in an audio file. The two main applications of music detection algorithms are (1) the automatic indexing and retrieving of auditory information based on its audio content, and (2) the monitoring of music for copyright management. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes.

Regarding the application of music detection algorithms to the copyright management, the industry is lately becoming more and more interested in not only detecting the presence of music but also estimating if it appears in the foreground (as the main focus of attention) or in the background. In this scenario, the music detection task falls short as we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. This is why we propose a second task that we name Music Relative Loudness Estimation. We define this second task as the task of finding music segments in and audio file and classifying them into foreground or background music.

==Tasks==

===Music Detection===

The music detection sub-task consists in finding segments of music in a signal. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: music (and non-music)

===Music Relative Loudness Estimation===

The music relative loudness estimation sub-task consists in finding segments of one of the following two classes: foreground music and background music. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: fg-music, bg-music (and non-music)

==Datasets==

===Available Training Datasets===

These resources may be a good starting point for participants.

GTZAN Speech and Music Dataset
http://opihi.cs.uvic.ca/sound/music_speech.tar.gz

Scheirer & Slaney Music Speech Corpus
http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html

MUSAN Corpus
http://www.openslr.org/17/

Muspeak Speech and Music Detection Dataset
http://mirg.city.ac.uk/datasets/muspeak/muspeak-mirex2015-detection-examples.zip

Music detection dataset:
www.seyerlehner.info/download/music_detection_dataset_dafx_07.zip
(Ask the author for the password)

Open Broadcast Media Audio from TV:
The link will be available soon.

===Evaluation Dataset===

====Content====

The evaluation dataset consists of 2987 1-minute, stereo excerpts at 22050 Hz extracted from programs from France (753), Germany (760), Spain (723) and the United States (751).

====Annotation====

The evaluation dataset has been cross-annotated by 3 annotators using a 6-class taxonomy: ''Music'', ''Foreground Music'', ''Similar'', ''Background Music'', ''Low Background Music'', and ''No Music'' as done in the OpenBMAT dataset, which can be used for training.

==Evaluation==

In the literature we find two ways of measuring the performance of an algorithm depending on the way we compare the ground truth with an algorithm's estimation: the segment-level evaluation and the event-level evaluation. We will report the statistics for each of these evaluations by file and for the whole dataset. We will do that for each algorithm and dataset.

===Segment-level evaluation:===

In the segment-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in segments of 10 ms. We first compute the intermediate statistics for each class C, which include:
* True Positives (TPc): ref segment’s class = C & est segment’s class = C
* False Positives (FPc): ref segment’s class != C & est segment’s class = C
* True Negatives (TNc): ref segment’s class != C & est segment’s class != C
* False Negatives (FNc): ref segment’s class = C & est segment’s class != C

Then we report class-wise Precision, Recall and F-measure.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)

As well as the overall Accuracy:
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* TN = sum(TNc), for every class c
* FN = sum(FNc), for every class c

===Event-level evaluation:===

In the event-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in terms of events. Each annotated segment of the ground truth is considered and event. We first compute the intermediate statistics for the onsets and offsets of each class C, which include:
* True Positives (TPc): an est event of class = C that starts and ends at the same temporal positions as a ref event of class = C, taking into account a tolerance time-window.
* False Positives (FPc): an est event of class = C that starts and ends at temporal positions where no ref event of class = C does, taking into account a tolerance time-window.
* False Negatives (FNc): a ref event of class = C that starts and ends at temporal positions where no est event of class = C does, taking into account a tolerance time-window.

Then we report class-wise Precision, Recall, F-measure, Deletion Rate, Insertion Rate and Error Rate.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)
* Deletion Rate (Dc) = FNc / Nc
* Insertion Rate (Ic) = FPc / Nc
* Error Rate (Ec) = Dc + Ic

Where:
* Nc is the number of ref events of class = C.

We also report the overall version of these statistics:
* Precision (P) = TP / (TP + FP)
* Recall (R) = TP / (TP + FN)
* F-measure (F) = 2 * P * R / (P + R)
* Deletion Rate (D) = FN / N
* Insertion Rate (I) = FP / N
* Error Rate (E) = D + I

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* FN = sum(FNc), for every class c
* N is the number of ref events.

Different tolerance time-windows will be used: +/- 1000 ms, +/- 500 ms, +/- 200 ms, +/- 100 ms.
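
A minimal sketch of the event-level statistics follows. It is not the official evaluation code. It assumes that an est event matches a ref event of the same class when both its onset and its offset lie within the tolerance of the ref event's, and that each est event can be matched to at most one ref event, greedily and in input order. The function names and the zero-denominator convention are likewise assumptions.

```python
def count_matches(ref, est, tol):
    """Greedy one-to-one matching of (onset, offset, label) events.

    Returns (TP, FP, FN). The matching strategy is an assumption.
    """
    matched = set()
    tp = 0
    for r_on, r_off, r_cls in ref:
        for j, (e_on, e_off, e_cls) in enumerate(est):
            if j in matched or e_cls != r_cls:
                continue
            if abs(e_on - r_on) <= tol and abs(e_off - r_off) <= tol:
                matched.add(j)
                tp += 1
                break
    return tp, len(est) - tp, len(ref) - tp


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def event_level_scores(ref, est, tol, cls=None):
    """Overall statistics, or class-wise ones when cls is given."""
    if cls is not None:
        ref = [ev for ev in ref if ev[2] == cls]
        est = [ev for ev in est if ev[2] == cls]
    tp, fp, fn = count_matches(ref, est, tol)
    n = len(ref)  # number of ref events (N, or Nc when cls is given)
    p = safe_div(tp, tp + fp)
    r = safe_div(tp, tp + fn)
    d = safe_div(fn, n)
    i = safe_div(fp, n)
    return {
        "precision": p,
        "recall": r,
        "f_measure": safe_div(2 * p * r, p + r),
        "deletion_rate": d,
        "insertion_rate": i,
        "error_rate": d + i,
    }


ref_events = [(0.0, 10.0, "music"), (20.0, 30.0, "music")]
est_events = [(0.3, 10.2, "music"), (20.8, 30.0, "music")]
strict = event_level_scores(ref_events, est_events, tol=0.5)
loose = event_level_scores(ref_events, est_events, tol=1.0)
```

With the +/- 500 ms window only the first est event is accepted, while the +/- 1000 ms window accepts both, which shows how the reported figures depend on the tolerance.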

===Other evaluated features===

The execution time of each algorithm will also be reported.

==Submission Format==

===Command line calling format===

Submissions have to conform to the specified format below:

Music Detection: ''doMusicDetection path/to/file.wav path/to/output/file.mud ''

Music Relative Loudness Estimation: ''doMusicRelLoudEstimation path/to/file.wav path/to/output/file.mrle ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.*: Path to the output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.
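
For submissions written in Python, the two positional arguments of the calling format could be parsed as sketched below. Only the program name and the argument order come from the calling format above. The function names and the <code>detect</code> placeholder are hypothetical.

```python
import argparse


def build_parser():
    """Parser for: doMusicDetection path/to/file.wav path/to/output/file.mud"""
    parser = argparse.ArgumentParser(prog="doMusicDetection")
    parser.add_argument("input_wav", help="path to the input audio file")
    parser.add_argument("output_file", help="path to the output file")
    return parser


def main(argv):
    """Entry point; a real submission would call it as main(sys.argv[1:])."""
    args = build_parser().parse_args(argv)
    # Hypothetical placeholder: run the detector on args.input_wav and
    # write its segments to args.output_file.
    return args.input_wav, args.output_file
```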

===I/O format===

For each detected segment, the file should include a row containing the onset (seconds), offset (seconds) and the class separated by a tab. Rows should be ordered by onset time:

''onset1 offset1 class1''
''onset2 offset2 class2''
''... ... ...''

(note that events in the case of music and speech detection can overlap)
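
As an illustration of the output format, the following sketch serialises a list of detected segments as tab-separated rows ordered by onset. The function name and the three-decimal precision are assumptions, as the format description does not fix the number of decimals. The class labels are the ones listed for the Music Relative Loudness Estimation task.

```python
def format_rows(segments):
    """Return the file content for a list of (onset, offset, label) segments.

    Rows are sorted by onset and fields are separated by a tab.
    """
    rows = sorted(segments, key=lambda seg: seg[0])
    return "".join(f"{on:.3f}\t{off:.3f}\t{label}\n" for on, off, label in rows)


content = format_rows([(12.5, 20.0, "bg-music"), (0.0, 12.5, "fg-music")])
# The string can then be written to the output path given on the command line.
```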

===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed) and include a README file with the following information:
* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Potential Participants ==
name/email

Blai Meléndez-Catalán, bmelendez … bmat.com
----

==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

==Submission closing date==

TBD

==Task specific mailing list==

All discussions on this task will take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

==References==
{{Reflist}}

2019:Music Detection

2019-08-26T14:28:33Z

Blai Melendez-Catalan:

==Description==

Music detection refers to the task of finding music segments in an audio file. The two main applications of music detection algorithms are (1) the automatic indexing and retrieving of auditory information based on its audio content, and (2) the monitoring of music for copyright management. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes.

Regarding the application of music detection algorithms to copyright management, the industry is becoming increasingly interested in not only detecting the presence of music but also estimating whether it appears in the foreground (as the main focus of attention) or in the background. In this scenario, the music detection task falls short, as we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. This is why we propose a second task that we name Music Relative Loudness Estimation. We define this second task as the task of finding music segments in an audio file and classifying them into foreground or background music.

==Tasks==

===Music Detection===

The music detection sub-task consists in finding segments of music in a signal. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: music (and non-music)

===Music Relative Loudness Estimation===

The music relative loudness estimation sub-task consists in finding segments of one of the following two classes: foreground music and background music. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: fg-music, bg-music (and non-music)

==Datasets==

===Available Training Datasets===

These resources may be a good starting point for participants.

GTZAN Speech and Music Dataset
http://opihi.cs.uvic.ca/sound/music_speech.tar.gz

Scheirer & Slaney Music Speech Corpus
http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html

MUSAN Corpus
http://www.openslr.org/17/

Muspeak Speech and Music Detection Dataset
http://mirg.city.ac.uk/datasets/muspeak/muspeak-mirex2015-detection-examples.zip

Music detection dataset:
www.seyerlehner.info/download/music_detection_dataset_dafx_07.zip
(Ask the author for the password)

Open Broadcast Media Audio from TV:
The link will be available soon.

===Evaluation Dataset===

====Content====

The evaluation dataset consists of 2987 1-minute, stereo excerpts at 22050 Hz extracted from programs from France (753), Germany (760), Spain (723) and the United States (751).

====Annotation====

The evaluation dataset has been cross-annotated by 3 annotators using a 6-class taxonomy: ''Music'', ''Foreground Music'', ''Similar'', ''Background Music'', ''Low Background Music'', and ''No Music'' as done in the OpenBMAT dataset<ref>Meléndez-Catalán, B., Molina, E., & Gómez, E. (2019). Open Broadcast Media Audio from TV: A Dataset of TV Broadcast Audio with Relative Music Loudness Annotations. Transactions of the International Society for Music Information Retrieval, 2(1), pp. 43–51. DOI: https://doi.org/10.5334/tismir.29</ref>, which can be used for training.

==Evaluation==

In the literature we find two ways of measuring the performance of an algorithm depending on the way we compare the ground truth with an algorithm's estimation: the segment-level evaluation and the event-level evaluation. We will report the statistics for each of these evaluations by file and for the whole dataset. We will do that for each algorithm and dataset.

===Segment-level evaluation:===

In the segment-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in segments of 10 ms. We first compute the intermediate statistics for each class C, which include:
* True Positives (TPc): ref segment’s class = C & est segment’s class = C
* False Positives (FPc): ref segment’s class != C & est segment’s class = C
* True Negatives (TNc): ref segment’s class != C & est segment’s class != C
* False Negatives (FNc): ref segment’s class = C & est segment’s class != C

Then we report class-wise Precision, Recall and F-measure.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)

As well as the overall Accuracy:
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* TN = sum(TNc), for every class c
* FN = sum(FNc), for every class c

===Event-level evaluation:===

In the event-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in terms of events. Each annotated segment of the ground truth is considered an event. We first compute the intermediate statistics for the onsets and offsets of each class C, which include:
* True Positives (TPc): an est event of class = C that starts and ends at the same temporal positions as a ref event of class = C, taking into account a tolerance time-window.
* False Positives (FPc): an est event of class = C that starts and ends at temporal positions where no ref event of class = C does, taking into account a tolerance time-window.
* False Negatives (FNc): a ref event of class = C that starts and ends at temporal positions where no est event of class = C does, taking into account a tolerance time-window.

Then we report class-wise Precision, Recall, F-measure, Deletion Rate, Insertion Rate and Error Rate.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)
* Deletion Rate (Dc) = FNc / Nc
* Insertion Rate (Ic) = FPc / Nc
* Error Rate (Ec) = Dc + Ic

Where:
* Nc is the number of ref events of class = C.

We also report the overall version of these statistics:
* Precision (P) = TP / (TP + FP)
* Recall (R) = TP / (TP + FN)
* F-measure (F) = 2 * P * R / (P + R)
* Deletion Rate (D) = FN / N
* Insertion Rate (I) = FP / N
* Error Rate (E) = D + I

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* FN = sum(FNc), for every class c
* N is the number of ref events.

Different tolerance time-windows will be used: +/- 1000 ms, +/- 500 ms, +/- 200 ms, +/- 100 ms.

===Other evaluated features===

The execution time of each algorithm will also be reported.

==Submission Format==

===Command line calling format===

Submissions have to conform to the specified format below:

Music Detection: ''doMusicDetection path/to/file.wav path/to/output/file.mud ''

Music Relative Loudness Estimation: ''doMusicRelLoudEstimation path/to/file.wav path/to/output/file.mrle ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.*: Path to the output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

===I/O format===

For each detected segment, the file should include a row containing the onset (seconds), offset (seconds) and the class separated by a tab. Rows should be ordered by onset time:

''onset1 offset1 class1''
''onset2 offset2 class2''
''... ... ...''

(note that events in the case of music and speech detection can overlap)

===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed) and include a README file with the following information:
* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Potential Participants ==
name/email

Blai Meléndez-Catalán, bmelendez … bmat.com
----

==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

==Submission closing date==

TBD

==Task specific mailing list==

All discussions on this task will take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

==References==
{{Reflist}}

2019:Music Detection

2019-08-26T14:16:17Z

Blai Melendez-Catalan: /* Annotation */

==Description==

Music detection refers to the task of finding music segments in an audio file. The two main applications of music detection algorithms are (1) the automatic indexing and retrieving of auditory information based on its audio content, and (2) the monitoring of music for copyright management. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes.

Regarding the application of music detection algorithms to copyright management, the industry is becoming increasingly interested in not only detecting the presence of music but also estimating whether it appears in the foreground (as the main focus of attention) or in the background. In this scenario, the music detection task falls short, as we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. This is why we propose a second task that we name Music Relative Loudness Estimation. We define this second task as the task of finding music segments in an audio file and classifying them into foreground or background music.

==Tasks==

===Music Detection===

The music detection sub-task consists in finding segments of music in a signal. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: music (and non-music)

===Music Relative Loudness Estimation===

The music relative loudness estimation sub-task consists in finding segments of one of the following two classes: foreground music and background music. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: fg-music, bg-music (and non-music)

==Datasets==

===Available Training Datasets===

These resources may be a good starting point for participants.

GTZAN Speech and Music Dataset
http://opihi.cs.uvic.ca/sound/music_speech.tar.gz

Scheirer & Slaney Music Speech Corpus
http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html

MUSAN Corpus
http://www.openslr.org/17/

Muspeak Speech and Music Detection Dataset
http://mirg.city.ac.uk/datasets/muspeak/muspeak-mirex2015-detection-examples.zip

Music detection dataset:
www.seyerlehner.info/download/music_detection_dataset_dafx_07.zip
(Ask the author for the password)

Open Broadcast Media Audio from TV:
The link will be available soon.

===Evaluation Dataset===

====Content====

The evaluation dataset consists of 2987 1-minute, stereo excerpts at 22050 Hz extracted from programs from France (753), Germany (760), Spain (723) and the United States (751).

====Annotation====

The evaluation dataset has been cross-annotated by 3 annotators using a 6-class taxonomy: ''Music'', ''Foreground Music'', ''Similar'', ''Background Music'', ''Low Background Music'', and ''No Music'' as done in the OpenBMAT dataset, which can be used for training.

==Evaluation==

In the literature we find two ways of measuring the performance of an algorithm depending on the way we compare the ground truth with an algorithm's estimation: the segment-level evaluation and the event-level evaluation. We will report the statistics for each of these evaluations by file and for the whole dataset. We will do that for each algorithm and dataset.

===Segment-level evaluation:===

In the segment-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in segments of 10 ms. We first compute the intermediate statistics for each class C, which include:
* True Positives (TPc): ref segment’s class = C & est segment’s class = C
* False Positives (FPc): ref segment’s class != C & est segment’s class = C
* True Negatives (TNc): ref segment’s class != C & est segment’s class != C
* False Negatives (FNc): ref segment’s class = C & est segment’s class != C

Then we report class-wise Precision, Recall and F-measure.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)

As well as the overall Accuracy:
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* TN = sum(TNc), for every class c
* FN = sum(FNc), for every class c

===Event-level evaluation:===

In the event-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in terms of events. Each annotated segment of the ground truth is considered an event. We first compute the intermediate statistics for the onsets and offsets of each class C, which include:
* True Positives (TPc): an est event of class = C that starts and ends at the same temporal positions as a ref event of class = C, taking into account a tolerance time-window.
* False Positives (FPc): an est event of class = C that starts and ends at temporal positions where no ref event of class = C does, taking into account a tolerance time-window.
* False Negatives (FNc): a ref event of class = C that starts and ends at temporal positions where no est event of class = C does, taking into account a tolerance time-window.

Then we report class-wise Precision, Recall, F-measure, Deletion Rate, Insertion Rate and Error Rate.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)
* Deletion Rate (Dc) = FNc / Nc
* Insertion Rate (Ic) = FPc / Nc
* Error Rate (Ec) = Dc + Ic

Where:
* Nc is the number of ref events of class = C.

We also report the overall version of these statistics:
* Precision (P) = TP / (TP + FP)
* Recall (R) = TP / (TP + FN)
* F-measure (F) = 2 * P * R / (P + R)
* Deletion Rate (D) = FN / N
* Insertion Rate (I) = FP / N
* Error Rate (E) = D + I

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* FN = sum(FNc), for every class c
* N is the number of ref events.

Different tolerance time-windows will be used: +/- 1000 ms, +/- 500 ms, +/- 200 ms, +/- 100 ms.

===Other evaluated features===

The execution time of each algorithm will also be reported.

==Submission Format==

===Command line calling format===

Submissions have to conform to the specified format below:

Music Detection: ''doMusicDetection path/to/file.wav path/to/output/file.mud ''

Music Relative Loudness Estimation: ''doMusicRelLoudEstimation path/to/file.wav path/to/output/file.mrle ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.*: Path to the output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

===I/O format===

For each detected segment, the file should include a row containing the onset (seconds), offset (seconds) and the class separated by a tab. Rows should be ordered by onset time:

''onset1 offset1 class1''
''onset2 offset2 class2''
''... ... ...''

(note that events in the case of music and speech detection can overlap)

===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed) and include a README file with the following information:
* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Potential Participants ==
name/email

Blai Meléndez-Catalán, bmelendez … bmat.com
----

==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

==Submission closing date==

TBD

==Task specific mailing list==

All discussions on this task will take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

2019:Music Detection

2019-08-26T14:08:14Z

Blai Melendez-Catalan: /* Content */

==Description==

Music detection refers to the task of finding music segments in an audio file. The two main applications of music detection algorithms are (1) the automatic indexing and retrieving of auditory information based on its audio content, and (2) the monitoring of music for copyright management. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes.

Regarding the application of music detection algorithms to copyright management, the industry is becoming increasingly interested in not only detecting the presence of music but also estimating whether it appears in the foreground (as the main focus of attention) or in the background. In this scenario, the music detection task falls short, as we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. This is why we propose a second task that we name Music Relative Loudness Estimation. We define this second task as the task of finding music segments in an audio file and classifying them into foreground or background music.

==Tasks==

===Music Detection===

The music detection sub-task consists in finding segments of music in a signal. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: music (and non-music)

===Music Relative Loudness Estimation===

The music relative loudness estimation sub-task consists in finding segments of one of the following two classes: foreground music and background music. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: fg-music, bg-music (and non-music)

==Datasets==

===Available Training Datasets===

These resources may be a good starting point for participants.

GTZAN Speech and Music Dataset
http://opihi.cs.uvic.ca/sound/music_speech.tar.gz

Scheirer & Slaney Music Speech Corpus
http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html

MUSAN Corpus
http://www.openslr.org/17/

Muspeak Speech and Music Detection Dataset
http://mirg.city.ac.uk/datasets/muspeak/muspeak-mirex2015-detection-examples.zip

Music detection dataset:
www.seyerlehner.info/download/music_detection_dataset_dafx_07.zip
(Ask the author for the password)

Open Broadcast Media Audio from TV:
The link will be available soon.

===Evaluation Dataset===

====Content====

The evaluation dataset consists of 2987 1-minute, stereo excerpts at 22050 Hz extracted from programs from France (753), Germany (760), Spain (723) and the United States (751).

====Annotation====

TBD

==Evaluation==

In the literature we find two ways of measuring the performance of an algorithm depending on the way we compare the ground truth with an algorithm's estimation: the segment-level evaluation and the event-level evaluation. We will report the statistics for each of these evaluations by file and for the whole dataset. We will do that for each algorithm and dataset.

===Segment-level evaluation:===

In the segment-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in segments of 10 ms. We first compute the intermediate statistics for each class C, which include:
* True Positives (TPc): ref segment’s class = C & est segment’s class = C
* False Positives (FPc): ref segment’s class != C & est segment’s class = C
* True Negatives (TNc): ref segment’s class != C & est segment’s class != C
* False Negatives (FNc): ref segment’s class = C & est segment’s class != C

Then we report class-wise Precision, Recall and F-measure.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)

As well as the overall Accuracy:
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* TN = sum(TNc), for every class c
* FN = sum(FNc), for every class c

===Event-level evaluation:===

In the event-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in terms of events. Each annotated segment of the ground truth is considered an event. We first compute the intermediate statistics for the onsets and offsets of each class C, which include:
* True Positives (TPc): an est event of class = C that starts and ends at the same temporal positions as a ref event of class = C, taking into account a tolerance time-window.
* False Positives (FPc): an est event of class = C that starts and ends at temporal positions where no ref event of class = C does, taking into account a tolerance time-window.
* False Negatives (FNc): a ref event of class = C that starts and ends at temporal positions where no est event of class = C does, taking into account a tolerance time-window.

Then we report class-wise Precision, Recall, F-measure, Deletion Rate, Insertion Rate and Error Rate.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)
* Deletion Rate (Dc) = FNc / Nc
* Insertion Rate (Ic) = FPc / Nc
* Error Rate (Ec) = Dc + Ic

Where:
* Nc is the number of ref events of class = C.

We also report the overall version of these statistics:
* Precision (P) = TP / (TP + FP)
* Recall (R) = TP / (TP + FN)
* F-measure (F) = 2 * P * R / (P + R)
* Deletion Rate (D) = FN / N
* Insertion Rate (I) = FP / N
* Error Rate (E) = D + I

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* FN = sum(FNc), for every class c
* N is the number of ref events.

Different tolerance time-windows will be used: +/- 1000 ms, +/- 500 ms, +/- 200 ms, +/- 100 ms.

===Other evaluated features===

The execution time of each algorithm will also be reported.

==Submission Format==

===Command line calling format===

Submissions have to conform to the specified format below:

Music Detection: ''doMusicDetection path/to/file.wav path/to/output/file.mud ''

Music Relative Loudness Estimation: ''doMusicRelLoudEstimation path/to/file.wav path/to/output/file.mrle ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.*: Path to the output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.

===I/O format===

For each detected segment, the file should include a row containing the onset (seconds), offset (seconds) and the class separated by a tab. Rows should be ordered by onset time:

''onset1 offset1 class1''
''onset2 offset2 class2''
''... ... ...''

(note that events in the case of music and speech detection can overlap)

===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed) and include a README file with the following information:
* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Potential Participants ==
name/email

Blai Meléndez-Catalán, bmelendez … bmat.com
----

==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

==Submission closing date==

TBD

==Task specific mailing list==

All discussions on this task will take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

2019:Music Detection

2019-05-24T15:12:42Z

Blai Melendez-Catalan:

==Description==

Music detection refers to the task of finding music segments in an audio file. The two main applications of music detection algorithms are (1) the automatic indexing and retrieving of auditory information based on its audio content, and (2) the monitoring of music for copyright management. Additionally, the detection of music can be applied as an intermediate step to improve the performance of algorithms designed for other purposes.

Regarding the application of music detection algorithms to copyright management, the industry is becoming increasingly interested in not only detecting the presence of music but also estimating whether it appears in the foreground (as the main focus of attention) or in the background. In this scenario, the music detection task falls short, as we need to estimate the loudness of music in relation to other simultaneous non-music sounds, i.e., its relative loudness. This is why we propose a second task that we name Music Relative Loudness Estimation. We define this second task as the task of finding music segments in an audio file and classifying them into foreground or background music.

==Tasks==

===Music Detection===

The music detection sub-task consists in finding segments of music in a signal. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: music (and non-music)

===Music Relative Loudness Estimation===

The music relative loudness estimation sub-task consists in finding segments of one of the following two classes: foreground music and background music. This task applies to complete recordings from archives. No assumptions are made about the number of segments present in each archive or about their duration.

classes: fg-music, bg-music (and non-music)

==Datasets==

===Available Training Datasets===

These resources may be a good starting point for participants.

GTZAN Speech and Music Dataset
http://opihi.cs.uvic.ca/sound/music_speech.tar.gz

Scheirer & Slaney Music Speech Corpus
http://www.ee.columbia.edu/~dpwe/sounds/musp/scheislan.html

MUSAN Corpus
http://www.openslr.org/17/

Muspeak Speech and Music Detection Dataset
http://mirg.city.ac.uk/datasets/muspeak/muspeak-mirex2015-detection-examples.zip

Music detection dataset:
www.seyerlehner.info/download/music_detection_dataset_dafx_07.zip
(Ask the author for the password)

Open Broadcast Media Audio from TV:
The link will be available soon.

===Evaluation Dataset===

====Content====

TBD

====Annotation====

TBD

==Evaluation==

In the literature we find two ways of measuring the performance of an algorithm depending on the way we compare the ground truth with an algorithm's estimation: the segment-level evaluation and the event-level evaluation. We will report the statistics for each of these evaluations by file and for the whole dataset. We will do that for each algorithm and dataset.

===Segment-level evaluation:===

In the segment-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in segments of 10 ms. We first compute the intermediate statistics for each class C, which include:
* True Positives (TPc): ref segment’s class = C & est segment’s class = C
* False Positives (FPc): ref segment’s class != C & est segment’s class = C
* True Negatives (TNc): ref segment’s class != C & est segment’s class != C
* False Negatives (FNc): ref segment’s class = C & est segment’s class != C

Then we report class-wise Precision, Recall and F-measure.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)

As well as the overall Accuracy:
* Accuracy = (TP + TN) / (TP + TN + FP + FN)

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* TN = sum(TNc), for every class c
* FN = sum(FNc), for every class c

===Event-level evaluation:===

In the event-level evaluation, we compare the estimation (est) produced by the algorithms with the reference (ref) in terms of events. Each annotated segment of the ground truth is considered an event. We first compute the intermediate statistics for the onsets and offsets of each class C, which include:
* True Positives (TPc): an est event of class = C that starts and ends at the same temporal positions as a ref event of class = C, taking into account a tolerance time-window.
* False Positives (FPc): an est event of class = C that starts and ends at temporal positions where no ref event of class = C does, taking into account a tolerance time-window.
* False Negatives (FNc): a ref event of class = C that starts and ends at temporal positions where no est event of class = C does, taking into account a tolerance time-window.

Then we report class-wise Precision, Recall, F-measure, Deletion Rate, Insertion Rate and Error Rate.
* Precision (Pc) = TPc / (TPc + FPc)
* Recall (Rc) = TPc / (TPc + FNc)
* F-measure (Fc) = 2 * Pc * Rc / (Pc + Rc)
* Deletion Rate (Dc) = FNc / Nc
* Insertion Rate (Ic) = FPc / Nc
* Error Rate (Ec) = Dc + Ic

Where:
* Nc is the number of ref events of class = C.

We also report the overall version of these statistics:
* Precision (P) = TP / (TP + FP)
* Recall (R) = TP / (TP + FN)
* F-measure (F) = 2 * P * R / (P + R)
* Deletion Rate (D) = FN / N
* Insertion Rate (I) = FP / N
* Error Rate (E) = D + I

Where:
* TP = sum(TPc), for every class c
* FP = sum(FPc), for every class c
* FN = sum(FNc), for every class c
* N is the number of ref events.

Different tolerance time-windows will be used: +/- 1000 ms, +/- 500 ms, +/- 200 ms, +/- 100 ms.
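
As an illustration of the event-level statistics for a single class, the following Python sketch matches estimated events to reference events within a tolerance window. It is a sketch under stated assumptions rather than the official evaluation code: the function name <code>event_metrics</code> is hypothetical, each reference event may be matched at most once, matching is greedy in the order of the estimated events, and metrics with a zero denominator are reported as 0.0. None of these choices is specified in the text above.

```python
def event_metrics(ref, est, tolerance=0.5):
    """Event-level statistics for ONE class.

    ref, est:  lists of (onset_seconds, offset_seconds) tuples.
    tolerance: half-width of the tolerance time-window, in seconds.
    """
    unmatched_ref = list(ref)
    tp = 0
    for onset, offset in est:
        # Assumption: greedy one-to-one matching; the first unmatched ref
        # event whose onset AND offset fall within the tolerance is taken.
        for candidate in unmatched_ref:
            if (abs(onset - candidate[0]) <= tolerance
                    and abs(offset - candidate[1]) <= tolerance):
                unmatched_ref.remove(candidate)
                tp += 1
                break

    fp = len(est) - tp   # est events without a matching ref event
    fn = len(ref) - tp   # ref events without a matching est event
    n = len(ref)         # Nc: number of ref events of this class

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    deletion = fn / n if n else 0.0
    insertion = fp / n if n else 0.0
    return {"P": precision, "R": recall, "F": f_measure,
            "D": deletion, "I": insertion, "E": deletion + insertion}
```

Calling the function once per tolerance (1.0, 0.5, 0.2 and 0.1 seconds) gives the statistics for each of the time-windows listed above. The overall statistics can be obtained in the same way after summing TP, FP, FN and N over the classes.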

===Other evaluated features===

The execution time of each algorithm will also be reported.

==Submission Format==

===Command line calling format===

Submissions have to conform to the specified format below:

Music Detection: ''doMusicDetection path/to/file.wav path/to/output/file.mud ''

Music Relative Loudness Estimation: ''doMusicRelLoudEstimation path/to/file.wav path/to/output/file.mrle ''

where:
* path/to/file.wav: Path to the input audio file.
* path/to/output/file.*: Path to the output file.

Programs can use their working directory if they need to keep temporary cache files or internal debugging info. Stdout and stderr will be logged.
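
For submissions written in Python, the calling format could be handled with a small argument parser such as the one sketched below. This is only an example layout: the function names <code>build_parser</code>, <code>detect</code> and <code>main</code> are hypothetical, and <code>detect</code> is a placeholder standing in for the participant's actual algorithm.

```python
import argparse


def build_parser(prog="doMusicDetection"):
    """Parser for the two positional arguments of the calling format."""
    parser = argparse.ArgumentParser(prog=prog)
    parser.add_argument("input_wav", help="path to the input audio file")
    parser.add_argument("output_file", help="path to the output file")
    return parser


def detect(input_wav):
    """Placeholder (hypothetical): return (onset, offset, class) tuples."""
    return []


def main(argv=None):
    args = build_parser().parse_args(argv)
    segments = detect(args.input_wav)
    # One tab-separated row per detected segment, ordered by onset time.
    with open(args.output_file, "w") as handle:
        for onset, offset, label in sorted(segments, key=lambda s: s[0]):
            handle.write(f"{onset}\t{offset}\t{label}\n")
```

In a real script, <code>main()</code> would be called under the usual <code>if __name__ == "__main__":</code> guard, and the script would be wrapped or named so that it can be invoked as ''doMusicDetection'' or ''doMusicRelLoudEstimation''.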

===I/O format===

For each detected segment, the file should include a row containing the onset (seconds), offset (seconds) and the class separated by a tab. Rows should be ordered by onset time:

''onset1 offset1 class1''
''onset2 offset2 class2''
''... ... ...''

(note that events in the case of music and speech detection can overlap)
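
The row format described above can be written and read back with the following Python sketch. The helper names <code>format_rows</code> and <code>parse_rows</code> and the class labels used in the example are assumptions for illustration; the text above does not fix the exact label strings.

```python
def format_rows(segments):
    """Serialize (onset, offset, class) tuples as tab-separated rows,
    ordered by onset time, one row per detected segment."""
    rows = sorted(segments, key=lambda s: s[0])
    return "".join(f"{on}\t{off}\t{cls}\n" for on, off, cls in rows)


def parse_rows(text):
    """Parse tab-separated rows back into (onset, offset, class) tuples."""
    segments = []
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        onset, offset, label = line.split("\t")
        segments.append((float(onset), float(offset), label))
    return segments


# Example with hypothetical class labels; onsets and offsets are in seconds.
example = format_rows([(12.5, 20.0, "music"), (0.0, 12.5, "no-music")])
```

Because the rows are sorted before writing, a detector may emit its segments in any order and still produce a correctly ordered file.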

===Packaging submissions===

All submissions should be statically linked to all libraries (the presence of dynamically linked libraries cannot be guaranteed) and include a README file containing the following information:
* Command line calling format for all executables and an example formatted set of commands
* Number of threads/cores used or whether this should be specified on the command line
* Expected memory footprint
* Expected runtime
* Any required environments (and versions), e.g. python, java, bash, matlab.

== Potential Participants ==
name/email

Blai Meléndez-Catalán, bmelendez … bmat.com
----

==Time and hardware limits==

Due to the potentially high number of participants in this and other audio tasks, hard limits on the runtime of submissions are specified.
A hard limit of 72 hours will be imposed on runs. Submissions that exceed this runtime may not receive a result.

==Submission closing date==

TBD

==Task specific mailing list==

All discussions on this task will take place on the MIREX [https://mail.lis.illinois.edu/mailman/listinfo/evalfest "EvalFest" list]. If you have a question or comment, simply include the task name in the subject heading.

2019:Main Page

2019-05-24T14:04:05Z

Blai Melendez-Catalan: /* MIREX 2019 Possible Evaluation Tasks */

==Welcome to MIREX 2019==

This is the main page for the 15th running of the Music Information Retrieval Evaluation eXchange (MIREX 2019). The International Music Information Retrieval Systems Evaluation Laboratory (IMIRSEL) at [https://ischool.illinois.edu School of Information Sciences], University of Illinois at Urbana-Champaign ([http://www.illinois.edu UIUC]) is the principal organizer of MIREX 2019.

The MIREX 2019 community will hold its annual meeting as part of [http://ismir2019.ewi.tudelft.nl The 20th International Society for Music Information Retrieval Conference], ISMIR 2019, which will be held in Delft, The Netherlands, November 4-8, 2019.

J. Stephen Downie 
Director, IMIRSEL 

==Task Leadership Model==

As in previous years, we are prepared to improve the distribution of tasks for the upcoming MIREX 2019. To do so, we really need leaders to help us organize and run each task.

To volunteer to lead a task, please complete the form [TBD]. Current information about task captains can be found on the [[2019:Task Captains]] page. Please direct any communication to the [https://lists.ischool.illinois.edu/lists/admin/evalfest EvalFest] mailing list.

What does it mean to lead a task?
* Update wiki pages as needed
* Communicate with submitters and troubleshoot submissions
* Execute and evaluate submissions
* Publish final results

Due to the proprietary nature of much of the data, the submission system, evaluation framework, and most of the datasets will continue to be hosted by IMIRSEL. However, we are prepared to provide access to task organizers to manage and run submissions on the IMIRSEL systems.

We really need leaders to help us this year!

==MIREX 2019 Deadline Dates==
* TBD

==MIREX 2019 Possible Evaluation Tasks==
* [[2019:Audio Classification (Train/Test) Tasks]], incorporating:
** Audio US Pop Genre Classification
** Audio Latin Genre Classification
** Audio Music Mood Classification
** Audio Classical Composer Identification
** [[2019:Audio K-POP Mood Classification]]
** [[2019:Audio K-POP Genre Classification]]
* [[2019:Audio Beat Tracking]]
* [[2019:Audio Chord Estimation]]
* [[2019:Audio Cover Song Identification]]
* [[2019:Audio Downbeat Estimation]]
* [[2019:Audio Key Detection]]
* [[2019:Audio Onset Detection]]
* [[2019:Audio Tempo Estimation]]
* [[2019:Automatic Lyrics-to-Audio Alignment]]
* [[2019:Drum Transcription]]
* [[2019:Multiple Fundamental Frequency Estimation & Tracking]]
* [[2019:Real-time Audio to Score Alignment (a.k.a Score Following)]]
* [[2019:Structural Segmentation]]
* [[2019:Discovery of Repeated Themes & Sections]]
* [[2019:Audio Fingerprinting]]
* [[2019:Set List Identification]]
* [[2019:Query by Singing/Humming]]
* [[2019:Singing Voice Separation]]
* [[2019:Audio Tag Classification]]
* [[2019:Audio Music Similarity and Retrieval]]
* [[2019:Symbolic Melodic Similarity]]
* [[2019:Audio Melody Extraction]]
* [[2019:Query by Tapping]]
* [[2019:Music Detection]]

==MIREX 2019 Submission Instructions==
* Be sure to read through the rest of this page
* Be sure to read through the task pages for which you are submitting
* Be sure to follow the [[2009:Best Coding Practices for MIREX | Best Coding Practices for MIREX]]
* Be sure to follow the [[MIREX 2019 Submission Instructions]] including both the tutorial video and the text
* The MIREX 2019 Submission System is coming soon at: https://www.music-ir.org/mirex/sub/ .

==MIREX 2019 Evaluation==

===Note to New Participants===
Please take the time to read the following review articles that explain the history and structure of MIREX.

Downie, J. Stephen (2008). The Music Information Retrieval Evaluation Exchange (2005-2007): 
A window into music information retrieval research. ''Acoustical Science and Technology 29'' (4): 247-255. 
Available at: [http://dx.doi.org/10.1250/ast.29.247 http://dx.doi.org/10.1250/ast.29.247]

Downie, J. Stephen, Andreas F. Ehmann, Mert Bay and M. Cameron Jones. (2010). 
The Music Information Retrieval Evaluation eXchange: Some Observations and Insights. 
''Advances in Music Information Retrieval'' Vol. 274, pp. 93-115 
Available at: [http://bit.ly/KpM5u5 http://bit.ly/KpM5u5]

===Runtime Limits===

We reserve the right to stop any process that exceeds runtime limits for each task. We will do our best to notify you in enough time to allow revisions, but this may not be possible in some cases. Please respect the published runtime limits.

===Note to All Participants===

Because MIREX is premised upon the sharing of ideas and results, '''ALL''' MIREX participants are expected to:

# submit a DRAFT 2-3 page extended abstract PDF in the ISMIR format about the submitted program(s) together with the submission, to help us and the community better understand how the algorithm works.
# submit a FINALIZED 2-3 page extended abstract PDF in the ISMIR format prior to ISMIR 2019 for posting on the respective results pages (sometimes the same abstract can be used for multiple submissions; in many cases the DRAFT and FINALIZED abstracts are the same)
# present a poster at the MIREX 2019 poster session at ISMIR 2019

===Software Dependency Requests===
If you have not submitted to MIREX before or are unsure whether IMIRSEL currently supports some of the software/architecture dependencies for your submission, a [https://goo.gl/forms/96Wndw9j9dzv4x3c2 dependency request form is available]. Please submit details of your dependencies on this form and the IMIRSEL team will attempt to satisfy them for you.

Due to the high volume of submissions expected at MIREX 2019, submissions whose dependencies are difficult to satisfy, and of which the team has not been given sufficient notice, may be rejected.

Finally, you will also be expected to detail your software/architecture dependencies in a README file to be provided to the submission system.

==Getting Involved in MIREX 2019==
MIREX is a community-based endeavour. Be a part of the community and help make MIREX 2019 the best yet.

===Mailing List Participation===
If you are interested in formal MIR evaluation, you should also subscribe to the "MIREX" (aka "EvalFest") mail list and participate in the community discussions about defining and running MIREX 2019 tasks. Subscription information at:
[https://mail.lis.illinois.edu/mailman/listinfo/evalfest EvalFest Central].

If you are participating in MIREX 2019, it is VERY IMPORTANT that you are subscribed to EvalFest. Deadlines, task updates and other important information will be announced via this mailing list. Please use EvalFest for discussion of MIREX task proposals and other MIREX-related issues. This wiki (MIREX 2019 wiki) will be used to embody and disseminate task proposals. Task-related discussions, however, should be conducted on the MIREX organization mailing list (EvalFest) rather than on this wiki, and then summarized here.

Where possible, definitions or example code for new evaluation metrics or tasks should be provided to the IMIRSEL team who will embody them in software as part of the NEMA analytics framework, which will be released to the community at or before ISMIR 2019 - providing a standardised set of interfaces and output to disciplined evaluation procedures for a great many MIR tasks.

===Wiki Participation===
If you find that you cannot edit a MIREX wiki page, you will need to create a new account via: [[Special:Userlogin]].

Please note that because of "spam-bots", MIREX wiki registration requests may be moderated by IMIRSEL members. It might take up to 24 hours for approval (Thank you for your patience!).

==MIREX 2005 - 2018 Wikis==
Content from MIREX 2005 - 2018 is available at:
'''[[2018:Main_Page|MIREX 2018]]'''
'''[[2017:Main_Page|MIREX 2017]]'''
'''[[2016:Main_Page|MIREX 2016]]'''
'''[[2015:Main_Page|MIREX 2015]]'''
'''[[2014:Main_Page|MIREX 2014]]'''
'''[[2013:Main_Page|MIREX 2013]]'''
'''[[2012:Main_Page|MIREX 2012]]'''
'''[[2011:Main_Page|MIREX 2011]]'''
'''[[2010:Main_Page|MIREX 2010]]'''
'''[[2009:Main_Page|MIREX 2009]]'''
'''[[2008:Main_Page|MIREX 2008]]'''
'''[[2007:Main_Page|MIREX 2007]]'''
'''[[2006:Main_Page|MIREX 2006]]'''
'''[[2005:Main_Page|MIREX 2005]]'''

2018:Music and or Speech Detection Results

2018-09-24T21:44:43Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| [https://www.music-ir.org/mirex/abstracts/2018/DD1.pdf PDF] || David Doukhan, Eliott Lechapt, Marc Evrard, Jean Carrive
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG3
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

Accuracy = segment-level accuracy

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| 0.4403 || 0.1991 || 0.4973 || 0.2788
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | No-Music_F_500_on
! width="80" | No-Music_F_500_onoff
! width="80" | No-Music_F_1000_on
! width="80" | No-Music_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-24T21:43:56Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| [https://www.music-ir.org/mirex/abstracts/2018/DD1.pdf PDF] || David Doukhan, Eliott Lechapt, Marc Evrard, Jean Carrive
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG3
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

Accuracy = segment-level accuracy

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| 0.4403 || 0.1991 || 0.4973 || 0.2788
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | No-Music_F_500_on
! width="80" | No-Music_F_500_onoff
! width="80" | No-Music_F_1000_on
! width="80" | No-Music_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-24T16:27:40Z

Blai Melendez-Catalan: /* General Legend */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this set of tasks, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| [https://www.music-ir.org/mirex/abstracts/2018/DD1.pdf PDF] || David Doukhan, Eliott Lechapt, Marc Evrard, Jean Carrive
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG3
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

Accuracy = segment-level accuracy

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
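
As an illustration of the segment-level statistics above, the following is a minimal sketch that assumes the reference and the estimate have already been cut into aligned, equal-length segments with one label each. The function names <code>segment_prf</code> and <code>segment_accuracy</code> and the label strings are hypothetical and are not taken from the evaluation code used for these results.

```python
from typing import Sequence, Tuple


def segment_prf(reference: Sequence[str], estimate: Sequence[str],
                label: str) -> Tuple[float, float, float]:
    """Precision, recall and F-measure for one class over aligned segments.

    Assumption: reference[i] and estimate[i] describe the same segment.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    pairs = list(zip(reference, estimate))
    tp = sum(1 for r, e in pairs if r == label and e == label)
    fp = sum(1 for r, e in pairs if r != label and e == label)
    fn = sum(1 for r, e in pairs if r == label and e != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


def segment_accuracy(reference: Sequence[str], estimate: Sequence[str]) -> float:
    """Fraction of segments whose estimated label equals the reference label."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    if not reference:
        return 0.0
    correct = sum(1 for r, e in zip(reference, estimate) if r == e)
    return correct / len(reference)
```

Under these assumptions, calling <code>segment_prf(reference, estimate, "music")</code> would give the values reported as Music_P, Music_R and Music_F, and the same call with the complementary label would give the No-Music columns.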

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| 0.4403 || 0.1991 || 0.4973 || 0.2788
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | No-Music_F_500_on
! width="80" | No-Music_F_500_onoff
! width="80" | No-Music_F_1000_on
! width="80" | No-Music_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-21T15:50:23Z

Blai Melendez-Catalan: /* Statistics notation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this set of tasks, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG3
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

Accuracy = segment-level accuracy

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
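
The onset-only event-level F-measure can be sketched as follows. This is only an illustration: it assumes onsets given in seconds and a greedy one-to-one matching of time-sorted onsets within the tolerance. The function name <code>onset_f_measure</code> and the matching strategy are assumptions, since the exact matching procedure used by the evaluation is not described on this page.

```python
from typing import Sequence


def onset_f_measure(ref_onsets: Sequence[float], est_onsets: Sequence[float],
                    tolerance: float = 0.5) -> float:
    """Onset-only event-level F-measure for one class.

    Assumption: an estimated onset counts as correct when it lies within
    `tolerance` seconds of a not-yet-matched reference onset.
    """
    ref = sorted(ref_onsets)
    est = sorted(est_onsets)
    matched = 0
    i = j = 0
    # Walk both sorted lists once, pairing each reference onset at most once.
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimated onset too early to match anything later
        else:
            i += 1  # reference onset was missed
    precision = matched / len(est) if est else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

With <code>tolerance=0.5</code> this corresponds to the _F_500_on columns and with <code>tolerance=1.0</code> to the _F_1000_on columns. The onset-offset variants would additionally require the event offsets to agree, which this sketch does not cover.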

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| 0.4403 || 0.1991 || 0.4973 || 0.2788
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-21T15:48:49Z

Blai Melendez-Catalan: /* General Legend */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG3
| [https://www.music-ir.org/mirex/abstracts/2018/MMG.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

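<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>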

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
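
As an illustration of the event-level notation above, the following Python sketch computes an onset-only or onset-offset F-measure for a single class. It assumes that events are given as (onset, offset) pairs in seconds and that matching is greedy and one-to-one; the function name <code>event_f_measure</code>, its parameters, and the matching strategy are assumptions made for this example, not the evaluation code used to produce the tables on this page.

```python
def event_f_measure(reference, estimated, tolerance=0.5, match_offset=False):
    """Event-level F-measure for one class (illustrative sketch only).

    reference, estimated: lists of (onset, offset) tuples in seconds.
    tolerance: allowed deviation in seconds (0.5 for _500_, 1.0 for _1000_).
    match_offset: False for "_on" (onset only), True for "_onoff".
    """
    matched = set()  # indices of reference events already paired
    true_positives = 0
    for est_on, est_off in estimated:
        for i, (ref_on, ref_off) in enumerate(reference):
            if i in matched:
                continue
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = (not match_offset) or abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                matched.add(i)
                true_positives += 1
                break  # each estimated event matches at most one reference event

    precision = true_positives / len(estimated) if estimated else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, not taken from the evaluation datasets.
reference = [(0.0, 5.0), (10.0, 15.0)]
estimated = [(0.3, 5.2), (10.8, 15.0), (20.0, 22.0)]

f_500_on = event_f_measure(reference, estimated, tolerance=0.5)
f_1000_on = event_f_measure(reference, estimated, tolerance=1.0)
f_1000_onoff = event_f_measure(reference, estimated, tolerance=1.0, match_offset=True)
```

With the wider 1000 ms tolerance the second estimated event (onset 0.8 s late) is accepted, which is why the _1000_ columns are never lower than the corresponding _500_ columns in the tables.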

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| 0.4403 || 0.1991 || 0.4973 || 0.2788
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1(GAFMFSF)
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T22:52:56Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

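<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>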

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
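
A minimal sketch of how a segment-level F-measure relates to the precision and recall columns, assuming the usual harmonic-mean definition of the F-measure. The helper name <code>f_measure</code> is hypothetical; the sample values are the DD1 Music_P and Music_R entries from Task 1, Dataset 1.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# DD1, Task 1, Dataset 1: Music_P = 0.905, Music_R = 0.3873.
# The table reports Music_F = 0.5424. The inputs are already rounded,
# so the recomputed value can differ in the last digit.
music_f = f_measure(0.905, 0.3873)
print(round(music_f, 3))
```

The same check can be applied to any row to spot transcription errors in the tables, since each _F column should be reproducible from its _P and _R columns up to rounding.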

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| 0.4403 || 0.1991 || 0.4973 || 0.2788
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1(GAFMFSF)
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T22:47:52Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

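<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>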

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
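
As a rough illustration of how the segment-level F-measure relates to the precision and recall columns in the tables below, here is a minimal Python sketch. It assumes the balanced F-measure (harmonic mean of precision and recall); this page does not state the exact formula used by the evaluation, and the function name <code>f_measure</code> is hypothetical.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall.

    Assumption: equal weighting of precision and recall (beta = 1).
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Music_P and Music_R reported for MMG1 on Dataset 1 of Task 1.
mmg1_music_f = f_measure(0.9131, 0.8865)
print(round(mmg1_music_f, 4))  # 0.8996, the Music_F value listed for MMG1
```

Because the published precision and recall are themselves rounded to four decimals, recomputed values can differ from the listed F-measure in the last digit for some rows.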

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| 0.8506 || 0.967 || 0.7134 || 0.8211 || 0.7866 || 0.9775 || 0.8717
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1(GAFMFSF)
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T18:00:52Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

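<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>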

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
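
The difference between the onset-only and onset-offset event-level measures can be sketched as follows. This is only an illustration under stated assumptions: the greedy one-to-one matching, the <code>(onset, offset)</code> tuple representation in seconds, and the function name <code>event_f_measure</code> are hypothetical and are not taken from the evaluation code used for these results.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (onset, offset) in seconds


def event_f_measure(reference: List[Event], estimated: List[Event],
                    tolerance: float = 0.5, use_offset: bool = False) -> float:
    """Event-level F-measure for one class.

    An estimated event counts as correct if its onset lies within
    `tolerance` seconds of a reference onset and, when `use_offset` is
    True, its offset also lies within `tolerance` of the reference offset.
    Each estimated event can be matched to at most one reference event
    (greedy matching, an assumption of this sketch).
    """
    matched = set()
    true_positives = 0
    for ref_on, ref_off in reference:
        for i, (est_on, est_off) in enumerate(estimated):
            if i in matched:
                continue
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = (not use_offset) or abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                matched.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up events for illustration only.
reference = [(0.0, 10.0), (20.0, 30.0)]
estimated = [(0.3, 10.2), (20.4, 31.5), (40.0, 45.0)]

f_500_on = event_f_measure(reference, estimated, tolerance=0.5)
f_500_onoff = event_f_measure(reference, estimated, tolerance=0.5, use_offset=True)
```

In this toy example both reference onsets are found within 500 ms, but the second estimated event ends 1.5 s late, so it only counts under the onset-only criterion. This is consistent with the onset-offset columns in the tables being lower than the corresponding onset-only columns.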

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1(GAFMFSF)
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_P
! width="80" | Fg-Music_R
! width="80" | Fg-Music_F
! width="80" | Bg-Music_P
! width="80" | Bg-Music_R
! width="80" | Bg-Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.8025 || 0.774 || 0.788 || 0.8211 || 0.821 || 0.821 || 0.9026 || 0.9103 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:58:41Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

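<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>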

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
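
As a rough illustration of how the F-measure columns relate to the precision and recall columns, the sketch below assumes the balanced F-measure (the harmonic mean of precision and recall). This page does not state the exact formula used by the evaluation, so the function name <code>f_measure</code> and the formula itself are assumptions.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (harmonic mean of precision and recall).

    Assumption: the evaluation uses the balanced (F1) variant; the page
    itself only says "F-measure".
    """
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# No-Music precision and recall reported for JHKK1 on Task 1, Dataset 1.
jhkk1_no_music_f = f_measure(0.7092, 0.9761)
print(round(jhkk1_no_music_f, 4))
```

Because the tabulated precision and recall values are already rounded to four decimals, a value recomputed this way can differ from the tabulated F-measure in the last digit.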

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.624 || 0.4082 || 0.4936 || 0.9683 || 0.6415 || 0.7718
|-
! MM1
| 0.8072 || 0.257 || 0.3899 || 0.8795 || 0.946 || 0.9115
|-
! MM2
| 0.857 || 0.4026 || 0.5478 || 0.8945 || 0.9241 || 0.909
|-
! MM3
| 0.9873 || 0.1856 || 0.3124 || 0.8383 || 0.9917 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:55:14Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

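<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>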

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
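
The difference between the onset-only and onset-offset variants, and between the 500 ms and 1000 ms tolerances, can be sketched as follows. This is a minimal illustration that assumes greedy one-to-one matching of estimated events to reference events; the actual evaluation tool and its matching strategy are not specified on this page, and the name <code>event_f_measure</code> is hypothetical.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (onset, offset) in seconds


def event_f_measure(reference: List[Event], estimated: List[Event],
                    tolerance: float = 0.5, use_offset: bool = False) -> float:
    """Event-level F-measure with a time tolerance (illustrative sketch).

    An estimated event counts as correct when its onset lies within
    `tolerance` seconds of an unmatched reference onset and, if
    `use_offset` is True, its offset also lies within `tolerance`
    seconds of that reference event's offset.
    Assumption: greedy one-to-one matching in list order.
    """
    matched = set()  # indices of reference events already paired
    true_positives = 0
    for est_on, est_off in estimated:
        for i, (ref_on, ref_off) in enumerate(reference):
            if i in matched:
                continue
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = (not use_offset) or abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                matched.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


reference = [(0.0, 5.0), (10.0, 15.0)]
estimated = [(0.3, 5.2), (10.8, 14.0)]
f_500_on = event_f_measure(reference, estimated, tolerance=0.5)
f_1000_on = event_f_measure(reference, estimated, tolerance=1.0)
f_500_onoff = event_f_measure(reference, estimated, tolerance=0.5, use_offset=True)
```

In this toy example the second estimated event starts 0.8 s late, so it is only counted as correct under the 1000 ms tolerance. The onset-offset variant is stricter than the onset-only one, which is consistent with the lower onset-offset scores in the tables.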

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! LN1(GAFMF)
| 0.0727 || 0.0197 || 0.0965 || 0.031 || 0.1918 || 0.0505 || 0.2637 || 0.0889
|-
! LN1(GAFSF)
| 0.0677 || 0.0145 || 0.0977 || 0.0266 || 0.2063 || 0.0524 || 0.2804 || 0.092
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:52:42Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

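<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>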

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
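
For the segment-level statistics, the sketch below shows one way accuracy and per-class precision, recall and F-measure could be derived from two label sequences. It assumes that the reference and the estimate have already been cut into equal-length segments with one label each; the segment length used by the evaluation is not given on this page, and the function name <code>segment_scores</code> and the label strings are hypothetical.

```python
from typing import Dict, Sequence


def segment_scores(reference: Sequence[str], estimated: Sequence[str],
                   positive: str = "music") -> Dict[str, float]:
    """Segment-level accuracy plus precision/recall/F for one class.

    Assumption: both sequences hold one label per fixed-length segment
    and are aligned index by index.
    """
    if len(reference) != len(estimated) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(zip(reference, estimated))
    tp = sum(r == positive and e == positive for r, e in pairs)
    fp = sum(r != positive and e == positive for r, e in pairs)
    fn = sum(r == positive and e != positive for r, e in pairs)
    correct = sum(r == e for r, e in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs),
            "precision": precision, "recall": recall, "f": f}


reference = ["music", "music", "no-music", "no-music", "music"]
estimated = ["music", "no-music", "no-music", "music", "music"]
scores = segment_scores(reference, estimated, positive="music")
```

Calling the function again with <code>positive="no-music"</code> gives the statistics for the complementary class, mirroring the paired Music and No-Music columns in the tables.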

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1(GAFMFSF)
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:52:32Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

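<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>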

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
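
The _P, _R and _F columns in the segment-level tables below appear to follow the usual balanced F-measure, i.e. the harmonic mean of precision and recall. The following is a minimal sketch under that assumption, not the evaluation code used for the task; the function name <code>f_measure</code> is hypothetical. It recomputes Music_F for DD1 on Dataset 1 of Task 1 from the Music_P and Music_R values in the table.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Convention assumed here: F is 0 when both inputs are 0.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# DD1, Task 1, Dataset 1: Music_P and Music_R copied from the results table.
music_f = f_measure(0.905, 0.3873)
print(round(music_f, 3))  # 0.542
```

The result agrees with the reported Music_F of 0.5424 up to the rounding of the published precision and recall values.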

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:51:31Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

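<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>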

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
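
To illustrate how the onset-only and onset-offset variants differ, the following is a minimal sketch of an event-level F-measure with a fixed tolerance. It is an assumption-laden illustration rather than the evaluation code used for these results: the greedy one-to-one matching, the function name <code>event_f_measure</code> and the example events are all hypothetical. Events are (onset, offset) pairs in seconds.

```python
def event_f_measure(reference, estimated, tolerance=0.5, match_offsets=False):
    """Event-level F-measure with greedy one-to-one matching (assumed scheme).

    An estimated event counts as correct if its onset lies within
    `tolerance` seconds of an unmatched reference onset and, when
    `match_offsets` is True, its offset does as well.
    """
    matched = set()  # indices of reference events already paired
    true_positives = 0
    for est_on, est_off in estimated:
        for i, (ref_on, ref_off) in enumerate(reference):
            if i in matched:
                continue
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = (not match_offsets) or abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                matched.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, in seconds.
reference = [(0.0, 5.0), (10.0, 20.0)]
estimated = [(0.3, 5.2), (10.4, 18.0), (30.0, 31.0)]

f_on = event_f_measure(reference, estimated, tolerance=0.5)
f_onoff = event_f_measure(reference, estimated, tolerance=0.5, match_offsets=True)
```

In this example two of the three estimated onsets fall within 500 ms of a reference onset, but only one estimated event also matches its offset, so the onset-offset score is lower than the onset-only score. The same pattern is visible throughout the event-level tables below.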

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1(GAFMFSF)
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:51:21Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

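<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>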

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
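
As a rough illustration of how the _F columns relate to the _P and _R columns in the segment-level tables, the following Python sketch computes the F-measure as the harmonic mean of precision and recall. The function name f_measure is hypothetical and is not part of the MIREX evaluation code. The example values are taken from the MMG1 row of the Task 1, Dataset 1 segment-level table.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (balanced F-measure).

    Hypothetical helper for illustration only.
    """
    if precision + recall == 0:
        # Avoid division by zero when nothing was detected correctly.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MMG1, Task 1, Dataset 1: Music_P = 0.9131, Music_R = 0.8865
music_f = f_measure(0.9131, 0.8865)
print(round(music_f, 4))  # approximately 0.8996, matching the Music_F column
```

The same relation can be used as a sanity check on any row that reports all three of P, R and F.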

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1(GAFMFSF)
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:50:20Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

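<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>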

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
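
To make the onset-only event-level measure more concrete, here is a minimal Python sketch. It assumes that onsets are given in seconds, that the tolerance is applied symmetrically around each reference onset, and that each reference onset can be matched to at most one estimated onset. The function name onset_f_measure and the example timestamps are hypothetical. The exact matching procedure used by the official evaluation is not described on this page and may differ.

```python
def onset_f_measure(ref_onsets, est_onsets, tolerance=0.5):
    """Onset-only event-level F-measure (illustrative sketch).

    ref_onsets, est_onsets: event start times in seconds.
    tolerance: maximum allowed onset deviation in seconds (0.5 = 500 ms).
    """
    ref = sorted(ref_onsets)
    est = sorted(est_onsets)
    matched = 0
    i = j = 0
    # Walk both sorted lists, pairing each reference onset with at most
    # one estimated onset that lies within the tolerance.
    while i < len(ref) and j < len(est):
        diff = est[j] - ref[i]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # estimate is too early to match this or any later reference
        else:
            i += 1  # reference is too early to match this or any later estimate
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, in seconds
reference = [1.0, 5.0, 10.0]
estimated = [1.3, 5.8, 9.9, 20.0]
f_500 = onset_f_measure(reference, estimated, tolerance=0.5)
f_1000 = onset_f_measure(reference, estimated, tolerance=1.0)
```

With the wider 1000 ms tolerance the estimate at 5.8 s is accepted as a match for the reference at 5.0 s, so the score can only stay the same or increase. This is consistent with the _1000_ columns being at least as high as the _500_ columns in the tables. An onset-offset variant would additionally require the event end times to agree within a tolerance.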

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
|-
! LN1(GAFMFSF)
| 0.813 || 0.7599 || 0.7855 || 0.9671 || 0.7511 || 0.8455
|-
! LN1(GAFMF)
| 0.7682 || 0.7504 || 0.7592 || 0.9747 || 0.6625 || 0.7888
|-
! LN1(GAFSF)
| 0.797 || 0.7965 || 0.7968 || 0.9637 || 0.7178 || 0.8227
|-
! MM1
| 0.9765 || 0.8747 || 0.9228 || 0.9134 || 0.9526 || 0.9326
|-
! MM2
| 0.9246 || 0.9072 || 0.9158 || 0.9328 || 0.8959 || 0.914
|-
! MM3
| 0.9794 || 0.7973 || 0.8791 || 0.8289 || 0.9781 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:44:13Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

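<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>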

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
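
The segment-level columns (Accuracy, _P, _R, _F) can be sketched in Python as follows. The sketch assumes that the reference and the estimate have already been converted to one label per fixed-length segment. The segment length, the label strings and the function name segment_scores are assumptions made for illustration and are not taken from the evaluation code.

```python
def segment_scores(ref_labels, est_labels, positive="music"):
    """Segment-level accuracy, precision, recall and F-measure for one class.

    ref_labels, est_labels: equal-length sequences with one label per segment.
    positive: the class being scored; every other label counts as negative.
    """
    if len(ref_labels) != len(est_labels) or len(ref_labels) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    tp = fp = fn = tn = 0
    for ref, est in zip(ref_labels, est_labels):
        if est == positive and ref == positive:
            tp += 1
        elif est == positive:
            fp += 1
        elif ref == positive:
            fn += 1
        else:
            tn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(ref_labels),
        "precision": precision,
        "recall": recall,
        "f_measure": f,
    }


# Hypothetical per-segment labels
reference = ["music", "music", "no-music", "no-music", "music"]
estimated = ["music", "no-music", "no-music", "music", "music"]
scores = segment_scores(reference, estimated, positive="music")
```

Calling the function a second time with positive="no-music" would give the values corresponding to the No-Music_P, No-Music_R and No-Music_F columns, while the accuracy stays the same for both classes.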

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.909 || 0.9285 || 0.9186 || 0.7751 || 0.7251 || 0.7493
|-
! JHKK3
| 0.8307 || 0.9379 || 0.8279 || 0.8795 || 0.6219 || 0.839 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.9579 || 0.6125 || 0.7472 || 0.4457 || 0.9213 || 0.6007
|-
! MM1
| 0.8626 || 0.8795 || 0.946 || 0.9115 || 0.7953 || 0.6169 || 0.6948
|-
! MM2
| 0.8619 || 0.8945 || 0.9241 || 0.909 || 0.7516 || 0.6782 || 0.713
|-
! MM3
| 0.8508 || 0.8383 || 0.9917 || 0.9086 || 0.9458 || 0.4357 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:39:15Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results of the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
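The segment-level scores reported in the tables below can be related to each other with a short sketch. It assumes that the reference and the estimate have already been converted to one label per fixed-length segment (the segment length is not stated here), and that the F-measure is the usual harmonic mean of precision and recall. The function names are hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def segment_scores(reference, estimated, target):
    """Segment-level precision, recall and F-measure for the class `target`.

    reference, estimated: equal-length sequences with one label per segment.
    """
    pairs = list(zip(reference, estimated))
    tp = sum(1 for r, e in pairs if r == target and e == target)
    fp = sum(1 for r, e in pairs if r != target and e == target)
    fn = sum(1 for r, e in pairs if r == target and e != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f_measure(precision, recall)


# Hypothetical per-segment labels, for illustration only.
reference = ["music", "music", "no-music", "no-music", "music"]
estimated = ["music", "no-music", "no-music", "music", "music"]
music_p, music_r, music_f = segment_scores(reference, estimated, "music")

# Consistency check against the DD1 row of Task 1, Dataset 1
# (Music_P = 0.905, Music_R = 0.3873, Music_F = 0.5424).
dd1_music_f = f_measure(0.905, 0.3873)
```

Under the harmonic-mean assumption, <code>dd1_music_f</code> agrees with the tabulated Music_F for DD1 to within the rounding of the published precision and recall.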

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || || || 0.9186 || || || 0.7493
|-
! JHKK3
| 0.8307 || || || 0.8795 || || || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || || || 0.7472 || || || 0.6007
|-
! MM1
| 0.8626 || || || 0.9115 || || || 0.6948
|-
! MM2
| 0.8619 || || || 0.909 || || || 0.713
|-
! MM3
| 0.8508 || || || 0.9086 || || || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9603 || 0.9564 || 0.9583 || 0.9633 || 0.9662 || 0.9648
|-
! JHKK3
| 0.8575 || 0.9125 || 0.7619 || 0.8305 || 0.8222 || 0.9384 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.9587 || 0.7339 || 0.8314 || 0.8113 || 0.9733 || 0.885
|-
! LN1(GAFMF)
| 0.8754 || 0.9591 || 0.7604 || 0.8483 || 0.8267 || 0.9726 || 0.8937
|-
! LN1(GAFSF)
| 0.8597 || 0.959 || 0.7249 || 0.8256 || 0.8062 || 0.9739 || 0.8821
|-
! MM1
| 0.9367 || 0.9134 || 0.9526 || 0.9326 || 0.9585 || 0.9232 || 0.9405
|-
! MM2
| 0.9226 || 0.9328 || 0.8959 || 0.914 || 0.9147 || 0.9451 || 0.9296
|-
! MM3
| 0.8973 || 0.8289 || 0.9781 || 0.8973 || 0.978 || 0.829 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:35:50Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results of the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
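The onset-offset variant defined above is stricter than the onset-only one, because an estimated event only counts when both of its boundaries are close to those of a reference event. The sketch below illustrates this under stated assumptions: events are (onset, offset) pairs in seconds for a single class, the same fixed tolerance applies to onset and offset, and matching is greedy and one-to-one. It is not the evaluation code used for these results, and the function name is hypothetical.

```python
def onset_offset_f_measure(reference, estimated, tolerance=0.5):
    """Onset-offset event-level F-measure for one class.

    An estimated event is a true positive when both its onset and its
    offset lie within `tolerance` seconds of an unmatched reference event.
    Using the same fixed tolerance for both boundaries is an assumption.
    """
    matched = set()  # indices of reference events already paired
    true_positives = 0
    for est_on, est_off in estimated:
        for i, (ref_on, ref_off) in enumerate(reference):
            if i in matched:
                continue
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                matched.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, for illustration only.
reference = [(0.0, 5.0), (10.0, 15.0)]
estimated = [(0.2, 5.3), (10.1, 15.8)]
f_500 = onset_offset_f_measure(reference, estimated, tolerance=0.5)
f_1000 = onset_offset_f_measure(reference, estimated, tolerance=1.0)
```

In this made-up case the second estimated event has an accurate onset but an offset that is 0.8 s late, so it is rejected at 500 ms and accepted at 1000 ms. This is in line with the onoff columns in the tables being lower than the corresponding on columns.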

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || || || 0.9186 || || || 0.7493
|-
! JHKK3
| 0.8307 || || || 0.8795 || || || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || || || 0.7472 || || || 0.6007
|-
! MM1
| 0.8626 || || || 0.9115 || || || 0.6948
|-
! MM2
| 0.8619 || || || 0.909 || || || 0.713
|-
! MM3
| 0.8508 || || || 0.9086 || || || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || || || 0.9583 || || || 0.9648
|-
! JHKK3
| 0.8575 || || || 0.8305 || || || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || || || 0.8314 || || || 0.885
|-
! LN1(GAFMF)
| || || || || || ||
|-
! LN1(GAFSF)
| || || || || || ||
|-
! MM1
| 0.9367 || || || 0.9326 || || || 0.9405
|-
! MM2
| 0.9226 || || || 0.914 || || || 0.9296
|-
! MM3
| 0.8973 || || || 0.8973 || || || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| 0.1903 || 0.0548 || 0.2606 || 0.0918
|-
! LN1(GAFSF)
| 0.1839 || 0.0452 || 0.2446 || 0.0731
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | No-Music_F_500_on
! width="80" | No-Music_F_500_onoff
! width="80" | No-Music_F_1000_on
! width="80" | No-Music_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:30:07Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}
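
The onset-only columns above (<code>_F_500_on</code>, <code>_F_1000_on</code>) can be illustrated with a minimal sketch. It assumes events are given as (onset, offset) pairs in seconds and that each reference event may be matched at most once, using a simple greedy pass; the function name and the matching strategy are illustrative assumptions, not the evaluation code used to produce these tables.

```python
def onset_f_measure(reference, estimated, tolerance=0.5):
    """Onset-only event-level F-measure (illustrative sketch).

    reference, estimated: lists of (onset, offset) tuples in seconds.
    tolerance: allowed onset deviation in seconds (0.5 for 500 ms).
    Greedy one-to-one matching is an assumption made for this sketch.
    """
    matched_refs = set()
    true_positives = 0
    for est_onset, _ in sorted(estimated):
        for i, (ref_onset, _) in enumerate(reference):
            # Each reference event can be claimed by one estimate only.
            if i not in matched_refs and abs(est_onset - ref_onset) <= tolerance:
                matched_refs.add(i)
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, not taken from the evaluation datasets.
reference = [(0.0, 5.0), (10.0, 15.0)]
estimated = [(0.3, 4.0), (10.8, 15.0), (20.0, 22.0)]

f_500 = onset_f_measure(reference, estimated, tolerance=0.5)
f_1000 = onset_f_measure(reference, estimated, tolerance=1.0)
```

In this toy case the second estimated onset is 0.8 s late, so it counts as a hit only under the 1000 ms tolerance. This is why the <code>_1000_</code> scores in the tables are never lower than the matching <code>_500_</code> scores.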

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || || || 0.9186 || || || 0.7493
|-
! JHKK3
| 0.8307 || || || 0.8795 || || || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || || || 0.7472 || || || 0.6007
|-
! MM1
| 0.8626 || || || 0.9115 || || || 0.6948
|-
! MM2
| 0.8619 || || || 0.909 || || || 0.713
|-
! MM3
| 0.8508 || || || 0.9086 || || || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || || || 0.9583 || || || 0.9648
|-
! JHKK3
| 0.8575 || || || 0.8305 || || || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || || || 0.8314 || || || 0.885
|-
! LN1(GAFMF)
| || || || || || ||
|-
! LN1(GAFSF)
| || || || || || ||
|-
! MM1
| 0.9367 || || || 0.9326 || || || 0.9405
|-
! MM2
| 0.9226 || || || 0.914 || || || 0.9296
|-
! MM3
| 0.8973 || || || 0.8973 || || || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| || || ||
|-
! LN1(GAFSF)
| || || ||
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | No-Music_F_500_on
! width="80" | No-Music_F_500_onoff
! width="80" | No-Music_F_1000_on
! width="80" | No-Music_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:29:52Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}
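
The onset-offset columns above (<code>_F_500_onoff</code>, <code>_F_1000_onoff</code>) are stricter than the onset-only ones, because both boundaries of an event must fall within the tolerance. The sketch below assumes the same fixed tolerance is applied to onset and offset and uses greedy one-to-one matching; both are assumptions made for illustration, and the function name is hypothetical rather than part of the actual evaluation code.

```python
def onset_offset_f_measure(reference, estimated, tolerance=0.5):
    """Onset-offset event-level F-measure (illustrative sketch).

    reference, estimated: lists of (onset, offset) tuples in seconds.
    An estimate is a hit only if BOTH its onset and its offset lie
    within `tolerance` seconds of an unmatched reference event.
    """
    unmatched = list(reference)
    true_positives = 0
    for est_on, est_off in estimated:
        for ref in unmatched:
            ref_on, ref_off = ref
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                unmatched.remove(ref)  # safe: we break right after
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, not taken from the evaluation datasets.
reference = [(0.0, 5.0), (10.0, 15.0)]
estimated = [(0.3, 4.0), (10.8, 15.2)]

f_500 = onset_offset_f_measure(reference, estimated, tolerance=0.5)
f_1000 = onset_offset_f_measure(reference, estimated, tolerance=1.0)
```

Here neither estimate satisfies the 500 ms tolerance on both boundaries, while both do at 1000 ms. This is consistent with the tables, where the <code>_onoff</code> scores are lower than the corresponding <code>_on</code> scores.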

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || || || 0.9186 || || || 0.7493
|-
! JHKK3
| 0.8307 || || || 0.8795 || || || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || || || 0.7472 || || || 0.6007
|-
! LN1(GAFMF)
| || || || || || ||
|-
! LN1(GAFSF)
| || || || || || ||
|-
! MM1
| 0.8626 || || || 0.9115 || || || 0.6948
|-
! MM2
| 0.8619 || || || 0.909 || || || 0.713
|-
! MM3
| 0.8508 || || || 0.9086 || || || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || || || 0.9583 || || || 0.9648
|-
! JHKK3
| 0.8575 || || || 0.8305 || || || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || || || 0.8314 || || || 0.885
|-
! LN1(GAFMF)
| || || || || || ||
|-
! LN1(GAFSF)
| || || || || || ||
|-
! MM1
| 0.9367 || || || 0.9326 || || || 0.9405
|-
! MM2
| 0.9226 || || || 0.914 || || || 0.9296
|-
! MM3
| 0.8973 || || || 0.8973 || || || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| || || ||
|-
! LN1(GAFSF)
| || || ||
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | No-Music_F_500_on
! width="80" | No-Music_F_500_onoff
! width="80" | No-Music_F_1000_on
! width="80" | No-Music_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:29:26Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}
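
The <code>_P</code>, <code>_R</code> and <code>_F</code> columns in the table above are related through the usual F-measure, the harmonic mean of precision and recall. The short sketch below recomputes the two F columns of the MMG1 row from its precision and recall values. The helper name <code>f_measure</code> is introduced here for illustration only, and small differences in the last digit can occur because the published precision and recall are already rounded.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MMG1 row, Task 1, Dataset 1 (values copied from the table above).
music_f = f_measure(0.9131, 0.8865)     # Music_P, Music_R
no_music_f = f_measure(0.8978, 0.9219)  # No-Music_P, No-Music_R

print(round(music_f, 4))     # compare with the Music_F column
print(round(no_music_f, 4))  # compare with the No-Music_F column
```

The same check can be applied to any row of the segment-level tables as a quick sanity test for transcription errors.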

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.7472 || 0.6007
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! LN1(GAFMF)
| || || ||
|-
! LN1(GAFSF)
| || || ||
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.8314 || 0.885
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| || || ||
|-
! LN1(GAFSF)
| || || ||
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:28:30Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
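
As an illustration of the segment-level statistics, the following minimal sketch computes accuracy, precision (_P), recall (_R) and F-measure (_F) for one class from two equally long sequences of per-segment labels. The function name, the label strings and the toy data are assumptions made for this example; they are not taken from the MIREX evaluation code.

```python
def segment_metrics(reference, estimate, positive="music"):
    """Return (accuracy, precision, recall, F-measure) for the class `positive`.

    `reference` and `estimate` hold one label per fixed-length segment.
    This is an illustrative sketch, not the official evaluation script.
    """
    if len(reference) != len(estimate):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(reference, estimate))
    # Count true positives, false positives and false negatives for the class.
    tp = sum(1 for r, e in pairs if r == positive and e == positive)
    fp = sum(1 for r, e in pairs if r != positive and e == positive)
    fn = sum(1 for r, e in pairs if r == positive and e != positive)
    correct = sum(1 for r, e in pairs if r == e)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return accuracy, precision, recall, f_measure


# Toy data (hypothetical): five segments of ground truth and system output.
reference = ["music", "music", "no-music", "no-music", "music"]
estimate = ["music", "no-music", "no-music", "music", "music"]
accuracy, precision, recall, f_measure = segment_metrics(reference, estimate)
print(accuracy, precision, recall, f_measure)
```

Calling the same function with positive="no-music" gives the No-Music_P, No-Music_R and No-Music_F counterparts.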

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.7472 || 0.6007
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.8314 || 0.885
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1(GAFMFSF)
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! LN1(GAFMF)
| || || ||
|-
! LN1(GAFSF)
| || || ||
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:27:56Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
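
To make the tolerance in the event-level statistics concrete, here is a minimal sketch of an onset-only F-measure. It assumes that each reference onset may be matched to at most one estimated onset lying within the tolerance, and it uses a simple greedy pass over the sorted onsets. The matching strategy, the function name and the toy onset times are assumptions for illustration; the official evaluation may match events differently.

```python
def onset_f_measure(reference_onsets, estimated_onsets, tolerance=0.5):
    """Onset-only event-level F-measure with a tolerance in seconds.

    Illustrative sketch: a greedy one-to-one matching over sorted onsets.
    """
    ref = sorted(reference_onsets)
    est = sorted(estimated_onsets)
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            # Onsets are close enough: count a hit and consume both.
            matched += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimated onset is too early to match anything later
        else:
            i += 1  # reference onset was missed

    precision = matched / len(est) if est else 0.0
    recall = matched / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy onset times in seconds (hypothetical).
reference_onsets = [1.0, 5.0, 10.0]
estimated_onsets = [1.3, 5.8, 10.2, 14.0]
f_500 = onset_f_measure(reference_onsets, estimated_onsets, tolerance=0.5)
f_1000 = onset_f_measure(reference_onsets, estimated_onsets, tolerance=1.0)
print(f_500, f_1000)
```

Widening the tolerance from 500 ms to 1000 ms lets the onset at 5.8 s count as a match for the reference onset at 5.0 s, so the score cannot decrease. An onset-offset variant would additionally require the matched events' offsets to agree within the tolerance.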

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.7472 || 0.6007
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.8314 || 0.885
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:27:29Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
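
As a rough illustration of how the onset-only and onset-offset event-level F-measures defined above behave, the following sketch scores a toy set of events at both tolerances. The greedy one-to-one matching rule, the helper name <code>event_f_measure</code>, and the example events are assumptions made for illustration. This is not the official MIREX scoring code, which may treat offsets differently.

```python
from typing import List, Tuple

Event = Tuple[float, float]  # (onset in seconds, offset in seconds)


def event_f_measure(reference: List[Event], estimated: List[Event],
                    tolerance_s: float, match_offsets: bool = False) -> float:
    """Hypothetical helper: greedy one-to-one event matching within a tolerance."""
    used = [False] * len(estimated)
    true_positives = 0
    for ref_on, ref_off in reference:
        for i, (est_on, est_off) in enumerate(estimated):
            if used[i]:
                continue  # each estimated event may match at most one reference event
            onset_ok = abs(est_on - ref_on) <= tolerance_s
            # Assumption: the same tolerance is applied to the offset.
            offset_ok = (not match_offsets) or abs(est_off - ref_off) <= tolerance_s
            if onset_ok and offset_ok:
                used[i] = True
                true_positives += 1
                break
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented for this example.
reference = [(0.0, 10.0), (20.0, 30.0), (45.0, 50.0)]
estimated = [(0.3, 10.8), (20.9, 29.8), (60.0, 65.0)]

f_500_on = event_f_measure(reference, estimated, 0.5)
f_1000_on = event_f_measure(reference, estimated, 1.0)
f_500_onoff = event_f_measure(reference, estimated, 0.5, match_offsets=True)
f_1000_onoff = event_f_measure(reference, estimated, 1.0, match_offsets=True)
print(f_500_on, f_500_onoff, f_1000_on, f_1000_onoff)
```

In this toy case the onset-offset scores are never higher than the onset-only scores at the same tolerance, which is the same pattern seen in the event-level tables on this page.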

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.7472 || 0.6007
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.8314 || 0.885
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:27:04Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
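
The segment-level statistics can be sketched in the same spirit. The example below assumes that the reference and the estimate have already been cut into equal-length segments carrying one label each. The segment length, the label strings, and the helper name <code>segment_level_scores</code> are hypothetical and not taken from the task definition.

```python
from typing import Dict, List


def segment_level_scores(reference: List[str], estimated: List[str],
                         target: str) -> Dict[str, float]:
    """Hypothetical helper: accuracy plus precision, recall and F-measure for one class."""
    if not reference or len(reference) != len(estimated):
        raise ValueError("label sequences must be non-empty and of equal length")
    pairs = list(zip(reference, estimated))
    tp = sum(r == target and e == target for r, e in pairs)
    fp = sum(r != target and e == target for r, e in pairs)
    fn = sum(r == target and e != target for r, e in pairs)
    correct = sum(r == e for r, e in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": correct / len(pairs),
            "P": precision, "R": recall, "F": f_measure}


# Toy label sequences, invented for this example (one label per segment).
reference = ["music"] * 4 + ["no-music"] * 4
estimated = ["music", "music", "no-music", "no-music",
             "no-music", "no-music", "music", "no-music"]

music_scores = segment_level_scores(reference, estimated, "music")
no_music_scores = segment_level_scores(reference, estimated, "no-music")
print(music_scores)
print(no_music_scores)
```

Accuracy is shared by all classes, while precision, recall and F-measure are computed per class, which is why the segment-level tables list one accuracy column and separate columns for each class.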

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.7472 || 0.6007
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1(GAFMFSF)
| 0.8636 || 0.8314 || 0.885
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:26:01Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}
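
The Music_P, Music_R and Music_F columns of the table above can be cross-checked against each other, on the assumption that the F-measure is the usual harmonic mean of precision and recall. The sketch below recomputes F for three rows copied from the table. The helper name <code>f_measure</code> and the 0.001 threshold are illustrative choices that allow for rounding in the published figures.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Music_P, Music_R, Music_F) copied from the Task 1, Dataset 1 segment-level table.
rows = {
    "DD1": (0.905, 0.3873, 0.5424),
    "JHKK2": (0.9824, 0.5955, 0.7415),
    "MMG1": (0.9131, 0.8865, 0.8996),
}

# Absolute difference between the recomputed and the published F-measure.
deviations = {code: abs(f_measure(p, r) - f) for code, (p, r, f) in rows.items()}

for code, deviation in deviations.items():
    status = "consistent" if deviation < 1e-3 else "check this row"
    print(f"{code}: deviation {deviation:.5f} ({status})")
```

A check of this kind is a cheap way to spot transcription slips in hand-edited result tables, since a mistyped precision or recall value will usually disagree with the published F-measure.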

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1(GAFMFSF)
| 0.6908 || 0.7472 || 0.6007
|-
! LN1(GAFMF)
| || ||
|-
! LN1(GAFSF)
| || ||
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:24:50Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

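==Statistics notation==

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>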

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
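
As a quick sanity check on the notation above, the segment-level F-measure can be recomputed from the precision and recall columns (<class>_P and <class>_R) of the segment-level tables. The sketch below assumes the standard harmonic-mean definition F = 2PR / (P + R); the function name f_measure is illustrative and is not part of the MIREX evaluation code. Because the published values are rounded to four decimals, a recomputed value may differ in the last digit for some rows.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of <class>_F)."""
    if precision + recall == 0:
        # No correct detections at all: define F as 0 to avoid dividing by zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# MMG1, Task 1, Dataset 1: Music_P = 0.9131, Music_R = 0.8865 (taken from the table).
music_f = f_measure(0.9131, 0.8865)
print(round(music_f, 4))
```

For this row the recomputed value agrees with the reported Music_F of 0.8996 to four decimals.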

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_P
! width="80" | Speech_R
! width="80" | Speech_F
! width="80" | No-Speech_P
! width="80" | No-Speech_R
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || || || 0.9186 || || || 0.7493
|-
! JHKK3
| 0.8307 || || || 0.8795 || || || 0.7143
|-
! LN1
| 0.6908 || || || 0.7472 || || || 0.6007
|-
! MM1
| 0.8626 || || || 0.9115 || || || 0.6948
|-
! MM2
| 0.8619 || || || 0.909 || || || 0.713
|-
! MM3
| 0.8508 || || || 0.9086 || || || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:22:20Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

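==Statistics notation==

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>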

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
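
The onset-only event-level measures can be illustrated with a small sketch. It assumes that each reference onset may be matched to at most one estimated onset lying within the tolerance, and that the F-measure is the harmonic mean of the resulting precision and recall. The greedy in-order matching and the name onset_f_measure are assumptions made for illustration; this is not the evaluation code that produced the results on this page.

```python
def onset_f_measure(ref_onsets, est_onsets, tolerance=0.5):
    """Onset-only event-level F-measure (times and tolerance in seconds).

    Hypothetical helper: matches sorted onsets greedily, one-to-one.
    """
    ref = sorted(ref_onsets)
    est = sorted(est_onsets)
    matched = 0
    i = j = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tolerance:
            # Onsets are close enough: count a hit and consume both.
            matched += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # estimated onset too early to match anything later
        else:
            i += 1  # reference onset has no estimate within tolerance
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


ref = [1.0, 5.0, 10.0]
est = [1.3, 5.8, 10.2, 20.0]
print(onset_f_measure(ref, est, tolerance=0.5))  # 500 ms tolerance
print(onset_f_measure(ref, est, tolerance=1.0))  # 1000 ms tolerance
```

In this toy example the estimate at 5.8 s only counts as correct under the 1000 ms tolerance, which is why the _1000_ columns are never lower than the corresponding _500_ columns.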

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1(GAFMFSF)
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! LN1(GAFMF)
| 0.1037 || 0.0257 || 0.139 || 0.0449
|-
! LN1(GAFSF)
| 0.1026 || 0.0249 || 0.1385 || 0.0425
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| 0.1785 || 0.0298 || 0.2645 || 0.0595
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1
| 0.6908 || 0.7472 || 0.6007
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T17:17:50Z

Blai Melendez-Catalan: /* Task 1: Music Detection */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

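==Statistics notation==

<class>_P = segment-level precision for the class <class>

<class>_R = segment-level recall for the class <class>

<class>_F = segment-level F-measure for the class <class>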

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
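
The onset-offset measures are stricter than the onset-only ones because both boundaries of an event must be located correctly. The sketch below assumes that an estimated event counts as correct only when its onset and its offset each lie within the tolerance of the same reference event, with one-to-one matching. The name onoff_f_measure and the simple greedy matching are illustrative assumptions, not the evaluation code behind these tables.

```python
def onoff_f_measure(ref_events, est_events, tolerance=0.5):
    """Onset-offset event-level F-measure for (onset, offset) pairs in seconds.

    Hypothetical helper: each reference event takes the first unused
    estimate whose onset and offset are both within the tolerance.
    """
    ref = sorted(ref_events)
    est = sorted(est_events)
    used = [False] * len(est)
    matched = 0
    for r_on, r_off in ref:
        for k, (e_on, e_off) in enumerate(est):
            if used[k]:
                continue
            if abs(r_on - e_on) <= tolerance and abs(r_off - e_off) <= tolerance:
                used[k] = True
                matched += 1
                break
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


ref = [(0.0, 4.0), (10.0, 15.0)]
est = [(0.2, 4.3), (10.1, 16.0)]
print(onoff_f_measure(ref, est, tolerance=0.5))  # second offset is 1 s late: no match
print(onoff_f_measure(ref, est, tolerance=1.0))  # both events match
```

This mirrors the pattern in the tables, where the _onoff values are consistently lower than the _on values for the same tolerance.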

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9751 || 0.8950 || 0.9334 || 0.8694 || 0.9683 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9665 || 0.9315 || 0.9487 || 0.9094 || 0.9553 || 0.9318
|-
! JHKK2
| 0.9153 || 0.885 || 0.9817 || 0.9309 || 0.97 || 0.8233 || 0.8907
|-
! LN1(GAFMFSF)
| 0.7814 || 0.8319 || 0.7804 || 0.8053 || 0.7196 || 0.7828 || 0.7499
|-
! LN1(GAFMF)
| 0.7751 || 0.8481 || 0.7456 || 0.7936 || 0.6978 || 0.8161 || 0.7523
|-
! LN1(GAFSF)
| 0.7996 || 0.836 || 0.8137 || 0.8247 || 0.7507 || 0.78 || 0.7651
|-
! MM1
| 0.915 || 0.9765 || 0.8747 || 0.9228 || 0.8483 || 0.9708 || 0.9054
|-
! MM2
| 0.9032 || 0.9246 || 0.9072 || 0.9158 || 0.8745 || 0.8977 || 0.8859
|-
! MM3
| 0.8725 || 0.9794 || 0.7973 || 0.8791 || 0.7764 || 0.9769 || 0.8652
|-
! MMG1
| 0.9025 || 0.8586 || 0.9961 || 0.9223 || 0.9931 || 0.7726 || 0.8691
|-
! MMG3
| 0.949 || 0.9299 || 0.9865 || 0.9574 || 0.9795 || 0.8969 || 0.9364
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| || || ||
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1
| 0.6908 || 0.7472 || 0.6007
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T16:50:09Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
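
The segment-level tables report precision (<code>_P</code>), recall (<code>_R</code>) and F-measure (<code>_F</code>) per class. Assuming the F-measure is the usual harmonic mean of precision and recall (the notation above does not state this explicitly), a reported row can be sanity-checked with a short sketch like the one below; the function name <code>f_measure</code> is illustrative.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (assumed definition of <class>_F)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# DD1, Task 1, Dataset 1: Music_P = 0.905, Music_R = 0.3873, reported Music_F = 0.5424
dd1_music_f = f_measure(0.905, 0.3873)
```

Because the tabulated precision and recall are themselves rounded, the recomputed value agrees with the reported one only to within rounding in the fourth decimal place.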

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || || || 0.9334 || || || 0.9162
|-
! JHKK1
| 0.9415 || || || 0.9487 || || || 0.9318
|-
! JHKK2
| 0.9153 || || || 0.9309 || || || 0.8907
|-
! LN1
| 0.7814 || || || 0.8053 || || || 0.7499
|-
! MM1
| 0.915 || || || 0.9228 || || || 0.9054
|-
! MM2
| 0.9032 || || || 0.9158 || || || 0.8859
|-
! MM3
| 0.8725 || || || 0.8791 || || || 0.8652
|-
! MMG1
| 0.9025 || || || 0.9223 || || || 0.8691
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| || || ||
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1
| 0.6908 || 0.7472 || 0.6007
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T16:40:17Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_F
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9334 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9487 || 0.9318
|-
! JHKK2
| 0.9153 || 0.9309 || 0.8907
|-
! LN1
| 0.7814 || 0.8053 || 0.7499
|-
! MM1
| 0.915 || 0.9228 || 0.9054
|-
! MM2
| 0.9032 || 0.9158 || 0.8859
|-
! MM3
| 0.8725 || 0.8791 || 0.8652
|-
! MMG1
| 0.9025 || 0.9223 || 0.8691
|-
! MMG3
| || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|-
! MMG3
| || || ||
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1
| 0.6908 || 0.7472 || 0.6007
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T16:39:56Z

Blai Melendez-Catalan: /* Segment-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
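
The event-level measures above can be illustrated with a minimal sketch. It assumes that events are (onset, offset) pairs in seconds, that each reference event can be matched at most once, and that matching is greedy in onset order. The function name <code>event_f_measure</code> and the example events are hypothetical, and the matching rule of the actual evaluation code may differ (for example in how the offset tolerance is defined).

```python
def event_f_measure(reference, estimated, tolerance=0.5, match_offsets=False):
    """Event-level F-measure for one class (illustrative sketch).

    reference, estimated: lists of (onset, offset) pairs in seconds.
    An estimated event counts as a hit if its onset (and, when
    match_offsets is True, also its offset) lies within `tolerance`
    seconds of a reference event that has not been matched yet.
    """
    unmatched = list(reference)
    hits = 0
    for est_on, est_off in sorted(estimated):
        for ref in unmatched:
            ref_on, ref_off = ref
            onset_ok = abs(est_on - ref_on) <= tolerance
            offset_ok = (not match_offsets) or abs(est_off - ref_off) <= tolerance
            if onset_ok and offset_ok:
                unmatched.remove(ref)  # each reference event is used at most once
                hits += 1
                break
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations, not taken from the evaluation datasets.
reference = [(0.0, 10.0), (20.0, 30.0), (40.0, 50.0)]
estimated = [(0.3, 12.0), (20.8, 30.2), (60.0, 70.0)]

f_500_on = event_f_measure(reference, estimated, tolerance=0.5)
f_1000_on = event_f_measure(reference, estimated, tolerance=1.0)
f_1000_onoff = event_f_measure(reference, estimated, tolerance=1.0, match_offsets=True)
```

In this toy example the onset-only score rises when the tolerance grows from 500 ms to 1000 ms, and the onset-offset score is lower than the onset-only score at the same tolerance, which is the same pattern seen in the event-level tables on this page.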

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}
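
The relation between the P, R and F columns of the table above can be checked with a short sketch. It assumes that the F column is the balanced F-measure, that is, the harmonic mean of precision and recall. The helper name <code>f_measure</code> is hypothetical, and the three rows are copied from the table.

```python
def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (Music_P, Music_R, Music_F) as reported in the segment-level table.
rows = {
    "DD1": (0.905, 0.3873, 0.5424),
    "MM3": (0.9873, 0.1856, 0.3124),
    "MMG1": (0.9131, 0.8865, 0.8996),
}

for code, (p, r, f_reported) in rows.items():
    # Recomputed values agree with the reported ones up to rounding of P and R.
    print(code, round(f_measure(p, r), 4), f_reported)
```

Because the table lists P and R rounded to four decimals, the recomputed F can differ from the reported value in the last digit.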

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_F
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9334 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9487 || 0.9318
|-
! JHKK2
| 0.9153 || 0.9309 || 0.8907
|-
! LN1
| 0.7814 || 0.8053 || 0.7499
|-
! MM1
| 0.915 || 0.9228 || 0.9054
|-
! MM2
| 0.9032 || 0.9158 || 0.8859
|-
! MM3
| 0.8725 || 0.8791 || 0.8652
|-
! MMG1
| 0.9025 || 0.9223 || 0.8691
|-
! MMG3
| || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1
| 0.6908 || 0.7472 || 0.6007
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}

2018:Music and or Speech Detection Results

2018-09-19T16:39:28Z

Blai Melendez-Catalan: /* Event-level Evaluation */

==Introduction==
These are the results for the 2018 running of the Music and/or Speech Detection tasks. For background information about this task set, please refer to the [[2018:Music and/or Speech Detection]] page.

==General Legend==
{| border="1" cellspacing="0" style="text-align: left; width: 800px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Abstract
! width="440" | Contributors
|-
! DD1
| PDF || David Doukhan
|-
! JHKK1
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK1.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK2
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK2.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! JHKK3
| [https://www.music-ir.org/mirex/abstracts/2018/JHKK3.pdf PDF] || Byeong-Yong Jang, Woon-Haeng Heo, Jung-Hyun Kim, Oh-Wook Kwon
|-
! LN1
| [https://www.music-ir.org/mirex/abstracts/2018/LN1.pdf PDF] || Minsuk Choi, Jongpil Lee, Juhan Nam
|-
! MM1
| [https://www.music-ir.org/mirex/abstracts/2018/MM1.pdf PDF] || Matija Marolt
|-
! MM2
| [https://www.music-ir.org/mirex/abstracts/2018/MM2.pdf PDF] || Matija Marolt
|-
! MM3
| [https://www.music-ir.org/mirex/abstracts/2018/MM3.pdf PDF] || Matija Marolt
|-
! MMG1
| [https://www.music-ir.org/mirex/abstracts/2018/MMG1.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|-
! MMG2
| [https://www.music-ir.org/mirex/abstracts/2018/MMG2.pdf PDF] || Blai Meléndez-Catalán, Emilio Molina, Emilia Gómez
|}

==Statistics notation==

<class>_F = segment-level F-measure for the class <class>

<class>_F_500_on = onset-only event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_500_onoff = onset-offset event-level F-measure (500 ms tolerance) for the class <class>

<class>_F_1000_on = onset-only event-level F-measure (1000 ms tolerance) for the class <class>

<class>_F_1000_onoff = onset-offset event-level F-measure (1000 ms tolerance) for the class <class>
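
The segment-level measures in the notation above can be sketched as follows. The sketch assumes that the reference and the estimate have already been converted to one label per fixed-length segment; the segment length, the label names and the function name <code>segment_scores</code> are assumptions made for illustration and are not taken from the evaluation code.

```python
def segment_scores(reference, estimated, target):
    """Segment-level precision, recall and F-measure for one class.

    reference, estimated: equal-length sequences with one label per segment.
    target: the class whose scores are computed.
    """
    if len(reference) != len(estimated):
        raise ValueError("label sequences must have the same length")
    pairs = list(zip(reference, estimated))
    tp = sum(r == target and e == target for r, e in pairs)
    fp = sum(r != target and e == target for r, e in pairs)
    fn = sum(r == target and e != target for r, e in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical per-segment labels, not taken from the evaluation datasets.
reference = ["music", "music", "music", "no-music", "no-music", "music"]
estimated = ["music", "no-music", "music", "no-music", "music", "music"]

music_p, music_r, music_f = segment_scores(reference, estimated, "music")
no_music_p, no_music_r, no_music_f = segment_scores(reference, estimated, "no-music")
accuracy = sum(r == e for r, e in zip(reference, estimated)) / len(reference)
```

Accuracy is computed over all segments regardless of class, whereas each F-measure only considers the segments where the target class appears in the reference or in the estimate.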

==Datasets description==

[https://www.music-ir.org/mirex/wiki/2018:Music_and/or_Speech_Detection#Evaluation_Dataset Dataset description]

==Task 1: Music Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_P
! width="80" | Music_R
! width="80" | Music_F
! width="80" | No-Music_P
! width="80" | No-Music_R
! width="80" | No-Music_F
|-
! DD1
| 0.6860 || 0.905 || 0.3873 || 0.5424 || 0.6294 || 0.9624 || 0.7611
|-
! JHKK1
| 0.7798 || 0.9564 || 0.5675 || 0.7123 || 0.7092 || 0.9761 || 0.8215
|-
! JHKK2
| 0.8005 || 0.9824 || 0.5955 || 0.7415 || 0.7256 || 0.9902 || 0.8375
|-
! LN1
| 0.6251 || 0.6915 || 0.3943 || 0.5022 || 0.5988 || 0.8385 || 0.6987
|-
! MM1
| 0.6135 || 0.8072 || 0.257 || 0.3899 || 0.5786 || 0.9432 || 0.7172
|-
! MM2
| 0.6807 || 0.857 || 0.4026 || 0.5478 || 0.6292 || 0.938 || 0.7531
|-
! MM3
| 0.6075 || 0.9873 || 0.1856 || 0.3124 || 0.5698 || 0.9978 || 0.7254
|-
! MMG1
| 0.9049 || 0.9131 || 0.8865 || 0.8996 || 0.8978 || 0.9219 || 0.9097
|-
! MMG3
| || || || || || ||
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.2877 || 0.093 || 0.312 || 0.1142
|-
! JHKK1
| 0.2303 || 0.0765 || 0.294 || 0.1173
|-
! JHKK2
| 0.2522 || 0.0931 || 0.3245 || 0.1389
|-
! LN1
| 0.1348 || 0.0139 || 0.1704 || 0.0231
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676
|-
! MMG1
| 0.5177 || 0.2693 || 0.5813 || 0.3502
|-
! MMG3
| || || ||
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Music_F
! width="80" | No-Music_F
|-
! DD1
| 0.9257 || 0.9334 || 0.9162
|-
! JHKK1
| 0.9415 || 0.9487 || 0.9318
|-
! JHKK2
| 0.9153 || 0.9309 || 0.8907
|-
! LN1
| 0.7814 || 0.8053 || 0.7499
|-
! MM1
| 0.915 || 0.9228 || 0.9054
|-
! MM2
| 0.9032 || 0.9158 || 0.8859
|-
! MM3
| 0.8725 || 0.8791 || 0.8652
|-
! MMG1
| 0.9025 || 0.9223 || 0.8691
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
|-
! DD1
| 0.4089 || 0.2235 || 0.4402 || 0.248
|-
! JHKK1
| 0.1659 || 0.0347 || 0.2334 || 0.0636
|-
! JHKK2
| 0.167 || 0.029 || 0.2015 || 0.0599
|-
! LN1
| 0.0991 || 0.0228 || 0.1319 || 0.0428
|-
! MM1
| 0.1412 || 0.0159 || 0.1843 || 0.0392
|-
! MM2
| 0.1540 || 0.0312 || 0.231 || 0.0791
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535
|-
! MMG1
| 0.1358 || 0.0173 || 0.1936 || 0.0347
|}

==Task 2: Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.877 || 0.9186 || 0.7493
|-
! JHKK3
| 0.8307 || 0.8795 || 0.7143
|-
! LN1
| 0.6908 || 0.7472 || 0.6007
|-
! MM1
| 0.8626 || 0.9115 || 0.6948
|-
! MM2
| 0.8619 || 0.909 || 0.713
|-
! MM3
| 0.8508 || 0.9086 || 0.5966
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.415 || 0.1603 || 0.4477 || 0.2122
|-
! JHKK3
| 0.2882 || 0.0777 || 0.3289 || 0.0962
|-
! LN1
| 0.2686 || 0.0529 || 0.3484 || 0.0883
|-
! MM1
| 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Speech_F
! width="80" | No-Speech_F
|-
! DD1
| 0.9617 || 0.9583 || 0.9648
|-
! JHKK3
| 0.8575 || 0.8305 || 0.8765
|-
! LN1
| 0.8636 || 0.8314 || 0.885
|-
! MM1
| 0.9367 || 0.9326 || 0.9405
|-
! MM2
| 0.9226 || 0.914 || 0.9296
|-
! MM3
| 0.8973 || 0.8973 || 0.8974
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! DD1
| 0.6037 || 0.4139 || 0.6318 || 0.435
|-
! JHKK3
| 0.1585 || 0.0405 || 0.2095 || 0.0563
|-
! LN1
| 0.1775 || 0.0399 || 0.2426 || 0.0738
|-
! MM1
| 0.0632 || 0.0015 || 0.0947 || 0.0150
|-
! MM2
| 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 3: Music and Speech Detection==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.4936 || 0.7718
|-
! MM1
| 0.3899 || 0.9115
|-
! MM2
| 0.5478 || 0.909
|-
! MM3
| 0.3124 || 0.9086
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.1116 || 0.0088 || 0.1459 || 0.0186 || 0.2645 || 0.0462 || 0.348 || 0.0786
|-
! MM1
| 0.2044 || 0.0662 || 0.2137 || 0.0831 || 0.4607 || 0.2068 || 0.4898 || 0.2336
|-
! MM2
| 0.2464 || 0.0817 || 0.2736 || 0.1049 || 0.4422 || 0.1999 || 0.5093 || 0.266
|-
! MM3
| 0.1379 || 0.0525 || 0.1619 || 0.0676 || 0.4439 || 0.1775 || 0.4879 || 0.2122
|}

===Dataset 2===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F
! width="80" | Speech_F
|-
! LN1
| 0.7855 || 0.8455
|-
! MM1
| 0.9228 || 0.9326
|-
! MM2
| 0.9158 || 0.914
|-
! MM3
| 0.8791 || 0.8973
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Music_F_500_on
! width="80" | Music_F_500_onoff
! width="80" | Music_F_1000_on
! width="80" | Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! LN1
| 0.087 || 0.0232 || 0.1133 || 0.0375 || 0.2233 || 0.0766 || 0.3148 || 0.1277
|-
! MM1
| 0.1412 || 0.0157 || 0.1843 || 0.0392 || 0.0632 || 0.0015 || 0.0947 || 0.015
|-
! MM2
| 0.154 || 0.0312 || 0.231 || 0.0791 || 0.1162 || 0.0211 || 0.1737 || 0.0469
|-
! MM3
| 0.1516 || 0.0223 || 0.1962 || 0.0535 || 0.0796 || 0.0152 || 0.123 || 0.0281
|}

==Task 4: Music Relative Loudness Estimation==

===Dataset 1===

====Segment-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Accuracy
! width="80" | Fg-Music_F
! width="80" | Bg-Music_F
! width="80" | No-Music_F
|-
! MMG2
| 0.8615 || 0.788 || 0.821 || 0.9064
|}

====Event-level Evaluation====

{| border="1" cellspacing="0" style="text-align: left; width: 240px;"
|- style="background: yellow;"
! width="80" | Sub code
! width="80" style="text-align: center;" | Fg-Music_F_500_on
! width="80" | Fg-Music_F_500_onoff
! width="80" | Fg-Music_F_1000_on
! width="80" | Fg-Music_F_1000_onoff
! width="80" | Bg-Music_F_500_on
! width="80" | Bg-Music_F_500_onoff
! width="80" | Bg-Music_F_1000_on
! width="80" | Bg-Music_F_1000_onoff
! width="80" | Speech_F_500_on
! width="80" | Speech_F_500_onoff
! width="80" | Speech_F_1000_on
! width="80" | Speech_F_1000_onoff
|-
! MMG2
| 0.3298 || 0.1775 || 0.4106 || 0.2742 || 0.3853 || 0.1388 || 0.4463 || 0.2024 || 0.5254 || 0.3123 || 0.5927 || 0.3925
|}