<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://music-ir.org/mirex/w/index.php?action=history&amp;feed=atom&amp;title=2026%3AAudio_Instrument_Recognition</id>
	<title>2026:Audio Instrument Recognition - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://music-ir.org/mirex/w/index.php?action=history&amp;feed=atom&amp;title=2026%3AAudio_Instrument_Recognition"/>
	<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2026:Audio_Instrument_Recognition&amp;action=history"/>
	<updated>2026-07-01T22:41:47Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.31.1</generator>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2026:Audio_Instrument_Recognition&amp;diff=15033&amp;oldid=prev</id>
		<title>Chestnut at 04:57, 1 July 2026</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2026:Audio_Instrument_Recognition&amp;diff=15033&amp;oldid=prev"/>
		<updated>2026-07-01T04:57:25Z</updated>

		<summary type="html">&lt;p&gt;&lt;/p&gt;
&lt;table class=&quot;diff diff-contentalign-left&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #222; text-align: center;&quot;&gt;Revision as of 04:57, 1 July 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l42&quot; &gt;Line 42:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 42:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Dataset-specific labels may be mapped to the official vocabulary when necessary. For example, labels such as &amp;quot;drum kit&amp;quot;, &amp;quot;drums&amp;quot;, and &amp;quot;drum set&amp;quot; may be mapped to &amp;quot;drums&amp;quot;; labels such as &amp;quot;synth&amp;quot; and &amp;quot;synthesizer&amp;quot; may be mapped to &amp;quot;synthesizer&amp;quot;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;Dataset-specific labels may be mapped to the official vocabulary when necessary. For example, labels such as &amp;quot;drum kit&amp;quot;, &amp;quot;drums&amp;quot;, and &amp;quot;drum set&amp;quot; may be mapped to &amp;quot;drums&amp;quot;; labels such as &amp;quot;synth&amp;quot; and &amp;quot;synthesizer&amp;quot; may be mapped to &amp;quot;synthesizer&amp;quot;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;final &lt;/del&gt;label &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;list and any &lt;/del&gt;label-mapping &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;rules &lt;/del&gt;will be &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;announced before evaluation&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;official evaluation &lt;/ins&gt;label &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;set is expected to follow the 20-&lt;/ins&gt;label &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;OpenMIC-based vocabulary listed above. Any necessary dataset&lt;/ins&gt;-&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;specific label &lt;/ins&gt;mapping &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;or label exclusion after the final annotation audit &lt;/ins&gt;will be &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;documented and applied uniformly to all submitted systems&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Datasets =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Datasets =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l174&quot; &gt;Line 174:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 174:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The official leaderboard will be determined by clip-level macro-F1 on the hidden evaluation set. Results on any public reference dataset will be reported separately.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;The official leaderboard will be determined by clip-level macro-F1 on the hidden evaluation set. Results on any public reference dataset will be reported separately.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For confidence-based outputs, &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;the official evaluation script may use &lt;/del&gt;a fixed threshold&lt;del class=&quot;diffchange diffchange-inline&quot;&gt;, a predefined thresholding rule, &lt;/del&gt;or &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;threshold-independent metrics such &lt;/del&gt;as &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;mAP&lt;/del&gt;. &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;The thresholding rule &lt;/del&gt;will be &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;specified &lt;/del&gt;before &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;evaluation&lt;/del&gt;.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;For confidence-based outputs, &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;scores will be converted to binary predictions using &lt;/ins&gt;a fixed threshold &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;of 0.5. Scores greater than &lt;/ins&gt;or &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;equal to 0.5 will be treated as positive predictions; scores below 0.5 will be treated &lt;/ins&gt;as &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;negative predictions&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;#160;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot;&gt;&amp;#160;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;Minor updates to the evaluation protocol may be made after the final data audit. Any changes &lt;/ins&gt;will be &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;announced on this page &lt;/ins&gt;before &lt;ins class=&quot;diffchange diffchange-inline&quot;&gt;the official results are released&lt;/ins&gt;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Time and Hardware Limits =&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;= Time and Hardware Limits =&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l197&quot; &gt;Line 197:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 199:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2. J. J. Bosch, J. Janer, F. Fuhrmann, and P. Herrera, “A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals,” in ''Proceedings of the 13th International Society for Music Information Retrieval Conference'', Porto, Portugal, 2012, pp. 559–564.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;2. J. J. Bosch, J. Janer, F. Fuhrmann, and P. Herrera, “A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals,” in ''Proceedings of the 13th International Society for Music Information Retrieval Conference'', Porto, Portugal, 2012, pp. 559–564.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;−&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3. R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and &lt;del class=&quot;diffchange diffchange-inline&quot;&gt;J. &lt;/del&gt;P. Bello, “MedleyDB: A multitrack dataset for annotation-intensive MIR research,” in ''Proceedings of the 15th International Society for Music Information Retrieval Conference'', Taipei, Taiwan, 2014, pp. 155–160.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;+&lt;/td&gt;&lt;td style=&quot;color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;3. R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and P. Bello, “MedleyDB: A multitrack dataset for annotation-intensive MIR research,” in ''Proceedings of the 15th International Society for Music Information Retrieval Conference'', Taipei, Taiwan, 2014, pp. 155–160.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;4. D. Bogdanov, M. Won, P. Tovstogan, A. Porter, and X. Serra, “The MTG-Jamendo dataset for automatic music tagging,” in ''Machine Learning for Music Discovery Workshop, International Conference on Machine Learning'', Long Beach, CA, USA, 2019.&lt;/div&gt;&lt;/td&gt;&lt;td class='diff-marker'&gt;&amp;#160;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #222; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;4. D. Bogdanov, M. Won, P. Tovstogan, A. Porter, and X. Serra, “The MTG-Jamendo dataset for automatic music tagging,” in ''Machine Learning for Music Discovery Workshop, International Conference on Machine Learning'', Long Beach, CA, USA, 2019.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>Chestnut</name></author>
		
	</entry>
	<entry>
		<id>https://music-ir.org/mirex/w/index.php?title=2026:Audio_Instrument_Recognition&amp;diff=14956&amp;oldid=prev</id>
		<title>Chestnut: Created page with &quot;= Description =  This page describes the '''MIREX 2026: Audio Instrument Recognition''' task.  The task is clip-level multi-label instrument recognition. Given a music audio e...&quot;</title>
		<link rel="alternate" type="text/html" href="https://music-ir.org/mirex/w/index.php?title=2026:Audio_Instrument_Recognition&amp;diff=14956&amp;oldid=prev"/>
		<updated>2026-06-12T14:42:11Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot;= Description =  This page describes the &amp;#039;&amp;#039;&amp;#039;MIREX 2026: Audio Instrument Recognition&amp;#039;&amp;#039;&amp;#039; task.  The task is clip-level multi-label instrument recognition. Given a music audio e...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;= Description =&lt;br /&gt;
&lt;br /&gt;
This page describes the '''MIREX 2026: Audio Instrument Recognition''' task.&lt;br /&gt;
&lt;br /&gt;
The task is clip-level multi-label instrument recognition. Given a music audio excerpt, a submitted system should predict which instruments are present in the excerpt.&lt;br /&gt;
&lt;br /&gt;
For an input audio excerpt '''X''', the system outputs confidence scores over a fixed instrument label set:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
Prediction(X) = [s_1, s_2, ..., s_K]&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where '''K''' is the number of labels in the official instrument vocabulary, and '''s_k''' is the predicted confidence score for instrument '''k'''.&lt;br /&gt;
&lt;br /&gt;
= Instrument Label Set =&lt;br /&gt;
&lt;br /&gt;
The task will use a fixed instrument label set for evaluation. The label vocabulary is based on the OpenMIC-2018 instrument taxonomy, which contains the following 20 labels:&lt;br /&gt;
&lt;br /&gt;
* accordion&lt;br /&gt;
* banjo&lt;br /&gt;
* bass&lt;br /&gt;
* cello&lt;br /&gt;
* clarinet&lt;br /&gt;
* cymbals&lt;br /&gt;
* drums&lt;br /&gt;
* flute&lt;br /&gt;
* guitar&lt;br /&gt;
* mallet_percussion&lt;br /&gt;
* mandolin&lt;br /&gt;
* organ&lt;br /&gt;
* piano&lt;br /&gt;
* saxophone&lt;br /&gt;
* synthesizer&lt;br /&gt;
* trombone&lt;br /&gt;
* trumpet&lt;br /&gt;
* ukulele&lt;br /&gt;
* violin&lt;br /&gt;
* voice&lt;br /&gt;
&lt;br /&gt;
The final evaluated labels will be selected from this vocabulary according to coverage and annotation reliability in the official evaluation data. Labels with insufficient positive examples may be excluded from the official ranking.&lt;br /&gt;
&lt;br /&gt;
Dataset-specific labels may be mapped to the official vocabulary when necessary. For example, labels such as &amp;quot;drum kit&amp;quot;, &amp;quot;drums&amp;quot;, and &amp;quot;drum set&amp;quot; may be mapped to &amp;quot;drums&amp;quot;; labels such as &amp;quot;synth&amp;quot; and &amp;quot;synthesizer&amp;quot; may be mapped to &amp;quot;synthesizer&amp;quot;.&lt;br /&gt;
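&lt;br /&gt;
As a minimal sketch of such a mapping: the alias table below contains only the examples named above, and the helper name map_label plus the case and whitespace normalization are assumptions for illustration, not the organizers' rules.&lt;br /&gt;
&lt;br /&gt;
```python
# Hypothetical alias table built only from the examples in the text above.
ALIASES = {
    "drum kit": "drums",
    "drum set": "drums",
    "drums": "drums",
    "synth": "synthesizer",
    "synthesizer": "synthesizer",
}


def map_label(raw_label):
    """Return the official label for a dataset label, or None if unmapped."""
    # Lower-casing and trimming are assumptions made for illustration.
    key = raw_label.strip().lower()
    return ALIASES.get(key)
```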
&lt;br /&gt;
The final label list and any label-mapping rules will be announced before evaluation.&lt;br /&gt;
&lt;br /&gt;
= Datasets =&lt;br /&gt;
&lt;br /&gt;
== Training Datasets ==&lt;br /&gt;
&lt;br /&gt;
There are no restrictions on the training data used by participating systems. However, each submission must clearly state the training data used in its system description or extended abstract.&lt;br /&gt;
&lt;br /&gt;
Participants should report:&lt;br /&gt;
&lt;br /&gt;
* the names of all training datasets used;&lt;br /&gt;
* whether OpenMIC-2018 was used for training, validation, threshold tuning, or model selection;&lt;br /&gt;
* any external pretrained models used;&lt;br /&gt;
* any additional data augmentation or post-processing steps.&lt;br /&gt;
&lt;br /&gt;
== Evaluation Datasets ==&lt;br /&gt;
&lt;br /&gt;
The evaluation will include an official hidden evaluation set curated for this task. The hidden evaluation set will consist of music audio excerpts with clip-level instrument-presence annotations.&lt;br /&gt;
&lt;br /&gt;
The hidden evaluation data will not be distributed to participants. Submitted systems will be run by the task organizers or through the MIREX evaluation infrastructure.&lt;br /&gt;
&lt;br /&gt;
A public reference evaluation may also be reported using the official OpenMIC-2018 test partition. Results on this public reference set will be reported separately from the official hidden evaluation results.&lt;br /&gt;
&lt;br /&gt;
= Submission Format =&lt;br /&gt;
&lt;br /&gt;
Submissions should be packaged as a compressed file, such as &amp;lt;code&amp;gt;.zip&amp;lt;/code&amp;gt;, &amp;lt;code&amp;gt;.tar.gz&amp;lt;/code&amp;gt;, or &amp;lt;code&amp;gt;.rar&amp;lt;/code&amp;gt;.&lt;br /&gt;
&lt;br /&gt;
Each submission should contain at least the following files:&lt;br /&gt;
&lt;br /&gt;
== A) The main recognition script ==&lt;br /&gt;
&lt;br /&gt;
The main recognition script should be executable from the command line. It may be a bash script, Python script, binary executable, or another clearly documented executable entry point.&lt;br /&gt;
&lt;br /&gt;
The submitted system must take as input a directory of audio files and produce an output file containing predicted instrument scores for each audio excerpt.&lt;br /&gt;
&lt;br /&gt;
Denoting the input audio directory as &amp;lt;code&amp;gt;${input_dir}&amp;lt;/code&amp;gt; and the output file path as &amp;lt;code&amp;gt;${output}&amp;lt;/code&amp;gt;, a program named &amp;lt;code&amp;gt;foobar&amp;lt;/code&amp;gt; may be invoked as:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
foobar ${input_dir} ${output}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
or with flags:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
foobar -i ${input_dir} -o ${output}&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
If the submission requires additional arguments, such as a model checkpoint path or configuration file, these should be clearly documented in the README file. For example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
python run_instrument_recognition.py -i ${input_dir} -o ${output} --checkpoint model.pt&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
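&lt;br /&gt;
One way a Python entry point could accept the flags shown above is sketched here with the standard argparse module. The long option names (--input-dir, --output) and the function name parse_args are assumptions; only -i, -o, and --checkpoint appear in the examples above.&lt;br /&gt;
&lt;br /&gt;
```python
import argparse


def parse_args(argv=None):
    """Parse the -i / -o interface shown above; --checkpoint is optional."""
    parser = argparse.ArgumentParser(
        description="Instrument recognition entry point (sketch)"
    )
    # Long option names are hypothetical; the task page only shows -i and -o.
    parser.add_argument("-i", "--input-dir", required=True,
                        help="directory of audio files")
    parser.add_argument("-o", "--output", required=True,
                        help="path of the prediction file")
    parser.add_argument("--checkpoint", default=None,
                        help="model checkpoint path")
    return parser.parse_args(argv)
```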
&lt;br /&gt;
== B) The README file ==&lt;br /&gt;
&lt;br /&gt;
Each submission must include a README file containing:&lt;br /&gt;
&lt;br /&gt;
* contact information;&lt;br /&gt;
* installation instructions;&lt;br /&gt;
* software and hardware requirements;&lt;br /&gt;
* instructions for running the submitted system;&lt;br /&gt;
* the exact command line to be used for evaluation;&lt;br /&gt;
* information about required model checkpoints or external files.&lt;br /&gt;
&lt;br /&gt;
The README should include at least one command line containing both &amp;lt;code&amp;gt;${input_dir}&amp;lt;/code&amp;gt; and &amp;lt;code&amp;gt;${output}&amp;lt;/code&amp;gt; so that the evaluation can be run automatically.&lt;br /&gt;
&lt;br /&gt;
== C) System description or extended abstract ==&lt;br /&gt;
&lt;br /&gt;
Participants should submit a short system description or extended abstract. This document should summarize the model architecture, training data, external pretrained models if used, and important preprocessing or post-processing steps.&lt;br /&gt;
&lt;br /&gt;
= Input Data =&lt;br /&gt;
&lt;br /&gt;
Participating systems will receive a directory containing audio files.&lt;br /&gt;
&lt;br /&gt;
The expected input audio format is:&lt;br /&gt;
&lt;br /&gt;
* Audio format: WAV&lt;br /&gt;
* Sample rate: 44.1 kHz, unless otherwise specified&lt;br /&gt;
* Bit depth: 16-bit PCM&lt;br /&gt;
* Number of channels: mono or stereo&lt;br /&gt;
&lt;br /&gt;
The final input format will be confirmed before evaluation.&lt;br /&gt;
&lt;br /&gt;
= Output Data =&lt;br /&gt;
&lt;br /&gt;
The submitted system must produce one output file containing predictions for all input audio files.&lt;br /&gt;
&lt;br /&gt;
The preferred output format is a tab-separated text file. Each line should contain an audio filename, an instrument label, and a confidence score:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
&amp;lt;filename&amp;gt;\t&amp;lt;label&amp;gt;\t&amp;lt;score&amp;gt;&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;code&amp;gt;&amp;lt;score&amp;gt;&amp;lt;/code&amp;gt; is a real-valued confidence score, preferably in the range [0, 1].&lt;br /&gt;
&lt;br /&gt;
Example:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
track_001.wav	piano	0.93&lt;br /&gt;
track_001.wav	violin	0.81&lt;br /&gt;
track_001.wav	drums	0.76&lt;br /&gt;
track_001.wav	guitar	0.12&lt;br /&gt;
track_002.wav	flute	0.88&lt;br /&gt;
track_002.wav	piano	0.64&lt;br /&gt;
track_002.wav	cello	0.21&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The instrument labels in the output must match the official label set exactly. If an audio-file/label pair is missing from the output, it may be treated as having score 0.&lt;br /&gt;
&lt;br /&gt;
Systems are encouraged to output confidence scores for all official labels for each input audio file. If a system only produces binary predictions, it may output 0/1 values instead of continuous confidence scores.&lt;br /&gt;
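&lt;br /&gt;
The following sketch shows how a file in this format might be read back, with missing file/label pairs defaulting to a score of 0 as described above. The function name read_scores, the rejection of unknown labels, and the skipping of malformed rows are assumptions, not the behavior of the official evaluation script.&lt;br /&gt;
&lt;br /&gt;
```python
import csv
import io


def read_scores(tsv_text, labels):
    """Parse filename/label/score rows into {filename: {label: score}}."""
    scores = {}
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if len(row) != 3:
            continue  # skipping malformed or empty rows is an assumption
        filename, label, score = row
        if label not in labels:
            raise ValueError(f"label not in official set: {label}")
        # Missing file/label pairs default to 0.0, as the text allows.
        per_file = scores.setdefault(filename, {name: 0.0 for name in labels})
        per_file[label] = float(score)
    return scores
```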
&lt;br /&gt;
= Evaluation =&lt;br /&gt;
&lt;br /&gt;
The official evaluation is performed at the clip level and treats instrument recognition as a multi-label problem.&lt;br /&gt;
&lt;br /&gt;
For each input excerpt, the system predicts a confidence score for each instrument in the official label set. These predictions will be compared with the ground-truth instrument labels for that excerpt.&lt;br /&gt;
&lt;br /&gt;
The primary ranking metric will be:&lt;br /&gt;
&lt;br /&gt;
'''Macro-averaged F1 score'''&lt;br /&gt;
&lt;br /&gt;
Macro-F1 computes F1 separately for each instrument class and then takes the unweighted mean across classes, so rare instruments count as much as common ones.&lt;br /&gt;
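&lt;br /&gt;
A minimal sketch of this computation over binary clip-level decisions follows. The function name macro_f1, the set-based input layout, and the choice to score a label with no positives and no predictions as 0 are assumptions; the official script may handle such labels differently.&lt;br /&gt;
&lt;br /&gt;
```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-label F1 over binary clip-level decisions.

    y_true and y_pred map each clip name to the set of labels marked present.
    """
    f1_scores = []
    for label in labels:
        tp = fp = fn = 0
        for clip, truth in y_true.items():
            predicted = label in y_pred.get(clip, set())
            actual = label in truth
            if predicted and actual:
                tp += 1
            elif predicted:
                fp += 1
            elif actual:
                fn += 1
        denom = 2 * tp + fp + fn
        # Scoring an empty label (denom == 0) as 0.0 is an assumption.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)
```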
&lt;br /&gt;
The following additional metrics may also be reported:&lt;br /&gt;
&lt;br /&gt;
* '''Micro-F1'''&lt;br /&gt;
* '''Mean Average Precision (mAP)''', when confidence scores are available&lt;br /&gt;
* '''Per-instrument precision, recall, and F1'''&lt;br /&gt;
* '''Per-instrument average precision''', when confidence scores are available&lt;br /&gt;
&lt;br /&gt;
The official leaderboard will be determined by clip-level macro-F1 on the hidden evaluation set. Results on any public reference dataset will be reported separately.&lt;br /&gt;
&lt;br /&gt;
For confidence-based outputs, the official evaluation script may use a fixed threshold, a predefined thresholding rule, or threshold-independent metrics such as mAP. The thresholding rule will be specified before evaluation.&lt;br /&gt;
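&lt;br /&gt;
The 1 July 2026 revision of this page fixes the threshold at 0.5, with scores greater than or equal to 0.5 treated as positive. A minimal sketch of that conversion, where the function name binarize and the dictionary input are assumptions:&lt;br /&gt;
&lt;br /&gt;
```python
def binarize(scores, threshold=0.5):
    """Return the set of labels whose score is at or above the threshold."""
    return {label for label, score in scores.items() if score >= threshold}
```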
&lt;br /&gt;
= Time and Hardware Limits =&lt;br /&gt;
&lt;br /&gt;
Because MIREX audio tasks can attract a large number of participants, runtime and hardware limits may be imposed.&lt;br /&gt;
&lt;br /&gt;
Submissions should be able to run within the limits specified by the task organizers. Submissions that exceed the time limit, require unsupported hardware, or cannot be run according to the provided README may not receive an official result.&lt;br /&gt;
&lt;br /&gt;
Participants should clearly state any special hardware requirements, such as GPU requirements, in the README file.&lt;br /&gt;
&lt;br /&gt;
= Questions? =&lt;br /&gt;
&lt;br /&gt;
For questions about this task, please contact:&lt;br /&gt;
&lt;br /&gt;
* Wenye Ma, Schulich School of Music, McGill University&lt;br /&gt;
* Email: [mailto:wenye.ma@mail.mcgill.ca wenye.ma@mail.mcgill.ca]&lt;br /&gt;
&lt;br /&gt;
= Bibliography =&lt;br /&gt;
&lt;br /&gt;
1. E. J. Humphrey, S. Durand, and B. McFee, “OpenMIC-2018: An open dataset for multiple instrument recognition,” in ''Proceedings of the 19th International Society for Music Information Retrieval Conference'', Paris, France, 2018, pp. 438–444.&lt;br /&gt;
&lt;br /&gt;
2. J. J. Bosch, J. Janer, F. Fuhrmann, and P. Herrera, “A comparison of sound segregation techniques for predominant instrument recognition in musical audio signals,” in ''Proceedings of the 13th International Society for Music Information Retrieval Conference'', Porto, Portugal, 2012, pp. 559–564.&lt;br /&gt;
&lt;br /&gt;
3. R. M. Bittner, J. Salamon, M. Tierney, M. Mauch, C. Cannam, and J. P. Bello, “MedleyDB: A multitrack dataset for annotation-intensive MIR research,” in ''Proceedings of the 15th International Society for Music Information Retrieval Conference'', Taipei, Taiwan, 2014, pp. 155–160.&lt;br /&gt;
&lt;br /&gt;
4. D. Bogdanov, M. Won, P. Tovstogan, A. Porter, and X. Serra, “The MTG-Jamendo dataset for automatic music tagging,” in ''Machine Learning for Music Discovery Workshop, International Conference on Machine Learning'', Long Beach, CA, USA, 2019.&lt;/div&gt;</summary>
		<author><name>Chestnut</name></author>
		
	</entry>
</feed>