2025:Music Reasoning QA Results

From MIREX Wiki

Latest revision as of 17:44, 13 September 2025

MMAR Results

System      Methods Used              ACC     music ACC  mix-sound-music  mix-music-speech  mix-sound-music-speech
Baseline 1  SAR-LM (w/ Qwen2.5-Omni)  40.00%  33.98%     27.27%           48.78%            37.50%
Baseline 2  Qwen2.5-Omni              56.70%  40.78%     54.55%           67.07%            58.33%
Baseline 3  SAR-LM (w/ Gemini)        TBA     TBA        TBA              TBA               TBA

OmniBench Results

System      Methods Used              ACC     music ACC
Baseline 1  SAR-LM (w/ Qwen2.5-Omni)  31.26%  41.50%
Baseline 2  Qwen2-Audio-7B-Instruct   40.72%  38.68%