2024:Cover Song Identification
Revision as of 21:47, 25 August 2024
Description
This task requires that algorithms identify, for a query audio track, other recordings of the same composition, or "cover songs".
Within the collection of pieces in the cover song datasets, a number of different "original songs" or compositions are embedded, each represented by a number of different "versions". The "cover songs" or "versions" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
Using each of these version files in turn as the "seed/query" file, we examine the ranked list returned by each algorithm for the presence of the other versions of the "seed/query" file.
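As a minimal sketch of this query protocol, assuming a system's output is a square distance matrix (smaller values mean more similar) and that the composition label of every track is known, the ranked list for a query is simply its row of the matrix sorted by distance, with the query itself removed. The function names `ranked_list` and `versions_in_top_k` are hypothetical and not part of any MIREX tooling.

```python
from typing import List, Sequence


def ranked_list(dist: Sequence[Sequence[float]], query: int) -> List[int]:
    """Return candidate indices ordered from most to least similar to `query`.

    Assumes `dist` is an N x N distance matrix (smaller = more similar).
    The query itself is excluded from its own ranked list.
    """
    row = dist[query]
    candidates = [j for j in range(len(row)) if j != query]
    # Python's sort is stable, so tied distances keep index order.
    return sorted(candidates, key=lambda j: row[j])


def versions_in_top_k(dist, labels, query: int, k: int = 10) -> int:
    """Count how many other versions of the query's composition appear in the top k."""
    top = ranked_list(dist, query)[:k]
    return sum(1 for j in top if labels[j] == labels[query])


# Toy example: tracks 0-2 are versions of composition "A", track 3 is unrelated.
dist = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.3, 0.8],
    [0.9, 0.3, 0.0, 0.7],
    [0.5, 0.8, 0.7, 0.0],
]
labels = ["A", "A", "A", "B"]
print(ranked_list(dist, 0))                     # [1, 3, 2]
print(versions_in_top_k(dist, labels, 0, k=2))  # 1
```

With a real submission the same idea applies with k = 10 and the full collection; how ties are broken is an assumption of this sketch.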
Two datasets are used in this task: the MIREX 2006 US Pop Music Cover Song dataset (the original Audio Cover Song dataset) and the Mazurka dataset.
Task-specific mailing list
In the past we have used a specific mailing list for the discussion of this task and related tasks. This year, however, we are asking that all discussions take place on the MIREX "EvalFest" list. If you have a question or comment, simply include the task name in the subject heading.
Data
Two datasets will be used to evaluate cover song identification:
US Pop Music Collection Cover Song (aka Mixed Collection)
This is the "original" ACS (Audio Cover Song) collection. Within the 1000 pieces in the Audio Cover Song database, 30 different "cover songs" are embedded, each represented by 11 different "versions", for a total of 330 audio files.
Using each of these cover song files in turn as the "seed/query" file, we will examine the returned lists of items for the presence of the other 10 versions of the "seed/query" file.
Collection statistics:
- 16-bit, monophonic, 22.05 kHz, WAV
- The "cover songs" represent a variety of genres (e.g., classical, jazz, gospel, rock, folk-rock, etc.) and the variations span a variety of styles and orchestrations.
- Size: 1000 tracks
- Queries: 330 tracks
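The collection statistics above fix the scale of the top-10 counts used in the evaluation: every query has exactly 10 other versions in the collection, so a system that placed all of them in its top 10 for every query would reach the following upper bound (the remaining 670 tracks act only as distractors).

```latex
\begin{aligned}
N_{\text{queries}} &= 30 \times 11 = 330,\\
\text{other versions per query} &= 11 - 1 = 10,\\
\max(\text{total covers in top 10}) &= 330 \times 10 = 3300.
\end{aligned}
```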
Sapp's Mazurka Collection Information
In addition to our original ACS dataset, we used the Mazurka.org dataset assembled by Craig Sapp. We randomly chose 11 versions of each of 49 mazurkas (49 × 11 = 539 recordings) and ran them as a separate ACS subtask. Systems should return a 539 × 539 distance matrix, from which we located the ranks of each of the associated cover versions.
Collection statistics:
- 16-bit, monophonic, 22.05 kHz, WAV
- Size: 539 tracks
- Queries: 539 tracks
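A minimal sketch of checking a Mazurka submission and locating the ranks of the associated cover versions. It assumes, purely for illustration, that the 11 versions of each mazurka occupy contiguous indices (track i belongs to mazurka i // 11); the real file ordering is not specified here, and `check_shape`, `contiguous_labels`, and `cover_ranks` are hypothetical names.

```python
from typing import List, Sequence

N_WORKS, N_VERSIONS = 49, 11
N_TRACKS = N_WORKS * N_VERSIONS  # 539


def check_shape(dist: Sequence[Sequence[float]], n: int = N_TRACKS) -> None:
    """Raise ValueError unless `dist` is an n x n matrix."""
    if len(dist) != n or any(len(row) != n for row in dist):
        raise ValueError(f"expected a {n}x{n} distance matrix")


def contiguous_labels(n_works: int = N_WORKS, n_versions: int = N_VERSIONS) -> List[int]:
    """Hypothetical layout: track i is a version of work i // n_versions."""
    return [i // n_versions for i in range(n_works * n_versions)]


def cover_ranks(dist, labels, query: int) -> List[int]:
    """1-based ranks of the query's other versions in its ranked list (query excluded)."""
    row = dist[query]
    order = sorted((j for j in range(len(row)) if j != query), key=lambda j: row[j])
    return [r for r, j in enumerate(order, start=1) if labels[j] == labels[query]]


# Toy example: 2 works x 2 versions instead of 49 x 11.
dist = [
    [0.0, 0.6, 0.1, 0.9],
    [0.6, 0.0, 0.4, 0.2],
    [0.1, 0.4, 0.0, 0.3],
    [0.9, 0.2, 0.3, 0.0],
]
labels = contiguous_labels(2, 2)  # [0, 0, 1, 1]
check_shape(dist, n=4)
print(cover_ranks(dist, labels, 0))  # [2]
print(cover_ranks(dist, labels, 1))  # [3]
```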
Evaluation
The following evaluation metrics will be computed for each submission:
- Total number of covers identified in top 10
- Mean number of covers identified in top 10 (average performance)
- Mean (arithmetic) of average precisions (MAP)
- Mean rank of first correctly identified cover
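The four metrics can be sketched from a distance matrix and ground-truth labels as follows. This is an illustrative reimplementation under stated assumptions, not the MIREX evaluation code: smaller distance means more similar, the query is excluded from its own list, average precision is the usual non-interpolated mean of precision at each relevant rank, and tracks whose label occurs only once (for example distractor tracks given unique labels) are skipped as queries. The function name `evaluate` is hypothetical.

```python
from typing import Dict, Sequence


def evaluate(dist: Sequence[Sequence[float]], labels: Sequence, k: int = 10) -> Dict[str, float]:
    """Compute the four cover-song metrics, treating every track with covers as a query."""
    total_top_k = 0
    ap_sum = 0.0
    first_rank_sum = 0
    n_queries = 0
    for q in range(len(dist)):
        row = dist[q]
        order = sorted((j for j in range(len(row)) if j != q), key=lambda j: row[j])
        # 1-based ranks at which other versions of the same composition appear.
        hits = [r for r, j in enumerate(order, start=1) if labels[j] == labels[q]]
        if not hits:
            continue  # no other version of this composition: not a query
        n_queries += 1
        total_top_k += sum(1 for r in hits if r <= k)
        # Precision at the i-th relevant item is i / rank; AP averages these.
        ap_sum += sum(i / r for i, r in enumerate(hits, start=1)) / len(hits)
        first_rank_sum += hits[0]
    return {
        "total_in_top_k": total_top_k,
        "mean_in_top_k": total_top_k / n_queries,
        "map": ap_sum / n_queries,
        "mean_first_rank": first_rank_sum / n_queries,
    }


# Toy example: tracks 0-2 are versions of "A"; track 3 ("B") has no covers.
dist = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.3, 0.8],
    [0.9, 0.3, 0.0, 0.7],
    [0.5, 0.8, 0.7, 0.0],
]
labels = ["A", "A", "A", "B"]
m = evaluate(dist, labels, k=2)  # k=2 because each query has only 2 covers here
print(m["total_in_top_k"])      # 4
print(round(m["map"], 4))       # 0.8889
print(m["mean_first_rank"])     # 1.0
```

On the real collections k stays at 10; the sketch raises ZeroDivisionError if no track has a cover.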
Ranking and significance testing
Friedman's ANOVA with Tukey-Kramer HSD will be run against the Average Precision summary data over the individual song groups to assess the significance of differences in performance and to rank the performances.
For further details on the use of Friedman's ANOVA with Tukey-Kramer HSD in MIR, please see:
@InProceedings{jones2007hsj,
  title     = {Human Similarity Judgements: Implications for the Design of Formal Evaluations},
  author    = {M.C. Jones and J.S. Downie and A.F. Ehmann},
  booktitle = {Proceedings of ISMIR 2007 International Society of Music Information Retrieval},
  year      = {2007}
}
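A minimal, standard-library-only sketch of the Friedman part of this procedure, assuming the Average Precision summary data is arranged with one row per song group and one column per submission. It computes the textbook Friedman chi-square statistic with the usual tie correction, plus each submission's mean rank; the statistic is compared against a chi-square distribution with (number of submissions − 1) degrees of freedom. The Tukey-Kramer HSD comparison of mean ranks, which needs a studentized-range quantile, is not shown, and `average_ranks` and `friedman` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values in ascending order (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j hold ranks i+1..j+1
        for p in range(i, j + 1):
            ranks[order[p]] = mean_rank
        i = j + 1
    return ranks


def friedman(ap: Sequence[Sequence[float]]) -> Tuple[float, List[float]]:
    """Friedman chi-square statistic and per-submission mean ranks.

    `ap[b][s]` is the average precision of submission s on song group b.
    Higher AP gets a higher rank, so a larger mean rank means better performance.
    Assumes at least 2 submissions and that not every row is fully tied.
    """
    n, k = len(ap), len(ap[0])
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in ap:
        ranks = average_ranks(row)
        for s, r in enumerate(ranks):
            rank_sums[s] += r
        # Each group of t tied values contributes t^3 - t (untied values add 0).
        for r in set(ranks):
            t = ranks.count(r)
            tie_term += t ** 3 - t
    q = 12.0 / (n * k * (k + 1)) * sum(R * R for R in rank_sums) - 3.0 * n * (k + 1)
    q /= 1.0 - tie_term / (n * (k ** 3 - k))
    return q, [R / n for R in rank_sums]


# Toy data: 3 song groups (rows) x 3 submissions (columns).
ap = [
    [0.9, 0.5, 0.2],
    [0.8, 0.6, 0.3],
    [0.7, 0.4, 0.4],
]
q, mean_ranks = friedman(ap)
print(round(q, 4))                        # 5.6364
print([round(x, 2) for x in mean_ranks])  # [3.0, 1.83, 1.17]
```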
Runtime performance
In addition, computation times for feature extraction and training/classification will be measured.