2025:Live Leaderboards
Task Captain Guide: Live Leaderboards
To foster a more competitive and/or educational environment, some MIREX tasks have adopted live leaderboards. This setup allows participants to receive immediate feedback on their submissions, view their ranking in real time, and iteratively refine and resubmit their models.
This section provides resources and guidance for task captains who are considering adopting a live leaderboard format for their task.
Live Leaderboard Support from External Platforms
Several free third-party platforms, such as Codabench, support live leaderboard functionality. You may refer to the following MIREX tasks for examples of how these platforms can be used effectively:
Limitations: Please note that platforms like Codabench do not support evaluation on private test sets. Participants are required to download the test set, perform inference locally, and then upload their prediction results to the platform for evaluation.
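As a rough illustration, the participant-side workflow on such a platform might look like the Python sketch below. The directory layout, prediction format, and archive name are assumptions made for illustration only; each task defines its own submission format.

<syntaxhighlight lang="python">
import csv
import zipfile
from pathlib import Path

TEST_DIR = Path("test_set")       # test set downloaded from the platform (assumed layout)
OUT_CSV = Path("predictions.csv")

def predict(audio_path: Path) -> str:
    # Stand-in for the participant's own model inference; replace with real code.
    return "placeholder_label"

# Run inference locally over the downloaded test set.
with OUT_CSV.open("w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["file", "prediction"])
    for clip in sorted(TEST_DIR.glob("*.wav")):
        writer.writerow([clip.name, predict(clip)])

# Package the predictions for upload to the platform for evaluation.
with zipfile.ZipFile("submission.zip", "w") as zf:
    zf.write(OUT_CSV, arcname=OUT_CSV.name)
</syntaxhighlight>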
If you intend to use private test sets for evaluation, please continue reading the next section for alternative solutions.
Live Leaderboard Support from MIREX Submission Platform
Task captains can create a task with a live leaderboard by:
- Allowing resubmissions;
- Continuously monitoring the submission portal for new submissions;
- Continuously updating the real-time leaderboard on the MIREX Wiki page.
To automate this process, the MIREX submission portal will experimentally provide support for:
- An online repository to receive submissions and their revisions;
- An API for automatically fetching and downloading new submissions (a polling sketch follows this list);
- A guide for automatically updating the corresponding MIREX Wiki page.
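The following is a minimal polling sketch for the fetching step. The base URL, endpoint paths, and response fields are hypothetical placeholders; consult the experimental portal documentation for the actual API.

<syntaxhighlight lang="python">
import time
import requests

API_BASE = "https://example.org/mirex-api"  # hypothetical base URL; use the one from the portal guide
TASK_ID = "my-task-2025"                    # hypothetical task identifier
seen = set()

while True:
    # Hypothetical endpoint listing all submissions (and revisions) for a task.
    resp = requests.get(f"{API_BASE}/tasks/{TASK_ID}/submissions", timeout=30)
    resp.raise_for_status()
    for sub in resp.json():
        key = (sub["id"], sub["revision"])  # assumed response fields
        if key in seen:
            continue
        seen.add(key)
        # Assumed field pointing at the submission archive.
        archive = requests.get(sub["download_url"], timeout=120)
        archive.raise_for_status()
        with open(f"{sub['id']}_r{sub['revision']}.zip", "wb") as f:
            f.write(archive.content)
        # Hand the downloaded archive to the evaluation pipeline here.
    time.sleep(300)  # poll every five minutes
</syntaxhighlight>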
Task captains are still responsible for setting up and maintaining their own evaluation server, which must handle the following:
- Preparing the test set and evaluation metrics;
- Communicating with the MIREX server to retrieve submissions;
- Running evaluations and computing metrics;
- Uploading results to the MIREX Wiki leaderboard (a sketch of this step appears below).
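Evaluation itself is task-specific, but the final upload step can use the standard MediaWiki edit API. The sketch below assumes a bot account with edit rights and a hypothetical leaderboard page title; the api.php location shown is an assumption and should be confirmed with the MIREX organizers.

<syntaxhighlight lang="python">
import requests

WIKI_API = "https://www.music-ir.org/mirex/w/api.php"  # assumed api.php location; confirm with organizers
PAGE = "2025:MyTask Leaderboard"                       # hypothetical leaderboard page title

session = requests.Session()
# A login step (e.g. action=login with bot credentials, or OAuth) is required
# before the wiki will accept edits; it is omitted here for brevity.

# Fetch a CSRF token (standard MediaWiki API flow).
token = session.get(WIKI_API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# Placeholder wikitable; in practice, build this from the computed metrics.
leaderboard = ('{| class="wikitable"\n'
               "! Rank !! Team !! Score\n"
               "|-\n"
               "| 1 || example_team || 0.00\n"
               "|}")

session.post(WIKI_API, data={
    "action": "edit",
    "title": PAGE,
    "text": leaderboard,
    "summary": "Automated leaderboard update",
    "token": token,
    "format": "json",
})
</syntaxhighlight>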
Best Practices
For tasks that wish to incorporate a live leaderboard, we recommend the following best practices:
- Use a validation or development set (rather than the final test set) during the live leaderboard period to prevent overfitting to the test data;
- Limit the number of evaluation samples used during the live leaderboard period to reduce computational load (a subsetting sketch follows this list);
- Re-evaluate all final submissions on the full test set after the live leaderboard closes to ensure fair and consistent benchmarking;
- Communicate with MIREX organizers if you encounter any issues or have suggestions for improving the process.
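For the first two points, one simple approach is to draw a fixed, seeded subset of the development set, so that every live evaluation runs on the same reduced sample. A minimal sketch, in which the directory layout and subset size are assumptions:

<syntaxhighlight lang="python">
import random
from pathlib import Path

DEV_DIR = Path("dev_set")   # development set, not the final test set (assumed layout)
SUBSET_SIZE = 100           # cap on evaluation samples during the live period (assumed)

all_items = sorted(DEV_DIR.glob("*.wav"))
rng = random.Random(2025)   # fixed seed: every run evaluates the same subset
live_subset = rng.sample(all_items, k=min(SUBSET_SIZE, len(all_items)))
</syntaxhighlight>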