Update content.py
content.py (+2 -2)
@@ -12,8 +12,8 @@ Here you can compare models on tasks in the Czech language and/or submit your own model
 - See the **About** page for a brief description of our evaluation protocol & win score mechanism, citation information, and future directions for this benchmark.
 - __How scoring works__:
 - On each task, the __Duel Win Score__ reports the proportion of won duels.
-- Category scores are obtained by averaging across category tasks.
-
+- Category scores are obtained by averaging across category tasks. When selecting a category (other than Overall), the "Average" column shows Category Duel Win Scores.
+- __Overall__ Duel Win Scores are an average over category scores. When selecting the Overall category, the "Average" column shows the Overall Duel Win Score.
 - All public submissions are shared in the [CZLC/LLM_benchmark_data](https://huggingface.co/datasets/CZLC/LLM_benchmark_data) dataset.
 - On the submission page, __you can obtain results on the leaderboard without publishing them__.
 - The first step is "pre-submission"; after this is done (significance tests can take up to an hour), the results can be submitted if you'd like to.
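
To make the aggregation described in the diff concrete, here is a minimal Python sketch of how the averages compose. The category names, task counts, and duel counts are hypothetical, and this is a sketch of the stated scoring rules, not the leaderboard's actual code:

```python
# Per-task Duel Win Score: the proportion of duels a model won on that task.
def duel_win_score(duels_won: int, duels_total: int) -> float:
    return duels_won / duels_total

# Category score: plain average of the task scores within the category.
def category_score(task_scores: list[float]) -> float:
    return sum(task_scores) / len(task_scores)

# Overall Duel Win Score: average over the category scores, so each
# category weighs equally no matter how many tasks it contains.
def overall_score(category_scores: list[float]) -> float:
    return sum(category_scores) / len(category_scores)

# Hypothetical example: two categories with different task counts.
reading_tasks = [duel_win_score(70, 100), duel_win_score(55, 100)]  # [0.70, 0.55]
math_tasks = [duel_win_score(40, 100)]                              # [0.40]

category_scores = [category_score(reading_tasks), category_score(math_tasks)]  # [0.625, 0.40]
print(overall_score(category_scores))  # 0.5125 -> the "Average" column under Overall
```

Note the design consequence of averaging over categories rather than over all tasks: a single-task category moves the Overall score as much as a many-task category does.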