MMLU - 77
#3 opened by orendar
New open-source SOTA!
I just ran 5-shot MMLU with lm-evaluation-harness; results below:
| Groups | Version | Filter | n-shot | Metric | Value |  | Stderr |
|---|---|---|---|---|---|---|---|
| mmlu | N/A | none | 0 | acc | 0.7735 | ± | 0.0034 |
| - humanities | N/A | none | 5 | acc | 0.7337 | ± | 0.0062 |
| - other | N/A | none | 5 | acc | 0.8182 | ± | 0.0067 |
| - social_sciences | N/A | none | 5 | acc | 0.8687 | ± | 0.0060 |
| - stem | N/A | none | 5 | acc | 0.6958 | ± | 0.0078 |
It's even better than mistral-medium. The complete set of benchmarks is here: https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1/discussions/4
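
For anyone who wants to reproduce this, here is a minimal sketch using the harness's Python API (lm-evaluation-harness v0.4+). The model path, dtype, and batch size are illustrative assumptions, not my exact command:

```python
# Minimal sketch of a 5-shot MMLU run with lm-evaluation-harness (v0.4+).
# Model path, dtype, and batch size are assumptions; adjust to your hardware.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mistral-community/Mixtral-8x22B-v0.1,dtype=bfloat16",
    tasks=["mmlu"],
    num_fewshot=5,
    batch_size=4,
)

# The aggregate "mmlu" group score should appear alongside the per-subject
# results; the table above is the harness's markdown rendering of the same dict.
print(results["results"]["mmlu"])
```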