eval
README.md
CHANGED
@@ -46,20 +46,21 @@ print(tokenizer.decode(out[0]))
 ```
 
 # Evaluation Snapshot
-| Model | **MC9** | **Gen5** | **MMLU** | **MMLU Pro** | **AGIEval** | **BBH** | **Math2** | **NewsG** | **PoemG** | **SciRIFF5** | **Code4** | **Avg.** |
-
-| Prev. Public model | 68.7 | 58.8 | 55.9 | 26.2 | 39.9 | 35.7 | 8.2 | 76.0 | 47.8 | 48.1 | 1.1 |
+| **Model** | **MC9** | **Gen5** | **MMLU** | **MMLU Pro** | **AGIEval** | **BBH** | **Math2** | **NewsG** | **PoemG** | **SciRIFF5** | **Code4** | **Avg.** |
+|----------|--------|----------|----------|--------------|-------------|---------|-----------|-----------|-----------|--------------|-----------|----------|
+| Prev. Public model | 68.7 | 58.8 | 55.9 | 26.2 | 39.9 | 35.7 | 8.2 | 76.0 | 47.8 | 48.1 | 1.1 | 42.4 |
 | **Individual** |
-| [Math](https://huggingface.co/allenai/Flex-math-2x7B-1T) | 62.5 | 44.3 | 50.6 | 24.1 | 42.0 | 45.6 | **53.1** | 42.6 | 28.0 | 50.7 | 15.8 |
-| [Code](https://huggingface.co/allenai/Flex-code-2x7B-1T) | 40.5 | 39.4 | 29.5 | 14.5 | 27.4 | 38.1 | 6.0 | 45.1 | 28.2 | 48.0 | 21.0 |
-| Textbook | 64.3 | 52.1 | 56.5 | 27.0 | 39.7 | 40.3 | 13.6 | 57.6 | 51.8 | 51.7 | 3.0 |
-| [News](https://huggingface.co/allenai/Flex-news-2x7B-1T) | 46.5 | 48.6 | 36.4 | 15.2 | 25.7 | 30.9 | 2.5 | 77.7 | 26.9 | 47.0 | 0.0 |
-| [Creative Writing](https://huggingface.co/allenai/Flex-creative-2x7B-1T) | 42.7 | 43.9 | 31.5 | 11.6 | 23.3 | 27.6 | 1.7 | 56.9 | **67.5** | 42.4 | 0.0 |
-| [Academic](https://huggingface.co/allenai/Flex-pes2o-2x7B-1T) | 41.0 | 45.2 | 33.8 | 14.8 | 24.1 | 32.4 | 6.5 | 51.8 | 23.0 | 52.0 | 0.0 |
-| [Reddit](https://huggingface.co/allenai/Flex-reddit-2x7B-1T) | 64.7 | 36.5 | 56.1 | 25.5 | 35.5 | 19.7 | 2.5 | 54.1 | 8.6 | 32.7 | 1.7 |
-| **Combined**
-| BTM (top-2) | 68.7 | 57.7 | 59.4 | 28.3 | 43.2 | 44.3 | 23.1 | 73.6 | 54.4 | 46.3 | **24.0** |
-| 🔥 **FlexOlmo-7x7B-1T** | 70.4 | 60.1 | 60.2 | 30.5 |
+| [Math](https://huggingface.co/allenai/Flex-math-2x7B-1T) | 62.5 | 44.3 | 50.6 | 24.1 | 42.0 | 45.6 | **53.1** | 42.6 | 28.0 | 50.7 | 15.8 | 41.8 |
+| [Code](https://huggingface.co/allenai/Flex-code-2x7B-1T) | 40.5 | 39.4 | 29.5 | 14.5 | 27.4 | 38.1 | 6.0 | 45.1 | 28.2 | 48.0 | 21.0 | 30.7 |
+| Textbook | 64.3 | 52.1 | 56.5 | 27.0 | 39.7 | 40.3 | 13.6 | 57.6 | 51.8 | 51.7 | 3.0 | 41.6 |
+| [News](https://huggingface.co/allenai/Flex-news-2x7B-1T) | 46.5 | 48.6 | 36.4 | 15.2 | 25.7 | 30.9 | 2.5 | 77.7 | 26.9 | 47.0 | 0.0 | 32.5 |
+| [Creative Writing](https://huggingface.co/allenai/Flex-creative-2x7B-1T) | 42.7 | 43.9 | 31.5 | 11.6 | 23.3 | 27.6 | 1.7 | 56.9 | **67.5** | 42.4 | 0.0 | 31.7 |
+| [Academic](https://huggingface.co/allenai/Flex-pes2o-2x7B-1T) | 41.0 | 45.2 | 33.8 | 14.8 | 24.1 | 32.4 | 6.5 | 51.8 | 23.0 | 52.0 | 0.0 | 29.5 |
+| [Reddit](https://huggingface.co/allenai/Flex-reddit-2x7B-1T) | 64.7 | 36.5 | 56.1 | 25.5 | 35.5 | 19.7 | 2.5 | 54.1 | 8.6 | 32.7 | 1.7 | 30.7 |
+| **Combined** |
+| BTM (top-2) | 68.7 | 57.7 | 59.4 | 28.3 | 43.2 | 44.3 | 23.1 | 73.6 | 54.4 | 46.3 | **24.0** | 47.6 |
+| 🔥 **FlexOlmo-7x7B-1T** | **70.4** | **60.1** | **60.2** | **30.5** | 44.8 | 46.8 | 47.9 | 77.2 | 66.2 | 53.8 | 14.6 | 52.0 |
+| **FlexOlmo-7x7B-1T-RT** | 70.3 | 60.0 | **60.2** | 30.3 | **45.2** | **47.2** | 47.7 | **78.3** | **67.6** | **53.9** | 12.6 | **52.1** |
 
 
 # Citation
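The change fills in the **Avg.** column, which the previous revision left empty. The README does not say how that column is computed. The sketch below assumes it is the unweighted mean of the 11 benchmark columns (MC9 through Code4), rounded to one decimal, and checks that assumption against three rows copied from the updated table. The names `ROWS` and `row_average` are illustrative and do not come from the repository.

```python
from statistics import fmean

# Scores copied from the updated table, in column order MC9 .. Code4.
# The Avg. column itself is deliberately left out.
ROWS = {
    "Prev. Public model": [68.7, 58.8, 55.9, 26.2, 39.9, 35.7, 8.2, 76.0, 47.8, 48.1, 1.1],
    "FlexOlmo-7x7B-1T": [70.4, 60.1, 60.2, 30.5, 44.8, 46.8, 47.9, 77.2, 66.2, 53.8, 14.6],
    "FlexOlmo-7x7B-1T-RT": [70.3, 60.0, 60.2, 30.3, 45.2, 47.2, 47.7, 78.3, 67.6, 53.9, 12.6],
}


def row_average(scores):
    """Assumed definition: unweighted mean of the scores, rounded to one decimal."""
    return round(fmean(scores), 1)


for model, scores in ROWS.items():
    print(f"{model}: {row_average(scores)}")
```

Under this assumption the three rows reproduce the table's values of 42.4, 52.0, and 52.1. A weighted or per-category average would be a different definition and would need to be confirmed against the evaluation setup.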