mjaggi committed
Commit a2e02fb · verified · 1 Parent(s): 45da961

Update README.md

Files changed (1):
  1. README.md +26 -9
README.md CHANGED
@@ -100,19 +100,36 @@ Apertus by default supports a context length up to 65,536 tokens.
 
 Apertus supports tool use
 
-### vLLM and SGLang
 
-You can use vLLM and SGLang to deploy the model in an API compatible with OpenAI format.
 
 ## Evaluation
 
-In this section, we report the evaluation results of Apertus model.
-
-### Base Pre-Trained Model
-- see [Apertus_Tech_Report.pdf](https://github.com/swiss-ai/apertus-tech-report/blob/main/Apertus_Tech_Report.pdf)
-
-### Instruction Model
-- see [Apertus_Tech_Report.pdf](https://github.com/swiss-ai/apertus-tech-report/blob/main/Apertus_Tech_Report.pdf)
 
 ## Training
 
 
 
 Apertus supports tool use
 
+### Deployment
 
+Deployment of the models is directly supported by the newest versions of [Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [SGLang](https://github.com/sgl-project/sglang), as well as on-device with [MLX](https://github.com/ml-explore/mlx-lm).
 
 ## Evaluation
 
+**Pretraining Evaluation:** Performance (%) of Apertus models on *general language understanding* tasks (higher is better) compared to other pretrained models.
+
+| **Model** | **Avg** | **ARC** | **HellaSwag** | **WinoGrande** | **XNLI** | **XCOPA** | **PIQA** |
+| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
+| **Fully Open Models** | | | | | | | |
+| **Apertus-8B** | 65.8 | 72.7 | 59.8 | 70.6 | 45.2 | 66.5 | 79.8 |
+| **Apertus-70B** | 67.5 | 70.6 | 64.0 | 73.3 | 45.3 | 69.8 | 81.9 |
+| OLMo2-7B | 64.0 | 72.9 | 60.4 | 74.5 | 40.4 | 55.2 | 80.9 |
+| OLMo2-32B | 67.7 | 76.2 | 66.7 | 78.6 | 42.9 | 60.1 | 82.1 |
+| EuroLLM-1.7B | 54.8 | 57.2 | 44.9 | 58.1 | 40.7 | 55.7 | 72.4 |
+| EuroLLM-9B | 62.8 | 67.9 | 57.9 | 68.8 | 41.5 | 61.1 | 79.6 |
+| SmolLM2-1.7B | 58.5 | 66.1 | 52.4 | 65.6 | 37.6 | 52.3 | 77.0 |
+| SmolLM3-3B | 61.6 | 68.6 | 56.4 | 68.1 | 40.5 | 58.2 | 77.7 |
+| Poro-34B | 61.7 | 65.7 | 57.9 | 70.6 | 41.6 | 56.0 | 78.5 |
+| **Open-Weight Models** | | | | | | | |
+| Llama3.1-8B | 65.4 | 71.6 | 60.0 | 73.4 | 45.3 | 61.8 | 80.1 |
+| Llama3.1-70B | 67.3 | 74.4 | 56.5 | 79.4 | 44.3 | 66.7 | 82.3 |
+| Qwen2.5-7B | 64.4 | 69.6 | 60.1 | 72.8 | 43.3 | 61.7 | 78.7 |
+| Qwen2.5-72B | 69.8 | 76.2 | 67.5 | 78.0 | 46.9 | 68.2 | 82.0 |
+| Qwen3-32B | 67.8 | 75.6 | 64.0 | 73.8 | 44.4 | 67.9 | 80.9 |
+| Llama4-Scout-16x17B | 67.9 | 74.7 | 66.8 | 73.2 | 43.5 | 67.7 | 81.2 |
+| GPT-OSS-20B | 58.1 | 67.0 | 41.5 | 66.5 | 37.4 | 60.4 | 75.6 |
+
+Many additional benchmark evaluations for the pretraining and posttraining phases, multilingual evaluations in around a hundred languages, and long-context evaluations are provided in Section 5 of the [Apertus_Tech_Report.pdf](https://github.com/swiss-ai/apertus-tech-report/blob/main/Apertus_Tech_Report.pdf).
 
 ## Training
 
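The new Deployment section notes that vLLM and SGLang expose the model through an OpenAI-compatible API. A minimal sketch of what a client request to such a server looks like is below; the model id, host, and port are illustrative assumptions and not taken from this commit (a server would first be started with something like `vllm serve <model-id>`).

```python
# Sketch of an OpenAI-format chat-completions request to a locally
# served model. Assumptions: model id "swiss-ai/Apertus-8B" and
# http://localhost:8000 are placeholders, not values from this commit.
import json
from urllib import request

def build_chat_request(model: str, prompt: str,
                       base_url: str = "http://localhost:8000") -> request.Request:
    """Build a POST request in the OpenAI chat-completions format."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("swiss-ai/Apertus-8B", "Say hello.")
# Sending it requires a running server, e.g.:
# with request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

The same request shape works against both vLLM and SGLang, since both implement the OpenAI chat-completions endpoint.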