Update README.md
README.md
CHANGED
@@ -100,19 +100,36 @@ Apertus by default supports a context length of up to 65,536 tokens.
Apertus supports tool use.
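
A minimal sketch of what tool calling can look like through the Transformers chat template. The checkpoint name and the `get_weather` tool below are illustrative assumptions, not part of this README; see the model card for the exact tool-use format.

```python
# Sketch of tool calling via the Transformers chat template.
# NOTE: checkpoint name and get_weather are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "swiss-ai/Apertus-8B-Instruct-2509"  # assumed checkpoint name


def get_weather(city: str) -> str:
    """Get the current weather for a city.

    Args:
        city: Name of the city to look up.
    """
    return f"Sunny in {city}"


tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is the weather in Bern?"}]

# apply_chat_template accepts plain Python functions as tools: it derives a
# JSON schema from each signature and docstring and renders it into the prompt.
input_ids = tokenizer.apply_chat_template(
    messages,
    tools=[get_weather],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True))
```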
### Deployment
Deployment of the models is directly supported by the newest versions of [Transformers](https://github.com/huggingface/transformers), [vLLM](https://github.com/vllm-project/vllm), and [SGLang](https://github.com/sgl-project/sglang), as well as on-device with [MLX](https://github.com/ml-explore/mlx-lm).
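
As a quick start, here is a minimal local-inference sketch with the Transformers `pipeline` API; the checkpoint name is an assumed placeholder, not taken from this README.

```python
# Minimal local-inference sketch with Transformers.
# NOTE: the checkpoint name is an assumed placeholder.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="swiss-ai/Apertus-8B-Instruct-2509",  # assumed checkpoint name
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize the Alps in one sentence."}]
result = generator(messages, max_new_tokens=64)
# Chat-style input returns the full conversation; the last turn is the reply.
print(result[0]["generated_text"][-1]["content"])
```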
## Evaluation
**Pretraining Evaluation:** Performance (%) of Apertus models on *general language understanding* tasks (higher is better), compared to other pretrained models.

| **Model** | **Avg** | **ARC** | **HellaSwag** | **WinoGrande** | **XNLI** | **XCOPA** | **PIQA** |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| **Fully Open Models** | | | | | | | |
| **Apertus-8B** | 65.8 | 72.7 | 59.8 | 70.6 | 45.2 | 66.5 | 79.8 |
| **Apertus-70B** | 67.5 | 70.6 | 64.0 | 73.3 | 45.3 | 69.8 | 81.9 |
| OLMo2-7B | 64.0 | 72.9 | 60.4 | 74.5 | 40.4 | 55.2 | 80.9 |
| OLMo2-32B | 67.7 | 76.2 | 66.7 | 78.6 | 42.9 | 60.1 | 82.1 |
| EuroLLM-1.7B | 54.8 | 57.2 | 44.9 | 58.1 | 40.7 | 55.7 | 72.4 |
| EuroLLM-9B | 62.8 | 67.9 | 57.9 | 68.8 | 41.5 | 61.1 | 79.6 |
| SmolLM2-1.7B | 58.5 | 66.1 | 52.4 | 65.6 | 37.6 | 52.3 | 77.0 |
| SmolLM3-3B | 61.6 | 68.6 | 56.4 | 68.1 | 40.5 | 58.2 | 77.7 |
| Poro-34B | 61.7 | 65.7 | 57.9 | 70.6 | 41.6 | 56.0 | 78.5 |
| **Open-Weight Models** | | | | | | | |
| Llama3.1-8B | 65.4 | 71.6 | 60.0 | 73.4 | 45.3 | 61.8 | 80.1 |
| Llama3.1-70B | 67.3 | 74.4 | 56.5 | 79.4 | 44.3 | 66.7 | 82.3 |
| Qwen2.5-7B | 64.4 | 69.6 | 60.1 | 72.8 | 43.3 | 61.7 | 78.7 |
| Qwen2.5-72B | 69.8 | 76.2 | 67.5 | 78.0 | 46.9 | 68.2 | 82.0 |
| Qwen3-32B | 67.8 | 75.6 | 64.0 | 73.8 | 44.4 | 67.9 | 80.9 |
| Llama4-Scout-16x17B | 67.9 | 74.7 | 66.8 | 73.2 | 43.5 | 67.7 | 81.2 |
| GPT-OSS-20B | 58.1 | 67.0 | 41.5 | 66.5 | 37.4 | 60.4 | 75.6 |

Many additional benchmark evaluations, covering both the pretraining and post-training phases, multilingual evaluations in around one hundred languages, and long-context evaluations, are provided in Section 5 of the [Apertus_Tech_Report.pdf](https://github.com/swiss-ai/apertus-tech-report/blob/main/Apertus_Tech_Report.pdf).
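
For readers who want to run comparable benchmarks themselves, here is a hedged sketch using EleutherAI's lm-evaluation-harness (`pip install lm-eval`). This is a common evaluation setup, not necessarily the exact configuration behind the numbers above, and the checkpoint name is an assumed placeholder.

```python
# Hedged sketch: common benchmark setup with lm-evaluation-harness.
# NOTE: not necessarily the configuration used for the table above;
# the checkpoint name is an assumed placeholder.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=swiss-ai/Apertus-8B-2509",  # assumed checkpoint name
    tasks=["arc_challenge", "hellaswag", "winogrande", "xnli", "xcopa", "piqa"],
    batch_size=8,
)
print(results["results"])
```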
## Training