# EAGLE3-Apertus-8B-Instruct-2509

An EAGLE3 draft model for speculative decoding with swiss-ai/Apertus-8B-Instruct-2509.

## Model Description

This is a lightweight draft model trained to accelerate inference of Apertus-8B-Instruct through speculative decoding. EAGLE3 uses a single-layer architecture that predicts future tokens by leveraging the target model's hidden states.
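For intuition, the sketch below shows plain speculative decoding with greedy verification. It is a deliberate simplification, not the EAGLE3 algorithm itself: EAGLE3 additionally conditions its draft head on the target model's hidden states and verifies all proposed tokens in a single target forward pass.

```python
# Schematic sketch of speculative decoding with greedy verification.
# This is a simplification, not EAGLE3 itself: EAGLE3 conditions its
# single-layer draft head on the target model's hidden states and verifies
# all proposed tokens in one batched target forward pass.
from typing import Callable, List

def speculative_step(
    prefix: List[int],
    draft_next: Callable[[List[int]], int],   # draft model: greedy next-token
    target_next: Callable[[List[int]], int],  # target model: greedy next-token
    num_speculative_tokens: int = 3,
) -> List[int]:
    """Draft proposes a few tokens; keep the longest prefix the target agrees
    with, plus one token supplied by the target itself."""
    # 1. Draft proposes a short continuation.
    ctx = list(prefix)
    proposal = []
    for _ in range(num_speculative_tokens):
        tok = draft_next(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. Target verifies the proposal (here token by token; real systems
    #    score every proposed position in a single forward pass).
    ctx = list(prefix)
    accepted = []
    for tok in proposal:
        expected = target_next(ctx)
        if expected != tok:
            accepted.append(expected)      # reject: take the target's token and stop
            break
        accepted.append(tok)               # accept the drafted token
        ctx.append(tok)
    else:
        accepted.append(target_next(ctx))  # everything accepted: one bonus token

    return accepted
```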
| Property | Value |
|---|---|
| Architecture | LlamaForCausalLMEagle3 |
| Hidden Size | 4096 |
| Intermediate Size | 21504 |
| Attention Heads | 32 |
| KV Heads | 8 |
| Layers | 1 |
| Vocab Size | 131,072 |
| Draft Vocab Size | 32,000 |
| Precision | bfloat16 |
| Parameters | ~513M |
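Expressed as a config, the table above corresponds roughly to the dict below. The key names are an assumption based on standard Llama-style transformers conventions; check this repo's config.json for the exact schema that SpecForge's `--draft-model-config` expects.

```python
# Draft-model hyperparameters from the table above as a Llama-style config
# dict. Key names are assumed (standard transformers conventions), not copied
# from the shipped config.json.
import json

draft_config = {
    "architectures": ["LlamaForCausalLMEagle3"],
    "hidden_size": 4096,
    "intermediate_size": 21504,
    "num_attention_heads": 32,
    "num_key_value_heads": 8,
    "num_hidden_layers": 1,
    "vocab_size": 131072,
    "draft_vocab_size": 32000,
    "torch_dtype": "bfloat16",
}

# Serialize to a JSON file of the kind passed via --draft-model-config.
with open("apertus-8b-eagle3.json", "w") as f:
    json.dump(draft_config, f, indent=2)
```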
## Training Details
- Framework: SpecForge
- Target Model: swiss-ai/Apertus-8B-Instruct-2509
- Epochs: 10
- Batch Size: 1 per GPU
- Learning Rate: 1e-4
- Max Sequence Length: 4096
- Hardware: 64 GPUs (16 nodes × 4 GPUs)
- Precision: bfloat16
### Training Data
The model was trained on ~375k samples of regenerated conversation data. The prompts come from existing conversation datasets; the responses were regenerated using Apertus-8B-Instruct-2509 to ensure the draft model learns from the target model's own output distribution.
See: thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data
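The regeneration step can be reproduced in spirit with an offline vLLM pass like the sketch below. The file names, record fields, and sampling settings are placeholders, not the exact pipeline used to build the published dataset.

```python
# Hedged sketch: regenerate assistant responses with the target model so the
# draft model sees the target's own output distribution. Paths and fields are
# placeholders, not the exact pipeline behind the published dataset.
import json
from vllm import LLM, SamplingParams

llm = LLM(model="swiss-ai/Apertus-8B-Instruct-2509")
sampling = SamplingParams(temperature=0.7, max_tokens=2048)

with open("prompts.jsonl") as f:              # placeholder input file
    prompts = [json.loads(line)["prompt"] for line in f]

# Use the model's chat interface so responses follow the instruct format.
conversations = [[{"role": "user", "content": p}] for p in prompts]
outputs = llm.chat(conversations, sampling)

with open("regenerated.jsonl", "w") as f:     # placeholder output file
    for prompt, out in zip(prompts, outputs):
        record = {
            "conversations": [
                {"role": "user", "content": prompt},
                {"role": "assistant", "content": out.outputs[0].text},
            ]
        }
        f.write(json.dumps(record) + "\n")
```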
## Usage

### With vLLM
```bash
VLLM_USE_V1=1 vllm serve swiss-ai/Apertus-8B-Instruct-2509 \
    --speculative-config '{"model": "thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509", "num_speculative_tokens": 3, "method": "eagle3"}'
```
Or in Python:
```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="swiss-ai/Apertus-8B-Instruct-2509",
    speculative_config={
        "model": "thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509",
        "num_speculative_tokens": 3,
        "method": "eagle3",
    },
)

sampling_params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(["Hello, how are you?"], sampling_params)
print(outputs[0].outputs[0].text)
```
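To get a rough sense of what the draft model buys you on your own workload, time generation with and without the speculative config and compare tokens per second. The sketch below is illustrative only (run each configuration in a separate process so GPU memory is released cleanly); the speed-up depends on hardware, batch size, and acceptance rate.

```python
# Rough throughput check: run once with --spec and once without, in separate
# processes, and compare tokens/second. A sketch, not a rigorous benchmark.
import argparse
import time

from vllm import LLM, SamplingParams

parser = argparse.ArgumentParser()
parser.add_argument("--spec", action="store_true", help="enable the EAGLE3 draft model")
args = parser.parse_args()

spec_config = None
if args.spec:
    spec_config = {
        "model": "thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509",
        "num_speculative_tokens": 3,
        "method": "eagle3",
    }

llm = LLM(model="swiss-ai/Apertus-8B-Instruct-2509", speculative_config=spec_config)
sampling = SamplingParams(temperature=0.0, max_tokens=512)
prompts = ["Explain speculative decoding in two paragraphs."] * 8

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"spec={args.spec}  {generated} tokens in {elapsed:.1f}s  ({generated / elapsed:.1f} tok/s)")
```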
### With SGLang
```bash
python -m sglang.launch_server \
    --model swiss-ai/Apertus-8B-Instruct-2509 \
    --speculative-algorithm EAGLE3 \
    --speculative-draft-model-path thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509 \
    --speculative-num-steps 5 \
    --speculative-eagle-topk 8 \
    --speculative-num-draft-tokens 32
```
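Both server commands expose an OpenAI-compatible endpoint, so speculative decoding is transparent to the caller. A minimal client sketch using the `openai` package is shown below; the port assumes vLLM's default 8000 (SGLang defaults to 30000, or whatever you pass via `--port`).

```python
# Query the running server through its OpenAI-compatible API.
# Port 8000 is vLLM's default; use 30000 (or your --port) for SGLang.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="swiss-ai/Apertus-8B-Instruct-2509",
    messages=[{"role": "user", "content": "Hello, how are you?"}],
    temperature=0.7,
    max_tokens=256,
)
print(response.choices[0].message.content)
```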
## Continue Training
To resume training from this checkpoint:
- Clone SpecForge
- Download the training dataset from thomaskiefer/EAGLE3-Apertus-8B-Instruct-2509-Data
- Download this checkpoint and place it in a subdirectory of your output directory (e.g., `outputs/apertus-8b-eagle3/epoch_9_step_55000/`)
- Run with `--resume` (it will automatically find the last checkpoint in `--output-dir`):
```bash
NUM_GPUS=4
TP_SIZE=1

torchrun \
    --standalone \
    --nproc_per_node $NUM_GPUS \
    scripts/train_eagle3.py \
    --target-model-path swiss-ai/Apertus-8B-Instruct-2509 \
    --draft-model-config /path/to/configs/apertus-8b-eagle3.json \
    --train-data-path /path/to/merged_train_regen.jsonl \
    --output-dir /path/to/outputs/apertus-8b-eagle3 \
    --num-epochs 15 \
    --batch-size 1 \
    --tp-size $TP_SIZE \
    --learning-rate 1e-4 \
    --max-length 4096 \
    --chat-template apertus \
    --cache-dir /path/to/cache \
    --target-model-backend sglang \
    --resume
```
The `--resume` flag uses `get_last_checkpoint()` to automatically find the most recent checkpoint in the output directory.
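For reference, a helper like `get_last_checkpoint()` typically just scans the output directory for `epoch_<e>_step_<s>` subdirectories and returns the latest one. The sketch below is a hypothetical illustration, not SpecForge's actual code, but it shows why the checkpoint needs to sit in a subdirectory of `--output-dir`.

```python
# Hypothetical sketch of a get_last_checkpoint() helper: find epoch_<e>_step_<s>
# subdirectories and return the one with the highest (epoch, step).
# SpecForge's real implementation may differ.
import re
from pathlib import Path
from typing import Optional

_CKPT_RE = re.compile(r"epoch_(\d+)_step_(\d+)$")

def get_last_checkpoint(output_dir: str) -> Optional[Path]:
    candidates = []
    for child in Path(output_dir).iterdir():
        match = _CKPT_RE.match(child.name)
        if child.is_dir() and match:
            epoch, step = int(match.group(1)), int(match.group(2))
            candidates.append(((epoch, step), child))
    if not candidates:
        return None
    return max(candidates)[1]

# Example: with the checkpoint placed as described above,
# get_last_checkpoint("outputs/apertus-8b-eagle3")
# would return outputs/apertus-8b-eagle3/epoch_9_step_55000.
```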
## License
Apache 2.0
## Citation

If you use this model, please cite EAGLE-3:
```bibtex
@article{li2025eagle3,
  title={EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test},
  author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
  journal={arXiv preprint arXiv:2503.01840},
  year={2025}
}
```
## Acknowledgments
Trained on the Alps supercomputer at CSCS (Swiss National Supercomputing Centre).