# Apertus-8B European Multilingual SFT Checkpoint (Step 80,000)
FSDP checkpoint for resuming multilingual SFT training on Apertus-8B.
## Training Progress
| Metric | Value |
|---|---|
| Global Step | 80,000 / 256,137 |
| Epoch | 0.312 (31.2%) |
| Samples Processed | 2,560,000 / 8,139,164 |
| Loss | 0.73 |
| Accuracy | 78.4% |
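The sample count follows from the effective batch size. A quick arithmetic check (8 GPUs is an assumption read off the `rng_state_[0-7].pth` files in the checkpoint):

```python
# Effective batch size = per-device batch × gradient accumulation × GPU count.
per_device_batch = 1
grad_accum = 4
num_gpus = 8  # assumed from rng_state_[0-7].pth
effective_batch = per_device_batch * grad_accum * num_gpus
samples_at_step = 80_000 * effective_batch
print(effective_batch, samples_at_step)  # 32 2560000
```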
## Per-Language Position
| Language | Samples Seen | Total | Progress |
|---|---|---|---|
| German (de) | ~640,000 | 2,018,145 | 31.7% |
| Spanish (es) | ~640,000 | 2,050,976 | 31.2% |
| French (fr) | ~640,000 | 2,045,181 | 31.3% |
| Italian (it) | ~640,000 | 2,024,862 | 31.6% |
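The per-language progress figures above are consistent with equal interleaving across the four languages, i.e. a quarter of the processed samples each:

```python
# Verify per-language progress assuming equal interleaving of the 4 languages.
samples_processed = 2_560_000
totals = {"de": 2_018_145, "es": 2_050_976, "fr": 2_045_181, "it": 2_024_862}
per_language = samples_processed // len(totals)  # 640,000 each
progress = {lang: per_language / n for lang, n in totals.items()}
for lang, p in progress.items():
    print(f"{lang}: {p:.1%}")  # matches the table: 31.7%, 31.2%, 31.3%, 31.6%
```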
## Training Configuration

```python
model = "swiss-ai/Apertus-8B-Instruct-2509"
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 2e-6
num_train_epochs = 1
warmup_ratio = 0.03
lr_scheduler_type = "linear"
bf16 = True
gradient_checkpointing = True
```
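These values map directly onto `transformers.TrainingArguments`. A minimal sketch for reconstructing the arguments (the `output_dir` is a placeholder, not from the original run):

```python
from transformers import TrainingArguments

# Sketch of the reported configuration; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="./apertus-8b-eu-sft",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-6,
    num_train_epochs=1,
    warmup_ratio=0.03,
    lr_scheduler_type="linear",
    bf16=True,
    gradient_checkpointing=True,
)
```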
## Checkpoint Contents

```
checkpoint-80000/
├── pytorch_model_fsdp_0/    # FSDP sharded model weights
├── optimizer_0/             # Optimizer states
├── rng_state_[0-7].pth      # RNG states for 8 GPUs
├── scheduler.pt             # LR scheduler state
└── trainer_state.json       # Step, epoch, metrics
```
## How to Resume Training

```python
from transformers import Trainer

trainer = Trainer(
    model=model,              # swiss-ai/Apertus-8B-Instruct-2509
    args=training_args,       # same TrainingArguments as the original run
    train_dataset=dataset,    # same pre-tokenized dataset and sampling order
)

# Resume from checkpoint: restores model, optimizer, scheduler, and RNG states
trainer.train(resume_from_checkpoint="./checkpoint-80000")
```
## Dataset

Pre-tokenized Arrow datasets with interleaved sampling from 4 European languages:
- Total: 8,139,164 samples
- Format: `input_ids`, `labels`, `attention_mask`
- Sequence Length: Variable (pre-tokenized)
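In the `datasets` library, `interleave_datasets` handles this kind of multi-language mixing over Arrow datasets. The underlying round-robin idea, sketched in plain Python with dummy records standing in for the pre-tokenized samples:

```python
def round_robin(*iterables):
    """Yield one item from each iterable in turn until all are exhausted."""
    iterators = [iter(it) for it in iterables]
    while iterators:
        remaining = []
        for it in iterators:
            try:
                yield next(it)
                remaining.append(it)
            except StopIteration:
                pass
        iterators = remaining

# Dummy per-language streams standing in for the pre-tokenized Arrow datasets.
streams = [
    [{"lang": code, "idx": i} for i in range(2)]
    for code in ("de", "es", "fr", "it")
]
order = [ex["lang"] for ex in round_robin(*streams)]
print(order)  # ['de', 'es', 'fr', 'it', 'de', 'es', 'fr', 'it']
```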
## Notes

- This is an FSDP-sharded checkpoint from an 8-GPU training run
- Includes RNG states for exact dataloader position resumption
- ~15 days of training remain to complete epoch 1
## Model Tree

ctauchmann/apertus-8b-eu-sft-ckpt-80k continues training from swiss-ai/Apertus-8B-Instruct-2509, which is itself a fine-tune of the base model swiss-ai/Apertus-8B-2509.