Apertus-8B European Multilingual SFT Checkpoint (Step 80,000)

FSDP checkpoint for resuming multilingual SFT training on Apertus-8B.

Training Progress

Metric Value
Global Step 80,000 / 256,137
Epoch 0.312 (31.2%)
Samples Processed 2,560,000 / 8,139,164
Loss 0.73
Accuracy 78.4%

Per-Language Position

Language Samples Seen Total Progress
German (de) ~640,000 2,018,145 31.7%
Spanish (es) ~640,000 2,050,976 31.2%
French (fr) ~640,000 2,045,181 31.3%
Italian (it) ~640,000 2,024,862 31.6%

Training Configuration

model = "swiss-ai/Apertus-8B-Instruct-2509"
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
learning_rate = 2e-6
num_train_epochs = 1
warmup_ratio = 0.03
lr_scheduler_type = "linear"
bf16 = True
gradient_checkpointing = True
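The per-device batch size, gradient accumulation, and the 8-GPU FSDP setup noted below imply an effective batch size of 32, which is consistent with the samples-processed count at step 80,000. A quick sanity check (the GPU count is taken from the Notes section, not the config itself):

```python
# Effective batch size under this configuration.
per_device_train_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 8  # assumed from the 8-GPU FSDP run described in the Notes

effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_gpus
samples_at_step_80k = 80_000 * effective_batch

print(effective_batch)      # 32
print(samples_at_step_80k)  # 2560000, matching "Samples Processed" above
```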

Checkpoint Contents

checkpoint-80000/
β”œβ”€β”€ pytorch_model_fsdp_0/     # FSDP sharded model weights
β”œβ”€β”€ optimizer_0/              # Optimizer states  
β”œβ”€β”€ rng_state_[0-7].pth      # RNG states for 8 GPUs
β”œβ”€β”€ scheduler.pt              # LR scheduler state
└── trainer_state.json        # Step, epoch, metrics
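The `trainer_state.json` file is plain JSON, so the checkpoint's step and epoch can be inspected without loading the model. A minimal sketch (the dictionary below is an illustrative subset; the real file also carries keys such as `log_history`):

```python
import json

# Illustrative stand-in for the contents of checkpoint-80000/trainer_state.json;
# in practice you would open that file instead of this inline string.
example_state = json.dumps({"global_step": 80000, "epoch": 0.312})

state = json.loads(example_state)
print(state["global_step"], state["epoch"])  # 80000 0.312
```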

How to Resume Training

from transformers import Trainer

# model, training_args, and train_dataset must match the original run's
# configuration so the FSDP shards and optimizer states load correctly.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
)

# Resume from checkpoint (restores optimizer, scheduler, and RNG states)
trainer.train(resume_from_checkpoint="./checkpoint-80000")

Dataset

Pre-tokenized Arrow datasets with interleaved sampling from 4 European languages:

  • Total: 8,139,164 samples
  • Format: input_ids, labels, attention_mask
  • Sequence Length: Variable (pre-tokenized)
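With equal interleaving across the four languages, each language advances at roughly the same rate, which matches the per-language table above (each at ~31% after ~640,000 of 2,560,000 total samples). A toy round-robin sketch of the interleaving idea (the actual pipeline streams pre-tokenized Arrow datasets, not Python lists):

```python
from itertools import zip_longest

# Toy stand-in for the four pre-tokenized language splits.
splits = {
    "de": ["de_0", "de_1"],
    "es": ["es_0", "es_1"],
    "fr": ["fr_0", "fr_1"],
    "it": ["it_0", "it_1"],
}

# Round-robin interleave: one sample from each language per cycle.
interleaved = [
    sample
    for group in zip_longest(*splits.values())
    for sample in group
    if sample is not None
]
print(interleaved)  # ['de_0', 'es_0', 'fr_0', 'it_0', 'de_1', 'es_1', 'fr_1', 'it_1']
```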

Notes

  • This is an FSDP-sharded checkpoint from an 8-GPU training run
  • Includes RNG states for exact dataloader-position resumption
  • Roughly 15 days of training remain to complete epoch 1
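The "~15 days remaining" figure can be reproduced from the step counts given a per-step wall-clock time. A back-of-envelope check (the ~7.4 s/step throughput is an assumption; it is not logged in this card):

```python
# Remaining work, from the Training Progress table above.
total_steps = 256_137
current_step = 80_000
remaining = total_steps - current_step  # 176,137 steps

# Assumed throughput; not recorded in this checkpoint.
seconds_per_step = 7.4

days_remaining = remaining * seconds_per_step / 86_400
print(round(days_remaining, 1))  # 15.1
```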