# voxtral-finetune-20250913_164641

Fine-tuned Voxtral ASR model.

## Usage
```python
import torch
import soundfile as sf
from transformers import AutoProcessor, AutoModelForSeq2SeqLM

# Load the processor and the fine-tuned model; use fp16 on GPU, fp32 on CPU
processor = AutoProcessor.from_pretrained("Tonic/voxtral-finetune-20250913_164641")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "Tonic/voxtral-finetune-20250913_164641",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
)

# Read an audio file and transcribe it
audio, sr = sf.read("sample.wav")
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)
text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(text)
```
## Training Configuration
- Base model: mistralai/Voxtral-Mini-3B-2507
- Config: Custom Configuration
- Trainer: SFTTrainer
## Training Parameters
- Batch size: 2
- Grad accumulation: 4
- Learning rate: 5e-05
- Max epochs: 3
- Sequence length: 2048
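
With gradient accumulation, the optimizer steps once every few micro-batches, so the effective batch size per weight update is larger than the per-device batch size. A plain-Python sketch of how the numbers above combine (variable names are illustrative, not taken from the training script):

```python
# Hyperparameters as listed on this card (names are illustrative)
per_device_batch_size = 2   # Batch size
grad_accum_steps = 4        # Grad accumulation
learning_rate = 5e-05
max_epochs = 3
max_seq_len = 2048

# Gradients from `grad_accum_steps` consecutive micro-batches are
# accumulated before each optimizer step, so every weight update
# averages over per_device_batch_size * grad_accum_steps samples.
effective_batch_size = per_device_batch_size * grad_accum_steps
print(f"effective batch size per optimizer step: {effective_batch_size}")  # 8
```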
## Hardware
- GPU: NVIDIA RTX 4000 Ada Generation
## Notes
- This repository contains a fine-tuned Voxtral ASR model.