voxtral-finetune-20250913_164641

Fine-tuned Voxtral ASR model

Usage

```python
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

repo_id = "Tonic/voxtral-finetune-20250913_164641"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = AutoProcessor.from_pretrained(repo_id)
model = VoxtralForConditionalGeneration.from_pretrained(
    repo_id,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
).to(device)

# Build a transcription request from an audio file (adjust language as needed).
# Note: the method name below is spelled this way in the transformers Voxtral API.
inputs = processor.apply_transcrition_request(
    language="en", audio="sample.wav", model_id=repo_id
)
inputs = inputs.to(device, dtype=model.dtype)

with torch.no_grad():
    generated_ids = model.generate(**inputs, max_new_tokens=256)

# Strip the prompt tokens so only the new transcription is decoded.
text = processor.batch_decode(
    generated_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(text)
```
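Voxtral's audio encoder is Whisper-based and expects 16 kHz input, so audio recorded at other rates should be resampled before transcription. A minimal, dependency-free sketch using linear interpolation (`resample_linear` is a hypothetical helper; in practice prefer `librosa.resample` or `torchaudio.functional.resample`):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampler for mono audio arrays."""
    if orig_sr == target_sr:
        return audio
    duration = len(audio) / orig_sr
    n_target = int(round(duration * target_sr))
    x_old = np.linspace(0.0, duration, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, duration, num=n_target, endpoint=False)
    return np.interp(x_new, x_old, audio).astype(audio.dtype)
```

For example, a clip loaded at 8 kHz with `soundfile` can be upsampled with `resample_linear(audio, 8000)` before being passed to the processor.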

Training Configuration

  • Base model: mistralai/Voxtral-Mini-3B-2507
  • Config: Custom Configuration
  • Trainer: SFTTrainer

Training Parameters

  • Batch size: 2
  • Grad accumulation: 4
  • Learning rate: 5e-05
  • Max epochs: 3
  • Sequence length: 2048
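The parameters above imply an effective batch size of 2 × 4 = 8 examples per optimizer step. A hedged sketch of how they might map onto a trl `SFTConfig` (argument names per recent trl releases; the actual training script for this run is not published, and `output_dir` and `fp16` are assumptions):

```python
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="voxtral-finetune",      # assumption: not stated in the card
    per_device_train_batch_size=2,      # Batch size: 2
    gradient_accumulation_steps=4,      # Grad accumulation: 4 (effective batch = 8)
    learning_rate=5e-5,
    num_train_epochs=3,
    max_seq_length=2048,                # Sequence length
    fp16=True,                          # assumption: mixed precision on the listed GPU
)
# trainer = SFTTrainer(model=model, args=config, train_dataset=...)
```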

Hardware

  • GPU: NVIDIA RTX 4000 Ada Generation
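As a back-of-the-envelope check on why this fits the card's 20 GB of VRAM, the weights of a ~3B-parameter model in fp16 occupy roughly 6 GB (assumed round numbers, not measured values; gradients and optimizer state add substantially more during full fine-tuning):

```python
# Rough VRAM estimate for model weights alone (assumptions, not measurements).
params = 3e9                  # ~3B parameters (Voxtral-Mini-3B)
bytes_per_param_fp16 = 2      # fp16 storage
weights_gb = params * bytes_per_param_fp16 / 1e9
print(weights_gb)             # 6.0
```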

Notes

  • This repository contains a fine-tuned Voxtral ASR model.