Whisper Large V3 Turbo - Pruna Smashed

A Pruna-optimized ("smashed") version of Whisper Large V3 Turbo, compressed with the c_whisper compiler for faster inference and lower VRAM usage while maintaining the same transcription quality.


📌 Usage

Best performance (Pruna runtime):

from pruna import PrunaModel

# Load the smashed model from the Hugging Face Hub
model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")

# Transcribe a local audio file
result = model("audio.wav")

Standard Transformers:

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Load the model and its processor with the standard transformers API
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
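
The snippet above only loads the weights. As a minimal end-to-end sketch (illustrative, not the card's official recipe; audio.wav is a placeholder file, and librosa is used only to resample to the 16 kHz input Whisper expects):

import librosa
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-mel input features, then generate and decode
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)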

✅ Tested on Google Colab T4 GPU


📊 Evaluation Results

Dataset: librispeech_asr test-clean (15% subset)
Device: T4 GPU

Accuracy

  • WER (word error rate): 3.49%
  • CER (character error rate): 1.32%
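
For reference, WER is the word-level edit distance between the hypothesis and reference transcripts divided by the number of reference words; CER is the same at the character level. A quick way to compute such scores (a sketch using the jiwer library, not necessarily the harness behind the numbers above):

import jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# Edit-distance-based error rates at word and character level
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
print(f"CER: {jiwer.cer(reference, hypothesis):.2%}")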

Performance

  • Avg inference time: 0.688s
  • P95 inference time: 1.057s
  • Throughput: 1.38 samples/sec
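
The exact benchmarking harness is not included in this card; a minimal sketch of how latency percentiles and throughput of this kind can be measured (illustrative only, reusing the PrunaModel loaded above and a hypothetical list of audio paths):

import time
import numpy as np

latencies = []
for path in audio_paths:  # hypothetical list of audio file paths
    start = time.perf_counter()
    model(path)
    latencies.append(time.perf_counter() - start)

latencies = np.array(latencies)
print(f"Avg: {latencies.mean():.3f}s")
print(f"P95: {np.percentile(latencies, 95):.3f}s")
print(f"Throughput: {len(latencies) / latencies.sum():.2f} samples/sec")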

Resource Usage

  • Peak GPU memory: 2.48 GB
  • Final GPU utilization: 15%
  • Final RAM usage: 49.4%
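
Peak GPU memory of the kind reported above can be read from PyTorch's CUDA allocator; a small sketch (assumes a CUDA device and the model loaded earlier):

import torch

torch.cuda.reset_peak_memory_stats()
result = model("audio.wav")  # any representative inference call
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.2f} GB")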

🚀 Scalability Test

Successfully transcribed 2 hours of audio (sam_altman_lex_podcast_367.flac) in under 3 minutes with minimal GPU memory usage; see the chunked-inference sketch below.
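
The card does not specify how the long file was fed to the model; one standard approach for audio beyond Whisper's 30-second window is chunked inference via the transformers ASR pipeline (a sketch; chunk_length_s and batch_size are tunable assumptions):

import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed",
    torch_dtype=torch.float16,
    device=0,           # first CUDA GPU
    chunk_length_s=30,  # split long audio into 30 s windows
    batch_size=8,       # decode several chunks in parallel
)

result = asr("sam_altman_lex_podcast_367.flac")
print(result["text"])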


🔧 Notes

  • Use the Pruna runtime for maximum efficiency.
  • Works with both the transformers and pruna APIs.
  • Optimized for low-VRAM environments without loss in accuracy; loading in half precision, as sketched below, keeps the footprint small.
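
A minimal low-VRAM loading sketch with the transformers API (the checkpoint weights are stored in F16, so half precision matches what ships in the repo):

import torch
from transformers import AutoModelForSpeechSeq2Seq

# float16 weights roughly halve VRAM versus float32
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")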