Whisper Large V3 Turbo - Pruna Smashed

A Pruna-optimized ("smashed") version of Whisper Large V3 Turbo, compressed with the c_whisper compiler for faster inference and lower VRAM usage while maintaining the same transcription quality.


📌 Usage

Best performance (Pruna runtime):

from pruna import PrunaModel

# Load the smashed model from the Hugging Face Hub
model = PrunaModel.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")

# Transcribe a local audio file
result = model("audio.wav")

Standard Transformers:

from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

# Load the model and its processor with the standard transformers API
model = AutoModelForSpeechSeq2Seq.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
processor = AutoProcessor.from_pretrained("manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed")
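
The snippet above only loads the weights. As a minimal end-to-end sketch (illustrative, not the card's official recipe; audio.wav is a placeholder file, and librosa is used only to resample to the 16 kHz input Whisper expects):

import librosa
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor

model_id = "manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed"
model = AutoModelForSpeechSeq2Seq.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

# Whisper expects 16 kHz mono audio
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-mel input features, then generate and decode
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
text = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(text)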

✅ Tested on Google Colab T4 GPU


📊 Evaluation Results

Dataset: librispeech_asr test-clean (15% subset)
Device: T4 GPU

Accuracy

  • WER (word error rate): 3.49%
  • CER (character error rate): 1.32%
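
For reference, WER is the word-level edit distance between the hypothesis and reference transcripts divided by the number of reference words; CER is the same at the character level. A quick way to compute such scores (a sketch using the jiwer library, not necessarily the harness behind the numbers above):

import jiwer

reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"

# Edit-distance-based error rates at word and character level
print(f"WER: {jiwer.wer(reference, hypothesis):.2%}")
print(f"CER: {jiwer.cer(reference, hypothesis):.2%}")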

Performance

  • Avg inference time: 0.688s
  • P95 inference time: 1.057s
  • Throughput: 1.38 samples/sec
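
The exact benchmarking harness is not included in this card; a minimal sketch of how latency percentiles and throughput of this kind can be measured (illustrative only, reusing the PrunaModel loaded above and a hypothetical list of audio paths):

import time
import numpy as np

latencies = []
for path in audio_paths:  # hypothetical list of audio file paths
    start = time.perf_counter()
    model(path)
    latencies.append(time.perf_counter() - start)

latencies = np.array(latencies)
print(f"Avg: {latencies.mean():.3f}s")
print(f"P95: {np.percentile(latencies, 95):.3f}s")
print(f"Throughput: {len(latencies) / latencies.sum():.2f} samples/sec")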

Resource Usage

  • Peak GPU memory: 2.48 GB
  • Final GPU utilization: 15%
  • Final RAM usage: 49.4%
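
Peak GPU memory of the kind reported above can be read from PyTorch's CUDA allocator; a small sketch (assumes a CUDA device and the model loaded earlier):

import torch

torch.cuda.reset_peak_memory_stats()
result = model("audio.wav")  # any representative inference call
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak GPU memory: {peak_gb:.2f} GB")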

🚀 Scalability Test

Successfully transcribed 2 hours of audio (sam_altman_lex_podcast_367.flac) in under 3 minutes with minimal GPU memory usage; see the chunked-inference sketch below.
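
The card does not specify how the long file was fed to the model; one standard approach for audio beyond Whisper's 30-second window is chunked inference via the transformers ASR pipeline (a sketch; chunk_length_s and batch_size are tunable assumptions):

import torch
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed",
    torch_dtype=torch.float16,
    device=0,           # first CUDA GPU
    chunk_length_s=30,  # split long audio into 30 s windows
    batch_size=8,       # decode several chunks in parallel
)

result = asr("sam_altman_lex_podcast_367.flac")
print(result["text"])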


🔧 Notes

  • Use the Pruna runtime for maximum efficiency.
  • Works with both the transformers and pruna APIs.
  • Optimized for low-VRAM environments without loss in accuracy; loading in half precision, as sketched below, keeps the footprint small.
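
A minimal low-VRAM loading sketch with the transformers API (the checkpoint weights are stored in F16, so half precision matches what ships in the repo):

import torch
from transformers import AutoModelForSpeechSeq2Seq

# float16 weights roughly halve VRAM versus float32
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "manohar03/unsloth-whisper-large-v3-turbo-pruna-smashed",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).to("cuda")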