TheStageAI/thewhisper-large-v3-turbo

Automatic Speech Recognition

quazim committed 2 days ago

Commit 01441ad · verified · 1 Parent(s): 67c4538

Update README.md

Files changed (1)

README.md +8 -2

@@ -24,14 +24,20 @@ tags:
 It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
-## 📊 Quality Benchmarks
-TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds. For quality benchmarks, we used the multilingual benchmarks [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
 <img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
 <img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
 ### 10s chunks
 | Model | Mean WER |

 It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
+## 📊 Benchmarks
+TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds.
+For quality benchmarks, we used the multilingual benchmarks [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
 <img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
 <img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
+<img width="1547" height="531" src="https://cdn.thestage.ai/production/cms_file_upload/1764602147-b10162ae-e6f7-4307-bcb0-54b94528221c/NVIDIA, RTX-5090 (1).png">
+For comprehensive performance and quality benchmarks see [TheWhisper](https://github.com/TheStageAI/TheWhisper/blob/main/benchmark/README.md).
 ### 10s chunks
 | Model | Mean WER |