Update README.md
README.md
CHANGED
|
@@ -24,14 +24,20 @@ tags:
|
|
| 24 |
It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
|
| 25 |
|
| 26 |
|
| 27 |
-
## 📊
|
| 28 |
|
| 29 |
-
TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds.
|
|
|
|
| 30 |
|
| 31 |
<img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
|
| 32 |
<img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
|
| 33 |
|
| 34 |
|
| 35 |
### 10s chunks
|
| 36 |
|
| 37 |
| Model | Mean WER |
|
|
|
|
| 24 |
It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
|
| 25 |
|
| 26 |
|
| 27 |
+
## 📊 Benchmarks
|
| 28 |
|
| 29 |
+
TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds.
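
As a rough illustration of the chunking described above, the sketch below splits a waveform into fixed-size chunks of at most 30 seconds and leaves the final, shorter chunk unpadded. The function name `split_into_chunks` and the list-of-samples input are assumptions made for this example; they are not part of TheWhisper's API.

```python
from typing import List, Sequence

# Upper bound on chunk length stated in the README text.
MAX_CHUNK_SECONDS = 30.0


def split_into_chunks(
    samples: Sequence[float], sample_rate: int, chunk_seconds: float
) -> List[Sequence[float]]:
    """Split audio samples into consecutive chunks of at most chunk_seconds.

    Hypothetical helper for illustration only. The last chunk is returned
    as-is (shorter than the others) instead of being padded with silence.
    """
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    if not 0 < chunk_seconds <= MAX_CHUNK_SECONDS:
        raise ValueError("chunk_seconds must be in (0, 30]")

    chunk_len = int(round(sample_rate * chunk_seconds))
    if chunk_len == 0:
        raise ValueError("chunk_seconds is too small for this sample_rate")

    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


if __name__ == "__main__":
    # 25 seconds of silence at 16 kHz, split into 10-second chunks.
    audio = [0.0] * (25 * 16000)
    for chunk in split_into_chunks(audio, 16000, 10):
        print(len(chunk) / 16000, "seconds")
```

The same helper can be called with 10, 15, 20, or 30 as `chunk_seconds` to mirror the chunk sizes used in the benchmark.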
|
| 30 |
+
For quality benchmarks, we used the multilingual benchmarks from the [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
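
For readers unfamiliar with the metric reported in the tables, here is a minimal sketch of word error rate (WER) and a simple mean over samples. It assumes whitespace tokenization and no text normalization, and the per-sample averaging in `mean_wer` is an assumption for illustration; the Open ASR Leaderboard's own evaluation scripts apply normalization and may aggregate differently.

```python
from typing import Iterable, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words.

    Illustrative only: splits on whitespace and applies no normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping one row of the table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def mean_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Unweighted mean of per-sample WER (an assumed averaging scheme)."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    if not scores:
        raise ValueError("pairs must not be empty")
    return sum(scores) / len(scores)


if __name__ == "__main__":
    samples = [
        ("the cat sat on the mat", "the cat sat on the mat"),
        ("the cat sat on the mat", "the bat sat on mat"),
    ]
    print(f"mean WER: {mean_wer(samples):.3f}")
```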
|
| 31 |
|
| 32 |
<img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
|
| 33 |
<img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
|
| 34 |
|
| 35 |
|
| 36 |
+
<img width="1547" height="531" src="https://cdn.thestage.ai/production/cms_file_upload/1764602147-b10162ae-e6f7-4307-bcb0-54b94528221c/NVIDIA, RTX-5090 (1).png">
|
| 37 |
+
|
| 38 |
+
For comprehensive performance and quality benchmarks, see the [TheWhisper benchmark README](https://github.com/TheStageAI/TheWhisper/blob/main/benchmark/README.md).
|
| 39 |
+
|
| 40 |
+
|
| 41 |
### 10s chunks
|
| 42 |
|
| 43 |
| Model | Mean WER |
|