quazim committed on
Commit 01441ad · verified · 1 Parent(s): 67c4538

Update README.md

Files changed (1)
  1. README.md +8 -2
README.md CHANGED
@@ -24,14 +24,20 @@ tags:
  It provides **streaming transcription**, **word timestamps**, and **scalable performance** for use cases like real-time captioning, meetings, and on-device voice interfaces.
 
 
- ## 📊 Quality Benchmarks
+ ## 📊 Benchmarks
 
- TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds. For quality benchmarks, we used the multilingual benchmarks [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
+ TheWhisper is a fine-tuned Whisper model that can process audio chunks of any size up to 30 seconds. Unlike the original Whisper models, it doesn't require padding audio with silence to reach 30 seconds. We conducted quality benchmarking across different chunk sizes: 10, 15, 20, and 30 seconds.
+ For quality benchmarks, we used the multilingual benchmarks [Open ASR Leaderboard](https://github.com/huggingface/open_asr_leaderboard#evaluate-a-model).
 
  <img width="1547" height="531" alt="vanilla whisper (1)" src="https://github.com/user-attachments/assets/f0c86e58-d834-4ac7-a06b-df3a7ae3e9e9" />
  <img width="1547" height="458" alt="TheStage AI Whisper (1)" src="https://github.com/user-attachments/assets/17fb45a3-b33d-4c83-b843-69b0f0aa3f65" />
 
 
+ <img width="1547" height="531" src="https://cdn.thestage.ai/production/cms_file_upload/1764602147-b10162ae-e6f7-4307-bcb0-54b94528221c/NVIDIA, RTX-5090 (1).png">
+
+ For comprehensive performance and quality benchmarks see [TheWhisper](https://github.com/TheStageAI/TheWhisper/blob/main/benchmark/README.md).
+
+
  ### 10s chunks
 
  | Model | Mean WER |
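The README text in this diff describes benchmarking TheWhisper on fixed chunk sizes (10, 15, 20, and 30 seconds) without padding to Whisper's 30-second window, and reports results as mean WER. The sketch below illustrates those two ideas under stated assumptions: it only slices a waveform and scores text, and never calls a Whisper model. All function names are hypothetical, not part of TheWhisper's API. WER uses the standard word-level edit distance, and "mean WER" is assumed here to be an unweighted average of per-dataset WERs, which may differ from how the linked benchmarks aggregate.

```python
"""Hypothetical sketch of chunked ASR evaluation; no Whisper model is called."""
from typing import List, Sequence

WHISPER_WINDOW_S = 30  # vanilla Whisper's fixed input window, in seconds


def split_into_chunks(samples: Sequence[float], sample_rate: int, chunk_s: int) -> List[List[float]]:
    """Split a waveform into consecutive chunks of at most chunk_s seconds (no padding)."""
    if not 0 < chunk_s <= WHISPER_WINDOW_S:
        raise ValueError("chunk size must be in (0, 30] seconds")
    step = chunk_s * sample_rate
    # The final chunk is simply shorter; it is not padded with silence.
    return [list(samples[i:i + step]) for i in range(0, len(samples), step)]


def pad_to_window(chunk: Sequence[float], sample_rate: int) -> List[float]:
    """Pad a chunk with silence (zeros) up to 30 s, as vanilla Whisper input expects."""
    target = WHISPER_WINDOW_S * sample_rate
    return list(chunk) + [0.0] * max(0, target - len(chunk))


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the previous ref prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,              # deletion: reference word missing
                cur[j - 1] + 1,           # insertion: extra hypothesis word
                prev[j - 1] + (r != h),   # substitution (or match at zero cost)
            )
        prev = cur
    return prev[-1] / len(ref)


def mean_wer(per_dataset_wer: Sequence[float]) -> float:
    """Unweighted mean of per-dataset WERs (assumed aggregation)."""
    return sum(per_dataset_wer) / len(per_dataset_wer)


if __name__ == "__main__":
    sr = 16_000
    audio = [0.0] * (sr * 25)  # 25 s of dummy audio
    chunks = split_into_chunks(audio, sr, 10)
    print([len(c) / sr for c in chunks])              # [10.0, 10.0, 5.0]
    print(len(pad_to_window(chunks[-1], sr)) / sr)    # 30.0
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

In the usage above, a 25-second clip split at 10 seconds yields a short 5-second tail that a chunk-size-agnostic model can consume directly, whereas vanilla Whisper would need it padded to 30 seconds. The last line scores one substitution and one deletion against six reference words, giving a WER of 2/6.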