WJ88 committed on
Commit
97ee8eb
·
verified ·
1 Parent(s): 773d3e5

Removed demo file from README and removed INT8 references from README for now


INT8 feature was not working fully, need to investigate more before applying again to this repository.

Files changed (1)
  1. README.md +5 -11
README.md CHANGED
@@ -1,5 +1,5 @@
 ---
-title: NVIDIA Parakeet TDT 0.6B V2 INT8 Real Time Mic Transcription
+title: NVIDIA Parakeet TDT 0.6B V2 Real Time Mic Transcription
 emoji: 📊
 colorFrom: purple
 colorTo: blue
@@ -17,7 +17,6 @@ tags:
 - speech-recognition
 - asr
 - real-time
-- int8
 - cpu
 - nvidia
 - parakeet
@@ -30,10 +29,10 @@ tags:
 - huggingface
 ---
 
-# 🦜 NVIDIA Parakeet-TDT-0.6B-v2 (INT8) — CPU-Only Streaming ASR
+# 🦜 NVIDIA Parakeet-TDT-0.6B-v2 — CPU-Only Streaming ASR
 
 **Real-time English speech-to-text in your browser — no GPU required.**
-This Space runs the 600 M-parameter [`nvidia/parakeet-tdt-0.6b-v2`](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) model with dynamic **INT8 quantization** so it fits comfortably on the **CPU Basic (2 vCPU)** tier.
+This Space runs the 600 M-parameter [`nvidia/parakeet-tdt-0.6b-v2`](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v2) model that fits comfortably on the **CPU Basic (2 vCPU)** tier.
 
 ## 🚀 Quick Start
 1. Click **“Record”**
@@ -42,10 +41,6 @@ This Space runs the 600 M-parameter [`nvidia/parakeet-tdt-0.6b-v2`](https://hugg
 
 > **Stalled UI?** Refresh the browser tab — this fully restarts the Space and clears any stuck threads.
 
-<video src="https://huggingface.co/spaces/WJ88/NVIDIA-Parakeet-TDT-0.6B-v2-INT8-Real-Time-Mic-Transcription/resolve/main/demo0__5-24-2025.mp4" controls style="max-width: 100%; height: auto;">
-Your browser does not support the video tag.
-</video>
-
 ## 🔧 Build on This
 - **Duplicate** the Space (button at the top-right) to kick-start your own ASR ideas.
 - Swap in another NeMo/HF model — the quantization + streaming scaffold is ready.
@@ -54,9 +49,8 @@ This Space runs the 600 M-parameter [`nvidia/parakeet-tdt-0.6b-v2`](https://hugg
 ## ⚙️ Under the Hood
 | Technique | Why it matters |
 |-----------|----------------|
-| **Dynamic INT8 quantization** (`torch.quantization.quantize_dynamic`) | ~4× smaller, faster CPU inference with minimal accuracy loss |
 | **`OMP_NUM_THREADS=2` & `torch.set_num_threads(2)`** | Matches the 2 vCPUs for optimal throughput |
-| **FBGEMM backend** | Fastest INT8 kernels on x86 |
+| **FBGEMM backend** | Fastest kernels on x86 |
 | **4-second streaming window** | Low latency & small memory footprint |
 | **Gradio `stream_every=0.5`** | Updates the transcript twice per second for real-time feel |
 
@@ -70,4 +64,4 @@ Feel free to browse `app.py` for the full implementation.
 
 If you redistribute transcripts or fine-tuned weights, please retain the CC-BY-4.0 attribution notice.
 
-⭐ **If this Space helps you, please give it a like and share your feedback!**
+⭐ **If this Space helps you, please give it a like and share your feedback!**
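
The "4-second streaming window" row in the README table is not shown in code in this diff. As a rough illustration only, a rolling buffer of that kind might look like the sketch below. The class name `RollingWindow`, the 16 kHz sample rate, and the plain-list sample format are assumptions made for this example and are not taken from `app.py`, which may work differently.

```python
from collections import deque

SAMPLE_RATE = 16_000   # assumed sample rate in Hz; not stated in the README
WINDOW_SECONDS = 4     # window length taken from the "Under the Hood" table


class RollingWindow:
    """Hypothetical helper that keeps only the most recent audio samples."""

    def __init__(self, sample_rate=SAMPLE_RATE, seconds=WINDOW_SECONDS):
        self.max_samples = sample_rate * seconds
        # A bounded deque drops the oldest samples once it is full.
        self._buf = deque(maxlen=self.max_samples)

    def push(self, chunk):
        """Append a new chunk and return the current window as a list."""
        self._buf.extend(chunk)
        return list(self._buf)


if __name__ == "__main__":
    window = RollingWindow()
    half_second = [0.0] * (SAMPLE_RATE // 2)   # one 0.5 s chunk of silence
    for _ in range(10):                        # 5 s of audio in total
        current = window.push(half_second)
    print(len(current) / SAMPLE_RATE, "seconds retained")
```

With chunks arriving every 0.5 seconds, as `stream_every=0.5` suggests, the buffer never grows past four seconds of audio, which is what keeps memory use bounded on a small CPU tier.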