patricklifixie committed on
Commit 6d540ab · verified · 1 Parent(s): cbb431f

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -85,7 +85,7 @@ Data: Synthetic conversational corpora with inserted <eot> tokens, translation-a
 
  Audio-native fusion (Ultravox projector):
  Attach and fine-tune the Ultravox audio projector so the model conditions jointly on audio embeddings and text, aligning prosodic cues with the <eot> objective.
- Data: Robust to real-world noise, device/mic variance, overlapping speech; extra single-turn samples emphasize intonation, pitch, drawn-out syllables.
+ Data: Robust to real-world noise, device/mic variance, overlapping speech.
 
  Calibration:
  Choose a decision threshold to balance precision vs. recall per language/domain. Recommended starting threshold: 0.1.
@@ -93,7 +93,7 @@ Raise the threshold if you find the model interrupting too eagerly, and lower th
 
  ## Performance & Deployment
 
- Latency (forward pass): ~30-100 ms on an H100.
+ Latency (forward pass): ~65-110 ms on an A6000.
 
  Common pattern: Pair with a streaming VAD (e.g., Silero). Invoke UltraVAD on short silences; its latency is often hidden under TTS time-to-first-token.
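
The calibration guidance in the README context can be illustrated with a minimal sketch. It assumes the model's output has already been reduced to a single end-of-turn probability in [0, 1]; the function name `is_end_of_turn`, the `THRESHOLD_OVERRIDES` table, and its example values are hypothetical and not part of UltraVAD's API. Only the 0.1 starting threshold comes from the README.

```python
from typing import Optional

# Recommended starting threshold, taken from the README text.
DEFAULT_THRESHOLD = 0.1

# Hypothetical per-language overrides; the values are illustrative only
# and would come from your own calibration data.
THRESHOLD_OVERRIDES = {"en": 0.1, "ja": 0.2}


def is_end_of_turn(eot_probability: float,
                   language: Optional[str] = None,
                   threshold: Optional[float] = None) -> bool:
    """Return True when the end-of-turn probability meets the threshold.

    An explicit `threshold` wins; otherwise a per-language override is
    used if one exists, falling back to DEFAULT_THRESHOLD.
    """
    if not 0.0 <= eot_probability <= 1.0:
        raise ValueError("eot_probability must be in [0, 1]")
    if threshold is None:
        threshold = THRESHOLD_OVERRIDES.get(language, DEFAULT_THRESHOLD)
    return eot_probability >= threshold
```

With this shape, raising the threshold makes the check fire less often (fewer interruptions), and lowering it makes the check fire sooner, which matches the direction of the tuning advice in the hunk header above.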