# BabelVox OpenVINO INT8 Models

Pre-exported INT8 OpenVINO IR models for BabelVox — real-time text-to-speech on Intel NPU and CPU.

Based on Qwen3-TTS-12Hz-0.6B-Base by the Alibaba Qwen team.
## Usage

```shell
pip install babelvox
```

```python
import soundfile as sf

from babelvox import BabelVox

# Models auto-download on first use (~2.5 GB, cached)
tts = BabelVox(
    device="NPU",
    precision="int8",
    use_cp_kv_cache=True,
    talker_buckets=[64, 128, 256],
)
wav, sr = tts.generate("Don't panic.", language="English")
sf.write("output.wav", wav, sr)
```
Or from the CLI:

```shell
babelvox --device NPU --int8 --cp-kv-cache --talker-buckets "64,128,256" \
    --text "Hello world" --output hello.wav
```
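The `talker_buckets` option trades padding overhead against the number of compiled graph shapes: each request is padded up to a fixed bucket length rather than always to the maximum. A minimal sketch of the usual selection rule (a hypothetical helper for illustration, not BabelVox's actual code):

```python
def pick_bucket(seq_len: int, buckets: list[int]) -> int:
    """Return the smallest bucket >= seq_len, or raise if none fits."""
    for bucket in sorted(buckets):
        if seq_len <= bucket:
            return bucket
    raise ValueError(f"sequence length {seq_len} exceeds largest bucket {max(buckets)}")

# A 100-token prompt pads to the 128 bucket instead of the full 256.
print(pick_bucket(100, [64, 128, 256]))  # → 128
```

With a single 256-token bucket, a short prompt wastes most of the compute on padding; multiple buckets keep short generations cheap at the cost of compiling one graph per shape.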
## What's included

| Directory | Contents | Size |
|---|---|---|
| `int8/` | OpenVINO IR models (INT8 quantized) | ~1.7 GB |
| `weights/` | NumPy embedding tables + projection weights | ~870 MB |
## INT8 models

| Component | File | Size | Device |
|---|---|---|---|
| Talker (28-layer transformer) | `talker.xml`/`.bin` | 444 MB | NPU |
| Talker prefill (KV cache) | `talker_prefill.xml`/`.bin` | 444 MB | CPU |
| Talker decode (KV cache) | `talker_decode.xml`/`.bin` | 444 MB | NPU |
| Code predictor | `code_predictor.xml`/`.bin` | 79 MB | CPU |
| CP prefill (KV cache) | `cp_prefill.xml`/`.bin` | 79 MB | CPU |
| CP decode (KV cache) | `cp_decode.xml`/`.bin` | 79 MB | CPU |
| Speaker encoder | `speaker_encoder.xml`/`.bin` | 9 MB | NPU |
| Tokenizer decoder | `tokenizer_decoder.xml`/`.bin` | 114 MB | NPU |
| Tokenizer encoder | `tokenizer_encoder.xml`/`.bin` | 48 MB | NPU |
## Performance

Tested on a Samsung Galaxy Book5 Pro (Intel Core Ultra 7 258V, 32 GB RAM):
| Optimization | RTF | Notes |
|---|---|---|
| FP16 NPU baseline | 3.0x | Full-recompute, 256-token padding |
| + INT8 quantization | 2.1x | These models |
| + CP KV cache | 1.4x | Eliminates redundant CP recomputation |
| + Multi-bucket talker | 1.0x | Real-time speech synthesis |
RTF = Real-Time Factor: synthesis time divided by audio duration, so lower is better. 1.0x means one second of audio takes one second to generate.
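The RTF figures above can be reproduced by timing a generation and dividing by the duration of the audio it produced. A minimal sketch of the arithmetic (the numbers and the 24 kHz sample rate here are illustrative, not measured):

```python
def real_time_factor(synthesis_seconds: float, audio_samples: int, sample_rate: int) -> float:
    """RTF = time spent synthesizing / duration of audio produced (lower is better)."""
    return synthesis_seconds / (audio_samples / sample_rate)

# e.g. 12.6 s of wall-clock time to synthesize 10 s of 24 kHz audio
print(round(real_time_factor(12.6, 240_000, 24_000), 2))  # → 1.26
```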
## How these were made

- Exported from Qwen3-TTS-12Hz-0.6B-Base via OpenVINO's ONNX conversion path
- Quantized with NNCF INT8_SYM per-channel weight compression
- Embedding tables exported as NumPy arrays (no PyTorch dependency at runtime)

See `tools/` for the export scripts.
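For intuition, symmetric per-channel INT8 weight compression gives each output channel its own scale (`max|w| / 127`, no zero point) and rounds the weights to int8. A NumPy sketch of the idea — NNCF's actual implementation differs in details:

```python
import numpy as np

def quantize_int8_sym(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-channel (axis 0) int8 quantization: w ≈ q * scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per output channel
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 16).astype(np.float32)
q, scale = quantize_int8_sym(w)
dequant = q.astype(np.float32) * scale
print(np.abs(w - dequant).max())  # rounding error, at most scale/2 per channel
```

Per-channel scales matter because a single per-tensor scale would be dominated by the largest-magnitude channel, wasting int8 resolution on the smaller ones.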
## License

Apache-2.0 (same as upstream Qwen3-TTS).