# BabelVox OpenVINO INT8 Models

Pre-exported INT8 OpenVINO IR models for BabelVox — real-time text-to-speech on Intel NPU and CPU.

Based on Qwen3-TTS-12Hz-0.6B-Base by the Alibaba Qwen team.
## Usage

```shell
pip install babelvox
```

```python
import soundfile as sf

from babelvox import BabelVox

# Models auto-download on first use (~2.5 GB, cached)
tts = BabelVox(
    device="NPU",
    precision="int8",
    use_cp_kv_cache=True,
    talker_buckets=[64, 128, 256],
)
wav, sr = tts.generate("Don't panic.", language="English")
sf.write("output.wav", wav, sr)
```
Or from the CLI:

```shell
babelvox --device NPU --int8 --cp-kv-cache --talker-buckets "64,128,256" \
    --text "Hello world" --output hello.wav
```
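The `talker_buckets` option trades padding overhead against the number of compiled graph shapes: each request is padded up to a fixed bucket length rather than always to the maximum. A minimal sketch of the usual selection rule (a hypothetical helper for illustration, not BabelVox's actual code):

```python
def pick_bucket(seq_len: int, buckets: list[int]) -> int:
    """Return the smallest bucket >= seq_len, or raise if none fits."""
    for bucket in sorted(buckets):
        if seq_len <= bucket:
            return bucket
    raise ValueError(f"sequence length {seq_len} exceeds largest bucket {max(buckets)}")

# A 100-token prompt pads to the 128 bucket instead of the full 256.
print(pick_bucket(100, [64, 128, 256]))  # → 128
```

With a single 256-token bucket, a short prompt wastes most of the compute on padding; multiple buckets keep short generations cheap at the cost of compiling one graph per shape.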
## What's included

| Directory | Contents | Size |
|---|---|---|
| `int8/` | OpenVINO IR models (INT8 quantized) | ~1.7 GB |
| `weights/` | NumPy embedding tables + projection weights | ~870 MB |
## INT8 models

| Component | File | Size | Device |
|---|---|---|---|
| Talker (28-layer transformer) | `talker.xml`/`.bin` | 444 MB | NPU |
| Talker prefill (KV cache) | `talker_prefill.xml`/`.bin` | 444 MB | CPU |
| Talker decode (KV cache) | `talker_decode.xml`/`.bin` | 444 MB | NPU |
| Code predictor | `code_predictor.xml`/`.bin` | 79 MB | CPU |
| CP prefill (KV cache) | `cp_prefill.xml`/`.bin` | 79 MB | CPU |
| CP decode (KV cache) | `cp_decode.xml`/`.bin` | 79 MB | CPU |
| Speaker encoder | `speaker_encoder.xml`/`.bin` | 9 MB | NPU |
| Tokenizer decoder | `tokenizer_decoder.xml`/`.bin` | 114 MB | NPU |
| Tokenizer encoder | `tokenizer_encoder.xml`/`.bin` | 48 MB | NPU |
## Performance

Tested on a Samsung Galaxy Book5 Pro (Intel Core Ultra 7 258V, 32 GB RAM):
| Optimization | RTF | Notes |
|---|---|---|
| FP16 NPU baseline | 3.0x | Full-recompute, 256-token padding |
| + INT8 quantization | 2.1x | These models |
| + CP KV cache | 1.4x | Eliminates redundant CP recomputation |
| + Multi-bucket talker | 1.0x | Real-time speech synthesis |
RTF = Real-Time Factor: synthesis time divided by audio duration, so lower is better. 1.0x means one second of audio takes one second to generate.
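The RTF figures above can be reproduced by timing a generation and dividing by the duration of the audio it produced. A minimal sketch of the arithmetic (the numbers and the 24 kHz sample rate here are illustrative, not measured):

```python
def real_time_factor(synthesis_seconds: float, audio_samples: int, sample_rate: int) -> float:
    """RTF = time spent synthesizing / duration of audio produced (lower is better)."""
    return synthesis_seconds / (audio_samples / sample_rate)

# e.g. 12.6 s of wall-clock time to synthesize 10 s of 24 kHz audio
print(round(real_time_factor(12.6, 240_000, 24_000), 2))  # → 1.26
```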
## How these were made

- Exported from Qwen3-TTS-12Hz-0.6B-Base via OpenVINO's ONNX conversion path
- Quantized with NNCF INT8_SYM per-channel weight compression
- Embedding tables exported as NumPy arrays (no PyTorch dependency at runtime)

See `tools/` for the export scripts.
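For intuition, symmetric per-channel INT8 weight compression gives each output channel its own scale (`max|w| / 127`, no zero point) and rounds the weights to int8. A NumPy sketch of the idea — NNCF's actual implementation differs in details:

```python
import numpy as np

def quantize_int8_sym(w: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Symmetric per-channel (axis 0) int8 quantization: w ≈ q * scale."""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0  # one scale per output channel
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

w = np.random.randn(4, 16).astype(np.float32)
q, scale = quantize_int8_sym(w)
dequant = q.astype(np.float32) * scale
print(np.abs(w - dequant).max())  # rounding error, at most scale/2 per channel
```

Per-channel scales matter because a single per-tensor scale would be dominated by the largest-magnitude channel, wasting int8 resolution on the smaller ones.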
## License

Apache-2.0 (same as upstream Qwen3-TTS).