# BabelVox OpenVINO INT8 Models

Pre-exported INT8 OpenVINO IR models for BabelVox — real-time text-to-speech on Intel NPU/CPU.

Based on Qwen3-TTS-12Hz-0.6B-Base by the Alibaba Qwen Team.

## Usage

```bash
pip install babelvox
```

```python
import soundfile as sf
from babelvox import BabelVox

# Models auto-download on first use (~2.5 GB, cached)
tts = BabelVox(device="NPU", precision="int8",
               use_cp_kv_cache=True, talker_buckets=[64, 128, 256])

wav, sr = tts.generate("Don't panic.", language="English")
sf.write("output.wav", wav, sr)
```

Or from the CLI:

```bash
babelvox --device NPU --int8 --cp-kv-cache --talker-buckets "64,128,256" \
  --text "Hello world" --output hello.wav
```

## What's included

| Directory  | Contents                                    | Size    |
|------------|---------------------------------------------|---------|
| `int8/`    | OpenVINO IR models (INT8 quantized)         | ~1.7 GB |
| `weights/` | NumPy embedding tables + projection weights | ~870 MB |

### INT8 models

| Component                  | File                         | Size   | Device |
|----------------------------|------------------------------|--------|--------|
| Talker (28-layer transformer) | `talker.xml`/`.bin`       | 444 MB | NPU    |
| Talker prefill (KV cache)  | `talker_prefill.xml`/`.bin`  | 444 MB | CPU    |
| Talker decode (KV cache)   | `talker_decode.xml`/`.bin`   | 444 MB | NPU    |
| Code predictor             | `code_predictor.xml`/`.bin`  | 79 MB  | CPU    |
| CP prefill (KV cache)      | `cp_prefill.xml`/`.bin`      | 79 MB  | CPU    |
| CP decode (KV cache)       | `cp_decode.xml`/`.bin`       | 79 MB  | CPU    |
| Speaker encoder            | `speaker_encoder.xml`/`.bin` | 9 MB   | NPU    |
| Tokenizer decoder          | `tokenizer_decoder.xml`/`.bin` | 114 MB | NPU  |
| Tokenizer encoder          | `tokenizer_encoder.xml`/`.bin` | 48 MB  | NPU  |

## Performance

Tested on Samsung Galaxy Book5 Pro (Intel Core Ultra 7 258V, 32 GB RAM):

| Optimization           | RTF  | Notes                                             |
|------------------------|------|---------------------------------------------------|
| FP16 NPU baseline      | 3.0x | Full recompute, 256-token padding                 |
| + INT8 quantization    | 2.1x | These models                                      |
| + CP KV cache          | 1.4x | Eliminates redundant code-predictor recomputation |
| + Multi-bucket talker  | 1.0x | Real-time speech synthesis                        |

RTF = Real-Time Factor: synthesis time divided by audio duration, so lower is better. 1.0x means 1 second of audio takes 1 second to generate.
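The definition above can be written as a one-line helper; a minimal sketch (the numbers in the comments are illustrative, not measurements from the table):

```python
def rtf(synthesis_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: wall-clock synthesis time over audio duration.

    Lower is better; 1.0 means exactly real time.
    """
    return synthesis_seconds / audio_seconds

# Illustrative: 10 s of audio produced in 14 s of compute -> 1.4x
print(rtf(14.0, 10.0))  # 1.4
# 10 s of audio in 10 s of compute -> 1.0x, i.e. real time
print(rtf(10.0, 10.0))  # 1.0
```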

## How these were made

  1. Exported from Qwen3-TTS-12Hz-0.6B-Base using OpenVINO ONNX conversion
  2. Quantized with NNCF INT8_SYM per-channel weight compression
  3. Embedding tables exported as NumPy arrays (no PyTorch at runtime)

See `tools/` for the export scripts.
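Step 3 means token embeddings are resolved at runtime by plain NumPy indexing rather than a framework embedding layer. A minimal sketch of that kind of lookup (the shapes and any file names are hypothetical, not the actual exported tables):

```python
import numpy as np

# Hypothetical embedding table of shape (vocab_size, hidden_dim).
# In practice this would be loaded from disk with np.load().
vocab_size, hidden_dim = 1000, 64
rng = np.random.default_rng(0)
table = rng.standard_normal((vocab_size, hidden_dim)).astype(np.float32)

# The runtime lookup is ordinary fancy indexing -- no PyTorch needed.
token_ids = np.array([5, 42, 7])
embeddings = table[token_ids]
assert embeddings.shape == (3, hidden_dim)
```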

## License

Apache-2.0 (same as upstream Qwen3-TTS)
