Kokoro 82M (MLX, BF16)
Kokoro 82M text-to-speech model in MLX BF16 SafeTensors format for Apple Silicon. Fast, high-quality speech synthesis with 32+ voices.
Performance (M3 Max, 64 GB)
| Text Length | Latency | Audio Duration |
|---|---|---|
| Short (~10 words) | 47 ms | 1.7 s |
| Medium (~50 words) | 224 ms | 9.3 s |
| Long (~100 words) | 483 ms | 21.3 s |

| Metric | Value |
|---|---|
| RTFx | 41x realtime |
| TTFC (streaming) | 126 ms |
| Peak memory | ~200 MB |
| Parameters | 82M |
| Sample rate | 24 kHz |
Benchmarked on Apple M3 Max (64 GB), macOS Sequoia 15.7.3, MLX 0.30.4.
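As a sanity check on the numbers above, the realtime factor can be recomputed from the latency table. This is a minimal sketch that assumes RTFx means audio duration divided by synthesis latency (the usual definition); the card does not state how the 41x figure was measured, and the `rtfx` helper is illustrative only.

```python
# (label, latency in seconds, audio duration in seconds), copied from the latency table
rows = [
    ("short", 0.047, 1.7),
    ("medium", 0.224, 9.3),
    ("long", 0.483, 21.3),
]


def rtfx(latency_s: float, audio_s: float) -> float:
    """Seconds of audio produced per second of compute (assumed definition)."""
    return audio_s / latency_s


ratios = {label: rtfx(latency, audio) for label, latency, audio in rows}
for label, value in ratios.items():
    print(f"{label}: {value:.1f}x realtime")
```

Under that assumption the three rows come out at roughly 36x, 42x, and 44x, which brackets the reported 41x.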
Voices
32+ voices across multiple accents and languages:
| Prefix | Accent | Voices |
|---|---|---|
| af_* | American Female | aoede, alloy, bella, heart, jessica, kore, nicole, nova, river, sarah, sky |
| am_* | American Male | adam, echo, eric, liam, michael, onyx |
| bf_* | British Female | alice, emma, lily |
| bm_* | British Male | daniel, fable, george, lewis |
| jf_* | Japanese Female | alpha, gongitsune |
| jm_* | Japanese Male | kumo |
| zf_* | Chinese Female | xiaobei, xiaoni, xiaoxiao |
| zm_* | Chinese Male | yunxi, yunxia, yunyang, yunjian |
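Voice IDs appear to combine a two-letter prefix with a voice name, as in `af_heart`. The sketch below decodes an ID under that assumption. The `describe_voice` helper and the letter mappings are inferred from the table and are hypothetical; they are not part of sonic-tts.

```python
# Letter meanings inferred from the voice table (an assumption, not a documented scheme).
ACCENTS = {"a": "American", "b": "British", "j": "Japanese", "z": "Chinese"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID such as 'af_heart' into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        accent = ACCENTS[prefix[0]]
        gender = GENDERS[prefix[1]]
    except KeyError:
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None
    return accent, gender, name


info = describe_voice("af_heart")
```

A helper like this can validate user input before it is passed on as a `voice` argument.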
Usage
from sonic_tts import SonicTTS
tts = SonicTTS(voice="af_heart")
result = tts.synthesize("Hello, world!")
# Streaming (lower time-to-first-audio)
from sonic_tts.streaming import StreamingTTS
streamer = StreamingTTS(synthesize_fn=tts.synthesize, strategy="clause")
for chunk in streamer.stream("Long text here..."):
    play(chunk.audio)  # play() stands in for your audio output; first chunk arrives in ~126 ms
Install: pip install sonic-tts
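The `strategy="clause"` option in the streaming example suggests that text is split at clause boundaries so the first short piece can be synthesized and played while the rest is still pending. The sketch below shows one way such splitting could work. It is an illustration only: `split_clauses` and its punctuation rule are assumptions, not the splitter sonic-tts actually uses.

```python
import re

# Split on whitespace that follows clause-ending punctuation.
# Illustrative rule only; the library's real boundaries may differ.
_CLAUSE_BOUNDARY = re.compile(r"(?<=[,;:.!?])\s+")


def split_clauses(text: str) -> list[str]:
    """Return non-empty clauses, each keeping its trailing punctuation."""
    return [part for part in _CLAUSE_BOUNDARY.split(text.strip()) if part]


clauses = split_clauses("Hello, world! This is a longer sentence; it has clauses.")
for clause in clauses:
    print(clause)
```

Shorter first chunks reduce time-to-first-audio at the cost of more synthesis calls, which may be the trade-off behind the streaming TTFC figure.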
Origin
Weights from mlx-community/Kokoro-82M-bf16, based on hexgrad/Kokoro-82M.
Part of the Sonic Speech model collection for the Sonic local-first voice AI project.