Kokoro 82M (MLX, BF16)

Kokoro 82M text-to-speech model in MLX BF16 SafeTensors format for Apple Silicon. Fast, high-quality speech synthesis with 32+ voices.

Performance (M3 Max, 64 GB)

| Text Length | Latency | Audio Duration |
|---|---|---|
| Short (~10 words) | 47 ms | 1.7 s |
| Medium (~50 words) | 224 ms | 9.3 s |
| Long (~100 words) | 483 ms | 21.3 s |

| Metric | Value |
|---|---|
| RTFx | 41x realtime |
| TTFC (streaming) | 126 ms |
| Peak memory | ~200 MB |
| Parameters | 82M |
| Sample rate | 24 kHz |

Benchmarked on Apple M3 Max (64 GB), macOS Sequoia 15.7.3, MLX 0.30.4.
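
The RTFx figure can be sanity-checked against the latency table. The sketch below assumes RTFx means audio duration divided by synthesis latency; the card does not state how the 41x value was derived, and `rtfx` is a helper name chosen here for illustration.

```python
# Latency and audio duration for each row of the latency table, in seconds.
BENCHMARKS = {
    "short": (0.047, 1.7),
    "medium": (0.224, 9.3),
    "long": (0.483, 21.3),
}


def rtfx(latency_s: float, audio_s: float) -> float:
    """Seconds of audio produced per second of compute (assumed definition)."""
    return audio_s / latency_s


for name, (latency_s, audio_s) in BENCHMARKS.items():
    print(f"{name:>6}: {rtfx(latency_s, audio_s):.1f}x realtime")
```

Under that assumption the three rows come out at roughly 36x, 41.5x and 44x, so the reported 41x is consistent with the medium-length measurement.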

Voices

32+ voices across multiple accents and languages:

| Prefix | Accent / language and gender | Voices |
|---|---|---|
| af_* | American Female | aoede, alloy, bella, heart, jessica, kore, nicole, nova, river, sarah, sky |
| am_* | American Male | adam, echo, eric, liam, michael, onyx |
| bf_* | British Female | alice, emma, lily |
| bm_* | British Male | daniel, fable, george, lewis |
| jf_* | Japanese Female | alpha, gongitsune |
| jm_* | Japanese Male | kumo |
| zf_* | Chinese Female | xiaobei, xiaoni, xiaoxiao |
| zm_* | Chinese Male | yunxi, yunxia, yunyang, yunjian |
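
The two-letter prefix encodes accent (or language) and gender, so a voice ID can be decoded mechanically. The helper below is a minimal sketch: `describe_voice` is a hypothetical function, not part of `sonic_tts`, and it assumes every ID has the form prefix, underscore, name, as in `af_heart`.

```python
# Prefix letters as listed in the voice table.
ACCENTS = {"a": "American", "b": "British", "j": "Japanese", "z": "Chinese"}
GENDERS = {"f": "Female", "m": "Male"}


def describe_voice(voice_id: str) -> tuple[str, str, str]:
    """Split an ID such as 'af_heart' into (accent, gender, name)."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID format: {voice_id!r}")
    try:
        return ACCENTS[prefix[0]], GENDERS[prefix[1]], name
    except KeyError:
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None


print(describe_voice("af_heart"))  # ('American', 'Female', 'heart')
```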

Usage

from sonic_tts import SonicTTS

tts = SonicTTS(voice="af_heart")
result = tts.synthesize("Hello, world!")

# Streaming (lower time-to-first-audio)
from sonic_tts.streaming import StreamingTTS

streamer = StreamingTTS(synthesize_fn=tts.synthesize, strategy="clause")
for chunk in streamer.stream("Long text here..."):
    # play() is a placeholder for your own audio-output function
    play(chunk.audio)  # first chunk arrives in ~126 ms

Install: pip install sonic-tts
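
Why clause-level streaming lowers time-to-first-audio: the first chunk is only a few words long, so it is synthesized at close to the short-input latency while the rest of the text is still pending. The card does not describe what `strategy="clause"` does internally, so the following is only an illustration of the general idea, not the `sonic_tts` implementation; `split_clauses` is a hypothetical helper.

```python
import re

# Split on whitespace that follows clause-ending punctuation.
_CLAUSE_BOUNDARY = re.compile(r"(?<=[,;:.!?])\s+")


def split_clauses(text: str) -> list[str]:
    """Return the non-empty clauses of `text`, with punctuation kept."""
    return [part for part in _CLAUSE_BOUNDARY.split(text.strip()) if part]


for clause in split_clauses("Hello, world! This is a longer sentence; it has clauses."):
    print(clause)
```

Each clause could then be passed to the synthesis function in turn, with playback of one chunk overlapping synthesis of the next.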

Origin

Weights from mlx-community/Kokoro-82M-bf16, based on hexgrad/Kokoro-82M.

Part of the Sonic Speech model collection for the Sonic local-first voice AI project.
