Kokoro 82M (MLX, BF16)

Kokoro 82M text-to-speech model in MLX BF16 SafeTensors format for Apple Silicon. Fast, high-quality speech synthesis with 32+ voices.

Performance (M3 Max, 64 GB)

Text length	Synthesis latency	Audio duration
Short (~10 words)	47 ms	1.7 s
Medium (~50 words)	224 ms	9.3 s
Long (~100 words)	483 ms	21.3 s

Metric	Value
RTFx (audio duration ÷ synthesis time)	41x realtime
Time to first chunk (TTFC, streaming)	126 ms
Peak memory	~200 MB
Parameters	82M
Sample rate	24 kHz

Benchmarked on Apple M3 Max (64 GB), macOS Sequoia 15.7.3, MLX 0.30.4.
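RTFx divides the duration of the generated audio by the time taken to synthesize it. Recomputing it per row from the latency table gives roughly 36x (short), 41x (medium) and 44x (long), in line with the ~41x headline figure; how that single headline value was aggregated is not stated. A minimal sketch of the arithmetic, using only the numbers in the table:

```python
# Rows copied from the latency table: (label, latency in seconds, audio in seconds)
BENCHMARKS = [
    ("Short (~10 words)", 0.047, 1.7),
    ("Medium (~50 words)", 0.224, 9.3),
    ("Long (~100 words)", 0.483, 21.3),
]


def rtfx(latency_s: float, audio_s: float) -> float:
    """Real-time factor: seconds of audio produced per second of compute."""
    return audio_s / latency_s


for label, latency, audio in BENCHMARKS:
    print(f"{label}: {rtfx(latency, audio):.1f}x realtime")
```

Longer inputs score higher because fixed per-call overhead is amortized over more audio.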

Voices

32+ voices across multiple accents and languages:

Prefix	Language and gender	Voices
`af_*`	American Female	aoede, alloy, bella, heart, jessica, kore, nicole, nova, river, sarah, sky
`am_*`	American Male	adam, echo, eric, liam, michael, onyx
`bf_*`	British Female	alice, emma, lily
`bm_*`	British Male	daniel, fable, george, lewis
`jf_*`	Japanese Female	alpha, gongitsune
`jm_*`	Japanese Male	kumo
`zf_*`	Chinese Female	xiaobei, xiaoni, xiaoxiao
`zm_*`	Chinese Male	yunxi, yunxia, yunyang, yunjian
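Every voice ID in the table follows the same pattern: a two-letter prefix (first letter for language or accent, second for gender), an underscore, then the voice name. The helper below is a hypothetical illustration of decoding that convention, for example to build a voice picker; `parse_voice` and `VoiceInfo` are not part of `sonic_tts`, and the letter mappings cover only the prefixes listed above.

```python
from typing import NamedTuple

# Prefix letters as they appear in the voice table (assumed exhaustive here).
LANGUAGES = {"a": "American English", "b": "British English", "j": "Japanese", "z": "Chinese"}
GENDERS = {"f": "female", "m": "male"}


class VoiceInfo(NamedTuple):
    language: str
    gender: str
    name: str


def parse_voice(voice_id: str) -> VoiceInfo:
    """Split an ID such as "af_heart" into language, gender and name."""
    prefix, sep, name = voice_id.partition("_")
    if not sep or len(prefix) != 2 or not name:
        raise ValueError(f"unexpected voice ID: {voice_id!r}")
    try:
        return VoiceInfo(LANGUAGES[prefix[0]], GENDERS[prefix[1]], name)
    except KeyError:
        raise ValueError(f"unknown voice prefix: {prefix!r}") from None
```

For example, `parse_voice("af_heart")` yields `VoiceInfo(language='American English', gender='female', name='heart')`.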

Usage

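from sonic_tts import SonicTTS
from sonic_tts.streaming import StreamingTTS
import sounddevice as sd  # assumption: any playback library works; sounddevice is one option

tts = SonicTTS(voice="af_heart")
result = tts.synthesize("Hello, world!")

# Streaming (lower time-to-first-audio)
streamer = StreamingTTS(synthesize_fn=tts.synthesize, strategy="clause")
for chunk in streamer.stream("Long text here..."):
    # assumes chunk.audio is an array of 24 kHz samples
    sd.play(chunk.audio, samplerate=24000)  # first chunk arrives in ~126 ms
    sd.wait()  # simple sequential playback: wait for this chunk before the next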
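Streaming lowers time-to-first-audio because only the first piece of text has to be synthesized before playback can start. How the `strategy="clause"` option actually segments text is not documented here; the sketch below is a hypothetical stand-in that splits after clause-ending punctuation (`, ; : . ! ?`) and synthesizes each piece in turn. `split_clauses` and `stream_clauses` are illustrative names, not `sonic_tts` functions.

```python
import re

# Hypothetical rule: break after , ; : . ! ? when followed by whitespace.
# The real "clause" strategy in sonic_tts may use different rules.
_CLAUSE_BREAK = re.compile(r"(?<=[,;:.!?])\s+")


def split_clauses(text: str) -> list[str]:
    """Return the non-empty clause-sized pieces of `text`, in order."""
    return [part for part in _CLAUSE_BREAK.split(text.strip()) if part]


def stream_clauses(text, synthesize):
    """Yield synthesize(clause) for each clause as soon as it is ready."""
    for clause in split_clauses(text):
        yield synthesize(clause)
```

For example, `split_clauses("Hello, world! This is a test; it streams.")` returns `["Hello,", "world!", "This is a test;", "it streams."]`. Short first pieces minimize time to first chunk, at the cost of more calls; a naive rule like this one also splits on abbreviations such as "Dr.".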

Install: pip install sonic-tts
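The model outputs 24 kHz audio. To save a clip to disk without extra dependencies, a minimal sketch using the standard library's `wave` module, assuming the samples are mono floats in [-1.0, 1.0] (the exact type returned by `synthesize` is not documented here):

```python
import struct
import wave

SAMPLE_RATE = 24_000  # Kokoro outputs 24 kHz audio (see the metrics table)


def write_wav(path: str, samples, sample_rate: int = SAMPLE_RATE) -> None:
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV."""
    # Clip each sample to [-1, 1], then scale to the signed 16-bit range.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)  # mono
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        wf.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))
```

If `result` exposes its samples as `result.audio`, like the streaming chunks do (an assumption, not confirmed above), `write_wav("hello.wav", result.audio)` saves the synthesized greeting.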

Origin

Weights from mlx-community/Kokoro-82M-bf16, based on hexgrad/Kokoro-82M.

Part of the Sonic Speech model collection for the Sonic local-first voice AI project.
