Kanana 1.5 8B Instruct – MLX MXFP4 (group size 32)
Author: B
Apple Silicon-optimized MLX build of kakaocorp/kanana-1.5-8b-instruct-2505, quantized to MXFP4 (4-bit) with group size 32. Compact, fast, and ready for MLX and LM Studio.
Quantization & conversion were done with mlx_lm.convert. See "Reproduce" below.
Highlights
- Format: MLX (Apple Silicon)
- Quantization: mxfp4 (4-bit), q_group_size=32
- Upstream: kakaocorp/kanana-1.5-8b-instruct-2505 (Apache-2.0)
- Use cases: KR/EN chat & instruction following, local assistants, RAG backends
- Why MLX: smaller memory footprint and strong throughput on M-series chips
Note: Depending on the mlx-lm version, the produced weight file may be model.safetensors or an MLX-specific .mlx/.mlxf. Both are supported by MLX-LM.
Quickstart (Python)
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_id = "YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32"
model, tokenizer = load(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Python generators with a simple example."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Recent mlx-lm takes sampling settings via a sampler object; older
# releases accepted temperature/top_p keyword arguments on generate().
sampler = make_sampler(temp=0.7, top_p=0.95)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, sampler=sampler))
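If you want tokens as they arrive (e.g., for a chat UI), recent mlx-lm releases also expose stream_generate. A minimal sketch reusing model, tokenizer, and prompt from above; the chunk objects' .text field is what current versions yield, but older releases differ:
# Stream the response chunk by chunk instead of waiting for the full string.
from mlx_lm import stream_generate

for chunk in stream_generate(model, tokenizer, prompt=prompt, max_tokens=256):
    print(chunk.text, end="", flush=True)  # .text holds the newly generated piece
print()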
Use with LM Studio (OpenAI-compatible)
- Add this folder as a Local Model.
- Start the local server and call /v1/chat/completions.
curl -s http://localhost:1234/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{
"model": "YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32",
"messages": [
{"role":"system","content":"You are a helpful assistant."},
{"role":"user","content":"Explain Python generators with a tiny example."}
]
}'
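Any OpenAI-compatible client works against the same endpoint. A sketch using the openai Python package, assuming LM Studio's default port 1234 and that the model name matches the id LM Studio shows for this folder:
# Chat via LM Studio's OpenAI-compatible server using the official client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")  # local server ignores the key
resp = client.chat.completions.create(
    model="YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Python generators with a tiny example."},
    ],
)
print(resp.choices[0].message.content)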
Note: The original chat template could error when tools was null (Jinja selectattr on a NullValue). This repo fixes it by defaulting tools to [] when undefined, so both tool-free and tool-enabled chats render safely.
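To illustrate the failure mode and the guard (a hypothetical fragment, not the exact template shipped in this repo), runnable with the jinja2 package:
# Without the `set` guard, `tools | selectattr(...)` raises on a null `tools`.
from jinja2 import Template

fragment = (
    "{%- set tools = tools if tools is defined and tools is not none else [] -%}"
    "{{ tools | selectattr('type', 'equalto', 'function') | list | length }} tool(s)"
)
print(Template(fragment).render())            # tools undefined -> 0 tool(s)
print(Template(fragment).render(tools=None))  # tools null      -> 0 tool(s)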
Reproduce (Conversion)
Two variants are shown; use the one matching your mlx-lm version (MXFP4 requires a release whose mlx_lm.convert has the --q-mode flag).
# Newer CLI style
mlx_lm.convert \
--hf-path kakaocorp/kanana-1.5-8b-instruct-2505 \
-q --q-mode mxfp4 --q-group-size 32 \
--mlx-path ./kanana15-8b-instruct-mlx-mxfp4-g32
# Older CLI style (releases without --q-mode cannot emit MXFP4; the closest
# equivalent is affine 4-bit at the same group size, or upgrade mlx-lm)
mlx_lm.convert \
--hf-path kakaocorp/kanana-1.5-8b-instruct-2505 \
-q --q-bits 4 --q-group-size 32 \
--mlx-path ./kanana15-8b-instruct-mlx-mxfp4-g32
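To verify a conversion, inspect the quantization block the converter writes into config.json (exact keys vary somewhat across mlx-lm versions, so treat this as a sketch):
# Inspect the quantization settings recorded in the converted model folder.
import json, pathlib

cfg = json.loads(pathlib.Path("kanana15-8b-instruct-mlx-mxfp4-g32/config.json").read_text())
print(cfg.get("quantization"))  # expect bits=4, group_size=32, and an mxfp4 mode marker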
Specs
- Params: ~8B (Llama-family causal LM)
- Context length: up to 32k tokens
- Typical memory (single stream): ~4–6 GB on recent M-series (varies by OS/seq length)
- Quantization: MXFP4 (group size 32), weight-only 4-bit
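The memory figure follows from simple arithmetic on the weights; runtime overhead and the KV cache account for the rest of the range (MXFP4 scales and any non-quantized layers are ignored here):
# ~8e9 parameters at 4 bits (0.5 bytes) each, weights only.
params = 8e9
print(f"{params * 0.5 / 2**30:.1f} GiB")  # about 3.7 GiB before runtime overhead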
Prompting
For most chat UIs (MLX-LM / LM Studio), the included template will format roles automatically. Raw text generation is also supported: pass a single string prompt to generate().
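For example, a plain completion without the chat template:
# Raw text completion: pass a plain string instead of a chat-templated prompt.
from mlx_lm import load, generate

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")
print(generate(model, tokenizer, prompt="def fibonacci(n):", max_tokens=64))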
Files
- model.* (MLX weights)
- config.json, generation_config.json
- tokenizer.json, tokenizer_config.json, special_tokens_map.json
- chat_template.jinja (tool-free ChatML)
License
- Upstream: Apache-2.0 (see Kakao's original model card)
- This repo: Converted/quantized distribution following the upstream license. Please comply with upstream terms if you re-convert or redistribute.
Acknowledgements
Thanks to the Kanana team and the MLX/MLX-LM community.
You're welcome!