Kanana 1.5 8B Instruct – MLX MXFP4 (group size 32)

Author: B

Apple Silicon-optimized MLX build of kakaocorp/kanana-1.5-8b-instruct-2505, quantized to MXFP4 (4-bit) with group size 32. Compact, fast, and ready for MLX and LM Studio.

Quantization & conversion were done with mlx_lm.convert. See "Reproduce" below.

Highlights

  • Format: MLX (Apple Silicon)
  • Quantization: mxfp4 (4-bit), q_group_size=32
  • Upstream: kakaocorp/kanana-1.5-8b-instruct-2505 (Apache-2.0)
  • Use cases: KR/EN chat & instruction following, local assistants, RAG backends
  • Why MLX: smaller memory footprint and strong throughput on M-series chips

Note: Depending on your mlx-lm version, the converted weights may be a single model.safetensors file or sharded across multiple model-*.safetensors files. Both layouts load with MLX-LM.

Quickstart (Python)

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_id = "YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32"
model, tokenizer = load(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Python generators with a simple example."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Recent mlx-lm takes sampling settings via a sampler object;
# older releases accepted temp=/top_p= keywords on generate() directly.
sampler = make_sampler(temp=0.7, top_p=0.95)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, sampler=sampler))

Use with LM Studio (OpenAI-compatible)

  1. Add this folder as a Local Model.
  2. Start the local server and call /v1/chat/completions.
curl -s http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32",
    "messages": [
      {"role":"system","content":"You are a helpful assistant."},
      {"role":"user","content":"Explain Python generators with a tiny example."}
    ]
  }'
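
The same endpoint also works from a Python client via the openai package (a sketch; LM Studio's local server ignores the API key, so any placeholder value works):

from openai import OpenAI

# Point the client at LM Studio's local server (default port 1234).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Python generators with a tiny example."},
    ],
)
print(resp.choices[0].message.content)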

Note: The original chat template could error when tools was null (Jinja selectattr on a NullValue). This repo fixes it by defaulting tools to [] when undefined, so both tool-free and tool-enabled chats render safely.
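
A minimal check of the fixed template (the get_time tool schema below is a made-up example for illustration, not something shipped with this repo):

from mlx_lm import load

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")

# Both calls should render without a Jinja error on the fixed template.
no_tools = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    tokenize=False,
)
with_tools = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_time",  # hypothetical tool, illustration only
            "description": "Return the current time.",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
    add_generation_prompt=True,
    tokenize=False,
)
print(no_tools)
print(with_tools)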

Reproduce (Conversion)

MXFP4 support in mlx_lm.convert is recent (exposed via the --q-mode flag); if your install does not recognize it, upgrade first with pip install -U mlx-lm. Note that MXFP4 is defined over 32-element blocks, so group size 32 is required.

mlx_lm.convert \
  --hf-path kakaocorp/kanana-1.5-8b-instruct-2505 \
  -q --q-mode mxfp4 --q-group-size 32 \
  --mlx-path ./kanana15-8b-instruct-mlx-mxfp4-g32
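
A quick sanity check after conversion, loading the local output folder instead of a Hub repo:

from mlx_lm import load, generate

# Load the freshly converted weights from disk.
model, tokenizer = load("./kanana15-8b-instruct-mlx-mxfp4-g32")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=16))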

Specs

  • Params: ~8B (Llama-family causal LM)
  • Context length: up to 32k tokens
  • Typical memory (single stream): ~4–6 GB on recent M-series (varies by OS/sequence length)
  • Quantization: MXFP4 (group size 32), weight-only 4-bit
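
To measure the footprint on your own machine, a minimal sketch (assumes a recent MLX that exposes mx.get_peak_memory; older releases used mx.metal.get_peak_memory):

import mlx.core as mx
from mlx_lm import load, generate

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=32))

# Peak memory MLX has allocated so far, reported in GiB.
print(f"peak: {mx.get_peak_memory() / 2**30:.2f} GiB")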

Prompting

For most chat UIs (MLX-LM / LM Studio), the included template will format roles automatically. Raw text generation is also supported: pass a single string prompt to generate().
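
For example, a raw completion without the chat template:

from mlx_lm import load, generate

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")
# No chat template here: the string is fed to the model as-is.
print(generate(model, tokenizer, prompt="def fibonacci(n):", max_tokens=64))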

Files

  • model.* (MLX weights)
  • config.json, generation_config.json
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json
  • chat_template.jinja (ChatML; defaults tools to [] when absent, per the note above)

License

  • Upstream: Apache-2.0 (see Kakao's original model card)
  • This repo: Converted/quantized distribution following the upstream license. Please comply with upstream terms if you re-convert or redistribute.

Acknowledgements

Thanks to the Kanana team and the MLX/MLX-LM community.

You're welcome!
