Kanana 1.5 8B Instruct – MLX MXFP4 (group size 32)

Author: B

Apple Silicon-optimized MLX build of kakaocorp/kanana-1.5-8b-instruct-2505, quantized to MXFP4 (4-bit) with group size 32. Compact, fast, and ready for MLX and LM Studio.

Quantization & conversion were done with mlx_lm.convert. See "Reproduce" below.

Highlights

  • Format: MLX (Apple Silicon)
  • Quantization: mxfp4 (4-bit), q_group_size=32
  • Upstream: kakaocorp/kanana-1.5-8b-instruct-2505 (Apache-2.0)
  • Use cases: KR/EN chat & instruction following, local assistants, RAG backends
  • Why MLX: smaller memory footprint and strong throughput on M-series chips

Note: Depending on your mlx-lm version, the converted weights may be a single model.safetensors file or sharded across multiple model-*.safetensors files. Both layouts load with MLX-LM.

Quickstart (Python)

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_id = "YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32"
model, tokenizer = load(model_id)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain Python generators with a simple example."},
]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# Recent mlx-lm takes sampling settings via a sampler object;
# older releases accepted temp=/top_p= keywords on generate() directly.
sampler = make_sampler(temp=0.7, top_p=0.95)
print(generate(model, tokenizer, prompt=prompt, max_tokens=256, sampler=sampler))

Use with LM Studio (OpenAI-compatible)

  1. Add this folder as a Local Model.
  2. Start the local server and call /v1/chat/completions.
curl -s http://localhost:1234/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32",
    "messages": [
      {"role":"system","content":"You are a helpful assistant."},
      {"role":"user","content":"Explain Python generators with a tiny example."}
    ]
  }'
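
The same endpoint also works from a Python client via the openai package (a sketch; LM Studio's local server ignores the API key, so any placeholder value works):

from openai import OpenAI

# Point the client at LM Studio's local server (default port 1234).
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
resp = client.chat.completions.create(
    model="YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Python generators with a tiny example."},
    ],
)
print(resp.choices[0].message.content)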

Note: The original chat template could error when tools was null (Jinja selectattr on a NullValue). This repo fixes it by defaulting tools to [] when undefined, so both tool-free and tool-enabled chats render safely.
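
A minimal check of the fixed template (the get_time tool schema below is a made-up example for illustration, not something shipped with this repo):

from mlx_lm import load

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")

# Both calls should render without a Jinja error on the fixed template.
no_tools = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    tokenize=False,
)
with_tools = tokenizer.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_time",  # hypothetical tool, illustration only
            "description": "Return the current time.",
            "parameters": {"type": "object", "properties": {}},
        },
    }],
    add_generation_prompt=True,
    tokenize=False,
)
print(no_tools)
print(with_tools)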

Reproduce (Conversion)

MXFP4 support in mlx_lm.convert is recent (exposed via the --q-mode flag); if your install does not recognize it, upgrade first with pip install -U mlx-lm. Note that MXFP4 is defined over 32-element blocks, so group size 32 is required.

mlx_lm.convert \
  --hf-path kakaocorp/kanana-1.5-8b-instruct-2505 \
  -q --q-mode mxfp4 --q-group-size 32 \
  --mlx-path ./kanana15-8b-instruct-mlx-mxfp4-g32
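
A quick sanity check after conversion, loading the local output folder instead of a Hub repo:

from mlx_lm import load, generate

# Load the freshly converted weights from disk.
model, tokenizer = load("./kanana15-8b-instruct-mlx-mxfp4-g32")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=16))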

Specs

  • Params: ~8B (Llama-family causal LM)
  • Context length: up to 32k tokens
  • Typical memory (single stream): ~4–6 GB on recent M-series (varies by OS/sequence length)
  • Quantization: MXFP4 (group size 32), weight-only 4-bit
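
To measure the footprint on your own machine, a minimal sketch (assumes a recent MLX that exposes mx.get_peak_memory; older releases used mx.metal.get_peak_memory):

import mlx.core as mx
from mlx_lm import load, generate

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")
print(generate(model, tokenizer, prompt="Hello!", max_tokens=32))

# Peak memory MLX has allocated so far, reported in GiB.
print(f"peak: {mx.get_peak_memory() / 2**30:.2f} GiB")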

Prompting

For most chat UIs (MLX-LM / LM Studio), the included template will format roles automatically. Raw text generation is also supported: pass a single string prompt to generate().
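
For example, a raw completion without the chat template:

from mlx_lm import load, generate

model, tokenizer = load("YOUR_HF_USERNAME/kanana-1.5-8b-instruct-mlx-mxfp4-g32")
# No chat template here: the string is fed to the model as-is.
print(generate(model, tokenizer, prompt="def fibonacci(n):", max_tokens=64))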

Files

  • model.* (MLX weights)
  • config.json, generation_config.json
  • tokenizer.json, tokenizer_config.json, special_tokens_map.json
  • chat_template.jinja (ChatML; defaults tools to [] when absent, per the note above)

License

  • Upstream: Apache-2.0 (see Kakao's original model card)
  • This repo: Converted/quantized distribution following the upstream license. Please comply with upstream terms if you re-convert or redistribute.

Acknowledgements

Thanks to the Kanana team and the MLX/MLX-LM community.

You're welcome!
