# EXAONE-Deep-7.8B-MLX-8bit

An MLX-converted, 8-bit-quantized version of LGAI-EXAONE/EXAONE-Deep-7.8B, optimized for Apple Silicon Macs.

## Model Description

EXAONE Deep is a reasoning-enhanced language model developed by LG AI Research. It performs strongly on math, coding, and other analytical reasoning tasks.

| Spec | Value |
|------|-------|
| Original model | LGAI-EXAONE/EXAONE-Deep-7.8B |
| Quantization | 8-bit (8.5 effective bits per weight) |
| Framework | MLX (Apple Silicon native) |
| Size | ~7.7 GB (down from ~16 GB) |
| Languages | English, Korean |
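
The fractional figure in the quantization row follows from MLX's group-wise affine quantization: each group of weights stores a scale and a bias alongside the quantized values. A quick sanity check, assuming the MLX defaults (group size 64, one fp16 scale and one fp16 bias per group), which are not read from this checkpoint:

```python
# Effective bits per weight under MLX group-wise affine quantization.
# Assumes MLX defaults: group size 64, fp16 scale + fp16 bias per group.
q_bits = 8
group_size = 64
overhead_bits = 16 + 16                      # fp16 scale + fp16 bias per group
print(q_bits + overhead_bits / group_size)   # 8.5
```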

## Performance

Measured on an M2 Max with 32 GB of unified memory:

| Metric | Value |
|--------|-------|
| Load time | ~1-2 s |
| Generation speed | ~25-35 tok/s |
| Memory usage | ~8 GB |
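
These figures are easy to reproduce on your own machine: mlx-lm reports throughput itself when `generate` is called with `verbose=True` (recent versions also print peak memory). A minimal sketch:

```python
from mlx_lm import load, generate

model, tokenizer = load("sinbal/EXAONE-Deep-7.8B-MLX-8bit")

# verbose=True makes mlx-lm print prompt and generation tokens-per-second
# (and, in recent versions, peak memory) after the run finishes.
generate(model, tokenizer, prompt="Hello", max_tokens=100, verbose=True)
```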

## Usage

### Installation

```bash
pip install mlx-lm
```

### Basic Usage

```python
from mlx_lm import load, generate

# Download (on first use) and load the quantized model from the Hugging Face Hub.
model, tokenizer = load("sinbal/EXAONE-Deep-7.8B-MLX-8bit")

prompt = "Explain the key factors for AI investment in 2025."
messages = [{"role": "user", "content": prompt}]

# Wrap the message in the model's chat template before generating.
formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=formatted, max_tokens=500)
print(response)
```
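
For interactive use you may prefer to print tokens as they arrive rather than waiting for the full completion. mlx-lm also exports a `stream_generate` helper; the sketch below reuses `model`, `tokenizer`, and `formatted` from the example above and assumes a recent mlx-lm where each yielded item carries a `.text` field:

```python
from mlx_lm import stream_generate

# Stream the response token by token; each item's .text holds the new text.
for chunk in stream_generate(model, tokenizer, prompt=formatted, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```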

### Memory Management

MLX provides explicit memory control, which is useful in resource-constrained environments:

```python
import gc

import mlx.core as mx

# Explicit cleanup when done: drop the Python references, force a
# collection, then release MLX's cached buffers.
del model, tokenizer
gc.collect()
mx.clear_cache()
```
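
If you want to observe or bound memory use rather than just release it, recent MLX releases expose allocator introspection at the top level (older versions kept these under `mx.metal.*`, so adjust to your installed version):

```python
import mlx.core as mx

# Inspect current and peak allocator usage (recent MLX; older releases
# expose these as mx.metal.get_active_memory, etc.).
print(f"active: {mx.get_active_memory() / 1e9:.2f} GB")
print(f"peak:   {mx.get_peak_memory() / 1e9:.2f} GB")

# Optionally cap how much memory the MLX allocator may hold.
mx.set_memory_limit(10 * 1024**3)  # 10 GiB
```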

## Conversion Details

Converted using mlx-lm version 0.28.3:

```bash
mlx_lm.convert \
    --hf-path LGAI-EXAONE/EXAONE-Deep-7.8B \
    -q \
    --q-bits 8 \
    --mlx-path ./EXAONE-Deep-7.8B-MLX-8bit
```
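
The same conversion can be scripted from Python: mlx_lm exports a `convert` function whose keyword arguments mirror the CLI flags in current releases. A sketch:

```python
from mlx_lm import convert

# Python equivalent of the CLI call above; keyword names mirror the flags.
convert(
    "LGAI-EXAONE/EXAONE-Deep-7.8B",
    mlx_path="./EXAONE-Deep-7.8B-MLX-8bit",
    quantize=True,
    q_bits=8,
)
```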

## License

This model is released under the EXAONE AI Model License Agreement 1.1. Please refer to the original model's license for the full terms of use.

## Acknowledgements

Thanks to LG AI Research for the original EXAONE-Deep-7.8B model, and to the MLX team for the mlx-lm toolkit.
