# EXAONE-Deep-7.8B-MLX-8bit

An MLX-converted, 8-bit-quantized version of LGAI-EXAONE/EXAONE-Deep-7.8B, optimized for Apple Silicon Macs.

## Model Description

EXAONE Deep is a reasoning-enhanced language model developed by LG AI Research. It performs strongly on math, coding, and other analytical reasoning tasks.

| Spec | Value |
|------|-------|
| Original model | LGAI-EXAONE/EXAONE-Deep-7.8B |
| Quantization | 8-bit (8.5 effective bits per weight) |
| Framework | MLX (Apple Silicon native) |
| Size | ~7.7 GB (down from ~16 GB) |
| Languages | English, Korean |
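
The fractional figure in the quantization row follows from MLX's group-wise affine quantization: each group of weights stores a scale and a bias alongside the quantized values. A quick sanity check, assuming the MLX defaults (group size 64, one fp16 scale and one fp16 bias per group), which are not read from this checkpoint:

```python
# Effective bits per weight under MLX group-wise affine quantization.
# Assumes MLX defaults: group size 64, fp16 scale + fp16 bias per group.
q_bits = 8
group_size = 64
overhead_bits = 16 + 16                      # fp16 scale + fp16 bias per group
print(q_bits + overhead_bits / group_size)   # 8.5
```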

## Performance

Measured on an M2 Max with 32 GB of unified memory:

| Metric | Value |
|--------|-------|
| Load time | ~1-2 s |
| Generation speed | ~25-35 tok/s |
| Memory usage | ~8 GB |
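
These figures are easy to reproduce on your own machine: mlx-lm reports throughput itself when `generate` is called with `verbose=True` (recent versions also print peak memory). A minimal sketch:

```python
from mlx_lm import load, generate

model, tokenizer = load("sinbal/EXAONE-Deep-7.8B-MLX-8bit")

# verbose=True makes mlx-lm print prompt and generation tokens-per-second
# (and, in recent versions, peak memory) after the run finishes.
generate(model, tokenizer, prompt="Hello", max_tokens=100, verbose=True)
```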

## Usage

### Installation

```bash
pip install mlx-lm
```

### Basic Usage

```python
from mlx_lm import load, generate

# Download (on first use) and load the quantized model from the Hugging Face Hub.
model, tokenizer = load("sinbal/EXAONE-Deep-7.8B-MLX-8bit")

prompt = "Explain the key factors for AI investment in 2025."
messages = [{"role": "user", "content": prompt}]

# Wrap the message in the model's chat template before generating.
formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

response = generate(model, tokenizer, prompt=formatted, max_tokens=500)
print(response)
```
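
For interactive use you may prefer to print tokens as they arrive rather than waiting for the full completion. mlx-lm also exports a `stream_generate` helper; the sketch below reuses `model`, `tokenizer`, and `formatted` from the example above and assumes a recent mlx-lm where each yielded item carries a `.text` field:

```python
from mlx_lm import stream_generate

# Stream the response token by token; each item's .text holds the new text.
for chunk in stream_generate(model, tokenizer, prompt=formatted, max_tokens=500):
    print(chunk.text, end="", flush=True)
print()
```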

### Memory Management

MLX provides explicit memory control, which is useful in resource-constrained environments:

```python
import gc

import mlx.core as mx

# Explicit cleanup when done: drop the Python references, force a
# collection, then release MLX's cached buffers.
del model, tokenizer
gc.collect()
mx.clear_cache()
```
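
If you want to observe or bound memory use rather than just release it, recent MLX releases expose allocator introspection at the top level (older versions kept these under `mx.metal.*`, so adjust to your installed version):

```python
import mlx.core as mx

# Inspect current and peak allocator usage (recent MLX; older releases
# expose these as mx.metal.get_active_memory, etc.).
print(f"active: {mx.get_active_memory() / 1e9:.2f} GB")
print(f"peak:   {mx.get_peak_memory() / 1e9:.2f} GB")

# Optionally cap how much memory the MLX allocator may hold.
mx.set_memory_limit(10 * 1024**3)  # 10 GiB
```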

## Conversion Details

Converted using mlx-lm version 0.28.3:

```bash
mlx_lm.convert \
    --hf-path LGAI-EXAONE/EXAONE-Deep-7.8B \
    -q \
    --q-bits 8 \
    --mlx-path ./EXAONE-Deep-7.8B-MLX-8bit
```
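
The same conversion can be scripted from Python: mlx_lm exports a `convert` function whose keyword arguments mirror the CLI flags in current releases. A sketch:

```python
from mlx_lm import convert

# Python equivalent of the CLI call above; keyword names mirror the flags.
convert(
    "LGAI-EXAONE/EXAONE-Deep-7.8B",
    mlx_path="./EXAONE-Deep-7.8B-MLX-8bit",
    quantize=True,
    q_bits=8,
)
```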

## License

This model is released under the EXAONE AI Model License Agreement 1.1. Please refer to the original model's license for the full terms of use.

## Acknowledgements

Thanks to LG AI Research for the original EXAONE-Deep-7.8B model, and to the MLX team for the mlx-lm toolkit.
