Apriel-1.5-15B-Thinker – MLX Quantized (Apple Silicon)

Format: MLX (Apple Silicon)
Variants: 6-bit (recommended)
Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker
Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder)
Intended use: image understanding & grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.

This repository provides MLX-format weights for Apple Silicon (M-series) built from the original Apriel-1.5-15B-Thinker release. It is optimized for on-device inference with small memory footprints and fast startup on macOS.
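
If you want to pre-fetch the weights instead of letting the loader pull them on first use, the standard huggingface_hub download API works. This is a minimal sketch (the repo id below is this repository's; mlx-vlm will otherwise download it automatically on first load):

# Optional: pre-fetch the MLX weights into the local Hugging Face cache
from huggingface_hub import snapshot_download

local_dir = snapshot_download("mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX")
print(local_dir)  # directory path you can pass to mlx-vlm in place of the repo id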


🔎 What is Apriel-1.5-15B-Thinker?

Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model trained via a data-centric mid-training recipe rather than RLHF/RM. Starting from Pixtral-12B as the base, the authors apply:

  1. Depth Upscaling (capacity expansion without pretraining from scratch),
  2. Two-stage multimodal continual pretraining (CPT) to build text + visual reasoning, and
  3. High-quality SFT with explicit reasoning traces across math, coding, science, and tool use.

This approach delivers frontier-level capability on a modest compute budget.

Key reported results (original model)

  • AAI Index: 52, matching DeepSeek-R1-0528 at far lower compute.
  • Multimodal: on 10 image benchmarks, within ~5 points of Gemini-2.5-Flash and Claude Sonnet-3.7 on average.
  • Designed for single-GPU / constrained deployment scenarios.

Notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.


πŸ—οΈ Architecture (high level)

  • Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (40 → 48 layers), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder (see the sketch after this list).
  • Training stack:
    • CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning & image understanding; 32k context; cosine LR with warmup; all components unfrozen; checkpoint averaging.
    • CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss on responses for instruct data; 16k context.
    • SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tools).
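
The projection network is a small piece of the stack, but it is the glue between the vision encoder and the upscaled decoder. Below is a purely illustrative MLX sketch of a 2-layer projector; the class name, dimensions, and GELU activation are assumptions for illustration, not the checkpoint's actual module.

# Illustrative only: a 2-layer vision-to-text projector in MLX
import mlx.core as mx
import mlx.nn as nn

class VisionProjector(nn.Module):
    def __init__(self, vision_dim: int, hidden_dim: int, text_dim: int):
        super().__init__()
        # Two linear layers map vision-encoder features into the decoder's embedding space
        self.fc1 = nn.Linear(vision_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, text_dim)

    def __call__(self, image_features: mx.array) -> mx.array:
        # image_features: (num_image_tokens, vision_dim) -> (num_image_tokens, text_dim)
        return self.fc2(nn.gelu(self.fc1(image_features)))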

💾 This MLX Release

  • Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
  • What's included: config.json, sharded .safetensors weights, tokenizer & processor files, and metadata for VLM pipelines.
  • Quantization options:
    • 6-bit (recommended): best balance of quality & memory.

Tip: If you're capacity-constrained on an M1/M2, try the 6-bit variant first.
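
As a back-of-the-envelope sizing check (weights only, ignoring the KV cache, activations, and the small overhead of quantization scales): 15B parameters × 6 bits ≈ 90 Gbit ≈ 11.25 GB, so 16 GB of unified memory is a realistic floor, and 24 GB or more leaves headroom for long contexts and large images.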


βš™οΈ Quickstart (CLI)

# Basic image caption
python -m mlx_vlm.generate \
  --model mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0
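
The same model can be driven from Python via the mlx-vlm package (pip install mlx-vlm). The sketch below follows the package's documented load / apply_chat_template / generate flow; argument names can shift between mlx-vlm versions, so treat it as a starting point rather than a pinned API.

# Python API (minimal sketch)
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX"
model, processor = load(model_path)
config = load_config(model_path)

images = ["/path/to/image.jpg"]
prompt = "Describe this image."

# Wrap the user prompt in the model's chat template, reserving one image slot
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)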