Apriel-1.5-15B-Thinker – MLX Quantized (Apple Silicon)

Format: MLX (Apple Silicon)
Variants: 6-bit (recommended)
Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker
Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder)
Intended use: image understanding & grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.

This repository provides MLX-format weights for Apple Silicon (M-series) built from the original Apriel-1.5-15B-Thinker release. It is optimized for on-device inference with small memory footprints and fast startup on macOS.
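
If you want to pre-fetch the weights instead of letting the loader pull them on first use, the standard huggingface_hub download API works. This is a minimal sketch (the repo id below is this repository's; mlx-vlm will otherwise download it automatically on first load):

# Optional: pre-fetch the MLX weights into the local Hugging Face cache
from huggingface_hub import snapshot_download

local_dir = snapshot_download("mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX")
print(local_dir)  # directory path you can pass to mlx-vlm in place of the repo id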


🔎 What is Apriel-1.5-15B-Thinker?

Apriel-1.5-15B-Thinker is a 15B open-weights multimodal reasoning model trained via a data-centric mid-training recipe rather than RLHF/RM. Starting from Pixtral-12B as the base, the authors apply:

  1. Depth Upscaling (capacity expansion without pretraining from scratch),
  2. Two-stage multimodal continual pretraining (CPT) to build text + visual reasoning, and
  3. High-quality SFT with explicit reasoning traces across math, coding, science, and tool use.

This approach delivers frontier-level capability on a modest compute budget.

Key reported results (original model)

  • AAI Index: 52, matching DeepSeek-R1-0528 at far lower compute.
  • Multimodal: on 10 image benchmarks, within ~5 points of Gemini-2.5-Flash and Claude Sonnet-3.7 on average.
  • Designed for single-GPU / constrained deployment scenarios.

Notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.


πŸ—οΈ Architecture (high level)

  • Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (40 → 48 layers), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder (see the sketch after this list).
  • Training stack:
    • CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning & image understanding; 32k context; cosine LR with warmup; all components unfrozen; checkpoint averaging.
    • CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss on responses for instruct data; 16k context.
    • SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tools).
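
The projection network is a small piece of the stack, but it is the glue between the vision encoder and the upscaled decoder. Below is a purely illustrative MLX sketch of a 2-layer projector; the class name, dimensions, and GELU activation are assumptions for illustration, not the checkpoint's actual module.

# Illustrative only: a 2-layer vision-to-text projector in MLX
import mlx.core as mx
import mlx.nn as nn

class VisionProjector(nn.Module):
    def __init__(self, vision_dim: int, hidden_dim: int, text_dim: int):
        super().__init__()
        # Two linear layers map vision-encoder features into the decoder's embedding space
        self.fc1 = nn.Linear(vision_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, text_dim)

    def __call__(self, image_features: mx.array) -> mx.array:
        # image_features: (num_image_tokens, vision_dim) -> (num_image_tokens, text_dim)
        return self.fc2(nn.gelu(self.fc1(image_features)))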

💾 This MLX Release

  • Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
  • What's included: config.json, sharded .safetensors weights, tokenizer & processor files, and metadata for VLM pipelines.
  • Quantization options:
    • 6-bit (recommended): best balance of quality & memory.

Tip: If you're capacity-constrained on an M1/M2, try the 6-bit variant first.
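
As a back-of-the-envelope sizing check (weights only, ignoring the KV cache, activations, and the small overhead of quantization scales): 15B parameters × 6 bits ≈ 90 Gbit ≈ 11.25 GB, so 16 GB of unified memory is a realistic floor, and 24 GB or more leaves headroom for long contexts and large images.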


βš™οΈ Quickstart (CLI)

# Basic image caption
python -m mlx_vlm.generate \
  --model mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 --temperature 0.0
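
The same model can be driven from Python via the mlx-vlm package (pip install mlx-vlm). The sketch below follows the package's documented load / apply_chat_template / generate flow; argument names can shift between mlx-vlm versions, so treat it as a starting point rather than a pinned API.

# Python API (minimal sketch)
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX"
model, processor = load(model_path)
config = load_config(model_path)

images = ["/path/to/image.jpg"]
prompt = "Describe this image."

# Wrap the user prompt in the model's chat template, reserving one image slot
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))

output = generate(model, processor, formatted_prompt, images, verbose=False)
print(output)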