# Apriel-1.5-15B-Thinker – MLX Quantized (Apple Silicon)

- Format: MLX (Apple Silicon)
- Variants: 6-bit (recommended)
- Base model: ServiceNow-AI/Apriel-1.5-15B-Thinker
- Architecture: Pixtral-style LLaVA (vision encoder → 2-layer projector → decoder)
- Intended use: image understanding & grounded reasoning; document/chart/OCR-style tasks; math/coding Q&A with visual context.
This repository provides MLX-format weights for Apple Silicon (M-series), built from the original Apriel-1.5-15B-Thinker release and optimized for on-device inference with a small memory footprint and fast startup on macOS.
## What is Apriel-1.5-15B-Thinker?
Apriel-1.5-15B-Thinker is a 15B-parameter open-weights multimodal reasoning model trained with a data-centric mid-training recipe rather than RLHF or reward modeling. Starting from Pixtral-12B as the base, the authors apply:
- Depth Upscaling (capacity expansion without pretraining from scratch),
- Two-stage multimodal continual pretraining (CPT) to build text + visual reasoning, and
- High-quality SFT with explicit reasoning traces across math, coding, science, and tool use.
This approach delivers frontier-level capability on a compact compute budget.
### Key reported results (original model)
- Artificial Analysis Intelligence (AAI) Index: 52, matching DeepSeek-R1-0528 at far lower compute.
- Multimodal: on ten image benchmarks, within ~5 points of Gemini 2.5 Flash and Claude 3.7 Sonnet on average.
- Designed for single-GPU / constrained deployment scenarios.
Notes above summarize the upstream paper; MLX quantization can slightly affect absolute scores. Always validate on your use case.
## Architecture (high level)
- Backbone: Pixtral-12B-Base-2409 adapted to a larger 15B decoder via depth upscaling (layers 40 → 48), then re-aligned with a 2-layer projection network connecting the vision encoder and decoder (see the sketch after this list).
- Training stack:
  - CPT Stage-1: mixed tokens (≈50% text, 20% replay, 30% multimodal) for foundational reasoning & image understanding; 32k context; cosine LR schedule with warmup; all components unfrozen; checkpoint averaging.
  - CPT Stage-2: targeted synthetic visual tasks (reconstruction, visual matching, detection, counting) to strengthen spatial/compositional/fine-grained reasoning; vision encoder frozen; loss computed on responses for instruct data; 16k context.
  - SFT: curated instruction-response pairs with explicit reasoning traces (math, coding, science, tool use).
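To make the depth-upscaling idea concrete, here is a minimal, hypothetical Python sketch (not the authors' code) of growing a 40-block decoder to 48 blocks by duplicating evenly spaced blocks; in the real recipe the grown model is then re-aligned and further trained rather than used as-is:

```python
# Illustrative depth upscaling: 40 decoder blocks -> 48 by duplicating
# evenly spaced blocks. Hypothetical helper, NOT the authors' code.
import copy

def depth_upscale(blocks, target=48):
    n_new = target - len(blocks)   # 8 extra blocks for 40 -> 48
    stride = len(blocks) // n_new  # duplicate every 5th block
    grown = list(blocks)
    for i in range(n_new):
        src = i * stride
        # +i accounts for blocks already inserted before this position
        grown.insert(src + i + 1, copy.deepcopy(blocks[src]))
    return grown

blocks = [f"decoder_block_{i}" for i in range(40)]  # stand-ins for real layers
print(len(depth_upscale(blocks)))                   # 48
```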
## This MLX Release
- Why MLX? Native Apple-Silicon inference with small binaries, fast load, and low memory overhead.
- What's included: `config.json`, sharded `mlx_model*.safetensors` weights, tokenizer & processor files, and metadata for VLM pipelines.
- Quantization options:
  - 6-bit (recommended): best balance of quality and memory.

Tip: if you're memory-constrained on an M1/M2, start with the 6-bit variant; a rough sizing estimate follows.
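As a back-of-the-envelope capacity check (an estimate only, not a measured footprint), the 6-bit weights alone come to roughly 10–11 GiB:

```python
# Rough weight size for the 6-bit variant. Estimate only: ignores the
# KV cache, activations, and per-group quantization scales/biases.
params = 15e9  # ~15B parameters
bits = 6       # 6-bit quantization
gib = params * bits / 8 / 1024**3
print(f"~{gib:.1f} GiB of weights")  # ~10.5 GiB
```

Leave headroom for the KV cache and macOS itself when choosing a machine.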
## Quickstart (CLI)
```bash
# Basic image caption. MLX targets the Apple-Silicon GPU natively, so no
# device flag is needed; flag names (e.g. --temp) can vary slightly between
# mlx-vlm versions — check `python -m mlx_vlm.generate --help`.
python -m mlx_vlm.generate \
  --model mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX \
  --image /path/to/image.jpg \
  --prompt "Describe this image." \
  --max-tokens 128 \
  --temp 0.0
```
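For programmatic use, the sketch below calls the same model through mlx-vlm's Python helpers (`load`, `apply_chat_template`, `generate`); treat it as a starting point, since helper signatures differ across mlx-vlm versions:

```python
# Python equivalent of the CLI call above, using mlx-vlm's high-level API.
# Note: helper names/signatures can differ between mlx-vlm versions.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_id = "mlx-community/Apriel-1.5-15b-Thinker-6bit-MLX"
model, processor = load(model_id)
config = load_config(model_id)

images = ["/path/to/image.jpg"]
prompt = apply_chat_template(processor, config, "Describe this image.",
                             num_images=len(images))

output = generate(model, processor, prompt, images,
                  max_tokens=128, verbose=False)
print(output)
```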