jina-vlm-mlx

Native MLX port of jina-vlm for Apple Silicon with 4-bit quantization.

Model Size

2.0 GB (down from 9.2 GB fp32, roughly 78% smaller)

Quantization Strategy

  • 4-bit weights (group_size=64): lm_head, vision encoder, VL connector, language model layers 1-27
  • bfloat16 weights: embeddings, layer norms, language model layer 0 (a conversion sketch follows below)
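
For reference, this layer split can be expressed with mlx.nn.quantize and its class_predicate hook. The sketch below is illustrative only, not the actual conversion script used to produce this checkpoint, and the module path patterns ("embed", "norm", ".layers.0.") are assumptions about the parameter naming:

import mlx.nn as nn

def quantize_mixed(model: nn.Module) -> None:
    """Quantize to 4-bit except embeddings, norms, and LM layer 0 (illustrative)."""
    def predicate(path: str, module: nn.Module) -> bool:
        # Only layers that implement to_quantized (Linear, Embedding) can be quantized.
        if not hasattr(module, "to_quantized"):
            return False
        # Assumed path patterns: keep embeddings, layer norms, and the first
        # language-model layer in bfloat16.
        if "embed" in path or "norm" in path or ".layers.0." in path:
            return False
        # Everything else (lm_head, vision encoder, VL connector, LM layers 1-27)
        # gets 4-bit weights with group_size=64.
        return True

    nn.quantize(model, group_size=64, bits=4, class_predicate=predicate)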

Installation

jina-vlm support is already merged into mlx-vlm master but not yet released. Until the next release, install from the main branch:

pip install git+https://github.com/Blaizzy/mlx-vlm.git@main
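
To confirm that the development build was installed rather than an older PyPI release, check what pip resolved; the second command assumes the package exposes a __version__ attribute, so fall back to pip show alone if it does not:

pip show mlx-vlm
python -c "import mlx_vlm; print(mlx_vlm.__version__)"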

Usage

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load model
model, processor = load("jinaai/jina-vlm-mlx")
config = load_config("jinaai/jina-vlm-mlx")

# Prepare input
image = ["photo.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate
output = generate(model, processor, formatted_prompt, image, max_tokens=200)
print(output.text)
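
The image argument accepts local file paths or URLs. Passing several images follows the same pattern, shown below as an unverified sketch (the file names are hypothetical, and multi-image behaviour has not been checked against this checkpoint):

images = ["photo_a.jpg", "photo_b.jpg"]
prompt = "What differs between these two images?"

formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(images)
)
output = generate(model, processor, formatted_prompt, images, max_tokens=200)
print(output.text)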

CLI Usage

python -m mlx_vlm.generate \
    --model jinaai/jina-vlm-mlx \
    --image photo.jpg \
    --prompt "Describe this image." \
    --max-tokens 200

License

CC BY-NC 4.0. Commercial use: contact Jina AI.

Citation

@misc{koukounas2025jinavlm,
    title={Jina-VLM: Small Multilingual Vision Language Model},
    author={Andreas Koukounas and Georgios Mastrapas and Florian Hönicke and Sedigheh Eslami and Guillaume Roncari and Scott Martens and Han Xiao},
    year={2025},
    eprint={2512.04032},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2512.04032},
}