# Jina-VLM: Small Multilingual Vision Language Model
Native MLX port of jina-vlm for Apple Silicon with 4-bit quantization. The quantized weights are 2.0 GB, down from 9.2 GB at fp32 (roughly 78% smaller).
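For intuition, 4-bit group quantization stores each group of weights as 4-bit integer codes plus a per-group scale and offset. The pure-Python sketch below is illustrative only: MLX's real implementation packs the codes into machine words and typically leaves some parameters at higher precision, which is likely why the size drops ~4.6x rather than the full 8x a fp32-to-4-bit conversion would suggest.

```python
# Minimal sketch of affine 4-bit group quantization (illustrative only;
# not MLX's actual packed implementation or kernel code).

def quantize_group(weights):
    """Map a group of floats to 4-bit codes (0..15) plus scale and offset."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 15 or 1.0  # avoid division by zero for flat groups
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo

def dequantize_group(codes, scale, lo):
    """Reconstruct approximate floats from the 4-bit codes."""
    return [c * scale + lo for c in codes]

weights = [0.12, -0.8, 0.33, 0.01, 0.57, -0.44, 0.9, -0.05]
codes, scale, lo = quantize_group(weights)
restored = dequantize_group(codes, scale, lo)
# Rounding guarantees the per-weight error is at most half a quantization step
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(codes, round(max_err, 4))
```

Each 4-bit code replaces a 32-bit float, at the cost of a bounded rounding error per weight (at most half a quantization step).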
## Installation

jina-vlm support is already merged into mlx-vlm master but not yet released. Until the next release, install from the main branch:

```shell
pip install git+https://github.com/Blaizzy/mlx-vlm.git@main
```
## Usage (Python)

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the quantized model and its processor
model, processor = load("jinaai/jina-vlm-mlx")
config = load_config("jinaai/jina-vlm-mlx")

# Prepare input: a list of image paths and a text prompt
image = ["photo.jpg"]
prompt = "Describe this image."

# Wrap the prompt in the model's chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=1
)

# Generate a response
output = generate(model, processor, formatted_prompt, image, max_tokens=200)
print(output.text)
```
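The `apply_chat_template` step wraps the prompt and per-image placeholders in the model's expected conversation format. As rough intuition only, a hypothetical ChatML-style template might look like the sketch below; the actual special tokens for jina-vlm are defined by its processor config and may differ.

```python
# Hypothetical sketch of what a chat template produces. The token strings
# here ("<image>", "<|im_start|>", "<|im_end|>") are assumptions for
# illustration, not jina-vlm's actual vocabulary.

def format_chat(prompt, num_images, image_token="<image>"):
    """Prefix the prompt with image placeholders and wrap it in chat markers."""
    user_content = image_token * num_images + "\n" + prompt
    return (
        "<|im_start|>user\n" + user_content + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

formatted = format_chat("Describe this image.", num_images=1)
print(formatted)
```

The trailing assistant marker is what cues the model to begin generating its reply.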
## Usage (CLI)

```shell
python -m mlx_vlm.generate \
  --model jinaai/jina-vlm-mlx \
  --image photo.jpg \
  --prompt "Describe this image." \
  --max-tokens 200
```
## License

CC BY-NC 4.0. For commercial use, contact Jina AI.
## Citation

```bibtex
@misc{koukounas2025jinavlm,
  title={Jina-VLM: Small Multilingual Vision Language Model},
  author={Andreas Koukounas and Georgios Mastrapas and Florian Hönicke and Sedigheh Eslami and Guillaume Roncari and Scott Martens and Han Xiao},
  year={2025},
  eprint={2512.04032},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2512.04032},
}
```