---
language:
- en
license: apache-2.0
tags:
- mlx
- vision
- multimodal
base_model: janhq/Jan-v2-VL-med
---

# Jan-v2-VL-med BF16 MLX

This is a BF16 (bfloat16) precision MLX conversion of [janhq/Jan-v2-VL-med](https://huggingface.co/janhq/Jan-v2-VL-med).

## Model Description

Jan-v2-VL is an 8-billion-parameter vision-language model designed for long-horizon, multi-step tasks in real software environments. This "med" variant offers a balanced trade-off between inference speed and reasoning capability, delivering strong performance on agentic automation and UI-control tasks.

**Key Features:**

- Vision-language understanding for browser and desktop applications
- Screenshot grounding and tool-calling capabilities
- Stable multi-step execution with minimal performance drift
- Error recovery and maintenance of intermediate state

## Precision

This model was converted to MLX format with bfloat16 precision (no quantization) using [MLX-VLM](https://github.com/Blaizzy/mlx-vlm) by Prince Canuma. BF16 provides near full-precision quality with a reduced memory footprint.

**Conversion command:**

```bash
mlx_vlm.convert --hf-path janhq/Jan-v2-VL-med --dtype bfloat16 --mlx-path Jan-v2-VL-med-bf16-mlx
```

## Usage

### Installation

```bash
pip install mlx-vlm
```

### Python

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

# Load the model
model_path = "mlx-community/Jan-v2-VL-med-bf16-mlx"
model, processor = load(model_path)
config = load_config(model_path)

# Prepare input
image = ["path/to/image.jpg"]
prompt = "Describe this image."

# Apply chat template
formatted_prompt = apply_chat_template(
    processor, config, prompt, num_images=len(image)
)

# Generate output
output = generate(model, processor, formatted_prompt, image, verbose=False)
print(output)
```

### Command Line

```bash
mlx_vlm.generate --model mlx-community/Jan-v2-VL-med-bf16-mlx --max-tokens 100 --prompt "Describe this image" --image path/to/image.jpg
```

## Intended Use

This model is designed for:

- Agentic automation and UI control
- Stepwise operation in browsers and desktop applications
- Screenshot grounding and tool calls
- Long-horizon, multi-step task execution

## License

This model is released under the Apache 2.0 license.

## Original Model

For more information, please refer to the original model: [janhq/Jan-v2-VL-med](https://huggingface.co/janhq/Jan-v2-VL-med)

## Acknowledgments

- Original model by [Jan](https://huggingface.co/janhq)
- [MLX](https://github.com/ml-explore/mlx) framework by Apple
- MLX conversion framework by [Prince Canuma](https://github.com/Blaizzy/mlx-vlm)
- Model conversion by [Incept5](https://incept5.ai)
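
## Hardware Note

As a rough guide to whether this BF16 model will fit in memory: bfloat16 stores each parameter in 2 bytes, so the weights of an 8-billion-parameter model alone occupy about 16 GB. The sketch below is a back-of-the-envelope estimate only; actual peak usage is higher because activations, the KV cache, and image embeddings add overhead on top of the weights.

```python
# Back-of-the-envelope memory estimate for the BF16 weights.
# Inputs: ~8 billion parameters (from the model description) and
# 2 bytes per parameter (bfloat16 = 16 bits).
params = 8e9          # approximate parameter count
bytes_per_param = 2   # bfloat16

weight_bytes = params * bytes_per_param
print(f"BF16 weights: ~{weight_bytes / 1e9:.0f} GB "
      f"({weight_bytes / 2**30:.1f} GiB)")
```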