Unable to load Q5_K_M model in Ollama 0.12.5 - Error 500: unable to load model blob
#6 opened by acrosley
Issue Summary
Getting a persistent 500 Internal Server Error: unable to load model when attempting to run the Q5_K_M quantization of this model with Ollama.
Environment
- Ollama Version: 0.12.5
- OS: Windows 10/11 (Build 26100)
- Model: hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M
- Model Size: 22 GB (listed), 20.23 GB (blob file)
Steps to Reproduce
- Pull the model: ollama pull hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M
- Model downloads successfully (all blobs at 100%)
- Run: ollama run hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M
Expected Behavior
Model should load and start an interactive chat session.
Actual Behavior
Error: 500 Internal Server Error: unable to load model: D:\OllamaModels\blobs\sha256-a3dcf99539e09f8a9f5578508bc0b834f62b0bd85e4764d56e942a9d89def85b
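The 500 response only surfaces the generic "unable to load model" message, so I am also trying to pull the underlying loader error out of the server log. A rough sketch of what I'm running from a Windows command prompt (assuming the default log location under %LOCALAPPDATA%\Ollama and that the OLLAMA_DEBUG variable from Ollama's troubleshooting docs is honored by 0.12.5):

```
:: Stop the tray app, then run the server in a console with debug logging
set OLLAMA_DEBUG=1
ollama serve

:: In a second console, reproduce the failure
ollama run hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M

:: Or inspect the log the background service already wrote
type "%LOCALAPPDATA%\Ollama\server.log"
```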
Troubleshooting Attempted
- ✅ Verified blob file exists and is the correct size (20.23 GB)
- ✅ Removed and re-downloaded the model (checksums verified)
- ✅ Restarted Ollama service
- ✅ Confirmed model appears in ollama list
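For reference, this is roughly how I did the size and checksum checks above (the path is the one from the error message; if I understand Ollama's blob naming correctly, the SHA-256 of the file should match the digest in its filename):

```
:: Confirm the blob exists and check its size
dir "D:\OllamaModels\blobs\sha256-a3dcf99539e09f8a9f5578508bc0b834f62b0bd85e4764d56e942a9d89def85b"

:: Hash the blob and compare against the digest in the filename
certutil -hashfile "D:\OllamaModels\blobs\sha256-a3dcf99539e09f8a9f5578508bc0b834f62b0bd85e4764d56e942a9d89def85b" SHA256
```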
Questions
- Is the Q5_K_M quantization compatible with Ollama 0.12.5?
- Are there known issues with Qwen3-VL vision models in Ollama?
- What are the minimum system requirements (RAM/VRAM) for this quantization?
- Would a different quantization (Q4_K_M, Q6_K, etc.) work better with Ollama?
Any guidance would be appreciated!
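For the last question, this is what I plan to try next, assuming the repo publishes a Q4_K_M tag alongside Q5_K_M:

```
ollama pull hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q4_K_M
ollama run hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q4_K_M
```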
PS - thanks to Cursor for writing this up.
It is not compatible with Ollama at this point.
How to run it locally?
I couldn't figure out how to run it locally, so I decided to just use 8B-Instruct, which is actually still really good.
llama.cpp-tr-qwen3-vl-3-b6981-ab45b1a supports conversion to GGUF format and can be tested using llama-mtmd-cli.
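For anyone else trying that route, a rough sketch of the workflow, assuming that branch follows mainline llama.cpp conventions (convert_hf_to_gguf.py with an optional --mmproj export, and the standard llama-mtmd-cli flags); the file names below are placeholders:

```
:: Convert the original Hugging Face checkpoint to GGUF (run from the llama.cpp checkout)
python convert_hf_to_gguf.py path\to\Qwen3-VL-30B-A3B-Instruct --outfile qwen3-vl-30b-a3b-instruct.gguf

:: Export the vision projector, if the branch supports --mmproj the way mainline does
python convert_hf_to_gguf.py path\to\Qwen3-VL-30B-A3B-Instruct --mmproj --outfile mmproj-qwen3-vl.gguf

:: Smoke-test text + image inference with the multimodal CLI
llama-mtmd-cli -m qwen3-vl-30b-a3b-instruct.gguf --mmproj mmproj-qwen3-vl.gguf --image test.jpg -p "Describe this image."
```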