Unable to load Q5_K_M model in Ollama 0.12.5 - Error 500: unable to load model blob

#6 · opened by acrosley

Issue Summary

Getting a persistent 500 Internal Server Error: unable to load model when attempting to run the Q5_K_M quantization of this model with Ollama.

Environment

  • Ollama Version: 0.12.5
  • OS: Windows 10/11 (Build 26100)
  • Model: hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M
  • Model Size: 22 GB (listed), 20.23 GB (blob file)

Steps to Reproduce

  1. Pull the model: ollama pull hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M
  2. Model downloads successfully (all blobs at 100%)
  3. Run: ollama run hf.co/yairpatch/Qwen3-VL-30B-A3B-Instruct-GGUF:Q5_K_M

Expected Behavior

Model should load and start an interactive chat session.

Actual Behavior

Error: 500 Internal Server Error: unable to load model: D:\OllamaModels\blobs\sha256-a3dcf99539e09f8a9f5578508bc0b834f62b0bd85e4764d56e942a9d89def85b
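
When Ollama returns a 500 like this, the HTTP error usually just wraps whatever the model runner reported, so the server log tends to show the real reason the load failed. A minimal way to check it, assuming the default Windows log location for Ollama (%LOCALAPPDATA%\Ollama\server.log; adjust the path if your install differs):

  # Show the last lines of the Ollama server log (assumed default Windows path)
  Get-Content "$env:LOCALAPPDATA\Ollama\server.log" -Tail 50

The underlying loader error, if any, should appear near the end of that output.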

Troubleshooting Attempted

  • ✅ Verified the blob file exists and is the correct size (20.23 GB)
  • ✅ Removed and re-downloaded the model (checksums verified; a manual hash check is sketched after this list)
  • ✅ Restarted the Ollama service
  • ✅ Confirmed the model appears in ollama list
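
For reference, the checksum verification can be reproduced by hand: Ollama names each blob file after its SHA-256 digest, so the file's hash should match the hex string in its own filename. A minimal PowerShell sketch using the blob path from the error message:

  # Hash the blob and compare it against the digest encoded in the filename
  $blob = "D:\OllamaModels\blobs\sha256-a3dcf99539e09f8a9f5578508bc0b834f62b0bd85e4764d56e942a9d89def85b"
  $hash = (Get-FileHash $blob -Algorithm SHA256).Hash.ToLower()
  $expected = (Split-Path $blob -Leaf) -replace '^sha256-', ''
  if ($hash -eq $expected) { "blob hash OK" } else { "blob hash MISMATCH" }

A matching hash makes a corrupted download very unlikely, which points at a loader/compatibility problem rather than the file itself.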

Questions

  1. Is the Q5_K_M quantization compatible with Ollama 0.12.5?
  2. Are there known issues with Qwen3-VL vision models in Ollama?
  3. What are the minimum system requirements (RAM/VRAM) for this quantization?
  4. Would a different quantization (Q4_K_M, Q6_K, etc.) work better with Ollama?

Any guidance would be appreciated!

P.S. Thanks to Cursor for writing this up.

yairpatch (org)

It is not compatible with Ollama at this point.

How can I run it locally?

I couldn't figure out how to run it locally, so I decided to just use the 8B-Instruct model instead, which is actually still really good.

The llama.cpp-tr-qwen3-vl-3-b6981-ab45b1a branch supports converting this model to GGUF format, and the result can be tested with llama-mtmd-cli.
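
In case it helps, here is a rough sketch of that workflow following the usual llama.cpp conventions. The exact script names, flags, and output filenames may differ in that branch, so treat everything below as assumptions to check against the branch's README:

  # 1. Convert the Hugging Face checkpoint to GGUF (the branch's conversion step also
  #    produces the vision projector; the mmproj filename below is a placeholder)
  python convert_hf_to_gguf.py path/to/Qwen3-VL-30B-A3B-Instruct --outfile qwen3-vl-f16.gguf --outtype f16

  # 2. Quantize the text model, e.g. to Q5_K_M
  llama-quantize qwen3-vl-f16.gguf qwen3-vl-Q5_K_M.gguf Q5_K_M

  # 3. Smoke-test the vision path with the multimodal CLI
  llama-mtmd-cli -m qwen3-vl-Q5_K_M.gguf --mmproj mmproj-qwen3-vl.gguf --image test.jpg -p "Describe this image."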
