GGUFs have buggy Japanese OCR performance
Edit: The latest llama.cpp seems to have fixed this.
I'm posting this here instead of a specific GGUF repo as this affects all GGUFs.
I've been testing the Qwen3 VL 32B GGUFs with Japanese OCR tasks and they've been having issues, while the same model on Qwen Chat OCRs the text correctly. Here's an example to demonstrate:
Prompt: OCR the text in this image.
Edit: Try the prompt below instead, as it makes the difference more apparent: Transcribe the text in this image.
First, the correct OCR from Qwen3 VL 32B on Qwen Chat:
γγ
γγ
γγγγ
γγ γγͺγεε¨
γγγγ γ
However, when the GGUF conversion of the model is given the same task, it produces incorrect OCR for the first two text bubbles. There are a few different characters it outputs instead, but they're all wrong.
Qwen3 VL 32B Instruct:
γγγ
γγγ
γγγγ
γγ γγͺγεε¨
γγγγ γ
or
γ
γ
γγγγ
γγ γγͺγεε¨
γγγγ γ
Giving the same task to Qwen3 VL 32B Thinking is even more revealing:
So, let's look at the image. It's a pink background with white speech bubbles containing Japanese text. Let's read each bubble one by one.
First bubble: γγ (but wait, the characters are γγ? Wait no, let's check. The first bubble has γγ? Wait, no, the characters are γγ? Wait, no, looking at the image: first bubble is γγ? Wait, no, the first two characters are γ and γ? Wait, no, let's see. Wait, the first bubble: the text is γγ? Wait, no, the first bubble's text is γγ? Wait, no, the first bubble has two characters: γ and γ? Wait, no, let's check again.
Wait, the first speech bubble: the text is γγ? Wait, no, the first bubble: the characters are γ and γ, but written as γγ? Wait, no, in the image, the first bubble has γγ? Wait, no, let's look carefully. The first bubble: the text is γγ? Wait, no, the first two characters are γ and γ, but maybe it's γγ. Wait, no, let's list each bubble:
1. First bubble: γγ (but wait, the order: in Japanese, the characters are written in the bubble. Let's check each bubble's text.
First bubble: γγ β but wait, no, the first bubble's text is γγ? Wait, no, looking at the image, the first speech bubble has γ and γ, so γγ. Wait, no, the first bubble: the characters are γ and γ, so γγ. Wait, but let's check the actual image. The first bubble: the text is γγ? Wait, no, the first bubble's text is γγ. Wait, no, the first two characters: γ and γ, so γγ. Wait, no, the first bubble's text is
As you can see in the reasoning block, the model really struggles to parse the first characters and loops infinitely.
Key Points:
- I've also tested GGUFs of other models in the Qwen3 VL series, such as 30B-A3B and 8B dense. They're able to OCR the text perfectly, so this is likely an issue specific to the 32B version.
- This occurs with all GGUFs from all GGUF providers (both the text model and the mmproj), including ones I converted myself. I also tried different llama.cpp binaries, including official releases and ones I built myself, with no improvement. So this is likely an issue further upstream.
- All testing was done with Q8_0 quants (f16/bf16 mmproj), except for one run where I downloaded unsloth's Q5_XL quant to see if that helped (it didn't).
- I used a variety of sampler settings when testing.
- I only tested Japanese OCR, but my intuition is that if there are issues with this then there are probably issues with other things as well.
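For anyone who wants to try reproducing this, here's a sketch of the kind of llama.cpp invocation I used. The binary name (`llama-mtmd-cli`) is the multimodal CLI in recent llama.cpp builds; the file names and the image path are placeholders from my setup, so adjust them to yours.

```shell
# Placeholder file names from my setup; swap in your own quant/mmproj/image.
MODEL=Qwen3-VL-32B-Instruct-Q8_0.gguf
MMPROJ=mmproj-Qwen3-VL-32B-Instruct-F16.gguf

# Build the command; run it directly once the model files are downloaded.
CMD="llama-mtmd-cli -m $MODEL --mmproj $MMPROJ --image bubbles.png -p 'Transcribe the text in this image.'"
echo "$CMD"
```

Then compare the transcription of the first two bubbles against what Qwen Chat produces for the same image and prompt.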
Those are my findings, and I hope they are able to help the Qwen team in some way. This is a pretty quick test to perform, so I encourage others to try it and see for themselves.
Update:
I've tested the model with a bf16 GGUF + bf16 mmproj, which should be lossless. It still failed to OCR the first two text bubbles correctly and made the same errors, so this is not an error induced by quantization.
Just tried your prompt and image in Qwen Chat 3 times, and they all got the same results you've been getting from the GGUFs. You may have been lucky to get the correct result that time. The model seems to be just like that.
CC: @TPH441
Thank you for testing! @danielhanchen
I tested it on Qwen Chat a bit more, and out of 10 attempts it got it right 2 times and wrong 8 times. Example of it getting it right:
I think there is still a bit of a difference, though: the version on Qwen Chat gets it right sometimes, while I have not been able to get a GGUF to get it right a single time, despite probably giving it 50 prompts at this point across GGUFs from different providers.
Also, I tested the Thinking version on Qwen Chat and it got it right 4/5 times, while the GGUF I tested simply looped.
To be fair, though, I did the testing on the Thinking version before you released your thinking chat template fixes, so I'll test those quants at Q8_0 and report back.
@MrDragonFox Neat! That looks correct to me.
Also, I looked through my old Qwen Chat history. I had tested it much more a couple of days ago, although I used a slightly different prompt then.
Back then, the Instruct 32B got it right 16/18 times.
I tested it again 3 times on Qwen Chat and it got it right 3/3 times, so it seems that one-word change in the prompt makes a huge difference.
On Qwen Chat, at least. Locally at Q8_0 it still gets it wrong even with the prompt difference.
I've edited the OP to use this prompt instead, as it makes the difference more apparent.
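To summarize, here's a quick tally of the success rates reported so far in this thread. The labels are mine, and the local GGUF attempt count is approximate ("probably 50 prompts"):

```python
# Success tallies from this thread, as (correct, attempts).
# The local GGUF count is an estimate, not an exact log.
results = {
    "Qwen Chat Instruct, original OP prompt": (2, 10),
    "Qwen Chat Instruct, updated prompt (tests from days ago)": (16, 18),
    "Qwen Chat Instruct, updated prompt (retest)": (3, 3),
    "Qwen Chat Thinking": (4, 5),
    "Local GGUF Q8_0, either prompt (approx.)": (0, 50),
}

for name, (ok, n) in results.items():
    print(f"{name}: {ok}/{n} ({ok / n:.0%})")
```

Whatever the exact numbers, the pattern is the same: Qwen Chat succeeds at least some of the time, while the local GGUFs never have.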
@shimmyshimmer The correct OCR above was produced with a GGUF I converted before Qwen3 VL support was even officially merged. A newly converted Qwen3 VL GGUF also had the same hash as a GGUF I converted days ago, so no, you don't need to reconvert GGUFs.
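For reference, this is roughly how I checked that a fresh conversion matched my days-old one: hash both files and compare. A minimal sketch (file names are placeholders):

```python
# Sketch: confirm two GGUF conversions are byte-identical by comparing hashes.
import hashlib


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks (GGUFs are large)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()


# Usage (placeholder names):
# sha256_of("old-conversion.gguf") == sha256_of("new-conversion.gguf")
```

If the hashes match, the two conversions are identical and reconversion changes nothing.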