(vLLM) Tool calling broken after update to tokenizer_config.json
I'm serving the model with vLLM, but commit 66c370b modified the chat template in a way that removed tool support, breaking features such as tool calling via tool_choice='auto' with the OpenAI Chat Completion Client.
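For reference, a minimal request of the kind that breaks, sent to vLLM's OpenAI-compatible endpoint (the `get_weather` tool and the localhost URL are placeholders, not from the original report):

```bash
# Minimal repro against vLLM's OpenAI-compatible server.
# Endpoint and the get_weather tool are placeholder assumptions.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-VL-32B-Instruct-AWQ",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
```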
Workaround: use the previous version (hash: 05440b7) of tokenizer_config.json.
For vLLM specifically, serve your model with the following argument:
--tokenizer-revision 05440b7
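A full serve invocation might look like the sketch below; the two tool-calling flags are my assumption of a typical setup (vLLM's docs suggest the `hermes` parser for Qwen2.5 models) and are not part of the original workaround:

```bash
# Sketch: pin the tokenizer to the pre-change revision and enable
# tool calling. --enable-auto-tool-choice and --tool-call-parser
# are assumed from vLLM's documented Qwen2.5 tool-calling setup.
vllm serve Qwen/Qwen2.5-VL-32B-Instruct-AWQ \
  --tokenizer-revision 05440b7 \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```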
Link to commit change:
https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct-AWQ/commit/66c370b74a18e7b1e871c97918f032ed3578dfef
Qwen 2.5 VL is a great model, but tool calling works only with tool_choice="required" in vLLM.
The option suggested here (adding --tokenizer-revision 05440b7005147091006f2d72024a2d86801a4418) doesn't work anymore and throws an error:
ValueError: Unrecognized model in Qwen/Qwen2.5-VL-72B-Instruct-AWQ. Should have a `model_type` key in its config.json, or contain one of the following strings in its name: ...
... qwen2_5_omni, qwen2_5_vl, qwen2_5_vl_text, qwen2_audio ...
I also tried copying the template from that commit and pasting it manually into tokenizer_config.json, but in that case vLLM simply hangs and the request is never answered (memory usage and power draw in nvidia-smi spike, indicating that it is doing something).
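Instead of editing tokenizer_config.json in place, the old template can in principle be saved to a local file and passed externally via vLLM's `--chat-template` flag; a sketch is below (the file name is hypothetical, and I have not confirmed whether this avoids the hang):

```bash
# Untested alternative: save the pre-66c370b template to a local
# Jinja file and point vLLM at it, rather than editing
# tokenizer_config.json. The file name is hypothetical.
vllm serve Qwen/Qwen2.5-VL-72B-Instruct-AWQ \
  --chat-template ./qwen2_5_vl_tool_template.jinja \
  --enable-auto-tool-choice \
  --tool-call-parser hermes
```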
Any suggestions at this point? Is Qwen 2.5 VL (both AWQ and non-quantized) still usable with tool_choice="auto"? I'm using vLLM 0.10.1.1.