/v1/chat/completions endpoint not working
We have successfully loaded and served the model with vLLM.
When we try to communicate with the model via /v1/chat/completions, we get no response (the request loads continuously).
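For reference, a minimal request that reproduces the hang might look like the sketch below (assuming the server runs on localhost:8000 with the default OpenAI-compatible API; the model name is a placeholder):

```python
# Minimal sketch of a /v1/chat/completions request against a local vLLM
# server. Assumes the server listens on localhost:8000; the model name
# is a placeholder -- substitute the model you are actually serving.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "your-model-name",  # placeholder
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,  # fail fast instead of loading forever
)
print(response.json())
```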
I have the same problem. Did you solve it?
Same error here.
I was able to generate output using dtype=float32.
I tested the /v1/chat/completions API successfully with vLLM, but image URLs are not working. https://cdn-uploads.huggingface.co/production/uploads/66e3abda596fcff3e4d0b06b/eFCNNQ9galifwYZ8ebHAf.jpeg Can you help me? Thanks.
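For what it's worth, image URLs have to be sent in the OpenAI-style multimodal message format rather than as plain text, and the served model must be a vision-language model. A sketch (server address and model name are assumptions):

```python
# Sketch of an image-URL request in the OpenAI-compatible multimodal
# message format. Assumes a vision-language model is served on
# localhost:8000; the model name is a placeholder.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "your-vlm-model",  # placeholder, must be a vision model
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn-uploads.huggingface.co/production/uploads/66e3abda596fcff3e4d0b06b/eFCNNQ9galifwYZ8ebHAf.jpeg"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
print(response.json())
```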
Hi,
Sorry for the late response. The issue where the /v1/chat/completions endpoint loads continuously is typically due to a mismatch between the model's architecture and the vLLM server's default configuration.
Please restart your vLLM server and make sure to include the --dtype float32 argument (e.g. `vllm serve <model> --dtype float32`). This sets the data type for the model's weights and activations to 32-bit floating-point precision, which can resolve the compatibility issue.
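If restarting the server does not help, a quick offline sanity check with vLLM's Python API can confirm that the model itself loads and generates with float32 (the model name below is a placeholder):

```python
# Offline sanity check: load the model with float32 weights/activations
# via vLLM's Python API and run a single generation. The model name is
# a placeholder -- use the model you serve.
from vllm import LLM, SamplingParams

llm = LLM(model="your-model-name", dtype="float32")
params = SamplingParams(max_tokens=64, temperature=0.7)

outputs = llm.generate(["Hello, how are you?"], params)
for output in outputs:
    print(output.outputs[0].text)
```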
Kindly try this and let us know if you have any further concerns; we will assist you.
Thank you