/v1/chat/completions endpoint not working

#32
by PremkumarChandak - opened

We have successfully loaded and served the model with vLLM.

When we try to communicate with the model via /v1/chat/completions, we get no response (the request keeps loading).
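For reference, this is a minimal sketch of the request body the /v1/chat/completions endpoint expects. The base URL, port, and model name are assumptions; adjust them to match your deployment.

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

# Hypothetical model name -- replace with the model you are serving.
payload = build_chat_request("google/gemma-2b-it", "Hello!")

# To actually send it (needs the `requests` package and a running server):
# import requests
# resp = requests.post("http://localhost:8000/v1/chat/completions",
#                      json=payload, timeout=60)
# print(resp.json()["choices"][0]["message"]["content"])
print(json.dumps(payload))
```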

PremkumarChandak changed discussion title from /v1C endpoint not working to /v1/chat/completions endpoint not working

I have the same problem. Did you solve it?

same error

I was able to generate output using dtype=float32.

I tested the v1/chat/completions API successfully with vLLM, but image URLs are not working. https://cdn-uploads.huggingface.co/production/uploads/66e3abda596fcff3e4d0b06b/eFCNNQ9galifwYZ8ebHAf.jpeg Can you help me? Thanks.
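For images, vLLM's OpenAI-compatible server expects the multimodal message format shown below, where the message content is a list mixing text and image_url parts; this only works if the served model actually supports image input. The model name here is an assumption, and the URL is the one from the comment above.

```python
def build_image_request(model: str, text: str, image_url: str) -> dict:
    """Build an OpenAI-style chat payload with a text part and an image part."""
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": text},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_image_request(
    "google/gemma-3-4b-it",  # hypothetical model name -- use your own
    "Describe this image.",
    "https://cdn-uploads.huggingface.co/production/uploads/66e3abda596fcff3e4d0b06b/eFCNNQ9galifwYZ8ebHAf.jpeg",
)
```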

Google org

Hi,

Sorry for the late response. The issue where the /v1/chat/completions endpoint loads continuously is typically due to a mismatch between the model's architecture and the vLLM server's default configuration.

Please restart your vLLM server and make sure you include the --dtype float32 argument. It sets the data type for the model's weights and activations to 32-bit floating-point precision, which can resolve the compatibility issue.
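For example, a relaunch with the flag might look like this (the model name and port are placeholders; substitute the model you are serving):

```shell
# Restart the vLLM OpenAI-compatible server with float32 weights/activations.
vllm serve google/gemma-2b-it \
    --dtype float32 \
    --port 8000
```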

Kindly try this and let us know if you have any concerns; we will assist you.

Thank you
