/v1/chat/completions endpoint not working
We have successfully loaded and served the model with vLLM.
When we try to communicate with the model via /v1/chat/completions, we get no response (the request loads continuously).
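For reference, a minimal request that reproduces the hang might look like the sketch below (assuming the server runs on localhost:8000 with the default OpenAI-compatible API; the model name is a placeholder):

```python
# Minimal sketch of a /v1/chat/completions request against a local vLLM
# server. Assumes the server listens on localhost:8000; the model name
# is a placeholder -- substitute the model you are actually serving.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "your-model-name",  # placeholder
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 64,
    },
    timeout=60,  # fail fast instead of loading forever
)
print(response.json())
```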
I have the same problem. Did you solve it?
Same error here.
I was able to generate output using dtype=float32.
I tested the /v1/chat/completions API successfully with vLLM, but image URLs are not working. https://cdn-uploads.huggingface.co/production/uploads/66e3abda596fcff3e4d0b06b/eFCNNQ9galifwYZ8ebHAf.jpeg Can you help me? Thanks.
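For what it's worth, image URLs have to be sent in the OpenAI-style multimodal message format rather than as plain text, and the served model must be a vision-language model. A sketch (server address and model name are assumptions):

```python
# Sketch of an image-URL request in the OpenAI-compatible multimodal
# message format. Assumes a vision-language model is served on
# localhost:8000; the model name is a placeholder.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "your-vlm-model",  # placeholder, must be a vision model
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn-uploads.huggingface.co/production/uploads/66e3abda596fcff3e4d0b06b/eFCNNQ9galifwYZ8ebHAf.jpeg"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
print(response.json())
```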
Hi,
Sorry for the late response. The issue where the /v1/chat/completions endpoint loads continuously is typically due to a mismatch between the model's architecture and the vLLM server's default configuration.
Please restart your vLLM server and make sure to include the --dtype float32 argument (e.g. `vllm serve <model> --dtype float32`). This sets the data type for the model's weights and activations to 32-bit floating-point precision, which can resolve the compatibility issue.
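If restarting the server does not help, a quick offline sanity check with vLLM's Python API can confirm that the model itself loads and generates with float32 (the model name below is a placeholder):

```python
# Offline sanity check: load the model with float32 weights/activations
# via vLLM's Python API and run a single generation. The model name is
# a placeholder -- use the model you serve.
from vllm import LLM, SamplingParams

llm = LLM(model="your-model-name", dtype="float32")
params = SamplingParams(max_tokens=64, temperature=0.7)

outputs = llm.generate(["Hello, how are you?"], params)
for output in outputs:
    print(output.outputs[0].text)
```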
Kindly try this and let us know if you have any further concerns; we will assist you.
Thank you