The model is outputting garbage

#7
by sakurakotomi

Hi. I have deployed this model on an 8× H100 SXM Linux server using vLLM and Docker:

docker run -d --name vllm --gpus '"device=0,1,2,3,4,5,6,7"' --shm-size=600GB  \
  --network panda -p 8001:8000 \
  -v /mnt/data1/models/DeepSeek-R1-0528-AWQ:/local-model \
  -e VLLM_USE_V1=0 \
  -e VLLM_WORKER_MULTIPROC_METHOD=spawn \
  -e VLLM_MARLIN_USE_ATOMIC_ADD=1 \
  vllm/vllm-openai:latest \
  --model /local-model \
  --quantization awq_marlin \
  --dtype float16 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 65536 \
  --max-seq-len-to-capture 65536 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --trust-remote-code \
  --served-model-name deepseek-r1 \
  --host 0.0.0.0
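
For reference, I query it through the standard OpenAI-compatible endpoints that vLLM exposes; port 8001 here matches the -p 8001:8000 mapping above. A quick sanity check that the server is up:

curl http://localhost:8001/v1/models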

The vLLM OpenAI API server started, but it replies with garbage to every request, e.g.:

Me: who are you?

DeepSeek:

<think>
I am an AI assistant here to help you with your questions and tasks) and I'm an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks)...
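
This is the kind of request that reproduces it, as a minimal sketch against the standard /v1/chat/completions endpoint (the sampling parameters here are just illustrative, not anything special I set):

curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "who are you?"}],
    "max_tokens": 256,
    "temperature": 0.6
  }'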

I have tried a few different vLLM flag combinations (like removing -e VLLM_USE_V1=0), but the result is the same.

Does anyone know what the problem is?
