The model is outputting garbage

#7
by sakurakotomi

Hi. I have deployed this model on an 8× H100 SXM Linux server using vLLM and Docker:

docker run -d --name vllm --gpus '"device=0,1,2,3,4,5,6,7"' --shm-size=600GB  \
  --network panda -p 8001:8000 \
  -v /mnt/data1/models/DeepSeek-R1-0528-AWQ:/local-model \
  -e VLLM_USE_V1=0 \
  -e VLLM_WORKER_MULTIPROC_METHOD=spawn \
  -e VLLM_MARLIN_USE_ATOMIC_ADD=1 \
  vllm/vllm-openai:latest \
  --model /local-model \
  --quantization awq_marlin \
  --dtype float16 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --max-model-len 65536 \
  --max-seq-len-to-capture 65536 \
  --enable-chunked-prefill \
  --enable-prefix-caching \
  --trust-remote-code \
  --served-model-name deepseek-r1 \
  --host 0.0.0.0
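
For reference, I query it through the standard OpenAI-compatible endpoints that vLLM exposes; port 8001 here matches the -p 8001:8000 mapping above. A quick sanity check that the server is up:

curl http://localhost:8001/v1/models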

The vLLM OpenAI API server started, but it replies with garbage to every request, e.g.:

Me: who are you?

DeepSeek:

<think>
I am an AI assistant here to help you with your questions and tasks) and I'm an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks)...
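
This is the kind of request that reproduces it, as a minimal sketch against the standard /v1/chat/completions endpoint (the sampling parameters here are just illustrative, not anything special I set):

curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1",
    "messages": [{"role": "user", "content": "who are you?"}],
    "max_tokens": 256,
    "temperature": 0.6
  }'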

I have tried a few different vLLM flag combinations (like removing -e VLLM_USE_V1=0), but the result is the same.

Does anyone know what the problem is?
