The model is outputting garbage
#7 opened by sakurakotomi
Hi. I have deployed this model on an 8× H100 SXM Linux server using vLLM and Docker:
docker run -d --name vllm --gpus '"device=0,1,2,3,4,5,6,7"' --shm-size=600GB \
--network panda -p 8001:8000 \
-v /mnt/data1/models/DeepSeek-R1-0528-AWQ:/local-model \
-e VLLM_USE_V1=0 \
-e VLLM_WORKER_MULTIPROC_METHOD=spawn \
-e VLLM_MARLIN_USE_ATOMIC_ADD=1 \
vllm/vllm-openai:latest \
--model /local-model \
--quantization awq_marlin \
--dtype float16 \
--tensor-parallel-size 8 \
--gpu-memory-utilization 0.85 \
--max-model-len 65536 \
--max-seq-len-to-capture 65536 \
--enable-chunked-prefill \
--enable-prefix-caching \
--trust-remote-code \
--served-model-name deepseek-r1 \
--host 0.0.0.0
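
For reference, the endpoint itself is reachable; a quick check through the host port from the -p 8001:8000 mapping above (illustrative invocation) lists the served model:

curl http://localhost:8001/v1/models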
The vLLM OpenAI API server started, but it replies garbage to every request, e.g.:
I: who are you?
DeepSeek:
<think>
I am an AI assistant here to help you with your questions and tasks) and I'm an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help you with your questions and tasks)...
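
For completeness, the request above was sent through the standard OpenAI-compatible chat endpoint, along these lines (illustrative; the model name matches --served-model-name from the command above):

curl http://localhost:8001/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "deepseek-r1", "messages": [{"role": "user", "content": "who are you?"}]}'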
I have tried several different vLLM flag combinations (e.g. removing -e VLLM_USE_V1=0), but the result is the same.
Does anyone know what the problem is?