How do I use this correctly for online serving via the vLLM OpenAI-compatible server?
I'm using the command below, but I'm not sure whether it is set up correctly:
vllm serve deepseek-ai/DeepSeek-OCR \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0 \
    --logits-processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor
Then I call it this way:
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=message,
    temperature=0.0,
    max_tokens=500,
    # ngram logit processor args
    extra_body={
        "ngram_size": 30,
        "window_size": 90,
        "whitelist_token_ids": [128821, 128822],
        "skip_special_tokens": False,  # whitelist: <td>, </td>
    },
)
I am not sure whether the parameters I am passing actually have any effect. Can someone explain why these parameters are required and whether they are set up correctly?
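For reference, `message` above is a standard OpenAI-style multimodal message; I build it roughly like this (the file path, port, and prompt text here are placeholders, not exactly what I run):

import base64
from openai import OpenAI

# Placeholder endpoint; vllm serve listens on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Placeholder image path.
with open("page.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

message = [
    {
        "role": "user",
        "content": [
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
            # Prompt text taken from the DeepSeek-OCR examples; adjust as needed.
            {"type": "text", "text": "<|grounding|>Convert the document to markdown."},
        ],
    }
]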
Corrected serving command:
vllm serve deepseek-ai/DeepSeek-OCR \
    --no-enable-prefix-caching \
    --mm-processor-cache-gb 0 \
    --logits-processors vllm.model_executor.models.deepseek_ocr:NGramPerReqLogitsProcessor \
    --enable-log-requests \
    --gpu-memory-utilization 0.4 \
    --chat-template /home/ubuntu/llm-ocr-exp/template_deepseek_ocr.jinja
inference:
response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-OCR",
    messages=message,
    temperature=0.0,
    max_tokens=500,
    # ngram logit processor args
    extra_body={
        "vllm_xargs": {
            "ngram_size": 30,
            "window_size": 90,
            # "whitelist_token_ids": [128821, 128822],
        },
        "skip_special_tokens": False,  # whitelist: <td>, </td>
    },
)
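For anyone wondering where `vllm_xargs` ends up: the OpenAI client merges `extra_body` keys into the top level of the request JSON, so the call above is equivalent to posting roughly this (the URL is a placeholder and `message` is the same list as in the snippet):

import requests

payload = {
    "model": "deepseek-ai/DeepSeek-OCR",
    "messages": message,  # same messages list as above
    "temperature": 0.0,
    "max_tokens": 500,
    "skip_special_tokens": False,
    # per-request arguments forwarded to NGramPerReqLogitsProcessor
    "vllm_xargs": {"ngram_size": 30, "window_size": 90},
}
resp = requests.post("http://localhost:8000/v1/chat/completions", json=payload)
print(resp.json()["choices"][0]["message"]["content"])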
@dhruvilHV Can I see your --chat-template /home/ubuntu/llm-ocr-exp/template_deepseek_ocr.jinja?
What about the image sizes? Do we need to pass additional arguments to the API call, and if so, how? For example, how do you signal that you want the Gundam level of quality through this API call?
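Not a full answer, but for context: the DeepSeek-OCR model card documents Tiny (512×512, 64 vision tokens), Small (640×640, 100), Base (1024×1024, 256), Large (1280×1280, 400) and Gundam (dynamic tiling) modes. I have not confirmed how the vLLM server chooses among them for the OpenAI endpoint; the one thing you can control purely from the client side is the pixel size of the image you send, e.g. (a rough sketch, the helper name and target size are made up):

from io import BytesIO
import base64

from PIL import Image

def encode_image(path: str, long_side: int = 1024) -> str:
    # Hypothetical client-side resize before base64-encoding the image.
    img = Image.open(path).convert("RGB")
    scale = long_side / max(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    buf = BytesIO()
    img.save(buf, format="PNG")
    return base64.b64encode(buf.getvalue()).decode("utf-8")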
Same question here; I don't know how to choose the mode when using vLLM serve.