leejunhyeok committed · Commit 7f8f6e3 (verified) · 1 Parent(s): aede6c2

add vllm example in readme

Files changed (1)
  1. README.md +27 -1
README.md CHANGED
@@ -112,4 +112,30 @@ The capital city of South Korea is Seoul.
  ```
 
  ## How to use in vllm
- TBD
+ Currently, [PR]() for supporting motif model in official vllm package is under review.
+ To use our model with vllm, use this [image](https://github.com/motiftechnologies/vllm/pkgs/container/vllm)
+ Our model supports 32K seq length
+
+ The [PR](https://github.com/vllm-project/vllm/pull/27396) adding support for the Motif model in the official vLLM package is currently under review.
+
+ In the meantime, to use our model with vLLM, please use the following container [image](https://github.com/motiftechnologies/vllm/pkgs/container/vllm).
+ Our model supports a sequence length of up to 32K tokens.
+ ```bash
+ # run vllm api server
+ VLLM_ATTENTION_BACKEND="DIFFERENTIAL_FLASH_ATTN" vllm serve Motif-Technologies/Motif-2-12.7B-Instruct --trust-remote-code --data-parallel-size <gpu_count>
+
+ # sending requests with curl
+ curl http://localhost:8000/v1/chat/completions \
+ -H "Content-Type: application/json" \
+ -d '{
+ "messages": [
+ {"role": "system", "content": "You are a helpful assistant."},
+ {"role": "user", "content": "What is the capital city of South Korea?"}
+ ],
+ "temperature": 0.6,
+ "skip_special_tokens": false,
+ "chat_template_kwargs": {
+ "enable_thinking": true
+ }
+ }'
+ ```
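
The added README example starts the server and then sends a request straight away, but `vllm serve` typically needs some time to load the weights before it accepts connections. As a minimal sketch (not part of this commit), a small polling helper could sit between the two steps. The function name `wait_for_server`, its defaults, and the use of the `/health` route are assumptions: upstream vLLM's OpenAI-compatible server exposes `/health`, but the forked image linked above has not been checked here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll a URL until it answers with HTTP 2xx, or give up.
# Arguments (illustrative defaults, adjust to your setup):
#   $1  URL to poll                        (default: http://localhost:8000/health)
#   $2  maximum number of attempts         (default: 60)
#   $3  seconds to sleep between attempts  (default: 5)
wait_for_server() {
  local url="${1:-http://localhost:8000/health}"
  local max_attempts="${2:-60}"
  local delay="${3:-5}"
  local attempt=1

  while :; do
    # -s: quiet, -f: fail on HTTP errors, -o /dev/null: discard the body
    if curl -sf -o /dev/null --max-time 5 "$url"; then
      return 0
    fi
    # Stop once the attempt budget is used up; otherwise wait and retry.
    [ "$attempt" -ge "$max_attempts" ] && break
    attempt=$((attempt + 1))
    sleep "$delay"
  done

  echo "server at $url did not become ready after $max_attempts attempts" >&2
  return 1
}
```

With the server launched in the background or in another terminal, `wait_for_server && curl http://localhost:8000/v1/chat/completions ...` would then send the request from the example only once the health check succeeds.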