Update README.md
README.md CHANGED

@@ -234,7 +234,7 @@ messages = [
 sampling_params = SamplingParams(max_tokens=128_000)
 
 # note that running this model on GPU requires over 300 GB of GPU RAM
-llm = LLM(model=model_name, tokenizer_mode="mistral",
+llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8, limit_mm_per_prompt={"image": 4})
 
 outputs = llm.chat(messages, sampling_params=sampling_params)
 
@@ -249,7 +249,7 @@ You can also use Pixtral-Large-Instruct-2411 in a server/client setting.
 1. Spin up a server:
 
 ```
-vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4'
+vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4' --tensor_parallel_size 8
 ```
 
 2. And ping the client:
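The README's step 2 ("And ping the client:") is cut off by the hunk boundary, so its example does not appear in this diff. As a minimal sketch of that step, assuming the server above is running on vLLM's default port 8000 and is queried through its OpenAI-compatible API (the prompt and image URL are placeholders, not taken from the README):

```python
from openai import OpenAI

# Point the OpenAI client at the local vLLM server (default port 8000 assumed).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# The prompt and image URL below are placeholders for illustration only.
response = client.chat.completions.create(
    model="mistralai/Pixtral-Large-Instruct-2411",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/image.png"}},
            ],
        }
    ],
    max_tokens=256,
)
print(response.choices[0].message.content)
```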