patrickvonplaten committed on
Commit 120679d (verified)
1 Parent(s): b339928

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -234,7 +234,7 @@ messages = [
 sampling_params = SamplingParams(max_tokens=128_000)

 # note that running this model on GPU requires over 300 GB of GPU RAM
-llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel=8, limit_mm_per_prompt={"image": 4})
+llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8, limit_mm_per_prompt={"image": 4})

 outputs = llm.chat(messages, sampling_params=sampling_params)

@@ -249,7 +249,7 @@ You can also use Pixtral-Large-Instruct-2411 in a server/client setting.
 1. Spin up a server:

 ```
-vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4'
+vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=4' --tensor_parallel_size 8
 ```

 2. And ping the client:
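
The diff ends at the client step. For context, `vllm serve` exposes an OpenAI-compatible HTTP API, so the server started in the second hunk can be queried with the standard `openai` Python client. The snippet below is a minimal sketch, not the README's own example; it assumes the default port 8000, an empty API key, and an arbitrary placeholder image URL.

```python
from openai import OpenAI

# vllm serve exposes an OpenAI-compatible endpoint; port 8000 is the default (assumption)
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="mistralai/Pixtral-Large-Instruct-2411",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                # placeholder image URL purely for illustration
                {"type": "image_url", "image_url": {"url": "https://picsum.photos/id/237/200/300"}},
            ],
        }
    ],
    max_tokens=256,
)

print(response.choices[0].message.content)
```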