YongganFu committed
Commit f0b7268 · verified · 1 Parent(s): 1c60b17

Update README.md

Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -38,7 +38,7 @@ We provide a <a href="https://huggingface.co/nvidia/Nemotron-Flash-3B/resolve/ma
 
 ## Chat with Nemotron-Flash
 
-We integrate the Triton attention kernel from <a href="https://nvidia.github.io/TensorRT-LLM/torch/auto_deploy/auto-deploy.html">TRT-LLM AutoDeploy</a> to enable generation with CUDA Graph:
+We integrated the attention kernel from <a href="https://nvidia.github.io/TensorRT-LLM/torch/auto_deploy/auto-deploy.html">TRT-LLM AutoDeploy</a> to enable generation with CUDA Graph:
 
 ```
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -82,7 +82,7 @@ outputs = model.generate_with_cache(
     temperature=0,
     eos_token_id=tokenizer.eos_token_id,
 )
-``
+```
 
 ## Finetune Nemotron-Flash