Update README.md
README.md CHANGED

@@ -38,7 +38,7 @@ We provide a <a href="https://huggingface.co/nvidia/Nemotron-Flash-3B/resolve/ma
 
 ## Chat with Nemotron-Flash
 
-We
+We integrated the attention kernel from <a href="https://nvidia.github.io/TensorRT-LLM/torch/auto_deploy/auto-deploy.html">TRT-LLM AutoDeploy</a> to enable generation with CUDA Graph:
 
 ```
 from transformers import AutoModelForCausalLM, AutoTokenizer
@@ -82,7 +82,7 @@ outputs = model.generate_with_cache(
     temperature=0,
     eos_token_id=tokenizer.eos_token_id,
 )
-
+```
 
 ## Finetune Nemotron-Flash
 
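For readers following the patched "Chat with Nemotron-Flash" section, here is a minimal sketch of what the completed snippet plausibly looks like. The diff only shows the import, the `generate_with_cache()` call, and two of its arguments; the model id comes from the first hunk header, and everything else (the `trust_remote_code` flag, the chat-template prompt handling, the positional input) is an assumption for illustration, not the repo's actual code:

```python
# Hypothetical reconstruction: the diff elides README lines 45-81, so the
# loading and prompt-formatting steps below are assumptions. Only the import,
# the method name generate_with_cache, and the temperature / eos_token_id
# arguments are visible in the diff itself.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-Flash-3B"  # repo named in the first hunk header

# trust_remote_code is assumed here, since generate_with_cache() is not a
# standard transformers method and would come from the repo's custom code.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True
).cuda()

# Assumed prompt handling via the tokenizer's chat template.
messages = [{"role": "user", "content": "Hello! Who are you?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).cuda()

outputs = model.generate_with_cache(
    input_ids,  # positional input is a guess; the full signature is elided
    temperature=0,  # visible in the diff
    eos_token_id=tokenizer.eos_token_id,  # visible in the diff
)

# Assumes the method returns token ids, as transformers' generate() does.
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Setting `temperature=0` normally corresponds to greedy decoding, so a quick-start example like this produces deterministic output.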