Update README.md

README.md CHANGED

````diff
@@ -6,7 +6,7 @@ tags:
 
 
 Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.5.0.
 
-This model checkpoint also includes per-tensor scales for FP8 quantized KV Cache, accessed through the `--kv-cache-dtype fp8` argument in vLLM.
+This model checkpoint also includes experimental per-tensor scales for FP8 quantized KV Cache, accessed through the `--kv-cache-dtype fp8` argument in vLLM.
 
 ```python
 from vllm import LLM
```
````
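The hunk's context cuts off right after the import, but the changed sentence describes how the FP8 KV-cache scales are enabled. As a hedged sketch of that usage (the model path is a placeholder, and `kv_cache_dtype` is the offline-inference counterpart of the `--kv-cache-dtype fp8` server flag named in the README), it might look like:

```python
from vllm import LLM

# Placeholder path -- substitute the actual FP8 checkpoint repository.
# kv_cache_dtype="fp8" asks vLLM to quantize the KV cache to FP8, using the
# per-tensor scales shipped with the checkpoint (per the README change above).
llm = LLM(model="<fp8-checkpoint-path>", kv_cache_dtype="fp8")
outputs = llm.generate("Hello, my name is")
```

This requires vLLM >= 0.5.0 and a GPU, per the README's stated minimum version; it is a sketch, not the checkpoint's documented example.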