Quark Quantized Auto Mixed Precision (AMP) Models
This model was created by applying Quark quantization with calibration samples from the Pile dataset.
The Quark quantized Auto Mixed Precision (AMP) models can be deployed directly with the vLLM backend (vLLM-compatible).
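As a hedged sketch, a vLLM-compatible checkpoint like this one can typically be served through vLLM's OpenAI-compatible server. The repository id below is a placeholder, and the exact flags depend on your vLLM version (recent releases detect Quark quantization from the checkpoint config):

```shell
# <org>/<quark-amp-model> is a placeholder -- substitute the actual
# Quark-quantized model repository. If auto-detection fails, the
# quantization method can be forced explicitly.
vllm serve <org>/<quark-amp-model> --quantization quark
```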
The quantization evaluation was conducted in pseudo-quantization mode, so the results may differ slightly from actual quantized inference accuracy; they are provided for reference only.
| Quant scheme | arc challenge (↑) acc | recovery rate | gsm8k (↑) strict-match | recovery rate | mmlu (↑) acc | recovery rate | winogrande (↑) acc | recovery rate |
|---|---|---|---|---|---|---|---|---|
| FP16 | 0.5290 | 100.0% | 0.5049 | 100.0% | 0.6110 | 100.0% | 0.7490 | 100.0% |
| FP8 | 0.5265 | 99.5% | 0.5262 | 104.2% | 0.6107 | 100.0% | 0.7451 | 99.5% |
| AMP | 0.5273 | 99.7% | 0.5125 | 101.5% | 0.6007 | 98.3% | 0.7324 | 97.8% |
| MXFP4 | 0.5094 | 96.3% | 0.4572 | 90.6% | 0.5869 | 96.1% | 0.7316 | 97.7% |
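The recovery rate in the table above is simply each quantized model's score divided by the FP16 baseline, expressed as a percentage. A minimal sketch, using the FP8 row from the table:

```python
# Benchmark scores copied from the table (FP16 baseline and FP8 quantized).
fp16 = {"arc_challenge": 0.5290, "gsm8k": 0.5049, "mmlu": 0.6110, "winogrande": 0.7490}
fp8 = {"arc_challenge": 0.5265, "gsm8k": 0.5262, "mmlu": 0.6107, "winogrande": 0.7451}

def recovery(quant, base):
    # Recovery rate = quantized score / baseline score, as a percentage.
    return {k: round(100 * quant[k] / base[k], 1) for k in base}

print(recovery(fp8, fp16))
```

Note that a recovery rate above 100% (as for FP8 on gsm8k) just means the quantized model happened to score higher than the baseline on that benchmark.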
Modifications copyright (c) 2024 Advanced Micro Devices, Inc. All rights reserved.
Built with Meta Llama.
Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
Base model
meta-llama/Llama-2-70b-chat-hf