Activation Quantization Process
#1 opened by wantsleep
When I load this quantized model from HuggingFace, am I only loading quantized weights? How does activation quantization work during inference, given that I didn't change any forward method?
- Also, how can I verify whether activation tensors are actually quantized at runtime?
wantsleep changed discussion status to closed
wantsleep changed discussion status to open
W8A8 is a weights-and-activations quantization method. The checkpoint stores the quantized weights; activations are quantized to INT8 on the fly at runtime, inside the quantized compute kernels, which is why no change to the model's forward method is needed. When using vLLM, no additional configuration is required.
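To build intuition for what those kernels do (and to see that the activation tensor really is held in 8 bits), here is a minimal NumPy sketch of symmetric per-tensor INT8 activation quantization. This is an illustrative simulation, not the actual vLLM kernel code; function names are made up for the example.

```python
import numpy as np

def quantize_activation(x: np.ndarray):
    """Symmetric per-tensor INT8 quantization, the kind of scheme W8A8 uses for activations."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Map INT8 values back to an FP32 approximation of the original tensor."""
    return q.astype(np.float32) * scale

# Simulated FP32 activation tensor coming out of a layer.
x = np.array([0.5, -1.2, 3.3, -0.01], dtype=np.float32)
q, scale = quantize_activation(x)
print(q.dtype)  # the activation is genuinely stored as int8

x_hat = dequantize(q, scale)
# Per-element error is bounded by one quantization step (the scale).
print(np.max(np.abs(x - x_hat)) < scale)
```

To verify this at runtime on a real PyTorch model, the same idea applies: register forward hooks on the quantized linear modules and print the `dtype` of their input tensors; a W8A8 path will show INT8 (or quantize internally, in which case you inspect the kernel inputs rather than the module boundary).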