jusjinuk/Llama-2-13b-hf-4bit-GuidedQuant-QTIP

	---
	base_model:
	- meta-llama/Llama-2-13b-hf
	base_model_relation: quantized
	license: llama2
	---
	# Model Card

	- Base model: `meta-llama/Llama-2-13b-hf`
	- Quantization method: BlockLDLQ with GuidedQuant Hessian
	- Target bit-width: 4
	- Backend kernel: QTIP kernel (HYB variant)
	- Calibration data: RedPajama (1024 sequences of 4096 tokens each)
	- Calibration objective: Next-token prediction
	- num_groups (for GuidedQuant Hessian): 4
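As a rough guide to what the 4-bit target means for storage, the sketch below estimates the weight footprint from a parameter count and a bit-width. The figure of 13e9 parameters is an assumption based on the nominal "13B" in the model name, and the helper name `weight_footprint_gib` is hypothetical. The real checkpoint may be larger, since embeddings, the output head, and any kernel-specific metadata are not accounted for here.

```python
def weight_footprint_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate storage for the weights alone, in GiB."""
    total_bytes = num_params * bits_per_weight / 8  # 8 bits per byte
    return total_bytes / 2**30                      # bytes -> GiB


# Assumption: nominal parameter count implied by "13b" in the model name.
NOMINAL_PARAMS = 13e9

fp16_gib = weight_footprint_gib(NOMINAL_PARAMS, 16)  # unquantized half precision
int4_gib = weight_footprint_gib(NOMINAL_PARAMS, 4)   # 4-bit target from this card

print(f"fp16 : {fp16_gib:.2f} GiB")
print(f"4-bit: {int4_gib:.2f} GiB")
print(f"ratio: {fp16_gib / int4_gib:.1f}x smaller")
```

Under these assumptions the 4-bit weights take about a quarter of the fp16 storage; actual memory use at inference time also depends on activations and the KV cache.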

	# How to run
	- Follow the instructions in the GuidedQuant repository (https://github.com/snu-mllab/GuidedQuant) and the QTIP repository (https://github.com/Cornell-RelaxML/qtip).
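Before following those instructions, the checkpoint files can be fetched locally. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the repository id is the one shown at the top of this page; the helper name `fetch_checkpoint` is hypothetical. It only downloads files. Loading and running the model still depends on the code in the two repositories above.

```python
from typing import Optional

# Repository id as shown on this model page.
REPO_ID = "jusjinuk/Llama-2-13b-hf-4bit-GuidedQuant-QTIP"


def fetch_checkpoint(repo_id: str = REPO_ID, local_dir: Optional[str] = None) -> str:
    """Download all files of the model repo and return the local folder path."""
    # Imported lazily so this module can be imported without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


if __name__ == "__main__":
    path = fetch_checkpoint()
    print("Checkpoint files downloaded to:", path)
```

The returned path can then be passed to whichever evaluation or inference script the GuidedQuant and QTIP repositories provide.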

	# References
	- [Model Paper](https://arxiv.org/abs/2505.07004)