LLaMA 3.7B - Bfloat16

📚 Paper • 🏠 GitHub

This is one of the checkpoints supplementing the paper 1-Bit-Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization. Instructions on how to use the model for inference can be found in the corresponding repository.

⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is, without any warranty, and is not intended for production use.
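The repository linked above documents the exact inference workflow. As a rough illustration only, loading a bfloat16 checkpoint like this one with the standard transformers API might look like the sketch below. The model id used here is a hypothetical placeholder, not the real repository path; substitute the actual id from this page.

```python
# Minimal inference sketch using Hugging Face transformers.
# NOTE: "your-org/llama-3.7b-bf16" is a placeholder model id,
# not the real repository path.

def load_model(model_id: str = "your-org/llama-3.7b-bf16"):
    """Load the checkpoint in bfloat16 (imports kept local so the
    module can be inspected without transformers installed)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the stored weight dtype
        device_map="auto",
    )
    return tokenizer, model


def generate(prompt: str, model_id: str = "your-org/llama-3.7b-bf16") -> str:
    """Greedy-ish generation for a single prompt."""
    tokenizer, model = load_model(model_id)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The capital of France is"))
```

For the paper's quantization-specific loading path, defer to the repository's own instructions; the snippet above covers only the plain bfloat16 case.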

Model Details

  • Architecture: LLaMA
  • Size: 3.7B (3,747,523,584 parameters)

Directory Structure

.
├── config.json                  # HuggingFace model config
├── generation_config.json       # Default generation settings
├── tokenizer.json               # Tokenizer file
└── model.safetensors            # Weights (in Bfloat16)

License

See LICENSE file in the repository.
