LLaMA 3.7B - Bfloat16

📚 Paper • 🏠 GitHub

This is one of the checkpoints supplementing the paper 1-Bit-Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization. Instructions on how to use the model for inference can be found in the corresponding repository.

⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is, without any warranty, and is not intended for production use.
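The repository linked above documents the exact inference workflow. As a rough illustration only, loading a bfloat16 checkpoint like this one with the standard transformers API might look like the sketch below. The model id used here is a hypothetical placeholder, not the real repository path; substitute the actual id from this page.

```python
# Minimal inference sketch using Hugging Face transformers.
# NOTE: "your-org/llama-3.7b-bf16" is a placeholder model id,
# not the real repository path.

def load_model(model_id: str = "your-org/llama-3.7b-bf16"):
    """Load the checkpoint in bfloat16 (imports kept local so the
    module can be inspected without transformers installed)."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # matches the stored weight dtype
        device_map="auto",
    )
    return tokenizer, model


def generate(prompt: str, model_id: str = "your-org/llama-3.7b-bf16") -> str:
    """Greedy-ish generation for a single prompt."""
    tokenizer, model = load_model(model_id)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(output[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("The capital of France is"))
```

For the paper's quantization-specific loading path, defer to the repository's own instructions; the snippet above covers only the plain bfloat16 case.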

Model Details

  • Architecture: LLaMA
  • Size: 3.7B (3,747,523,584 parameters)

Directory Structure

.
├── config.json                  # HuggingFace model config
├── generation_config.json       # Default generation settings
├── tokenizer.json               # Tokenizer file
└── model.safetensors            # Weights (in Bfloat16)

License

See LICENSE file in the repository.
