INT8 LLMs for vLLM - a neuralmagic Collection

Updated Sep 26, 2024

Accurate INT8 quantized models by Neural Magic, ready for use with vLLM!
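
As a rough guide to using the repositories listed here, the sketch below does two things. First, it reads the `wXaY` suffix on each repository name, on the assumption (a common naming convention, not something this page states) that `w8a8` means 8-bit weights with 8-bit activations and `w8a16` means 8-bit weights with 16-bit activations. Second, it shows one way a model might be loaded for offline generation with vLLM's `LLM` and `SamplingParams` classes. The helper names `parse_quant_scheme` and `generate_with_vllm` are hypothetical, and the second function needs vLLM installed and suitable hardware.

```python
import re


def parse_quant_scheme(model_id: str) -> dict:
    """Read the trailing '.wXaY' suffix of a repository id.

    Assumption: X is the weight bit width and Y is the activation bit width.
    """
    match = re.search(r"\.w(\d+)a(\d+)$", model_id)
    if match is None:
        raise ValueError(f"no wXaY suffix found in {model_id!r}")
    return {
        "model_id": model_id,
        "weight_bits": int(match.group(1)),
        "activation_bits": int(match.group(2)),
    }


def generate_with_vllm(model_id: str, prompt: str, max_tokens: int = 64) -> str:
    """Hypothetical helper: load a checkpoint with vLLM and return one completion."""
    # Imported lazily so the parsing helper above works without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_id)
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    outputs = llm.generate([prompt], params)
    # Each request yields a list of candidate completions; take the first.
    return outputs[0].outputs[0].text


if __name__ == "__main__":
    info = parse_quant_scheme("RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8")
    print(info["weight_bits"], info["activation_bits"])
```

Calling `generate_with_vllm("RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8", "Hello")` would download the checkpoint on first use. Larger entries in the list may need several GPUs, which vLLM configures through options not shown here.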

RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a8

Text Generation • 71B • Updated Feb 11 • 9.5k • 21
RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a8

Text Generation • 8B • Updated Sep 22 • 34.4k • 17
RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w8a8

Text Generation • 406B • Updated Dec 3, 2024 • 24 • 1
RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a8

Text Generation • 14B • Updated Oct 9, 2024 • 6 • 2
RedHatAI/Phi-3-mini-128k-instruct-quantized.w8a8

Text Generation • 4B • Updated Oct 9, 2024 • 3
RedHatAI/gemma-2-9b-it-quantized.w8a8

Text Generation • 10B • Updated Oct 9, 2024 • 5 • 2
RedHatAI/Meta-Llama-3-70B-Instruct-quantized.w8a16

Text Generation • 19B • Updated Jul 18, 2024 • 16 • 5
RedHatAI/Qwen2-72B-Instruct-quantized.w8a16

Text Generation • 20B • Updated Jul 18, 2024 • 10 • 1
RedHatAI/Llama-2-7b-chat-quantized.w8a16

Text Generation • 2B • Updated Jul 18, 2024 • 18
RedHatAI/Meta-Llama-3-8B-Instruct-quantized.w8a16

Text Generation • 3B • Updated Jul 18, 2024 • 290 • 3
RedHatAI/Qwen2-0.5B-Instruct-quantized.w8a16

Text Generation • 0.2B • Updated Jul 18, 2024 • 123
RedHatAI/Qwen2-1.5B-Instruct-quantized.w8a16

Text Generation • 0.6B • Updated Jul 18, 2024 • 6
RedHatAI/Qwen2-7B-Instruct-quantized.w8a16

Text Generation • 3B • Updated Jul 18, 2024 • 6
RedHatAI/Mistral-7B-Instruct-v0.3-quantized.w8a16

Text Generation • 2B • Updated Jul 18, 2024 • 72
RedHatAI/Phi-3-mini-128k-instruct-quantized.w8a16

Text Generation • 1B • Updated Oct 9, 2024 • 7
RedHatAI/Phi-3-medium-128k-instruct-quantized.w8a16

Text Generation • 4B • Updated Oct 9, 2024 • 2 • 2
RedHatAI/Meta-Llama-3-8B-Instruct-quantized.w8a8

Text Generation • 8B • Updated Oct 9, 2024 • 3.95k • 2
RedHatAI/Llama-2-7b-chat-quantized.w8a8

Text Generation • 7B • Updated Oct 9, 2024 • 348 • 1
RedHatAI/Qwen2-0.5B-Instruct-quantized.w8a8

Text Generation • 0.6B • Updated Oct 9, 2024 • 16
RedHatAI/Qwen2-1.5B-Instruct-quantized.w8a8

Text Generation • 2B • Updated Oct 9, 2024 • 66
RedHatAI/Qwen2-7B-Instruct-quantized.w8a8

Text Generation • 8B • Updated Oct 9, 2024 • 7
RedHatAI/Qwen2-72B-Instruct-quantized.w8a8

Text Generation • 73B • Updated Oct 9, 2024 • 6 • 1
RedHatAI/Meta-Llama-3-70B-Instruct-quantized.w8a8

Text Generation • 71B • Updated Oct 9, 2024 • 4
RedHatAI/Mistral-7B-Instruct-v0.3-quantized.w8a8

Text Generation • 7B • Updated Oct 9, 2024 • 20 • 2
RedHatAI/Meta-Llama-3.1-8B-Instruct-quantized.w8a16

Text Generation • 3B • Updated Oct 23, 2024 • 2.46k • 11
RedHatAI/Meta-Llama-3.1-70B-Instruct-quantized.w8a16

Text Generation • 19B • Updated Oct 9, 2024 • 9 • 5
RedHatAI/Meta-Llama-3.1-8B-quantized.w8a16

Text Generation • 3B • Updated Oct 9, 2024 • 41 • 1
RedHatAI/Meta-Llama-3.1-8B-quantized.w8a8

Text Generation • 8B • Updated Oct 23, 2024 • 253 • 5
RedHatAI/Mistral-Nemo-Instruct-2407-quantized.w8a16

Text Generation • 4B • Updated Oct 9, 2024 • 4
RedHatAI/starcoder2-7b-quantized.w8a16

Text Generation • 2B • Updated Oct 9, 2024 • 7
RedHatAI/starcoder2-15b-quantized.w8a16

Text Generation • 4B • Updated Oct 9, 2024 • 2
RedHatAI/starcoder2-3b-quantized.w8a16

Text Generation • 1B • Updated Oct 9, 2024 • 1
RedHatAI/starcoder2-15b-quantized.w8a8

Text Generation • 16B • Updated Oct 9, 2024 • 4
RedHatAI/starcoder2-7b-quantized.w8a8

Text Generation • 7B • Updated Oct 9, 2024 • 4
RedHatAI/starcoder2-3b-quantized.w8a8

Text Generation • 3B • Updated Oct 9, 2024 • 1
RedHatAI/gemma-2-2b-it-quantized.w8a16

Text Generation • 2B • Updated Oct 9, 2024 • 12 • 1
RedHatAI/Phi-3-small-128k-instruct-quantized.w8a16

Text Generation • 3B • Updated Oct 9, 2024 • 3
RedHatAI/SmolLM-1.7B-Instruct-quantized.w8a16

Text Generation • 0.6B • Updated Oct 9, 2024 • 3
RedHatAI/gemma-2-2b-quantized.w8a16

Text Generation • 2B • Updated Oct 9, 2024 • 48
RedHatAI/gemma-2-9b-it-quantized.w8a16

Text Generation • 4B • Updated Oct 9, 2024 • 43 • 1
RedHatAI/gemma-2-2b-it-quantized.w8a8

Text Generation • 3B • Updated Oct 9, 2024 • 10
RedHatAI/Meta-Llama-3.1-405B-Instruct-quantized.w8a16

Text Generation • 105B • Updated Oct 9, 2024 • 3 • 1
RedHatAI/SmolLM-135M-Instruct-quantized.w8a16

Text Generation • 83.4M • Updated Oct 9, 2024 • 107
RedHatAI/gemma-2-27b-it-quantized.w8a16

Text Generation • 9B • Updated Oct 9, 2024 • 4
RedHatAI/SmolLM-360M-Instruct-quantized.w8a8

Text Generation • 0.4B • Updated Oct 9, 2024 • 2
RedHatAI/SmolLM-135M-Instruct-quantized.w8a8

Text Generation • 0.2B • Updated Oct 9, 2024 • 4 • 1
RedHatAI/Llama-3.2-3B-Instruct-quantized.w8a8

Text Generation • 4B • Updated Jul 10 • 334 • 1