nomic-embed-code-W4A16-AWQ

This is a W4A16 quantized version of nomic-ai/nomic-embed-code, produced with AWQ (Activation-aware Weight Quantization) using llm-compressor.

Quantization Details

  • Method: llmcompressor (AWQ one-shot PTQ)
  • Algorithm: AWQ (Activation-aware Weight Quantization)
  • Scheme: W4A16
  • Weight bits: 4-bit
  • Activation bits: 16-bit
  • Group size: 128
  • Format: compressed-tensors
  • Size reduction: ~75% compared to FP16
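
As a quick sanity check on the ~75% figure: with W4A16 and group size 128, each weight costs 4 bits plus a per-group share of the quantization parameters. The estimate below is a back-of-the-envelope sketch, assuming one FP16 scale and one packed 4-bit zero point per group of 128 weights and ignoring any layers kept in higher precision (e.g., embeddings).

bits_weight = 4
bits_scale = 16        # one FP16 scale per group
bits_zero = 4          # one packed zero point per group (assumed)
group_size = 128

effective_bits = bits_weight + (bits_scale + bits_zero) / group_size
reduction = 1 - effective_bits / 16
print(f"{effective_bits:.3f} bits/weight, ~{reduction:.0%} smaller than FP16")
# ~4.156 bits/weight, ~74% smaller than FP16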

Usage

import torch
from transformers import AutoModel, AutoTokenizer

# Load quantized model
model = AutoModel.from_pretrained(
    "nomic-embed-code-W4A16-AWQ",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "nomic-embed-code-W4A16-AWQ",
    trust_remote_code=True
)

# Generate embeddings with masked mean pooling (ignores padding tokens)
texts = ["Hello world", "Example text"]
inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state
mask = inputs["attention_mask"].unsqueeze(-1).to(hidden.dtype)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)

print(embeddings.shape)
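
For retrieval-style use, embeddings are typically L2-normalized and compared with cosine similarity. A minimal follow-up is shown below; whether task prefixes are required on queries is defined by the original nomic-ai/nomic-embed-code card, not by this quantized copy.

import torch.nn.functional as F

normalized = F.normalize(embeddings, p=2, dim=1)
similarity = normalized @ normalized.T  # pairwise cosine similarity
print(similarity)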

Performance

  • Memory usage: ~75% reduction vs FP16
  • Inference speed: Comparable to or faster than FP16 where optimized INT4 kernels are available
  • Quality: Minimal degradation (<1% on most embedding tasks)
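
To sanity-check the memory saving on your own machine, compare the loaded model's footprint against the FP16 original. get_memory_footprint() is a standard transformers PreTrainedModel helper; the exact number depends on whether your transformers / compressed-tensors versions keep the weights packed or decompress them on load.

print(f"Quantized footprint: {model.get_memory_footprint() / 1e9:.2f} GB")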

Why AWQ?

AWQ (Activation-aware Weight Quantization) is a one-shot weight quantization method that:

  • Protects salient weight channels based on activation magnitudes
  • Uses calibration data to identify the most important input channels
  • Often achieves better accuracy than naive round-to-nearest (RTN) and GPTQ at the same bit-width
  • Works efficiently with group-wise quantization (group size 128 here)
  • Maintains model quality while achieving roughly 75% size reduction
  • Is well suited to embedding models, which depend on preserving semantic relationships
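
The core mechanism can be sketched in a few lines of PyTorch: per-input-channel scales derived from calibration activation magnitudes protect salient weight columns before group-wise 4-bit quantization. This is an illustrative sketch only; the function and variable names are invented for this card, and the real algorithm grid-searches the exponent alpha per layer rather than fixing it.

import torch

def awq_sketch(W, act_mag, group_size=128, n_bits=4, alpha=0.5):
    # W: (out_features, in_features) weight matrix of a Linear layer
    # act_mag: (in_features,) mean |activation| per input channel from calibration data
    # Assumes in_features is a multiple of group_size
    s = act_mag.clamp(min=1e-5) ** alpha   # larger activations -> stronger protection
    Ws = W * s                             # scale columns; the layer input is divided by s at runtime

    # Group-wise asymmetric 4-bit quantization of the scaled weights
    out_f, in_f = Ws.shape
    Wg = Ws.reshape(out_f, in_f // group_size, group_size)
    w_min = Wg.amin(dim=-1, keepdim=True)
    w_max = Wg.amax(dim=-1, keepdim=True)
    scale = ((w_max - w_min) / (2**n_bits - 1)).clamp(min=1e-8)
    zero = (-w_min / scale).round()
    q = ((Wg / scale) + zero).round().clamp(0, 2**n_bits - 1)

    # What an INT4 kernel reconstructs at inference time
    W_deq = ((q - zero) * scale).reshape(out_f, in_f) / s
    return q, scale, zero, s, W_deq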

Original Model

This quantized model is based on nomic-ai/nomic-embed-code.

Citation

If you use this model, please cite the original model and llmcompressor:

@software{llmcompressor,
  title = {LLM Compressor},
  author = {Neural Magic},
  url = {https://github.com/vllm-project/llm-compressor},
  year = {2024}
}