GuardReasoner-8B (4-bit Quantized)

Pre-quantized 4-bit version of yueliu1999/GuardReasoner-8B for efficient inference.

Model Description

GuardReasoner-8B is a reasoning-based LLM safeguard that provides step-by-step analysis for content safety classification. This version is quantized to 4-bit with bitsandbytes NF4, reducing the model size from ~16GB to ~5.4GB while preserving classification performance (see Performance below).

Paper: GuardReasoner: Towards Reasoning-based LLM Safeguards
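
To confirm you are pulling the pre-quantized weights rather than a full-precision checkpoint, you can inspect the quantization config stored alongside the model. A minimal sketch; the printed fields follow the standard transformers quantization_config conventions:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("vincentoh/guardreasoner-8b-4bit")
# For a bitsandbytes checkpoint this should report quant_method "bitsandbytes",
# 4-bit NF4, and double quantization enabled.
print(config.quantization_config)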

Usage

# Requires: pip install transformers accelerate bitsandbytes
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "vincentoh/guardreasoner-8b-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# The 4-bit quantization config ships with the checkpoint, so no extra
# quantization arguments are needed here.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    trust_remote_code=True,
)

# Build the prompt using the Llama 3 chat format
text = "What is the capital of France?"
prompt = f"""<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a content safety expert. Analyze the request and determine if it is safe or harmful.

Think step-by-step:
1. What is being requested?
2. What are the potential harms?
3. Does this violate safety policies?

End your analysis with exactly: "Request: harmful" or "Request: unharmful".<|eot_id|><|start_header_id|>user<|end_header_id|>

{text}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# Greedy decoding keeps the verdict deterministic.
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens, skipping the prompt.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
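
The model ends its reasoning trace with a final verdict line in the format requested by the system prompt. A minimal sketch for pulling that label out of the response (the parsing helper is ours, not part of the model's API):

import re

def parse_verdict(response: str) -> str:
    # Take the last "Request: <label>" occurrence, since the final verdict
    # comes after the step-by-step analysis.
    matches = re.findall(r"Request:\s*(unharmful|harmful)", response)
    return matches[-1] if matches else "unknown"

print(parse_verdict(response))  # "unharmful" for the example above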

Quantization Details

  • Method: bitsandbytes 4-bit NF4
  • Compute dtype: float16
  • Double quantization: enabled
  • Original size: ~16GB
  • Quantized size: ~5.4GB
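
These settings correspond to the following BitsAndBytesConfig, sketched here in case you want to reproduce the quantization from the original checkpoint (the output path and save step are illustrative, and 4-bit serialization requires a recent bitsandbytes release):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "yueliu1999/GuardReasoner-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
model.save_pretrained("guardreasoner-8b-4bit")  # serializes the 4-bit weights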

Performance

Expected to match the original model's ~84% average F1 on safety classification benchmarks; 4-bit NF4 quantization typically introduces only minor degradation.

License

This model inherits the Llama 3 license from the base model.

Citation

@article{liu2025guardreasoner,
  title={GuardReasoner: Towards Reasoning-based LLM Safeguards},
  author={Liu, Yue and others},
  journal={arXiv preprint arXiv:2501.18492},
  year={2025}
}