Gemma Judge
Collection
This is a collection of compact yet highly capable LLM-as-a-judge models fine-tuned from Gemma3 4B.
This model is a preference evaluation model fine-tuned from unsloth/gemma-3-4b-it on the prometheus-eval/Preference-Collection dataset.
It is designed for pairwise preference comparison and alignment evaluation tasks, and is inspired by the Prometheus framework (Kim et al., 2023).
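In pairwise comparison, the judge reads an instruction and two candidate responses, then names the better one. The sketch below shows one way to extract that verdict from the generated text. It assumes the Prometheus-style convention in which the feedback ends with `[RESULT] A` or `[RESULT] B`; this card does not confirm the exact output format, and `parse_verdict` is a hypothetical helper, not part of any library.

```python
import re
from typing import Optional

# Assumed output convention (not confirmed by this model card):
# free-form feedback followed by "[RESULT] A" or "[RESULT] B".
_VERDICT = re.compile(r"\[RESULT\]\s*([AB])\b")


def parse_verdict(judge_output: str) -> Optional[str]:
    """Return "A" or "B" from the judge's text, or None if no verdict is found.

    If the marker appears more than once, the last occurrence wins, since
    the final verdict is expected at the end of the feedback.
    """
    matches = _VERDICT.findall(judge_output)
    return matches[-1] if matches else None
```

Returning `None` rather than guessing lets the caller decide how to treat unparseable generations, for example by counting them as incorrect.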
| Model | Benchmark | Pairwise accuracy (%) |
|---|---|---|
| 🟦 This model | Preference Bench | 95.6 |
| 🟨 Prometheus 2 (8×7B) (Kim et al., 2024) | Preference Bench | 90.65 |
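The pairwise accuracy reported above is the percentage of comparisons in which the judge picks the same response as the reference label. A minimal sketch of that calculation follows; the function name, the `"A"`/`"B"` label encoding, and the choice to count a missing verdict as wrong are assumptions for illustration, not details taken from the benchmark's evaluation code.

```python
from typing import Optional, Sequence


def pairwise_accuracy(
    predictions: Sequence[Optional[str]],
    gold: Sequence[str],
) -> float:
    """Percentage of pairs where the predicted winner matches the gold winner.

    A prediction of None (no verdict could be extracted) never equals a
    gold label, so it is counted as incorrect.
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)
```

For example, `pairwise_accuracy(["A", "B", "A", None], ["A", "B", "B", "A"])` gives `50.0`, because two of the four verdicts match.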
Highlights:
This model is released under the Apache 2.0 License.
However, because it is derived from Google’s Gemma 3, your use of this model must also comply with the Gemma Terms of Use.
By using this model, you agree to:
For full details, see: https://ai.google.dev/gemma/terms