> GRAM-LLaMA3.2-3B-RewardModel is a generative reward model from NiuTrans, fine-tuned from the Llama-3.2-3B-Instruct base model. It is designed to improve reward generalization for large language models (LLMs) through a two-stage training approach: pre-training on large unlabeled datasets, then fine-tuning on supervised labeled data. Training applies label smoothing and optimizes a regularized ranking loss, bridging generative and discriminative reward modeling techniques. This lets the model be applied flexibly across a variety of tasks without the usual need for extensive fine-tuning on task-specific datasets.
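
To make the objective concrete, here is a minimal PyTorch sketch of a label-smoothed pairwise ranking loss of the kind described above. It assumes a Bradley-Terry-style formulation; the function name, the smoothing factor `eps`, and the exact regularizer GRAM trains with are illustrative assumptions, not the released training code.

```python
import torch
import torch.nn.functional as F

def smoothed_ranking_loss(r_chosen: torch.Tensor,
                          r_rejected: torch.Tensor,
                          eps: float = 0.1) -> torch.Tensor:
    # Illustrative sketch (not GRAM's released code): the hard label
    # "chosen beats rejected" is softened from 1.0 to (1 - eps), which
    # regularizes the reward margin the model is pushed to learn.
    margin = r_chosen - r_rejected        # reward margin per pair
    log_p = F.logsigmoid(margin)          # log P(chosen > rejected)
    log_q = F.logsigmoid(-margin)         # log P(rejected > chosen)
    return -((1.0 - eps) * log_p + eps * log_q).mean()

# Example: scalar rewards for a batch of two preference pairs.
loss = smoothed_ranking_loss(torch.tensor([1.2, 0.3]),
                             torch.tensor([0.4, 0.9]))
```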
> GRAM-LLaMA3.2-3B-RewardModel is evaluated on the JudgeBench benchmark, which covers domains such as Chat, Code, Math, and Safety. It achieves a competitive average score of 69.9 across these categories, demonstrating strong capability for use as an open-source plug-and-play reward model that can align LLMs effectively without retraining reward models from scratch. The repository includes usage examples that let users directly apply this reward model for assessing and ranking the quality of AI-generated responses in an impartial manner.
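
To use the model as a plug-and-play judge from one of the GGUF files listed below, a minimal sketch with the third-party `llama-cpp-python` package follows. The file path, the pairwise prompt wording, and the sampling settings are assumptions for illustration; consult the repository's usage examples for the exact prompt template GRAM expects.

```python
from llama_cpp import Llama

# Load a quantized GGUF build (quant choice and path are assumptions).
llm = Llama(
    model_path="GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf",
    n_ctx=4096,
    verbose=False,
)

# Illustrative pairwise judging prompt; the template the model was
# actually trained with may differ -- see the repository's examples.
question = "What is the capital of France?"
answer_a = "The capital of France is Paris."
answer_b = "France's capital is Lyon."

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": (
            f"Question: {question}\n"
            f"Response A: {answer_a}\n"
            f"Response B: {answer_b}\n"
            "Which response is better? Answer with 'A' or 'B'."
        ),
    }],
    max_tokens=8,
    temperature=0.0,
)
print(out["choices"][0]["message"]["content"])
```
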
## Model Files

| Model File Name | Size | Quant Type |
|---|---|---|
| GRAM-LLaMA3.2-3B-RewardModel.BF16.gguf | 6.43 GB | BF16 |
| GRAM-LLaMA3.2-3B-RewardModel.F16.gguf | 6.43 GB | F16 |
| GRAM-LLaMA3.2-3B-RewardModel.F32.gguf | 12.9 GB | F32 |
| GRAM-LLaMA3.2-3B-RewardModel.Q2_K.gguf | 1.36 GB | Q2_K |
| GRAM-LLaMA3.2-3B-RewardModel.Q3_K_L.gguf | 1.82 GB | Q3_K_L |
| GRAM-LLaMA3.2-3B-RewardModel.Q3_K_M.gguf | 1.69 GB | Q3_K_M |
| GRAM-LLaMA3.2-3B-RewardModel.Q3_K_S.gguf | 1.54 GB | Q3_K_S |
| GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf | 2.02 GB | Q4_K_M |
| GRAM-LLaMA3.2-3B-RewardModel.Q4_K_S.gguf | 1.93 GB | Q4_K_S |
| GRAM-LLaMA3.2-3B-RewardModel.Q5_K_M.gguf | 2.32 GB | Q5_K_M |
| GRAM-LLaMA3.2-3B-RewardModel.Q5_K_S.gguf | 2.27 GB | Q5_K_S |
| GRAM-LLaMA3.2-3B-RewardModel.Q6_K.gguf | 2.64 GB | Q6_K |
| GRAM-LLaMA3.2-3B-RewardModel.Q8_0.gguf | 3.42 GB | Q8_0 |

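Any single quant can be fetched without cloning the whole repository, for example with `huggingface_hub`. The `repo_id` below is an assumption inferred from the model name; verify it against the repository you are browsing.

```python
from huggingface_hub import hf_hub_download

# repo_id is an assumption based on the model name above -- check it
# against the actual repository path before running.
path = hf_hub_download(
    repo_id="prithivMLmods/GRAM-LLaMA3.2-3B-RewardModel-GGUF",
    filename="GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf",
)
print(path)  # local cache path of the downloaded GGUF file
```
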
## Quants Usage

(Sorted by size, not necessarily quality. IQ-quants are often preferable over similar-sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)