> GRAM-LLaMA3.2-3B-RewardModel is a generative reward model from NiuTrans, fine-tuned from the Llama-3.2-3B-Instruct base model. It is designed to improve reward generalization for large language models (LLMs) through a two-stage training approach: pre-training on large unlabeled datasets, then fine-tuning on supervised labeled data. Training applies label smoothing and optimizes a regularized ranking loss, bridging generative and discriminative reward modeling techniques. This lets the model be applied flexibly across a variety of tasks without the usual need for extensive fine-tuning on task-specific datasets.
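
To make the objective concrete, here is a minimal PyTorch sketch of a label-smoothed pairwise ranking loss of the kind described above. It assumes a Bradley-Terry-style formulation; the function name, the smoothing factor `eps`, and the exact regularizer GRAM trains with are illustrative assumptions, not the released training code.

```python
import torch
import torch.nn.functional as F

def smoothed_ranking_loss(r_chosen: torch.Tensor,
                          r_rejected: torch.Tensor,
                          eps: float = 0.1) -> torch.Tensor:
    # Illustrative sketch (not GRAM's released code): the hard label
    # "chosen beats rejected" is softened from 1.0 to (1 - eps), which
    # regularizes the reward margin the model is pushed to learn.
    margin = r_chosen - r_rejected        # reward margin per pair
    log_p = F.logsigmoid(margin)          # log P(chosen > rejected)
    log_q = F.logsigmoid(-margin)         # log P(rejected > chosen)
    return -((1.0 - eps) * log_p + eps * log_q).mean()

# Example: scalar rewards for a batch of two preference pairs.
loss = smoothed_ranking_loss(torch.tensor([1.2, 0.3]),
                             torch.tensor([0.4, 0.9]))
```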
> GRAM-LLaMA3.2-3B-RewardModel is evaluated on the JudgeBench benchmark, which covers domains such as Chat, Code, Math, and Safety. It achieves a competitive average score of 69.9 across these categories, demonstrating strong capability for use as an open-source plug-and-play reward model that can align LLMs effectively without retraining reward models from scratch. The repository includes usage examples that let users directly apply this reward model for assessing and ranking the quality of AI-generated responses in an impartial manner.
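
To use the model as a plug-and-play judge from one of the GGUF files listed below, a minimal sketch with the third-party `llama-cpp-python` package follows. The file path, the pairwise prompt wording, and the sampling settings are assumptions for illustration; consult the repository's usage examples for the exact prompt template GRAM expects.

```python
from llama_cpp import Llama

# Load a quantized GGUF build (quant choice and path are assumptions).
llm = Llama(
    model_path="GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf",
    n_ctx=4096,
    verbose=False,
)

# Illustrative pairwise judging prompt; the template the model was
# actually trained with may differ -- see the repository's examples.
question = "What is the capital of France?"
answer_a = "The capital of France is Paris."
answer_b = "France's capital is Lyon."

out = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": (
            f"Question: {question}\n"
            f"Response A: {answer_a}\n"
            f"Response B: {answer_b}\n"
            "Which response is better? Answer with 'A' or 'B'."
        ),
    }],
    max_tokens=8,
    temperature=0.0,
)
print(out["choices"][0]["message"]["content"])
```
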
## Model Files

| Model File Name | Size | Quant Type |
|---|---|---|
| GRAM-LLaMA3.2-3B-RewardModel.BF16.gguf | 6.43 GB | BF16 |
| GRAM-LLaMA3.2-3B-RewardModel.F16.gguf | 6.43 GB | F16 |
| GRAM-LLaMA3.2-3B-RewardModel.F32.gguf | 12.9 GB | F32 |
| GRAM-LLaMA3.2-3B-RewardModel.Q2_K.gguf | 1.36 GB | Q2_K |
| GRAM-LLaMA3.2-3B-RewardModel.Q3_K_L.gguf | 1.82 GB | Q3_K_L |
| GRAM-LLaMA3.2-3B-RewardModel.Q3_K_M.gguf | 1.69 GB | Q3_K_M |
| GRAM-LLaMA3.2-3B-RewardModel.Q3_K_S.gguf | 1.54 GB | Q3_K_S |
| GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf | 2.02 GB | Q4_K_M |
| GRAM-LLaMA3.2-3B-RewardModel.Q4_K_S.gguf | 1.93 GB | Q4_K_S |
| GRAM-LLaMA3.2-3B-RewardModel.Q5_K_M.gguf | 2.32 GB | Q5_K_M |
| GRAM-LLaMA3.2-3B-RewardModel.Q5_K_S.gguf | 2.27 GB | Q5_K_S |
| GRAM-LLaMA3.2-3B-RewardModel.Q6_K.gguf | 2.64 GB | Q6_K |
| GRAM-LLaMA3.2-3B-RewardModel.Q8_0.gguf | 3.42 GB | Q8_0 |

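Any single quant can be fetched without cloning the whole repository, for example with `huggingface_hub`. The `repo_id` below is an assumption inferred from the model name; verify it against the repository you are browsing.

```python
from huggingface_hub import hf_hub_download

# repo_id is an assumption based on the model name above -- check it
# against the actual repository path before running.
path = hf_hub_download(
    repo_id="prithivMLmods/GRAM-LLaMA3.2-3B-RewardModel-GGUF",
    filename="GRAM-LLaMA3.2-3B-RewardModel.Q4_K_M.gguf",
)
print(path)  # local cache path of the downloaded GGUF file
```
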
## Quants Usage

(Sorted by size, not necessarily quality. IQ-quants are often preferable over similar-sized non-IQ quants.)

Here is a handy graph by ikawrakow comparing some lower-quality quant types (lower is better):

![image.png](https://www.nethype.de/huggingface_embed/quantpplgraph.png)