chrisliu298 committed (verified) · Commit 425a84e · 1 parent: 119f351

Update README.md

Files changed (1): README.md (+14 -1)

README.md
@@ -21,12 +21,25 @@ pipeline_tag: text-classification
 
 ## 🔥 Highlights
 
-**Skywork-Reward-V2** is a series of reward models designed for versatility across a wide range of tasks, trained on a mixture of 26 million carefully curated preference pairs. While the Skywork-Reward-V2 series remains based on the Bradley-Terry model, we push the boundaries of training data scale and quality to achieve superior performance. Compared to the first generation of Skywork-Reward, the Skywork-Reward-V2 series offers the following major improvements:
+**Skywork-Reward-V2** is a series of eight reward models designed for versatility across a wide range of tasks, trained on a mixture of 26 million carefully curated preference pairs. While the Skywork-Reward-V2 series remains based on the Bradley-Terry model, we push the boundaries of training data scale and quality to achieve superior performance. Compared to the first generation of Skywork-Reward, the Skywork-Reward-V2 series offers the following major improvements:
 
 - **Trained on a significantly larger and higher-quality preference data mixture**, consisting of **26 million preference pairs** curated via a large-scale human-LLM synergistic pipeline.
 - **State-of-the-art performance on seven major reward model benchmarks**, including RewardBench v1, RewardBench v2, PPE Preference, PPE Correctness, RMB, RM-Bench, and JudgeBench.
 - **Available in eight models across multiple sizes**, with the smallest 0.6B variant, *Skywork-Reward-V2-Qwen3-0.6B*, nearly matching the average performance of our previous best model, Skywork-Reward-Gemma-2-27B-v0.2. The largest 8B version, *Skywork-Reward-V2-Llama-3.1-8B*, surpasses all existing reward models across all benchmarks on average. Our top experimental model, *Skywork-Reward-V2-Llama-3.1-8B-40M*, **outperforms all existing reward models on every benchmark**.
 
+| Model                              | Base Model                                                                                   | Link                                                                                 |
+|:-----------------------------------|:--------------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------:|
+| Skywork-Reward-V2-Llama-3.1-8B     | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B)     |
+| Skywork-Reward-V2-Llama-3.1-8B-40M | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.1-8B-40M) |
+| Skywork-Reward-V2-Llama-3.2-1B     | [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-1B)     |
+| Skywork-Reward-V2-Llama-3.2-3B     | [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Llama-3.2-3B)     |
+| Skywork-Reward-V2-Qwen3-0.6B       | [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)                                   | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-0.6B)       |
+| Skywork-Reward-V2-Qwen3-1.7B       | [Qwen/Qwen3-1.7B](https://huggingface.co/Qwen/Qwen3-1.7B)                                   | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-1.7B)       |
+| Skywork-Reward-V2-Qwen3-4B         | [Qwen/Qwen3-4B](https://huggingface.co/Qwen/Qwen3-4B)                                       | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-4B)         |
+| Skywork-Reward-V2-Qwen3-8B         | [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B)                                       | [🤗 Hugging Face](https://huggingface.co/Skywork/Skywork-Reward-V2-Qwen3-8B)         |
+
+For the complete collection of models, please refer to the [Skywork-Reward-V2](https://huggingface.co/collections/Skywork/skywork-reward-v2-685cc86ce5d9c9e4be500c84) collection.
+
 ## 📊 Evaluation
 
 In the following table, we categorize the models into two types: Bradley-Terry (BT) reward models and Generative reward models. The Skywork-Reward-V2 series outperforms models in both categories with much smaller model sizes.
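These models are published with `pipeline_tag: text-classification`, i.e. they load as sequence-classification models whose single output logit is the Bradley-Terry reward; for a pair of responses to the same prompt, the modeled preference probability is `sigmoid(r_chosen - r_rejected)`. Below is a minimal scoring sketch under that assumption, using the standard `transformers` sequence-classification API; the model card's official snippet may differ in details such as dtype or attention implementation, and the example prompt/responses are purely illustrative.

```python
# Minimal sketch (not the official model-card snippet): scoring responses with
# a Skywork-Reward-V2 model via its standard sequence-classification head.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "Skywork/Skywork-Reward-V2-Llama-3.1-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    num_labels=1,  # single scalar reward head
)

def reward(prompt: str, response: str) -> float:
    """Return the scalar Bradley-Terry reward for one prompt-response pair."""
    conv = [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": response},
    ]
    # Format the pair with the model's chat template, then read the scalar logit.
    input_ids = tokenizer.apply_chat_template(
        conv, tokenize=True, return_tensors="pt"
    ).to(model.device)
    with torch.no_grad():
        return model(input_ids).logits[0][0].item()

# Hypothetical example pair; any prompt with a chosen/rejected response works.
prompt = "Jane has 12 apples. She gives 4 to Mark. How many apples does she have left?"
chosen = "Jane gives away 4 of her 12 apples, so she has 12 - 4 = 8 apples left."
rejected = "Jane has 16 apples left."

r_c, r_r = reward(prompt, chosen), reward(prompt, rejected)
# Under the Bradley-Terry model, P(chosen preferred) = sigmoid(r_chosen - r_rejected).
print(f"chosen: {r_c:.2f}, rejected: {r_r:.2f}, "
      f"P(chosen): {torch.sigmoid(torch.tensor(r_c - r_r)).item():.3f}")
```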