Collection: Model Extrapolation Expedites Alignment. Better aligned models obtained by model extrapolation (ExPO).
This is the extrapolated (ExPO) model based on abacusai/Smaug-34B-v0.1 and jondurbin/bagel-34b-v0.2, as described in the paper "Weak-to-Strong Extrapolation Expedites Alignment".
Specifically, we obtain this model by extrapolating (alpha = 0.3) from the weights of the SFT checkpoint through those of the DPO/RLHF checkpoint, which yields better alignment with human preferences.
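
The extrapolation step can be illustrated with a minimal sketch. It assumes the update rule from the paper, theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft), applied to every parameter. The function name `extrapolate_weights` and the toy checkpoints are hypothetical and are not taken from the official repo. Real checkpoints hold tensors, while this sketch uses plain lists of floats to stay self-contained.

```python
from typing import Dict, List

# Toy stand-in for a checkpoint: parameter name -> flat list of floats.
# (Assumption for illustration; real checkpoints map names to tensors.)
Weights = Dict[str, List[float]]


def extrapolate_weights(sft: Weights, aligned: Weights, alpha: float) -> Weights:
    """Move past the aligned weights along the direction (aligned - sft).

    Implements theta_expo = theta_aligned + alpha * (theta_aligned - theta_sft).
    """
    if sft.keys() != aligned.keys():
        raise ValueError("checkpoints must share the same parameter names")

    extrapolated: Weights = {}
    for name, w_aligned in aligned.items():
        w_sft = sft[name]
        if len(w_sft) != len(w_aligned):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # alpha = 0 returns the aligned weights unchanged;
        # alpha > 0 amplifies the change made by alignment training.
        extrapolated[name] = [a + alpha * (a - s) for s, a in zip(w_sft, w_aligned)]
    return extrapolated


# Hypothetical toy checkpoints, not real model weights.
toy_sft = {"layer.weight": [1.0, 2.0]}
toy_aligned = {"layer.weight": [2.0, 4.0]}
toy_expo = extrapolate_weights(toy_sft, toy_aligned, alpha=0.5)
print(toy_expo)  # {'layer.weight': [2.5, 5.0]}
```

With alpha = 0.3, as used for this model, each extrapolated parameter lies 30% further from the SFT weights than the DPO/RLHF weights do, along the same direction.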
Evaluation results on the AlpacaEval 2.0 benchmark (you can find the evaluation outputs on the official GitHub repo):
| Model | Win Rate (Ori) | LC Win Rate (Ori) | Win Rate (+ ExPO) | LC Win Rate (+ ExPO) |
|---|---|---|---|---|
| HuggingFaceH4/zephyr-7b-alpha | 6.7% | 10.0% | 10.6% | 13.6% | 
| HuggingFaceH4/zephyr-7b-beta | 10.2% | 13.2% | 11.1% | 14.0% | 
| berkeley-nest/Starling-LM-7B-alpha | 15.0% | 18.3% | 18.2% | 19.5% | 
| Nexusflow/Starling-LM-7B-beta | 26.6% | 25.8% | 29.6% | 26.4% | 
| snorkelai/Snorkel-Mistral-PairRM | 24.7% | 24.0% | 28.8% | 26.4% | 
| RLHFlow/LLaMA3-iterative-DPO-final | 29.2% | 36.0% | 32.7% | 37.8% | 
| internlm/internlm2-chat-1.8b | 3.8% | 4.0% | 5.2% | 4.3% | 
| internlm/internlm2-chat-7b | 20.5% | 18.3% | 28.1% | 22.7% | 
| internlm/internlm2-chat-20b | 36.1% | 24.9% | 46.2% | 27.2% | 
| allenai/tulu-2-dpo-7b | 8.5% | 10.2% | 11.5% | 11.7% | 
| allenai/tulu-2-dpo-13b | 11.2% | 15.5% | 15.6% | 17.6% | 
| allenai/tulu-2-dpo-70b | 15.4% | 21.2% | 23.0% | 25.7% | 
Evaluation results on the MT-Bench benchmark (you can find the evaluation outputs on the official GitHub repo):
| Model | Original | + ExPO |
|---|---|---|
| HuggingFaceH4/zephyr-7b-alpha | 6.85 | 6.87 | 
| HuggingFaceH4/zephyr-7b-beta | 7.02 | 7.06 | 
| berkeley-nest/Starling-LM-7B-alpha | 7.82 | 7.91 | 
| Nexusflow/Starling-LM-7B-beta | 8.10 | 8.18 | 
| snorkelai/Snorkel-Mistral-PairRM | 7.63 | 7.69 | 
| RLHFlow/LLaMA3-iterative-DPO-final | 8.08 | 8.45 | 
| internlm/internlm2-chat-1.8b | 5.17 | 5.26 | 
| internlm/internlm2-chat-7b | 7.72 | 7.80 | 
| internlm/internlm2-chat-20b | 8.13 | 8.26 | 
| allenai/tulu-2-dpo-7b | 6.35 | 6.38 | 
| allenai/tulu-2-dpo-13b | 7.00 | 7.26 | 
| allenai/tulu-2-dpo-70b | 7.79 | 8.03 |