Simplifying Alignment: From RLHF to Direct Preference Optimization (DPO)