The German Commons - 154 Billion Tokens of Openly Licensed Text for German Language Models
Paper • 2510.13996 • Published 14 days ago
I ran the Anthropic Misalignment Framework for a few top models and added it to a dataset: cfahlgren1/anthropic-agentic-misalignment-results. You can read the reasoning traces of the models trying to blackmail the user and perform other actions. It's very interesting!!
Really nice to see AllenAI drop the Reward-Bench-2 dataset and leaderboard from their new paper, all on the hub! 👏 allenai/reward-bench, allenai/reward-bench-2, allenai/reward-bench-2-results. Great work @natolambert, allenai and others!! 🤗