Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent Paper • 2510.06607 • Published 28 days ago • 3
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information Paper • 2510.03632 • Published Oct 4 • 41
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26 • 133
JailBreakV-28K: A Benchmark for Assessing the Robustness of MultiModal Large Language Models against Jailbreak Attacks Paper • 2404.03027 • Published Apr 3, 2024 • 3