EvoArena: Tracking Memory Evolution for Robust LLM Agents in Dynamic Environments • Paper • 2606.13681 • Published 4 days ago • 121
Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior • Paper • 2606.12730 • Published 5 days ago • 5
CausaLab: A Scalable Environment for Interactive Causal Discovery Toward AI Scientists • Paper • 2605.26029 • Published 18 days ago • 18
Useful Memories Become Faulty When Continuously Updated by LLMs • Paper • 2605.12978 • Published May 13 • 18
SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers • Paper • 2602.05115 • Published Feb 4 • 20
Steer2Adapt: Dynamically Composing Steering Vectors Elicits Efficient Adaptation of LLMs • Paper • 2602.07276 • Published Feb 7 • 11
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning • Paper • 2602.01058 • Published Feb 1 • 45