SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models Paper • 2506.01062 • Published Jun 1 • 5
LightMem: Lightweight and Efficient Memory-Augmented Generation Paper • 2510.18866 • Published 12 days ago • 106
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Paper • 2507.13300 • Published Jul 17 • 19
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 85