AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published 7 days ago • 69
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine Paper • 2510.21614 • Published Oct 24 • 22
AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published 7 days ago • 69
AraLingBench A Human-Annotated Benchmark for Evaluating Arabic Linguistic Capabilities of Large Language Models Paper • 2511.14295 • Published 7 days ago • 69 • 3
Multimodal Safety Evaluation in Generative Agent Social Simulations Paper • 2510.07709 • Published Oct 9 • 13