Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 8 days ago • 71
Large Language Models Meet Extreme Multi-label Classification: Scaling and Multi-modal Framework Paper • 2511.13189 • Published 10 days ago • 37
REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding Paper • 2511.13026 • Published 10 days ago • 24
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Paper • 2511.13853 • Published 10 days ago • 34
A Style is Worth One Code: Unlocking Code-to-Style Image Generation with Discrete Style Space Paper • 2511.10555 • Published 14 days ago • 58
Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models Paper • 2511.08577 • Published 16 days ago • 102
Agent READMEs: An Empirical Study of Context Files for Agentic Coding Paper • 2511.12884 • Published 10 days ago • 5
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published 15 days ago • 16
Lumine: An Open Recipe for Building Generalist Agents in 3D Open Worlds Paper • 2511.08892 • Published 15 days ago • 183
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published 16 days ago • 40
Too Good to be Bad: On the Failure of LLMs to Role-Play Villains Paper • 2511.04962 • Published 20 days ago • 51
JanusCoder: Towards a Foundational Visual-Programmatic Interface for Code Intelligence Paper • 2510.23538 • Published about 1 month ago • 95
gpt-oss-safeguard Collection gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built-upon gpt-oss • 2 items • Updated 29 days ago • 57
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory Paper • 2504.19413 • Published Apr 28 • 33
Enterprise Deep Research: Steerable Multi-Agent Deep Research for Enterprise Analytics Paper • 2510.17797 • Published Oct 20 • 9
Pico-Banana-400K: A Large-Scale Dataset for Text-Guided Image Editing Paper • 2510.19808 • Published Oct 22 • 28
olmOCR: Unlocking Trillions of Tokens in PDFs with Vision Language Models Paper • 2502.18443 • Published Feb 25 • 9
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6 • 113