MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26 • 134
Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory Paper • 2504.19413 • Published Apr 28 • 34
Emu3.5: Native Multimodal Models are World Learners Paper • 2510.26583 • Published Oct 30 • 104
LlamaFactory: Unified Efficient Fine-Tuning of 100+ Language Models Paper • 2403.13372 • Published Mar 20, 2024 • 173
SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models Paper • 2510.06917 • Published Oct 8 • 34
EpiCache: Episodic KV Cache Management for Long Conversational Question Answering Paper • 2509.17396 • Published Sep 22 • 19
RPG: A Repository Planning Graph for Unified and Scalable Codebase Generation Paper • 2509.16198 • Published Sep 19 • 127
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21 • 256
Running 3.53k The Ultra-Scale Playbook 🌌 3.53k The ultimate guide to training LLM on large GPU Clusters
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10, 2024 • 59
DMOSpeech 2: Reinforcement Learning for Duration Prediction in Metric-Optimized Speech Synthesis Paper • 2507.14988 • Published Jul 20 • 7
DriftMoE: A Mixture of Experts Approach to Handle Concept Drifts Paper • 2507.18464 • Published Jul 24 • 11
Hierarchical Budget Policy Optimization for Adaptive Reasoning Paper • 2507.15844 • Published Jul 21 • 16
EarthCrafter: Scalable 3D Earth Generation via Dual-Sparse Latent Diffusion Paper • 2507.16535 • Published Jul 22 • 20
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20 • 46
LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization Paper • 2507.15758 • Published Jul 21 • 35