Learnable Multipliers: Freeing the Scale of Language Model Matrix Layers • Paper • 2601.04890 • Published 11 days ago • 40
Nested Learning: The Illusion of Deep Learning Architectures • Paper • 2512.24695 • Published 19 days ago • 36
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss • Paper • 2512.23447 • Published 21 days ago • 94
Dynamic Large Concept Models: Latent Reasoning in an Adaptive Semantic Space • Paper • 2512.24617 • Published 20 days ago • 58