Moonlight-A3B Collection Moonshot's compute-efficient MoE LLM, the first scale-up of the Muon optimizer • 3 items • Updated about 5 hours ago • 7
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published 3 days ago • 65
Kimi-Linear-A3B Collection Moonshot's experimental MoE model with Kimi Delta Attention • 3 items • Updated 1 day ago • 7