Fix LaTeX rendering issue on README.md
#2 · opened by C10X
README.md CHANGED

@@ -36,7 +36,7 @@ We open-source the KDA kernel in [FLA](https://github.com/fla-org/flash-linear-attention)
 - **Kimi Delta Attention (KDA):** A linear attention mechanism that refines the gated delta rule with finegrained gating.
 - **Hybrid Architecture:** A 3:1 KDA-to-global MLA ratio reduces memory usage while maintaining or surpassing the quality of full attention.
 - **Superior Performance:** Outperforms full attention in a variety of tasks, including long-context and RL-style benchmarks on 1.4T token training runs with fair comparisons.
-- **High Throughput:** Achieves up to
+- **High Throughput:** Achieves up to 6× faster decoding and significantly reduces time per output token (TPOT).

 <div align="center">
 <img width="60%" src="figures/arch.png">
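For context on the KDA bullet touched by this diff: "gated delta rule with fine-grained gating" refers to a delta-rule fast-weight recurrence whose decay gate is applied per channel rather than as a single scalar per step. The snippet below is a minimal, naive reference sketch of that idea only; it is not the FLA/KDA kernel, and the tensor names (`q`, `k`, `v`, `beta`, `a`) and the exact placement of the gate are assumptions made for illustration.

```python
# Naive sequential sketch of a delta-rule recurrence with fine-grained
# (per-channel) gating. Illustrative only; the actual KDA formulation and
# its chunked/parallel kernel in FLA may differ.
import torch

def gated_delta_rule(q, k, v, beta, a):
    """q, k: [T, d_k], v: [T, d_v], beta: [T] in (0,1), a: [T, d_k] in (0,1).

    `a` is the fine-grained gate: one decay value per key channel per step
    (a single scalar per step would recover the coarse gated delta rule).
    Returns outputs o: [T, d_v].
    """
    T, d_k = k.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v)           # fast-weight state mapping keys -> values
    outs = []
    for t in range(T):
        S = a[t].unsqueeze(-1) * S      # per-channel decay along the key dimension
        v_hat = S.T @ k[t]              # value the state currently predicts for k_t
        S = S + beta[t] * torch.outer(k[t], v[t] - v_hat)  # delta-rule correction
        outs.append(S.T @ q[t])         # read out with the query
    return torch.stack(outs)

# Toy usage with random inputs (unit-norm keys, sigmoid-bounded gates).
T, d_k, d_v = 8, 16, 32
q = torch.randn(T, d_k)
k = torch.nn.functional.normalize(torch.randn(T, d_k), dim=-1)
v = torch.randn(T, d_v)
beta = torch.sigmoid(torch.randn(T))
a = torch.sigmoid(torch.randn(T, d_k))
print(gated_delta_rule(q, k, v, beta, a).shape)  # torch.Size([8, 32])
```

Because the state `S` has fixed size `d_k × d_v`, decoding cost per token is constant in sequence length, which is the property behind the throughput bullet; the 6× decoding figure in the diff comes from the README itself, not from this sketch.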