Fix LaTeX rendering issue on README.md
#2 · opened by C10X
README.md CHANGED

@@ -36,7 +36,7 @@ We open-source the KDA kernel in [FLA](https://github.com/fla-org/flash-linear-attention)
 - **Kimi Delta Attention (KDA):** A linear attention mechanism that refines the gated delta rule with finegrained gating.
 - **Hybrid Architecture:** A 3:1 KDA-to-global MLA ratio reduces memory usage while maintaining or surpassing the quality of full attention.
 - **Superior Performance:** Outperforms full attention in a variety of tasks, including long-context and RL-style benchmarks on 1.4T token training runs with fair comparisons.
-- **High Throughput:** Achieves up to
+- **High Throughput:** Achieves up to 6× faster decoding and significantly reduces time per output token (TPOT).

 <div align="center">
 <img width="60%" src="figures/arch.png">
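For context on the KDA bullet touched by this diff: "gated delta rule with fine-grained gating" refers to a delta-rule fast-weight recurrence whose decay gate is applied per channel rather than as a single scalar per step. The snippet below is a minimal, naive reference sketch of that idea only; it is not the FLA/KDA kernel, and the tensor names (`q`, `k`, `v`, `beta`, `a`) and the exact placement of the gate are assumptions made for illustration.

```python
# Naive sequential sketch of a delta-rule recurrence with fine-grained
# (per-channel) gating. Illustrative only; the actual KDA formulation and
# its chunked/parallel kernel in FLA may differ.
import torch

def gated_delta_rule(q, k, v, beta, a):
    """q, k: [T, d_k], v: [T, d_v], beta: [T] in (0,1), a: [T, d_k] in (0,1).

    `a` is the fine-grained gate: one decay value per key channel per step
    (a single scalar per step would recover the coarse gated delta rule).
    Returns outputs o: [T, d_v].
    """
    T, d_k = k.shape
    d_v = v.shape[-1]
    S = torch.zeros(d_k, d_v)           # fast-weight state mapping keys -> values
    outs = []
    for t in range(T):
        S = a[t].unsqueeze(-1) * S      # per-channel decay along the key dimension
        v_hat = S.T @ k[t]              # value the state currently predicts for k_t
        S = S + beta[t] * torch.outer(k[t], v[t] - v_hat)  # delta-rule correction
        outs.append(S.T @ q[t])         # read out with the query
    return torch.stack(outs)

# Toy usage with random inputs (unit-norm keys, sigmoid-bounded gates).
T, d_k, d_v = 8, 16, 32
q = torch.randn(T, d_k)
k = torch.nn.functional.normalize(torch.randn(T, d_k), dim=-1)
v = torch.randn(T, d_v)
beta = torch.sigmoid(torch.randn(T))
a = torch.sigmoid(torch.randn(T, d_k))
print(gated_delta_rule(q, k, v, beta, a).shape)  # torch.Size([8, 32])
```

Because the state `S` has fixed size `d_k × d_v`, decoding cost per token is constant in sequence length, which is the property behind the throughput bullet; the 6× decoding figure in the diff comes from the README itself, not from this sketch.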