Improve model card: Add metadata tags, explicit links, and update citation

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +22 -11
README.md CHANGED
@@ -1,17 +1,26 @@
  ---
  license: mit
  ---
  <div align="center">
- <a href="https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf"><img width="80%" src="figures/banner.png"></a>
  </div>

  <div align="center">
- <a href="https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf" ><img src="figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Tech Report</b></a> |
  <a href="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Base"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a>
  </div>

  <div align="center">
- <img width="90%" src="figures/perf_speed.png">
  <p><em><b>(a)</b> On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention. On RULER (128k context length), it shows Pareto-optimal performance (84.3) and 3.98x speedup. <b>(b)</b> Kimi Linear achieves 6.3x faster TPOT compared to MLA, offering significant speedups at long sequence lengths (1M tokens).</em></p>
  </div>

@@ -38,7 +47,7 @@ We open-source the KDA kernel in [FLA](https://github.com/fla-org/flash-linear-a
  - **High Throughput:** Achieves up to $6\times$ faster decoding and significantly reduces time per output token (TPOT).

  <div align="center">
- <img width="60%" src="figures/arch.png">
  </div>

  ## Usage
@@ -94,14 +103,16 @@ vllm serve moonshotai/Kimi-Linear-48B-A3B-Instruct \
  --trust-remote-code
  ```

- ### Citation

- If you found our work useful, please cite
  ```bibtex
- @article{kimi2025kda,
- title = {Kimi Linear: An Expressive, Efficient Attention Architecture},
- author = {kimi Team},
- year = {2025},
- url = {https://github.com/MoonshotAI/Kimi-Linear/blob/master/tech_report.pdf}
  }
  ```
 
  ---
  license: mit
+ pipeline_tag: text-generation
+ library_name: transformers
  ---
+
+ # Kimi Linear: An Expressive, Efficient Attention Architecture
+
+ This model is presented in the paper [Kimi Linear: An Expressive, Efficient Attention Architecture](https://huggingface.co/papers/2510.26692).
+ The official code can be found at: [https://github.com/MoonshotAI/Kimi-Linear](https://github.com/MoonshotAI/Kimi-Linear)
+
  <div align="center">
+ <a href="https://huggingface.co/papers/2510.26692"><img width="80%" src="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct/resolve/main/figures/banner.png"></a>
  </div>

  <div align="center">
+ <a href="https://huggingface.co/papers/2510.26692" ><img src="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct/resolve/main/figures/logo.png" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Paper</b></a> |
+ <a href="https://github.com/MoonshotAI/Kimi-Linear"><img src="https://img.shields.io/badge/Github-Code-blue.svg?logo=github&style=flat-square" height="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> Code</b></a> |
  <a href="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Base"><img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" height="16" width="16" style="display: inline-block; vertical-align: middle; margin: 2px;"><b style="display: inline-block;"> HuggingFace</b></a>
  </div>

  <div align="center">
+ <img width="90%" src="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct/resolve/main/figures/perf_speed.png">
  <p><em><b>(a)</b> On MMLU-Pro (4k context length), Kimi Linear achieves 51.0 performance with similar speed as full attention. On RULER (128k context length), it shows Pareto-optimal performance (84.3) and 3.98x speedup. <b>(b)</b> Kimi Linear achieves 6.3x faster TPOT compared to MLA, offering significant speedups at long sequence lengths (1M tokens).</em></p>
  </div>

 
  - **High Throughput:** Achieves up to $6\times$ faster decoding and significantly reduces time per output token (TPOT).

  <div align="center">
+ <img width="60%" src="https://huggingface.co/moonshotai/Kimi-Linear-48B-A3B-Instruct/resolve/main/figures/arch.png">
  </div>

  ## Usage
 
  --trust-remote-code
  ```

+ ## Citation

+ If you found our work useful, please cite:
  ```bibtex
+ @misc{team2025kimi,
+ title = {Kimi Linear: An Expressive, Efficient Attention Architecture},
+ author = {Zhang, Yu and Lin, Zongyu and Yao, Xingcheng and Hu, Jiaxi and Meng, Fanqing and Liu, Chengyin and Men, Xin and Yang, Songlin and Li, Zhiyuan and Li, Wentao and Lu, Enzhe and Liu, Weizhou and Chen, Yanru and Xu, Weixin and Yu, Longhui and Wang, Yejie and Fan, Yu and Zhong, Longguang and Yuan, Enming and Zhang, Dehao and Zhang, Yizhi and T. Liu, Y. and Wang, Haiming and Fang, Shengjun and He, Weiran and Liu, Shaowei and Li, Yiwei and Su, Jianlin and Qiu, Jiezhong and Pang, Bo and Yan, Junjie and Jiang, Zhejun and Huang, Weixiao and Yin, Bohong and You, Jiacheng and Wei, Chu and Wang, Zhengtao and Hong, Chao and Chen, Yutian and Chen, Guanduo and Wang, Yucheng and Zheng, Huabin and Wang, Feng and Liu, Yibo and Dong, Mengnan and Zhang, Zheng and Pan, Siyuan and Wu, Wenhao and Wu, Yuhao and Guan, Longyu and Tao, Jiawen and Fu, Guohong and Xu, Xinran and Wang, Yuzhi and Lai, Guokun and Wu, Yuxin and Zhou, Xinyu and Yang, Zhilin and Du, Yulun},
+ year = {2025},
+ eprint = {2510.26692},
+ archivePrefix = {arXiv},
+ primaryClass = {cs.CL}
  }
  ```
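
Several hunks in this diff replace relative image paths such as `figures/banner.png` with absolute `https://huggingface.co/<repo>/resolve/main/...` URLs. As a rough illustration only, that rewrite could be sketched in Python as below. The function name `absolutize_img_srcs`, the regular expression, and the default `revision="main"` are assumptions made for this example and are not part of the PR; the repository id and URL pattern are copied from the added lines above.

```python
import re

# Repo id taken from the absolute URLs added in this PR.
REPO_ID = "moonshotai/Kimi-Linear-48B-A3B-Instruct"

# Matches the src attribute of an <img> tag: prefix, value, closing quote.
_IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")')
# Anything starting with "scheme://" is already absolute.
_HAS_SCHEME = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def absolutize_img_srcs(text: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Rewrite relative <img src="..."> paths to Hub 'resolve' URLs (hypothetical helper)."""
    base = f"https://huggingface.co/{repo_id}/resolve/{revision}/"

    def _rewrite(match: re.Match) -> str:
        src = match.group(2)
        # Leave absolute URLs and root-relative paths untouched.
        if _HAS_SCHEME.match(src) or src.startswith("/"):
            return match.group(0)
        # Drop a leading "./" so the joined URL stays clean.
        if src.startswith("./"):
            src = src[2:]
        return f"{match.group(1)}{base}{src}{match.group(3)}"

    return _IMG_SRC.sub(_rewrite, text)


if __name__ == "__main__":
    print(absolutize_img_srcs('<img width="60%" src="figures/arch.png">'))
```

Images that already use an absolute URL, such as the Hugging Face logo badge, pass through unchanged, which matches the untouched context lines in the diff.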