PyTorch · English · llama

desaifan-mbzuai committed (verified)
Commit 292819f · Parent(s): bdb83f5

Update README.md (#8)

- Update README.md (2bbe448b850e983a400c833a636f977a0af79e3a)

Files changed (1):
  1. README.md  +49 -39
README.md CHANGED
@@ -1,4 +1,3 @@
-
 ---
 license: apache-2.0
 language:
@@ -7,20 +6,17 @@ language:

 # **K2-V2**

- 📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf ) - 📝 [Code](github_url) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)
-
- <img src="figures/banner.png" alt="k2-banner-placeholder"/>
-
- <br>

- K2-V2 is our best fully open source model to date and ranked among the best open weight models of its class. As the latest base model in the LLM360's strongest project family, K2 features a dense architecture with 70 billion parameters.

- <img src="figures/sft-models.png" width="400" alt="k2-sft-aime"/>

- Beyond standard competencies like knowledge and conversation, K2 provides advanced capabilities, including long context consistency, deep mathematical knowledge, and reasoning behaviors. These serve as foundational building blocks that enable sophisticated downstream use cases, such as solving complex math problems and executing agentic workflows.


- <img src="figures/base-models.png" width="400" alt="k2-base-gpqa"/>

 ---

@@ -29,8 +25,8 @@ Beyond standard competencies like knowledge and conversation, K2 provides advanc
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

- model = AutoModelForCausalLM.from_pretrained("llm360/k2-v2", device_map="auto")
- tokenizer = AutoTokenizer.from_pretrained("llm360/k2-v2")

 prompt = "Explain why the derivative of sin(x) is cos(x)."
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
@@ -42,11 +38,10 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))

 ## **Evaluation Summary**

 | Task / Model | base | mid-1 | mid-2 | mid-3 | mid-4 | Qwen2.5-72B | Llama3.0-70B | Llama3.1-70B | Olmo3-32B |
 |--------------|------|-------|-------|-------|-------|--------------|---------------|---------------|------------|
- | **Architecture** | Dense | Dense | Dense | Dense | Dense | Dense | Dense | Dense | Dense |
- | **# Total Params** | 70B | 70B | 70B | 70B | 70B | 72B | 70B | 70B | 32B |
- | **# Activated Params** | 70B | 70B | 70B | 70B | 70B | 72B | 70B | 70B | 32B |
 | **General Tasks** | | | | | | | | | |
 | **MMLU** | 74.3 | 74.4 | 73.5 | 75.0 | 75.2 | **86.1** | <u>79.5</u> | 79.3 | 75.2 |
 | **MMLU-Pro** | 43.7 | 46.8 | 48.1 | **59.8** | 57.0 | <u>58.1</u> | 52.8 | 53.8 | 49.6 |
@@ -64,30 +59,20 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 | **Coding Tasks** | | | | | | | | | |
 | **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
 | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |
- | **Logic Puzzles** | | | | | | | | | |
- | **COUNTDOWN** | 1.3 | <u>53.3</u> | 53.1 | 35.9 | **75.6** | 6.0 | 1.0 | 0.5 | 23.2 |
- | **KK-4 PEOPLE** | 4.8 | 44.9 | <u>68.0</u> | 64.5 | **92.9** | 26.1 | 4.2 | 7.6 | 42.4 |
- | **KK-8 PEOPLE** | 0.5 | 23.2 | 41.3 | <u>51.6</u> | **82.8** | 5.7 | 1.1 | 1.3 | 13.0 |
- | **ORDER-15 ITEMS** | 4.7 | 30.7 | 47.2 | <u>55.8</u> | **87.6** | 37.0 | 3.5 | 4.5 | 25.0 |
- | **ORDER-30 ITEMS** | 0.0 | 0.3 | 3.0 | <u>34.1</u> | **40.3** | 0.7 | 0.2 | 0.1 | 0.6 |
- | **Instruction Following** | | | | | | | | | |
- | **IFEVAL** | 17.4 | 26.2 | 28.5 | <u>34.5</u> | 26.7 | **40.3** | 15.1 | 17.4 | 13.2 |
- | **Arabic** | | | | | | | | | |
- | **MMLU-Arabic** | 65.4 | 66.1 | 64.5 | 66.6 | 65.5 | **74.1** | 65.0 | <u>66.8</u> | 47.8 |


- Please refer to our [Tech Report](arxiv_url) for detailed evaluation results.

 ---

 ## **Datasets & Mixtures**

- K2 training is organized into three stages, each using a transparent, publicly released mixture:

 ### **Pretraining Mix**

- * Large-scale natural text corpus (web, books, code, multilingual)
- * Balanced mixture optimized for stable scaling and broad knowledge
 * ~12T tokens

 ### **Mid-Training Mix**
@@ -102,10 +87,13 @@ K2 training is organized into three stages, each using a transparent, publicly r

 All mixtures, filtering rules, and data sources are fully released for reproducibility.

 ---

 ## **Model Description**
- - **Model type:** Language model with transformer architecture
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0

@@ -114,22 +102,46 @@ All mixtures, filtering rules, and data sources are fully released for reproduci
 | ----------- | ----------- |
 | Total Parameters | 70B |
 | Hidden Size | 8,192 |
- | Intermediate Size (MLPs) | 28,672 |
 | Number of Attention Heads | 64 |
- | Number of Hidden Layers | 80 |
- | RMSNorm ɛ | 1e^-5 |
- | Max Pre-training Seq Length | 8,192 |
 | Max Mid-training Seq Length | 524,288 |
 | Vocab Size | 250,000 |

 ---

- ## Citation & Acknowledgment

- If you use our dataset in your research, please cite our [K2-V2 paper](LINK):

 ```
- @misc{llm360@k2v2,
 title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
 author = {K2 Team},
 year = {2025},
@@ -138,5 +150,3 @@ If you use our dataset in your research, please cite our [K2-V2 paper](LINK):
 primaryClass = {cs.CL}
 }
 ```
-
-

@@ -1,4 +1,3 @@
 ---
 license: apache-2.0
 language:
@@ -7,20 +6,17 @@

 # **K2-V2**

+ <img src="figures/K2.LOGO.PRIMARY.RGB.png" width="100" alt="K2-V2 model logo"/>

+ 📚 [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) - 📝 [Code](https://github.com/llm360/k2v2_train) - 🏢 [Project Page](https://huggingface.co/LLM360/K2-V2)

+ K2-V2 is our most capable fully open model to date, and one of the strongest open-weight models in its class. It uses a 70B-parameter dense transformer architecture and represents the latest advancement in the LLM360 model family.

+ <img src="figures/sft-models.png" width="400" alt="K2-V2 SFT results"/>

+ Beyond standard competencies such as factual knowledge and conversational ability, K2-V2 demonstrates strong long-context consistency, deep mathematical understanding, and robust reasoning skills. These capabilities serve as building blocks for sophisticated downstream applications, such as solving complex math problems and executing agentic workflows.

+ <img src="figures/base-models.png" width="400" alt="K2-V2 GPQA results"/>

 ---

@@ -29,8 +25,8 @@
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer

+ model = AutoModelForCausalLM.from_pretrained("LLM360/K2-V2", device_map="auto")
+ tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2")

 prompt = "Explain why the derivative of sin(x) is cos(x)."
 inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
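 # Rough sizing note: a 70B-parameter checkpoint needs on the order of 140 GB of
 # accelerator memory in bf16/fp16 (70e9 params x 2 bytes), so device_map="auto"
 # will shard the weights across all available GPUs (spilling to CPU if necessary).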
 
@@ -42,11 +38,10 @@

 ## **Evaluation Summary**

+ Below we report performance across general, reasoning, mathematical, and coding benchmarks. Scores for K2-V2 checkpoints (base → mid-4) demonstrate the impact of staged mid-training on reasoning quality.
+
 | Task / Model | base | mid-1 | mid-2 | mid-3 | mid-4 | Qwen2.5-72B | Llama3.0-70B | Llama3.1-70B | Olmo3-32B |
 |--------------|------|-------|-------|-------|-------|--------------|---------------|---------------|------------|
 | **General Tasks** | | | | | | | | | |
 | **MMLU** | 74.3 | 74.4 | 73.5 | 75.0 | 75.2 | **86.1** | <u>79.5</u> | 79.3 | 75.2 |
 | **MMLU-Pro** | 43.7 | 46.8 | 48.1 | **59.8** | 57.0 | <u>58.1</u> | 52.8 | 53.8 | 49.6 |
 
@@ -64,30 +59,20 @@
 | **Coding Tasks** | | | | | | | | | |
 | **MBPP** | 57.6 | 57.8 | 58.2 | 59.8 | 61.8 | **75.4** | <u>69.2</u> | 64.4 | 60.2 |
 | **HUMANEVAL** | 50.0 | 51.2 | <u>53.7</u> | **54.3** | **54.3** | **54.3** | 42.1 | 50.6 | 36.0 |


+ Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed evaluation results.

 ---

 ## **Datasets & Mixtures**

+ K2-V2 training is organized into three stages, each using a transparent, publicly released mixture:

 ### **Pretraining Mix**

+ * Large-scale natural text corpus spanning web content, books, code, and multilingual sources
+ * Mixture designed for stable scaling and broad general-knowledge coverage
 * ~12T tokens

 ### **Mid-Training Mix**
 
@@ -102,10 +87,13 @@

 All mixtures, filtering rules, and data sources are fully released for reproducibility.

+ Please refer to our [Tech Report](https://www.llm360.ai/reports/K2_V2_report.pdf) for detailed dataset and mixture information.
+
 ---

 ## **Model Description**
+ - **Model type:** K2-V2 follows a standard decoder-only transformer with grouped-query attention and RMSNorm.
+ - **Training stage:** Pre-training
 - **Language(s) (NLP):** English
 - **License:** Apache 2.0

 
@@ -114,22 +102,46 @@
 | ----------- | ----------- |
 | Total Parameters | 70B |
 | Hidden Size | 8,192 |
+ | Intermediate Size (FFN) | 28,672 |
 | Number of Attention Heads | 64 |
+ | Number of Layers | 80 |
+ | RMSNorm ɛ | 1e-5 |
+ | Pre-training Seq Length | 8,192 |
 | Max Mid-training Seq Length | 524,288 |
 | Vocab Size | 250,000 |
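
For orientation, the table above maps onto a Llama-style configuration roughly as follows. This is a minimal sketch only: the repository is tagged `llama`, but the grouped-query head count and any field not listed in the table are assumptions; the released `config.json` is authoritative.

```python
from transformers import LlamaConfig

# Hypothetical illustration of the architecture table as a Llama-style config.
# Values are taken from the table; num_key_value_heads and other unlisted
# fields are assumptions, not read from the released config.json.
config = LlamaConfig(
    vocab_size=250_000,
    hidden_size=8_192,
    intermediate_size=28_672,
    num_hidden_layers=80,
    num_attention_heads=64,
    num_key_value_heads=8,          # assumption: grouped-query attention
    rms_norm_eps=1e-5,
    max_position_embeddings=8_192,  # pre-training sequence length (extended in mid-training)
)
print(config)
```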

+
+ ---
+
+ ## **Intended Use**
+
+ K2-V2 is designed for:
+
+ * research on large language models and reasoning
+ * downstream fine-tuning (e.g., instruction following, agents, domain models)
+ * experimentation with long-context architectures
+ * open, transparent benchmarking of LLM scaling
+
+ K2-V2 is **not** instruction-tuned. For aligned conversational use, please see **K2-V2-Instruct**.
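
As a rough starting point for the downstream fine-tuning use case listed above, the sketch below attaches LoRA adapters with the `peft` library. The projection module names and all hyperparameters are illustrative assumptions, not recommended settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

# Load the base checkpoint; bf16 keeps the memory footprint manageable for 70B.
model = AutoModelForCausalLM.from_pretrained(
    "LLM360/K2-V2", torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("LLM360/K2-V2")

# Attach LoRA adapters so that only a small fraction of weights is trained.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption: Llama-style attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# From here, pass `model` to your preferred trainer (e.g., transformers Trainer or TRL's SFTTrainer).
```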
+
 ---

+ ## **Limitations**

+ * May generate incorrect or hallucinated content, especially when asked about facts not seen during training
+ * Not optimized for safety, moderation, or refusal behavior (base model)
+ * Long-context performance depends on prompt quality and retrieval structure
+ * Primarily trained on English; multilingual capabilities are limited
+ * Inference cost is high due to the 70B parameter size
+
+ ---
+
+ ## Citation
+
+ If you use K2-V2 in your research, please cite the following:

 ```
+ @misc{llm360_k2v2_2025,
 title = {K2-V2: A 360-Open, Reasoning-Enhanced Open Foundation Model},
 author = {K2 Team},
 year = {2025},
@@ -138,5 +150,3 @@
 primaryClass = {cs.CL}
 }
 ```