lbourdois committed
Commit 53ed4af · verified · 1 Parent(s): e8b5827

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the `language` tag to improve discoverability. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1)
  1. README.md +180 -169
README.md CHANGED
@@ -1,170 +1,181 @@
- ---
- base_model:
- - Qwen/Qwen2.5-7B-Instruct
- datasets:
- - HoangHa/Pensez-v0.1
- language:
- - en
- - fr
- library_name: transformers
- license: apache-2.0
- pipeline_tag: text-generation
- ---
-
- <div align="center">
-
- # Pensez: Less Data, Better Reasoning – Rethinking French LLM
-
- [**About**](#about) | [**How to Run Locally**](#run-locally) | [**Models and Datasets**](#models-and-datasets) | [**Benchmarks**](#benchmarks) | [**Training Details**](#training-details)
-
-
- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/630a5ef0e81e1dea2cedcec0/-QnXjQ3SRkGgYpYK9wvff.png)
-
- </div>
-
- ## About
-
- Paper: [Pensez: Less Data, Better Reasoning - Rethinking French LLM](https://huggingface.co/papers/2503.13661)
-
- Pensez is a bilingual (French-English) reasoning model designed to maximize efficiency with significantly reduced training data. The model leverages a curated dataset focusing on daily reasoning tasks and scientific questions to enhance performance.
-
- Key strategies for improved reasoning:
- - **Concise reasoning** for simple tasks to prevent overthinking.
- - **Extended reasoning** for complex domains like mathematics, coding, and science.
- - **Special tokens (`<think>...</think>`)** to explicitly guide the model’s reasoning process.
-
- These optimizations result in superior reasoning capabilities while maintaining robust general understanding compared to models like [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
-
- ## Models and Datasets
-
- ### Model Versions
-
- Pensez is built upon [Qwen 2.5 Instruct 7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained over five epochs.
-
- | Model | Backbone | Size | Download Link |
- |---------------|----------------------------------------|------|---------------|
- | Pensez-v0.1-e1 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e1](https://huggingface.co/HoangHa/Pensez-v0.1-e1) |
- | Pensez-v0.1-e2 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e2](https://huggingface.co/HoangHa/Pensez-v0.1-e2) |
- | Pensez-v0.1-e3 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e3](https://huggingface.co/HoangHa/Pensez-v0.1-e3) |
- | Pensez-v0.1-e4 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e4](https://huggingface.co/HoangHa/Pensez-v0.1-e4) |
- | Pensez-v0.1-e5 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e5](https://huggingface.co/HoangHa/Pensez-v0.1-e5) |
-
- ### Dataset
-
- Pensez was trained on the hand-curated [Pensez v0.1](https://huggingface.co/datasets/HoangHa/Pensez-v0.1) dataset containing 2,000 samples (1,000 French, 1,000 English).
-
- | Dataset | Description | Size | Link |
- |--------------|----------------------|-------|-------|
- | Pensez v0.1 | SFT Training Dataset | 2K samples | [🤗 Pensez v0.1](https://huggingface.co/datasets/HoangHa/Pensez-v0.1) |
-
- ## Benchmarks
-
- Pensez was evaluated on French-specific benchmarks, demonstrating strong reasoning ability and improved task-specific performance:
-
- | Benchmark | Pensez-v0.1-e5 | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-7B-Instruct |
- |-----------|---------------|-----------------------------|----------------------|
- | Math-hard (fr) | 0.3458 | 0.3403 | 0.2253 |
- | MMLU (fr) | 0.5766 | 0.4961 | 0.6612 |
- | BoolQA (fr) | 0.9157 | 0.7079 | 0.9382 |
- | Trivia (en) | 0.4421 | 0.2711 | 0.5316 |
- | HellaSwag (en) | 0.5050 | 0.3540 | 0.5258 |
-
- **Key Observations:**
- - Pensez outperforms Qwen2.5-7B-Instruct in reasoning tasks.
- - Comparable to DeepSeek-R1-Distill-Qwen-7B in reasoning while maintaining strong understanding.
- - Reduced degradation in knowledge-based tasks.
-
- <details>
- <summary>Click for detailed benchmark results</summary>
-
- | Tasks | Pensez v0.1 e1 | Pensez v0.1 e2 | Pensez v0.1 e3 | Pensez v0.1 e4 | Pensez v0.1 e5 | Qwen 7B instruct | R1 distil |
- |------------------------------------------------|---------------|---------------|---------------|---------------|---------------|-----------------|-----------|
- | leaderboard_math_hard_fr | 0.0918 | 0.2547 | 0.2783 | 0.3035 | 0.3458 | 0.2253 | 0.3403 |
- | leaderboard_math_algebra_hard_fr | 0.1029 | 0.3914 | 0.3971 | 0.5114 | 0.5000 | 0.4229 | 0.4771 |
- | leaderboard_math_counting_and_prob_hard_fr | 0.0765 | 0.1378 | 0.1939 | 0.2041 | 0.2398 | 0.1224 | 0.2347 |
- | leaderboard_math_geometry_hard_fr | 0.0388 | 0.1019 | 0.1408 | 0.1359 | 0.1748 | 0.1019 | 0.2330 |
- | leaderboard_math_num_theory_hard_fr | 0.1198 | 0.2581 | 0.3502 | 0.3548 | 0.4332 | 0.3180 | 0.3963 |
- | leaderboard_math_prealgebra_hard_fr | 0.1681 | 0.4425 | 0.4690 | 0.4956 | 0.5841 | 0.3274 | 0.4867 |
- | leaderboard_math_precalculus_hard_fr | 0.0357 | 0.0714 | 0.1190 | 0.1190 | 0.1429 | 0.0595 | 0.2143 |
- | leaderboard_mmlu_fr | 0.3806 | 0.3329 | - | - | 0.5766 | 0.6612 | 0.4961 |
- | french_bench_arc_challenge | 0.5047 | 0.5021 | 0.4919 | 0.4859 | 0.4842 | 0.5518 | 0.3447 |
- | french_bench_boolqa | 0.9326 | 0.9326 | 0.9326 | 0.9270 | 0.9157 | 0.9382 | 0.7079 |
- | french_bench_fquadv2 | 0.4325 | 0.4400 | 0.4412 | 0.4375 | 0.4387 | 0.4800 | 0.2988 |
- | french_bench_hellaswag | 0.4970 | 0.5055 | 0.5092 | 0.5058 | 0.5050 | 0.5258 | 0.3540 |
- | french_bench_trivia | 0.4763 | 0.4763 | 0.4553 | 0.4395 | 0.4421 | 0.5316 | 0.2711 |
-
- </details>
-
- ## Run Locally
-
- You can run Pensez using Hugging Face’s `transformers` library:
-
- ```python
- import torch
- from transformers import AutoTokenizer, AutoModelForCausalLM
-
- model_path = "HoangHa/Pensez-v0.1-e5"
-
- # Load tokenizer and model
- tokenizer = AutoTokenizer.from_pretrained(model_path)
- model = AutoModelForCausalLM.from_pretrained(
- model_path, torch_dtype=torch.float16, device_map="auto"
- )
-
- # Example input
- messages = [{"role": "user", "content": "Bonjour!"}]
- input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to("cuda")
-
- generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
- response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_space=True)
- print(f"Réponse: {response}")
- ```
-
- ## Training Details
-
- Pensez was trained with:
- - **Packing Inputs Without Cross-Contamination Attention** ([Reference](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing))
- - **Liger Kernel** ([Reference](https://github.com/linkedin/Liger-Kernel))
- - **DeepSpeed 3** ([Reference](https://github.com/deepspeedai/DeepSpeed))
- - **NEFTune Noise** ([Reference](https://arxiv.org/abs/2310.05914)) for robustness.
-
- | **Parameter** | **Value** |
- |--------------|----------|
- | Epochs | 5 |
- | Global Batch Size | 200 |
- | Learning Rate | 1e-5 |
- | Scheduler | Cosine |
- | Optimizer | AdamW |
- | Warmup Ratio | 0.05 |
- | Weight Decay | 0.01 |
- | Max Sequence Length | 16,384 |
-
- More details: [Training Config](https://huggingface.co/HoangHa/Pensez-v0.1-e5/blob/main/fr_full_sft.yaml) | Loss curves: [Wandb](https://wandb.ai/hahuyhoanghhh41/llamafactory?nw=nwuserhahuyhoanghhh41)
-
- ## Citation
-
- ```bibtex
- @misc{ha2025pensezreasoningfrenchllm,
- title={Pensez: Less Data, Better Reasoning – Rethinking French LLM},
- author={Ha Huy Hoang},
- year={2025},
- archivePrefix={arXiv},
- primaryClass={cs.CL},
- url={https://arxiv.org/abs/2503.13661},
- }
- ```
-
-
- ## Acknowledgement
-
- - [llama-factory](https://github.com/hiyouga/LLaMA-Factory)
- - [Deepseek R1](https://github.com/deepseek-ai/DeepSeek-R1)
- - [Qwen 2.5](https://github.com/QwenLM/Qwen2.5)
- - [NEFTune Noise](https://arxiv.org/abs/2310.05914)
- - [Packing Inputs Without Cross-Contamination Attention](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing)
- - [Liger Kernel](https://github.com/linkedin/Liger-Kernel)
- - [Deepspeed](https://github.com/deepspeedai/DeepSpeed)
- - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
- - [Hyperbolic](https://hyperbolic.xyz/)
- - [Modal](https://modal.com/)
- ```
 
+ ---
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ datasets:
+ - HoangHa/Pensez-v0.1
+ language:
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
+ library_name: transformers
+ license: apache-2.0
+ pipeline_tag: text-generation
+ ---
+
+ <div align="center">
+
+ # Pensez: Less Data, Better Reasoning – Rethinking French LLM
+
+ [**About**](#about) | [**How to Run Locally**](#run-locally) | [**Models and Datasets**](#models-and-datasets) | [**Benchmarks**](#benchmarks) | [**Training Details**](#training-details)
+
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/630a5ef0e81e1dea2cedcec0/-QnXjQ3SRkGgYpYK9wvff.png)
+
+ </div>
+
+ ## About
+
+ Paper: [Pensez: Less Data, Better Reasoning - Rethinking French LLM](https://huggingface.co/papers/2503.13661)
+
+ Pensez is a bilingual (French-English) reasoning model designed to maximize efficiency with significantly reduced training data. The model leverages a curated dataset focusing on daily reasoning tasks and scientific questions to enhance performance.
+
+ Key strategies for improved reasoning:
+ - **Concise reasoning** for simple tasks to prevent overthinking.
+ - **Extended reasoning** for complex domains like mathematics, coding, and science.
+ - **Special tokens (`<think>...</think>`)** to explicitly guide the model’s reasoning process.
+
+ These optimizations result in superior reasoning capabilities while maintaining robust general understanding compared to models like [DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B).
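Because the reasoning is wrapped in `<think>...</think>`, downstream code can separate the trace from the final answer. A minimal sketch in plain Python (the tag format is the only detail taken from this card; the helper name is ours):

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a model response into (reasoning, answer).

    If no <think>...</think> block is present, the reasoning is empty
    and the whole text is treated as the answer.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()

reasoning, answer = split_reasoning("<think>2+2=4</think>La réponse est 4.")
```

With the real model, pass the decoded `response` string through `split_reasoning` to display only the answer while keeping the trace for inspection.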
+
+ ## Models and Datasets
+
+ ### Model Versions
+
+ Pensez is built upon [Qwen 2.5 Instruct 7B](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) and trained over five epochs.
+
+ | Model | Backbone | Size | Download Link |
+ |---------------|----------------------------------------|------|---------------|
+ | Pensez-v0.1-e1 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e1](https://huggingface.co/HoangHa/Pensez-v0.1-e1) |
+ | Pensez-v0.1-e2 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e2](https://huggingface.co/HoangHa/Pensez-v0.1-e2) |
+ | Pensez-v0.1-e3 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e3](https://huggingface.co/HoangHa/Pensez-v0.1-e3) |
+ | Pensez-v0.1-e4 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e4](https://huggingface.co/HoangHa/Pensez-v0.1-e4) |
+ | Pensez-v0.1-e5 | Qwen2.5-7B-Instruct | 7B | [🤗 Pensez-v0.1-e5](https://huggingface.co/HoangHa/Pensez-v0.1-e5) |
+
+ ### Dataset
+
+ Pensez was trained on the hand-curated [Pensez v0.1](https://huggingface.co/datasets/HoangHa/Pensez-v0.1) dataset containing 2,000 samples (1,000 French, 1,000 English).
+
+ | Dataset | Description | Size | Link |
+ |--------------|----------------------|-------|-------|
+ | Pensez v0.1 | SFT Training Dataset | 2K samples | [🤗 Pensez v0.1](https://huggingface.co/datasets/HoangHa/Pensez-v0.1) |
+
+ ## Benchmarks
+
+ Pensez was evaluated on French-specific benchmarks, demonstrating strong reasoning ability and improved task-specific performance:
+
+ | Benchmark | Pensez-v0.1-e5 | DeepSeek-R1-Distill-Qwen-7B | Qwen2.5-7B-Instruct |
+ |-----------|---------------|-----------------------------|----------------------|
+ | Math-hard (fr) | 0.3458 | 0.3403 | 0.2253 |
+ | MMLU (fr) | 0.5766 | 0.4961 | 0.6612 |
+ | BoolQA (fr) | 0.9157 | 0.7079 | 0.9382 |
+ | Trivia (en) | 0.4421 | 0.2711 | 0.5316 |
+ | HellaSwag (en) | 0.5050 | 0.3540 | 0.5258 |
+
+ **Key Observations:**
+ - Pensez outperforms Qwen2.5-7B-Instruct in reasoning tasks.
+ - Comparable to DeepSeek-R1-Distill-Qwen-7B in reasoning while maintaining strong understanding.
+ - Reduced degradation in knowledge-based tasks.
+
+ <details>
+ <summary>Click for detailed benchmark results</summary>
+
+ | Tasks | Pensez v0.1 e1 | Pensez v0.1 e2 | Pensez v0.1 e3 | Pensez v0.1 e4 | Pensez v0.1 e5 | Qwen 7B instruct | R1 Distill |
+ |------------------------------------------------|---------------|---------------|---------------|---------------|---------------|-----------------|-----------|
+ | leaderboard_math_hard_fr | 0.0918 | 0.2547 | 0.2783 | 0.3035 | 0.3458 | 0.2253 | 0.3403 |
+ | leaderboard_math_algebra_hard_fr | 0.1029 | 0.3914 | 0.3971 | 0.5114 | 0.5000 | 0.4229 | 0.4771 |
+ | leaderboard_math_counting_and_prob_hard_fr | 0.0765 | 0.1378 | 0.1939 | 0.2041 | 0.2398 | 0.1224 | 0.2347 |
+ | leaderboard_math_geometry_hard_fr | 0.0388 | 0.1019 | 0.1408 | 0.1359 | 0.1748 | 0.1019 | 0.2330 |
+ | leaderboard_math_num_theory_hard_fr | 0.1198 | 0.2581 | 0.3502 | 0.3548 | 0.4332 | 0.3180 | 0.3963 |
+ | leaderboard_math_prealgebra_hard_fr | 0.1681 | 0.4425 | 0.4690 | 0.4956 | 0.5841 | 0.3274 | 0.4867 |
+ | leaderboard_math_precalculus_hard_fr | 0.0357 | 0.0714 | 0.1190 | 0.1190 | 0.1429 | 0.0595 | 0.2143 |
+ | leaderboard_mmlu_fr | 0.3806 | 0.3329 | - | - | 0.5766 | 0.6612 | 0.4961 |
+ | french_bench_arc_challenge | 0.5047 | 0.5021 | 0.4919 | 0.4859 | 0.4842 | 0.5518 | 0.3447 |
+ | french_bench_boolqa | 0.9326 | 0.9326 | 0.9326 | 0.9270 | 0.9157 | 0.9382 | 0.7079 |
+ | french_bench_fquadv2 | 0.4325 | 0.4400 | 0.4412 | 0.4375 | 0.4387 | 0.4800 | 0.2988 |
+ | french_bench_hellaswag | 0.4970 | 0.5055 | 0.5092 | 0.5058 | 0.5050 | 0.5258 | 0.3540 |
+ | french_bench_trivia | 0.4763 | 0.4763 | 0.4553 | 0.4395 | 0.4421 | 0.5316 | 0.2711 |
+
+ </details>
+
+ ## Run Locally
+
+ You can run Pensez using Hugging Face’s `transformers` library:
+
+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+
+ model_path = "HoangHa/Pensez-v0.1-e5"
+
+ # Load tokenizer and model
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_path, torch_dtype=torch.float16, device_map="auto"
+ )
+
+ # Example input
+ messages = [{"role": "user", "content": "Bonjour!"}]
+ input_ids = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors='pt').to(model.device)
+
+ generated_ids = model.generate(input_ids, max_new_tokens=2500, temperature=0.8, repetition_penalty=1.1, do_sample=True, eos_token_id=tokenizer.eos_token_id)
+ response = tokenizer.decode(generated_ids[0], skip_special_tokens=True, clean_up_tokenization_spaces=True)
+ print(f"Réponse: {response}")
+ ```
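One caveat with the snippet above: `generated_ids[0]` contains the prompt tokens followed by the newly generated ones, so decoding it returns the prompt as well. A hedged sketch of slicing off the prompt first, shown with stand-in token lists so it runs without downloading the model:

```python
# Stand-ins for the real tensors: with the model loaded, these would be
# input_ids.shape[1] (prompt length) and generated_ids[0] (full sequence).
prompt_len = 4
generated = [101, 102, 103, 104, 7, 8, 9]

# Keep only the tokens produced after the prompt before decoding.
reply_tokens = generated[prompt_len:]
# Real call: tokenizer.decode(generated_ids[0][input_ids.shape[1]:], skip_special_tokens=True)
```

The same slice works on the real tensor, since `generated_ids[0][input_ids.shape[1]:]` drops exactly the prompt prefix.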
+
+ ## Training Details
+
+ Pensez was trained with:
+ - **Packing Inputs Without Cross-Contamination Attention** ([Reference](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing))
+ - **Liger Kernel** ([Reference](https://github.com/linkedin/Liger-Kernel))
+ - **DeepSpeed ZeRO Stage 3** ([Reference](https://github.com/deepspeedai/DeepSpeed))
+ - **NEFTune Noise** ([Reference](https://arxiv.org/abs/2310.05914)) for robustness.
+
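The packing strategy above concatenates several short samples into one long sequence while preventing tokens of one sample from attending to another. A minimal sketch of the resulting block-diagonal attention mask (an illustration of the idea, not the referenced implementation):

```python
def packing_attention_mask(seq_lens: list[int]) -> list[list[int]]:
    """Build a block-diagonal attention mask for packed sequences.

    mask[i][j] == 1 means token i may attend to token j; tokens from
    different packed samples never attend to each other.
    """
    total = sum(seq_lens)
    mask = [[0] * total for _ in range(total)]
    start = 0
    for n in seq_lens:
        for i in range(start, start + n):
            for j in range(start, start + n):
                mask[i][j] = 1
        start += n
    return mask

# Two samples of lengths 2 and 3 packed into one sequence of 5 tokens.
mask = packing_attention_mask([2, 3])
```

A causal variant would additionally zero out positions with `j > i`; the point here is only the cross-contamination barrier between samples.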
+ | **Parameter** | **Value** |
+ |--------------|----------|
+ | Epochs | 5 |
+ | Global Batch Size | 200 |
+ | Learning Rate | 1e-5 |
+ | Scheduler | Cosine |
+ | Optimizer | AdamW |
+ | Warmup Ratio | 0.05 |
+ | Weight Decay | 0.01 |
+ | Max Sequence Length | 16,384 |
+
+ More details: [Training Config](https://huggingface.co/HoangHa/Pensez-v0.1-e5/blob/main/fr_full_sft.yaml) | Loss curves: [Wandb](https://wandb.ai/hahuyhoanghhh41/llamafactory?nw=nwuserhahuyhoanghhh41)
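For illustration, the scheduler settings in the table (base LR 1e-5, warmup ratio 0.05, cosine decay) translate to a per-step learning rate roughly like this sketch (the trainer's exact implementation may differ, e.g. in its minimum LR):

```python
import math

def lr_at(step: int, total_steps: int, base_lr: float = 1e-5,
          warmup_ratio: float = 0.05) -> float:
    """Cosine schedule with linear warmup: ramp to base_lr over the
    warmup steps, then decay to 0 at total_steps."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))
```

The LR peaks at 1e-5 right at the end of warmup (5% of training) and falls along a half-cosine afterwards.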
+
+ ## Citation
+
+ ```bibtex
+ @misc{ha2025pensezreasoningfrenchllm,
+       title={Pensez: Less Data, Better Reasoning – Rethinking French LLM},
+       author={Ha Huy Hoang},
+       year={2025},
+       archivePrefix={arXiv},
+       primaryClass={cs.CL},
+       url={https://arxiv.org/abs/2503.13661},
+ }
+ ```
+
+
+ ## Acknowledgement
+
+ - [LLaMA-Factory](https://github.com/hiyouga/LLaMA-Factory)
+ - [DeepSeek-R1](https://github.com/deepseek-ai/DeepSeek-R1)
+ - [Qwen 2.5](https://github.com/QwenLM/Qwen2.5)
+ - [NEFTune Noise](https://arxiv.org/abs/2310.05914)
+ - [Packing Inputs Without Cross-Contamination Attention](https://github.com/MeetKai/functionary/tree/main/functionary/train/packing)
+ - [Liger Kernel](https://github.com/linkedin/Liger-Kernel)
+ - [DeepSpeed](https://github.com/deepspeedai/DeepSpeed)
+ - [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness)
+ - [Hyperbolic](https://hyperbolic.xyz/)
+ - [Modal](https://modal.com/)
+ ```