Based on our latest technological advancements, we have trained the `GLM-4-0414` series of models. During pretraining, we incorporated more code-related and reasoning-related data, and in the alignment phase we optimized the models specifically for agent capabilities. As a result, their performance on agent tasks such as tool use, web search, and coding has improved significantly.

## Model Usage Guidelines

### I. Sampling Parameters

| Parameter      | Recommended Value | Description                                          |
| -------------- | ----------------- | ---------------------------------------------------- |
| temperature    | **0.6**           | Balances creativity and stability                    |
| top_p          | **0.95**          | Cumulative probability threshold for sampling        |
| top_k          | **20–40**         | Filters out rare tokens while maintaining diversity  |
| max_new_tokens | **30000**         | Leaves enough tokens for thinking                    |
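
If you run the model with Hugging Face `transformers`, these recommendations map directly onto a `GenerationConfig`. A minimal sketch (the exact `top_k` within 20–40 is up to you):

```python
from transformers import GenerationConfig

# Recommended sampling setup; sampling must be enabled for temperature/top_p/top_k to apply.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=30000,  # leave room for the thinking block plus the visible answer
)
```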

### II. Enforced Thinking

- Add `<think>\n` as the **first line** of the model's response: this ensures the model thinks before responding (a sketch follows below).
- When using `chat_template.jinja`, this prompt is injected automatically to enforce the behavior.
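
A minimal sketch of building such a prompt, assuming a `transformers` tokenizer that loads this repo's `chat_template.jinja` (the model path is a placeholder):

```python
from transformers import AutoTokenizer

MODEL_PATH = "path/to/GLM-4-0414-model"  # placeholder; substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

messages = [{"role": "user", "content": "How many prime numbers are there below 100?"}]

# With add_generation_prompt=True the chat template appends the assistant header
# and opens the thinking block, so the model is made to think before answering.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Fallback for hand-built prompts: open the thinking block yourself.
if "<think>" not in prompt:
    prompt += "<think>\n"
```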

### III. Dialogue History Trimming

- Retain only the **final user-visible reply** in the conversation history. Hidden thinking content should **not** be saved to the history, to reduce interference; this is already implemented in `chat_template.jinja` (see the sketch below).
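
If you manage the history yourself rather than relying on the chat template, the trimming can look like the following sketch, which assumes the hidden reasoning is wrapped in `<think>...</think>` tags:

```python
import re

def strip_thinking(reply: str) -> str:
    """Drop the hidden <think>...</think> block, keeping only the user-visible reply."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

# Append only the visible part of the model's reply to the running history.
history = [{"role": "user", "content": "Summarize the plan."}]
raw_reply = "<think>internal reasoning...</think>Here is the summary: ..."
history.append({"role": "assistant", "content": strip_thinking(raw_reply)})
```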

### IV. Handling Long Contexts (YaRN)

- When the input length exceeds **8,192 tokens**, consider enabling YaRN (RoPE scaling).
- In supported frameworks, add the following snippet to `config.json`:

```json
"rope_scaling": {
  "type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 32768
}
```

- **Static YaRN** applies uniformly to all text and may slightly degrade performance on short texts, so enable it only when needed (a load-time alternative is sketched below).
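
If you would rather not edit `config.json` on disk, the same override can be applied when loading the model. A sketch assuming `transformers` (the model path is a placeholder):

```python
from transformers import AutoConfig, AutoModelForCausalLM

MODEL_PATH = "path/to/GLM-4-0414-model"  # placeholder; substitute the actual repo id

config = AutoConfig.from_pretrained(MODEL_PATH)
# Same values as the config.json snippet above; set this only for long-context runs.
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```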
## Inference Code

Make sure you are using `transformers>=4.51.3`.
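
The following is a minimal generation sketch with `transformers`, wiring in the sampling parameters recommended above (the model path is a placeholder, not an official script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/GLM-4-0414-model"  # placeholder; substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # chat_template.jinja opens the thinking block here
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=30000,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```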