Based on our latest technological advancements, we have trained the `GLM-4-0414` series of models. During pretraining, we incorporated more code-related and reasoning-related data, and in the alignment phase we optimized the models specifically for agent capabilities. As a result, their performance on agent tasks such as tool use, web search, and coding has improved significantly.

## Model Usage Guidelines

### I. Sampling Parameters

| Parameter      | Recommended Value | Description                                          |
| -------------- | ----------------- | ---------------------------------------------------- |
| temperature    | **0.6**           | Balances creativity and stability                    |
| top_p          | **0.95**          | Cumulative probability threshold for sampling        |
| top_k          | **20–40**         | Filters out rare tokens while maintaining diversity  |
| max_new_tokens | **30000**         | Leaves enough tokens for thinking                    |
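
If you run the model with Hugging Face `transformers`, these recommendations map directly onto a `GenerationConfig`. A minimal sketch (the exact `top_k` within 20–40 is up to you):

```python
from transformers import GenerationConfig

# Recommended sampling setup; sampling must be enabled for temperature/top_p/top_k to apply.
gen_config = GenerationConfig(
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=30000,  # leave room for the thinking block plus the visible answer
)
```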

### II. Enforced Thinking

- Add `<think>\n` as the **first line** of the model's response: this ensures the model thinks before responding (a sketch follows below).
- When using `chat_template.jinja`, this prompt is injected automatically to enforce the behavior.
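
A minimal sketch of building such a prompt, assuming a `transformers` tokenizer that loads this repo's `chat_template.jinja` (the model path is a placeholder):

```python
from transformers import AutoTokenizer

MODEL_PATH = "path/to/GLM-4-0414-model"  # placeholder; substitute the actual repo id
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)

messages = [{"role": "user", "content": "How many prime numbers are there below 100?"}]

# With add_generation_prompt=True the chat template appends the assistant header
# and opens the thinking block, so the model is made to think before answering.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Fallback for hand-built prompts: open the thinking block yourself.
if "<think>" not in prompt:
    prompt += "<think>\n"
```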

### III. Dialogue History Trimming

- Retain only the **final user-visible reply** in the conversation history. Hidden thinking content should **not** be saved to the history, to reduce interference; this is already implemented in `chat_template.jinja` (see the sketch below).
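
If you manage the history yourself rather than relying on the chat template, the trimming can look like the following sketch, which assumes the hidden reasoning is wrapped in `<think>...</think>` tags:

```python
import re

def strip_thinking(reply: str) -> str:
    """Drop the hidden <think>...</think> block, keeping only the user-visible reply."""
    return re.sub(r"<think>.*?</think>", "", reply, flags=re.DOTALL).strip()

# Append only the visible part of the model's reply to the running history.
history = [{"role": "user", "content": "Summarize the plan."}]
raw_reply = "<think>internal reasoning...</think>Here is the summary: ..."
history.append({"role": "assistant", "content": strip_thinking(raw_reply)})
```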

### IV. Handling Long Contexts (YaRN)

- When the input length exceeds **8,192 tokens**, consider enabling YaRN (RoPE scaling).
- In supported frameworks, add the following snippet to `config.json`:

```json
"rope_scaling": {
  "type": "yarn",
  "factor": 4.0,
  "original_max_position_embeddings": 32768
}
```

- **Static YaRN** applies uniformly to all text and may slightly degrade performance on short texts, so enable it only when needed (a load-time alternative is sketched below).
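
If you would rather not edit `config.json` on disk, the same override can be applied when loading the model. A sketch assuming `transformers` (the model path is a placeholder):

```python
from transformers import AutoConfig, AutoModelForCausalLM

MODEL_PATH = "path/to/GLM-4-0414-model"  # placeholder; substitute the actual repo id

config = AutoConfig.from_pretrained(MODEL_PATH)
# Same values as the config.json snippet above; set this only for long-context runs.
config.rope_scaling = {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768,
}

model = AutoModelForCausalLM.from_pretrained(
    MODEL_PATH,
    config=config,
    torch_dtype="auto",
    device_map="auto",
)
```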
## Inference Code

Make sure you are using `transformers>=4.51.3`.
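
The following is a minimal generation sketch with `transformers`, wiring in the sampling parameters recommended above (the model path is a placeholder, not an official script):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "path/to/GLM-4-0414-model"  # placeholder; substitute the actual repo id

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # chat_template.jinja opens the thinking block here
    tokenize=True,
    return_tensors="pt",
    return_dict=True,
).to(model.device)

outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    max_new_tokens=30000,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```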