fix the calculation of act params
README_WEIGHTS.md +1 -1
```diff
@@ -18,7 +18,7 @@ The DeepSeek-V3 weight file consists of two main components: **Main Model Weight
 - Input/output embedding layers and a complete set of 61 Transformer hidden layers.
 - **Parameter Count**:
 - Total parameters: **671B**
-- Activation parameters: **36.
+- Activation parameters: **36.6B**.
 
 #### Structural Details
 
```
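For context on what the corrected figure measures: in a Mixture-of-Experts model, the activated parameters are the ones used to process a single token, while the total includes every expert. The sketch is a minimal illustration under stated assumptions. It models a toy network in which every layer is an MoE layer, ignores norm and router weights, and counts embedding and output-head parameters as always active (whether to include them is a reporting convention). `ToyMoEConfig`, `total_params`, `activated_params`, and all sizes are hypothetical and are not DeepSeek-V3's real configuration or the calculation used in this commit.

```python
# Hypothetical sketch: total vs. activated parameters in a toy MoE Transformer.
# All names and sizes are illustrative assumptions, not DeepSeek-V3's real config.
from dataclasses import dataclass


@dataclass
class ToyMoEConfig:
    num_layers: int          # Transformer blocks (assumed to all be MoE layers)
    attn_params: int         # attention parameters per layer (always active)
    expert_params: int       # parameters in a single expert FFN
    num_routed_experts: int  # routed experts per MoE layer
    num_shared_experts: int  # shared experts per layer (always active)
    top_k: int               # routed experts selected per token
    embed_params: int        # input embedding (counted as active here)
    head_params: int         # output head (counted as active here)


def total_params(c: ToyMoEConfig) -> int:
    # Every expert, routed or shared, is stored in the weights.
    experts = (c.num_routed_experts + c.num_shared_experts) * c.expert_params
    per_layer = c.attn_params + experts
    return c.embed_params + c.head_params + c.num_layers * per_layer


def activated_params(c: ToyMoEConfig) -> int:
    # Per token, only the top-k routed experts plus the shared experts run.
    experts = (c.top_k + c.num_shared_experts) * c.expert_params
    per_layer = c.attn_params + experts
    return c.embed_params + c.head_params + c.num_layers * per_layer


cfg = ToyMoEConfig(
    num_layers=2,
    attn_params=100,
    expert_params=50,
    num_routed_experts=8,
    num_shared_experts=1,
    top_k=2,
    embed_params=1000,
    head_params=1000,
)
print(total_params(cfg), activated_params(cfg))  # 3100 2500
```

With 8 routed experts but only 2 selected per token, the routed-expert term per layer drops from 8 * 50 to 2 * 50, which is why an MoE model's activated parameter count can be far smaller than its total.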