Update README.md
README.md CHANGED
@@ -81,7 +81,7 @@ The model was trained on a subset of [FineWeb-edu](https://huggingface.co/datase
- Activations quantized to 8-bit precision

10. **Key Findings**
-   - Warmup quantization (linear lambda scheduler) proved crucial for performance
+   - Warmup quantization (sigmoid or linear lambda scheduler) proved crucial for performance

These 10B token training runs showed that it's possible to effectively fine-tune pre-trained models to 1.58-bit precision, achieving strong performance with relatively limited additional training data.
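The change above adds a sigmoid option to the warmup quantization lambda scheduler. For context: during warmup, a coefficient lambda is typically ramped from 0 to 1 so the model transitions gradually from full-precision to 1.58-bit (ternary) weights. The sketch below is a minimal illustration of that idea, not this repository's actual code; the function names (`lambda_linear`, `lambda_sigmoid`, `ternary_quant_with_warmup`) and the steepness constant `k` are assumptions made here for illustration.

```python
import math
import torch

def lambda_linear(step: int, warmup_steps: int) -> float:
    """Linear warmup: ramp lambda from 0 to 1 over warmup_steps, then hold at 1."""
    return min(step / max(warmup_steps, 1), 1.0)

def lambda_sigmoid(step: int, warmup_steps: int, k: float = 10.0) -> float:
    """Sigmoid warmup: gentle at the edges of the window, steep in the middle.
    The steepness k is an illustrative hyperparameter, not from this repo."""
    if step >= warmup_steps:
        return 1.0
    return 1.0 / (1.0 + math.exp(-k * (step / warmup_steps - 0.5)))

def ternary_quant_with_warmup(w: torch.Tensor, lam: float) -> torch.Tensor:
    """Blend full-precision and ternary (1.58-bit) weights:
    w_eff = w + lam * (quant(w) - w), wired as a straight-through
    estimator so gradients keep flowing to the full-precision weights."""
    scale = w.abs().mean().clamp(min=1e-5)          # BitNet-style "absmean" scale
    w_q = (w / scale).round().clamp(-1, 1) * scale  # project onto {-1, 0, +1} * scale
    return w + lam * (w_q - w).detach()
```

In use, each training step would compute `lam = lambda_sigmoid(global_step, warmup_steps)` (or `lambda_linear`) and apply it in the quantized layers' forward pass. Compared with a linear ramp, the sigmoid schedule spends longer near lambda ≈ 0 early in warmup, easing the model into quantization more gradually.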