Update README.md
README.md CHANGED

@@ -29,9 +29,9 @@ uv pip install --pre --index-url https://download.pytorch.org/whl/nightly/cu126
 
 ## QAT Finetuning with PARQ
 
-The checkpoint uploaded here was trained with a LR of 4.5e-5 on 32 GPUs with a per-device batch size of 2 using an internal codebase.
+We apply QAT with a torchao optimizer-only package called [PARQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/parq). The checkpoint uploaded here was trained with a LR of 4.5e-5 on 32 GPUs with a per-device batch size of 2 using an internal codebase.
 
-
+An open source implementation of the training script is provided below. Adjust the `ngpu`, `device_batch_size`, `grad_accum_steps`, and `lr` variables below to fit your setup.
 
 Fetch the training script by running `curl -O https://huggingface.co/datasets/lvj/parq-sft/resolve/main/qat_sft.py` before running the below.
 
@@ -68,6 +68,7 @@ TRANSFORMERS_VERBOSITY=error TOKENIZERS_PARALLELISM=$(( ngpu == 1 )) \
 --warmup_ratio 0.0 \
 --seed $SEED \
 --output_dir $SAVE_DIR \
+--enable_thinking \
 --weight_bits 2 \
 --linear_pat 'proj\.weight$' \
 --embed_pat '(lm_head|embed_tokens)'
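A note on the PARQ link added in the first hunk: PARQ applies QAT entirely through the optimizer, so the training loop stays as-is and only the optimizer construction changes. The sketch below follows the usage pattern shown in the PARQ README; the import paths, the `quant_bits` param-group key, and the `ProxPARQ` arguments are assumptions about the current torchao prototype API rather than code taken from `qat_sft.py`.

```python
import torch
from torch import nn
from torchao.prototype.parq.optim import ProxPARQ, QuantOptimizer
from torchao.prototype.parq.quant import UnifQuantizer

# Toy stand-in model; in the README this would be the LLM being finetuned.
model = nn.Sequential()
model.add_module("up_proj", nn.Linear(16, 16))
model.add_module("lm_head", nn.Linear(16, 4))

# Split parameters into a group PARQ will quantize and a full-precision group.
quant_params = [p for n, p in model.named_parameters() if n.endswith("proj.weight")]
other_params = [p for n, p in model.named_parameters() if not n.endswith("proj.weight")]
param_groups = [
    {"params": quant_params, "quant_bits": 2},  # 2-bit weights, as in --weight_bits 2
    {"params": other_params},                   # head, biases, norms stay full precision
]

# PARQ wraps a standard optimizer; step()/zero_grad() are called as usual.
base_optimizer = torch.optim.AdamW(param_groups, lr=4.5e-5)
optimizer = QuantOptimizer(
    base_optimizer,
    UnifQuantizer(),                            # uniform codebook for the chosen bit-width
    ProxPARQ(anneal_start=0, anneal_end=1000),  # anneal weights toward the codebook (args assumed)
)
```

Since all quantization state lives in the optimizer, the wrapped `optimizer` is used exactly like the base one inside the training loop.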
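The regex flags in the second hunk decide which tensors get quantized: `--linear_pat 'proj\.weight$'` matches the attention/MLP projection weights, while `--embed_pat '(lm_head|embed_tokens)'` matches the embedding and output-head matrices. The snippet below only illustrates how such patterns can partition `named_parameters()`; it is an assumption about how `qat_sft.py` consumes these flags, not an excerpt from the script.

```python
import re
from torch import nn

LINEAR_PAT = re.compile(r"proj\.weight$")          # value of --linear_pat
EMBED_PAT = re.compile(r"(lm_head|embed_tokens)")  # value of --embed_pat

def split_param_names(model: nn.Module):
    """Bucket parameter names: regex-matched linears, embeddings/head, everything else."""
    linear, embed, other = [], [], []
    for name, _ in model.named_parameters():
        if LINEAR_PAT.search(name):
            linear.append(name)   # quantized to --weight_bits (2 bits here)
        elif EMBED_PAT.search(name):
            embed.append(name)    # embedding / lm_head weights, handled separately
        else:
            other.append(name)    # norms, biases, etc. left in full precision
    return linear, embed, other

# Quick check with a toy module whose names mimic a decoder block plus embeddings.
toy = nn.ModuleDict({
    "embed_tokens": nn.Embedding(32, 8),
    "q_proj": nn.Linear(8, 8),
    "input_layernorm": nn.LayerNorm(8),
    "lm_head": nn.Linear(8, 32, bias=False),
})
print(split_param_names(toy))
# (['q_proj.weight'], ['embed_tokens.weight', 'lm_head.weight'],
#  ['q_proj.bias', 'input_layernorm.weight', 'input_layernorm.bias'])
```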