lvj committed · Commit 22405f5 (verified) · Parent(s): a32a7c5

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -29,9 +29,9 @@ uv pip install --pre --index-url https://download.pytorch.org/whl/nightly/cu126
 
 ## QAT Finetuning with PARQ
 
-The checkpoint uploaded here was trained with a LR of 4.5e-5 on 32 GPUs with a per-device batch size of 2 using an internal codebase.
+We apply QAT with a torchao optimizer-only package called [PARQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/parq). The checkpoint uploaded here was trained with a LR of 4.5e-5 on 32 GPUs with a per-device batch size of 2 using an internal codebase.
 
-We can approximate the training pipeline with an open source implementation. Adjust the `ngpu`, `device_batch_size`, `grad_accum_steps`, and `lr` variables below to fit your setup.
+An open source implementation of the training script is provided below. Adjust the `ngpu`, `device_batch_size`, `grad_accum_steps`, and `lr` variables below to fit your setup.
 
 Fetch the training script by running `curl -O https://huggingface.co/datasets/lvj/parq-sft/resolve/main/qat_sft.py` before running the below.
 
@@ -68,6 +68,7 @@ TRANSFORMERS_VERBOSITY=error TOKENIZERS_PARALLELISM=$(( ngpu == 1 )) \
   --warmup_ratio 0.0 \
   --seed $SEED \
   --output_dir $SAVE_DIR \
+  --enable_thinking \
   --weight_bits 2 \
   --linear_pat 'proj\.weight$' \
   --embed_pat '(lm_head|embed_tokens)'
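
For readers curious what the flags in the diff above translate to in code, below is a minimal sketch of PARQ-style optimizer-only QAT, loosely following the torchao PARQ README. The regexes mirror the `--linear_pat` and `--embed_pat` flags and `quant_bits=2` mirrors `--weight_bits 2`, but the helper name `build_parq_optimizer`, the parameter grouping, and the `UnifQuantizer`/`ProxHardQuant` choices are illustrative assumptions, not the exact contents of `qat_sft.py`; PARQ lives under `torchao.prototype`, so import paths may shift between releases.

```python
import re

import torch
from torchao.prototype.parq.optim import ProxHardQuant, QuantOptimizer
from torchao.prototype.parq.quant import UnifQuantizer


def build_parq_optimizer(model, lr=4.5e-5):
    # Mirror --linear_pat / --embed_pat: these regexes decide which weights
    # become quantization targets and which stay in full precision.
    linear_pat = re.compile(r"proj\.weight$")
    embed_pat = re.compile(r"(lm_head|embed_tokens)")

    params_quant, params_no_quant = [], []
    for name, param in model.named_parameters():
        if linear_pat.search(name) or embed_pat.search(name):
            params_quant.append(param)
        else:
            params_no_quant.append(param)

    # quant_bits=2 corresponds to --weight_bits 2; the actual script may
    # assign embeddings a different bit width than the linear projections.
    param_groups = [
        {"params": params_quant, "quant_bits": 2},
        {"params": params_no_quant},
    ]

    # PARQ is optimizer-only: a stock optimizer is wrapped by QuantOptimizer,
    # which applies the quantizer and proximal map during each step, leaving
    # the model definition and training loop untouched.
    base_optimizer = torch.optim.AdamW(param_groups, lr=lr)
    return QuantOptimizer(base_optimizer, UnifQuantizer(), ProxHardQuant())
```

Because quantization is confined to the optimizer step, the wrapped optimizer drops into a standard finetuning loop without any model changes, which is what makes the approach easy to reproduce outside the internal codebase mentioned above.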