lvj committed · Commit 22405f5 (verified) · Parent(s): a32a7c5

Update README.md

Files changed (1)
  1. README.md +3 -2
README.md CHANGED
@@ -29,9 +29,9 @@ uv pip install --pre --index-url https://download.pytorch.org/whl/nightly/cu126
 
 ## QAT Finetuning with PARQ
 
-The checkpoint uploaded here was trained with a LR of 4.5e-5 on 32 GPUs with a per-device batch size of 2 using an internal codebase.
+We apply QAT with a torchao optimizer-only package called [PARQ](https://github.com/pytorch/ao/tree/main/torchao/prototype/parq). The checkpoint uploaded here was trained with a LR of 4.5e-5 on 32 GPUs with a per-device batch size of 2 using an internal codebase.
 
-We can approximate the training pipeline with an open source implementation. Adjust the `ngpu`, `device_batch_size`, `grad_accum_steps`, and `lr` variables below to fit your setup.
+An open source implementation of the training script is provided below. Adjust the `ngpu`, `device_batch_size`, `grad_accum_steps`, and `lr` variables below to fit your setup.
 
 Fetch the training script by running `curl -O https://huggingface.co/datasets/lvj/parq-sft/resolve/main/qat_sft.py` before running the below.
 
@@ -68,6 +68,7 @@ TRANSFORMERS_VERBOSITY=error TOKENIZERS_PARALLELISM=$(( ngpu == 1 )) \
   --warmup_ratio 0.0 \
   --seed $SEED \
   --output_dir $SAVE_DIR \
+  --enable_thinking \
   --weight_bits 2 \
   --linear_pat 'proj\.weight$' \
   --embed_pat '(lm_head|embed_tokens)'
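
For readers curious what the flags in the diff above translate to in code, below is a minimal sketch of PARQ-style optimizer-only QAT, loosely following the torchao PARQ README. The regexes mirror the `--linear_pat` and `--embed_pat` flags and `quant_bits=2` mirrors `--weight_bits 2`, but the helper name `build_parq_optimizer`, the parameter grouping, and the `UnifQuantizer`/`ProxHardQuant` choices are illustrative assumptions, not the exact contents of `qat_sft.py`; PARQ lives under `torchao.prototype`, so import paths may shift between releases.

```python
import re

import torch
from torchao.prototype.parq.optim import ProxHardQuant, QuantOptimizer
from torchao.prototype.parq.quant import UnifQuantizer


def build_parq_optimizer(model, lr=4.5e-5):
    # Mirror --linear_pat / --embed_pat: these regexes decide which weights
    # become quantization targets and which stay in full precision.
    linear_pat = re.compile(r"proj\.weight$")
    embed_pat = re.compile(r"(lm_head|embed_tokens)")

    params_quant, params_no_quant = [], []
    for name, param in model.named_parameters():
        if linear_pat.search(name) or embed_pat.search(name):
            params_quant.append(param)
        else:
            params_no_quant.append(param)

    # quant_bits=2 corresponds to --weight_bits 2; the actual script may
    # assign embeddings a different bit width than the linear projections.
    param_groups = [
        {"params": params_quant, "quant_bits": 2},
        {"params": params_no_quant},
    ]

    # PARQ is optimizer-only: a stock optimizer is wrapped by QuantOptimizer,
    # which applies the quantizer and proximal map during each step, leaving
    # the model definition and training loop untouched.
    base_optimizer = torch.optim.AdamW(param_groups, lr=lr)
    return QuantOptimizer(base_optimizer, UnifQuantizer(), ProxHardQuant())
```

Because quantization is confined to the optimizer step, the wrapped optimizer drops into a standard finetuning loop without any model changes, which is what makes the approach easy to reproduce outside the internal codebase mentioned above.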