🤝 Open to Collab
Frolov Anatolii
ssurface
AI & ML interests
None yet
Recent Activity
updated a collection about 21 hours ago: GRPO SFT-Length-Punishment-GDPO SFT GSM8K
updated a model about 21 hours ago: ssurface/qwen3-4b-gdpo-length-sft-l5
published a model about 21 hours ago: ssurface/qwen3-4b-gdpo-length-sft-l5
Organizations
GRPO abstract reward SFT GSM8K
Qwen3-4B CoT Compression Study
LoRA adapters trained for 5 progressively shorter chain-of-thought styles on GSM8K, plus the eval artifacts behind the Pareto curve.