Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_588_FlashRL_G4-L2048_new
This repository contains a checkpoint of Qwen/Qwen2.5-1.5B-Instruct fine-tuned with GRPO on the open-r1/DAPO-Math-17k-Processed dataset.
This snapshot corresponds to training step 588.
Contents include:
- Model weights (`.safetensors`)
- Config files (`config.json`, `generation_config.json`)
- Tokenizer files (`tokenizer.json`, `tokenizer_config.json`, `vocab.json`, `merges.txt`, `special_tokens_map.json`, `added_tokens.json`)
- Optional chat template (`chat_template.jinja`)
Training artifacts (optimizer, scheduler, and RNG states) have been intentionally excluded.
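If you download this snapshot locally, a quick sanity check is to confirm that the files listed above are present. The sketch below is a hypothetical helper (`check_snapshot` is not part of any library, and the directory path is a placeholder you supply). It treats `chat_template.jinja` as optional, as described above, and accepts either a single or a sharded set of `.safetensors` weight files, which is an assumption about how the weights are stored.

```python
from pathlib import Path

# Required files, taken from the contents list of this model card.
REQUIRED_FILES = [
    "config.json",
    "generation_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "special_tokens_map.json",
    "added_tokens.json",
]
# The chat template is described as optional.
OPTIONAL_FILES = ["chat_template.jinja"]


def check_snapshot(snapshot_dir):
    """Hypothetical helper: report which expected files exist in a local snapshot directory."""
    root = Path(snapshot_dir)
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    # Assumption: weights may be one file or several shards, so match any *.safetensors file.
    weights = sorted(p.name for p in root.glob("*.safetensors"))
    if not weights:
        missing.append("*.safetensors")
    optional_present = [name for name in OPTIONAL_FILES if (root / name).is_file()]
    return {"missing": missing, "weights": weights, "optional_present": optional_present}
```

For example, `check_snapshot("path/to/local/snapshot")` returns an empty `missing` list when every required file and at least one weight file are found.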