Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_588_FlashRL_G4-L2048_new

This repository contains a checkpoint trained with GRPO (Group Relative Policy Optimization) on the open-r1/DAPO-Math-17k-Processed dataset, starting from Qwen/Qwen2.5-1.5B-Instruct.
This snapshot corresponds to training step 588.
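
GRPO samples several completions per prompt and scores each one relative to the rest of its group, rather than against a learned value model. The sketch below shows only that group-relative advantage step. It is illustrative, not this repository's training code. The helper name `group_relative_advantages`, the group of four completions, and the binary correctness reward are all assumptions.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-4):
    """Normalize each reward against the other completions for the same prompt.

    Sketch of the GRPO outcome-reward advantage: (r_i - mean) / (std + eps).
    Uses the population standard deviation; implementations differ on this detail.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical group of 4 completions for one prompt, scored 1 if the final
# answer is correct and 0 otherwise (assumed reward scheme, not from this card).
rewards = [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])
# prints [1.0, -1.0, -1.0, 1.0]
```

If every completion in a group receives the same reward, all advantages are zero, so that prompt contributes no learning signal for that step.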

Contents include:

- Model weights (.safetensors)
- Config files (config.json, generation_config.json)
- Tokenizer files (tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, special_tokens_map.json, added_tokens.json)
- Optional chat template (chat_template.jinja)

Training artifacts (optimizer/scheduler states and RNG) have been intentionally excluded.

Safetensors
Model size: 2B params
Tensor type: F32

Model tree for AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_588_FlashRL_G4-L2048_new

Base model: Qwen/Qwen2.5-1.5B
Finetuned: Qwen/Qwen2.5-1.5B-Instruct
Finetuned: this model (AzalKhan/Qwen2.5-1.5B-Instruct_BF16_open-r1-DAPO-Math-17k-Processed_588_FlashRL_G4-L2048_new)