
Shizhe Diao

shizhediao2

https://shizhediao.github.io/

AI & ML interests

LLM pre-training and reasoning

Recent Activity

upvoted a paper 5 days ago

Unified Reinforcement and Imitation Learning for Vision-Language Models

Paper • 2510.19307 • Published 5 days ago • 24

updated a dataset 6 days ago

nvidia/Nemotron-ClimbMix

Viewer • Updated 6 days ago • 355M • 748 • 32

upvoted 2 papers 7 days ago

DLER: Doing Length pEnalty Right - Incentivizing More Intelligence per Token via Reinforcement Learning

Paper • 2510.15110 • Published 11 days ago • 15

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published 10 days ago • 77

updated a model 12 days ago

shizhediao2/ToolOrchestrator-8B

Updated 12 days ago

published a model 12 days ago

shizhediao2/ToolOrchestrator-8B

Updated 12 days ago

upvoted 2 papers 13 days ago

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs

Paper • 2510.11696 • Published 14 days ago • 166

GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving

Paper • 2510.11769 • Published 14 days ago • 25

upvoted an article 25 days ago

Finally, a Replacement for BERT: Introducing ModernBERT

Article • Dec 19, 2024 • 704

upvoted a paper 25 days ago

BroRL: Scaling Reinforcement Learning via Broadened Exploration

Paper • 2510.01180 • Published 26 days ago • 17

upvoted a paper 26 days ago

DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search

Paper • 2509.25454 • Published 28 days ago • 133

upvoted an article about 1 month ago

Introducing smolagents: simple agents that write actions in code.

Article • Dec 31, 2024 • 1.14k

upvoted a paper about 2 months ago

Beyond Correctness: Harmonizing Process and Outcome Rewards through RL Training

Paper • 2509.03403 • Published Sep 3 • 21

updated a model 2 months ago

shizhediao2/Llama-Nemotron-8B-v1-Prorl

Updated Aug 25

published a model 2 months ago

shizhediao2/Llama-Nemotron-8B-v1-Prorl

Updated Aug 25

upvoted a paper 3 months ago

Beyond Ten Turns: Unlocking Long-Horizon Agentic Search with Large-Scale Asynchronous RL

Paper • 2508.07976 • Published Aug 11 • 51

updated a model 3 months ago

nvidia/Nemotron-Research-Reasoning-Qwen-1.5B

Text Generation • 2B • Updated Aug 12 • 13.8k • 222

New activity in nvidia/Nemotron-Research-Reasoning-Qwen-1.5B 3 months ago

Can you open source your training dataset on STEM (after selection)? Thanks! :)

#6 opened 3 months ago by Jianshu001

liked 2 models 3 months ago

Qwen/Qwen3-4B-Instruct-2507

Text Generation • 4B • Updated Sep 17 • 3.67M • 423

Menlo/Lucy

Text Generation • 2B • Updated Aug 4 • 503 • 64