suu

Suu

AI & ML interests

None yet

Recent Activity

authored a paper 3 days ago

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

updated a collection 3 days ago

KlearReasoner

upvoted a paper 3 days ago

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Organizations

commented a paper 3 days ago

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

Paper • 2512.05591 • Published 6 days ago • 16

commented 3 papers 3 months ago

CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

Paper • 2509.20712 • Published Sep 25 • 19

CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

Paper • 2509.20712 • Published Sep 25 • 19

CE-GPPO: Controlling Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

Paper • 2509.20712 • Published Sep 25 • 19

New activity in Kwai-Klear/Klear-Reasoner-8B 3 months ago

Improve model card: Add pipeline tag, library name, code/project links, and sample usage

#1 opened 4 months ago by nielsr

Were the evaluation results obtained with max_new_token set to 64K?

#2 opened 3 months ago by JjjjjZzz

commented 2 papers 4 months ago

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11 • 42

Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization

Paper • 2508.07629 • Published Aug 11 • 42

New activity in nvidia/AceReason-Nemotron-14B 6 months ago

Is it possible to open-source the 2k+ difficult samples from math stage3 separately, as well as the code training data?

#2 opened 6 months ago by Suu
