self-play - a xlalex Collection

updated 20 days ago

Absolute Zero: Reinforced Self-play Reasoning with Zero Data

Paper • 2505.03335 • Published May 6 • 185

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

Paper • 2408.06195 • Published Aug 12, 2024 • 73

Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

Paper • 2508.14029 • Published Aug 19 • 118

Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

Paper • 2509.25541 • Published Sep 29 • 138

PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

Paper • 2509.19894 • Published Sep 24 • 32

SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

Paper • 2504.19162 • Published Apr 27 • 18

SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

Paper • 2506.24119 • Published Jun 30 • 50

AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

Paper • 2509.24193 • Published Sep 29 • 6

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

Paper • 2412.17256 • Published Dec 23, 2024 • 47

Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 117