xlalex's Collections
self-play

updated 20 days ago

  • Absolute Zero: Reinforced Self-play Reasoning with Zero Data

    Paper • 2505.03335 • Published May 6 • 185

  • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers

    Paper • 2408.06195 • Published Aug 12, 2024 • 73

  • Beyond Pass@1: Self-Play with Variational Problem Synthesis Sustains RLVR

    Paper • 2508.14029 • Published Aug 19 • 118

  • Vision-Zero: Scalable VLM Self-Improvement via Strategic Gamified Self-Play

    Paper • 2509.25541 • Published Sep 29 • 138

  • PromptCoT 2.0: Scaling Prompt Synthesis for Large Language Model Reasoning

    Paper • 2509.19894 • Published Sep 24 • 32

  • SPC: Evolving Self-Play Critic via Adversarial Games for LLM Reasoning

    Paper • 2504.19162 • Published Apr 27 • 18

  • SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning

    Paper • 2506.24119 • Published Jun 30 • 50

  • AceSearcher: Bootstrapping Reasoning and Search for LLMs via Reinforced Self-Play

    Paper • 2509.24193 • Published Sep 29 • 6

  • B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners

    Paper • 2412.17256 • Published Dec 23, 2024 • 47

  • Self-Discover: Large Language Models Self-Compose Reasoning Structures

    Paper • 2402.03620 • Published Feb 6, 2024 • 117