Submitted by XUANMINGZHANG | 2 | Generalization or Memorization: Dynamic Decoding for Mode Steering | Stanford NLP | 1
Submitted by simonycl | 17 | Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity | Stanford NLP | 371 | 3
Submitted by fangwu97 | 136 | DeepSearch: Overcome the Bottleneck of Reinforcement Learning with Verifiable Rewards via Monte Carlo Tree Search | Stanford NLP | 3