tianchi007's Collection: llm_pretrain
Phi-4 Technical Report
Paper • 2412.08905 • Published • 122 upvotes
Evaluating and Aligning CodeLLMs on Human Preference
Paper • 2412.05210 • Published • 50 upvotes

Evaluating Language Models as Synthetic Data Generators
Paper • 2412.03679 • Published • 47 upvotes

Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 28 upvotes

Large Language Model-Brained GUI Agents: A Survey
Paper • 2411.18279 • Published • 30 upvotes

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 45 upvotes

Marco-o1: Towards Open Reasoning Models for Open-Ended Solutions
Paper • 2411.14405 • Published • 61 upvotes

Natural Language Reinforcement Learning
Paper • 2411.14251 • Published • 31 upvotes

The Dawn of GUI Agent: A Preliminary Case Study with Claude 3.5 Computer Use
Paper • 2411.10323 • Published • 34 upvotes

Large Language Models Can Self-Improve in Long-context Reasoning
Paper • 2411.08147 • Published • 65 upvotes

A Survey of Small Language Models
Paper • 2410.20011 • Published • 46 upvotes
GPT-4o System Card
Paper • 2410.21276 • Published • 87 upvotes
Qwen2.5-Coder Technical Report
Paper • 2409.12186 • Published • 153 upvotes

DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search
Paper • 2408.08152 • Published • 61 upvotes

DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
Paper • 2406.11931 • Published • 69 upvotes

Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers
Paper • 2408.06195 • Published • 73 upvotes
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 377 upvotes
Paper • 2412.13501 • Published • 29 upvotes
DeepSeek-V3 Technical Report
Paper • 2412.19437 • Published • 75 upvotes

Direct Language Model Alignment from Online AI Feedback
Paper • 2402.04792 • Published • 34 upvotes

Solving math word problems with process- and outcome-based feedback
Paper • 2211.14275 • Published • 10 upvotes

The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301 • Published • 100 upvotes

Evolving Deeper LLM Thinking
Paper • 2501.09891 • Published • 115 upvotes

OmniThink: Expanding Knowledge Boundaries in Machine Writing through Thinking
Paper • 2501.09751 • Published • 46 upvotes

Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 upvotes

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Paper • 2502.02737 • Published • 254 upvotes

s1: Simple test-time scaling
Paper • 2501.19393 • Published • 124 upvotes

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training
Paper • 2501.17161 • Published • 124 upvotes

Qwen2.5-1M Technical Report
Paper • 2501.15383 • Published • 72 upvotes

Best Practices and Lessons Learned on Synthetic Data for Language Models
Paper • 2404.07503 • Published • 31 upvotes

Pre-training Small Base LMs with Fewer Tokens
Paper • 2404.08634 • Published • 36 upvotes

CodecLM: Aligning Language Models with Tailored Synthetic Data
Paper • 2404.05875 • Published • 18 upvotes

AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent
Paper • 2404.03648 • Published • 29 upvotes

ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline
Paper • 2404.02893 • Published • 21 upvotes

Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
Paper • 2503.24290 • Published • 62 upvotes

START: Self-taught Reasoner with Tools
Paper • 2503.04625 • Published • 113 upvotes

Large Language Model Agent: A Survey on Methodology, Applications and Challenges
Paper • 2503.21460 • Published • 83 upvotes

A Comprehensive Survey on Long Context Language Modeling
Paper • 2503.17407 • Published • 49 upvotes
Gemma 3 Technical Report
Paper • 2503.19786 • Published • 55 upvotes
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper • 2503.20201 • Published • 48 upvotes

A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Paper • 2503.21614 • Published • 43 upvotes

Why Do Multi-Agent LLM Systems Fail?
Paper • 2503.13657 • Published • 48 upvotes

Process-based Self-Rewarding Language Models
Paper • 2503.03746 • Published • 39 upvotes

Rethinking Reflection in Pre-Training
Paper • 2504.04022 • Published • 80 upvotes

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper • 2504.07128 • Published • 87 upvotes

Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning
Paper • 2505.24726 • Published • 277 upvotes

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Paper • 2506.01939 • Published • 188 upvotes

ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models
Paper • 2505.24864 • Published • 143 upvotes

The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models
Paper • 2505.22617 • Published • 131 upvotes

Scaling Test-time Compute for LLM Agents
Paper • 2506.12928 • Published • 63 upvotes

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 upvotes

Confidence Is All You Need: Few-Shot RL Fine-Tuning of Language Models
Paper • 2506.06395 • Published • 133 upvotes