L-Hongbin's Collections
Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
Paper • 2411.11504 • Published • 23

Top-nσ: Not All Logits Are You Need
Paper • 2411.07641 • Published • 23

Adaptive Decoding via Latent Preference Optimization
Paper • 2411.09661 • Published • 10

When Precision Meets Position: BFloat16 Breaks Down RoPE in Long-Context Training
Paper • 2411.13476 • Published • 16
Hymba: A Hybrid-head Architecture for Small Language Models
Paper • 2411.13676 • Published • 45

TÜLU 3: Pushing Frontiers in Open Language Model Post-Training
Paper • 2411.15124 • Published • 66

Star Attention: Efficient LLM Inference over Long Sequences
Paper • 2411.17116 • Published • 55

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?
Paper • 2411.16489 • Published • 47

MH-MoE: Multi-Head Mixture-of-Experts
Paper • 2411.16205 • Published • 28

nGPT: Normalized Transformer with Representation Learning on the Hypersphere
Paper • 2410.01131 • Published • 10
allenai/tulu-3-sft-mixture
Viewer • Updated • 939k • 14.6k • 181
CASIA-LM/ChineseWebText2.0
Viewer • Updated • 2k • 1.63k • 26
Yi-Lightning Technical Report
Paper • 2412.01253 • Published • 28

Training Large Language Models to Reason in a Continuous Latent Space
Paper • 2412.06769 • Published • 89

Weighted-Reward Preference Optimization for Implicit Model Fusion
Paper • 2412.03187 • Published • 12
Paper • 2412.08905 • Published • 121
SPaR: Self-Play with Tree-Search Refinement to Improve Instruction-Following in Large Language Models
Paper • 2412.11605 • Published • 18

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Paper • 2412.13795 • Published • 20
Paper • 2412.15115 • Published • 376
A Post-Training Enhanced Optimization Approach for Small Language Models
Paper • 2411.02939 • Published
How to Synthesize Text Data without Model Collapse?
Paper • 2412.14689 • Published • 52
RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response
Paper • 2412.14922 • Published • 88

DRT-o1: Optimized Deep Reasoning Translation via Long Chain-of-Thought
Paper • 2412.17498 • Published • 22

B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 47

OpenRFT: Adapting Reasoning Foundation Model for Domain-specific Tasks with Reinforcement Fine-Tuning
Paper • 2412.16849 • Published • 9

rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking
Paper • 2501.04519 • Published • 285

MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 298

OpenCSG Chinese Corpus: A Series of High-quality Chinese Datasets for LLM Training
Paper • 2501.08197 • Published • 9
Infi-MM/InfiMM-WebMath-40B
Viewer • Updated • 22.8M • 2.15k • 68
Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning
Paper • 2502.06781 • Published • 59

Technologies on Effectiveness and Efficiency: A Survey of State Spaces Models
Paper • 2503.11224 • Published • 28

The Lessons of Developing Process Reward Models in Mathematical Reasoning
Paper • 2501.07301 • Published • 99

70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float
Paper • 2504.11651 • Published • 31

Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Paper • 2504.17192 • Published • 120
AdaptThink: Reasoning Models Can Learn When to Think
Paper • 2505.13417 • Published • 82

Multi-Token Prediction Needs Registers
Paper • 2505.10518 • Published • 14

Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Paper • 2505.14669 • Published • 77

Distilling LLM Agent into Small Models with Retrieval and Code Tools
Paper • 2505.17612 • Published • 81

J1: Incentivizing Thinking in LLM-as-a-Judge via Reinforcement Learning
Paper • 2505.10320 • Published • 24

SeerAttention-R: Sparse Attention Adaptation for Long Reasoning
Paper • 2506.08889 • Published • 23

QuantAgent: Price-Driven Multi-Agent LLMs for High-Frequency Trading
Paper • 2509.09995 • Published • 14

SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning
Paper • 2509.09674 • Published • 78

Causal Attention with Lookahead Keys
Paper • 2509.07301 • Published • 21

A Survey of Reinforcement Learning for Large Reasoning Models
Paper • 2509.08827 • Published • 183

Why Low-Precision Transformer Training Fails: An Analysis on Flash Attention
Paper • 2510.04212 • Published • 22

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy
Paper • 2510.13778 • Published • 16

Direct Multi-Token Decoding
Paper • 2510.11958 • Published • 5

QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs
Paper • 2510.11696 • Published • 165

Attention Is All You Need for KV Cache in Diffusion LLMs
Paper • 2510.14973 • Published • 36