Skynet
FLAME: Factuality-Aware Alignment for Large Language Models
Paper
• 2405.01525
• Published
• 29
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale
Synthetic Data
Paper
• 2405.14333
• Published
• 44
Transformers Can Do Arithmetic with the Right Embeddings
Paper
• 2405.17399
• Published
• 54
EasyAnimate: A High-Performance Long Video Generation Method based on
Transformer Architecture
Paper
• 2405.18991
• Published
• 12
The Prompt Report: A Systematic Survey of Prompting Techniques
Paper
• 2406.06608
• Published
• 68
Autoregressive Model Beats Diffusion: Llama for Scalable Image
Generation
Paper
• 2406.06525
• Published
• 71
Transformers meet Neural Algorithmic Reasoners
Paper
• 2406.09308
• Published
• 44
Self-MoE: Towards Compositional Large Language Models with
Self-Specialized Experts
Paper
• 2406.12034
• Published
• 16
A Closer Look into Mixture-of-Experts in Large Language Models
Paper
• 2406.18219
• Published
• 17
DiffusionPDE: Generative PDE-Solving Under Partial Observation
Paper
• 2406.17763
• Published
• 24
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data
Paper
• 2406.18790
• Published
• 34
Controlling Space and Time with Diffusion Models
Paper
• 2407.07860
• Published
• 17
Lookback Lens: Detecting and Mitigating Contextual Hallucinations in
Large Language Models Using Only Attention Maps
Paper
• 2407.07071
• Published
• 12
Open-FinLLMs: Open Multimodal Large Language Models for Financial
Applications
Paper
• 2408.11878
• Published
• 64
Leveraging Open Knowledge for Advancing Task Expertise in Large Language
Models
Paper
• 2408.15915
• Published
• 19
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with
100+ NLP Researchers
Paper
• 2409.04109
• Published
• 48
Training Language Models to Self-Correct via Reinforcement Learning
Paper
• 2409.12917
• Published
• 140
Scaling Smart: Accelerating Large Language Model Pre-training with Small
Model Initialization
Paper
• 2409.12903
• Published
• 22
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of
Experts
Paper
• 2409.16040
• Published
• 16
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Paper
• 2409.20566
• Published
• 55
Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
Paper
• 2410.10814
• Published
• 51
"Give Me BF16 or Give Me Death"? Accuracy-Performance Trade-Offs in LLM
Quantization
Paper
• 2411.02355
• Published
• 51
POINTS1.5: Building a Vision-Language Model towards Real World
Applications
Paper
• 2412.08443
• Published
• 38
Euclid: Supercharging Multimodal LLMs with Synthetic High-Fidelity
Visual Descriptions
Paper
• 2412.08737
• Published
• 54
Multimodal Latent Language Modeling with Next-Token Diffusion
Paper
• 2412.08635
• Published
• 49
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper
• 2412.10360
• Published
• 147
RetroLLM: Empowering Large Language Models to Retrieve Fine-grained
Evidence within Generation
Paper
• 2412.11919
• Published
• 36
Smaller Language Models Are Better Instruction Evolvers
Paper
• 2412.11231
• Published
• 28
Learned Compression for Compressed Learning
Paper
• 2412.09405
• Published
• 13
Paper
• 2412.13501
• Published
• 29
RobustFT: Robust Supervised Fine-tuning for Large Language Models under
Noisy Response
Paper
• 2412.14922
• Published
• 88
YuLan-Mini: An Open Data-efficient Language Model
Paper
• 2412.17743
• Published
• 66
Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey
Paper
• 2412.18619
• Published
• 60
Task Preference Optimization: Improving Multimodal Large Language Models
with Vision Task Alignment
Paper
• 2412.19326
• Published
• 18
LUSIFER: Language Universal Space Integration for Enhanced Multilingual
Embeddings with Large Language Models
Paper
• 2501.00874
• Published
• 13
Personalized Graph-Based Retrieval for Large Language Models
Paper
• 2501.02157
• Published
• 31
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language
Models
Paper
• 2501.03262
• Published
• 104
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video
Generation Control
Paper
• 2501.03847
• Published
• 22
LLM4SR: A Survey on Large Language Models for Scientific Research
Paper
• 2501.04306
• Published
• 35
Search-o1: Agentic Search-Enhanced Large Reasoning Models
Paper
• 2501.05366
• Published
• 102
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction
Paper
• 2501.06282
• Published
• 53
Transformer^2: Self-adaptive LLMs
Paper
• 2501.06252
• Published
• 55
ChemAgent: Self-updating Library in Large Language Models Improves
Chemical Reasoning
Paper
• 2501.06590
• Published
• 11
Learnings from Scaling Visual Tokenizers for Reconstruction and
Generation
Paper
• 2501.09755
• Published
• 35
RLHS: Mitigating Misalignment in RLHF with Hindsight Simulation
Paper
• 2501.08617
• Published
• 10
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper
• 2501.09686
• Published
• 41
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities
Paper
• 2501.08983
• Published
• 22
Evolving Deeper LLM Thinking
Paper
• 2501.09891
• Published
• 115
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial
Network for High-Fidelity Speech Super-Resolution
Paper
• 2501.10045
• Published
• 10
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D
Assets Generation
Paper
• 2501.12202
• Published
• 49
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video
Understanding
Paper
• 2501.13106
• Published
• 90
Autonomy-of-Experts Models
Paper
• 2501.13074
• Published
• 44
Critique Fine-Tuning: Learning to Critique is More Effective than
Learning to Imitate
Paper
• 2501.17703
• Published
• 59
Optimizing Large Language Model Training Using FP4 Quantization
Paper
• 2501.17116
• Published
• 36
WILDCHAT-50M: A Deep Dive Into the Role of Synthetic Data in
Post-Training
Paper
• 2501.18511
• Published
• 20
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute
in Linear Diffusion Transformer
Paper
• 2501.18427
• Published
• 24
Towards General-Purpose Model-Free Reinforcement Learning
Paper
• 2501.16142
• Published
• 31
Reward-Guided Speculative Decoding for Efficient LLM Reasoning
Paper
• 2501.19324
• Published
• 39
The Curse of Depth in Large Language Models
Paper
• 2502.05795
• Published
• 40
Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time
Scaling
Paper
• 2502.06703
• Published
• 152
ARR: Question Answering with Large Language Models via Analyzing,
Retrieving, and Reasoning
Paper
• 2502.04689
• Published
• 8
Generating Symbolic World Models via Test-time Scaling of Large Language
Models
Paper
• 2502.04728
• Published
• 19
MetaChain: A Fully-Automated and Zero-Code Framework for LLM Agents
Paper
• 2502.05957
• Published
• 15
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth
Approach
Paper
• 2502.05171
• Published
• 152
Scaling Pre-training to One Hundred Billion Data for Vision Language
Models
Paper
• 2502.07617
• Published
• 29
LLMs Can Easily Learn to Reason from Demonstrations Structure, not
content, is what matters!
Paper
• 2502.07374
• Published
• 40
Forget What You Know about LLMs Evaluations - LLMs are Like a Chameleon
Paper
• 2502.07445
• Published
• 11
Next Block Prediction: Video Generation via Semi-Autoregressive Modeling
Paper
• 2502.07737
• Published
• 9
CODESIM: Multi-Agent Code Generation and Problem Solving through
Simulation-Driven Planning and Debugging
Paper
• 2502.05664
• Published
• 24
LLM Pretraining with Continuous Concepts
Paper
• 2502.08524
• Published
• 30
Retrieval-augmented Large Language Models for Financial Time Series
Forecasting
Paper
• 2502.05878
• Published
• 40
Hephaestus: Improving Fundamental Agent Capabilities of Large Language
Models through Continual Pre-Training
Paper
• 2502.06589
• Published
• 21
Training Language Models for Social Deduction with Multi-Agent
Reinforcement Learning
Paper
• 2502.06060
• Published
• 38
SelfCite: Self-Supervised Alignment for Context Attribution in Large
Language Models
Paper
• 2502.09604
• Published
• 37
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM
Multi-Agent Systems
Paper
• 2502.11098
• Published
• 13
Large Language Diffusion Models
Paper
• 2502.09992
• Published
• 126
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising
Trajectory Sharpening
Paper
• 2502.12146
• Published
• 16
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning
in Diffusion Models
Paper
• 2502.10458
• Published
• 38
video-SALMONN-o1: Reasoning-enhanced Audio-visual Large Language Model
Paper
• 2502.11775
• Published
• 9
Intuitive physics understanding emerges from self-supervised pretraining
on natural videos
Paper
• 2502.11831
• Published
• 20
FLAG-Trader: Fusion LLM-Agent with Gradient-based Reinforcement Learning
for Financial Trading
Paper
• 2502.11433
• Published
• 36
Building A Proof-Oriented Programmer That Is 64% Better Than GPT-4o
Under Data Scarsity
Paper
• 2502.11901
• Published
• 6
LongPO: Long Context Self-Evolution of Large Language Models through
Short-to-Long Preference Optimization
Paper
• 2502.13922
• Published
• 27
NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule
Generation
Paper
• 2502.12638
• Published
• 9
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song
Generation
Paper
• 2502.13128
• Published
• 41
Craw4LLM: Efficient Web Crawling for LLM Pretraining
Paper
• 2502.13347
• Published
• 30
Train Small, Infer Large: Memory-Efficient LoRA Training for Large
Language Models
Paper
• 2502.13533
• Published
• 13
Is That Your Final Answer? Test-Time Scaling Improves Selective Question
Answering
Paper
• 2502.13962
• Published
• 28
SearchRAG: Can Search Engines Be Helpful for LLM-based Medical Question
Answering?
Paper
• 2502.13233
• Published
• 15
S^2R: Teaching LLMs to Self-verify and Self-correct via Reinforcement
Learning
Paper
• 2502.12853
• Published
• 29
How Much Knowledge Can You Pack into a LoRA Adapter without Harming LLM?
Paper
• 2502.14502
• Published
• 91
Logic-RL: Unleashing LLM Reasoning with Rule-Based Reinforcement
Learning
Paper
• 2502.14768
• Published
• 47
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers
Paper
• 2502.14377
• Published
• 12
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal
Models via Human Feedback
Paper
• 2502.15027
• Published
• 7
SurveyX: Academic Survey Automation via Large Language Models
Paper
• 2502.14776
• Published
• 100
Make LoRA Great Again: Boosting LoRA with Adaptive Singular Values and
Mixture-of-Experts Optimization Alignment
Paper
• 2502.16894
• Published
• 32
DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks
Paper
• 2502.17157
• Published
• 52
Rank1: Test-Time Compute for Reranking in Information Retrieval
Paper
• 2502.18418
• Published
• 29
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language
Models
Paper
• 2502.16614
• Published
• 27
Phi-4-Mini Technical Report: Compact yet Powerful Multimodal Language
Models via Mixture-of-LoRAs
Paper
• 2503.01743
• Published
• 89
Sketch-of-Thought: Efficient LLM Reasoning with Adaptive
Cognitive-Inspired Sketching
Paper
• 2503.05179
• Published
• 46
R1-Omni: Explainable Omni-Multimodal Emotion Recognition with
Reinforcing Learning
Paper
• 2503.05379
• Published
• 38
R1-Searcher: Incentivizing the Search Capability in LLMs via
Reinforcement Learning
Paper
• 2503.05592
• Published
• 27
AnyAnomaly: Zero-Shot Customizable Video Anomaly Detection with LVLM
Paper
• 2503.04504
• Published
• 5
Effective and Efficient Masked Image Generation Models
Paper
• 2503.07197
• Published
• 11
TrajectoryCrafter: Redirecting Camera Trajectory for Monocular Videos
via Diffusion Models
Paper
• 2503.05638
• Published
• 20
Words or Vision: Do Vision-Language Models Have Blind Faith in Text?
Paper
• 2503.02199
• Published
• 8
Self-Taught Self-Correction for Small Language Models
Paper
• 2503.08681
• Published
• 15
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model
for Visual Generation and Editing
Paper
• 2503.10639
• Published
• 53
Transformers without Normalization
Paper
• 2503.10622
• Published
• 170
Autoregressive Image Generation with Randomized Parallel Decoding
Paper
• 2503.10568
• Published
• 9
Silent Branding Attack: Trigger-free Data Poisoning Attack on
Text-to-Image Diffusion Models
Paper
• 2503.09669
• Published
• 35
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large
Language Models
Paper
• 2503.10437
• Published
• 34
Learning from Failures in Multi-Attempt Reinforcement Learning
Paper
• 2503.04808
• Published
• 18
R1-VL: Learning to Reason with Multimodal Large Language Models via
Step-wise Group Relative Policy Optimization
Paper
• 2503.12937
• Published
• 30
API Agents vs. GUI Agents: Divergence and Convergence
Paper
• 2503.11069
• Published
• 36
Being-0: A Humanoid Robotic Agent with Vision-Language Models and
Modular Skills
Paper
• 2503.12533
• Published
• 68
DAPO: An Open-Source LLM Reinforcement Learning System at Scale
Paper
• 2503.14476
• Published
• 144
Personalize Anything for Free with Diffusion Transformer
Paper
• 2503.12590
• Published
• 44
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement
Learning
Paper
• 2503.15265
• Published
• 46
Fin-R1: A Large Language Model for Financial Reasoning through
Reinforcement Learning
Paper
• 2503.16252
• Published
• 30
Stop Overthinking: A Survey on Efficient Reasoning for Large Language
Models
Paper
• 2503.16419
• Published
• 77
Why Do Multi-Agent LLM Systems Fail?
Paper
• 2503.13657
• Published
• 48
Reinforcement Learning for Reasoning in Small LLMs: What Works and What
Doesn't
Paper
• 2503.16219
• Published
• 52
Expert Race: A Flexible Routing Strategy for Scaling Diffusion
Transformer with Mixture of Experts
Paper
• 2503.16057
• Published
• 14
ELTEX: A Framework for Domain-Driven Synthetic Data Generation
Paper
• 2503.15055
• Published
• 6
Plug-and-Play 1.x-Bit KV Cache Quantization for Video Large Language
Models
Paper
• 2503.16257
• Published
• 27
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning
via Iterative Self-Improvement
Paper
• 2503.17352
• Published
• 24
MAPS: A Multi-Agent Framework Based on Big Seven Personality and
Socratic Guidance for Multimodal Scientific Problem Solving
Paper
• 2503.16905
• Published
• 54
Modifying Large Language Model Post-Training for Diverse Creative
Writing
Paper
• 2503.17126
• Published
• 36
I Have Covered All the Bases Here: Interpreting Reasoning Features in
Large Language Models via Sparse Autoencoders
Paper
• 2503.18878
• Published
• 119
Open Deep Search: Democratizing Search with Open-source Reasoning Agents
Paper
• 2503.20201
• Published
• 48
ReSearch: Learning to Reason with Search for LLMs via Reinforcement
Learning
Paper
• 2503.19470
• Published
• 19
UI-R1: Enhancing Action Prediction of GUI Agents by Reinforcement
Learning
Paper
• 2503.21620
• Published
• 62
LeX-Art: Rethinking Text Generation via Scalable High-Quality Data
Synthesis
Paper
• 2503.21749
• Published
• 26
Qwen2.5-Omni Technical Report
Paper
• 2503.20215
• Published
• 170
ORIGEN: Zero-Shot 3D Orientation Grounding in Text-to-Image Generation
Paper
• 2503.22194
• Published
• 25
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large
Language Models
Paper
• 2503.24235
• Published
• 55
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement
Learning on the Base Model
Paper
• 2503.24290
• Published
• 62
Exploring the Effect of Reinforcement Learning on Video Understanding:
Insights from SEED-Bench-R1
Paper
• 2503.24376
• Published
• 38
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal
LLMs on Academic Resources
Paper
• 2504.00595
• Published
• 37
ScholarCopilot: Training Large Language Models for Academic Writing with
Accurate Citations
Paper
• 2504.00824
• Published
• 43
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via
Iterative Instruction Tuning and Reinforcement Learning
Paper
• 2504.02949
• Published
• 21
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated
Agent-Human Interplay
Paper
• 2504.03601
• Published
• 17
Tuning-Free Image Editing with Fidelity and Editability via Unified
Latent Diffusion Model
Paper
• 2504.05594
• Published
• 11
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement
Fine-Tuning
Paper
• 2504.06958
• Published
• 13
V-MAGE: A Game Evaluation Framework for Assessing Visual-Centric
Capabilities in Multimodal Large Language Models
Paper
• 2504.06148
• Published
• 13
DDT: Decoupled Diffusion Transformer
Paper
• 2504.05741
• Published
• 77
A Unified Agentic Framework for Evaluating Conditional Image Generation
Paper
• 2504.07046
• Published
• 30
HiFlow: Training-free High-Resolution Image Generation with Flow-Aligned
Guidance
Paper
• 2504.06232
• Published
• 13
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning
Paper
• 2504.07128
• Published
• 87
InternVL3: Exploring Advanced Training and Test-Time Recipes for
Open-Source Multimodal Models
Paper
• 2504.10479
• Published
• 306
Have we unified image generation and understanding yet? An empirical
study of GPT-4o's image generation ability
Paper
• 2504.08003
• Published
• 49
CoRAG: Collaborative Retrieval-Augmented Generation
Paper
• 2504.01883
• Published
• 9
How new data permeates LLM knowledge and how to dilute it
Paper
• 2504.09522
• Published
• 7
SQL-R1: Training Natural Language to SQL Reasoning Model By
Reinforcement Learning
Paper
• 2504.08600
• Published
• 33
InteractVLM: 3D Interaction Reasoning from 2D Foundational Models
Paper
• 2504.05303
• Published
• 5
ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on
Transformer Encoder Models Performance
Paper
• 2504.08716
• Published
• 9
Genius: A Generalizable and Purely Unsupervised Self-Training Framework
For Advanced Reasoning
Paper
• 2504.08672
• Published
• 55
Efficient Generative Model Training via Embedded Representation Warmup
Paper
• 2504.10188
• Published
• 12
Iterative Self-Training for Code Generation via Reinforced Re-Ranking
Paper
• 2504.09643
• Published
• 34
Vidi: Large Multimodal Models for Video Understanding and Editing
Paper
• 2504.15681
• Published
• 14
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making
Abilities
Paper
• 2504.16078
• Published
• 21
CheXWorld: Exploring Image World Modeling for Radiograph Representation
Learning
Paper
• 2504.13820
• Published
• 16
WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World
Model-based LLM Agents
Paper
• 2504.15785
• Published
• 22
Can Large Language Models Help Multimodal Language Analysis? MMLA: A
Comprehensive Benchmark
Paper
• 2504.16427
• Published
• 18
WebThinker: Empowering Large Reasoning Models with Deep Research
Capability
Paper
• 2504.21776
• Published
• 59
T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level
and Token-level CoT
Paper
• 2505.00703
• Published
• 44
Self-Generated In-Context Examples Improve LLM Agents for Sequential
Decision-Making Tasks
Paper
• 2505.00234
• Published
• 26
Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG
Evaluation Prompts
Paper
• 2504.21117
• Published
• 26
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive
Streaming Speech Synthesis
Paper
• 2505.02625
• Published
• 23
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop
Reasoning with Transformers
Paper
• 2504.20752
• Published
• 94
Unified Multimodal Chain-of-Thought Reward Model through Reinforcement
Fine-Tuning
Paper
• 2505.03318
• Published
• 92
Improving Editability in Image Generation with Layer-wise Memory
Paper
• 2505.01079
• Published
• 29
Think on your Feet: Adaptive Thinking via Reinforcement Learning for
Social Agents
Paper
• 2505.02156
• Published
• 18
An Empirical Study of Qwen3 Quantization
Paper
• 2505.02214
• Published
• 25
Unified Multimodal Understanding and Generation Models: Advances,
Challenges, and Opportunities
Paper
• 2505.02567
• Published
• 80
A Survey on Inference Engines for Large Language Models: Perspectives on
Optimization and Efficiency
Paper
• 2505.01658
• Published
• 39
Knowledge Augmented Complex Problem Solving with Large Language Models:
A Survey
Paper
• 2505.03418
• Published
• 9
Multi-Agent System for Comprehensive Soccer Understanding
Paper
• 2505.03735
• Published
• 25
PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with
Auto-Regressive Transformer
Paper
• 2505.04622
• Published
• 27
Sentient Agent as a Judge: Evaluating Higher-Order Social Cognition in
Large Language Models
Paper
• 2505.02847
• Published
• 29
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM
Reasoners With Verifiers
Paper
• 2505.04842
• Published
• 12
Sailing AI by the Stars: A Survey of Learning from Rewards in
Post-Training and Test-Time Scaling of Large Language Models
Paper
• 2505.02686
• Published
• 16
MiMo: Unlocking the Reasoning Potential of Language Model -- From
Pretraining to Posttraining
Paper
• 2505.07608
• Published
• 82
StreamBridge: Turning Your Offline Video Large Language Model into a
Proactive Streaming Assistant
Paper
• 2505.05467
• Published
• 13
BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture,
Training and Dataset
Paper
• 2505.09568
• Published
• 99
Exploring the Deep Fusion of Large Language Models and Diffusion
Transformers for Text-to-Image Synthesis
Paper
• 2505.10046
• Published
• 9
Beyond 'Aha!': Toward Systematic Meta-Abilities Alignment in Large
Reasoning Models
Paper
• 2505.10554
• Published
• 120
OpenThinkIMG: Learning to Think with Images via Visual Tool
Reinforcement Learning
Paper
• 2505.08617
• Published
• 42
AdaCoT: Pareto-Optimal Adaptive Chain-of-Thought Triggering via
Reinforcement Learning
Paper
• 2505.11896
• Published
• 58
Chain-of-Model Learning for Language Model
Paper
• 2505.11820
• Published
• 121
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Paper
• 2505.14669
• Published
• 78
s3: You Don't Need That Much Data to Train a Search Agent via RL
Paper
• 2505.14146
• Published
• 19
Synthetic Data RL: Task Definition Is All You Need
Paper
• 2505.17063
• Published
• 11
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Paper
• 2505.24086
• Published
• 5
Reinforcement Pre-Training
Paper
• 2506.08007
• Published
• 263
Multiverse: Your Language Models Secretly Decide How to Parallelize and
Merge Generation
Paper
• 2506.09991
• Published
• 55
DeepResearch Bench: A Comprehensive Benchmark for Deep Research Agents
Paper
• 2506.11763
• Published
• 74
From Bytes to Ideas: Language Modeling with Autoregressive U-Nets
Paper
• 2506.14761
• Published
• 17
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought
Reasoning in LLMs
Paper
• 2506.18896
• Published
• 29
SlimMoE: Structured Compression of Large MoE Models via Expert Slimming
and Distillation
Paper
• 2506.18349
• Published
• 13
Unified Vision-Language-Action Model
Paper
• 2506.19850
• Published
• 27
Teaching a Language Model to Speak the Language of Tools
Paper
• 2506.23394
• Published
• 3
Coding Triangle: How Does Large Language Model Understand Code?
Paper
• 2507.06138
• Published
• 22
One Token to Fool LLM-as-a-Judge
Paper
• 2507.08794
• Published
• 32
UloRL: An Ultra-Long Output Reinforcement Learning Approach for Advancing
Large Language Models' Reasoning Abilities
Paper
• 2507.19766
• Published
• 15
NeRF Is a Valuable Assistant for 3D Gaussian Splatting
Paper
• 2507.23374
• Published
• 12
Agent Lightning: Train ANY AI Agents with Reinforcement Learning
Paper
• 2508.03680
• Published
• 136
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Paper
• 2508.08221
• Published
• 50
MeshLLM: Empowering Large Language Models to Progressively Understand
and Generate 3D Mesh
Paper
• 2508.01242
• Published
• 11
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with
Long-Term Memory
Paper
• 2508.09736
• Published
• 58
BeyondWeb: Lessons from Scaling Synthetic Data for Trillion-scale
Pretraining
Paper
• 2508.10975
• Published
• 60
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent
Distillation and Agentic RL
Paper
• 2508.13167
• Published
• 129
nvidia/Nemotron-Pretraining-Code-v1
Viewer
• Updated
• 936M • 554
• 60
Speed Always Wins: A Survey on Efficient Architectures for Large
Language Models
Paper
• 2508.09834
• Published
• 53
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid
Mamba-Transformer Reasoning Model
Paper
• 2508.14444
• Published
• 43
Breaking the Exploration Bottleneck: Rubric-Scaffolded Reinforcement
Learning for General LLM Reasoning
Paper
• 2508.16949
• Published
• 24
AgentFly: Fine-tuning LLM Agents without Fine-tuning LLMs
Paper
• 2508.16153
• Published
• 160
Autoregressive Universal Video Segmentation Model
Paper
• 2508.19242
• Published
• 29
Beyond Memorization: Extending Reasoning Depth with Recurrence, Memory
and Test-Time Compute Scaling
Paper
• 2508.16745
• Published
• 29
Think in Games: Learning to Reason in Games via Reinforcement Learning
with Large Language Models
Paper
• 2508.21365
• Published
• 29
A Survey of Scientific Large Language Models: From Data Foundations to
Agent Frontiers
Paper
• 2508.21148
• Published
• 140
UItron: Foundational GUI Agent with Advanced Perception and Planning
Paper
• 2508.21767
• Published
• 12
jupyter-agent/jupyter-agent-dataset
Viewer
• Updated
• 95.8k • 10.7k
• 156
UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn
Reinforcement Learning
Paper
• 2509.02544
• Published
• 125
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey
Paper
• 2509.02547
• Published
• 232
Universal Deep Research: Bring Your Own Model and Strategy
Paper
• 2509.00244
• Published
• 14
Robix: A Unified Model for Robot Interaction, Reasoning and Planning
Paper
• 2509.01106
• Published
• 52
Open Data Synthesis For Deep Research
Paper
• 2509.00375
• Published
• 72
Revolutionizing Reinforcement Learning Framework for Diffusion Large
Language Models
Paper
• 2509.06949
• Published
• 56
Reinforcement Learning Foundations for Deep Research Systems: A Survey
Paper
• 2509.06733
• Published
• 32
World Simulation with Video Foundation Models for Physical AI
Paper
• 2511.00062
• Published
• 44
The Art of Scaling Reinforcement Learning Compute for LLMs
Paper
• 2510.13786
• Published
• 32
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models
Paper
• 2512.02556
• Published
• 258
OneThinker: All-in-one Reasoning Model for Image and Video
Paper
• 2512.03043
• Published
• 33
Reinforcement Learning for Self-Improving Agent with Skill Library
Paper
• 2512.17102
• Published
• 36
Next-Embedding Prediction Makes Strong Vision Learners
Paper
• 2512.16922
• Published
• 87