Models
Datasets
Spaces
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2512.24880

Deepseek Papers

Deepseek papers collection

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 33
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 53
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 59
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 70

The Trinity of Consistency as a Defining Principle for General World Models

Paper • 2602.23152 • Published 5 days ago • 191
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Paper • 2602.22859 • Published 5 days ago • 148
OmniGAIA: Towards Native Omni-Modal AI Agents

Paper • 2602.22897 • Published 5 days ago • 51
Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Paper • 2602.22766 • Published 5 days ago • 38

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Paper • 2602.12036 • Published 19 days ago • 98
Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 36
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Paper • 2512.23705 • Published Dec 29, 2025 • 45
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 18
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Paper • 2512.25075 • Published Dec 31, 2025 • 15
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper • 2512.24176 • Published Dec 30, 2025 • 8

about 1 hour ago

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 98
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 65

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 26 days ago • 342
PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 206
FASA: Frequency-aware Sparse Attention

Paper • 2602.03152 • Published 28 days ago • 150
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313

Papers reimplemented

List of research papers, architectures, and techniques I re implemented in LLM-quest or Hugging Face's TRL. Missing papers: Qwen3-Next, GPT-2

Reinforced Attention Learning

Paper • 2602.04884 • Published 27 days ago • 28
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published 27 days ago • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3
mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations

Paper • 2601.05732 • Published Jan 9 • 1

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 155
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 136
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313
BitNet Distillation

Paper • 2510.13998 • Published Oct 15, 2025 • 59

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313

Deepseek Papers

Deepseek papers collection

DreamCraft3D: Hierarchical 3D Generation with Bootstrapped Diffusion Prior

Paper • 2310.16818 • Published Oct 25, 2023 • 33
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism

Paper • 2401.02954 • Published Jan 5, 2024 • 53
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models

Paper • 2401.06066 • Published Jan 11, 2024 • 59
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence

Paper • 2401.14196 • Published Jan 25, 2024 • 70

about 1 hour ago

Towards Scalable Pre-training of Visual Tokenizers for Generation

Paper • 2512.13687 • Published Dec 15, 2025 • 106
MMGR: Multi-Modal Generative Reasoning

Paper • 2512.14691 • Published Dec 16, 2025 • 119
Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published Dec 29, 2025 • 98
LiveTalk: Real-Time Multimodal Interactive Video Diffusion via Improved On-Policy Distillation

Paper • 2512.23576 • Published Dec 29, 2025 • 65

The Trinity of Consistency as a Defining Principle for General World Models

Paper • 2602.23152 • Published 5 days ago • 191
From Blind Spots to Gains: Diagnostic-Driven Iterative Training for Large Multimodal Models

Paper • 2602.22859 • Published 5 days ago • 148
OmniGAIA: Towards Native Omni-Modal AI Agents

Paper • 2602.22897 • Published 5 days ago • 51
Imagination Helps Visual Reasoning, But Not Yet in Latent Space

Paper • 2602.22766 • Published 5 days ago • 38

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published 26 days ago • 342
PaperBanana: Automating Academic Illustration for AI Scientists

Paper • 2601.23265 • Published Jan 30 • 206
FASA: Frequency-aware Sparse Attention

Paper • 2602.03152 • Published 28 days ago • 150
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313

Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models

Paper • 2602.12036 • Published 19 days ago • 98
Reinforcement Learning for Self-Improving Agent with Skill Library

Paper • 2512.17102 • Published Dec 18, 2025 • 36
Diffusion Knows Transparency: Repurposing Video Diffusion for Transparent Object Depth and Normal Estimation

Paper • 2512.23705 • Published Dec 29, 2025 • 45
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models

Paper • 2512.19995 • Published Dec 23, 2025 • 16

Papers reimplemented

List of research papers, architectures, and techniques I re implemented in LLM-quest or Hugging Face's TRL. Missing papers: Qwen3-Next, GPT-2

Reinforced Attention Learning

Paper • 2602.04884 • Published 27 days ago • 28
Learning to Reason in 13 Parameters

Paper • 2602.04118 • Published 27 days ago • 6
LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Paper • 2405.17604 • Published May 27, 2024 • 3
mHC-lite: You Don't Need 20 Sinkhorn-Knopp Iterations

Paper • 2601.05732 • Published Jan 9 • 1

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313

Your Group-Relative Advantage Is Biased

Paper • 2601.08521 • Published Jan 13 • 155
Agent Lightning: Train ANY AI Agents with Reinforcement Learning

Paper • 2508.03680 • Published Aug 5, 2025 • 136
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313
BitNet Distillation

Paper • 2510.13998 • Published Oct 15, 2025 • 59

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313
Fantastic Reasoning Behaviors and Where to Find Them: Unsupervised Discovery of the Reasoning Process

Paper • 2512.23988 • Published Dec 30, 2025 • 18
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time

Paper • 2512.25075 • Published Dec 31, 2025 • 15
Guiding a Diffusion Transformer with the Internal Dynamics of Itself

Paper • 2512.24176 • Published Dec 30, 2025 • 8

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 313

Previous
1
2
3
...
6
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs