-
MiniMaxAI/VTP-Small-f16d64
Image Feature Extraction • 0.2B • Updated • 16.3k • 10 -
MiniMaxAI/VTP-Base-f16d64
Image Feature Extraction • 0.3B • Updated • 15.5k • 17 -
MiniMaxAI/VTP-Large-f16d64
Image Feature Extraction • 0.7B • Updated • 16k • 12 -
Towards Scalable Pre-training of Visual Tokenizers for Generation
Paper • 2512.13687 • Published • 96
Collections
Collections including paper arxiv:2512.13687
-
Continuous Autoregressive Language Models
Paper • 2510.27688 • Published • 70 -
Efficient Speech Language Modeling via Energy Distance in Continuous Latent Space
Paper • 2505.13181 • Published • 9 -
Long-Context Autoregressive Video Modeling with Next-Frame Prediction
Paper • 2503.19325 • Published • 73 -
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation
Paper • 2503.16430 • Published • 34
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 447 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
yandex/stable-diffusion-3.5-medium-alchemist
Text-to-Image • Updated • 4 • 6 -
Ovis-U1 Technical Report
Paper • 2506.23044 • Published • 61 -
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model
Paper • 2507.01953 • Published • 18 -
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory
Paper • 2507.01945 • Published • 76
-
Test-Time Scaling with Reflective Generative Model
Paper • 2507.01951 • Published • 107 -
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach
Paper • 2502.05171 • Published • 151 -
Autoregressive Diffusion Models
Paper • 2110.02037 • Published -
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Paper • 2502.09509 • Published • 8
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 29 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 15 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 50 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 33