Transition Matching Distillation for Fast Video Generation Paper • 2601.09881 • Published 3 days ago • 17
Think-Then-Generate: Reasoning-Aware Text-to-Image Diffusion with LLM Encoders Paper • 2601.10332 • Published 3 days ago • 22
Changelog Team & Enterprise Articles Now Featured on the Hugging Face Blog Dec 8, 2025 • 89
UM-Text: A Unified Multimodal Model for Image Understanding Paper • 2601.08321 • Published 5 days ago • 6
Efficient Camera-Controlled Video Generation of Static Scenes via Sparse Diffusion and 3D Rendering Paper • 2601.09697 • Published 4 days ago • 6
Article How We Built a Semantic Highlight Model To Save Token Cost for RAG 3 days ago • 43
Aligning Text, Code, and Vision: A Multi-Objective Reinforcement Learning Framework for Text-to-Visualization Paper • 2601.04582 • Published 10 days ago • 8
User-Oriented Multi-Turn Dialogue Generation with Tool Use at scale Paper • 2601.08225 • Published 5 days ago • 46
SnapGen++: Unleashing Diffusion Transformers for Efficient High-Fidelity Image Generation on Edge Devices Paper • 2601.08303 • Published 5 days ago • 13
MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head Paper • 2601.07832 • Published 6 days ago • 43
Can We Predict Before Executing Machine Learning Agents? Paper • 2601.05930 • Published 9 days ago • 25
Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published 10 days ago • 44
MMFormalizer: Multimodal Autoformalization in the Wild Paper • 2601.03017 • Published 12 days ago • 102