Qwen3-VL-Embedding and Qwen3-VL-Reranker: A Unified Framework for State-of-the-Art Multimodal Retrieval and Ranking Paper • 2601.04720 • Published 12 days ago • 45
YOLO-Master: MOE-Accelerated with Specialized Transformers for Enhanced Real-time Detection Paper • 2512.23273 • Published 22 days ago • 13
TurboDiffusion: Accelerating Video Diffusion Models by 100-200 Times Paper • 2512.16093 • Published Dec 18, 2025 • 93
DataFlow: An LLM-Driven Framework for Unified Data Preparation and Workflow Automation in the Era of Data-Centric AI Paper • 2512.16676 • Published Dec 18, 2025 • 208
Few-Step Distillation for Text-to-Image Generation: A Practical Guide Paper • 2512.13006 • Published Dec 15, 2025 • 7
RF-DETR: Neural Architecture Search for Real-Time Detection Transformers Paper • 2511.09554 • Published Nov 12, 2025 • 7
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 83
WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling Paper • 2512.14614 • Published Dec 16, 2025 • 69
In Pursuit of Pixel Supervision for Visual Pre-training Paper • 2512.15715 • Published Dec 17, 2025 • 10
DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models Paper • 2512.15713 • Published Dec 17, 2025 • 16
EgoX: Egocentric Video Generation from a Single Exocentric Video Paper • 2512.08269 • Published Dec 9, 2025 • 118
E-RayZer: Self-supervised 3D Reconstruction as Spatial Visual Pre-training Paper • 2512.10950 • Published Dec 11, 2025 • 1
Fast-FoundationStereo: Real-Time Zero-Shot Stereo Matching Paper • 2512.11130 • Published Dec 11, 2025 • 5
MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing Paper • 2509.22186 • Published Sep 26, 2025 • 139