From Masks to Worlds: A Hitchhiker's Guide to World Models • Paper • 2510.20668 • Published 28 days ago • 6
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent • Paper • 2506.17612 • Published Jun 21 • 64
Lumina-DiMOO: An Omni Diffusion Large Language Model for Multi-Modal Generation and Understanding • Paper • 2510.06308 • Published Oct 7 • 53
HiFiHR: Enhancing 3D Hand Reconstruction from a Single Image via High-Fidelity Texture • Paper • 2308.13628 • Published Aug 25, 2023
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model • Paper • 2505.23606 • Published May 29 • 14
Personalized Safety Alignment for Text-to-Image Diffusion Models • Paper • 2508.01151 • Published Aug 2 • 8
R-KV: Redundancy-aware KV Cache Compression for Reasoning Models • Paper • 2505.24133 • Published May 30 • 1
Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation • Paper • 2312.16610 • Published Dec 27, 2023
DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering • Paper • 2507.11527 • Published Jul 15 • 32
Mixed-R1: Unified Reward Perspective For Reasoning Capability in Multimodal Large Language Models • Paper • 2505.24164 • Published May 30
UltraVideo: High-Quality UHD Video Dataset with Comprehensive Captions • Paper • 2506.13691 • Published Jun 16 • 2
Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology • Paper • 2507.07999 • Published Jul 10 • 49