The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment Paper • 2511.20614 • Published 11 days ago • 37
PICABench: How Far Are We from Physically Realistic Image Editing? Paper • 2510.17681 • Published Oct 20 • 62
VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos Paper • 2505.23693 • Published May 29 • 54
The Stochastic Parrot on LLM's Shoulder: A Summative Assessment of Physical Concept Understanding Paper • 2502.08946 • Published Feb 13 • 191
VEnhancer: Generative Space-Time Enhancement for Video Generation Paper • 2407.07667 • Published Jul 10, 2024 • 16
Still-Moving: Customized Video Generation without Customized Video Data Paper • 2407.08674 • Published Jul 11, 2024 • 13
CosmoCLIP: Generalizing Large Vision-Language Models for Astronomical Imaging Paper • 2407.07315 • Published Jul 10, 2024 • 7
An accurate detection is not all you need to combat label noise in web-noisy datasets Paper • 2407.05528 • Published Jul 8, 2024 • 4
This&That: Language-Gesture Controlled Video Generation for Robot Planning Paper • 2407.05530 • Published Jul 8, 2024 • 4
CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation Paper • 2407.06188 • Published Jul 8, 2024 • 3
Scaling Up Personalized Aesthetic Assessment via Task Vector Customization Paper • 2407.07176 • Published Jul 9, 2024 • 6
Towards Building Specialized Generalist AI with System 1 and System 2 Fusion Paper • 2407.08642 • Published Jul 11, 2024 • 11
Autoregressive Speech Synthesis without Vector Quantization Paper • 2407.08551 • Published Jul 11, 2024 • 17
DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception Paper • 2407.08303 • Published Jul 11, 2024 • 19
Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models Paper • 2407.08701 • Published Jul 11, 2024 • 13