PhysBrain: Human Egocentric Data as a Bridge from Vision Language Models to Physical Intelligence Paper • 2512.16793 • Published 15 days ago • 72
Depth Any Panoramas: A Foundation Model for Panoramic Depth Estimation Paper • 2512.16913 • Published 14 days ago • 33
JarvisEvo: Towards a Self-Evolving Photo Editing Agent with Synergistic Editor-Evaluator Optimization Paper • 2511.23002 • Published Nov 28, 2025 • 26
DynamicVerse: A Physically-Aware Multimodal Framework for 4D World Modeling Paper • 2512.03000 • Published about 1 month ago • 36
OneThinker: All-in-one Reasoning Model for Image and Video Paper • 2512.03043 • Published about 1 month ago • 32
SPARK: Synergistic Policy And Reward Co-Evolving Framework Paper • 2509.22624 • Published Sep 26, 2025 • 17
MoVieS: Motion-Aware 4D Dynamic View Synthesis in One Second Paper • 2507.10065 • Published Jul 14, 2025 • 24
Thinking with Images for Multimodal Reasoning: Foundations, Methods, and Future Frontiers Paper • 2506.23918 • Published Jun 30, 2025 • 89
JarvisArt: Liberating Human Artistic Creativity via an Intelligent Photo Retouching Agent Paper • 2506.17612 • Published Jun 21, 2025 • 64
PartCrafter: Structured 3D Mesh Generation via Compositional Latent Diffusion Transformers Paper • 2506.05573 • Published Jun 5, 2025 • 82
PosterCraft: Rethinking High-Quality Aesthetic Poster Generation in a Unified Framework Paper • 2506.10741 • Published Jun 12, 2025 • 27
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model Paper • 2505.23606 • Published May 29, 2025 • 14
JarvisIR: Elevating Autonomous Driving Perception with Intelligent Image Restoration Paper • 2504.04158 • Published Apr 5, 2025 • 2
MagicInfinite: Generating Infinite Talking Videos with Your Words and Voice Paper • 2503.05978 • Published Mar 7, 2025 • 36