Models
Datasets
Spaces
Buckets new
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2603.21986

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Paper • 2602.20161 • Published Feb 23 • 23
A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 522
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125
AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published about 1 month ago • 50

lixinyizju/moda

Updated Aug 8, 2025 • 13
Running on Zero

Agents

Featured

489

Qwen Image Edit Fast!

✒

489

Fast 8 step inference of Qwen Image Edit
Runtime error

Agents

Featured

905

Qwen Image

🖼

905

Generate detailed images from your text prompts
Running on Zero

MCP

Featured

1.57k

FLUX.1 Kontext

⚡

1.57k

Kontext image editing on FLUX[dev]

yandex/stable-diffusion-3.5-medium-alchemist

Text-to-Image • Updated May 16, 2025 • 12 • 7
Ovis-U1 Technical Report

Paper • 2506.23044 • Published Jun 29, 2025 • 61
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2, 2025 • 18
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Paper • 2507.01945 • Published Jul 2, 2025 • 76

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published Mar 26 • 32
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 131
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published Mar 23 • 135
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

audio-video model

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 550
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 324
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Paper • 2601.00393 • Published Jan 1 • 133
LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 177

Vision Language Action models

about 5 hours ago

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Paper • 2507.01925 • Published Jul 2, 2025 • 39
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22, 2025 • 35
MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11, 2025 • 45
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published Aug 27, 2025 • 32

Video Generation

Video Generation

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Paper • 2412.11100 • Published Dec 15, 2024 • 7
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Paper • 2412.09856 • Published Dec 13, 2024 • 11
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Paper • 2412.09349 • Published Dec 12, 2024 • 8
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

Paper • 2412.04448 • Published Dec 5, 2024 • 10

WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens

Paper • 2401.09985 • Published Jan 18, 2024 • 18
CustomVideo: Customizing Text-to-Video Generation with Multiple Subjects

Paper • 2401.09962 • Published Jan 18, 2024 • 9
Inflation with Diffusion: Efficient Temporal Adaptation for Text-to-Video Super-Resolution

Paper • 2401.10404 • Published Jan 18, 2024 • 10
ActAnywhere: Subject-Aware Video Background Generation

Paper • 2401.10822 • Published Jan 19, 2024 • 13

MACRO: Advancing Multi-Reference Image Generation with Structured Long-Context Data

Paper • 2603.25319 • Published Mar 26 • 32
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale

Paper • 2603.25040 • Published Mar 26 • 131
MinerU-Diffusion: Rethinking Document OCR as Inverse Rendering via Diffusion Decoding

Paper • 2603.22458 • Published Mar 23 • 135
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

audio-video model

Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125

Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

Paper • 2602.20161 • Published Feb 23 • 23
A Very Big Video Reasoning Suite

Paper • 2602.20159 • Published Feb 23 • 522
Speed by Simplicity: A Single-Stream Architecture for Fast Audio-Video Generative Foundation Model

Paper • 2603.21986 • Published Mar 23 • 125
AURA: Always-On Understanding and Real-Time Assistance via Video Streams

Paper • 2604.04184 • Published about 1 month ago • 50

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 550
mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 324
NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos

Paper • 2601.00393 • Published Jan 1 • 133
LTX-2: Efficient Joint Audio-Visual Foundation Model

Paper • 2601.03233 • Published Jan 6 • 177

lixinyizju/moda

Updated Aug 8, 2025 • 13
Running on Zero

Agents

Featured

489

Qwen Image Edit Fast!

✒

489

Fast 8 step inference of Qwen Image Edit
Runtime error

Agents

Featured

905

Qwen Image

🖼

905

Generate detailed images from your text prompts
Running on Zero

MCP

Featured

1.57k

FLUX.1 Kontext

⚡

1.57k

Kontext image editing on FLUX[dev]

Vision Language Action models

about 5 hours ago

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

Paper • 2507.01925 • Published Jul 2, 2025 • 39
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning

Paper • 2507.16746 • Published Jul 22, 2025 • 35
MolmoAct: Action Reasoning Models that can Reason in Space

Paper • 2508.07917 • Published Aug 11, 2025 • 45
Discrete Diffusion VLA: Bringing Discrete Diffusion to Action Decoding in Vision-Language-Action Policies

Paper • 2508.20072 • Published Aug 27, 2025 • 32

yandex/stable-diffusion-3.5-medium-alchemist

Text-to-Image • Updated May 16, 2025 • 12 • 7
Ovis-U1 Technical Report

Paper • 2506.23044 • Published Jun 29, 2025 • 61
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model

Paper • 2507.01953 • Published Jul 2, 2025 • 18
LongAnimation: Long Animation Generation with Dynamic Global-Local Memory

Paper • 2507.01945 • Published Jul 2, 2025 • 76

Video Generation

Video Generation

DynamicScaler: Seamless and Scalable Video Generation for Panoramic Scenes

Paper • 2412.11100 • Published Dec 15, 2024 • 7
LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity

Paper • 2412.09856 • Published Dec 13, 2024 • 11
DisPose: Disentangling Pose Guidance for Controllable Human Image Animation

Paper • 2412.09349 • Published Dec 12, 2024 • 8
MEMO: Memory-Guided Diffusion for Expressive Talking Video Generation

Paper • 2412.04448 • Published Dec 5, 2024 • 10

Previous
1
2
Next

Company

TOS Privacy About Careers

Website

Models Datasets Spaces Pricing Docs