Files changed (1)
  1. README.md +107 -89
README.md CHANGED
@@ -1,116 +1,134 @@
  ---
  emoji: πŸŽ₯
- title: 'Self Forcing Wan 2.1 '
  short_description: Real-time video generation
  sdk: gradio
  sdk_version: 5.34.2
  ---
- <p align="center">
- <h1 align="center">Self Forcing</h1>
- <h3 align="center">Bridging the Train-Test Gap in Autoregressive Video Diffusion</h3>
- </p>
- <p align="center">
- <p align="center">
- <a href="https://www.xunhuang.me/">Xun Huang</a><sup>1</sup>
- Β·
- <a href="https://zhengqili.github.io/">Zhengqi Li</a><sup>1</sup>
- Β·
- <a href="https://guandehe.github.io/">Guande He</a><sup>2</sup>
- Β·
- <a href="https://mingyuanzhou.github.io/">Mingyuan Zhou</a><sup>2</sup>
- Β·
- <a href="https://research.adobe.com/person/eli-shechtman/">Eli Shechtman</a><sup>1</sup><br>
- <sup>1</sup>Adobe Research <sup>2</sup>UT Austin
- </p>
- <h3 align="center"><a href="https://arxiv.org/abs/2506.08009">Paper</a> | <a href="https://self-forcing.github.io">Website</a> | <a href="https://huggingface.co/gdhe17/Self-Forcing/tree/main">Models (HuggingFace)</a></h3>
- </p>

  ---

- Self Forcing trains autoregressive video diffusion models by **simulating the inference process during training**, performing autoregressive rollout with KV caching. It resolves the train-test distribution mismatch and enables **real-time, streaming video generation on a single RTX 4090** while matching the quality of state-of-the-art diffusion models.

  ---


- https://github.com/user-attachments/assets/7548c2db-fe03-4ba8-8dd3-52d2c6160739


- ## Requirements
- We tested this repo on the following setup:
- * Nvidia GPU with at least 24 GB memory (RTX 4090, A100, and H100 are tested).
- * Linux operating system.
- * 64 GB RAM.

- Other hardware setups may also work but have not been tested.

- ## Installation
- Create a conda environment and install dependencies:
- ```
- conda create -n self_forcing python=3.10 -y
- conda activate self_forcing
- pip install -r requirements.txt
- pip install flash-attn --no-build-isolation
- python setup.py develop
- ```

- ## Quick Start
- ### Download checkpoints
- ```
- huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
- huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
- ```

- ### GUI demo
- ```
- python demo.py
- ```
- Note:
- * **Our model works better with long, detailed prompts** since it's trained with such prompts. We will integrate prompt extension into the codebase (similar to [Wan2.1](https://github.com/Wan-Video/Wan2.1/tree/main?tab=readme-ov-file#2-using-prompt-extention)) in the future. For now, it is recommended to use third-party LLMs (such as GPT-4o) to extend your prompt before providing it to the model.
- * You may want to adjust the FPS so the video plays smoothly on your device.
- * The speed can be improved by enabling `torch.compile`, [TAEHV-VAE](https://github.com/madebyollin/taehv/), or FP8 Linear layers, although the latter two options may sacrifice quality. It is recommended to use `torch.compile` if possible and to enable TAEHV-VAE if further speedup is needed.

- ### CLI Inference
- Example inference script using the chunk-wise autoregressive checkpoint trained with DMD:
- ```
- python inference.py \
-     --config_path configs/self_forcing_dmd.yaml \
-     --output_folder videos/self_forcing_dmd \
-     --checkpoint_path checkpoints/self_forcing_dmd.pt \
-     --data_path prompts/MovieGenVideoBench_extended.txt \
-     --use_ema
  ```
- Other config files and corresponding checkpoints can be found in the [configs](configs) folder and in our [huggingface repo](https://huggingface.co/gdhe17/Self-Forcing/tree/main/checkpoints).

- ## Training
- ### Download text prompts and ODE-initialized checkpoint
- ```
- huggingface-cli download gdhe17/Self-Forcing checkpoints/ode_init.pt --local-dir .
- huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
- ```
- Note: Our training algorithm (except for the GAN version) is data-free (**no video data is needed**). For now, we directly provide the ODE initialization checkpoint and will add instructions on how to perform ODE initialization (which is identical to the process described in the [CausVid](https://github.com/tianweiy/CausVid) repo) in the future.

- ### Self Forcing Training with DMD
- ```
- torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
-     --rdzv_backend=c10d \
-     --rdzv_endpoint $MASTER_ADDR \
-     train.py \
-     --config_path configs/self_forcing_dmd.yaml \
-     --logdir logs/self_forcing_dmd \
-     --disable-wandb
  ```
- Our training run uses 600 iterations and completes in under 2 hours on 64 H100 GPUs. With gradient accumulation, it should be possible to reproduce the results in less than 16 hours on 8 H100 GPUs.

- ## Acknowledgements
- This codebase is built on top of the open-source implementation of [CausVid](https://github.com/tianweiy/CausVid) by [Tianwei Yin](https://tianweiy.github.io/) and the [Wan2.1](https://github.com/Wan-Video/Wan2.1) repo.

- ## Citation
- If you find this codebase useful for your research, please kindly cite our paper:
  ```
- @article{huang2025selfforcing,
-   title={Self Forcing: Bridging the Train-Test Gap in Autoregressive Video Diffusion},
-   author={Huang, Xun and Li, Zhengqi and He, Guande and Zhou, Mingyuan and Shechtman, Eli},
-   journal={arXiv preprint arXiv:2506.08009},
-   year={2025}
- }
- ```
  ---
  emoji: πŸŽ₯
+ title: 'RunAsh Wan 2.1'
  short_description: Real-time video generation
  sdk: gradio
  sdk_version: 5.34.2
+ license: apache-2.0
+ colorFrom: yellow
+ colorTo: red
+ pinned: true
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/6799f4b5a2b48413dd18a8dd/nxPqZaXa6quMBU4ojqDzC.png
  ---
+ # 🎬 RunAsh Real Time Video Generation
+
+ > *A real-time video generation model based on Wan 2.1 β€” optimized for low-latency inference and interactive applications.*
+
+ ![Demo GIF or Screenshot Placeholder](https://via.placeholder.com/800x400?text=Real-Time+Video+Generation+Demo)

  ---

+ ## 🧠 Model Overview
+
+ **RunAsh Real Time Video Generation** is a fine-tuned and optimized version of **Wan 2.1**, designed for **real-time video synthesis** from text prompts or image inputs. It leverages efficient architecture modifications and inference optimizations to enable smooth, interactive video generation at the edge or directly in the browser.

  ---

+ ## πŸš€ Features

+ - βœ… **Real-time generation** (target: <500ms per frame on GPU)
+ - βœ… **Text-to-Video** and **Image-to-Video** modes
+ - βœ… Low VRAM usage optimizations
+ - βœ… Interactive Gradio UI for live demos
+ - βœ… Supports variable-length outputs (2s–8s clips)
+ - βœ… Plug-and-play with Hugging Face Spaces

+ ---

+ ## πŸ› οΈ Technical Details

+ - **Base Model**: `Wan-2.1-RealTime` (by the original authors)
+ - **Architecture**: Diffusion Transformer + Latent Consistency Modules
+ - **Resolution**: 576x320 (16:9) or 320x576 (9:16) β€” configurable
+ - **Frame Rate**: 24 FPS (adjustable)
+ - **Latency**: ~300–600ms per generation step (RTX 3090 / A10G)
+ - **Max Duration**: 8 seconds (configurable in code; see the sketch below)
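+
+ As a rough illustration of how these knobs fit together (the class below is illustrative only, not the Space's actual API), the clip length in frames is simply the frame rate times the duration:
+
+ ```python
+ from dataclasses import dataclass
+
+ # Illustrative settings; field names are hypothetical, values match the list above.
+ @dataclass
+ class GenerationSettings:
+     width: int = 576       # 576x320 for 16:9; swap to 320x576 for 9:16
+     height: int = 320
+     fps: int = 24          # adjustable frame rate
+     max_seconds: int = 8   # maximum clip duration
+
+ settings = GenerationSettings()
+ max_frames = settings.fps * settings.max_seconds   # 192 frames for a full 8 s clip
+ min_frames = settings.fps * 2                      # 48 frames for the shortest 2 s clip
+ ```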

+ ---

+ ## πŸ–ΌοΈ Example Usage

+ ### Text-to-Video

+ ```python
+ # `pipeline` is assumed to be the Space's preloaded Wan 2.1 text-to-video pipeline (see app.py).
+ prompt = "A cyberpunk cat riding a neon scooter through Tokyo at night"
+ video = pipeline(prompt, num_frames=48, guidance_scale=7.5)
  ```
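+
+ To sanity-check the real-time target quoted under Features, you can time the call above. This is a minimal sketch that reuses the `prompt` and `pipeline` objects from the snippet; 48 frames at 24 FPS is a 2-second clip:
+
+ ```python
+ import time
+
+ start = time.perf_counter()
+ video = pipeline(prompt, num_frames=48, guidance_scale=7.5)
+ elapsed = time.perf_counter() - start
+ # Compare the per-frame time against the <500ms-per-frame target.
+ print(f"total: {elapsed:.1f} s, per frame: {1000 * elapsed / 48:.0f} ms")
+ ```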
 

+ ### Image-to-Video

+ ```python
+ # `load_image` and `pipeline` are assumed to be helpers provided by the Space's app.py.
+ init_image = load_image("cat.png")
+ video = pipeline(init_image, motion_prompt="zoom in slowly", num_frames=24)
  ```
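+
+ The exact type of the returned `video` object depends on the pipeline. Assuming it is a sequence of frames (PIL images or HxWx3 uint8 arrays β€” an assumption, so check the Space's app.py), it can be written to an MP4 with imageio:
+
+ ```python
+ import imageio
+ import numpy as np
+
+ # Assumes `video` is an iterable of frames; convert each frame to an array before writing.
+ frames = [np.asarray(frame) for frame in video]
+ imageio.mimsave("output.mp4", frames, fps=24)  # needs the imageio-ffmpeg backend for .mp4
+ ```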
 

+ ---
+
+ ## πŸ§ͺ Try It Out
+
+ πŸ‘‰ **Live Demo**: [https://huggingface.co/spaces/rammurmu/runash-realtime-video](https://huggingface.co/spaces/rammurmu/runash-realtime-video)

+ *Try generating short video clips in real time β€” no queue, no wait!*
+
+ ---
+
+ ## βš™οΈ Installation & Local Use
+
+ ```bash
+ git clone https://huggingface.co/spaces/rammurmu/runash-realtime-video
+ cd runash-realtime-video
+ pip install -r requirements.txt
+ python app.py
  ```
+
+ > Requires: Python 3.9+, PyTorch 2.0+, xFormers (optional), and a GPU with β‰₯8GB VRAM.
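+
+ If you prefer the Hugging Face Hub client over plain `git`, the same files can be fetched with `huggingface_hub` (equivalent to the clone step above):
+
+ ```python
+ from huggingface_hub import snapshot_download
+
+ # Download this Space's files (app.py, requirements.txt, ...) into a local folder.
+ snapshot_download(
+     repo_id="rammurmu/runash-realtime-video",
+     repo_type="space",
+     local_dir="runash-realtime-video",
+ )
+ ```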
+
+ ---
+
+ ## πŸ“œ License
+
+ This Space is a derivative of **Wan 2.1 Real-time Video Generation**.
+
+ - **Original Model License**: Apache 2.0
+ - **This Space**: Licensed under the same terms as the original. For commercial use, please refer to the original authors’ terms.
+ - **Disclaimer**: This is a demonstration/educational fork. Not affiliated with the original authors unless explicitly stated.
+
+ ---
+
+ ## πŸ™ Attribution
+
+ This project is based on:
+
+ > **Wan 2.1 Real-time Video Generation**
+ > Authors: Ram Murmu, RunAsh AI
+ > Hugging Face Link: [https://huggingface.co/spaces/wan-2.1](https://huggingface.co/spaces/wan-2.1)
+ > Paper: [Link to Paper, if available]
+
+ ---
+
+ ## πŸ’¬ Feedback & Support
+
+ Found a bug? Want a feature?
+ β†’ Open an [Issue](https://github.com/your-org/runash-video/issues)
+ β†’ Join our [Discord](https://discord.gg/your-invite-link)
+ β†’ Tweet at [@RunAsh_AI](https://twitter.com/RunAsh_AI) (RunAsh AI Labs)
+
+ ---
+
+ ## 🌟 Star This Repo
+
+ If you find this useful, please ⭐️ the original Wan 2.1 repo and this Space!
+
+ ---
+
+ ## πŸ“Œ Disclaimer
+
+ This is a **duplicate/forked space** for demonstration and community experimentation. All credit for the underlying model goes to the original Wan 2.1 authors. This space does not claim original authorship of the core model.
+
+ ---
+
+ βœ… **Updated**: April 2025
+ πŸ§‘β€πŸ’» **Maintained by**: RunAsh AI Labs
+ πŸ“¬ **Contact**: [[email protected]](mailto:[email protected])
+
+ ---