Update README.md #1
by rammurmu · opened

README.md CHANGED

@@ -1,116 +1,134 @@
---
emoji: 🔥
- title: '
short_description: Real-time video generation
sdk: gradio
sdk_version: 5.34.2
---

- <p align="center">
-   <a href="https://www.xunhuang.me/">Xun Huang</a><sup>1</sup>
-   ·
-   <a href="https://zhengqili.github.io/">Zhengqi Li</a><sup>1</sup>
-   ·
-   <a href="https://guandehe.github.io/">Guande He</a><sup>2</sup>
-   ·
-   <a href="https://mingyuanzhou.github.io/">Mingyuan Zhou</a><sup>2</sup>
-   ·
-   <a href="https://research.adobe.com/person/eli-shechtman/">Eli Shechtman</a><sup>1</sup><br>
-   <sup>1</sup>Adobe Research <sup>2</sup>UT Austin
- </p>
- <h3 align="center"><a href="https://arxiv.org/abs/2506.08009">Paper</a> | <a href="https://self-forcing.github.io">Website</a> | <a href="https://huggingface.co/gdhe17/Self-Forcing/tree/main">Models (HuggingFace)</a></h3>
- </p>

---

---

- ## Requirements
- We tested this repo on the following setup:
- * NVIDIA GPU with at least 24 GB of memory (RTX 4090, A100, and H100 have been tested).
- * Linux operating system.
- * 64 GB of RAM.

- Create a conda environment and install dependencies:
- ```
- conda create -n self_forcing python=3.10 -y
- conda activate self_forcing
- pip install -r requirements.txt
- pip install flash-attn --no-build-isolation
- python setup.py develop
- ```

- ## Quick Start
- ### Download checkpoints
- ```
- huggingface-cli download Wan-AI/Wan2.1-T2V-1.3B --local-dir-use-symlinks False --local-dir wan_models/Wan2.1-T2V-1.3B
- huggingface-cli download gdhe17/Self-Forcing checkpoints/self_forcing_dmd.pt --local-dir .
- ```

- ### GUI demo
- ```
- python demo.py
- ```
- Note:
- * **Our model works better with long, detailed prompts**, since it was trained on such prompts. We will integrate prompt extension into the codebase (similar to [Wan2.1](https://github.com/Wan-Video/Wan2.1/tree/main?tab=readme-ov-file#2-using-prompt-extention)) in the future. For now, we recommend using a third-party LLM (such as GPT-4o) to extend your prompt before passing it to the model.
- * You may want to adjust the FPS so playback is smooth on your device.
- * Speed can be improved by enabling `torch.compile`, using [TAEHV-VAE](https://github.com/madebyollin/taehv/), or switching to FP8 Linear layers, although the latter two options may sacrifice quality. We recommend enabling `torch.compile` if possible and adding TAEHV-VAE if further speedup is needed (see the sketch below).

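For reference, enabling `torch.compile` is typically a one-line change on a PyTorch module; a minimal, generic sketch (the `generator` module below is a stand-in, not this repo's actual class):

```python
import torch
import torch.nn as nn

# Stand-in module; in the demo this would be the video generator itself.
generator = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

# Compile once at startup; subsequent forward passes reuse the optimized graph.
generator = torch.compile(generator, mode="reduce-overhead")
out = generator(torch.randn(1, 64))
```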
- ### CLI inference
- ```
- python inference.py \
-     --config_path configs/self_forcing_dmd.yaml \
-     --output_folder videos/self_forcing_dmd \
-     --checkpoint_path checkpoints/self_forcing_dmd.pt \
-     --data_path prompts/MovieGenVideoBench_extended.txt \
-     --use_ema
- ```
- Other config files and their corresponding checkpoints can be found in the [configs](configs) folder and in our [Hugging Face repo](https://huggingface.co/gdhe17/Self-Forcing/tree/main/checkpoints).

- ## Training
- ### Download text prompts and ODE initialized checkpoint
- ```
- huggingface-cli download gdhe17/Self-Forcing checkpoints/ode_init.pt --local-dir .
- huggingface-cli download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
- ```
- Note: Our training algorithm (except for the GAN version) is data-free (**no video data is needed**). For now, we directly provide the ODE initialization checkpoint; we will add instructions on how to perform ODE initialization (which is identical to the process described in the [CausVid](https://github.com/tianweiy/CausVid) repo) in the future.

- ```
- --rdzv_backend=c10d \
- --rdzv_endpoint $MASTER_ADDR \
- train.py \
-     --config_path configs/self_forcing_dmd.yaml \
-     --logdir logs/self_forcing_dmd \
-     --disable-wandb
- ```
- Our training run uses 600 iterations and completes in under 2 hours on 64 H100 GPUs. With gradient accumulation, it should be possible to reproduce the results in less than 16 hours on 8 H100 GPUs.

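Gradient accumulation is the standard way to trade wall-clock time for fewer GPUs; a generic PyTorch sketch of the pattern (illustrative only, not taken from this repo's train.py):

```python
import torch
import torch.nn as nn

# Toy stand-ins; in practice these are the video model, its loss, and real batches.
model = nn.Linear(32, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()
accum_steps = 8  # accumulate 8 micro-batches before each optimizer step

optimizer.zero_grad()
for step in range(64):
    x, y = torch.randn(4, 32), torch.randn(4, 1)
    loss = loss_fn(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                            # gradients accumulate in .grad
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```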
---
emoji: 🔥
+ title: 'RunAsh Wan 2.1'
short_description: Real-time video generation
sdk: gradio
sdk_version: 5.34.2
+ license: apache-2.0
+ colorFrom: yellow
+ colorTo: red
+ pinned: true
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/6799f4b5a2b48413dd18a8dd/nxPqZaXa6quMBU4ojqDzC.png
---
+ # 🎬 RunAsh Real Time Video Generation

+ > *A real-time video generation model based on Wan 2.1, optimized for low-latency inference and interactive applications.*

+ ![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-blue.svg)

---

+ ## 🧠 Model Overview

+ **RunAsh Real Time Video Generation** is a fine-tuned and optimized version of **Wan 2.1**, designed for **real-time video synthesis** from text prompts or image inputs. It uses efficient architecture modifications and inference optimizations to enable smooth, interactive video generation at the edge or directly in the browser.

---

+ ## 🚀 Features

+ - ✅ **Real-time generation** (target: <500 ms per frame on GPU)
+ - ✅ **Text-to-Video** and **Image-to-Video** modes
+ - ✅ Low VRAM usage optimizations
+ - ✅ Interactive Gradio UI for live demos
+ - ✅ Variable-length outputs (2s-8s clips)
+ - ✅ Plug-and-play with Hugging Face Spaces

+ ---

+ ## 🛠️ Technical Details

+ - **Base model**: `Wan-2.1-RealTime` (by the original authors)
+ - **Architecture**: Diffusion Transformer + Latent Consistency Modules
+ - **Resolution**: 576x320 (16:9) or 320x576 (9:16), configurable
+ - **Frame rate**: 24 FPS (adjustable)
+ - **Latency**: ~300-600 ms per generation step (RTX 3090 / A10G)
+ - **Max duration**: 8 seconds (configurable in code)

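These defaults translate naturally into a small generation config; a minimal sketch (the dataclass and field names are illustrative, not this Space's actual API):

```python
from dataclasses import dataclass

@dataclass
class GenerationConfig:
    """Illustrative defaults mirroring the specs listed above."""
    width: int = 576             # 576x320 for 16:9; use 320x576 for 9:16
    height: int = 320
    fps: int = 24                # adjustable frame rate
    max_duration_s: float = 8.0  # maximum clip length in seconds
    guidance_scale: float = 7.5

    @property
    def max_frames(self) -> int:
        return int(self.fps * self.max_duration_s)

config = GenerationConfig()
print(config.max_frames)  # 192 frames for an 8-second clip at 24 FPS
```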
+ ---

+ ## 🖼️ Example Usage

+ ### Text-to-Video

+ ```python
+ prompt = "A cyberpunk cat riding a neon scooter through Tokyo at night"
+ video = pipeline(prompt, num_frames=48, guidance_scale=7.5)
+ ```

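The snippet assumes `pipeline` is already loaded (for example by app.py). If the returned `video` is a sequence of RGB frames (an assumption; check the pipeline's actual return type), it can be written to an MP4 with imageio at the 24 FPS default from the Technical Details above:

```python
import numpy as np
import imageio  # needs the imageio-ffmpeg backend for .mp4 output

# `video` is the result of the pipeline call above; each frame is assumed to be
# convertible to an H x W x 3 uint8 array (np.asarray also handles PIL images).
frames = [np.asarray(frame, dtype=np.uint8) for frame in video]
imageio.mimsave("cyberpunk_cat.mp4", frames, fps=24)
```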
+ ### Image-to-Video

+ ```python
+ init_image = load_image("cat.png")
+ video = pipeline(init_image, motion_prompt="zoom in slowly", num_frames=24)
+ ```

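Here `load_image` and `pipeline` are assumed to come from the demo code. Input images generally need to match the working resolution listed under Technical Details; whether the pipeline resizes internally isn't stated, so a small PIL preprocessing step is a safe fallback:

```python
from PIL import Image

# Fit the source image to the 16:9 working resolution (576x320) before passing it in.
init_image = Image.open("cat.png").convert("RGB").resize((576, 320))
```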
+ ---

+ ## 🧪 Try It Out

+ 👉 **Live Demo**: [https://huggingface.co/spaces/rammurmu/runash-realtime-video](https://huggingface.co/spaces/rammurmu/runash-realtime-video)

+ *Try generating short video clips in real time, with no queue and no waiting!*

+ ---

+ ## ⚙️ Installation & Local Use

+ ```bash
+ git clone https://huggingface.co/spaces/rammurmu/runash-realtime-video
+ cd runash-realtime-video
+ pip install -r requirements.txt
+ python app.py
+ ```

+ > Requires: Python 3.9+, PyTorch 2.0+, xFormers (optional), and a GPU with ≥8 GB VRAM.
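Before launching the app, a quick check along these lines confirms the GPU and VRAM requirements are met (plain PyTorch, nothing specific to this Space):

```python
import torch

# Verify a CUDA GPU is visible and report its memory.
assert torch.cuda.is_available(), "A CUDA-capable GPU is required."
props = torch.cuda.get_device_properties(0)
vram_gb = props.total_memory / 1024**3
print(f"GPU: {props.name}, VRAM: {vram_gb:.1f} GB")
if vram_gb < 8:
    print("Warning: less than 8 GB of VRAM; generation may fail or be very slow.")
```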

+ ---

+ ## 📄 License

+ This Space is a derivative of **Wan 2.1 Real-time Video Generation**.

+ - **Original model license**: Apache 2.0
+ - **This Space**: Licensed under the same terms as the original. For commercial use, please refer to the original authors' terms.
+ - **Disclaimer**: This is a demonstration/educational fork and is not affiliated with the original authors unless explicitly stated.

+ ---

+ ## 🙏 Attribution

+ This project is based on:

+ > **Wan 2.1 Real-time Video Generation**
+ > Authors: Ram Murmu, RunAsh AI
+ > Hugging Face link: [https://huggingface.co/spaces/wan-2.1](https://huggingface.co/spaces/wan-2.1)
+ > Paper: [Link to Paper, if available]

+ ---

+ ## 💬 Feedback & Support

+ Found a bug? Want a feature?
+ → Open an [Issue](https://github.com/your-org/runash-video/issues)
+ → Join our [Discord](https://discord.gg/your-invite-link)
+ → Tweet at [@RunAsh_AI](https://twitter.com/RunAsh_AI)

+ ---

+ ## 🌟 Star This Repo

+ If you find this useful, please ⭐ the original Wan 2.1 repo and this Space!

+ ---

+ ## 📌 Disclaimer

+ This is a **duplicate/forked Space** for demonstration and community experimentation. All credit for the underlying model goes to the original Wan 2.1 authors. This Space does not claim original authorship of the core model.

+ ---

+ ✅ **Updated**: April 2025
+ 🧑‍💻 **Maintained by**: RunAsh AI Labs
+ 📬 **Contact**: [[email protected]](mailto:[email protected])

+ ---