---
license: mit
language:
  - en
library_name: transformers
tags:
  - video-generation
  - robotics
  - embodied-ai
  - physical-reasoning
  - causal-reasoning
  - inverse-dynamics
  - wow
  - arxiv:2509.22642
datasets:
  - WoW-world-model/WoW-1-Benchmark-Samples
pipeline_tag: video-generation
base_model: wan
---

# πŸ€– WoW-1-Wan-14B-2M

**WoW-1-Wan-14B-2M** is a 14-billion-parameter generative world model trained on 2 million real-world robot interaction trajectories. It is designed to imagine, reason, and act in physically consistent environments, powered by SOPHIA-guided refinement and a co-trained Inverse Dynamics Model.

This model is part of the **WoW (World-Omniscient World Model)** project, introduced in the paper:

> **WoW: Towards a World omniscient World model Through Embodied Interaction**
> Chi et al., 2025 – arXiv:2509.22642
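
Since the base model is Wan, inference will likely follow the standard diffusers text-to-video flow for Wan checkpoints. The snippet below is a minimal sketch under that assumption; the repo id, resolution, frame count, and sampler settings are guesses, not values confirmed by this card:

```python
# Hypothetical inference sketch -- verify the exact entry point against the
# official WoW release before use.
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

model_id = "WoW-world-model/WoW-1-Wan-14B-2M"  # assumed repo id

# Wan's VAE is typically loaded in float32 for numerical stability.
vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="The Franka robot grasps the red bottle on the table",
    height=480,
    width=832,
    num_frames=81,
    guidance_scale=5.0,
).frames[0]

export_to_video(frames, "rollout.mp4", fps=15)
```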

## 🧠 Key Features

- 14B parameters trained on 2M robot interaction samples
- Learns causal physical reasoning from embodied action
- Generates physically consistent video and robotic action plans
- Uses SOPHIA, a vision-language critic, to refine outputs
- Paired with an Inverse Dynamics Model to complete the imagination-to-action loop (see the sketch after this list)
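
To make the last point concrete, here is a toy sketch of an imagination-to-action loop: the world model imagines a frame sequence, and an inverse dynamics model recovers the action between each pair of consecutive frames. The architecture, action dimension, and frame shapes are assumptions for illustration, not the actual WoW IDM:

```python
# Illustrative only: a toy inverse dynamics model that maps pairs of
# consecutive imagined frames to actions. Not the WoW architecture.
import torch
import torch.nn as nn

class ToyInverseDynamics(nn.Module):
    def __init__(self, action_dim: int = 7):  # assumed 7-DoF arm command
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, action_dim)

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        # Stack the two RGB frames along the channel axis: (B, 6, H, W).
        x = torch.cat([frame_t, frame_t1], dim=1)
        return self.head(self.encoder(x))

idm = ToyInverseDynamics()
video = torch.rand(16, 3, 128, 128)   # 16 imagined RGB frames
actions = idm(video[:-1], video[1:])  # (15, 7): one action per frame transition
print(actions.shape)
```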

## πŸ§ͺ Training Data

- 2M real-world robot interaction trajectories
- Multimodal scenes spanning vision, action, and language
- Diverse mixture captions for better generalization (see the strategy below)

## 🧠 Mixture Caption Strategy

Training captions are mixed along three axes; a sampling sketch follows this list.

- **Prompt length**
  - Short: "The Franka robot, grasp the red bottle on the table"
  - Long: "The scene... open the drawer, take the screwdriver, place it on the table..."
- **Robot model mixing**
  - Captions reference various robot types
  - Example: "grasp with the Franka Panda arm", "use end-effector to align"
- **Action granularity**
  - Coarse: "move to object"
  - Fine: "rotate wrist 30Β° before grasping"
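
A minimal sketch of how such caption mixing could be implemented at data-loading time. The robot names, templates, and sampling probabilities below are illustrative assumptions, not the actual WoW captioning pipeline:

```python
# Illustrative sketch of mixture-caption sampling. Robot names, templates,
# and probabilities are hypothetical, not the WoW data pipeline.
import random

ROBOTS = ["Franka Panda arm", "UR5 end-effector", "generic gripper"]  # assumed set
ACTIONS = {
    "coarse": "move to the object and grasp it",
    "fine": "rotate the wrist 30 degrees, then close the gripper on the object",
}

def sample_caption(obj: str = "red bottle") -> str:
    robot = random.choice(ROBOTS)
    action = ACTIONS[random.choice(["coarse", "fine"])]
    if random.random() < 0.5:  # short prompt
        return f"The {robot}, grasp the {obj} on the table"
    # long prompt: scene description plus a coarse- or fine-grained instruction
    return (f"The scene shows a cluttered tabletop. Using the {robot}, "
            f"{action}, then place the {obj} on the table.")

print(sample_caption())
```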

## πŸ”„ Continuous Updates

The training dataset will be continuously updated with:

- More trajectories
- Richer language descriptions
- Finer-grained multimodal annotations

## 🧩 Applications

- Zero-shot video generation in robotics
- Causal reasoning and physics simulation
- Long-horizon manipulation planning
- Forward and inverse control prediction

## πŸ“„ Citation

```bibtex
@article{chi2025wow,
  title={WoW: Towards a World omniscient World model Through Embodied Interaction},
  author={Chi, Xiaowei and Jia, Peidong and Fan, Chun-Kai and Ju, Xiaozhu and Mi, Weishi and Qin, Zhiyuan and Zhang, Kevin and Tian, Wanxin and Ge, Kuangzhi and Li, Hao and others},
  journal={arXiv preprint arXiv:2509.22642},
  year={2025}
}
```

## πŸ”— Resources