---
license: mit
language:
- en
library_name: transformers
tags:
- video-generation
- robotics
- embodied-ai
- physical-reasoning
- causal-reasoning
- inverse-dynamics
- wow
- arxiv:2509.22642
datasets:
- WoW-world-model/WoW-1-Benchmark-Samples
pipeline_tag: video-generation
base_model: wan
---
# 🤗 WoW-1-Wan-14B-2M
WoW-1-Wan-14B is a 14-billion-parameter generative world model trained on 2 million real-world robot interaction trajectories. It is designed to imagine, reason, and act in physically consistent environments, powered by SOPHIA-guided refinement and a co-trained Inverse Dynamics Model.
This model is part of the **WoW (World-Omniscient World Model)** project, introduced in the paper:

*WoW: Towards a World omniscient World model Through Embodied Interaction*
Chi et al., 2025 · [arXiv:2509.22642](https://arxiv.org/abs/2509.22642)
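For a quick start, here is a minimal loading sketch. It assumes the checkpoint is published in a diffusers-compatible layout under the repo id `WoW-world-model/WoW-1-Wan-14B-2M`; the pipeline class, repo id, and generation arguments below are assumptions, not a confirmed API.

```python
# Minimal loading sketch; repo id and call signature are assumptions.
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video

pipe = DiffusionPipeline.from_pretrained(
    "WoW-world-model/WoW-1-Wan-14B-2M",  # assumed repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Generate a short rollout from a language instruction.
frames = pipe(
    prompt="The Franka robot, grasp the red bottle on the table",
    num_frames=81,  # assumed Wan-style default
).frames[0]
export_to_video(frames, "rollout.mp4", fps=16)
```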
## 🧠 Key Features
- 14B parameters trained on 2M robot interaction samples
- Learns causal physical reasoning from embodied action
- Generates physically consistent video and robotic action plans
- Uses SOPHIA, a vision-language critic, to refine outputs
- Paired with an Inverse Dynamics Model to close the imagination-to-action loop (a minimal sketch follows this list)
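To make the last point concrete, here is a toy PyTorch sketch of an inverse dynamics head: given features of two consecutive frames, it regresses the action that connects them. The architecture, feature dimension, and 7-DoF action space are illustrative assumptions, not the released model.

```python
import torch
import torch.nn as nn

class InverseDynamicsHead(nn.Module):
    """Toy inverse dynamics model: (frame_t, frame_t+1) features -> action."""

    def __init__(self, feat_dim: int = 1024, action_dim: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(2 * feat_dim, 512),
            nn.GELU(),
            nn.Linear(512, action_dim),  # e.g. a 7-DoF end-effector command
        )

    def forward(self, feat_t: torch.Tensor, feat_t1: torch.Tensor) -> torch.Tensor:
        # Concatenate both frame features and regress the connecting action.
        return self.mlp(torch.cat([feat_t, feat_t1], dim=-1))

idm = InverseDynamicsHead()
action = idm(torch.randn(1, 1024), torch.randn(1, 1024))  # shape: (1, 7)
```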
## 🧪 Training Data
- 2M real-world robot interaction trajectories
- Multimodal scenes covering vision, action, and language
- A diverse mixture of captions for better generalization
## 🧠 Mixture Caption Strategy
Training captions are mixed along three axes: prompt length, robot identity, and action granularity (a toy sampler sketch follows this list).

**Prompt Lengths:**
- Short: "The Franka robot, grasp the red bottle on the table"
- Long: "The scene... open the drawer, take the screwdriver, place it on the table..."
**Robot Model Mixing:**
- Captions reference various robot types
- Example: "grasp with the Franka Panda arm", "use end-effector to align"
**Action Granularity:**
- Coarse: "move to object"
- Fine: "rotate wrist 30° before grasping"
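Here is a toy sketch of how such a caption mixture might be sampled during training; the templates, robot names, and mixing probabilities are all illustrative assumptions.

```python
import random

SHORT = ["The {robot}, grasp the {obj} on the table."]
LONG = ["The scene shows a cluttered tabletop. The {robot} should open the "
        "drawer, take the {obj}, and place it on the table."]
ROBOTS = ["Franka Panda arm", "UR5 arm", "xArm"]  # hypothetical robot mix
COARSE = ["Move to the {obj}."]
FINE = ["Rotate the wrist 30 degrees before grasping the {obj}."]

def sample_caption(obj: str) -> str:
    # Mix prompt length, robot identity, and action granularity independently.
    template = random.choice(SHORT if random.random() < 0.5 else LONG)
    detail = random.choice(COARSE if random.random() < 0.5 else FINE)
    caption = template.format(robot=random.choice(ROBOTS), obj=obj)
    return caption + " " + detail.format(obj=obj)

print(sample_caption("screwdriver"))
```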
## 🔁 Continuous Updates
The accompanying dataset will be continuously updated with:
- More trajectories
- Richer language
- Finer multimodal annotations
## 🧩 Applications
- Zero-shot video generation in robotics
- Causal reasoning and physics simulation
- Long-horizon manipulation planning
- Forward and inverse control prediction (see the loop sketch below)
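As a conceptual illustration of the last item, the sketch below chains the two directions: the world model imagines future frames from an instruction (forward), and an inverse dynamics model recovers the actions between consecutive imagined frames (inverse). `world_model.imagine` and `idm.predict` are hypothetical stand-ins, not a released API.

```python
def imagine_then_act(world_model, idm, first_frame, instruction, horizon=16):
    # Forward prediction: roll out imagined frames conditioned on language.
    frames = world_model.imagine(first_frame, instruction, num_frames=horizon)
    # Inverse prediction: recover actions linking consecutive imagined frames.
    return [idm.predict(frames[t], frames[t + 1]) for t in range(horizon - 1)]
```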
## 📖 Citation
```bibtex
@article{chi2025wow,
  title={WoW: Towards a World omniscient World model Through Embodied Interaction},
  author={Chi, Xiaowei and Jia, Peidong and Fan, Chun-Kai and Ju, Xiaozhu and Mi, Weishi and Qin, Zhiyuan and Zhang, Kevin and Tian, Wanxin and Ge, Kuangzhi and Li, Hao and others},
  journal={arXiv preprint arXiv:2509.22642},
  year={2025}
}
```
## 🔗 Resources
- 🌐 Project page: [wow-world-model.github.io](https://wow-world-model.github.io)
- 💻 GitHub repo: [wow-world-model/wow-world-model](https://github.com/wow-world-model/wow-world-model)
- 📊 Dataset: [WoW-1 Benchmark Samples](https://huggingface.co/datasets/WoW-world-model/WoW-1-Benchmark-Samples)