update README
README.md
## Related Links
For more technical details and discussions, please refer to:
- **Paper:** https://arxiv.org/abs/2412.05337
- **Code:** https://github.com/turingmotors/ACT-Bench
- **Blog Post:** [運転版の"Sora"を作る:動画生成の世界モデルTerraの開発背景](https://zenn.dev/turing_motors/articles/6c0ddc10aae542) (ja) / [Create a driving version of "Sora"](https://medium.com/@hide1996/create-a-driving-version-of-sora-33cf4040937a) (en)

## How to use
We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU. However, we believe the model can run on any machine with an NVIDIA GPU that has 16GB or more of VRAM.
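
As a quick sanity check, you can confirm how much VRAM your GPU has with PyTorch (a minimal sketch; it assumes `torch` is importable in your environment, e.g. after the `uv sync` step below):

```python
import torch

# Report the total VRAM of the first visible GPU. The no-refiner
# pipeline described below is expected to need roughly 16 GB or more.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB of VRAM")
```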
Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is more involved, please refer to the [ACT-Bench repository](https://github.com/turingmotors/ACT-Bench) for detailed instructions. Here, we provide an example of generating video continuations using the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may look suboptimal because each frame is decoded individually; to improve the visual quality, you can use the Video Refiner.
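
To make the division of labor concrete, the no-refiner flow looks roughly like the sketch below. All class and method names here (`tokenizer`, `transformer`, `encode`, `generate`, `decode`) are illustrative assumptions rather than Terra's actual API; `inference.py` is the authoritative entry point.

```python
# Illustrative sketch only; the names below are hypothetical stand-ins
# for Terra's components, not its actual API (see inference.py).
def generate_continuation(tokenizer, transformer, frames, trajectory):
    # 1. Image Tokenizer: encode each conditioning frame into discrete tokens.
    past_tokens = [tokenizer.encode(frame) for frame in frames]

    # 2. Autoregressive Transformer: predict tokens for future frames,
    #    conditioned on the past tokens and the trajectory instruction.
    future_tokens = transformer.generate(past_tokens, trajectory)

    # 3. Decode each predicted frame independently. This per-frame
    #    decoding is why raw outputs can look suboptimal without the
    #    Video Refiner smoothing the result.
    return [tokenizer.decode(tokens) for tokens in future_tokens]
```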
### Install Packages
We use [uv](https://docs.astral.sh/uv/) to manage Python packages. If you don't have uv installed in your environment, please see its documentation.
```shell
$ git clone https://huggingface.co/turing-motors/Terra
$ uv sync
```
### Action-Conditioned Video Generation without Video Refiner
```shell
$ python inference.py
```
This command generates a video using three image frames located in  and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file .
You can find more details by referring to the `inference.py` script.
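
If you want to list the trajectories a template file defines before running inference, a generic JSON inspection like the following may help. This is a sketch under assumptions: the file path is a hypothetical placeholder, and it assumes the templates are stored as a JSON mapping from trajectory names to trajectory data, which you should verify against the file that ships with this repository.

```python
import json

# Hypothetical placeholder path; substitute the actual trajectory
# template file shipped with this repository.
with open("path/to/trajectory_templates.json") as f:
    templates = json.load(f)

# Assuming a name -> trajectory mapping, this prints names such as
# "curving_to_left/curving_to_left_moderate".
print("\n".join(templates))
```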
## Citation