update README
README.md
## Related Links
For more technical details and discussions, please refer to:
- **Paper:** https://arxiv.org/abs/2412.05337
- **Code:** https://github.com/turingmotors/ACT-Bench
- **Blog Post:** [運転版の"Sora"を作る:動画生成の世界モデルTerraの開発背景](https://zenn.dev/turing_motors/articles/6c0ddc10aae542) (ja) / [Create a driving version of "Sora"](https://medium.com/@hide1996/create-a-driving-version-of-sora-33cf4040937a) (en)

## How to use
We have verified execution on a machine equipped with a single NVIDIA H100 80GB GPU. However, we believe the model can run on any machine with an NVIDIA GPU that has 16GB or more of VRAM.
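
As a quick sanity check, you can confirm how much VRAM your GPU has with PyTorch (a minimal sketch; it assumes `torch` is importable in your environment, e.g. after the `uv sync` step below):

```python
import torch

# Report the total VRAM of the first visible GPU. The no-refiner
# pipeline described below is expected to need roughly 16 GB or more.
props = torch.cuda.get_device_properties(0)
print(f"{props.name}: {props.total_memory / 1024**3:.1f} GiB of VRAM")
```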
Terra consists of an Image Tokenizer, an Autoregressive Transformer, and a Video Refiner. Because setting up the Video Refiner is more involved, please refer to the [ACT-Bench repository](https://github.com/turingmotors/ACT-Bench) for detailed instructions. Here, we provide an example of generating video continuations using the Image Tokenizer and the Autoregressive Transformer, conditioned on image frames and a template trajectory. The resulting video quality may look suboptimal because each frame is decoded individually; to improve the visual quality, you can use the Video Refiner.
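
To make the division of labor concrete, the no-refiner flow looks roughly like the sketch below. All class and method names here (`tokenizer`, `transformer`, `encode`, `generate`, `decode`) are illustrative assumptions rather than Terra's actual API; `inference.py` is the authoritative entry point.

```python
# Illustrative sketch only; the names below are hypothetical stand-ins
# for Terra's components, not its actual API (see inference.py).
def generate_continuation(tokenizer, transformer, frames, trajectory):
    # 1. Image Tokenizer: encode each conditioning frame into discrete tokens.
    past_tokens = [tokenizer.encode(frame) for frame in frames]

    # 2. Autoregressive Transformer: predict tokens for future frames,
    #    conditioned on the past tokens and the trajectory instruction.
    future_tokens = transformer.generate(past_tokens, trajectory)

    # 3. Decode each predicted frame independently. This per-frame
    #    decoding is why raw outputs can look suboptimal without the
    #    Video Refiner smoothing the result.
    return [tokenizer.decode(tokens) for tokens in future_tokens]
```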
### Install Packages
We use [uv](https://docs.astral.sh/uv/) to manage Python packages. If you don't have uv installed in your environment, please see its documentation.
```shell
$ git clone https://huggingface.co/turing-motors/Terra
$ uv sync
```
### Action-Conditioned Video Generation without Video Refiner
```shell
$ python inference.py
```
This command generates a video using three image frames located in  and the `curving_to_left/curving_to_left_moderate` trajectory defined in the trajectory template file .
You can find more details by referring to the `inference.py` script.
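
If you want to list the trajectories a template file defines before running inference, a generic JSON inspection like the following may help. This is a sketch under assumptions: the file path is a hypothetical placeholder, and it assumes the templates are stored as a JSON mapping from trajectory names to trajectory data, which you should verify against the file that ships with this repository.

```python
import json

# Hypothetical placeholder path; substitute the actual trajectory
# template file shipped with this repository.
with open("path/to/trajectory_templates.json") as f:
    templates = json.load(f)

# Assuming a name -> trajectory mapping, this prints names such as
# "curving_to_left/curving_to_left_moderate".
print("\n".join(templates))
```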
## Citation