ezeanubis committed
Commit a7aea10 · verified · 1 parent: 530733d

Upload folder using huggingface_hub
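For context, a minimal sketch of how a commit like this is typically produced with the huggingface_hub Python API. The repo id and folder path below are placeholders, not values taken from this commit; the target is assumed to be a Space, judging from app.py and the update_space.yml workflow.

    from huggingface_hub import HfApi

    api = HfApi()  # uses the token from `huggingface-cli login` or the HF_TOKEN env var
    api.upload_folder(
        folder_path=".",                     # local directory to upload
        repo_id="your-username/your-space",  # placeholder target repo
        repo_type="space",                   # assumption: the files suggest a Gradio Space
        commit_message="Upload folder using huggingface_hub",
    )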

This view is limited to 50 files because it contains too many changes. See raw diff.
Files changed (50)
  1. .gitattributes +1019 -0
  2. .github/workflows/update_space.yml +28 -0
  3. .gitignore +3 -0
  4. .gradio/certificate.pem +31 -0
  5. LICENSE +81 -0
  6. NOTICE +104 -0
  7. README.md +397 -7
  8. README_zh.md +395 -0
  9. app.py +15 -0
  10. assets/HYWorld_Voyager.pdf +3 -0
  11. assets/backbone.jpg +3 -0
  12. assets/data_engine.jpg +3 -0
  13. assets/demo/camera/input1.png +3 -0
  14. assets/demo/camera/input2.png +3 -0
  15. assets/demo/camera/input3.png +3 -0
  16. assets/gradio.png +3 -0
  17. assets/qrcode/discord.png +0 -0
  18. assets/qrcode/wechat.png +0 -0
  19. assets/qrcode/x.png +0 -0
  20. assets/qrcode/xiaohongshu.png +0 -0
  21. assets/teaser.png +3 -0
  22. assets/teaser_zh.png +3 -0
  23. ckpts/README.md +57 -0
  24. data_engine/README.md +62 -0
  25. data_engine/convert_point.py +72 -0
  26. data_engine/create_input.py +391 -0
  27. data_engine/depth_align.py +418 -0
  28. data_engine/metric3d_infer.py +115 -0
  29. data_engine/moge_infer.py +73 -0
  30. data_engine/requirements.txt +16 -0
  31. data_engine/run.sh +27 -0
  32. data_engine/vggt_infer.py +242 -0
  33. examples/case1/condition.mp4 +3 -0
  34. examples/case1/depth_range.json +1 -0
  35. examples/case1/prompt.txt +1 -0
  36. examples/case1/ref_depth.exr +3 -0
  37. examples/case1/ref_image.png +3 -0
  38. examples/case1/video_input/depth_0000.exr +3 -0
  39. examples/case1/video_input/depth_0001.exr +3 -0
  40. examples/case1/video_input/depth_0002.exr +3 -0
  41. examples/case1/video_input/depth_0003.exr +3 -0
  42. examples/case1/video_input/depth_0004.exr +3 -0
  43. examples/case1/video_input/depth_0005.exr +3 -0
  44. examples/case1/video_input/depth_0006.exr +3 -0
  45. examples/case1/video_input/depth_0007.exr +3 -0
  46. examples/case1/video_input/depth_0008.exr +3 -0
  47. examples/case1/video_input/depth_0009.exr +3 -0
  48. examples/case1/video_input/depth_0010.exr +3 -0
  49. examples/case1/video_input/depth_0011.exr +3 -0
  50. examples/case1/video_input/depth_0012.exr +3 -0
.gitattributes CHANGED
@@ -33,3 +33,1022 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ assets/HYWorld_Voyager.pdf filter=lfs diff=lfs merge=lfs -text
37
+ assets/backbone.jpg filter=lfs diff=lfs merge=lfs -text
38
+ assets/data_engine.jpg filter=lfs diff=lfs merge=lfs -text
39
+ assets/demo/camera/input1.png filter=lfs diff=lfs merge=lfs -text
40
+ assets/demo/camera/input2.png filter=lfs diff=lfs merge=lfs -text
41
+ assets/demo/camera/input3.png filter=lfs diff=lfs merge=lfs -text
42
+ assets/gradio.png filter=lfs diff=lfs merge=lfs -text
43
+ assets/teaser.png filter=lfs diff=lfs merge=lfs -text
44
+ assets/teaser_zh.png filter=lfs diff=lfs merge=lfs -text
45
+ examples/case1/condition.mp4 filter=lfs diff=lfs merge=lfs -text
46
+ examples/case1/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
47
+ examples/case1/ref_image.png filter=lfs diff=lfs merge=lfs -text
48
+ examples/case1/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
49
+ examples/case1/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
50
+ examples/case1/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
51
+ examples/case1/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
52
+ examples/case1/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
53
+ examples/case1/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
54
+ examples/case1/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
55
+ examples/case1/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
56
+ examples/case1/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
57
+ examples/case1/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
58
+ examples/case1/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
59
+ examples/case1/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
60
+ examples/case1/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
61
+ examples/case1/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
62
+ examples/case1/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
63
+ examples/case1/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
64
+ examples/case1/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
65
+ examples/case1/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
66
+ examples/case1/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
67
+ examples/case1/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
68
+ examples/case1/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
69
+ examples/case1/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
70
+ examples/case1/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
71
+ examples/case1/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
72
+ examples/case1/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
73
+ examples/case1/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
74
+ examples/case1/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
75
+ examples/case1/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
76
+ examples/case1/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
77
+ examples/case1/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
78
+ examples/case1/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
79
+ examples/case1/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
80
+ examples/case1/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
81
+ examples/case1/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
82
+ examples/case1/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
83
+ examples/case1/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
84
+ examples/case1/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
85
+ examples/case1/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
86
+ examples/case1/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
87
+ examples/case1/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
88
+ examples/case1/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
89
+ examples/case1/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
90
+ examples/case1/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
91
+ examples/case1/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
92
+ examples/case1/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
93
+ examples/case1/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
94
+ examples/case1/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
95
+ examples/case1/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
96
+ examples/case1/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
97
+ examples/case1/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
98
+ examples/case1/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
99
+ examples/case1/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
100
+ examples/case1/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
101
+ examples/case1/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
102
+ examples/case1/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
103
+ examples/case1/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
104
+ examples/case1/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
105
+ examples/case1/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
106
+ examples/case1/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
107
+ examples/case1/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
108
+ examples/case1/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
109
+ examples/case1/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
110
+ examples/case1/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
111
+ examples/case1/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
112
+ examples/case1/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
113
+ examples/case1/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
114
+ examples/case1/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
115
+ examples/case1/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
116
+ examples/case1/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
117
+ examples/case1/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
118
+ examples/case1/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
119
+ examples/case1/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
120
+ examples/case1/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
121
+ examples/case1/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
122
+ examples/case1/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
123
+ examples/case1/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
124
+ examples/case1/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
125
+ examples/case1/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
126
+ examples/case1/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
127
+ examples/case1/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
128
+ examples/case1/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
129
+ examples/case1/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
130
+ examples/case1/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
131
+ examples/case1/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
132
+ examples/case1/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
133
+ examples/case1/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
134
+ examples/case1/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
135
+ examples/case1/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
136
+ examples/case1/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
137
+ examples/case1/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
138
+ examples/case1/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
139
+ examples/case1/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
140
+ examples/case1/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
141
+ examples/case1/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
142
+ examples/case1/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
143
+ examples/case1/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
144
+ examples/case1/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
145
+ examples/case1/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
146
+ examples/case10/condition.mp4 filter=lfs diff=lfs merge=lfs -text
147
+ examples/case10/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
148
+ examples/case10/ref_image.png filter=lfs diff=lfs merge=lfs -text
149
+ examples/case10/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
150
+ examples/case10/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
151
+ examples/case10/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
152
+ examples/case10/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
153
+ examples/case10/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
154
+ examples/case10/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
155
+ examples/case10/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
156
+ examples/case10/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
157
+ examples/case10/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
158
+ examples/case10/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
159
+ examples/case10/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
160
+ examples/case10/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
161
+ examples/case10/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
162
+ examples/case10/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
163
+ examples/case10/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
164
+ examples/case10/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
165
+ examples/case10/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
166
+ examples/case10/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
167
+ examples/case10/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
168
+ examples/case10/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
169
+ examples/case10/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
170
+ examples/case10/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
171
+ examples/case10/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
172
+ examples/case10/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
173
+ examples/case10/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
174
+ examples/case10/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
175
+ examples/case10/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
176
+ examples/case10/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
177
+ examples/case10/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
178
+ examples/case10/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
179
+ examples/case10/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
180
+ examples/case10/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
181
+ examples/case10/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
182
+ examples/case10/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
183
+ examples/case10/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
184
+ examples/case10/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
185
+ examples/case10/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
186
+ examples/case10/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
187
+ examples/case10/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
188
+ examples/case10/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
189
+ examples/case10/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
190
+ examples/case10/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
191
+ examples/case10/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
192
+ examples/case10/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
193
+ examples/case10/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
194
+ examples/case10/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
195
+ examples/case10/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
196
+ examples/case10/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
197
+ examples/case10/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
198
+ examples/case10/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
199
+ examples/case10/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
200
+ examples/case10/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
201
+ examples/case10/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
202
+ examples/case10/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
203
+ examples/case10/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
204
+ examples/case10/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
205
+ examples/case10/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
206
+ examples/case10/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
207
+ examples/case10/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
208
+ examples/case10/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
209
+ examples/case10/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
210
+ examples/case10/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
211
+ examples/case10/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
212
+ examples/case10/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
213
+ examples/case10/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
214
+ examples/case10/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
215
+ examples/case10/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
216
+ examples/case10/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
217
+ examples/case10/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
218
+ examples/case10/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
219
+ examples/case10/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
220
+ examples/case10/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
221
+ examples/case10/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
222
+ examples/case10/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
223
+ examples/case10/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
224
+ examples/case10/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
225
+ examples/case10/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
226
+ examples/case10/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
227
+ examples/case10/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
228
+ examples/case10/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
229
+ examples/case10/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
230
+ examples/case10/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
231
+ examples/case10/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
232
+ examples/case10/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
233
+ examples/case10/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
234
+ examples/case10/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
235
+ examples/case10/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
236
+ examples/case10/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
237
+ examples/case10/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
238
+ examples/case10/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
239
+ examples/case10/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
240
+ examples/case10/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
241
+ examples/case10/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
242
+ examples/case10/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
243
+ examples/case10/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
244
+ examples/case10/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
245
+ examples/case10/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
246
+ examples/case10/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
247
+ examples/case2/condition.mp4 filter=lfs diff=lfs merge=lfs -text
248
+ examples/case2/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
249
+ examples/case2/ref_image.png filter=lfs diff=lfs merge=lfs -text
250
+ examples/case2/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
251
+ examples/case2/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
252
+ examples/case2/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
253
+ examples/case2/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
254
+ examples/case2/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
255
+ examples/case2/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
256
+ examples/case2/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
257
+ examples/case2/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
258
+ examples/case2/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
259
+ examples/case2/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
260
+ examples/case2/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
261
+ examples/case2/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
262
+ examples/case2/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
263
+ examples/case2/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
264
+ examples/case2/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
265
+ examples/case2/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
266
+ examples/case2/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
267
+ examples/case2/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
268
+ examples/case2/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
269
+ examples/case2/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
270
+ examples/case2/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
271
+ examples/case2/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
272
+ examples/case2/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
273
+ examples/case2/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
274
+ examples/case2/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
275
+ examples/case2/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
276
+ examples/case2/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
277
+ examples/case2/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
278
+ examples/case2/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
279
+ examples/case2/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
280
+ examples/case2/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
281
+ examples/case2/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
282
+ examples/case2/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
283
+ examples/case2/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
284
+ examples/case2/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
285
+ examples/case2/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
286
+ examples/case2/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
287
+ examples/case2/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
288
+ examples/case2/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
289
+ examples/case2/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
290
+ examples/case2/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
291
+ examples/case2/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
292
+ examples/case2/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
293
+ examples/case2/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
294
+ examples/case2/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
295
+ examples/case2/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
296
+ examples/case2/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
297
+ examples/case2/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
298
+ examples/case2/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
299
+ examples/case2/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
300
+ examples/case2/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
301
+ examples/case2/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
302
+ examples/case2/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
303
+ examples/case2/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
304
+ examples/case2/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
305
+ examples/case2/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
306
+ examples/case2/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
307
+ examples/case2/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
308
+ examples/case2/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
309
+ examples/case2/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
310
+ examples/case2/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
311
+ examples/case2/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
312
+ examples/case2/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
313
+ examples/case2/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
314
+ examples/case2/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
315
+ examples/case2/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
316
+ examples/case2/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
317
+ examples/case2/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
318
+ examples/case2/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
319
+ examples/case2/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
320
+ examples/case2/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
321
+ examples/case2/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
322
+ examples/case2/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
323
+ examples/case2/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
324
+ examples/case2/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
325
+ examples/case2/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
326
+ examples/case2/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
327
+ examples/case2/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
328
+ examples/case2/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
329
+ examples/case2/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
330
+ examples/case2/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
331
+ examples/case2/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
332
+ examples/case2/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
333
+ examples/case2/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
334
+ examples/case2/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
335
+ examples/case2/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
336
+ examples/case2/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
337
+ examples/case2/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
338
+ examples/case2/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
339
+ examples/case2/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
340
+ examples/case2/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
341
+ examples/case2/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
342
+ examples/case2/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
343
+ examples/case2/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
344
+ examples/case2/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
345
+ examples/case2/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
346
+ examples/case2/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
347
+ examples/case2/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
348
+ examples/case3/condition.mp4 filter=lfs diff=lfs merge=lfs -text
349
+ examples/case3/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
350
+ examples/case3/ref_image.png filter=lfs diff=lfs merge=lfs -text
351
+ examples/case3/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
352
+ examples/case3/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
353
+ examples/case3/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
354
+ examples/case3/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
355
+ examples/case3/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
356
+ examples/case3/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
357
+ examples/case3/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
358
+ examples/case3/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
359
+ examples/case3/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
360
+ examples/case3/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
361
+ examples/case3/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
362
+ examples/case3/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
363
+ examples/case3/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
364
+ examples/case3/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
365
+ examples/case3/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
366
+ examples/case3/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
367
+ examples/case3/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
368
+ examples/case3/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
369
+ examples/case3/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
370
+ examples/case3/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
371
+ examples/case3/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
372
+ examples/case3/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
373
+ examples/case3/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
374
+ examples/case3/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
375
+ examples/case3/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
376
+ examples/case3/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
377
+ examples/case3/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
378
+ examples/case3/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
379
+ examples/case3/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
380
+ examples/case3/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
381
+ examples/case3/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
382
+ examples/case3/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
383
+ examples/case3/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
384
+ examples/case3/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
385
+ examples/case3/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
386
+ examples/case3/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
387
+ examples/case3/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
388
+ examples/case3/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
389
+ examples/case3/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
390
+ examples/case3/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
391
+ examples/case3/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
392
+ examples/case3/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
393
+ examples/case3/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
394
+ examples/case3/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
395
+ examples/case3/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
396
+ examples/case3/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
397
+ examples/case3/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
398
+ examples/case3/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
399
+ examples/case3/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
400
+ examples/case3/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
401
+ examples/case3/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
402
+ examples/case3/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
403
+ examples/case3/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
404
+ examples/case3/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
405
+ examples/case3/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
406
+ examples/case3/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
407
+ examples/case3/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
408
+ examples/case3/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
409
+ examples/case3/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
410
+ examples/case3/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
411
+ examples/case3/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
412
+ examples/case3/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
413
+ examples/case3/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
414
+ examples/case3/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
415
+ examples/case3/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
416
+ examples/case3/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
417
+ examples/case3/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
418
+ examples/case3/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
419
+ examples/case3/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
420
+ examples/case3/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
421
+ examples/case3/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
422
+ examples/case3/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
423
+ examples/case3/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
424
+ examples/case3/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
425
+ examples/case3/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
426
+ examples/case3/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
427
+ examples/case3/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
428
+ examples/case3/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
429
+ examples/case3/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
430
+ examples/case3/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
431
+ examples/case3/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
432
+ examples/case3/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
433
+ examples/case3/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
434
+ examples/case3/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
435
+ examples/case3/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
436
+ examples/case3/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
437
+ examples/case3/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
438
+ examples/case3/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
439
+ examples/case3/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
440
+ examples/case3/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
441
+ examples/case3/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
442
+ examples/case3/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
443
+ examples/case3/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
444
+ examples/case3/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
445
+ examples/case3/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
446
+ examples/case3/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
447
+ examples/case3/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
448
+ examples/case3/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
449
+ examples/case4/condition.mp4 filter=lfs diff=lfs merge=lfs -text
450
+ examples/case4/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
451
+ examples/case4/ref_image.png filter=lfs diff=lfs merge=lfs -text
452
+ examples/case4/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
453
+ examples/case4/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
454
+ examples/case4/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
455
+ examples/case4/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
456
+ examples/case4/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
457
+ examples/case4/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
458
+ examples/case4/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
459
+ examples/case4/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
460
+ examples/case4/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
461
+ examples/case4/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
462
+ examples/case4/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
463
+ examples/case4/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
464
+ examples/case4/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
465
+ examples/case4/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
466
+ examples/case4/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
467
+ examples/case4/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
468
+ examples/case4/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
469
+ examples/case4/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
470
+ examples/case4/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
471
+ examples/case4/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
472
+ examples/case4/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
473
+ examples/case4/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
474
+ examples/case4/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
475
+ examples/case4/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
476
+ examples/case4/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
477
+ examples/case4/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
478
+ examples/case4/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
479
+ examples/case4/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
480
+ examples/case4/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
481
+ examples/case4/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
482
+ examples/case4/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
483
+ examples/case4/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
484
+ examples/case4/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
485
+ examples/case4/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
486
+ examples/case4/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
487
+ examples/case4/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
488
+ examples/case4/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
489
+ examples/case4/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
490
+ examples/case4/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
491
+ examples/case4/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
492
+ examples/case4/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
493
+ examples/case4/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
494
+ examples/case4/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
495
+ examples/case4/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
496
+ examples/case4/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
497
+ examples/case4/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
498
+ examples/case4/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
499
+ examples/case4/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
500
+ examples/case4/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
501
+ examples/case4/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
502
+ examples/case4/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
503
+ examples/case4/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
504
+ examples/case4/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
505
+ examples/case4/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
506
+ examples/case4/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
507
+ examples/case4/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
508
+ examples/case4/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
509
+ examples/case4/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
510
+ examples/case4/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
511
+ examples/case4/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
512
+ examples/case4/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
513
+ examples/case4/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
514
+ examples/case4/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
515
+ examples/case4/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
516
+ examples/case4/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
517
+ examples/case4/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
518
+ examples/case4/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
519
+ examples/case4/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
520
+ examples/case4/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
521
+ examples/case4/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
522
+ examples/case4/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
523
+ examples/case4/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
524
+ examples/case4/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
525
+ examples/case4/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
526
+ examples/case4/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
527
+ examples/case4/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
528
+ examples/case4/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
529
+ examples/case4/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
530
+ examples/case4/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
531
+ examples/case4/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
532
+ examples/case4/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
533
+ examples/case4/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
534
+ examples/case4/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
535
+ examples/case4/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
536
+ examples/case4/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
537
+ examples/case4/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
538
+ examples/case4/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
539
+ examples/case4/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
540
+ examples/case4/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
541
+ examples/case4/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
542
+ examples/case4/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
543
+ examples/case4/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
544
+ examples/case4/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
545
+ examples/case4/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
546
+ examples/case4/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
547
+ examples/case4/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
548
+ examples/case4/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
549
+ examples/case4/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
550
+ examples/case5/condition.mp4 filter=lfs diff=lfs merge=lfs -text
551
+ examples/case5/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
552
+ examples/case5/ref_image.png filter=lfs diff=lfs merge=lfs -text
553
+ examples/case5/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
554
+ examples/case5/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
555
+ examples/case5/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
556
+ examples/case5/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
557
+ examples/case5/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
558
+ examples/case5/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
559
+ examples/case5/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
560
+ examples/case5/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
561
+ examples/case5/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
562
+ examples/case5/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
563
+ examples/case5/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
564
+ examples/case5/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
565
+ examples/case5/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
566
+ examples/case5/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
567
+ examples/case5/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
568
+ examples/case5/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
569
+ examples/case5/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
570
+ examples/case5/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
571
+ examples/case5/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
572
+ examples/case5/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
573
+ examples/case5/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
574
+ examples/case5/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
575
+ examples/case5/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
576
+ examples/case5/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
577
+ examples/case5/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
578
+ examples/case5/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
579
+ examples/case5/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
580
+ examples/case5/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
581
+ examples/case5/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
582
+ examples/case5/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
583
+ examples/case5/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
584
+ examples/case5/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
585
+ examples/case5/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
586
+ examples/case5/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
587
+ examples/case5/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
588
+ examples/case5/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
589
+ examples/case5/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
590
+ examples/case5/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
591
+ examples/case5/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
592
+ examples/case5/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
593
+ examples/case5/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
594
+ examples/case5/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
595
+ examples/case5/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
596
+ examples/case5/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
597
+ examples/case5/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
598
+ examples/case5/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
599
+ examples/case5/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
600
+ examples/case5/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
601
+ examples/case5/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
602
+ examples/case5/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
603
+ examples/case5/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
604
+ examples/case5/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
605
+ examples/case5/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
606
+ examples/case5/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
607
+ examples/case5/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
608
+ examples/case5/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
609
+ examples/case5/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
610
+ examples/case5/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
611
+ examples/case5/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
612
+ examples/case5/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
613
+ examples/case5/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
614
+ examples/case5/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
615
+ examples/case5/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
616
+ examples/case5/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
617
+ examples/case5/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
618
+ examples/case5/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
619
+ examples/case5/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
620
+ examples/case5/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
621
+ examples/case5/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
622
+ examples/case5/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
623
+ examples/case5/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
624
+ examples/case5/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
625
+ examples/case5/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
626
+ examples/case5/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
627
+ examples/case5/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
628
+ examples/case5/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
629
+ examples/case5/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
630
+ examples/case5/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
631
+ examples/case5/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
632
+ examples/case5/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
633
+ examples/case5/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
634
+ examples/case5/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
635
+ examples/case5/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
636
+ examples/case5/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
637
+ examples/case5/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
638
+ examples/case5/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
639
+ examples/case5/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
640
+ examples/case5/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
641
+ examples/case5/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
642
+ examples/case5/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
643
+ examples/case5/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
644
+ examples/case5/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
645
+ examples/case5/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
646
+ examples/case5/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
647
+ examples/case5/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
648
+ examples/case5/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
649
+ examples/case5/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
650
+ examples/case5/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
651
+ examples/case6/condition.mp4 filter=lfs diff=lfs merge=lfs -text
652
+ examples/case6/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
653
+ examples/case6/ref_image.png filter=lfs diff=lfs merge=lfs -text
654
+ examples/case6/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
655
+ examples/case6/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
656
+ examples/case6/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
657
+ examples/case6/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
658
+ examples/case6/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
659
+ examples/case6/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
660
+ examples/case6/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
661
+ examples/case6/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
662
+ examples/case6/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
663
+ examples/case6/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
664
+ examples/case6/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
665
+ examples/case6/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
666
+ examples/case6/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
667
+ examples/case6/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
668
+ examples/case6/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
669
+ examples/case6/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
670
+ examples/case6/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
671
+ examples/case6/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
672
+ examples/case6/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
673
+ examples/case6/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
674
+ examples/case6/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
675
+ examples/case6/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
676
+ examples/case6/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
677
+ examples/case6/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
678
+ examples/case6/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
679
+ examples/case6/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
680
+ examples/case6/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
681
+ examples/case6/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
682
+ examples/case6/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
683
+ examples/case6/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
684
+ examples/case6/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
685
+ examples/case6/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
686
+ examples/case6/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
687
+ examples/case6/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
688
+ examples/case6/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
689
+ examples/case6/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
690
+ examples/case6/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
691
+ examples/case6/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
692
+ examples/case6/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
693
+ examples/case6/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
694
+ examples/case6/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
695
+ examples/case6/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
696
+ examples/case6/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
697
+ examples/case6/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
698
+ examples/case6/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
699
+ examples/case6/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
700
+ examples/case6/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
701
+ examples/case6/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
702
+ examples/case6/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
703
+ examples/case6/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
704
+ examples/case6/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
705
+ examples/case6/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
706
+ examples/case6/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
707
+ examples/case6/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
708
+ examples/case6/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
709
+ examples/case6/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
710
+ examples/case6/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
711
+ examples/case6/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
712
+ examples/case6/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
713
+ examples/case6/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
714
+ examples/case6/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
715
+ examples/case6/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
716
+ examples/case6/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
717
+ examples/case6/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
718
+ examples/case6/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
719
+ examples/case6/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
720
+ examples/case6/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
721
+ examples/case6/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
722
+ examples/case6/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
723
+ examples/case6/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
724
+ examples/case6/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
725
+ examples/case6/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
726
+ examples/case6/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
727
+ examples/case6/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
728
+ examples/case6/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
729
+ examples/case6/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
730
+ examples/case6/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
731
+ examples/case6/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
732
+ examples/case6/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
733
+ examples/case6/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
734
+ examples/case6/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
735
+ examples/case6/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
736
+ examples/case6/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
737
+ examples/case6/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
738
+ examples/case6/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
739
+ examples/case6/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
740
+ examples/case6/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
741
+ examples/case6/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
742
+ examples/case6/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
743
+ examples/case6/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
744
+ examples/case6/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
745
+ examples/case6/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
746
+ examples/case6/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
747
+ examples/case6/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
748
+ examples/case6/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
749
+ examples/case6/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
750
+ examples/case6/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
751
+ examples/case6/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
752
+ examples/case7/condition.mp4 filter=lfs diff=lfs merge=lfs -text
753
+ examples/case7/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
754
+ examples/case7/ref_image.png filter=lfs diff=lfs merge=lfs -text
755
+ examples/case7/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
756
+ examples/case7/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
757
+ examples/case7/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
758
+ examples/case7/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
759
+ examples/case7/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
760
+ examples/case7/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
761
+ examples/case7/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
762
+ examples/case7/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
763
+ examples/case7/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
764
+ examples/case7/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
765
+ examples/case7/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
766
+ examples/case7/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
767
+ examples/case7/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
768
+ examples/case7/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
769
+ examples/case7/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
770
+ examples/case7/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
771
+ examples/case7/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
772
+ examples/case7/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
773
+ examples/case7/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
774
+ examples/case7/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
775
+ examples/case7/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
776
+ examples/case7/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
777
+ examples/case7/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
778
+ examples/case7/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
779
+ examples/case7/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
780
+ examples/case7/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
781
+ examples/case7/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
782
+ examples/case7/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
783
+ examples/case7/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
784
+ examples/case7/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
785
+ examples/case7/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
786
+ examples/case7/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
787
+ examples/case7/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
788
+ examples/case7/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
789
+ examples/case7/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
790
+ examples/case7/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
791
+ examples/case7/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
792
+ examples/case7/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
793
+ examples/case7/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
794
+ examples/case7/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
795
+ examples/case7/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
796
+ examples/case7/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
797
+ examples/case7/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
798
+ examples/case7/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
799
+ examples/case7/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
800
+ examples/case7/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
801
+ examples/case7/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
802
+ examples/case7/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
803
+ examples/case7/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
804
+ examples/case7/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
805
+ examples/case7/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
806
+ examples/case7/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
807
+ examples/case7/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
808
+ examples/case7/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
809
+ examples/case7/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
810
+ examples/case7/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
811
+ examples/case7/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
812
+ examples/case7/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
813
+ examples/case7/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
814
+ examples/case7/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
815
+ examples/case7/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
816
+ examples/case7/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
817
+ examples/case7/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
818
+ examples/case7/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
819
+ examples/case7/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
820
+ examples/case7/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
821
+ examples/case7/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
822
+ examples/case7/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
823
+ examples/case7/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
824
+ examples/case7/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
825
+ examples/case7/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
826
+ examples/case7/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
827
+ examples/case7/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
828
+ examples/case7/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
829
+ examples/case7/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
830
+ examples/case7/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
831
+ examples/case7/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
832
+ examples/case7/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
833
+ examples/case7/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
834
+ examples/case7/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
835
+ examples/case7/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
836
+ examples/case7/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
837
+ examples/case7/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
838
+ examples/case7/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
839
+ examples/case7/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
840
+ examples/case7/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
841
+ examples/case7/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
842
+ examples/case7/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
843
+ examples/case7/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
844
+ examples/case7/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
845
+ examples/case7/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
846
+ examples/case7/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
847
+ examples/case7/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
848
+ examples/case7/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
849
+ examples/case7/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
850
+ examples/case7/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
851
+ examples/case7/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
852
+ examples/case7/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
853
+ examples/case8/condition.mp4 filter=lfs diff=lfs merge=lfs -text
854
+ examples/case8/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
855
+ examples/case8/ref_image.png filter=lfs diff=lfs merge=lfs -text
856
+ examples/case8/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
857
+ examples/case8/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
858
+ examples/case8/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
859
+ examples/case8/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
860
+ examples/case8/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
861
+ examples/case8/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
862
+ examples/case8/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
863
+ examples/case8/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
864
+ examples/case8/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
865
+ examples/case8/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
866
+ examples/case8/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
867
+ examples/case8/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
868
+ examples/case8/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
869
+ examples/case8/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
870
+ examples/case8/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
871
+ examples/case8/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
872
+ examples/case8/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
873
+ examples/case8/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
874
+ examples/case8/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
875
+ examples/case8/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
876
+ examples/case8/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
877
+ examples/case8/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
878
+ examples/case8/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
879
+ examples/case8/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
880
+ examples/case8/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
881
+ examples/case8/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
882
+ examples/case8/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
883
+ examples/case8/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
884
+ examples/case8/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
885
+ examples/case8/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
886
+ examples/case8/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
887
+ examples/case8/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
888
+ examples/case8/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
889
+ examples/case8/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
890
+ examples/case8/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
891
+ examples/case8/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
892
+ examples/case8/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
893
+ examples/case8/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
894
+ examples/case8/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
895
+ examples/case8/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
896
+ examples/case8/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
897
+ examples/case8/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
898
+ examples/case8/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
899
+ examples/case8/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
900
+ examples/case8/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
901
+ examples/case8/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
902
+ examples/case8/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
903
+ examples/case8/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
904
+ examples/case8/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
905
+ examples/case8/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
906
+ examples/case8/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
907
+ examples/case8/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
908
+ examples/case8/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
909
+ examples/case8/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
910
+ examples/case8/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
911
+ examples/case8/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
912
+ examples/case8/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
913
+ examples/case8/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
914
+ examples/case8/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
915
+ examples/case8/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
916
+ examples/case8/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
917
+ examples/case8/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
918
+ examples/case8/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
919
+ examples/case8/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
920
+ examples/case8/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
921
+ examples/case8/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
922
+ examples/case8/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
923
+ examples/case8/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
924
+ examples/case8/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
925
+ examples/case8/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
926
+ examples/case8/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
927
+ examples/case8/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
928
+ examples/case8/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
929
+ examples/case8/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
930
+ examples/case8/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
931
+ examples/case8/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
932
+ examples/case8/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
933
+ examples/case8/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
934
+ examples/case8/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
935
+ examples/case8/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
936
+ examples/case8/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
937
+ examples/case8/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
938
+ examples/case8/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
939
+ examples/case8/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
940
+ examples/case8/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
941
+ examples/case8/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
942
+ examples/case8/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
943
+ examples/case8/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
944
+ examples/case8/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
945
+ examples/case8/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
946
+ examples/case8/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
947
+ examples/case8/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
948
+ examples/case8/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
949
+ examples/case8/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
950
+ examples/case8/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
951
+ examples/case8/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
952
+ examples/case8/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
953
+ examples/case8/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
954
+ examples/case9/condition.mp4 filter=lfs diff=lfs merge=lfs -text
955
+ examples/case9/ref_depth.exr filter=lfs diff=lfs merge=lfs -text
956
+ examples/case9/ref_image.png filter=lfs diff=lfs merge=lfs -text
957
+ examples/case9/video_input/depth_0000.exr filter=lfs diff=lfs merge=lfs -text
958
+ examples/case9/video_input/depth_0001.exr filter=lfs diff=lfs merge=lfs -text
959
+ examples/case9/video_input/depth_0002.exr filter=lfs diff=lfs merge=lfs -text
960
+ examples/case9/video_input/depth_0003.exr filter=lfs diff=lfs merge=lfs -text
961
+ examples/case9/video_input/depth_0004.exr filter=lfs diff=lfs merge=lfs -text
962
+ examples/case9/video_input/depth_0005.exr filter=lfs diff=lfs merge=lfs -text
963
+ examples/case9/video_input/depth_0006.exr filter=lfs diff=lfs merge=lfs -text
964
+ examples/case9/video_input/depth_0007.exr filter=lfs diff=lfs merge=lfs -text
965
+ examples/case9/video_input/depth_0008.exr filter=lfs diff=lfs merge=lfs -text
966
+ examples/case9/video_input/depth_0009.exr filter=lfs diff=lfs merge=lfs -text
967
+ examples/case9/video_input/depth_0010.exr filter=lfs diff=lfs merge=lfs -text
968
+ examples/case9/video_input/depth_0011.exr filter=lfs diff=lfs merge=lfs -text
969
+ examples/case9/video_input/depth_0012.exr filter=lfs diff=lfs merge=lfs -text
970
+ examples/case9/video_input/depth_0013.exr filter=lfs diff=lfs merge=lfs -text
971
+ examples/case9/video_input/depth_0014.exr filter=lfs diff=lfs merge=lfs -text
972
+ examples/case9/video_input/depth_0015.exr filter=lfs diff=lfs merge=lfs -text
973
+ examples/case9/video_input/depth_0016.exr filter=lfs diff=lfs merge=lfs -text
974
+ examples/case9/video_input/depth_0017.exr filter=lfs diff=lfs merge=lfs -text
975
+ examples/case9/video_input/depth_0018.exr filter=lfs diff=lfs merge=lfs -text
976
+ examples/case9/video_input/depth_0019.exr filter=lfs diff=lfs merge=lfs -text
977
+ examples/case9/video_input/depth_0020.exr filter=lfs diff=lfs merge=lfs -text
978
+ examples/case9/video_input/depth_0021.exr filter=lfs diff=lfs merge=lfs -text
979
+ examples/case9/video_input/depth_0022.exr filter=lfs diff=lfs merge=lfs -text
980
+ examples/case9/video_input/depth_0023.exr filter=lfs diff=lfs merge=lfs -text
981
+ examples/case9/video_input/depth_0024.exr filter=lfs diff=lfs merge=lfs -text
982
+ examples/case9/video_input/depth_0025.exr filter=lfs diff=lfs merge=lfs -text
983
+ examples/case9/video_input/depth_0026.exr filter=lfs diff=lfs merge=lfs -text
984
+ examples/case9/video_input/depth_0027.exr filter=lfs diff=lfs merge=lfs -text
985
+ examples/case9/video_input/depth_0028.exr filter=lfs diff=lfs merge=lfs -text
986
+ examples/case9/video_input/depth_0029.exr filter=lfs diff=lfs merge=lfs -text
987
+ examples/case9/video_input/depth_0030.exr filter=lfs diff=lfs merge=lfs -text
988
+ examples/case9/video_input/depth_0031.exr filter=lfs diff=lfs merge=lfs -text
989
+ examples/case9/video_input/depth_0032.exr filter=lfs diff=lfs merge=lfs -text
990
+ examples/case9/video_input/depth_0033.exr filter=lfs diff=lfs merge=lfs -text
991
+ examples/case9/video_input/depth_0034.exr filter=lfs diff=lfs merge=lfs -text
992
+ examples/case9/video_input/depth_0035.exr filter=lfs diff=lfs merge=lfs -text
993
+ examples/case9/video_input/depth_0036.exr filter=lfs diff=lfs merge=lfs -text
994
+ examples/case9/video_input/depth_0037.exr filter=lfs diff=lfs merge=lfs -text
995
+ examples/case9/video_input/depth_0038.exr filter=lfs diff=lfs merge=lfs -text
996
+ examples/case9/video_input/depth_0039.exr filter=lfs diff=lfs merge=lfs -text
997
+ examples/case9/video_input/depth_0040.exr filter=lfs diff=lfs merge=lfs -text
998
+ examples/case9/video_input/depth_0041.exr filter=lfs diff=lfs merge=lfs -text
999
+ examples/case9/video_input/depth_0042.exr filter=lfs diff=lfs merge=lfs -text
1000
+ examples/case9/video_input/depth_0043.exr filter=lfs diff=lfs merge=lfs -text
1001
+ examples/case9/video_input/depth_0044.exr filter=lfs diff=lfs merge=lfs -text
1002
+ examples/case9/video_input/depth_0045.exr filter=lfs diff=lfs merge=lfs -text
1003
+ examples/case9/video_input/depth_0046.exr filter=lfs diff=lfs merge=lfs -text
1004
+ examples/case9/video_input/depth_0047.exr filter=lfs diff=lfs merge=lfs -text
1005
+ examples/case9/video_input/depth_0048.exr filter=lfs diff=lfs merge=lfs -text
1006
+ examples/case9/video_input/render_0000.png filter=lfs diff=lfs merge=lfs -text
1007
+ examples/case9/video_input/render_0001.png filter=lfs diff=lfs merge=lfs -text
1008
+ examples/case9/video_input/render_0002.png filter=lfs diff=lfs merge=lfs -text
1009
+ examples/case9/video_input/render_0003.png filter=lfs diff=lfs merge=lfs -text
1010
+ examples/case9/video_input/render_0004.png filter=lfs diff=lfs merge=lfs -text
1011
+ examples/case9/video_input/render_0005.png filter=lfs diff=lfs merge=lfs -text
1012
+ examples/case9/video_input/render_0006.png filter=lfs diff=lfs merge=lfs -text
1013
+ examples/case9/video_input/render_0007.png filter=lfs diff=lfs merge=lfs -text
1014
+ examples/case9/video_input/render_0008.png filter=lfs diff=lfs merge=lfs -text
1015
+ examples/case9/video_input/render_0009.png filter=lfs diff=lfs merge=lfs -text
1016
+ examples/case9/video_input/render_0010.png filter=lfs diff=lfs merge=lfs -text
1017
+ examples/case9/video_input/render_0011.png filter=lfs diff=lfs merge=lfs -text
1018
+ examples/case9/video_input/render_0012.png filter=lfs diff=lfs merge=lfs -text
1019
+ examples/case9/video_input/render_0013.png filter=lfs diff=lfs merge=lfs -text
1020
+ examples/case9/video_input/render_0014.png filter=lfs diff=lfs merge=lfs -text
1021
+ examples/case9/video_input/render_0015.png filter=lfs diff=lfs merge=lfs -text
1022
+ examples/case9/video_input/render_0016.png filter=lfs diff=lfs merge=lfs -text
1023
+ examples/case9/video_input/render_0017.png filter=lfs diff=lfs merge=lfs -text
1024
+ examples/case9/video_input/render_0018.png filter=lfs diff=lfs merge=lfs -text
1025
+ examples/case9/video_input/render_0019.png filter=lfs diff=lfs merge=lfs -text
1026
+ examples/case9/video_input/render_0020.png filter=lfs diff=lfs merge=lfs -text
1027
+ examples/case9/video_input/render_0021.png filter=lfs diff=lfs merge=lfs -text
1028
+ examples/case9/video_input/render_0022.png filter=lfs diff=lfs merge=lfs -text
1029
+ examples/case9/video_input/render_0023.png filter=lfs diff=lfs merge=lfs -text
1030
+ examples/case9/video_input/render_0024.png filter=lfs diff=lfs merge=lfs -text
1031
+ examples/case9/video_input/render_0025.png filter=lfs diff=lfs merge=lfs -text
1032
+ examples/case9/video_input/render_0026.png filter=lfs diff=lfs merge=lfs -text
1033
+ examples/case9/video_input/render_0027.png filter=lfs diff=lfs merge=lfs -text
1034
+ examples/case9/video_input/render_0028.png filter=lfs diff=lfs merge=lfs -text
1035
+ examples/case9/video_input/render_0029.png filter=lfs diff=lfs merge=lfs -text
1036
+ examples/case9/video_input/render_0030.png filter=lfs diff=lfs merge=lfs -text
1037
+ examples/case9/video_input/render_0031.png filter=lfs diff=lfs merge=lfs -text
1038
+ examples/case9/video_input/render_0032.png filter=lfs diff=lfs merge=lfs -text
1039
+ examples/case9/video_input/render_0033.png filter=lfs diff=lfs merge=lfs -text
1040
+ examples/case9/video_input/render_0034.png filter=lfs diff=lfs merge=lfs -text
1041
+ examples/case9/video_input/render_0035.png filter=lfs diff=lfs merge=lfs -text
1042
+ examples/case9/video_input/render_0036.png filter=lfs diff=lfs merge=lfs -text
1043
+ examples/case9/video_input/render_0037.png filter=lfs diff=lfs merge=lfs -text
1044
+ examples/case9/video_input/render_0038.png filter=lfs diff=lfs merge=lfs -text
1045
+ examples/case9/video_input/render_0039.png filter=lfs diff=lfs merge=lfs -text
1046
+ examples/case9/video_input/render_0040.png filter=lfs diff=lfs merge=lfs -text
1047
+ examples/case9/video_input/render_0041.png filter=lfs diff=lfs merge=lfs -text
1048
+ examples/case9/video_input/render_0042.png filter=lfs diff=lfs merge=lfs -text
1049
+ examples/case9/video_input/render_0043.png filter=lfs diff=lfs merge=lfs -text
1050
+ examples/case9/video_input/render_0044.png filter=lfs diff=lfs merge=lfs -text
1051
+ examples/case9/video_input/render_0045.png filter=lfs diff=lfs merge=lfs -text
1052
+ examples/case9/video_input/render_0046.png filter=lfs diff=lfs merge=lfs -text
1053
+ examples/case9/video_input/render_0047.png filter=lfs diff=lfs merge=lfs -text
1054
+ examples/case9/video_input/render_0048.png filter=lfs diff=lfs merge=lfs -text
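The `.gitattributes` rules above route every binary example asset in this commit (PNG renders, EXR depth maps, MP4 condition clips) through Git LFS rather than storing it directly in the repository. Below is a minimal sketch of how such per-file rules could be regenerated or audited; the script itself, its extension list, and the `examples` root are illustrative assumptions, not part of this upload.

```python
"""Sketch: regenerate per-file Git LFS rules like the .gitattributes entries above.

Illustrative helper only -- the suffix list and the per-file (rather than
glob-based) style simply mirror what this commit's diff shows.
"""
from pathlib import Path

LFS_SUFFIXES = {".png", ".exr", ".mp4", ".jpg", ".pdf"}  # binary asset types seen in this commit
LFS_RULE = "filter=lfs diff=lfs merge=lfs -text"

def lfs_lines(root: str = "examples") -> list[str]:
    """Return one .gitattributes line per binary file under `root`, sorted for stable diffs."""
    files = (p for p in Path(root).rglob("*") if p.is_file() and p.suffix.lower() in LFS_SUFFIXES)
    return [f"{p.as_posix()} {LFS_RULE}" for p in sorted(files)]

if __name__ == "__main__":
    # Print the rules; redirect to .gitattributes (or compare against it) as needed.
    for line in lfs_lines():
        print(line)
```

In practice a single glob rule written by `git lfs track "examples/**"` covers the same files without adding one line per frame; the per-file style shown above is heavier but equivalent.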
.github/workflows/update_space.yml ADDED
@@ -0,0 +1,28 @@
+ name: Run Python script
+
+ on:
+   push:
+     branches:
+       - y
+
+ jobs:
+   build:
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Checkout
+         uses: actions/checkout@v2
+
+       - name: Set up Python
+         uses: actions/setup-python@v2
+         with:
+           python-version: '3.9'
+
+       - name: Install Gradio
+         run: python -m pip install gradio
+
+       - name: Log in to Hugging Face
+         run: python -c 'import huggingface_hub; huggingface_hub.login(token="${{ secrets.hf_token }}")'
+
+       - name: Deploy to Spaces
+         run: gradio deploy
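The workflow added above redeploys the Space on every push to the `y` branch: it installs Gradio, authenticates with the `hf_token` repository secret, and runs `gradio deploy`. The sketch below reproduces the same two steps from a local shell, assuming the token is exported as an `HF_TOKEN` environment variable; the script and that variable name are illustrative, not part of this commit.

```python
"""Sketch: the two workflow steps above, run locally instead of in CI.

Assumes the token lives in the HF_TOKEN environment variable; in the Action it
comes from the `hf_token` repository secret instead.
"""
import os
import subprocess

from huggingface_hub import HfApi, login

def deploy_space() -> None:
    # Step 1: authenticate, exactly as the "Log in to Hugging Face" step does.
    login(token=os.environ["HF_TOKEN"])
    print("Logged in as:", HfApi().whoami()["name"])

    # Step 2: let the Gradio CLI push the current folder to the linked Space,
    # mirroring the "Deploy to Spaces" step.
    subprocess.run(["gradio", "deploy"], check=True)

if __name__ == "__main__":
    deploy_space()
```

Note that the trigger branch is `y`, presumably whatever branch was checked out when `gradio deploy` first generated this file, so pushes to other branches will not redeploy the Space.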
.gitignore ADDED
@@ -0,0 +1,3 @@
+ # Ignore Python bytecode files
+ __pycache__/
+ *.pyc
.gradio/certificate.pem ADDED
@@ -0,0 +1,31 @@
1
+ -----BEGIN CERTIFICATE-----
2
+ MIIFazCCA1OgAwIBAgIRAIIQz7DSQONZRGPgu2OCiwAwDQYJKoZIhvcNAQELBQAw
3
+ TzELMAkGA1UEBhMCVVMxKTAnBgNVBAoTIEludGVybmV0IFNlY3VyaXR5IFJlc2Vh
4
+ cmNoIEdyb3VwMRUwEwYDVQQDEwxJU1JHIFJvb3QgWDEwHhcNMTUwNjA0MTEwNDM4
5
+ WhcNMzUwNjA0MTEwNDM4WjBPMQswCQYDVQQGEwJVUzEpMCcGA1UEChMgSW50ZXJu
6
+ ZXQgU2VjdXJpdHkgUmVzZWFyY2ggR3JvdXAxFTATBgNVBAMTDElTUkcgUm9vdCBY
7
+ MTCCAiIwDQYJKoZIhvcNAQEBBQADggIPADCCAgoCggIBAK3oJHP0FDfzm54rVygc
8
+ h77ct984kIxuPOZXoHj3dcKi/vVqbvYATyjb3miGbESTtrFj/RQSa78f0uoxmyF+
9
+ 0TM8ukj13Xnfs7j/EvEhmkvBioZxaUpmZmyPfjxwv60pIgbz5MDmgK7iS4+3mX6U
10
+ A5/TR5d8mUgjU+g4rk8Kb4Mu0UlXjIB0ttov0DiNewNwIRt18jA8+o+u3dpjq+sW
11
+ T8KOEUt+zwvo/7V3LvSye0rgTBIlDHCNAymg4VMk7BPZ7hm/ELNKjD+Jo2FR3qyH
12
+ B5T0Y3HsLuJvW5iB4YlcNHlsdu87kGJ55tukmi8mxdAQ4Q7e2RCOFvu396j3x+UC
13
+ B5iPNgiV5+I3lg02dZ77DnKxHZu8A/lJBdiB3QW0KtZB6awBdpUKD9jf1b0SHzUv
14
+ KBds0pjBqAlkd25HN7rOrFleaJ1/ctaJxQZBKT5ZPt0m9STJEadao0xAH0ahmbWn
15
+ OlFuhjuefXKnEgV4We0+UXgVCwOPjdAvBbI+e0ocS3MFEvzG6uBQE3xDk3SzynTn
16
+ jh8BCNAw1FtxNrQHusEwMFxIt4I7mKZ9YIqioymCzLq9gwQbooMDQaHWBfEbwrbw
17
+ qHyGO0aoSCqI3Haadr8faqU9GY/rOPNk3sgrDQoo//fb4hVC1CLQJ13hef4Y53CI
18
+ rU7m2Ys6xt0nUW7/vGT1M0NPAgMBAAGjQjBAMA4GA1UdDwEB/wQEAwIBBjAPBgNV
19
+ HRMBAf8EBTADAQH/MB0GA1UdDgQWBBR5tFnme7bl5AFzgAiIyBpY9umbbjANBgkq
20
+ hkiG9w0BAQsFAAOCAgEAVR9YqbyyqFDQDLHYGmkgJykIrGF1XIpu+ILlaS/V9lZL
21
+ ubhzEFnTIZd+50xx+7LSYK05qAvqFyFWhfFQDlnrzuBZ6brJFe+GnY+EgPbk6ZGQ
22
+ 3BebYhtF8GaV0nxvwuo77x/Py9auJ/GpsMiu/X1+mvoiBOv/2X/qkSsisRcOj/KK
23
+ NFtY2PwByVS5uCbMiogziUwthDyC3+6WVwW6LLv3xLfHTjuCvjHIInNzktHCgKQ5
24
+ ORAzI4JMPJ+GslWYHb4phowim57iaztXOoJwTdwJx4nLCgdNbOhdjsnvzqvHu7Ur
25
+ TkXWStAmzOVyyghqpZXjFaH3pO3JLF+l+/+sKAIuvtd7u+Nxe5AW0wdeRlN8NwdC
26
+ jNPElpzVmbUq4JUagEiuTDkHzsxHpFKVK7q4+63SM1N95R1NbdWhscdCb+ZAJzVc
27
+ oyi3B43njTOQ5yOf+1CceWxG1bQVs5ZufpsMljq4Ui0/1lvh+wjChP4kqKOJ2qxq
28
+ 4RgqsahDYVvTH9w7jXbyLeiNdd8XM2w9U/t7y0Ff/9yi0GE44Za4rF2LN9d11TPA
29
+ mRGunUHBcnWEvgJBQl9nJEiU0Zsnvgc/ubhPgXRR4Xq37Z0j4r7g1SgEEzwxA57d
30
+ emyPxgcYxn/eR44/KJ4EBs+lVDR3veyJm+kXQ99b21/+jh5Xos1AnX5iItreGCc=
31
+ -----END CERTIFICATE-----
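The PEM block above appears to be the ISRG Root X1 certificate that Gradio stores under `.gradio/` when a share link is created, so the tunnel endpoint can be verified over TLS; it is data rather than code. A quick way to confirm what the file contains is to parse it, as in the sketch below (the helper name, the default path, and the third-party `cryptography` dependency are assumptions for illustration):

```python
"""Sketch: inspect .gradio/certificate.pem to confirm which root CA it holds.

Uses the third-party `cryptography` package (not a dependency of this repo);
purely a verification aid, not something the app itself runs.
"""
from pathlib import Path

from cryptography import x509

def describe_certificate(path: str = ".gradio/certificate.pem") -> None:
    cert = x509.load_pem_x509_certificate(Path(path).read_bytes())
    # Expected subject: CN=ISRG Root X1,O=Internet Security Research Group,C=US
    print("Subject: ", cert.subject.rfc4514_string())
    print("Issuer:  ", cert.issuer.rfc4514_string())
    print("Validity:", cert.not_valid_before, "->", cert.not_valid_after)

if __name__ == "__main__":
    describe_certificate()
```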
LICENSE ADDED
@@ -0,0 +1,81 @@
1
+ TENCENT HUNYUANWORLD-VOYAGER COMMUNITY LICENSE AGREEMENT
2
+ Tencent HunyuanWorld-Voyager Release Date: September 2, 2025
3
+ THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
4
+ By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent HunyuanWorld-Voyager Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
5
+ 1. DEFINITIONS.
6
+ a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
7
+ b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent HunyuanWorld-Voyager Works or any portion or element thereof set forth herein.
8
+ c. “Documentation” shall mean the specifications, manuals and documentation for Tencent HunyuanWorld-Voyager made publicly available by Tencent.
9
+ d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
10
+ e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent HunyuanWorld-Voyager Works for any purpose and in any field of use.
11
+ f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent HunyuanWorld-Voyager and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
12
+ g. “Model Derivatives” shall mean all: (i) modifications to Tencent HunyuanWorld-Voyager or any Model Derivative of Tencent HunyuanWorld-Voyager; (ii) works based on Tencent HunyuanWorld-Voyager or any Model Derivative of Tencent HunyuanWorld-Voyager; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent HunyuanWorld-Voyager or any Model Derivative of Tencent HunyuanWorld-Voyager, to that model in order to cause that model to perform similarly to Tencent HunyuanWorld-Voyager or a Model Derivative of Tencent HunyuanWorld-Voyager, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent HunyuanWorld-Voyager or a Model Derivative of Tencent HunyuanWorld-Voyager for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
13
+ h. “Output” shall mean the information and/or content output of Tencent HunyuanWorld-Voyager or a Model Derivative that results from operating or otherwise using Tencent HunyuanWorld-Voyager or a Model Derivative, including via a Hosted Service.
14
+ i. “Tencent,” “We” or “Us” shall mean the applicable entity or entities in the Tencent corporate family that own(s) intellectual property or other rights embodied in or utilized by the Materials.
15
+ j. “Tencent HunyuanWorld-Voyager” shall mean the 3D generation models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us at [https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager].
16
+ k. “Tencent HunyuanWorld-Voyager Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
17
+ l. “Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea.
18
+ m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
19
+ n. “including” shall mean including but not limited to.
20
+ 2. GRANT OF RIGHTS.
21
+ We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
22
+ 3. DISTRIBUTION.
23
+ You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent HunyuanWorld-Voyager Works, exclusively in the Territory, provided that You meet all of the following conditions:
24
+ a. You must provide all such Third Party recipients of the Tencent HunyuanWorld-Voyager Works or products or services using them a copy of this Agreement;
25
+ b. You must cause any modified files to carry prominent notices stating that You changed the files;
26
+ c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent HunyuanWorld-Voyager Works; and (ii) mark the products or services developed by using the Tencent HunyuanWorld-Voyager Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
27
+ d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent HunyuanWorld-Voyager is licensed under the Tencent HunyuanWorld-Voyager Community License Agreement, Copyright © 2025 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
28
+ You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent HunyuanWorld-Voyager Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
29
+ 4. ADDITIONAL COMMERCIAL TERMS.
30
+ If, on the Tencent HunyuanWorld-Voyager version release date, the monthly active users of all products or services made available by or for Licensee is greater than 1 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
31
+ Subject to Tencent's written approval, you may request a license for the use of Tencent HunyuanWorld-Voyager by submitting the following information to [email protected]:
32
+ a. Your company’s name and associated business sector that plans to use Tencent HunyuanWorld-Voyager.
33
+ b. Your intended use case and the purpose of using Tencent HunyuanWorld-Voyager.
34
+ c. Your plans to modify Tencent HunyuanWorld-Voyager or create Model Derivatives.
35
+ 5. RULES OF USE.
36
+ a. Your use of the Tencent HunyuanWorld-Voyager Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent HunyuanWorld-Voyager Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent HunyuanWorld-Voyager Works and You must provide notice to subsequent users to whom You distribute that Tencent HunyuanWorld-Voyager Works are subject to the use restrictions in these Sections 5(a) and 5(b).
37
+ b. You must not use the Tencent HunyuanWorld-Voyager Works or any Output or results of the Tencent HunyuanWorld-Voyager Works to improve any other AI model (other than Tencent HunyuanWorld-Voyager or Model Derivatives thereof).
38
+ c. You must not use, reproduce, modify, distribute, or display the Tencent HunyuanWorld-Voyager Works, Output or results of the Tencent HunyuanWorld-Voyager Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.
39
+ 6. INTELLECTUAL PROPERTY.
40
+ a. Subject to Tencent’s ownership of Tencent HunyuanWorld-Voyager Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
41
+ b. No trademark licenses are granted under this Agreement, and in connection with the Tencent HunyuanWorld-Voyager Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent HunyuanWorld-Voyager Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
42
+ c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent HunyuanWorld-Voyager Works.
43
+ d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
44
+ 7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
45
+ a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent HunyuanWorld-Voyager Works or to grant any license thereto.
46
+ b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUANWORLD-VOYAGER WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUANWORLD-VOYAGER WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUANWORLD-VOYAGER WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
47
+ c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUANWORLD-VOYAGER WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
48
+ 8. SURVIVAL AND TERMINATION.
49
+ a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
50
+ b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent HunyuanWorld-Voyager Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
51
+ 9. GOVERNING LAW AND JURISDICTION.
52
+ a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
53
+ b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
54
+
55
+ EXHIBIT A
56
+ ACCEPTABLE USE POLICY
57
+
58
+ Tencent reserves the right to update this Acceptable Use Policy from time to time.
59
+ Last modified: November 5, 2024
60
+
61
+ Tencent endeavors to promote safe and fair use of its tools and features, including Tencent HunyuanWorld-Voyager. You agree not to use Tencent HunyuanWorld-Voyager or Model Derivatives:
62
+ 1. Outside the Territory;
63
+ 2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
64
+ 3. To harm Yourself or others;
65
+ 4. To repurpose or distribute output from Tencent HunyuanWorld-Voyager or any Model Derivatives to harm Yourself or others;
66
+ 5. To override or circumvent the safety guardrails and safeguards We have put in place;
67
+ 6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
68
+ 7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
69
+ 8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
70
+ 9. To intentionally defame, disparage or otherwise harass others;
71
+ 10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
72
+ 11. To generate or disseminate personal identifiable information with the purpose of harming others;
73
+ 12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
74
+ 13. To impersonate another individual without consent, authorization, or legal right;
75
+ 14. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
76
+ 15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
77
+ 16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
78
+ 17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
79
+ 18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
80
+ 19. For military purposes;
81
+ 20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
NOTICE ADDED
@@ -0,0 +1,104 @@
1
+ Usage and Legal Notices:
2
+
3
+ Tencent is pleased to support the open source community by making Tencent HunyuanWorld-Voyager available.
4
+
5
+ Copyright (C) 2025 Tencent. All rights reserved. The below model in this distribution may have been modified by Tencent ("Tencent Modifications"). All Tencent Modifications are Copyright (C) Tencent.
6
+
7
+ Tencent HunyuanWorld-Voyager is licensed under TENCENT HUNYUANWORLD-VOYAGER COMMUNITY LICENSE AGREEMENT, which can be found in this repository called "LICENSE", except for the third-party components listed below. Tencent HunyuanWorld-Voyager does not impose any additional limitations beyond what is outlined in the respective licenses of these third-party components. Users must comply with all terms and conditions of original licenses of these third-party components and must ensure that the usage of the third party components adheres to all relevant laws and regulations.
8
+
9
+ For avoidance of doubts, Tencent HunyuanWorld-Voyager means the large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Tencent in accordance with the TENCENT HUNYUANWORLD-VOYAGER COMMUNITY LICENSE AGREEMENT.
10
+
11
+
12
+ Other dependencies and licenses:
13
+
14
+
15
+ Open Source Software Licensed under the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT and Other Licenses of the Third-Party Components therein:
16
+ The below software in this distribution may have been modified by Tencent ("Tencent Modifications"). All Tencent Modifications are Copyright (C) 2025 Tencent.
17
+ --------------------------------------------------------------------
18
+ 1. HunyuanVideo-I2V
19
+ Copyright (C) 2025 THL A29 Limited, a Tencent company. All rights reserved.
20
+
21
+
22
+ Terms of the TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT:
23
+ --------------------------------------------------------------------
24
+ TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT
25
+ Tencent HunyuanVideo-I2V Release Date: March 5, 2025
26
+ THIS LICENSE AGREEMENT DOES NOT APPLY IN THE EUROPEAN UNION, UNITED KINGDOM AND SOUTH KOREA AND IS EXPRESSLY LIMITED TO THE TERRITORY, AS DEFINED BELOW.
27
+ By clicking to agree or by using, reproducing, modifying, distributing, performing or displaying any portion or element of the Tencent Hunyuan Works, including via any Hosted Service, You will be deemed to have recognized and accepted the content of this Agreement, which is effective immediately.
28
+ 1. DEFINITIONS.
29
+ a. “Acceptable Use Policy” shall mean the policy made available by Tencent as set forth in the Exhibit A.
30
+ b. “Agreement” shall mean the terms and conditions for use, reproduction, distribution, modification, performance and displaying of Tencent Hunyuan Works or any portion or element thereof set forth herein.
31
+ c. “Documentation” shall mean the specifications, manuals and documentation for Tencent Hunyuan made publicly available by Tencent.
32
+ d. “Hosted Service” shall mean a hosted service offered via an application programming interface (API), web access, or any other electronic or remote means.
33
+ e. “Licensee,” “You” or “Your” shall mean a natural person or legal entity exercising the rights granted by this Agreement and/or using the Tencent Hunyuan Works for any purpose and in any field of use.
34
+ f. “Materials” shall mean, collectively, Tencent’s proprietary Tencent Hunyuan and Documentation (and any portion thereof) as made available by Tencent under this Agreement.
35
+ g. “Model Derivatives” shall mean all: (i) modifications to Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; (ii) works based on Tencent Hunyuan or any Model Derivative of Tencent Hunyuan; or (iii) any other machine learning model which is created by transfer of patterns of the weights, parameters, operations, or Output of Tencent Hunyuan or any Model Derivative of Tencent Hunyuan, to that model in order to cause that model to perform similarly to Tencent Hunyuan or a Model Derivative of Tencent Hunyuan, including distillation methods, methods that use intermediate data representations, or methods based on the generation of synthetic data Outputs by Tencent Hunyuan or a Model Derivative of Tencent Hunyuan for training that model. For clarity, Outputs by themselves are not deemed Model Derivatives.
36
+ h. “Output” shall mean the information and/or content output of Tencent Hunyuan or a Model Derivative that results from operating or otherwise using Tencent Hunyuan or a Model Derivative, including via a Hosted Service.
37
+ i. “Tencent,” “We” or “Us” shall mean THL A29 Limited.
38
+ j. “Tencent Hunyuan” shall mean the large language models, text/image/video/audio/3D generation models, and multimodal large language models and their software and algorithms, including trained model weights, parameters (including optimizer states), machine-learning model code, inference-enabling code, training-enabling code, fine-tuning enabling code and other elements of the foregoing made publicly available by Us, including, without limitation to, Tencent HunyuanVideo-I2V released at [ https://github.com/Tencent/HunyuanVideo-I2V ].
39
+ k. “Tencent Hunyuan Works” shall mean: (i) the Materials; (ii) Model Derivatives; and (iii) all derivative works thereof.
40
+ l. “Territory” shall mean the worldwide territory, excluding the territory of the European Union, United Kingdom and South Korea.
41
+ m. “Third Party” or “Third Parties” shall mean individuals or legal entities that are not under common control with Us or You.
42
+ n. “including” shall mean including but not limited to.
43
+ 2. GRANT OF RIGHTS.
44
+ We grant You, for the Territory only, a non-exclusive, non-transferable and royalty-free limited license under Tencent’s intellectual property or other rights owned by Us embodied in or utilized by the Materials to use, reproduce, distribute, create derivative works of (including Model Derivatives), and make modifications to the Materials, only in accordance with the terms of this Agreement and the Acceptable Use Policy, and You must not violate (or encourage or permit anyone else to violate) any term of this Agreement or the Acceptable Use Policy.
45
+ 3. DISTRIBUTION.
46
+ You may, subject to Your compliance with this Agreement, distribute or make available to Third Parties the Tencent Hunyuan Works, exclusively in the Territory, provided that You meet all of the following conditions:
47
+ a. You must provide all such Third Party recipients of the Tencent Hunyuan Works or products or services using them a copy of this Agreement;
48
+ b. You must cause any modified files to carry prominent notices stating that You changed the files;
49
+ c. You are encouraged to: (i) publish at least one technology introduction blogpost or one public statement expressing Your experience of using the Tencent Hunyuan Works; and (ii) mark the products or services developed by using the Tencent Hunyuan Works to indicate that the product/service is “Powered by Tencent Hunyuan”; and
50
+ d. All distributions to Third Parties (other than through a Hosted Service) must be accompanied by a “Notice” text file that contains the following notice: “Tencent Hunyuan is licensed under the Tencent Hunyuan Community License Agreement, Copyright © 2025 Tencent. All Rights Reserved. The trademark rights of “Tencent Hunyuan” are owned by Tencent or its affiliate.”
51
+ You may add Your own copyright statement to Your modifications and, except as set forth in this Section and in Section 5, may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Model Derivatives as a whole, provided Your use, reproduction, modification, distribution, performance and display of the work otherwise complies with the terms and conditions of this Agreement (including as regards the Territory). If You receive Tencent Hunyuan Works from a Licensee as part of an integrated end user product, then this Section 3 of this Agreement will not apply to You.
52
+ 4. ADDITIONAL COMMERCIAL TERMS.
53
+ If, on the Tencent Hunyuan version release date, the monthly active users of all products or services made available by or for Licensee is greater than 100 million monthly active users in the preceding calendar month, You must request a license from Tencent, which Tencent may grant to You in its sole discretion, and You are not authorized to exercise any of the rights under this Agreement unless or until Tencent otherwise expressly grants You such rights.
54
+ 5. RULES OF USE.
55
+ a. Your use of the Tencent Hunyuan Works must comply with applicable laws and regulations (including trade compliance laws and regulations) and adhere to the Acceptable Use Policy for the Tencent Hunyuan Works, which is hereby incorporated by reference into this Agreement. You must include the use restrictions referenced in these Sections 5(a) and 5(b) as an enforceable provision in any agreement (e.g., license agreement, terms of use, etc.) governing the use and/or distribution of Tencent Hunyuan Works and You must provide notice to subsequent users to whom You distribute that Tencent Hunyuan Works are subject to the use restrictions in these Sections 5(a) and 5(b).
56
+ b. You must not use the Tencent Hunyuan Works or any Output or results of the Tencent Hunyuan Works to improve any other AI model (other than Tencent Hunyuan or Model Derivatives thereof).
57
+ c. You must not use, reproduce, modify, distribute, or display the Tencent Hunyuan Works, Output or results of the Tencent Hunyuan Works outside the Territory. Any such use outside the Territory is unlicensed and unauthorized under this Agreement.
58
+ 6. INTELLECTUAL PROPERTY.
59
+ a. Subject to Tencent’s ownership of Tencent Hunyuan Works made by or for Tencent and intellectual property rights therein, conditioned upon Your compliance with the terms and conditions of this Agreement, as between You and Tencent, You will be the owner of any derivative works and modifications of the Materials and any Model Derivatives that are made by or for You.
60
+ b. No trademark licenses are granted under this Agreement, and in connection with the Tencent Hunyuan Works, Licensee may not use any name or mark owned by or associated with Tencent or any of its affiliates, except as required for reasonable and customary use in describing and distributing the Tencent Hunyuan Works. Tencent hereby grants You a license to use “Tencent Hunyuan” (the “Mark”) in the Territory solely as required to comply with the provisions of Section 3(c), provided that You comply with any applicable laws related to trademark protection. All goodwill arising out of Your use of the Mark will inure to the benefit of Tencent.
61
+ c. If You commence a lawsuit or other proceedings (including a cross-claim or counterclaim in a lawsuit) against Us or any person or entity alleging that the Materials or any Output, or any portion of any of the foregoing, infringe any intellectual property or other right owned or licensable by You, then all licenses granted to You under this Agreement shall terminate as of the date such lawsuit or other proceeding is filed. You will defend, indemnify and hold harmless Us from and against any claim by any Third Party arising out of or related to Your or the Third Party’s use or distribution of the Tencent Hunyuan Works.
62
+ d. Tencent claims no rights in Outputs You generate. You and Your users are solely responsible for Outputs and their subsequent uses.
63
+ 7. DISCLAIMERS OF WARRANTY AND LIMITATIONS OF LIABILITY.
64
+ a. We are not obligated to support, update, provide training for, or develop any further version of the Tencent Hunyuan Works or to grant any license thereto.
65
+ b. UNLESS AND ONLY TO THE EXTENT REQUIRED BY APPLICABLE LAW, THE TENCENT HUNYUAN WORKS AND ANY OUTPUT AND RESULTS THEREFROM ARE PROVIDED “AS IS” WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES OF ANY KIND INCLUDING ANY WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, COURSE OF DEALING, USAGE OF TRADE, OR FITNESS FOR A PARTICULAR PURPOSE. YOU ARE SOLELY RESPONSIBLE FOR DETERMINING THE APPROPRIATENESS OF USING, REPRODUCING, MODIFYING, PERFORMING, DISPLAYING OR DISTRIBUTING ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND ASSUME ANY AND ALL RISKS ASSOCIATED WITH YOUR OR A THIRD PARTY’S USE OR DISTRIBUTION OF ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS AND YOUR EXERCISE OF RIGHTS AND PERMISSIONS UNDER THIS AGREEMENT.
66
+ c. TO THE FULLEST EXTENT PERMITTED BY APPLICABLE LAW, IN NO EVENT SHALL TENCENT OR ITS AFFILIATES BE LIABLE UNDER ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, TORT, NEGLIGENCE, PRODUCTS LIABILITY, OR OTHERWISE, FOR ANY DAMAGES, INCLUDING ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, EXEMPLARY, CONSEQUENTIAL OR PUNITIVE DAMAGES, OR LOST PROFITS OF ANY KIND ARISING FROM THIS AGREEMENT OR RELATED TO ANY OF THE TENCENT HUNYUAN WORKS OR OUTPUTS, EVEN IF TENCENT OR ITS AFFILIATES HAVE BEEN ADVISED OF THE POSSIBILITY OF ANY OF THE FOREGOING.
67
+ 8. SURVIVAL AND TERMINATION.
68
+ a. The term of this Agreement shall commence upon Your acceptance of this Agreement or access to the Materials and will continue in full force and effect until terminated in accordance with the terms and conditions herein.
69
+ b. We may terminate this Agreement if You breach any of the terms or conditions of this Agreement. Upon termination of this Agreement, You must promptly delete and cease use of the Tencent Hunyuan Works. Sections 6(a), 6(c), 7 and 9 shall survive the termination of this Agreement.
70
+ 9. GOVERNING LAW AND JURISDICTION.
71
+ a. This Agreement and any dispute arising out of or relating to it will be governed by the laws of the Hong Kong Special Administrative Region of the People’s Republic of China, without regard to conflict of law principles, and the UN Convention on Contracts for the International Sale of Goods does not apply to this Agreement.
72
+ b. Exclusive jurisdiction and venue for any dispute arising out of or relating to this Agreement will be a court of competent jurisdiction in the Hong Kong Special Administrative Region of the People’s Republic of China, and Tencent and Licensee consent to the exclusive jurisdiction of such court with respect to any such dispute.
73
+
74
+ EXHIBIT A
75
+ ACCEPTABLE USE POLICY
76
+
77
+ Tencent reserves the right to update this Acceptable Use Policy from time to time.
78
+ Last modified: November 5, 2024
79
+
80
+ Tencent endeavors to promote safe and fair use of its tools and features, including Tencent Hunyuan. You agree not to use Tencent Hunyuan or Model Derivatives:
81
+ 1. Outside the Territory;
82
+ 2. In any way that violates any applicable national, federal, state, local, international or any other law or regulation;
83
+ 3. To harm Yourself or others;
84
+ 4. To repurpose or distribute output from Tencent Hunyuan or any Model Derivatives to harm Yourself or others;
85
+ 5. To override or circumvent the safety guardrails and safeguards We have put in place;
86
+ 6. For the purpose of exploiting, harming or attempting to exploit or harm minors in any way;
87
+ 7. To generate or disseminate verifiably false information and/or content with the purpose of harming others or influencing elections;
88
+ 8. To generate or facilitate false online engagement, including fake reviews and other means of fake online engagement;
89
+ 9. To intentionally defame, disparage or otherwise harass others;
90
+ 10. To generate and/or disseminate malware (including ransomware) or any other content to be used for the purpose of harming electronic systems;
91
+ 11. To generate or disseminate personal identifiable information with the purpose of harming others;
92
+ 12. To generate or disseminate information (including images, code, posts, articles), and place the information in any public context (including –through the use of bot generated tweets), without expressly and conspicuously identifying that the information and/or content is machine generated;
93
+ 13. To impersonate another individual without consent, authorization, or legal right;
94
+ 14. To make high-stakes automated decisions in domains that affect an individual’s safety, rights or wellbeing (e.g., law enforcement, migration, medicine/health, management of critical infrastructure, safety components of products, essential services, credit, employment, housing, education, social scoring, or insurance);
95
+ 15. In a manner that violates or disrespects the social ethics and moral standards of other countries or regions;
96
+ 16. To perform, facilitate, threaten, incite, plan, promote or encourage violent extremism or terrorism;
97
+ 17. For any use intended to discriminate against or harm individuals or groups based on protected characteristics or categories, online or offline social behavior or known or predicted personal or personality characteristics;
98
+ 18. To intentionally exploit any of the vulnerabilities of a specific group of persons based on their age, social, physical or mental characteristics, in order to materially distort the behavior of a person pertaining to that group in a manner that causes or is likely to cause that person or another person physical or psychological harm;
99
+ 19. For military purposes;
100
+ 20. To engage in the unauthorized or unlicensed practice of any profession including, but not limited to, financial, legal, medical/health, or other professional practices.
101
+
102
+
103
+ For the license of other third party components, please refer to the following URL:
104
+ https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V/blob/main/Notice
README.md CHANGED
@@ -1,12 +1,402 @@
1
  ---
2
- title: Voyager
3
- emoji: 📚
4
- colorFrom: yellow
5
- colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 5.49.1
8
- app_file: app.py
9
- pinned: false
10
  ---
 
11
 
12
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
1
  ---
2
+ title: voyager
3
+ app_file: app.py
 
 
4
  sdk: gradio
5
  sdk_version: 5.49.1
 
 
6
  ---
7
+ [中文阅读](README_zh.md)
8
+
9
+ # **HunyuanWorld-Voyager**
10
+
11
+ <p align="center">
12
+ <img src="assets/teaser.png">
13
+ </p>
14
+
15
+ <div align="center">
16
+ <a href="https://3d-models.hunyuan.tencent.com/world/" target="_blank"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green" height=22px></a>
17
+ <a href="https://3d-models.hunyuan.tencent.com/voyager/voyager_en/assets/HYWorld_Voyager.pdf" target="_blank"><img src="https://img.shields.io/static/v1?label=Tech%20Report&message=arxiv&color=red" height=22px></a>
18
+ <a href="https://huggingface.co/tencent/HunyuanWorld-Voyager" target="_blank"><img src="https://img.shields.io/static/v1?label=HunyuanWorld-Voyager&message=HuggingFace&color=yellow" height=22px></a>
19
+ </div>
20
+
21
+ -----
22
+
23
+ We introduce HunyuanWorld-Voyager, a novel video diffusion framework that generates world-consistent 3D point-cloud sequences from a single image along a user-defined camera path. Voyager can generate 3D-consistent scene videos for world exploration that follow custom camera trajectories, and it also produces aligned RGB and depth video for efficient, direct 3D reconstruction.
24
+
25
+
26
+ ## 🔥🔥🔥 News!!
27
+ * October 22, 2025: 👋 We release [HunyuanWorld-1.1 (WorldMirror)](https://github.com/Tencent-Hunyuan/HunyuanWorld-Mirror), supporting 3D world creation from videos or multi-view images!
28
+ * October 16, 2025: 👋 We recently propose [FlashWorld](https://github.com/imlixinyang/FlashWorld), enabling 3DGS world generation in 5~10 seconds on a single GPU!
29
+ * Sep 2, 2025: 👋 We release the code and model weights of HunyuanWorld-Voyager. [Download](ckpts/README.md).
30
+
31
+ > Join our **[Wechat](#)** and **[Discord](https://discord.gg/dNBrdrGGMa)** groups to discuss the model and get help from us.
32
+
33
+ | Wechat Group | Xiaohongshu | X | Discord |
34
+ |--------------------------------------------------|-------------------------------------------------------|---------------------------------------------|---------------------------------------------------|
35
+ | <img src="assets/qrcode/wechat.png" height=140> | <img src="assets/qrcode/xiaohongshu.png" height=140> | <img src="assets/qrcode/x.png" height=140> | <img src="assets/qrcode/discord.png" height=140> |
36
+
37
+ ## 🎥 Demo
38
+ ### Demo Video
39
+
40
+ <div align="center">
41
+ <video src="https://github.com/user-attachments/assets/2eb844c9-30ba-4770-8066-189c123affee" width="80%" poster=""> </video>
42
+ </div>
43
+
44
+ ### Camera-Controllable Video Generation
45
+
46
+ | Input | Generated Video |
47
+ |:----------------:|:----------------:|
48
+ | <img src="assets/demo/camera/input1.png" width="80%"> | <video src="https://github.com/user-attachments/assets/2b03ecd5-9a8f-455c-bf04-c668d3a61b04" width="100%"> </video> |
49
+ | <img src="assets/demo/camera/input2.png" width="80%"> | <video src="https://github.com/user-attachments/assets/45844ac0-c65a-4e04-9f7d-4c72d47e0339" width="100%"> </video> |
50
+ | <img src="assets/demo/camera/input3.png" width="80%"> | <video src="https://github.com/user-attachments/assets/f7f48473-3bb5-4a30-bd22-af3ca95ee8dc" width="100%"> </video> |
51
+
52
+ ### Multiple Applications
53
+
54
+ - Video Reconstruction
55
+
56
+ | Generated Video | Reconstructed Point Cloud |
57
+ |:---------------:|:--------------------------------:|
58
+ | <video src="https://github.com/user-attachments/assets/72a41804-63fc-4596-963d-1497e68f7790" width="100%"> </video> | <video src="https://github.com/user-attachments/assets/67574e9c-9e21-4ed6-9503-e65d187086a2" width="100%"> </video> |
59
+
60
+ - Image-to-3D Generation
61
+
62
+ | | |
63
+ |:---------------:|:---------------:|
64
+ | <video src="https://github.com/user-attachments/assets/886aa86d-990e-4b86-97a5-0b9110862d14" width="100%"> </video> | <video src="https://github.com/user-attachments/assets/4c1734ba-4e78-4979-b30e-3c8c97aa984b" width="100%"> </video> |
65
+
66
+ - Video Depth Estimation
67
+
68
+ | | |
69
+ |:---------------:|:---------------:|
70
+ | <video src="https://github.com/user-attachments/assets/e4c8b729-e880-4be3-826f-429a5c1f12cd" width="100%"> </video> | <video src="https://github.com/user-attachments/assets/7ede0745-cde7-42f1-9c28-e4dca90dac52" width="100%"> </video> |
71
+
72
+
73
+ ## ☯️ **HunyuanWorld-Voyager Introduction**
74
+ ### Architecture
75
+
76
+ Voyager consists of two key components:
77
+
78
+ (1) World-Consistent Video Diffusion: A unified architecture that jointly generates aligned RGB and depth video sequences, conditioned on existing world observation to ensure global coherence.
79
+
80
+ (2) Long-Range World Exploration: An efficient world cache with point culling, combined with auto-regressive inference and smooth video sampling, for iterative scene extension with context-aware consistency.
81
+
82
+ To train Voyager, we propose a scalable data engine, i.e., a video reconstruction pipeline that automates camera pose estimation and metric depth prediction for arbitrary videos, enabling large-scale, diverse training data curation without manual 3D annotations. Using this pipeline, we compile a dataset of over 100,000 video clips, combining real-world captures and synthetic Unreal Engine renders.
83
+
84
+ <p align="center">
85
+ <img src="assets/backbone.jpg" height=500>
86
+ </p>
87
+
88
+ ### Performance
89
+
90
+ <table class="comparison-table">
91
+ <thead>
92
+ <tr>
93
+ <th>Method</th>
94
+ <th>WorldScore Average</th>
95
+ <th>Camera Control</th>
96
+ <th>Object Control</th>
97
+ <th>Content Alignment</th>
98
+ <th>3D Consistency</th>
99
+ <th>Photometric Consistency</th>
100
+ <th>Style Consistency</th>
101
+ <th>Subjective Quality</th>
102
+ </tr>
103
+ </thead>
104
+ <tbody>
105
+ <tr>
106
+ <td>WonderJourney</td>
107
+ <td>🟡63.75</td>
108
+ <td>🟡84.6</td>
109
+ <td>37.1</td>
110
+ <td>35.54</td>
111
+ <td>80.6</td>
112
+ <td>79.03</td>
113
+ <td>62.82</td>
114
+ <td>🟢66.56</td>
115
+ </tr>
116
+ <tr>
117
+ <td>WonderWorld</td>
118
+ <td>🟢72.69</td>
119
+ <td>🔴92.98</td>
120
+ <td>51.76</td>
121
+ <td>🔴71.25</td>
122
+ <td>🔴86.87</td>
123
+ <td>85.56</td>
124
+ <td>70.57</td>
125
+ <td>49.81</td>
126
+ </tr>
127
+ <tr>
128
+ <td>EasyAnimate</td>
129
+ <td>52.85</td>
130
+ <td>26.72</td>
131
+ <td>54.5</td>
132
+ <td>50.76</td>
133
+ <td>67.29</td>
134
+ <td>47.35</td>
135
+ <td>🟡73.05</td>
136
+ <td>50.31</td>
137
+ </tr>
138
+ <tr>
139
+ <td>Allegro</td>
140
+ <td>55.31</td>
141
+ <td>24.84</td>
142
+ <td>🟡57.47</td>
143
+ <td>🟡51.48</td>
144
+ <td>70.5</td>
145
+ <td>69.89</td>
146
+ <td>65.6</td>
147
+ <td>47.41</td>
148
+ </tr>
149
+ <tr>
150
+ <td>Gen-3</td>
151
+ <td>60.71</td>
152
+ <td>29.47</td>
153
+ <td>🟢62.92</td>
154
+ <td>50.49</td>
155
+ <td>68.31</td>
156
+ <td>🟢87.09</td>
157
+ <td>62.82</td>
158
+ <td>🟡63.85</td>
159
+ </tr>
160
+ <tr>
161
+ <td>CogVideoX-I2V</td>
162
+ <td>62.15</td>
163
+ <td>38.27</td>
164
+ <td>40.07</td>
165
+ <td>36.73</td>
166
+ <td>🟢86.21</td>
167
+ <td>🔴88.12</td>
168
+ <td>🟢83.22</td>
169
+ <td>62.44</td>
170
+ </tr>
171
+ <tr class="voyager-row">
172
+ <td><b>Voyager</b></td>
173
+ <td>🔴77.62</td>
174
+ <td>🟢85.95</td>
175
+ <td>🔴66.92</td>
176
+ <td>🟢68.92</td>
177
+ <td>🟡81.56</td>
178
+ <td>🟡85.99</td>
179
+ <td>🔴84.89</td>
180
+ <td>🔴71.09</td>
181
+ </tr>
182
+ </tbody>
183
+ <caption>Quantitative comparison on <i>WorldScore Benchmark</i>. 🔴 indicates the 1st, 🟢 indicates the 2nd, 🟡 indicates the 3rd.</caption>
184
+ </table>
185
+
186
+
187
+ ## 📜 Requirements
188
+
189
+ The following table shows the requirements for running Voyager (batch size = 1) to generate videos:
190
+
191
+ | Model | Resolution | GPU Peak Memory |
192
+ |:----------------:|:-----------:|:----------------:|
193
+ | HunyuanWorld-Voyager | 540p | 60GB |
194
+
195
+ * An NVIDIA GPU with CUDA support is required.
196
+ * The model is tested on a single 80GB GPU.
197
+ * **Minimum**: The minimum GPU memory required is 60GB for 540p.
198
+ * **Recommended**: We recommend using a GPU with 80GB of memory for better generation quality.
199
+ * Tested operating system: Linux
200
+
201
+
202
+ ## 🛠️ Dependencies and Installation
203
+
204
+ Begin by cloning the repository:
205
+ ```shell
206
+ git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
207
+ cd HunyuanWorld-Voyager
208
+ ```
209
+
210
+ ### Installation Guide for Linux
211
+
212
+ We recommend CUDA versions 12.4 or 11.8 for the manual installation.
213
+
214
+ ```shell
215
+ # 1. Create conda environment
216
+ conda create -n voyager python==3.11.9
217
+
218
+ # 2. Activate the environment
219
+ conda activate voyager
220
+
221
+ # 3. Install PyTorch and other dependencies using conda
222
+ # For CUDA 12.4
223
+ conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
224
+
225
+ # 4. Install pip dependencies
226
+ python -m pip install -r requirements.txt
227
+ python -m pip install transformers==4.39.3
228
+
229
+ # 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
230
+ python -m pip install flash-attn
231
+
232
+ # 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
233
+ python -m pip install xfuser==0.4.2
234
+ ```
235
+
236
+ If you run into a floating point exception (core dump) on specific GPU types, you can try the following solution:
237
+
238
+ ```shell
239
+ # Make sure you have installed CUDA 12.4, cuBLAS>=12.4.5.8, and cuDNN>=9.00 (or simply use our CUDA 12 docker image).
240
+ pip install nvidia-cublas-cu12==12.4.5.8
241
+ export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
242
+ ```
243
+
244
+ To create your own input conditions, you also need to install the following dependencies:
245
+ ```shell
246
+ pip install --no-deps git+https://github.com/microsoft/MoGe.git
247
+ pip install scipy==1.11.4
248
+ pip install git+https://github.com/EasternJournalist/utils3d.git@c5daf6f6c244d251f252102d09e9b7bcef791a38
249
+ ```
250
+
251
+
252
+ ## 🧱 Download Pretrained Models
253
+
254
+ A detailed guidance for downloading pretrained models is shown [here](ckpts/README.md). Briefly,
255
+ ```
256
+ huggingface-cli download tencent/HunyuanWorld-Voyager --local-dir ./ckpts
257
+ ```
258
+
259
+
260
+ ## 🔑 Inference
261
+ ### Create Input Condition
262
+
263
+ We provide several input examples in the `examples` folder. You can find the corresponding input text in the `prompt.txt` file. If you'd like to use your own input image, you can run the following command:
264
+ ```bash
265
+ cd data_engine
266
+
267
+ python3 create_input.py --image_path "your_input_image" --render_output_dir "examples/case/" --type "forward"
268
+ ```
269
+ We provide the following types of camera path:
270
+ - forward
271
+ - backward
272
+ - left
273
+ - right
274
+ - turn_left
275
+ - turn_right
276
+ You can also modify the camera path in the `create_input.py` file.
277
+
278
+ ### Single-GPU Inference
279
+
280
+ ```bash
281
+ cd HunyuanWorld-Voyager
282
+
283
+ python3 sample_image2video.py \
284
+ --model HYVideo-T/2 \
285
+ --input-path "examples/case1" \
286
+ --prompt "An old-fashioned European village with thatched roofs on the houses." \
287
+ --i2v-stability \
288
+ --infer-steps 50 \
289
+ --flow-reverse \
290
+ --flow-shift 7.0 \
291
+ --seed 0 \
292
+ --embedded-cfg-scale 6.0 \
293
+ --use-cpu-offload \
294
+ --save-path ./results
295
+ ```
296
+ You can add `--use-context-block` to enable the context block during inference.
297
+
298
+ ### Parallel Inference on Multiple GPUs by xDiT
299
+
300
+ [xDiT](https://github.com/xdit-project/xDiT) is a Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters.
301
+ It has successfully provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, SD3, etc. This repo adopts the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo-I2V model.
302
+
303
+ For example, to generate a video with 8 GPUs, you can use the following command:
304
+
305
+ ```bash
306
+ cd HunyuanWorld-Voyager
307
+
308
+ ALLOW_RESIZE_FOR_SP=1 torchrun --nproc_per_node=8 \
309
+ sample_image2video.py \
310
+ --model HYVideo-T/2 \
311
+ --input-path "examples/case1" \
312
+ --prompt "An old-fashioned European village with thatched roofs on the houses." \
313
+ --i2v-stability \
314
+ --infer-steps 50 \
315
+ --flow-reverse \
316
+ --flow-shift 7.0 \
317
+ --seed 0 \
318
+ --embedded-cfg-scale 6.0 \
319
+ --save-path ./results \
320
+ --ulysses-degree 8 \
321
+ --ring-degree 1
322
+ ```
323
+
324
+ The number of GPUs equals the product of `--ulysses-degree` and `--ring-degree`; for example, 8 GPUs could also be configured as `--ulysses-degree 4 --ring-degree 2`. Feel free to adjust these parallel configurations to optimize performance.
325
+
326
+ <p align="center">
327
+ <table align="center">
328
+ <thead>
329
+ <tr>
330
+ <th colspan="4">Latency (sec) for 512x768 (49 frames, 50 steps) on an 8 x H20 GPU node; columns show the number of GPUs used</th>
331
+ </tr>
332
+ <tr>
333
+ <th>1</th>
334
+ <th>2</th>
335
+ <th>4</th>
336
+ <th>8</th>
337
+ </tr>
338
+ </thead>
339
+ <tbody>
340
+ <tr>
341
+ <th>1925</th>
342
+ <th>1018 (1.89x)</th>
343
+ <th>534 (3.60x)</th>
344
+ <th>288 (6.69x)</th>
345
+ </tr>
346
+
347
+ </tbody>
348
+ </table>
349
+ </p>
350
+
351
+ ### Gradio Demo
352
+
353
+ We also provide a Gradio demo for the HunyuanWorld-Voyager model.
354
+
355
+ <p align="center">
356
+ <img src="assets/gradio.png" height=500>
357
+ </p>
358
+
359
+ You can run the following command to start the demo:
360
+ ```bash
361
+ cd HunyuanWorld-Voyager
362
+
363
+ python3 app.py
364
+ ```
365
+ You need to first upload an image and choose a camera direction to create a condition video. Then, you can type your text prompt and generate the final RGB-D video.
366
+
367
+ ### Export Point Cloud
368
+ After generating RGB-D video content, you can export a `ply` file as follows:
369
+ ```bash
370
+ cd data_engine
371
+
372
+ python3 convert_point.py --folder_path "your_input_condition_folder" --video_path "your_output_video_path"
373
+ ```
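+ 
+ By default, `convert_point.py` writes the point cloud of frame 0 to `your_input_condition_folder/frame_000000.ply`. To quickly inspect the result you can use any point-cloud viewer; the sketch below uses `open3d`, which is not part of this repo's requirements and must be installed separately (`pip install open3d`):
+ 
+ ```python
+ # Minimal sketch: view an exported point cloud with open3d (optional extra dependency).
+ import open3d as o3d
+ 
+ pcd = o3d.io.read_point_cloud("your_input_condition_folder/frame_000000.ply")
+ print(pcd)  # reports the number of points
+ o3d.visualization.draw_geometries([pcd])
+ ```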
374
+
375
+ ## ⚙️ Data Engine
376
+
377
+ We also release the data engine of HunyuanWorld-Voyager, which can be used to generate scalable data for RGB-D video training. Please refer to [data_engine](data_engine/README.md) for more details.
378
+
379
+ <p align="center">
380
+ <img src="assets/data_engine.jpg" height=500>
381
+ </p>
382
+
383
+
384
+ ## 🔗 BibTeX
385
+
386
+ If you find [Voyager](https://arxiv.org/abs/2506.04225) useful for your research and applications, please cite using this BibTeX:
387
+
388
+ ```BibTeX
389
+ @article{huang2025voyager,
390
+ title={Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation},
391
+ author={Huang, Tianyu and Zheng, Wangguandong and Wang, Tengfei and Liu, Yuhao and Wang, Zhenwei and Wu, Junta and Jiang, Jie and Li, Hui and Lau, Rynson WH and Zuo, Wangmeng and Guo, Chunchao},
392
+ journal={arXiv preprint arXiv:2506.04225},
393
+ year={2025}
394
+ }
395
+ ```
396
+
397
+ ## 📧 Contact
398
+ Please send an email to [email protected] if you have any questions.
399
+
400
+ ## Acknowledgements
401
 
402
+ We would like to thank [HunyuanWorld](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0), [Hunyuan3D](https://github.com/Tencent-Hunyuan/Hunyuan3D-2), and [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V). We also thank [VGGT](https://github.com/facebookresearch/vggt), [MoGe](https://github.com/microsoft/MoGe), and [Metric3D](https://github.com/YvanYin/Metric3D) for their open research and exploration.
README_zh.md ADDED
@@ -0,0 +1,395 @@
1
+ [English](README.md)
2
+
3
+ # **HunyuanWorld-Voyager**
4
+
5
+ <p align="center">
6
+ <img src="assets/teaser_zh.png">
7
+ </p>
8
+
9
+ <div align="center">
10
+ <a href="https://3d-models.hunyuan.tencent.com/world/"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green"></a> &ensp;
11
+ <a href="https://3d-models.hunyuan.tencent.com/voyager/voyager_en/assets/HYWorld_Voyager.pdf" target="_blank"><img src="https://img.shields.io/static/v1?label=Tech%20Report&message=arxiv&color=red" height=22px></a>
12
+ <a href="https://huggingface.co/tencent/HunyuanWorld-Voyager"><img src="https://img.shields.io/static/v1?label=HunyuanWorld-Voyager&message=HuggingFace&color=yellow"></a>
13
+ </div>
14
+
15
+ -----
16
+
17
+ 我们正式发布混元世界模型-Voyager(HunyuanWorld-Voyager),一种创新的视频扩散框架。该模型能够基于单张输入图像生成具有世界一致性的3D点云,并支持用户按照自定义的相机路径进行沉浸式世界探索。同时,HunyuanWorld-Voyager 还能够同时生成精确对齐的深度信息与RGB视频,无需后处理即可直接用于实时、高质量三维重建。
18
+
19
+
20
+ ## 🔥🔥🔥 最新消息!!
21
+ * Sep 2, 2025: 👋 我们发布了HunyuanWorld-Voyager的推理代码和模型权重。[下载](ckpts/README.md).
22
+
23
+ 微信群 and Discord 社区
24
+ > 加入我们的 **[微信群](#)** 和 **[Discord 社区](https://discord.gg/dNBrdrGGMa)** 讨论,获取最新进展以及帮助吧。
25
+
26
+ | 微信群 | 小红书 | X | Discord |
27
+ |--------------------------------------------------|-------------------------------------------------------|---------------------------------------------|---------------------------------------------------|
28
+ | <img src="assets/qrcode/wechat.png" height=140> | <img src="assets/qrcode/xiaohongshu.png" height=140> | <img src="assets/qrcode/x.png" height=140> | <img src="assets/qrcode/discord.png" height=140> |
29
+
30
+ ## 🎥 演示
31
+ ### 演示视频
32
+
33
+ <div align="center">
34
+ <video src="https://github.com/user-attachments/assets/d095a4fd-22a6-41c6-bedd-3e45b468eb98" width="80%" poster=""> </video>
35
+ </div>
36
+
37
+ ### 相机可控视频生成
38
+
39
+ | 输入 | 生成视频 |
40
+ |:----------------:|:----------------:|
41
+ | <img src="assets/demo/camera/input1.png" width="80%"> | <video src="https://github.com/user-attachments/assets/2b03ecd5-9a8f-455c-bf04-c668d3a61b04" width="100%"> </video> |
42
+ | <img src="assets/demo/camera/input2.png" width="80%"> | <video src="https://github.com/user-attachments/assets/45844ac0-c65a-4e04-9f7d-4c72d47e0339" width="100%"> </video> |
43
+ | <img src="assets/demo/camera/input3.png" width="80%"> | <video src="https://github.com/user-attachments/assets/f7f48473-3bb5-4a30-bd22-af3ca95ee8dc" width="100%"> </video> |
44
+
45
+ ### 多样化应用
46
+
47
+ - 视频重建
48
+
49
+ | 生成视频 | 重建点云 |
50
+ |:---------------:|:--------------------------------:|
51
+ | <video src="https://github.com/user-attachments/assets/72a41804-63fc-4596-963d-1497e68f7790" width="100%"> </video> | <video src="https://github.com/user-attachments/assets/67574e9c-9e21-4ed6-9503-e65d187086a2" width="100%"> </video> |
52
+
53
+ - 图像到3D生成
54
+
55
+ | | |
56
+ |:---------------:|:---------------:|
57
+ | <video src="https://github.com/user-attachments/assets/886aa86d-990e-4b86-97a5-0b9110862d14" width="100%"> </video> | <video src="https://github.com/user-attachments/assets/4c1734ba-4e78-4979-b30e-3c8c97aa984b" width="100%"> </video> |
58
+
59
+ - 视频深度估计
60
+
61
+ | | |
62
+ |:---------------:|:---------------:|
63
+ | <video src="https://github.com/user-attachments/assets/e4c8b729-e880-4be3-826f-429a5c1f12cd" width="100%"> </video> | <video src="https://github.com/user-attachments/assets/7ede0745-cde7-42f1-9c28-e4dca90dac52" width="100%"> </video> |
64
+
65
+
66
+ ## ☯️ **混元世界模型-Voyager 介绍**
67
+ ### 架构
68
+
69
+ HunyuanWorld-Voyager 包含两个关键组件:
70
+
71
+ (1) 世界一致的视频扩散:提出了一种统一的架构,能够基于现有世界观测,同时生成精确对齐的RGB视频与深度视频序列,并确保全局场景的一致性。
72
+
73
+ (2) 长距离世界探索:提出了一种高效的世界缓存机制,该机制融合了点云剔除与自回归推理能力,可支持迭代式的场景扩展,并通过上下文感知的一致性技术实现平滑的视频采样。
74
+
75
+ 为训练 HunyuanWorld-Voyager 模型,我们构建了一套可扩展的数据构建引擎——该引擎是一个自动化视频重建流水线,能够对任意输入视频自动估计相机位姿以及度量深度,从而无需依赖人工标注,即可实现大规模、多样化训练数据的构建。
76
+ 基于此流水线,HunyuanWorld-Voyager 整合了真实世界采集与虚幻引擎渲染的视频资源,构建了一个包含超过10 万个视频片段的大规模数据集。
77
+
78
+ <p align="center">
79
+ <img src="assets/backbone.jpg" height=500>
80
+ </p>
81
+
82
+ ### 性能
83
+
84
+ <table class="comparison-table">
85
+ <thead>
86
+ <tr>
87
+ <th>Method</th>
88
+ <th>WorldScore Average</th>
89
+ <th>Camera Control</th>
90
+ <th>Object Control</th>
91
+ <th>Content Alignment</th>
92
+ <th>3D Consistency</th>
93
+ <th>Photometric Consistency</th>
94
+ <th>Style Consistency</th>
95
+ <th>Subjective Quality</th>
96
+ </tr>
97
+ </thead>
98
+ <tbody>
99
+ <tr>
100
+ <td>WonderJourney</td>
101
+ <td>🟡63.75</td>
102
+ <td>🟡84.6</td>
103
+ <td>37.1</td>
104
+ <td>35.54</td>
105
+ <td>80.6</td>
106
+ <td>79.03</td>
107
+ <td>62.82</td>
108
+ <td>🟢66.56</td>
109
+ </tr>
110
+ <tr>
111
+ <td>WonderWorld</td>
112
+ <td>🟢72.69</td>
113
+ <td>🔴92.98</td>
114
+ <td>51.76</td>
115
+ <td>🔴71.25</td>
116
+ <td>🔴86.87</td>
117
+ <td>85.56</td>
118
+ <td>70.57</td>
119
+ <td>49.81</td>
120
+ </tr>
121
+ <tr>
122
+ <td>EasyAnimate</td>
123
+ <td>52.85</td>
124
+ <td>26.72</td>
125
+ <td>54.5</td>
126
+ <td>50.76</td>
127
+ <td>67.29</td>
128
+ <td>47.35</td>
129
+ <td>🟡73.05</td>
130
+ <td>50.31</td>
131
+ </tr>
132
+ <tr>
133
+ <td>Allegro</td>
134
+ <td>55.31</td>
135
+ <td>24.84</td>
136
+ <td>🟡57.47</td>
137
+ <td>🟡51.48</td>
138
+ <td>70.5</td>
139
+ <td>69.89</td>
140
+ <td>65.6</td>
141
+ <td>47.41</td>
142
+ </tr>
143
+ <tr>
144
+ <td>Gen-3</td>
145
+ <td>60.71</td>
146
+ <td>29.47</td>
147
+ <td>🟢62.92</td>
148
+ <td>50.49</td>
149
+ <td>68.31</td>
150
+ <td>🟢87.09</td>
151
+ <td>62.82</td>
152
+ <td>🟡63.85</td>
153
+ </tr>
154
+ <tr>
155
+ <td>CogVideoX-I2V</td>
156
+ <td>62.15</td>
157
+ <td>38.27</td>
158
+ <td>40.07</td>
159
+ <td>36.73</td>
160
+ <td>🟢86.21</td>
161
+ <td>🔴88.12</td>
162
+ <td>🟢83.22</td>
163
+ <td>62.44</td>
164
+ </tr>
165
+ <tr class="voyager-row">
166
+ <td><b>Voyager</b></td>
167
+ <td>🔴77.62</td>
168
+ <td>🟢85.95</td>
169
+ <td>🔴66.92</td>
170
+ <td>🟢68.92</td>
171
+ <td>🟡81.56</td>
172
+ <td>🟡85.99</td>
173
+ <td>🔴84.89</td>
174
+ <td>🔴71.09</td>
175
+ </tr>
176
+ </tbody>
177
+ <caption><i>WorldScore Benchmark</i>的定量比较结果. 🔴 表示第1名, 🟢 表示第2名, 🟡 表示第3名.</caption>
178
+ </table>
179
+
180
+
181
+ ## 📜 要求
182
+
183
+ 以下表格展示了运行Voyager(批量大小 = 1)生成视频的要求:
184
+
185
+ | 模型 | 分辨率 | GPU 峰值内存 |
186
+ |:----------------:|:-----------:|:----------------:|
187
+ | 混元世界模型-Voyager | 540p | 60GB |
188
+
189
+ * 需要NVIDIA GPU支持CUDA。
190
+ * 模型在单个80G GPU上测试。
191
+ * **最小值**: 最小GPU内存要求为540p的60GB。
192
+ * **推荐**: 我们推荐使用80GB内存的GPU以获得更好的生成质量。
193
+ * 测试操作系统: Linux
194
+
195
+
196
+ ## 🛠️ 依赖和安装
197
+
198
+ 首先克隆仓库:
199
+ ```shell
200
+ git clone https://github.com/Tencent-Hunyuan/HunyuanWorld-Voyager
201
+ cd HunyuanWorld-Voyager
202
+ ```
203
+
204
+ ### Linux 安装指南
205
+
206
+ 我们推荐CUDA版本12.4或11.8进行手动安装。
207
+
208
+ ```shell
209
+ # 1. Create conda environment
210
+ conda create -n voyager python==3.11.9
211
+
212
+ # 2. Activate the environment
213
+ conda activate voyager
214
+
215
+ # 3. Install PyTorch and other dependencies using conda
216
+ # For CUDA 12.4
217
+ conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
218
+
219
+ # 4. Install pip dependencies
220
+ python -m pip install -r requirements.txt
221
+ python -m pip install transformers==4.39.3
222
+
223
+ # 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
224
+ python -m pip install flash-attn
225
+
226
+ # 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
227
+ python -m pip install xfuser==0.4.2
228
+ ```
229
+
230
+ 在特定GPU类型上运行时,如果出现浮点异常(core dump),您可以尝试以下解决方案:
231
+
232
+ ```shell
233
+ # Making sure you have installed CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 (or simply using our CUDA 12 docker image).
234
+ pip install nvidia-cublas-cu12==12.4.5.8
235
+ export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
236
+ ```
237
+
238
+ 为了创建自己的输入条件,您还需要安装以下依赖:
239
+ ```shell
240
+ pip install --no-deps git+https://github.com/microsoft/MoGe.git
241
+ pip install scipy==1.11.4
242
+ pip install git+https://github.com/EasternJournalist/utils3d.git@c5daf6f6c244d251f252102d09e9b7bcef791a38
243
+ ```
244
+
245
+
246
+ ## 🧱 下载预训练模型
247
+
248
+ 下载预训练模型的详细信息请参考[这里](ckpts/README.md)。简单来讲,
249
+ ```
250
+ huggingface-cli download tencent/HunyuanWorld-Voyager --local-dir ./ckpts
251
+ ```
252
+
253
+ ## 🔑 推理
254
+
255
+ ### 创建输入条件
256
+
257
+ ```bash
258
+ cd data_engine
259
+
260
+ python3 create_input.py --image_path "your_input_image" --render_output_dir "examples/case/" --type "forward"
261
+ ```
262
+ 我们提供了以下类型的相机路径:
263
+ - forward
264
+ - backward
265
+ - left
266
+ - right
267
+ - turn_left
268
+ - turn_right
269
+ 您也可以在`create_input.py`文件中修改相机路径。
270
+
271
+ ### 单GPU推理
272
+
273
+ ```bash
274
+ cd HunyuanWorld-Voyager
275
+
276
+ python3 sample_image2video.py \
277
+ --model HYVideo-T/2 \
278
+ --input-path "examples/case1" \
279
+ --prompt "An old-fashioned European village with thatched roofs on the houses." \
280
+ --i2v-stability \
281
+ --infer-steps 50 \
282
+ --flow-reverse \
283
+ --flow-shift 7.0 \
284
+ --seed 0 \
285
+ --embedded-cfg-scale 6.0 \
286
+ --use-cpu-offload \
287
+ --save-path ./results
288
+ ```
289
+ 您可以添加"--use-context-block"来添加推理中的上下文块。
290
+
291
+ ### 多GPU并行推理
292
+
293
+ [xDiT](https://github.com/xdit-project/xDiT) 是一个可扩展的推理引擎,用于多GPU集群上的扩散Transformer(DiTs)。
294
+ 它成功地为各种DiTs模型(包括mochi-1、CogVideoX、Flux.1、SD3等)提供了低延迟的并行推理解决方案。这个仓库采用了[统一序列并行(USP)](https://arxiv.org/abs/2405.07719) API来并行推理HunyuanVideo-I2V模型。
295
+
296
+ 例如,要使用8个GPU生成视频,您可以使用以下命令:
297
+
298
+ ```bash
299
+ cd HunyuanWorld-Voyager
300
+
301
+ ALLOW_RESIZE_FOR_SP=1 torchrun --nproc_per_node=8 \
302
+ sample_image2video.py \
303
+ --model HYVideo-T/2 \
304
+ --input-path "examples/case1" \
305
+ --prompt "An old-fashioned European village with thatched roofs on the houses." \
306
+ --i2v-stability \
307
+ --infer-steps 50 \
308
+ --flow-reverse \
309
+ --flow-shift 7.0 \
310
+ --seed 0 \
311
+ --embedded-cfg-scale 6.0 \
312
+ --save-path ./results \
313
+ --ulysses-degree 8 \
314
+ --ring-degree 1
315
+ ```
316
+
317
+ GPU数量等于`--ulysses-degree`和`--ring-degree`的乘积。您可以自由调整这些并行配置以优化性能。
318
+
319
+ <p align="center">
320
+ <table align="center">
321
+ <thead>
322
+ <tr>
323
+ <th colspan="4">512x768(49帧,50步)在8 x H20 GPU上的延迟(秒)</th>
324
+ </tr>
325
+ <tr>
326
+ <th>1</th>
327
+ <th>2</th>
328
+ <th>4</th>
329
+ <th>8</th>
330
+ </tr>
331
+ </thead>
332
+ <tbody>
333
+ <tr>
334
+ <th>1925</th>
335
+ <th>1018 (1.89x)</th>
336
+ <th>534 (3.60x)</th>
337
+ <th>288 (6.69x)</th>
338
+ </tr>
339
+
340
+ </tbody>
341
+ </table>
342
+ </p>
343
+
344
+
345
+ ### Gradio 演示
346
+
347
+ 我们也提供了一个Gradio演示,
348
+
349
+ <p align="center">
350
+ <img src="assets/gradio.png" height=500>
351
+ </p>
352
+
353
+ 您可以使用以下命令启动:
354
+ ```bash
355
+ cd HunyuanWorld-Voyager
356
+
357
+ python3 app.py
358
+ ```
359
+
360
+ 您需要首先上传一张图片并选择相机的运动方向,来生成一个条件视频。接下来,您就可以输入文本提示词来生成最终的RGB-D视频。
361
+
362
+ ### 导出点云
363
+ 生成RGB-D视频结果之后,你可以用如下方式导出`ply`文件:
364
+ ```bash
365
+ cd data_engine
366
+
367
+ python3 convert_point.py --folder_path "your_input_condition_folder" --video_path "your_output_video_path"
368
+ ```
369
+
370
+ ## ⚙️ 数据引擎
371
+
372
+ 我们发布了混元世界模型-Voyager的数据引擎,可以用于生成可扩展的RGB-D视频训练数据。请参考[data_engine](data_engine/README.md)了解更多细节。
373
+
374
+ <p align="center">
375
+ <img src="assets/data_engine.jpg" height=500>
376
+ </p>
377
+
378
+
379
+ ## 🔗 引用
380
+
381
+ 如果您发现[Voyager](https://arxiv.org/abs/2506.04225)对您的研究或应用有用,请使用以下BibTeX引用:
382
+
383
+ ```BibTeX
384
+ @article{huang2025voyager,
385
+ title={Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation},
386
+ author={Huang, Tianyu and Zheng, Wangguandong and Wang, Tengfei and Liu, Yuhao and Wang, Zhenwei and Wu, Junta and Jiang, Jie and Li, Hui and Lau, Rynson WH and Zuo, Wangmeng and Guo, Chunchao},
387
+ journal={arXiv preprint arXiv:2506.04225},
388
+ year={2025}
389
+ }
390
+ ```
391
+
392
+
393
+ ## 致谢
394
+
395
+ 我们感谢[HunyuanWorld](https://github.com/Tencent-Hunyuan/HunyuanWorld-1.0)、[Hunyuan3D-2](https://github.com/Tencent-Hunyuan/Hunyuan3D-2)和[HunyuanVideo-I2V](https://github.com/Tencent-Hunyuan/HunyuanVideo-I2V)。我们也感谢[VGGT](https://github.com/facebookresearch/vggt)、[MoGE](https://github.com/microsoft/MoGe)、[Metric3D](https://github.com/YvanYin/Metric3D)的贡献者。
app.py ADDED
@@ -0,0 +1,15 @@
1
+ import gradio as gr
2
+
3
+ def demo_fn(texto):  # placeholder function: simply echoes the input text
4
+ return f"Model response to: {texto}"
5
+
6
+ demo = gr.Interface(
7
+ fn=demo_fn,
8
+ inputs="text",
9
+ outputs="text",
10
+ title="HunyuanWorld-Voyager Demo",
11
+ description="Demonstration of the real model"
12
+ )
13
+
14
+ if __name__ == "__main__":
15
+ demo.launch()
assets/HYWorld_Voyager.pdf ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:44223eb320d2fa8f4f1721d4058f0c270d3d2f7dc1da3508e8a586fac70e7bdc
3
+ size 37966526
assets/backbone.jpg ADDED

Git LFS Details

  • SHA256: c0a5139d78a7488547473970c19260694fff90890a649c7e904ff249f3e464ea
  • Pointer size: 132 Bytes
  • Size of remote file: 1.22 MB
assets/data_engine.jpg ADDED

Git LFS Details

  • SHA256: 28e094093d8383acf393556bab6497a4969889b282e90ea6e71921a089092c94
  • Pointer size: 131 Bytes
  • Size of remote file: 773 kB
assets/demo/camera/input1.png ADDED

Git LFS Details

  • SHA256: e2ae970622a8d8750d1d2a06dc5a1415a79ae43790e8dd4b1ea65d40f730fb86
  • Pointer size: 132 Bytes
  • Size of remote file: 1.55 MB
assets/demo/camera/input2.png ADDED

Git LFS Details

  • SHA256: 24ecb4118b96c1082f24a9912ede04cb5189344e98148a19c5856459d14b52db
  • Pointer size: 132 Bytes
  • Size of remote file: 1.58 MB
assets/demo/camera/input3.png ADDED

Git LFS Details

  • SHA256: 3b496461509ba67e249eb636fe93477864e2d667225382d5fd5a2508c36d37d3
  • Pointer size: 132 Bytes
  • Size of remote file: 1.42 MB
assets/gradio.png ADDED

Git LFS Details

  • SHA256: 7896c33cf0c718a37a7093370e5f2ee714cbd8e6c45d5cda5859487ff451145d
  • Pointer size: 132 Bytes
  • Size of remote file: 3.33 MB
assets/qrcode/discord.png ADDED
assets/qrcode/wechat.png ADDED
assets/qrcode/x.png ADDED
assets/qrcode/xiaohongshu.png ADDED
assets/teaser.png ADDED

Git LFS Details

  • SHA256: 04771c0f2ff35e5c63945a034ae8418a45d14e736ebc1694c7e30eb63c318a9f
  • Pointer size: 131 Bytes
  • Size of remote file: 840 kB
assets/teaser_zh.png ADDED

Git LFS Details

  • SHA256: 0f7b97305dc854c6a5f557dc81e87b98432985bf6b1fe4cd6fba46eac4d074fa
  • Pointer size: 131 Bytes
  • Size of remote file: 837 kB
ckpts/README.md ADDED
@@ -0,0 +1,57 @@
1
+
2
+ All models are stored in `HunyuanWorld-Voyager/ckpts` by default, and the file structure is as follows:
3
+ ```shell
4
+ HunyuanWorld-Voyager
5
+ ├──ckpts
6
+ │ ├──README.md
7
+ │ ├──Voyager
8
+ │ │ ├──transformers
9
+ │ │ │ ├──mp_rank_00_model_states.pt
10
+ │ │ │ ├──mp_rank_00_model_states_context.pt
11
+ │ ├──hunyuan-video-i2v-720p
12
+ │ │ ├──vae
13
+ │ ├──text_encoder_i2v
14
+ │ ├──text_encoder_2
15
+ ├──...
16
+ ```
17
+
18
+ ## Download HunyuanWorld-Voyager model
19
+ To download the HunyuanWorld-Voyager model, first install the huggingface-cli. (Detailed instructions are available [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)
20
+
21
+ ```shell
22
+ python -m pip install "huggingface_hub[cli]"
23
+ ```
24
+
25
+ Then download the model using the following commands:
26
+
27
+ ```shell
28
+ # Switch to the directory named 'HunyuanWorld-Voyager'
29
+ cd HunyuanWorld-Voyager
30
+ # Use the huggingface-cli tool to download HunyuanWorld-Voyager model in HunyuanWorld-Voyager/ckpts dir.
31
+ # The download time may vary from 10 minutes to 1 hour depending on network conditions.
32
+ huggingface-cli download tencent/HunyuanWorld-Voyager --local-dir ./ckpts
33
+ ```
34
+
35
+ <details>
36
+ <summary>💡Tips for using huggingface-cli (network problem)</summary>
37
+
38
+ ##### 1. Using HF-Mirror
39
+
40
+ If you encounter slow download speeds in China, you can try a mirror to speed up the download process. For example,
41
+
42
+ ```shell
43
+ HF_ENDPOINT=https://hf-mirror.com huggingface-cli download tencent/HunyuanWorld-Voyager --local-dir ./ckpts
44
+ ```
45
+
46
+ ##### 2. Resume Download
47
+
48
+ `huggingface-cli` supports resuming downloads. If the download is interrupted, you can just rerun the download
49
+ command to resume the download process.
50
+
51
+ Note: If an error like `No such file or directory: 'ckpts/.huggingface/.gitignore.lock'` occurs during the download
52
+ process, you can ignore it and rerun the download command.
53
+
54
+ </details>
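+ 
+ If you prefer downloading from Python instead of the CLI, `huggingface_hub` also provides `snapshot_download`, which is equivalent to the command above (an optional alternative, not required by this repo):
+ 
+ ```python
+ # Optional alternative to the huggingface-cli command above.
+ from huggingface_hub import snapshot_download
+ 
+ snapshot_download(repo_id="tencent/HunyuanWorld-Voyager", local_dir="./ckpts")
+ ```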
55
+
56
+
57
+
data_engine/README.md ADDED
@@ -0,0 +1,62 @@
1
+ This is the data engine used to process training data for Voyager.
2
+
3
+
4
+ ## 🛠️ Dependencies and Installation
5
+
6
+ Begin by cloning required repositories:
7
+ ```shell
8
+ # VGGT
9
+ git clone https://github.com/facebookresearch/vggt.git
10
+ touch vggt/vggt/__init__.py # create an empty __init__.py
11
+
12
+ # MoGe
13
+ git clone https://github.com/microsoft/MoGe.git
14
+
15
+ # Metric3D
16
+ git clone https://github.com/YvanYin/Metric3D.git
17
+ # comment out line 8-12 in Metric3D/mono/utils/comm.py
18
+ # add from mono.model.backbones import * to Metric3D/mono/utils/comm.py
19
+ ```
20
+
21
+ Install required dependencies:
22
+
23
+ ```shell
24
+ conda create -n data_engine python=3.10
25
+ conda activate data_engine
26
+ pip install -r requirements.txt
27
+ ```
28
+
29
+ ## 🛠️ Install Environment
30
+
31
+ ```shell
32
+ # project path
33
+ cd data_engine
34
+
35
+ # VGGT
36
+ git clone https://github.com/facebookresearch/vggt.git
37
+ touch vggt/vggt/__init__.py
38
+
39
+ # MoGe
40
+ git clone https://github.com/microsoft/MoGe.git
41
+
42
+ # Metric3D
43
+ git clone https://github.com/YvanYin/Metric3D.git
44
+ # !!! important steps:
45
+ # comment out line 8-12 in Metric3D/mono/utils/comm.py
46
+ # and then add from mono.model.backbones import * to Metric3D/mono/utils/comm.py
47
+
48
+ # pip install environment
49
+ conda create -n voyager_dataengine python=3.10
50
+ conda activate voyager_dataengine
51
+ pip install -r requirements.txt
52
+
53
+ # run dataEngine
54
+ bash dataEngine.sh
55
+ ```
56
+
57
+ ## 🔑 Run Data Engine
58
+
59
+ We provide a script to run the data engine.
60
+ ```shell
61
+ bash run.sh
62
+ ```
data_engine/convert_point.py ADDED
@@ -0,0 +1,72 @@
1
+ import argparse
2
+ import os
3
+ import json
4
+ import numpy as np
5
+ import imageio
6
+
7
+ from create_input import depth_to_world_coords_points, camera_list
8
+
9
+
10
+ def parse_args():
11
+ parser = argparse.ArgumentParser()
12
+ parser.add_argument("--folder_path", type=str)
13
+ parser.add_argument("--video_path", type=str)
14
+ parser.add_argument("--frame_id", type=int, default=0)
15
+ parser.add_argument("--max_depth", type=float, default=25)
16
+ return parser.parse_args()
17
+
18
+
19
+ def save_ply(points: np.ndarray, colors: np.ndarray, out_path: str):
20
+ os.makedirs(os.path.dirname(out_path), exist_ok=True)
21
+ n = points.shape[0]
22
+ colors = np.clip(colors, 0, 255).astype(np.uint8)
23
+ header = (
24
+ "ply\n"
25
+ "format ascii 1.0\n"
26
+ f"element vertex {n}\n"
27
+ "property float x\n"
28
+ "property float y\n"
29
+ "property float z\n"
30
+ "property uchar red\n"
31
+ "property uchar green\n"
32
+ "property uchar blue\n"
33
+ "end_header\n"
34
+ )
35
+ with open(out_path, "w") as f:
36
+ f.write(header)
37
+ for p, c in zip(points, colors):
38
+ f.write(f"{float(p[0])} {float(p[1])} {float(p[2])} {int(c[0])} {int(c[1])} {int(c[2])}\n")
39
+
40
+
41
+ if __name__ == "__main__":
42
+ args = parse_args()
43
+ folder_path = args.folder_path
44
+ video_path = args.video_path
45
+ frame_id = args.frame_id
46
+ max_depth = args.max_depth
47
+
48
+ reader = imageio.v2.get_reader(video_path)
49
+ for i, frame in enumerate(reader):
50
+ if i == frame_id:
51
+ frame = frame.astype(np.uint8)
52
+ break
53
+
54
+ with open(os.path.join(folder_path, "depth_range.json"), "r") as f:
55
+ depth_range = json.load(f)[frame_id]
56
+
57
+ rgb = frame[:512]
58
+ depth = frame[512:, :, 0] / 255.0  # bottom half of the frame stores the depth channel, normalized to [0, 1]
59
+ depth = depth * (depth_range[1] - depth_range[0]) + depth_range[0]  # de-normalize with the per-frame range
60
+ depth = 1 / (depth + 1e-6)  # stored values are inverse depth; invert to recover depth
61
+ valid_mask = np.logical_and(depth > 0, depth < max_depth)
62
+
63
+ intrinsics, extrinsics = camera_list(
64
+ num_frames=1, type="forward", Width=512, Height=512, fx=256, fy=256
65
+ )
66
+ point_map = depth_to_world_coords_points(depth, extrinsics[0], intrinsics[0])
67
+ points = point_map[valid_mask].reshape(-1, 3)
68
+ colors = rgb[valid_mask].reshape(-1, 3)
69
+
70
+ out_ply = os.path.join(folder_path, f"frame_{frame_id:06d}.ply")
71
+ save_ply(points, colors, out_ply)
72
+ print(f"Saved point cloud: {out_ply}, number of points: {points.shape[0]}")
data_engine/create_input.py ADDED
@@ -0,0 +1,391 @@
1
+ import numpy as np
2
+ from PIL import Image
3
+ import torch
4
+ import argparse
5
+ import os
6
+ import json
7
+ import imageio
8
+ import pyexr
9
+ import cv2
10
+
11
+ try:
12
+ from moge.model.v1 import MoGeModel
13
+ except:
14
+ from MoGe.moge.model.v1 import MoGeModel
15
+
16
+
17
+ def parse_args():
18
+ parser = argparse.ArgumentParser()
19
+ parser.add_argument("--image_path", type=str, default="./example.png")
20
+ parser.add_argument("--render_output_dir", type=str, default="../demo/example/")
21
+ parser.add_argument("--type", type=str, default="forward",
22
+ choices=["forward", "backward", "left", "right", "turn_left", "turn_right"])
23
+ return parser.parse_args()
24
+
25
+
26
+ def camera_list(
27
+ num_frames=49,
28
+ type="forward",
29
+ Width=512,
30
+ Height=512,
31
+ fx=256,
32
+ fy=256
33
+ ):
34
+ assert type in ["forward", "backward", "left", "right", "turn_left", "turn_right"], "Invalid camera type"
35
+
36
+ start_pos = np.array([0, 0, 0])
37
+ end_pos = np.array([0, 0, 0])
38
+ if type == "forward":
39
+ end_pos = np.array([0, 0, 1])
40
+ elif type == "backward":
41
+ end_pos = np.array([0, 0, -1])
42
+ elif type == "left":
43
+ end_pos = np.array([-1, 0, 0])
44
+ elif type == "right":
45
+ end_pos = np.array([1, 0, 0])
46
+
47
+ cx = Width // 2
48
+ cy = Height // 2
49
+
50
+ intrinsic = np.array([
51
+ [fx, 0, cx],
52
+ [0, fy, cy],
53
+ [0, 0, 1]
54
+ ])
55
+ intrinsics = np.stack([intrinsic] * num_frames)
56
+
57
+ # Interpolate camera positions along a straight line
58
+ camera_centers = np.linspace(start_pos, end_pos, num_frames)
59
+ target_start = np.array([0, 0, 100]) # Target point
60
+ if type == "turn_left":
61
+ target_end = np.array([-100, 0, 0])
62
+ elif type == "turn_right":
63
+ target_end = np.array([100, 0, 0])
64
+ else:
65
+ target_end = np.array([0, 0, 100])
66
+ target_points = np.linspace(target_start, target_end, num_frames * 2)[:num_frames]
67
+
68
+ extrinsics = []
69
+ for t, target_point in zip(camera_centers, target_points):
70
+ if type == "left" or type == "right":
71
+ target_point = t + target_point
72
+
73
+ z = (target_point - t)
74
+ z = z / np.linalg.norm(z)
75
+ x = np.array([1, 0, 0])
76
+ y = np.cross(z, x)
77
+ y = y / np.linalg.norm(y)
78
+ x = np.cross(y, z)
79
+
80
+ R = np.stack([x, y, z], axis=0)
81
+ w2c = np.eye(4)
82
+ w2c[:3, :3] = R
83
+ w2c[:3, 3] = -R @ t
84
+ extrinsics.append(w2c)
85
+ extrinsics = np.stack(extrinsics)
86
+
87
+ return intrinsics, extrinsics
88
+
89
+
90
+ # from VGGT: https://github.com/facebookresearch/vggt/blob/main/vggt/utils/geometry.py
91
+ def depth_to_cam_coords_points(depth_map: np.ndarray, intrinsic: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
92
+ """
93
+ Convert a depth map to camera coordinates.
94
+
95
+ Args:
96
+ depth_map (np.ndarray): Depth map of shape (H, W).
97
+ intrinsic (np.ndarray): Camera intrinsic matrix of shape (3, 3).
98
+
99
+ Returns:
100
+ tuple[np.ndarray, np.ndarray]: Camera coordinates (H, W, 3)
101
+ """
102
+ H, W = depth_map.shape
103
+ assert intrinsic.shape == (3, 3), "Intrinsic matrix must be 3x3"
104
+ assert intrinsic[0, 1] == 0 and intrinsic[1, 0] == 0, "Intrinsic matrix must have zero skew"
105
+
106
+ # Intrinsic parameters
107
+ fu, fv = intrinsic[0, 0], intrinsic[1, 1]
108
+ cu, cv = intrinsic[0, 2], intrinsic[1, 2]
109
+
110
+ # Generate grid of pixel coordinates
111
+ u, v = np.meshgrid(np.arange(W), np.arange(H))
112
+
113
+ # Unproject to camera coordinates
114
+ x_cam = (u - cu) * depth_map / fu
115
+ y_cam = (v - cv) * depth_map / fv
116
+ z_cam = depth_map
117
+
118
+ # Stack to form camera coordinates
119
+ cam_coords = np.stack((x_cam, y_cam, z_cam), axis=-1).astype(np.float32)
120
+
121
+ return cam_coords
122
+
123
+
124
+ def closed_form_inverse_se3(se3, R=None, T=None):
125
+ """
126
+ Compute the inverse of each 4x4 (or 3x4) SE3 matrix in a batch.
127
+
128
+ If `R` and `T` are provided, they must correspond to the rotation and translation
129
+ components of `se3`. Otherwise, they will be extracted from `se3`.
130
+
131
+ Args:
132
+ se3: Nx4x4 or Nx3x4 array or tensor of SE3 matrices.
133
+ R (optional): Nx3x3 array or tensor of rotation matrices.
134
+ T (optional): Nx3x1 array or tensor of translation vectors.
135
+
136
+ Returns:
137
+ Inverted SE3 matrices with the same type and device as `se3`.
138
+
139
+ Shapes:
140
+ se3: (N, 4, 4)
141
+ R: (N, 3, 3)
142
+ T: (N, 3, 1)
143
+ """
144
+ # Check if se3 is a numpy array or a torch tensor
145
+ is_numpy = isinstance(se3, np.ndarray)
146
+
147
+ # Validate shapes
148
+ if se3.shape[-2:] != (4, 4) and se3.shape[-2:] != (3, 4):
149
+ raise ValueError(f"se3 must be of shape (N,4,4), got {se3.shape}.")
150
+
151
+ # Extract R and T if not provided
152
+ if R is None:
153
+ R = se3[:, :3, :3] # (N,3,3)
154
+ if T is None:
155
+ T = se3[:, :3, 3:] # (N,3,1)
156
+
157
+ # Transpose R
158
+ if is_numpy:
159
+ # Compute the transpose of the rotation for NumPy
160
+ R_transposed = np.transpose(R, (0, 2, 1))
161
+ # -R^T t for NumPy
162
+ top_right = -np.matmul(R_transposed, T)
163
+ inverted_matrix = np.tile(np.eye(4), (len(R), 1, 1))
164
+ else:
165
+ R_transposed = R.transpose(1, 2) # (N,3,3)
166
+ top_right = -torch.bmm(R_transposed, T) # (N,3,1)
167
+ inverted_matrix = torch.eye(4, 4)[None].repeat(len(R), 1, 1)
168
+ inverted_matrix = inverted_matrix.to(R.dtype).to(R.device)
169
+
170
+ inverted_matrix[:, :3, :3] = R_transposed
171
+ inverted_matrix[:, :3, 3:] = top_right
172
+
173
+ return inverted_matrix
174
+
175
+
176
+ def depth_to_world_coords_points(
177
+ depth_map: np.ndarray,
178
+ extrinsic: np.ndarray,
179
+ intrinsic: np.ndarray,
180
+ eps=1e-8,
181
+ ) -> tuple[np.ndarray, np.ndarray, np.ndarray]:
182
+ """
183
+ Convert a depth map to world coordinates.
184
+
185
+ Args:
186
+ depth_map (np.ndarray): Depth map of shape (H, W).
187
+ intrinsic (np.ndarray): Camera intrinsic matrix of shape (3, 3).
188
+ extrinsic (np.ndarray): Camera extrinsic matrix of shape (3, 4).
189
+
190
+ Returns:
191
+ tuple[np.ndarray, np.ndarray]: World coordinates (H, W, 3) and valid depth mask (H, W).
192
+ """
193
+ if depth_map is None:
194
+ return None, None, None
195
+
196
+ # Valid depth mask
197
+ point_mask = depth_map > eps
198
+
199
+ # Convert depth map to camera coordinates
200
+ cam_coords_points = depth_to_cam_coords_points(depth_map, intrinsic)
201
+
202
+ # Multiply with the inverse of extrinsic matrix to transform to world coordinates
203
+ # extrinsic_inv is 4x4 (note closed_form_inverse_OpenCV is batched, the output is (N, 4, 4))
204
+ cam_to_world_extrinsic = closed_form_inverse_se3(extrinsic[None])[0]
205
+
206
+ R_cam_to_world = cam_to_world_extrinsic[:3, :3]
207
+ t_cam_to_world = cam_to_world_extrinsic[:3, 3]
208
+
209
+ # Apply the rotation and translation to the camera coordinates
210
+ world_coords_points = np.dot(cam_coords_points, R_cam_to_world.T) + t_cam_to_world # HxWx3, 3x3 -> HxWx3
211
+ # world_coords_points = np.einsum("ij,hwj->hwi", R_cam_to_world, cam_coords_points) + t_cam_to_world
212
+
213
+ return world_coords_points
214
+
215
+
216
+ def render_from_cameras_videos(points, colors, extrinsics, intrinsics, height, width):
217
+
218
+ homogeneous_points = np.hstack((points, np.ones((points.shape[0], 1))))
219
+
220
+ render_list = []
221
+ mask_list = []
222
+ depth_list = []
223
+ # Render from each camera
224
+ for frame_idx in range(len(extrinsics)):
225
+ # Get corresponding camera parameters
226
+ extrinsic = extrinsics[frame_idx]
227
+ intrinsic = intrinsics[frame_idx]
228
+
229
+ camera_coords = (extrinsic @ homogeneous_points.T).T[:, :3]
230
+ projected = (intrinsic @ camera_coords.T).T
231
+ uv = projected[:, :2] / projected[:, 2].reshape(-1, 1)
232
+ depths = projected[:, 2]
233
+
234
+ pixel_coords = np.round(uv).astype(int) # pixel_coords (h*w, 2)
235
+ valid_pixels = (  # (h*w,) mask of projections that land inside the image bounds
236
+ (pixel_coords[:, 0] >= 0) &
237
+ (pixel_coords[:, 0] < width) &
238
+ (pixel_coords[:, 1] >= 0) &
239
+ (pixel_coords[:, 1] < height)
240
+ )
241
+
242
+ pixel_coords_valid = pixel_coords[valid_pixels] # (h*w, 2) to (valid_count, 2)
243
+ colors_valid = colors[valid_pixels]
244
+ depths_valid = depths[valid_pixels]
245
+ uv_valid = uv[valid_pixels]
246
+
247
+
248
+ valid_mask = (depths_valid > 0) & (depths_valid < 60000) # & normal_angle_mask
249
+ colors_valid = colors_valid[valid_mask]
250
+ depths_valid = depths_valid[valid_mask]
251
+ pixel_coords_valid = pixel_coords_valid[valid_mask]
252
+
253
+ # Initialize depth buffer
254
+ depth_buffer = np.full((height, width), np.inf)
255
+ image = np.zeros((height, width, 3), dtype=np.uint8)
256
+
257
+ # Vectorized depth buffer update
258
+ if len(pixel_coords_valid) > 0:
259
+ rows = pixel_coords_valid[:, 1]
260
+ cols = pixel_coords_valid[:, 0]
261
+
262
+ # Sort by depth (near to far)
263
+ sorted_idx = np.argsort(depths_valid)
264
+ rows = rows[sorted_idx]
265
+ cols = cols[sorted_idx]
266
+ depths_sorted = depths_valid[sorted_idx]
267
+ colors_sorted = colors_valid[sorted_idx]
268
+
269
+ # Vectorized depth buffer update
270
+ depth_buffer[rows, cols] = np.minimum(
271
+ depth_buffer[rows, cols],
272
+ depths_sorted
273
+ )
274
+
275
+ # Get the minimum depth index for each pixel
276
+ flat_indices = rows * width + cols # Flatten 2D coordinates to 1D index
277
+ unique_indices, idx = np.unique(flat_indices, return_index=True)
278
+
279
+ # Recover 2D coordinates from flattened indices
280
+ final_rows = unique_indices // width
281
+ final_cols = unique_indices % width
282
+
283
+ image[final_rows, final_cols] = colors_sorted[idx, :3].astype(np.uint8)
284
+
285
+ mask = np.zeros_like(depth_buffer, dtype=np.uint8)
286
+ mask[depth_buffer != np.inf] = 255
287
+
288
+ render_list.append(image)
289
+ mask_list.append(mask)
290
+ depth_list.append(depth_buffer)
291
+
292
+ return render_list, mask_list, depth_list
293
+
294
+
295
+ def create_video_input(
296
+ render_list, mask_list, depth_list, render_output_dir,
297
+ separate=True, ref_image=None, ref_depth=None,
298
+ Width=512, Height=512,
299
+ min_percentile=2, max_percentile=98
300
+ ):
301
+ video_output_dir = os.path.join(render_output_dir)
302
+ os.makedirs(video_output_dir, exist_ok=True)
303
+ video_input_dir = os.path.join(render_output_dir, "video_input")
304
+ os.makedirs(video_input_dir, exist_ok=True)
305
+
306
+ value_list = []
307
+ for i, (render, mask, depth) in enumerate(zip(render_list, mask_list, depth_list)):
308
+
309
+ # The sky (pixels at the maximum depth) is kept inside the mask; the commented lines below would remove it
310
+ mask = mask > 0
311
+ # depth_max = np.max(depth)
312
+ # non_sky_mask = (depth != depth_max)
313
+ # mask = mask & non_sky_mask
314
+ depth[mask] = 1 / (depth[mask] + 1e-6)
315
+ depth_values = depth[mask]
316
+
317
+ depth_min = np.percentile(depth_values, min_percentile)
318
+ depth_max = np.percentile(depth_values, max_percentile)
319
+ value_list.append((depth_min, depth_max))
320
+
321
+ depth[mask] = (depth[mask] - depth_min) / (depth_max - depth_min)
322
+ depth[~mask] = depth[mask].min()
323
+
324
+
325
+ # resize to 512x512
326
+ render = cv2.resize(render, (Width, Height), interpolation=cv2.INTER_LINEAR)
327
+ mask = cv2.resize((mask.astype(np.float32) * 255).astype(np.uint8), \
328
+ (Width, Height), interpolation=cv2.INTER_NEAREST)
329
+ depth = cv2.resize(depth, (Width, Height), interpolation=cv2.INTER_LINEAR)
330
+
331
+ # Save mask as png
332
+ mask_path = os.path.join(video_input_dir, f"mask_{i:04d}.png")
333
+ imageio.imwrite(mask_path, mask)
334
+
335
+ if separate:
336
+ render_path = os.path.join(video_input_dir, f"render_{i:04d}.png")
337
+ imageio.imwrite(render_path, render)
338
+ depth_path = os.path.join(video_input_dir, f"depth_{i:04d}.exr")
339
+ pyexr.write(depth_path, depth)
340
+ else:
341
+ render = np.concatenate([render, depth], axis=-3)
342
+ render_path = os.path.join(video_input_dir, f"render_{i:04d}.png")
343
+ imageio.imwrite(render_path, render)
344
+
345
+ if i == 0:
346
+ if separate:
347
+ ref_image_path = os.path.join(video_output_dir, f"ref_image.png")
348
+ imageio.imwrite(ref_image_path, ref_image)
349
+ ref_depth_path = os.path.join(video_output_dir, f"ref_depth.exr")
350
+ pyexr.write(ref_depth_path, depth)
351
+ else:
352
+ ref_image = np.concatenate([ref_image, depth], axis=-3)
353
+ ref_image_path = os.path.join(video_output_dir, f"ref_image.png")
354
+ imageio.imwrite(ref_image_path, ref_image)
355
+
356
+ with open(os.path.join(video_output_dir, f"depth_range.json"), "w") as f:
357
+ json.dump(value_list, f)
358
+
359
+
360
+ if __name__ == "__main__":
361
+ args = parse_args()
362
+
363
+ device = torch.device("cuda")
364
+ model = MoGeModel.from_pretrained("Ruicheng/moge-vitl", local_files_only=False).to(device)
365
+
366
+ image = np.array(Image.open(args.image_path).convert("RGB").resize((1280, 720)))
367
+ image_tensor = torch.tensor(image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)
368
+ output = model.infer(image_tensor)
369
+ depth = np.array(output['depth'].detach().cpu())
370
+ depth[np.isinf(depth)] = depth[~np.isinf(depth)].max() + 1e4
371
+
372
+ Height, Width = image.shape[:2]
373
+ intrinsics, extrinsics = camera_list(
374
+ num_frames=1, type=args.type, Width=Width, Height=Height, fx=256, fy=256
375
+ )
376
+
377
+ # Backproject point cloud
378
+ point_map = depth_to_world_coords_points(depth, extrinsics[0], intrinsics[0])
379
+ points = point_map.reshape(-1, 3)
380
+ colors = image.reshape(-1, 3)
381
+
382
+ intrinsics, extrinsics = camera_list(
383
+ num_frames=49, type=args.type, Width=Width//2, Height=Height//2, fx=128, fy=128
384
+ )
385
+ render_list, mask_list, depth_list = render_from_cameras_videos(
386
+ points, colors, extrinsics, intrinsics, height=Height//2, width=Width//2
387
+ )
388
+
389
+ create_video_input(
390
+ render_list, mask_list, depth_list, args.render_output_dir, separate=True,
391
+ ref_image=image, ref_depth=depth, Width=Width, Height=Height)
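For reference, a minimal sketch of how this script could be invoked from the data_engine directory (placeholder paths; the flags follow parse_args above):

    python3 create_input.py --image_path ./example.png --render_output_dir ../demo/example/ --type forward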
data_engine/depth_align.py ADDED
@@ -0,0 +1,418 @@
1
+ import os, re
2
+ import json
3
+ import numpy as np
4
+ import cv2
5
+ import torch
6
+ import imageio
7
+ import pyexr
8
+ import trimesh
9
+ from PIL import Image
10
+
11
+ from create_input import render_from_cameras_videos
12
+
13
+
14
+ class DepthAlignMetric:
15
+ """
16
+ Depth scaling and camera-parameter update processor.
17
+
18
+ Attributes:
19
+ moge_depth_dir (str): directory of MoGe depth maps to process
20
+ vggt_depth_dir (str): directory of VGGT depth maps to process
21
+ vggt_camera_json_file (str): path to the associated VGGT camera JSON file
22
+ output_root (str): output root directory
23
+ """
24
+
25
+ def __init__(self,
26
+ input_rgb_dir: str,
27
+ moge_depth_dir: str,
28
+ vggt_depth_dir: str,
29
+ metric3d_depth_dir: str,
30
+ vggt_camera_json_file: str,
31
+ output_root: str):
32
+ """
33
+ Args:
34
+ moge_depth_dir: path to the raw MoGe depths
35
+ vggt_depth_dir: path to the raw VGGT depths
36
+ vggt_camera_json_file: path to the associated VGGT camera JSON file
37
+ output_root: output root directory, defaults to ./processed
38
+ """
39
+ self.device = "cuda" if torch.cuda.is_available() else "cpu"
40
+
41
+ # align depth and camera pose to metric level
42
+ self.moge_depth_dir = moge_depth_dir
43
+ self.vggt_depth_dir = vggt_depth_dir
44
+ self.metric3d_depth_dir = metric3d_depth_dir
45
+ self.vggt_camera_json_file = vggt_camera_json_file
46
+ self.output_root = output_root
47
+
48
+ # depth to pointmap
49
+ self.metric_intrinsic = None
50
+ self.metric_w2c = None
51
+ self.input_rgb_dir = input_rgb_dir
52
+ self.input_color_paths = []
53
+
54
+
55
+ # output depth / camera pose / pointmap
56
+ self.output_metric_depth_dir = os.path.join(output_root, "output_metric_depth_dir")
57
+ self.output_metric_camera_json = os.path.join(output_root, "output_metric_camera_json")
58
+ self.output_metric_pointmap_dir = os.path.join(output_root, "output_metric_pointmap_dir")
59
+ os.makedirs(self.output_metric_depth_dir, exist_ok=True)
60
+ os.makedirs(self.output_metric_camera_json, exist_ok=True)
61
+ os.makedirs(self.output_metric_pointmap_dir, exist_ok=True)
62
+
63
+ def align_depth_scale(self):
64
+ # align Moge depth to VGGT
65
+ moge_align_depth_list, valid_mask_list = self.scale_moge_depth()
66
+
67
+ # align moge depth and camera pose to metric depth
68
+ self.align_metric_depth(moge_align_depth_list, valid_mask_list)
69
+
70
+
71
+
72
+ def segment_sky_with_oneformer(self, image_path, skyseg_processor, skyseg_model, SKY_CLASS_ID, save_path=None):
73
+ from PIL import Image
74
+ image = Image.open(image_path)
75
+ inputs = skyseg_processor(images=image, task_inputs=["semantic"], return_tensors="pt").to(skyseg_model.device)
76
+
77
+ with torch.no_grad():
78
+ outputs = skyseg_model(**inputs)
79
+
80
+ # Get the semantic segmentation result
81
+ predicted_semantic_map = skyseg_processor.post_process_semantic_segmentation(outputs, \
82
+ target_sizes=[image.size[::-1]])[0]
83
+
84
+ # Extract the sky region
85
+ sky_mask = (predicted_semantic_map == SKY_CLASS_ID).cpu().numpy().astype(np.uint8) * 255
86
+
87
+ # erosion sky
88
+ kernel = np.ones((3,3), np.uint8)
89
+ sky_mask = cv2.erode(sky_mask, kernel, iterations=1)
90
+
91
+ # Optionally save the mask
92
+ if save_path:
93
+ cv2.imwrite(save_path, sky_mask)
94
+
95
+ return sky_mask
96
+
97
+ def get_valid_depth(self, vggt_files, moge_files, input_rgb_files, skyseg_processor, skyseg_model, SKY_CLASS_ID):
98
+ moge_align_depth_list = []
99
+ valid_mask_list = []
100
+ all_valid_max_list = []
101
+
102
+ for vggt_file, moge_file, input_rgb_file in zip(vggt_files, moge_files, input_rgb_files):
103
+ # Read the depth data
104
+ depth_moge = pyexr.read(os.path.join(self.moge_depth_dir, moge_file)).squeeze()
105
+ depth_vggt = pyexr.read(os.path.join(self.vggt_depth_dir, vggt_file)).squeeze()
106
+ depth_vggt = cv2.resize(depth_vggt, dsize=(depth_moge.shape[1], depth_moge.shape[0]), \
107
+ interpolation=cv2.INTER_LINEAR)
108
+
109
+ depth_vggt = torch.from_numpy(depth_vggt).float().to(self.device)
110
+ depth_moge = torch.from_numpy(depth_moge).float().to(self.device)
111
+
112
+
113
+ # segmentation sky
114
+ sky_ima_path = os.path.join(self.input_rgb_dir, input_rgb_file)
115
+ sky_mask = self.segment_sky_with_oneformer(sky_ima_path, skyseg_processor, skyseg_model, SKY_CLASS_ID)
116
+ sky_mask_tensor = torch.from_numpy(sky_mask).float().to(self.device)
117
+ sky_mask = (sky_mask_tensor > 0)  # True for sky pixels
118
+
119
+ valid_masks = ( # (H, W)
120
+ torch.isfinite(depth_moge) &
121
+ (depth_moge > 0) &
122
+ torch.isfinite(depth_vggt) &
123
+ (depth_vggt > 0) &
124
+ ~sky_mask  # non-sky region
125
+ )
126
+
127
+ # Set invalid MoGe depths to the maximum of the valid region so final_align_depth cannot become negative
128
+ depth_moge[~valid_masks] = depth_moge[valid_masks].max()
129
+
130
+ source_inv_depth = 1.0 / depth_moge
131
+ target_inv_depth = 1.0 / depth_vggt
132
+
133
+ # print(f'inverse depth range: {source_inv_depth.min()}, {source_inv_depth.max()}')  # roughly 0.03 ~ 2.2
134
+
135
+ source_mask, target_mask = valid_masks, valid_masks
136
+
137
+ # Remove outliers; the 0.2 / 0.8 quantiles work best
138
+ outlier_quantiles = torch.tensor([0.2, 0.8], device=self.device)
139
+
140
+ source_data_low, source_data_high = torch.quantile(
141
+ source_inv_depth[source_mask], outlier_quantiles
142
+ )
143
+ target_data_low, target_data_high = torch.quantile(
144
+ target_inv_depth[target_mask], outlier_quantiles
145
+ )
146
+ source_mask = (source_inv_depth > source_data_low) & (
147
+ source_inv_depth < source_data_high
148
+ )
149
+ target_mask = (target_inv_depth > target_data_low) & (
150
+ target_inv_depth < target_data_high
151
+ )
152
+
153
+
154
+ mask = torch.logical_and(source_mask, target_mask)
155
+ mask = torch.logical_and(mask, valid_masks)
156
+
157
+ source_data = source_inv_depth[mask].view(-1, 1)
158
+ target_data = target_inv_depth[mask].view(-1, 1)
159
+
160
+ ones = torch.ones((source_data.shape[0], 1), device=self.device)
161
+ source_data_h = torch.cat([source_data, ones], dim=1)
162
+ transform_matrix = torch.linalg.lstsq(source_data_h, target_data).solution
163
+
164
+ scale, bias = transform_matrix[0, 0], transform_matrix[1, 0]
165
+ aligned_inv_depth = source_inv_depth * scale + bias
166
+
167
+
168
+ valid_inv_depth = aligned_inv_depth > 0  # new validity mask
169
+ valid_masks = valid_masks & valid_inv_depth  # merge into the existing valid mask
170
+ valid_mask_list.append(valid_masks)
171
+
172
+ final_align_depth = 1.0 / aligned_inv_depth
173
+ moge_align_depth_list.append(final_align_depth)
174
+
175
+ all_valid_max_list.append(final_align_depth[valid_masks].max().item())
176
+
177
+ return moge_align_depth_list, valid_mask_list, all_valid_max_list
178
+
179
+
180
+ def scale_moge_depth(self):
181
+ vggt_files = sorted(f for f in os.listdir(self.vggt_depth_dir) if f.endswith('.exr'))
182
+ moge_files = sorted(f for f in os.listdir(self.moge_depth_dir) if f.endswith('.exr'))
183
+ input_rgb_files = sorted(f for f in os.listdir(self.input_rgb_dir) if f.endswith('.png'))
184
+
185
+ if len(vggt_files) != len(moge_files):
186
+ raise ValueError("文件数量不匹配")
187
+
188
+ from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation
189
+ skyseg_processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
190
+ skyseg_model = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")
191
+ skyseg_model.to(self.device)
192
+ # 定义天空类别的ID 119
193
+ SKY_CLASS_ID = 119
194
+
195
+ moge_align_depth_list, valid_mask_list, all_valid_max_list = self.get_valid_depth(
196
+ vggt_files, moge_files, input_rgb_files, skyseg_processor, skyseg_model, SKY_CLASS_ID
197
+ )
198
+
199
+ # Compute the median of the per-frame valid maxima
200
+ valid_max_array = np.array(all_valid_max_list)
201
+ q50 = np.quantile(valid_max_array, 0.50)  # 50th percentile
202
+ filtered_max = valid_max_array[valid_max_array <= q50]  # drop outliers above this quantile
203
+
204
+ # Take the maximum of the filtered values (the largest value within the normal range)
205
+ global_avg_max = np.max(filtered_max)
206
+ max_sky_value = global_avg_max * 5
207
+ max_sky_value = np.minimum(max_sky_value, 1000)  # relative depth may not exceed 1000
208
+
209
+ # Assign the same value to the invalid regions of every frame
210
+ for i, (moge_depth, valid_mask) in enumerate(zip(moge_align_depth_list, valid_mask_list)):
211
+ moge_depth[~valid_mask] = max_sky_value
212
+
213
+ # Fraction of pixels above the cap (measured before clamping)
214
+ over_count = torch.sum(moge_depth > max_sky_value).item()
215
+ total_pixels = moge_depth.numel()
216
+ over_ratio = over_count / total_pixels * 100
217
+
218
+
219
+ moge_depth = torch.clamp(moge_depth, max=max_sky_value)
220
+ moge_align_depth_list[i] = moge_depth  # store the processed depth map
221
+
222
+ return moge_align_depth_list, valid_mask_list
223
+
224
+
225
+
226
+ def align_metric_depth(self, moge_align_depth_list, valid_mask_list):
227
+ # List the Metric3D depth files
228
+ metric_files = sorted(f for f in os.listdir(self.metric3d_depth_dir) if f.endswith('.exr'))
229
+
230
+ metric_scales_list = []
231
+ # Iterate over all depth-map pairs
232
+ for idx, (metric_file, moge_depth) in enumerate(zip(metric_files, moge_align_depth_list)):
233
+
234
+ depth_metric3d = pyexr.read(os.path.join(self.metric3d_depth_dir, metric_file)).squeeze()
235
+ depth_metric3d = torch.from_numpy(depth_metric3d).float().to(self.device)
236
+
237
+ # Mask of the corresponding frame
238
+ valid_mask = valid_mask_list[idx].to(self.device)
239
+
240
+ # Extract data from the valid region
241
+ valid_metric = depth_metric3d[valid_mask]
242
+ valid_moge = moge_depth[valid_mask]
243
+
244
+ # Inter-quantile (0.2-0.8) spread of each depth source
245
+ metric_diff = torch.quantile(valid_metric, 0.8) - torch.quantile(valid_metric, 0.2)
246
+ moge_diff = torch.quantile(valid_moge, 0.8) - torch.quantile(valid_moge, 0.2)
247
+ metric_scale = metric_diff / moge_diff
248
+ metric_scales_list.append(metric_scale.cpu().numpy())
249
+
250
+ # Compute the global mean scale factor
251
+ metric_scales_mean = np.mean(metric_scales_list)
252
+
253
+ # Apply the global scale and save the metric depth
254
+ for idx, (metric_file, moge_depth) in enumerate(zip(metric_files, moge_align_depth_list)):
255
+ metric_moge_depth = (moge_depth * metric_scales_mean).cpu().numpy()
256
+
257
+ # Save the depth file
258
+ output_path = os.path.join(
259
+ self.output_metric_depth_dir,
260
+ f"{os.path.splitext(metric_file)[0]}_metric.exr"
261
+ )
262
+ pyexr.write(output_path, metric_moge_depth, channel_names=["Y"])
263
+
264
+ # Stage 3: update the camera parameters
265
+ with open(self.vggt_camera_json_file, 'r') as f:
266
+ camera_data = json.load(f)
267
+
268
+ # Update the translation component of every frame
269
+ for frame_info in camera_data.values():
270
+ w2c_matrix = np.array(frame_info['w2c'])
271
+ w2c_matrix[:3, 3] *= metric_scales_mean  # reuse the precomputed global scale factor
272
+ frame_info['w2c'] = w2c_matrix.tolist()
273
+
274
+ # Save the updated camera parameters
275
+ output_json_path = os.path.join(
276
+ self.output_metric_camera_json,
277
+ os.path.basename(self.vggt_camera_json_file)
278
+ )
279
+ with open(output_json_path, 'w') as f:
280
+ json.dump(camera_data, f, indent=4)
281
+
282
+
283
+ def load_metric_camera_parameters(self):
284
+ metric_camera_json = os.path.join(self.output_metric_camera_json, 'colmap_data.json')
285
+ with open(metric_camera_json, 'r') as f:
286
+ data = json.load(f)
287
+
288
+ # load metric camera parameters
289
+ sorted_frames = sorted(data.items(), key=lambda x: int(x[0]))
290
+ first_frame_key, first_frame_data = sorted_frames[0]
291
+ self.metric_intrinsic = [np.array(frame['intrinsic']) for frame in data.values()]
292
+ self.metric_w2c = [np.array(frame['w2c']) for frame in data.values()]
293
+
294
+ # Load the input RGB file paths used for the pointmaps
295
+ self.input_color_paths = sorted(
296
+ [os.path.join(self.input_rgb_dir, f) for f in os.listdir(self.input_rgb_dir) if f.endswith(".png")],
297
+ key=lambda x: int(os.path.basename(x).split("_")[1].split(".")[0])
298
+ )
299
+
300
+
301
+
302
+ def depth_to_pointmap(self):
303
+
304
+ num_frames = len(self.metric_w2c)
305
+ for frame_index in range(num_frames):
306
+
307
+ exr_path = os.path.join(self.output_metric_depth_dir, f"frame_{frame_index+1:05d}_metric.exr")
308
+ depth_data = pyexr.read(exr_path).squeeze()
309
+ depth_tensor = torch.from_numpy(depth_data).to(self.device, torch.float32)
310
+
311
+
312
+ # Generate the point cloud
313
+ height, width = depth_tensor.shape
314
+ K_tensor = torch.from_numpy(self.metric_intrinsic[frame_index]).to(device=self.device, dtype=torch.float32)
315
+ w2c = torch.from_numpy(self.metric_w2c[frame_index]).to(device=self.device, dtype=torch.float32)
316
+
317
+ camtoworld = torch.inverse(w2c)
318
+
319
+ # Generate coordinates in the camera frame
320
+ u = torch.arange(width, device=self.device).float()
321
+ v = torch.arange(height, device=self.device).float()
322
+ u_grid, v_grid = torch.meshgrid(u, v, indexing='xy')
323
+
324
+ fx, fy = K_tensor[0, 0], K_tensor[1, 1]
325
+ cx, cy = K_tensor[0, 2], K_tensor[1, 2]
326
+
327
+ x_cam = (u_grid - cx) * depth_tensor / fx
328
+ y_cam = (v_grid - cy) * depth_tensor / fy
329
+ z_cam = depth_tensor
330
+
331
+ cam_coords_points = torch.stack([x_cam, y_cam, z_cam], dim=-1)
332
+
333
+ R_cam_to_world = camtoworld[:3, :3]
334
+ t_cam_to_world = camtoworld[:3, 3]
335
+ world_coords_points = torch.matmul(cam_coords_points, R_cam_to_world.T) + t_cam_to_world
336
+
337
+
338
+ # Save the colored point cloud
339
+ color_numpy = np.array(Image.open(self.input_color_paths[frame_index]))  # read as HWC
340
+ colors_rgb = color_numpy.reshape(-1, 3)  # flatten to (H*W, 3)
341
+ vertices_3d = world_coords_points.reshape(-1, 3).cpu().numpy()
342
+ point_cloud_data = trimesh.PointCloud(vertices=vertices_3d, colors=colors_rgb)
343
+ point_cloud_data.export(f"{self.output_metric_pointmap_dir}/pcd_{frame_index+1:04d}.ply")
344
+
345
+
346
+ # Save the pointmap as .npy
347
+ pointmap_data = world_coords_points.cpu().numpy()
348
+ np.save(f"{self.output_metric_pointmap_dir}/pointmap_{frame_index+1:04d}.npy", pointmap_data)
349
+
350
+
351
+
352
+
353
+ def render_from_cameras(self):
354
+ render_output_dir = os.path.join(self.output_root, "rendered_views")
355
+ os.makedirs(render_output_dir, exist_ok=True)
356
+
357
+ select_frame = 0
358
+ npy_files = sorted(
359
+ [f for f in os.listdir(self.output_metric_pointmap_dir) if f.endswith(".npy")],
360
+ key=lambda x: int(re.findall(r'\d+', x)[0])
361
+ )
362
+
363
+ npy_path = os.path.join(self.output_metric_pointmap_dir, npy_files[select_frame])
364
+
365
+
366
+ # Load the pointmap
367
+ pointmap = np.load(npy_path)
368
+ points = pointmap.reshape(-1, 3)
369
+
370
+ color_numpy = np.array(Image.open(self.input_color_paths[select_frame]))  # read as HWC
371
+ colors_rgb = color_numpy.reshape(-1, 3)  # flatten to (H*W, 3)
372
+ colors = colors_rgb[:, :3]
373
+
374
+ height, width = cv2.imread(self.input_color_paths[0]).shape[:2]
375
+ renders, masks, _ = render_from_cameras_videos(
376
+ points, colors, self.metric_w2c, self.metric_intrinsic, height, width
377
+ )
378
+
379
+ # Save all results with imageio
380
+ for i, (render, mask) in enumerate(zip(renders, masks)):
381
+ # Save the rendered view
382
+ render_path = os.path.join(render_output_dir, f"render_{i:04d}.png")
383
+ imageio.imwrite(render_path, render)
384
+
385
+ # Save the mask
386
+ mask_path = os.path.join(render_output_dir, f"mask_{i:04d}.png")
387
+ imageio.imwrite(mask_path, mask)
388
+
389
+ print(f"All results saved to: {render_output_dir}")
390
+
391
+
392
+
393
+
394
+
395
+ if __name__ == "__main__":
396
+ import argparse
397
+ parser = argparse.ArgumentParser(description="Depth alignment and metric processing.")
398
+ parser.add_argument('--image_dir', type=str, required=True, help='Input RGB directory')
399
+ parser.add_argument('--moge_depth_dir', type=str, required=True, help='MOGe depth directory')
400
+ parser.add_argument('--vggt_depth_dir', type=str, required=True, help='VGGT depth directory')
401
+ parser.add_argument('--metric3d_depth_dir', type=str, required=True, help='Metric3D depth directory')
402
+ parser.add_argument('--vggt_camera_json_file', type=str, required=True, help='VGGT camera JSON file')
403
+ parser.add_argument('--output_dir', type=str, required=True, help='Output root directory')
404
+ args = parser.parse_args()
405
+
406
+ depth_align_processor = DepthAlignMetric(
407
+ input_rgb_dir=args.image_dir,
408
+ moge_depth_dir=args.moge_depth_dir,
409
+ vggt_depth_dir=args.vggt_depth_dir,
410
+ metric3d_depth_dir=args.metric3d_depth_dir,
411
+ vggt_camera_json_file=args.vggt_camera_json_file,
412
+ output_root=args.output_dir
413
+ )
414
+
415
+ depth_align_processor.align_depth_scale()
416
+ depth_align_processor.load_metric_camera_parameters()
417
+ depth_align_processor.depth_to_pointmap()
418
+ depth_align_processor.render_from_cameras()
data_engine/metric3d_infer.py ADDED
@@ -0,0 +1,115 @@
1
+ import os
2
+ import cv2
3
+ import argparse
4
+ import torch
5
+ import itertools
6
+ import json
7
+ from pathlib import Path
8
+ from typing import *
9
+ import pyexr
10
+
11
+ def main(image_dir, intrinsic_path, output_dir):
12
+ os.makedirs(output_dir, exist_ok=True)
13
+
14
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
15
+
16
+ include_suffices = ['jpg', 'png', 'jpeg', 'JPG', 'PNG', 'JPEG']
17
+ image_paths = sorted(itertools.chain(*(Path(image_dir).rglob(f'*.{suffix}') for suffix in include_suffices)))
18
+
19
+ # load model
20
+ model = torch.hub.load("Metric3D", 'metric3d_vit_giant2', pretrain=True, source='local')
21
+ model = model.to(device)
22
+ model.eval()
23
+
24
+ with open(intrinsic_path, 'r') as f:
25
+ colmap_data = json.load(f)
26
+
27
+ # Sort JSON keys by frame number (001, 002, ...)
28
+ sorted_frame_ids = sorted(colmap_data.keys(), key=lambda x: int(x))
29
+ # Generate intrinsic list in order
30
+ intrinsic_list = [colmap_data[frame_id]['intrinsic'] for frame_id in sorted_frame_ids]
31
+
32
+ if len(image_paths) != len(intrinsic_list):
33
+ raise ValueError(f"Number of images ({len(image_paths)}) does not match JSON frames ({len(intrinsic_list)})")
34
+
35
+ # Check existing EXR files in output directory
36
+ output_exr_files = list(Path(output_dir).glob('*.exr'))
37
+ if len(output_exr_files) >= len(image_paths):
38
+ return
39
+
40
+ for idx, image_path in enumerate(image_paths):
41
+ # Get corresponding intrinsic data by index
42
+ intrinsic_data = intrinsic_list[idx]
43
+ fx = intrinsic_data[0][0]
44
+ fy = intrinsic_data[1][1]
45
+ cx = intrinsic_data[0][2]
46
+ cy = intrinsic_data[1][2]
47
+ intrinsic = [fx, fy, cx, cy]
48
+
49
+ # print(f"Processing image {image_path}")
50
+
51
+ # load image
52
+ rgb_origin = cv2.imread(str(image_path))[:, :, ::-1]
53
+
54
+ # Adjust input size to fit pretrained model
55
+ input_size = (616, 1064) # for vit model
56
+ h, w = rgb_origin.shape[:2]
57
+ scale = min(input_size[0] / h, input_size[1] / w)
58
+ rgb = cv2.resize(rgb_origin, (int(w * scale), int(h * scale)), interpolation=cv2.INTER_LINEAR)
59
+ # Remember to scale intrinsic, hold depth
60
+ intrinsic = [intrinsic[0] * scale, intrinsic[1] * scale, intrinsic[2] * scale, intrinsic[3] * scale]
61
+ # Padding to input_size
62
+ padding = [123.675, 116.28, 103.53]
63
+ h, w = rgb.shape[:2]
64
+ pad_h = input_size[0] - h
65
+ pad_w = input_size[1] - w
66
+ pad_h_half = pad_h // 2
67
+ pad_w_half = pad_w // 2
68
+ rgb = cv2.copyMakeBorder(rgb, pad_h_half, pad_h - pad_h_half, \
69
+ pad_w_half, pad_w - pad_w_half, cv2.BORDER_CONSTANT, value=padding)
70
+ pad_info = [pad_h_half, pad_h - pad_h_half, pad_w_half, pad_w - pad_w_half]
71
+
72
+ # Normalize
73
+ mean = torch.tensor([123.675, 116.28, 103.53]).float()[:, None, None]
74
+ std = torch.tensor([58.395, 57.12, 57.375]).float()[:, None, None]
75
+ rgb = torch.from_numpy(rgb.transpose((2, 0, 1))).float()
76
+ rgb = torch.div((rgb - mean), std)
77
+ rgb = rgb[None, :, :, :].cuda()
78
+
79
+ # Canonical camera space
80
+ # inference
81
+ with torch.no_grad():
82
+ pred_depth, _, _ = model.inference({'input': rgb})
83
+
84
+ # Unpad
85
+ pred_depth = pred_depth.squeeze()
86
+ pred_depth = pred_depth[pad_info[0] : pred_depth.shape[0] - pad_info[1], \
87
+ pad_info[2] : pred_depth.shape[1] - pad_info[3]]
88
+
89
+ # Upsample to original size
90
+ pred_depth = torch.nn.functional.interpolate(pred_depth[None, None, :, :], \
91
+ rgb_origin.shape[:2], mode='bilinear').squeeze()
92
+
93
+ # Canonical camera space
94
+
95
+ # De-canonical transform
96
+ canonical_to_real_scale = intrinsic[0] / 1000.0 # 1000.0 is the focal length of canonical camera
97
+ pred_depth = pred_depth * canonical_to_real_scale # now the depth is metric
98
+
99
+ depth = pred_depth.cpu().numpy()
100
+
101
+ exr_output_dir = Path(output_dir)
102
+ exr_output_dir.mkdir(exist_ok=True, parents=True)
103
+
104
+ # Construct filename (use image_path stem directly)
105
+ filename = f"{image_path.stem}.exr"
106
+ save_file = exr_output_dir.joinpath(filename)
107
+ pyexr.write(save_file, depth[..., None], channel_names=["Y"])
108
+
109
+ if __name__ == "__main__":
110
+ parser = argparse.ArgumentParser(description="Run metric3d data engine.")
111
+ parser.add_argument('--image_dir', type=str, required=True, help='Path to input images directory')
112
+ parser.add_argument('--intrinsic_path', type=str, required=True, help='Path to intrinsic file')
113
+ parser.add_argument('--output_dir', type=str, required=True, help='Path to output directory')
114
+ args = parser.parse_args()
115
+ main(args.image_dir, args.intrinsic_path, args.output_dir)
data_engine/moge_infer.py ADDED
@@ -0,0 +1,73 @@
1
+ import os
2
+ os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
3
+ from pathlib import Path
4
+ import sys
5
+ if (_package_root := str(Path(__file__).absolute().parents[2])) not in sys.path:
6
+ sys.path.insert(0, _package_root)
7
+ from typing import *
8
+ import itertools
9
+ import cv2
10
+ import torch
11
+
12
+
13
+ original_cwd = os.getcwd()
14
+ moge_dir = os.path.join(original_cwd, 'MoGe')
15
+ try:
16
+ os.chdir(moge_dir)
17
+ if moge_dir not in sys.path:
18
+ sys.path.insert(0, moge_dir)
19
+ from moge.model.v1 import MoGeModel
20
+ finally:
21
+ os.chdir(original_cwd)
22
+
23
+
24
+ def main(image_dir, output_dir):
25
+ os.makedirs(output_dir, exist_ok=True)
26
+
27
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
28
+
29
+ # load model
30
+ model = MoGeModel.from_pretrained("Ruicheng/moge-vitl").to(device)
31
+ model.eval()
32
+
33
+ include_suffices = ['jpg', 'png', 'jpeg', 'JPG', 'PNG', 'JPEG']
34
+ image_paths = sorted(itertools.chain(*(Path(image_dir).rglob(f'*.{suffix}') for suffix in include_suffices)))
35
+
36
+
37
+ # Check how many EXR files already exist in the output directory
38
+ output_exr_files = list(Path(output_dir).glob('*.exr'))
39
+ if len(output_exr_files) >= len(image_paths):
40
+ return
41
+
42
+ for image_path in image_paths:
43
+ image = cv2.cvtColor(cv2.imread(str(image_path)), cv2.COLOR_BGR2RGB)
44
+ image_tensor = torch.tensor(image / 255, dtype=torch.float32, device=device).permute(2, 0, 1)
45
+
46
+ # Inference
47
+ output = model.infer(image_tensor, fov_x=None, resolution_level=9, num_tokens=None, use_fp16=True)
48
+ depth = output['depth'].cpu().numpy()
49
+
50
+ exr_output_dir = Path(output_dir)
51
+ exr_output_dir.mkdir(exist_ok=True, parents=True)
52
+
53
+ # Build the filename (reuse the stem of image_path)
54
+ filename = f"{image_path.stem}.exr"
55
+
56
+ # Join the output path
57
+ save_file = exr_output_dir.joinpath(filename)
58
+
59
+ # Save the depth map
60
+ cv2.imwrite(str(save_file), depth, [cv2.IMWRITE_EXR_TYPE, cv2.IMWRITE_EXR_TYPE_FLOAT])
61
+
62
+
63
+ if __name__ == "__main__":
64
+ import argparse
65
+ parser = argparse.ArgumentParser(description="Run MoGe depth estimation.")
66
+ parser.add_argument('--image_dir', type=str, required=True, help='Path to input images directory')
67
+ parser.add_argument('--output_dir', type=str, required=True, help='Path to output directory')
68
+ args = parser.parse_args()
69
+ main(args.image_dir, args.output_dir)
70
+
71
+
72
+
73
+
data_engine/requirements.txt ADDED
@@ -0,0 +1,16 @@
1
+ torch==2.3.1
2
+ torchvision==0.18.1
3
+ numpy==1.26.1
4
+ Pillow
5
+ scipy
6
+ huggingface_hub
7
+ einops
8
+ safetensors
9
+ opencv-python
10
+ pyexr
11
+ mmengine
12
+ timm
13
+ imageio
14
+ trimesh
15
+ transformers==4.49
16
+ git+https://github.com/EasternJournalist/utils3d.git@c5daf6f6c244d251f252102d09e9b7bcef791a38
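A typical setup step, assuming the file is kept at data_engine/requirements.txt as in this upload, would be:

    pip install -r data_engine/requirements.txt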
data_engine/run.sh ADDED
@@ -0,0 +1,27 @@
1
+ #!/bin/bash
2
+
3
+ IMAGE_DIR="your_input_path"
4
+ OUTPUT_DIR="your_output_path"
5
+ CUDA_DEVICE=0
6
+
7
+ # 1. run vggt_infer.py
8
+ CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python3 vggt_infer.py --image_dir "$IMAGE_DIR" --output_dir "$OUTPUT_DIR/vggt"
9
+
10
+ # 2. run moge_infer.py
11
+ CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python3 moge_infer.py --image_dir "$IMAGE_DIR" --output_dir "$OUTPUT_DIR/moge"
12
+
13
+ # 3. run metric3d_infer.py
14
+ INTRINSIC_PATH="$OUTPUT_DIR/vggt/colmap_data.json"
15
+ CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python3 metric3d_infer.py --image_dir "$IMAGE_DIR" --output_dir "$OUTPUT_DIR/metric3d" --intrinsic_path "$INTRINSIC_PATH"
16
+
17
+ # 4. conduct depth alignment
18
+ MOGE_DEPTH_DIR="$OUTPUT_DIR/moge"
19
+ VGGT_DEPTH_DIR="$OUTPUT_DIR/vggt"
20
+ METRIC3D_DEPTH_DIR="$OUTPUT_DIR/metric3d"
21
+ CUDA_VISIBLE_DEVICES=$CUDA_DEVICE python3 depth_align.py \
22
+ --image_dir "$IMAGE_DIR" \
23
+ --moge_depth_dir "$MOGE_DEPTH_DIR" \
24
+ --vggt_depth_dir "$VGGT_DEPTH_DIR/depth" \
25
+ --metric3d_depth_dir "$METRIC3D_DEPTH_DIR" \
26
+ --vggt_camera_json_file "$OUTPUT_DIR/vggt/colmap_data.json" \
27
+ --output_dir "$OUTPUT_DIR/final"
data_engine/vggt_infer.py ADDED
@@ -0,0 +1,242 @@
1
+ import os
2
+ os.environ["OPENCV_IO_ENABLE_OPENEXR"]="1"
3
+ import argparse
4
+ import numpy as np
5
+ import torch
6
+ import glob
7
+ from scipy.spatial.transform import Rotation
8
+ import sys
9
+ from PIL import Image
10
+ import cv2
11
+ import json
12
+
13
+ # Store original working directory and add VGGT to path
14
+ original_cwd = os.getcwd()
15
+ vggt_dir = os.path.join(original_cwd, 'vggt')
16
+ try:
17
+ os.chdir(vggt_dir)
18
+ if vggt_dir not in sys.path:
19
+ sys.path.insert(0, vggt_dir)
20
+ # Import VGGT modules for pose estimation and depth prediction
21
+ from vggt.models.vggt import VGGT
22
+ from vggt.utils.load_fn import load_and_preprocess_images
23
+ from vggt.utils.pose_enc import pose_encoding_to_extri_intri
24
+ from vggt.utils.geometry import unproject_depth_map_to_point_map
25
+ finally:
26
+ os.chdir(original_cwd)
27
+
28
+
29
+ def process_images_with_vggt(info, image_names, model, device):
30
+ original_images, original_width, original_height = info
31
+ # Preprocess images for VGGT model input
32
+ images = load_and_preprocess_images(image_names).to(device)
33
+
34
+ # Use bfloat16 for newer GPUs, float16 for older ones
35
+ dtype = torch.bfloat16 if torch.cuda.get_device_capability()[0] >= 8 else torch.float16
36
+
37
+ # Run inference with automatic mixed precision
38
+ with torch.no_grad():
39
+ with torch.cuda.amp.autocast(dtype=dtype):
40
+ predictions = model(images)
41
+
42
+ # Convert pose encoding to extrinsic and intrinsic matrices
43
+ extrinsic, intrinsic = pose_encoding_to_extri_intri(predictions["pose_enc"], images.shape[-2:])
44
+ predictions["extrinsic"] = extrinsic
45
+ predictions["intrinsic"] = intrinsic
46
+
47
+ # Convert tensors to numpy arrays and remove batch dimension
48
+ for key in predictions.keys():
49
+ if isinstance(predictions[key], torch.Tensor):
50
+ predictions[key] = predictions[key].cpu().numpy().squeeze(0) # remove batch dimension
51
+
52
+ # Extract depth map and convert to world coordinates
53
+ depth_map = predictions["depth"] # (S, H, W, 1)
54
+ world_points = unproject_depth_map_to_point_map(depth_map, predictions["extrinsic"], predictions["intrinsic"])
55
+ predictions["world_points_from_depth"] = world_points
56
+
57
+ # Store original images and their metadata
58
+ predictions["original_images"] = original_images
59
+
60
+ # Normalize images to [0, 1] range and resize to match depth map dimensions
61
+ S, H, W = world_points.shape[:3]
62
+ normalized_images = np.zeros((S, H, W, 3), dtype=np.float32)
63
+
64
+ for i, img in enumerate(original_images):
65
+ resized_img = cv2.resize(img, (W, H))
66
+ normalized_images[i] = resized_img / 255.0
67
+
68
+ predictions["images"] = normalized_images
69
+ predictions["original_width"] = original_width
70
+ predictions["original_height"] = original_height
71
+
72
+ return predictions, image_names
73
+
74
+
75
+ def process_images(image_dir, model, device):
76
+ """
77
+ Process images with VGGT model to extract pose, depth, and camera parameters.
78
+
79
+ Args:
80
+ image_dir (str): Directory containing input images
81
+ model: VGGT model instance
82
+ device: PyTorch device (CPU/GPU)
83
+
84
+ Returns:
85
+ tuple: (predictions dict, image_names list)
86
+ """
87
+ # Find all image files in the directory
88
+ image_names = glob.glob(os.path.join(image_dir, "*"))
89
+ image_names = sorted([f for f in image_names if f.lower().endswith(('.png', '.jpg', '.jpeg'))])
90
+
91
+ # Limit to 400 images to prevent memory issues
92
+ if len(image_names) > 400:
93
+ image_names = image_names[:400]
94
+
95
+ if len(image_names) == 0:
96
+ raise ValueError(f"No images found in {image_dir}")
97
+
98
+ # Store original images and their dimensions
99
+ original_images = []
100
+ original_width = None
101
+ original_height = None
102
+
103
+ # Get dimensions from the first image
104
+ first_image = Image.open(image_names[0])
105
+ original_width, original_height = first_image.size
106
+
107
+ # Load all images as numpy arrays
108
+ for img_path in image_names:
109
+ img = Image.open(img_path).convert('RGB')
110
+ original_images.append(np.array(img))
111
+
112
+ return process_images_with_vggt((original_images, original_width, original_height), image_names, model, device)
113
+
114
+
115
+ def extrinsic_to_colmap_format(extrinsics):
116
+ """
117
+ Convert extrinsic matrices from VGGT format to COLMAP format.
118
+
119
+ VGGT predicts world-to-camera (R|t) extrinsics,
120
+ while COLMAP stores the same transform as a quaternion plus a translation.
121
+
122
+ Args:
123
+ extrinsics (np.ndarray): Extrinsic matrices in shape (N, 4, 4)
124
+
125
+ Returns:
126
+ tuple: (quaternions array, translations array)
127
+ """
128
+ num_cameras = extrinsics.shape[0]
129
+ quaternions = []
130
+ translations = []
131
+
132
+ for i in range(num_cameras):
133
+ # Extract rotation matrix and translation vector
134
+ # VGGT's extrinsic is a world-to-camera (R|t) transform
135
+ R = extrinsics[i, :3, :3]
136
+ t = extrinsics[i, :3, 3]
137
+
138
+ # Convert rotation matrix to quaternion
139
+ # COLMAP quaternion format is [qw, qx, qy, qz]
140
+ rot = Rotation.from_matrix(R)
141
+ quat = rot.as_quat() # scipy returns [x, y, z, w]
142
+ quat = np.array([quat[3], quat[0], quat[1], quat[2]]) # Convert to [w, x, y, z]
143
+
144
+ quaternions.append(quat)
145
+ translations.append(t)
146
+
147
+ return np.array(quaternions), np.array(translations)
148
+
149
+ def ToR(q):
150
+ """
151
+ Convert quaternion to rotation matrix.
152
+
153
+ Args:
154
+ q (np.ndarray): Quaternion in [w, x, y, z] format
155
+
156
+ Returns:
157
+ np.ndarray: 3x3 rotation matrix
158
+ """
159
+ return np.eye(3) + 2 * np.array((
160
+ (-q[2] * q[2] - q[3] * q[3],
161
+ q[1] * q[2] - q[3] * q[0],
162
+ q[1] * q[3] + q[2] * q[0]),
163
+ ( q[1] * q[2] + q[3] * q[0],
164
+ -q[1] * q[1] - q[3] * q[3],
165
+ q[2] * q[3] - q[1] * q[0]),
166
+ ( q[1] * q[3] - q[2] * q[0],
167
+ q[2] * q[3] + q[1] * q[0],
168
+ -q[1] * q[1] - q[2] * q[2])))
169
+
170
+ def main(image_dir, output_dir):
171
+ """
172
+ Main function to process images with VGGT and save results in COLMAP format.
173
+
174
+ Args:
175
+ image_dir (str): Directory containing input images
176
+ output_dir (str): Directory to save output files
177
+ """
178
+ # Create output directories
179
+ os.makedirs(output_dir, exist_ok=True)
180
+ os.makedirs(os.path.join(output_dir, 'depth'), exist_ok=True)
181
+
182
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
183
+
184
+ # Load pre-trained VGGT model
185
+ model = VGGT.from_pretrained("facebook/VGGT-1B").to(device)
186
+ model.eval()
187
+
188
+ # Process images to get predictions
189
+ predictions, image_names = process_images(image_dir, model, device)
190
+
191
+ # Convert extrinsic matrices to COLMAP format
192
+ quaternions, translations = extrinsic_to_colmap_format(predictions["extrinsic"])
193
+
194
+ save_dict = {}
195
+
196
+ # Extract predictions
197
+ depth = predictions["depth"]
198
+ intrinsic = predictions["intrinsic"]
199
+ height, width = predictions["depth"].shape[1:3]
200
+ ori_height, ori_width = predictions["original_height"], predictions["original_width"]
201
+
202
+ # Calculate scaling factors for intrinsic matrix adjustment
203
+ s_height, s_width = ori_height / height, ori_width / width
204
+
205
+ # Process each frame and save results
206
+ for i, (image_name, depth, intrinsic, quaternion, translation) \
207
+ in enumerate(zip(image_names, depth, intrinsic, quaternions, translations)):
208
+ # Convert quaternion back to rotation matrix
209
+ qw, qx, qy, qz = quaternion
210
+ rot = ToR(np.array([qw, qx, qy, qz]))
211
+ trans = translation.reshape(3,1)
212
+
213
+ # Construct world-to-camera transformation matrix
214
+ bottom = np.array([[0, 0, 0, 1]])
215
+ w2c = np.concatenate([np.concatenate([rot, trans], 1), bottom], axis=0)
216
+
217
+ # Scale intrinsic matrix to original image dimensions
218
+ intrinsic[0, :] = intrinsic[0, :] * s_width
219
+ intrinsic[1, :] = intrinsic[1, :] * s_height
220
+
221
+ # Save depth map as EXR file
222
+ cv2.imwrite(os.path.join(output_dir, 'depth', f"frame_{(i+1):05d}.exr"), depth, \
223
+ [cv2.IMWRITE_EXR_TYPE, cv2.IMWRITE_EXR_TYPE_FLOAT])
224
+
225
+ # Store metadata for this frame
226
+ save_dict[f"{(i+1):03d}"] = {
227
+ 'image_path': image_name,
228
+ 'depth_path': os.path.join(output_dir, 'depth', f"frame_{(i+1):05d}.exr"),
229
+ 'intrinsic': intrinsic.tolist(),
230
+ 'w2c': w2c.tolist()
231
+ }
232
+
233
+ # Save all metadata to JSON file
234
+ with open(os.path.join(output_dir, "colmap_data.json"), "w") as f:
235
+ json.dump(save_dict, f, indent=2, sort_keys=True)
236
+
237
+ if __name__ == "__main__":
238
+ parser = argparse.ArgumentParser(description="Run VGGT data engine.")
239
+ parser.add_argument('--image_dir', type=str, required=True, help='Path to input images directory')
240
+ parser.add_argument('--output_dir', type=str, required=True, help='Path to output directory')
241
+ args = parser.parse_args()
242
+ main(args.image_dir, args.output_dir)
examples/case1/condition.mp4 ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:a43037f6ba360f5ba23057915bcde3c30e4fa8243a1eee71a56ed64e5e4e1dbf
3
+ size 1643970
examples/case1/depth_range.json ADDED
@@ -0,0 +1 @@
1
+ [[9.91478993182587e-05, 2.647638951211892], [9.914810411672214e-05, 2.4355530913722396], [9.914830891603162e-05, 2.3077817822460434], [9.914851371618718e-05, 2.194297811262402], [9.91487185171888e-05, 2.196958148950279], [9.914892331903648e-05, 2.2072505367747435], [9.914912812173028e-05, 2.1358896445754723], [9.914933292527014e-05, 2.047063472773168], [9.914953772965609e-05, 2.002349494087814], [9.914974253488813e-05, 1.9712810579735458], [9.914994734096629e-05, 1.9542765140541523], [9.915015214789053e-05, 1.9459478144620017], [9.915035695566091e-05, 1.9334704446874382], [9.91505617642774e-05, 1.9175207106434453], [9.915076657373999e-05, 1.9033502356026064], [9.915097138404873e-05, 1.8968486332301608], [9.915117619520361e-05, 1.8871119904036115], [9.91513810072046e-05, 1.8756899634765285], [9.915158582005176e-05, 1.864566286809656], [9.915179063374506e-05, 1.8588014983352468], [9.91519954482845e-05, 1.8482962660155475], [9.915220026367011e-05, 1.8378977369352547], [9.915240507990189e-05, 1.8292827854210325], [9.915260989697982e-05, 1.8225532686417305], [9.915281471490394e-05, 1.8168845699173484], [9.915301953367424e-05, 1.8049296168769355], [9.915322435329072e-05, 1.7979497614950548], [9.91534291737534e-05, 1.7899648952690277], [9.915363399506226e-05, 1.7831444061130879], [9.915383881721732e-05, 1.77311570913456], [9.91540436402186e-05, 1.7608935808988886], [9.91542484640661e-05, 1.7496406485611566], [9.915445328875979e-05, 1.7404270656010208], [9.915465811429972e-05, 1.7232678913959096], [9.915486294068587e-05, 1.718055030436657], [9.915506776791825e-05, 1.7060874697098474], [9.915527259599688e-05, 1.7015925585539982], [9.915547742492175e-05, 1.698395584543132], [9.915568225469284e-05, 1.6881041196009434], [9.915588708531022e-05, 1.675175538019994], [9.915609191677385e-05, 1.6656212871015514], [9.915629674908372e-05, 1.7131719886227728], [9.915650158223988e-05, 1.7684776113947345], [9.915670641624231e-05, 1.8252192219306747], [9.915691125109102e-05, 1.8831473435605108], [9.915711608678602e-05, 1.9464383136952426], [9.91573209233273e-05, 2.011410073075329], [9.915752576071487e-05, 2.0713420149009756], [9.915773059894876e-05, 2.129099831803688]]
examples/case1/prompt.txt ADDED
@@ -0,0 +1 @@
1
+ An old-fashioned European village with thatched roofs on the houses.
examples/case1/ref_depth.exr ADDED

Git LFS Details

  • SHA256: 3e854ac67a0bcd78007d4d7c2a6a0f5aebbaad182d0fea0ee507c58b9cf5ff1b
  • Pointer size: 132 Bytes
  • Size of remote file: 2.3 MB
examples/case1/ref_image.png ADDED

Git LFS Details

  • SHA256: d34dc5e0f5562b94dcd692c3860b01acbc807be72ee57c333db362f1b9c68ac1
  • Pointer size: 132 Bytes
  • Size of remote file: 1.05 MB
examples/case1/video_input/depth_0000.exr ADDED

Git LFS Details

  • SHA256: 3e854ac67a0bcd78007d4d7c2a6a0f5aebbaad182d0fea0ee507c58b9cf5ff1b
  • Pointer size: 132 Bytes
  • Size of remote file: 2.3 MB
examples/case1/video_input/depth_0001.exr ADDED

Git LFS Details

  • SHA256: f401874d6a377b4e598b6e02f69b85c35d5cf739d655da021d55b3ef3bab4475
  • Pointer size: 132 Bytes
  • Size of remote file: 2.21 MB
examples/case1/video_input/depth_0002.exr ADDED

Git LFS Details

  • SHA256: 68e24522e23b0aee855b99333061443e3aa7702dd423f11ecef80e4070329510
  • Pointer size: 132 Bytes
  • Size of remote file: 2.22 MB
examples/case1/video_input/depth_0003.exr ADDED

Git LFS Details

  • SHA256: 98a858fd1d377b8f8623423d094d13b44c27b6fc2bf2c03af28b63faf699006a
  • Pointer size: 132 Bytes
  • Size of remote file: 2.22 MB
examples/case1/video_input/depth_0004.exr ADDED

Git LFS Details

  • SHA256: 6a523f22b49b6fb54eb6fccacdfacfc4527dd0ff5c64074c48ed2dfbd4cab71e
  • Pointer size: 132 Bytes
  • Size of remote file: 2.2 MB
examples/case1/video_input/depth_0005.exr ADDED

Git LFS Details

  • SHA256: 4717876ef4277a69dc90935e8c5087bcdceb66758bdfe6423d79004495640118
  • Pointer size: 132 Bytes
  • Size of remote file: 2.21 MB
examples/case1/video_input/depth_0006.exr ADDED

Git LFS Details

  • SHA256: 42581f740da6d39f220427c4d641b06103dc5d1a56e26a72e43b41a71dda7059
  • Pointer size: 132 Bytes
  • Size of remote file: 2.2 MB
examples/case1/video_input/depth_0007.exr ADDED

Git LFS Details

  • SHA256: e17efb75ee6d7f76d4e635981f7b91c77bc9e10a55b85a0f7f16357fe1e60815
  • Pointer size: 132 Bytes
  • Size of remote file: 2.2 MB
examples/case1/video_input/depth_0008.exr ADDED

Git LFS Details

  • SHA256: b427d934cfb60ab694919294bb83db86cd8a6d4b59c9bae2a993ed3df62a70b6
  • Pointer size: 132 Bytes
  • Size of remote file: 2.19 MB
examples/case1/video_input/depth_0009.exr ADDED

Git LFS Details

  • SHA256: ab8c8d7f17a6f5140329dcddee27a3f9484abeef5930422e9acd29b55bdfed96
  • Pointer size: 132 Bytes
  • Size of remote file: 2.18 MB
examples/case1/video_input/depth_0010.exr ADDED

Git LFS Details

  • SHA256: 13b11700a3b278d31a5d8ee4f82f19e0ac37779b2df6987e88792c71bb56ff1e
  • Pointer size: 132 Bytes
  • Size of remote file: 2.18 MB
examples/case1/video_input/depth_0011.exr ADDED

Git LFS Details

  • SHA256: 1d92a61ad5debf501ac676ed65b985f7cf7904d16feba7b73e1377c95eb0009d
  • Pointer size: 132 Bytes
  • Size of remote file: 2.17 MB
examples/case1/video_input/depth_0012.exr ADDED

Git LFS Details

  • SHA256: a4e66162822a866f61375b81fa67dee47d789642b1806ed92674870c82192013
  • Pointer size: 132 Bytes
  • Size of remote file: 2.18 MB