Image-to-3D
3d
Manojb committed f1ccd24 (verified) · 1 parent: 61e075c

Cloned from stabilityai/TripoSR
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+figures/output_examples.mp4 filter=lfs diff=lfs merge=lfs -text
+figures/input800.mp4 filter=lfs diff=lfs merge=lfs -text
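The two added lines route the demo videos through Git LFS. The sketch below shows roughly how such patterns select files. It uses a simplified matcher built on Python's `fnmatch`, which only approximates git's own wildmatch rules (for example, `*` here can also match across `/` in path patterns). Only the patterns visible in this hunk are included. `model.ckpt`, which this commit also stores as an LFS pointer, is presumably covered by one of the earlier lines that the hunk does not show.

```python
import fnmatch
import posixpath

# LFS patterns visible in this hunk (lines 33-37 of .gitattributes). Earlier
# lines of the file are not shown, so this list is deliberately partial.
LFS_PATTERNS = [
    "*.zip",
    "*.zst",
    "*tfevents*",
    "figures/output_examples.mp4",
    "figures/input800.mp4",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate gitattributes matching for a repo-relative POSIX path.

    Simplification (not git's wildmatch rules): a pattern without '/' is tested
    against the basename at any depth; a pattern containing '/' is tested
    against the whole path, and here '*' may also match across '/'.
    """
    for pattern in patterns:
        target = path if "/" in pattern else posixpath.basename(path)
        if fnmatch.fnmatchcase(target, pattern):
            return True
    return False


for path in ["figures/input800.mp4", "figures/output_examples.mp4", "README.md", "config.yaml"]:
    print(f"{path}: {'LFS' if is_lfs_tracked(path) else 'regular git'}")
```

The two videos are reported as LFS-tracked. The text files stay in regular git.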
README.md ADDED
@@ -0,0 +1,56 @@
+---
+datasets:
+- allenai/objaverse
+tags:
+- 3d
+extra_gated_fields:
+  Name: text
+  Email: text
+  Country: text
+  Organization or Affiliation: text
+  I ALLOW Stability AI to email me about new model releases: checkbox
+license: mit
+pipeline_tag: image-to-3d
+---
+
+> Try our new model: **SF3D**, with several improvements such as faster generation and more game-ready assets.
+>
+> The model is available [here](https://huggingface.co/stabilityai/stable-fast-3d), and we also have a [demo](https://huggingface.co/spaces/stabilityai/stable-fast-3d).
+
+# TripoSR
+![](figures/input800.mp4)
+TripoSR is a fast, feed-forward 3D generative model developed in collaboration between Stability AI and Tripo AI.
+
+
+## Model Details
+
+### Model Description
+
+We closely follow the [LRM](https://arxiv.org/abs/2311.04400) network architecture for the model design, and TripoSR incorporates a series of technical advancements over LRM in data curation as well as in model design and training. For more technical details and evaluations, please refer to [our tech report](https://arxiv.org/abs/2403.02151).
+
+* **Developed by**: [Stability AI](https://stability.ai/), [Tripo AI](https://tripo3d.ai/)
+* **Model type**: Feed-forward 3D reconstruction from a single image
+* **License**: MIT
+* **Hardware**: We train `TripoSR` for 5 days on 22 GPU nodes, each with 8 A100 40GB GPUs
+
+### Model Sources
+
+* **Repository**: https://github.com/VAST-AI-Research/TripoSR
+* **Tech report**: https://arxiv.org/abs/2403.02151
+* **Demo**: https://huggingface.co/spaces/stabilityai/TripoSR
+
+### Training Dataset
+
+We use renders from the [Objaverse](https://objaverse.allenai.org/objaverse-1.0) dataset, produced with our enhanced rendering method, which more closely replicates the distribution of images found in the real world and significantly improves our model’s ability to generalize. We selected a carefully curated subset of the Objaverse dataset for the training data, which is available under the CC-BY license.
+
+
+## Usage
+
+* For usage instructions, please refer to our [TripoSR GitHub repository](https://github.com/VAST-AI-Research/TripoSR)
+
+* You can also try it in [our gradio demo](https://huggingface.co/spaces/stabilityai/TripoSR)
+
+
+### Misuse, Malicious Use, and Out-of-Scope Use
+
+The model should not be used to intentionally create or disseminate 3D models that people would foreseeably find disturbing, distressing, or offensive, or content that propagates historical or current stereotypes.
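The **Hardware** line of the model card implies a total compute budget that is easy to make explicit. This sketch assumes every GPU was busy for the full 5 days. The card gives only the headline figures, so treat the result as an upper-bound estimate rather than a reported number.

```python
HOURS_PER_DAY = 24


def gpu_hours(nodes: int, gpus_per_node: int, days: int) -> int:
    """GPU-hours for a run that keeps every GPU busy for `days` full days (assumption)."""
    return nodes * gpus_per_node * days * HOURS_PER_DAY


# Figures from the model card: 22 nodes x 8 A100 40GB GPUs for 5 days.
total_gpus = 22 * 8  # 176 GPUs
a100_hours = gpu_hours(nodes=22, gpus_per_node=8, days=5)  # 21,120 GPU-hours
```

That works out to 176 A100s and at most about 21,120 A100-hours.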
config.yaml ADDED
@@ -0,0 +1,38 @@
+cond_image_size: 512
+
+image_tokenizer_cls: tsr.models.tokenizers.image.DINOSingleImageTokenizer
+image_tokenizer:
+  pretrained_model_name_or_path: "facebook/dino-vitb16"
+
+tokenizer_cls: tsr.models.tokenizers.triplane.Triplane1DTokenizer
+tokenizer:
+  plane_size: 32
+  num_channels: 1024
+
+backbone_cls: tsr.models.transformer.transformer_1d.Transformer1D
+backbone:
+  in_channels: ${tokenizer.num_channels}
+  num_attention_heads: 16
+  attention_head_dim: 64
+  num_layers: 16
+  cross_attention_dim: 768
+
+post_processor_cls: tsr.models.network_utils.TriplaneUpsampleNetwork
+post_processor:
+  in_channels: 1024
+  out_channels: 40
+
+decoder_cls: tsr.models.network_utils.NeRFMLP
+decoder:
+  in_channels: 120 # 3 * 40
+  n_neurons: 64
+  n_hidden_layers: 9
+  activation: silu
+
+renderer_cls: tsr.models.nerf_renderer.TriplaneNeRFRenderer
+renderer:
+  radius: 0.87 # slightly larger than 0.5 * sqrt(3)
+  feature_reduction: concat
+  density_activation: exp
+  density_bias: -1.0
+  num_samples_per_ray: 128
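Several values in `config.yaml` are linked by shape arithmetic. The decoder comment `in_channels: 120 # 3 * 40` reflects the `concat` feature reduction over three planes, and the renderer radius is set just above 0.5 * sqrt(3). The sketch below recomputes these relations from a hand-copied dict. It does no YAML parsing, so `${tokenizer.num_channels}` is substituted by hand.

The sketch relies on several assumptions that the config does not state:

* the triplane has three planes of `plane_size` x `plane_size` tokens each;
* the backbone preserves its input width;
* cross-attention reads the DINO image features (LRM-style conditioning);
* ViT-B/16 features are 768-dimensional;
* objects are normalized to the cube [-0.5, 0.5]^3.

```python
import math

# Values hand-copied from config.yaml; ${tokenizer.num_channels} is resolved manually.
CONFIG = {
    "tokenizer": {"plane_size": 32, "num_channels": 1024},
    "backbone": {
        "in_channels": 1024,
        "num_attention_heads": 16,
        "attention_head_dim": 64,
        "num_layers": 16,
        "cross_attention_dim": 768,
    },
    "post_processor": {"in_channels": 1024, "out_channels": 40},
    "decoder": {"in_channels": 120, "n_neurons": 64, "n_hidden_layers": 9},
    "renderer": {"radius": 0.87, "feature_reduction": "concat", "num_samples_per_ray": 128},
}

NUM_PLANES = 3           # assumption: a triplane is three axis-aligned feature planes
VIT_B_HIDDEN_SIZE = 768  # feature width of ViT-B/16 (facebook/dino-vitb16)


def check_config(cfg: dict) -> dict:
    """Assert the cross-field relations in the config and return derived sizes."""
    tok, bb = cfg["tokenizer"], cfg["backbone"]
    post, dec, ren = cfg["post_processor"], cfg["decoder"], cfg["renderer"]

    # The backbone reads the triplane tokens, so its input width is their width.
    assert bb["in_channels"] == tok["num_channels"]
    # Assumption: cross-attention keys/values are the DINO image features.
    assert bb["cross_attention_dim"] == VIT_B_HIDDEN_SIZE
    # Assumption: the backbone output keeps the token width the upsampler expects.
    assert post["in_channels"] == tok["num_channels"]
    # concat: a 3D point gathers one feature vector per plane and concatenates them.
    if ren["feature_reduction"] == "concat":
        assert dec["in_channels"] == NUM_PLANES * post["out_channels"]
    # 0.5 * sqrt(3) is the centre-to-corner distance of a unit cube centred at the
    # origin; assumption: objects are normalized to [-0.5, 0.5]^3.
    half_diagonal = 0.5 * math.sqrt(3)
    assert ren["radius"] > half_diagonal

    return {
        "num_tokens": NUM_PLANES * tok["plane_size"] ** 2,  # 3 * 32 * 32
        "attention_inner_dim": bb["num_attention_heads"] * bb["attention_head_dim"],
        "half_diagonal": half_diagonal,
    }


derived = check_config(CONFIG)
```

Under these assumptions the backbone attends over 3 * 32 * 32 = 3072 triplane tokens. Its attention inner width, 16 heads * 64 = 1024, happens to equal the token width.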
figures/input800.mp4 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b2da16c42c2e03139e297d6930bcff7a99c595cc100b545861519793c662500f
+size 148042
figures/output_examples.mp4 ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c63a5fe7afea93549cc412fab612a1c2e5a46844fa75c5ff4eee892b9d3bbc4e
+size 2425685
model.ckpt ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:429e2c6b22a0923967459de24d67f05962b235f79cde6b032aa7ed2ffcd970ee
+size 1677246742
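The three binary files in this commit are stored as Git LFS pointers. Each pointer records three things: the spec version, the SHA-256 of the real content (`oid`), and its byte `size`.

Below is a minimal sketch for reading a pointer and checking a downloaded blob against it. The function names are illustrative and not part of any LFS tool. The checker consumes byte chunks, so the roughly 1.68 GB `model.ckpt` never has to be held in memory at once.

```python
import hashlib

MODEL_CKPT_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:429e2c6b22a0923967459de24d67f05962b235f79cde6b032aa7ed2ffcd970ee
size 1677246742
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse an LFS pointer: one 'key value' pair per line (version, oid, size)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return {"version": fields["version"], "sha256": digest, "size": int(fields["size"])}


def matches_pointer(pointer: dict, chunks) -> bool:
    """Stream byte chunks through SHA-256; compare digest and total size with the pointer."""
    digest = hashlib.sha256()
    size = 0
    for chunk in chunks:
        digest.update(chunk)
        size += len(chunk)
    return size == pointer["size"] and digest.hexdigest() == pointer["sha256"]


pointer = parse_lfs_pointer(MODEL_CKPT_POINTER)
```

To check a local copy, open it with `with open("model.ckpt", "rb") as f:` and call `matches_pointer(pointer, iter(lambda: f.read(1 << 20), b""))` inside that block. This reads the file in 1 MiB chunks.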