TerraTorch
Earth Observation
TerraMind
IBM
ESA
blumenstiel committed on
Commit 06ba86e · 1 Parent(s): 3e763e7

Update ReadMe

Signed-off-by: Benedikt Blumenstiel <[email protected]>

Files changed (2)
  1. .gitattributes +1 -0
  2. README.md +163 -1
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
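The added `*.png` rule routes PNG files, such as the README's new figures under `assets/`, through Git LFS. A rough sketch of which paths these rules catch, assuming that Python's `fnmatch.fnmatchcase` applied to the file's basename approximates gitattributes matching for slash-free patterns like these (Git's own matcher differs in edge cases such as `**` or patterns containing `/`):

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Sequence

# Only the patterns visible in this hunk; the full .gitattributes lists more.
LFS_PATTERNS = ("*.zip", "*.zst", "*tfevents*", "*.png")


def tracked_by_lfs(path: str, patterns: Sequence[str] = LFS_PATTERNS) -> bool:
    """Approximate check: a slash-free pattern is matched against the basename."""
    name = PurePosixPath(path).name
    # fnmatchcase is case-sensitive, like Git's default matching
    return any(fnmatchcase(name, pattern) for pattern in patterns)


for p in ["assets/terramind_architecture.png", "README.md", "logs/events.out.tfevents.1"]:
    print(p, tracked_by_lfs(p))
```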
README.md CHANGED
@@ -1,6 +1,168 @@
  ---
  license: apache-2.0
  library_name: terratorch
+ datasets:
+ - ibm-esa-geospatial/TerraMesh
+ tags:
+ - Earth Observation
+ - TerraMind
+ - IBM
+ - ESA
  ---
+ [![Website](https://img.shields.io/badge/Website-TerraMind-0F62FE)](https://ibm.github.io/terramind/)
+ [![arXiv](https://img.shields.io/badge/arXiv-2504.11171-b31b1b?logo=arxiv)](https://arxiv.org/abs/2504.11171)
+ [![Docs](https://img.shields.io/badge/Docs-EE4B2B?logo=materialformkdocs&logoColor=fff)](https://ibm.github.io/terratorch/stable/guide/terramind/)
+ [![Examples](https://img.shields.io/badge/GitHub-Examples-0F62FE?logo=github)](https://github.com/IBM/terramind)
+ [![Code](https://img.shields.io/badge/Code-TerraTorch-EE4B2B?logo=github)](https://github.com/IBM/terratorch/tree/main/terratorch/models/backbones/terramind)
+ [![ESAblog](https://img.shields.io/badge/Blog-ESA-113145)](https://www.esa.int/Applications/Observing_the_Earth/ESA_and_IBM_collaborate_on_TerraMind)
+ [![IBMblog](https://img.shields.io/badge/Blog-IBM-0F62FE)](https://research.ibm.com/blog/terramind-esa-earth-observation-model)
 
- See base model for more information.
+ # TerraMind 1.0 tiny
+
+ TerraMind is the first multimodal any-to-any generative foundation model for Earth Observation jointly developed by IBM, ESA, and Forschungszentrum Jülich.
+
+
+ ![terramind_architecture.png](assets%2Fterramind_architecture.png)
+
+ ## Architecture
+
+ TerraMind uses a dual-scale transformer-based encoder-decoder architecture, simultaneously processing pixel-level and token-level data.
+ The model was pre-trained on 500B tokens from 9M spatiotemporally aligned multimodal samples from the TerraMesh dataset.
+
+ Modality-specific patch embeddings allow direct processing of raw inputs, while modality-specific FSQ-VAEs are used for image tokenization.
+ For sequence-like modalities such as coordinates, an adapted WordPiece tokenizer is employed.
+ During pre-training, TerraMind leverages masked token reconstruction, learning complex cross-modal correlations to generate high-quality latent representations.
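FSQ (finite scalar quantization) tokenizes without a learned codebook: each latent dimension is squashed into a bounded range and rounded to one of a few integer levels, and the per-dimension levels combine into a single token id. A generic sketch of that quantization step, assuming an odd number of levels per dimension and a `tanh` bound; the level configuration is illustrative, not the setting TerraMind's FSQ-VAEs use:

```python
import math
from typing import List, Sequence


def fsq_quantize(z: Sequence[float], levels: Sequence[int]) -> List[int]:
    """Round each latent dimension to an integer level in [-(L-1)/2, (L-1)/2]."""
    if len(z) != len(levels):
        raise ValueError("need one level count per latent dimension")
    codes = []
    for value, n_levels in zip(z, levels):
        if n_levels % 2 == 0:
            raise ValueError("this sketch only handles odd level counts")
        half = (n_levels - 1) // 2
        bounded = math.tanh(value) * half  # squash into (-half, half)
        codes.append(round(bounded))       # nearest integer level
    return codes


def fsq_token_id(codes: Sequence[int], levels: Sequence[int]) -> int:
    """Combine per-dimension levels into one id (a mixed-radix number)."""
    token_id = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        token_id = token_id * n_levels + (code + half)  # shift level to 0..L-1
    return token_id


levels = [5, 5, 5]  # illustrative: implicit vocabulary of 5 * 5 * 5 = 125 tokens
codes = fsq_quantize([0.0, 10.0, -10.0], levels)
print(codes, fsq_token_id(codes, levels))  # [0, 2, -2] 70
```

Because the codebook is just the grid of level combinations, its size follows directly from the chosen levels instead of being learned.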
+
+ ## Evaluation
+
+ ![terramind_evaluation.png](assets%2Fterramind_evaluation.png)
+
+ We benchmarked TerraMind against other geospatial foundation models using the PANGAEA benchmark.
+ TerraMind consistently achieved state-of-the-art performance, surpassing existing models in various downstream tasks such as land use segmentation, water body mapping, and vegetation assessments.
+ The evaluation highlights its effectiveness in handling diverse Earth Observation scenarios.
+ We present additional experiments in our [pre-print](https://arxiv.org/abs/2504.11171).
+
+
+ ## Usage
+
+ TerraMind is fully integrated into the fine-tuning package [TerraTorch](https://ibm.github.io/terratorch/).
+ This makes it easy to initialize the pre-trained model or fine-tune it via PyTorch Lightning.
+ The weights are automatically downloaded from Hugging Face.
+
+ ### Fine-tuning
+
+ You can fine-tune TerraMind with a config using TerraTorch:
+
+ ```shell
+ terratorch fit -c terramind_config.yaml
+ ```
+
+ For testing the fine-tuned TerraMind model, run:
+ ```shell
+ terratorch test -c terramind_config.yaml --ckpt_path path/to/your/checkpoint.ckpt
+ ```
+
+ We provide config examples and notebooks with step-by-step explanations at https://github.com/IBM/terramind.
+
+ ### Backbone
+
+ Alternatively, you can build the backbone with the following code and use it in your custom pipeline.
+
+ ```python
+ from terratorch import BACKBONE_REGISTRY
+ model = BACKBONE_REGISTRY.build(
+     'terramind_v1_tiny',
+     pretrained=True,
+     modalities=['S2L2A', 'S1GRD']
+ )
+ ```
+
+ The model supports the following raw inputs which you can specify in `modalities`: S2L2A, S2L1C, S1GRD, S1RTC, DEM, RGB.
+ If your data does not use all bands of a modality, you can specify a subset with `bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']}`.
+ You can pass the inputs as a dict to the model. If a tensor is passed directly, the model assumes it belongs to the first defined modality.
+ TerraMind can also handle missing input modalities.
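The band-subset and tensor-versus-dict rules above can be pictured with two small helpers. Both helper names are hypothetical, and the 12-band S2L2A order is an assumption for illustration; TerraTorch's own band definitions are authoritative:

```python
from typing import Any, Dict, List, Sequence

# Assumed S2L2A band order (12 bands), for illustration only.
S2L2A_BANDS = [
    "COASTAL_AEROSOL", "BLUE", "GREEN", "RED", "RED_EDGE_1", "RED_EDGE_2",
    "RED_EDGE_3", "NIR_BROAD", "NIR_NARROW", "WATER_VAPOR", "SWIR_1", "SWIR_2",
]


def band_indices(subset: Sequence[str], full: Sequence[str] = S2L2A_BANDS) -> List[int]:
    """Position of each requested band in the full pre-training band list."""
    unknown = [band for band in subset if band not in full]
    if unknown:
        raise ValueError(f"unknown bands: {unknown}")
    return [full.index(band) for band in subset]


def to_modality_dict(inputs: Any, modalities: Sequence[str]) -> Dict[str, Any]:
    """A bare tensor goes to the first modality; a dict may omit modalities."""
    if isinstance(inputs, dict):
        unexpected = sorted(set(inputs) - set(modalities))
        if unexpected:
            raise KeyError(f"unexpected modalities: {unexpected}")
        return dict(inputs)  # missing modalities are simply absent
    return {modalities[0]: inputs}


print(band_indices(["BLUE", "GREEN", "RED", "NIR_NARROW", "SWIR_1", "SWIR_2"]))
print(to_modality_dict("s2_tensor", ["S2L2A", "S1GRD"]))
```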
+
+ ```python
+ output = model(
+     {
+         'S2L2A': s2l2a_tensor,  # B, 12, 224, 224
+         'S1GRD': s1grd_tensor,  # B, 2, 224, 224
+     }
+ )
+
+ output.shape  # B, 196, 192
+ ```
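The 196 in the shape comment is the number of patch tokens per modality. Assuming a 16 x 16 patch size (an assumption of this sketch, consistent with 196 = 14 x 14 for a 224 x 224 input), the count is:

```python
def num_patches(height: int, width: int, patch_size: int = 16) -> int:
    """Non-overlapping patches (tokens) produced per image modality."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by the patch size")
    return (height // patch_size) * (width // patch_size)


print(num_patches(224, 224))  # 196
```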
+
+ The model outputs patch embeddings for each input modality. By default, the patch embeddings are averaged over all modalities to reduce the output size.
+ You can specify another `merge_method` from `'mean'`, `'max'`, `'concat'`, `'dict'`, and `None`.
+ - `mean` and `max` are applied per patch over all image modality embeddings.
+ - `concat` stacks all image modalities along the embedding dimension and returns one embedding per patch.
+ - `dict` returns all tokens split by modality in a dictionary.
+ - `None` returns the tokens without further processing.
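A toy re-implementation of these options on nested Python lists makes the shape changes concrete. It is a sketch for one sample without a batch dimension, the function name `merge` is hypothetical, and `None` is omitted because its exact output layout is not described in the list:

```python
from typing import Dict, List, Union

Tokens = List[List[float]]  # [num_patches][embedding_dim] for one sample


def merge(per_modality: Dict[str, Tokens], method: str = "mean") -> Union[Tokens, Dict[str, Tokens]]:
    """Toy version of the merge options on plain lists (no batch dimension)."""
    if method == "dict":
        return dict(per_modality)
    mods = list(per_modality.values())
    n_patches = len(mods[0])
    if any(len(m) != n_patches for m in mods):
        raise ValueError("all modalities must have the same number of patches")
    if method == "mean":
        # per patch, average each embedding dimension across modalities
        return [[sum(v) / len(v) for v in zip(*(m[p] for m in mods))] for p in range(n_patches)]
    if method == "max":
        return [[max(v) for v in zip(*(m[p] for m in mods))] for p in range(n_patches)]
    if method == "concat":
        # embedding size grows to n_modalities * embedding_dim
        return [[x for m in mods for x in m[p]] for p in range(n_patches)]
    raise ValueError(f"unsupported merge method: {method!r}")


tokens = {
    "S2L2A": [[1.0, 2.0], [3.0, 4.0]],  # 2 patches, embedding dim 2
    "S1GRD": [[3.0, 0.0], [1.0, 8.0]],
}
print(merge(tokens, "mean"))    # [[2.0, 1.0], [2.0, 6.0]]
print(merge(tokens, "max"))     # [[3.0, 2.0], [3.0, 8.0]]
print(merge(tokens, "concat"))  # [[1.0, 2.0, 3.0, 0.0], [3.0, 4.0, 1.0, 8.0]]
```

`mean` and `max` keep the embedding size unchanged, while `concat` multiplies it by the number of modalities, which is why averaging keeps the output compact.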
+
+ ### Thinking in Modalities
+
+ TerraMind introduces a new Thinking-in-Modalities (TiM) approach, where other modalities are predicted as intermediate steps.
+ Then, the fine-tuned encoder uses both raw inputs and the generated modalities.
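The TiM flow can be sketched with stand-in callables. `generate` and `encode` are hypothetical placeholders for TerraMind's generative step and its encoder, not TerraTorch functions; the sketch only shows the order of operations:

```python
from typing import Any, Callable, Dict, Sequence


def tim_forward(
    inputs: Dict[str, Any],
    tim_modalities: Sequence[str],
    generate: Callable[[Dict[str, Any], str], Any],
    encode: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Generate intermediate modalities from the raw inputs, then encode everything."""
    augmented = dict(inputs)
    for modality in tim_modalities:
        # "Thinking" step: predict the extra modality from the raw inputs only.
        augmented[modality] = generate(inputs, modality)
    # The encoder sees both the raw inputs and the generated modalities.
    return encode(augmented)


result = tim_forward(
    {"S2L2A": "s2", "S1GRD": "s1"},
    ["LULC"],
    generate=lambda raw, modality: f"generated {modality} from {sorted(raw)}",
    encode=lambda mods: sorted(mods),
)
print(result)  # ['LULC', 'S1GRD', 'S2L2A']
```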
+
+ Use TiM models in TerraTorch by adding `_tim` to the model name:
+ ```python
+ from terratorch import BACKBONE_REGISTRY
+ model = BACKBONE_REGISTRY.build(
+     'terramind_v1_tiny_tim',
+     pretrained=True,
+     modalities=['S2L2A', 'S1GRD'],
+     tim_modalities=['LULC']  # optional, defaults to LULC (land-use land-cover)
+ )
+ ```
+
+ If you use TiM models, we recommend using the [pre-training statistics](https://github.com/IBM/terratorch/blob/a4ca8df7c7f22ddf469f372e1099157d2d7beeb2/terratorch/models/backbones/terramind/model/terramind_register.py#L111) for standardization.
+
+ ### Generations
+
+ TerraMind can perform any-to-any generation based on varying combinations of inputs.
+
+ ![terramind_generations.png](assets%2Fterramind_generations.png)
+
+ Build the full TerraMind model (including de-tokenizer steps) from the `FULL_MODEL_REGISTRY`:
+
+ ```python
+ from terratorch import FULL_MODEL_REGISTRY
+
+ model = FULL_MODEL_REGISTRY.build(
+     'terramind_v1_tiny_generate',
+     pretrained=True,  # download the pre-trained weights from Hugging Face
+     modalities=['S2L2A'],
+     output_modalities=['S1GRD', 'LULC'],
+     timesteps=10,  # Define diffusion steps
+     standardize=True,  # Apply standardization
+ )
+ ```
+ Like the backbone, pass multiple modalities as a dict or a single modality as a tensor to the model, which returns the generated `output_modalities` as a dict of tensors.
+ Note: These generations are not reconstructions but "mental images" representing how the model imagines the modality.
+ You can control generation details via the number of diffusion steps (`timesteps`) that you can pass to the constructor or the forward function.
+ By passing `standardize=True`, the pre-training standardization values are automatically applied to the input and output.
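Standardization is a per-band affine map, and de-standardizing generated outputs simply inverts it. A minimal sketch on a `[bands][height][width]` nested list, with made-up statistics (not TerraMind's pre-training values, which TerraTorch provides); the function names are hypothetical:

```python
from typing import List, Sequence

Image = List[List[List[float]]]  # [bands][height][width]


def standardize(image: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Apply (x - mean[c]) / std[c] to every pixel of band c."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("need one mean and one std per band")
    return [[[(px - m) / s for px in row] for row in band]
            for band, m, s in zip(image, mean, std)]


def destandardize(image: Image, mean: Sequence[float], std: Sequence[float]) -> Image:
    """Invert standardize(): x * std[c] + mean[c]."""
    if not (len(image) == len(mean) == len(std)):
        raise ValueError("need one mean and one std per band")
    return [[[px * s + m for px in row] for row in band]
            for band, m, s in zip(image, mean, std)]


# Illustrative statistics only (NOT TerraMind's pre-training values).
mean, std = [10.0, 2.0], [5.0, 2.0]
x = [[[10.0, 20.0]], [[0.0, 4.0]]]
z = standardize(x, mean, std)
print(z)                                 # [[[0.0, 2.0]], [[-1.0, 1.0]]]
print(destandardize(z, mean, std) == x)  # True
```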
+
+ We provide an example notebook for generations at https://github.com/IBM/terramind.
+
+ ## Feedback
+
+ Your feedback is invaluable to us.
+ Please share it with us by starting a discussion in this HF repository or submitting an issue to [TerraMind](https://github.com/IBM/terramind) on GitHub.
+
+ ## Challenge
+
+ Already working with TerraMind? Submit your use case to the [TerraMind Blue-Sky Challenge](https://huggingface.co/spaces/ibm-esa-geospatial/challenge), a bi-monthly award spotlighting the boldest, most imaginative ways of using TerraMind.
+
+ ## Citation
+
+ If you use TerraMind in your research, please cite the [TerraMind](https://arxiv.org/abs/2504.11171) pre-print.
+
+ ```text
+ @article{jakubik2025terramind,
+   title={TerraMind: Large-Scale Generative Multimodality for Earth Observation},
+   author={Jakubik, Johannes and Yang, Felix and Blumenstiel, Benedikt and Scheurer, Erik and Sedona, Rocco and Maurogiovanni, Stefano and Bosmans, Jente and Dionelis, Nikolaos and Marsocci, Valerio and Kopp, Niklas and others},
+   journal={arXiv preprint arXiv:2504.11171},
+   year={2025}
+ }
+ ```