TerraMind-NYC-Adapters

A LoRA-adapter family that specializes IBM-ESA's TerraMind 1.0 on three New York City Earth-Observation tasks. Built and fine-tuned on AMD Instinct MI300X via AMD Developer Cloud. Apache 2.0.

TL;DR. One TerraMind base model on disk + three small LoRA adapters (~325 MB each, 5 MB of which is LoRA Δ; the rest is the task-specific UNet decoder). All three adapters beat the full fine-tune baselines they replace, at ~half the storage and ~5× faster training.

Results

All metrics are on held-out test splits with seed=42, the same splits used for the Phase 2/3/4 full-fine-tune baselines, so the LoRA and baseline numbers are directly comparable.

Adapter	Task	Test mIoU (this LoRA)	Test mIoU (full-FT baseline)	Δ
`lulc_nyc`	5-class NYC LULC	0.5866	0.5253 (Phase 2)	+6.13 pp
`tim_nyc`	5-class NYC LULC w/ Thinking-in-Modalities	0.6023	0.5380 (Phase 3)	+6.43 pp
`buildings_nyc`	binary NYC building footprints	0.5518	0.5324 (Phase 4)	+1.94 pp
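
The Δ column is the plain difference between the two mIoU columns, expressed in percentage points. A minimal sketch that recomputes it from the table values (the dictionary layout and the `delta_pp` helper are illustrative, not part of the repo):

```python
# Test mIoU values copied from the results table: (this LoRA, full-FT baseline).
results = {
    "lulc_nyc": (0.5866, 0.5253),
    "tim_nyc": (0.6023, 0.5380),
    "buildings_nyc": (0.5518, 0.5324),
}


def delta_pp(lora_miou: float, baseline_miou: float) -> float:
    """Absolute improvement in percentage points (mIoU is a 0-1 fraction)."""
    return round((lora_miou - baseline_miou) * 100, 2)


deltas = {name: delta_pp(lora, base) for name, (lora, base) in results.items()}
for name, d in deltas.items():
    print(f"{name}: {d:+.2f} pp")
```

These are absolute gains in mIoU, not relative improvements over the baseline.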

All three are stored as adapter_model.safetensors (LoRA Δ matrices on the attention qkv + proj layers across the 12 transformer blocks of the base encoder) plus decoder_head.safetensors (UNet decoder + head + neck, trained from scratch per adapter). The frozen TerraMind base is referenced by ID, not redistributed.

Why a LoRA family

Earlier work in this repo (Phase 2/3/4) shipped three independent full fine-tunes, each ~640 MB–2.2 GB. That left three near-identical encoders on disk, since only the decoder and a small fraction of the attention weights actually changed per task. This consolidation gives:

One TerraMind base file (~1.6 GB), kept fresh from the official IBM release and downloaded once for all adapters.
Three adapters totalling ~1 GB on disk (vs ~3.5 GB previously).
Adding a new NYC task ("heat-island exposure", "stormwater impervious surface", "Sandy historical inundation recall") becomes a 30-line config change and a 5–7 min training run.
Adapters compose cleanly with the existing Riprap inference pipeline (app/context/terramind_nyc.py).
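
The storage figures above can be checked with a little arithmetic. This is a rough sketch using the approximate sizes quoted in this card, assuming decimal units (1 GB = 1000 MB); the constant names and `footprint_gb` helper are illustrative only:

```python
BASE_GB = 1.6            # shared TerraMind base, stored once
ADAPTER_MB = 325         # per adapter: LoRA delta + decoder head
N_ADAPTERS = 3
PREVIOUS_TOTAL_GB = 3.5  # three full fine-tunes, as quoted above

adapters_gb = N_ADAPTERS * ADAPTER_MB / 1000
family_gb = BASE_GB + adapters_gb
print(f"adapters only:   {adapters_gb:.3f} GB")
print(f"base + adapters: {family_gb:.3f} GB (previously ~{PREVIOUS_TOTAL_GB} GB)")


def footprint_gb(n_tasks: int) -> float:
    """Total on-disk size for the shared base plus n_tasks adapters."""
    return BASE_GB + n_tasks * ADAPTER_MB / 1000


print(f"with a fourth task: {footprint_gb(4):.3f} GB")
```

Under these assumptions each additional task adds roughly 0.33 GB, because the base is never duplicated.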

Architecture rationale, ADRs, and the eval-methodology lock are in the source repo.

Quick start

from huggingface_hub import snapshot_download
from peft import LoraConfig, inject_adapter_in_model
from terratorch.tasks import SemanticSegmentationTask
from safetensors.torch import load_file
import torch

# 1. Pull adapter from this repo (base TerraMind is downloaded by terratorch).
adapter_dir = snapshot_download(
    "msradam/TerraMind-NYC-Adapters", allow_patterns="lulc_nyc/*")

# 2. Build TerraMind + LoRA scaffolding.
task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",
    model_args=dict(
        backbone="terramind_v1_base",
        backbone_pretrained=True,
        backbone_modalities=["S2L2A", "S1RTC", "DEM"],
        backbone_use_temporal=True,
        backbone_temporal_pooling="concat",
        backbone_temporal_n_timestamps=4,
        necks=[
            {"name": "SelectIndices", "indices": [2, 5, 8, 11]},
            {"name": "ReshapeTokensToImage", "remove_cls_token": False},
            {"name": "LearnedInterpolateToPyramidal"},
        ],
        decoder="UNetDecoder",
        decoder_channels=[512, 256, 128, 64],
        head_dropout=0.1,
        num_classes=5,
    ),
    loss="ce", lr=1e-4, freeze_backbone=False, freeze_decoder=False,
)
inject_adapter_in_model(LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["attn.qkv", "attn.proj"], bias="none",
), task.model.encoder)

# 3. Load adapter weights.
lora = load_file(f"{adapter_dir}/lulc_nyc/adapter_model.safetensors")
head = load_file(f"{adapter_dir}/lulc_nyc/decoder_head.safetensors")
task.model.encoder.load_state_dict(
    {k.removeprefix("encoder."): v for k, v in lora.items()
     if k.startswith("encoder.")}, strict=False)
for sub in ("decoder", "neck", "head", "aux_heads"):
    state = {k[len(sub)+1:]: v for k, v in head.items()
             if k.startswith(sub + ".")}
    if state and hasattr(task.model, sub):
        getattr(task.model, sub).load_state_dict(state, strict=False)

task.eval().cuda()

# 4. Inference. s2l2a, s1rtc, and dem are input tensors you prepare yourself (not defined in this snippet).
with torch.no_grad():
    out = task.model({
        "S2L2A": s2l2a.cuda(),
        "S1RTC": s1rtc.cuda(),
        "DEM":   dem.cuda(),
    })
preds = out.output.argmax(dim=1)
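
Step 3 above routes checkpoint entries to submodules by key prefix and loads them with strict=False, which does not raise on mismatched keys. The sketch below shows the same prefix split in plain Python while also collecting keys that match no prefix. The `split_by_prefix` helper and the key names in `demo` are hypothetical, not the actual contents of decoder_head.safetensors.

```python
def split_by_prefix(state: dict, prefixes: tuple) -> tuple:
    """Group checkpoint entries by top-level prefix and strip that prefix.

    Returns (groups, leftovers) so that keys matching no prefix stay
    visible instead of being silently dropped.
    """
    groups = {p: {} for p in prefixes}
    leftovers = {}
    for key, value in state.items():
        head, _, rest = key.partition(".")
        if head in groups and rest:
            groups[head][rest] = value
        else:
            leftovers[key] = value
    return groups, leftovers


# Hypothetical key names, for illustration only.
demo = {"decoder.conv1.weight": 1, "head.cls.bias": 2, "mystery.weight": 3}
groups, leftovers = split_by_prefix(
    demo, ("decoder", "neck", "head", "aux_heads"))
print(groups["decoder"])  # entries destined for the decoder submodule
print(leftovers)          # entries that matched no known prefix
```

In PyTorch, load_state_dict also returns an object with missing_keys and unexpected_keys, so checking that return value after each strict=False call is another way to confirm the weights were actually applied.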

For the ensemble interface that loads the base once and swaps adapters between task calls, see shared/inference_ensemble.py.

Repo layout

lulc_nyc/
    adapter_config.json
    adapter_model.safetensors      LoRA Δ on attention qkv + proj
    decoder_head.safetensors       UNet decoder + head + neck
    eval/metrics_lora.json         test-set metrics
    splits/test.txt                held-out test split chip IDs
    README.md                      per-adapter MODEL_CARD
tim_nyc/...
buildings_nyc/...
README.md                          this file

Hardware and budget

All adapters trained on a single AMD Instinct MI300X (192 GB HBM3) on AMD Developer Cloud, ROCm 4.0.0. Wall-clock per adapter:

LULC-NYC: ~5 min
TiM-NYC: ~6 min
Buildings-NYC: ~7 min

Total: ~18 min for the full family. Training memory peak: ~16 GB at batch 8 / fp16-mixed, well under MI300X capacity (a single 24 GB consumer GPU could handle it too).

License

Apache 2.0. Underlying training data:

ESA Sentinel-2 L2A / Sentinel-1 RTC / Copernicus DEM via Major-TOM Core — Copernicus Open Data License (CC-BY-equivalent, attribution required).
ESA WorldCover 2021 v200 — CC-BY-4.0.
NYC DOITT Building Footprints — public domain via NYC OpenData.

Detailed attribution in DATA.md.

Source

github.com/msradam/riprap-nyc/tree/main/experiments/18_terramind_nyc_lora

Citation

@misc{terramind-nyc-adapters-2026,
  title={TerraMind-NYC-Adapters: A LoRA family specializing TerraMind 1.0
         on New York City Earth-Observation tasks},
  author={Rahman, Adam Munawar},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/msradam/TerraMind-NYC-Adapters},
}

Independent reproduction

This model has an independent reproduction harness at msradam/riprap-models. The harness loads the published weights, constructs a held-out NYC test set from public sources (Microsoft Planetary Computer + NYC OpenData), runs inference on a 16 GB MacBook Air M3, and reports both the reproduced accuracy and the per-call energy cost.

Card metric	Reproduced (this card)	Method	M3
0.5518 mIoU (buildings)	0.3288 mIoU; building-class IoU itself 0.349 (slightly higher than the card's 0.293). The mIoU gap reflects test-set composition: the harness uses 6 dense urban AOIs (~50% buildings), while the card's 32-chip split was a more balanced mix.	TerraMind 1.0 base + buildings LoRA + UNet decoder over 6 NYC AOIs (S2L2A 12-band × 4 timesteps + S1RTC 2-band × 4 timesteps + DEM), DOITT building footprints as labels	yes

Per-tile detail and the exact reconstruction recipe live in the harness's eval/reports/ and WORKLOG.md. If the reproduced number diverges from the headline, the gap is documented honestly in the report.

Updated reproduction findings (gap analysis)

Threshold sweep on the 6-AOI buildings reconstruction: the default argmax (equivalent to a 0.5 threshold on the building softmax probability) is recall-biased (recall 99%, precision 35%), which matches the card's 'recall-biased, over-segments' note. Sweeping softmax thresholds:

threshold	building IoU	precision	recall	F1
0.5 (default)	0.349	0.350	0.992	0.517
0.6 (best IoU)	0.365	0.380	0.903	0.535
0.7	0.092	0.475	0.103	(collapses)
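
The columns in this table are not independent: for a single class, both F1 and IoU follow from precision and recall alone. A small sketch that recomputes them from the table's precision and recall values, as a consistency check (the helper names are illustrative):

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def iou_from_pr(precision: float, recall: float) -> float:
    """IoU = TP / (TP + FP + FN), rewritten via 1/P = (TP+FP)/TP and 1/R = (TP+FN)/TP."""
    return 1.0 / (1.0 / precision + 1.0 / recall - 1.0)


# (precision, recall) per threshold, copied from the sweep table.
sweep = {0.5: (0.350, 0.992), 0.6: (0.380, 0.903), 0.7: (0.475, 0.103)}

derived = {
    t: (round(iou_from_pr(p, r), 3), round(f1_from_pr(p, r), 3))
    for t, (p, r) in sweep.items()
}
for t, (iou, f1) in derived.items():
    print(f"threshold {t}: IoU {iou:.3f}, F1 {f1:.3f}")
```

The recomputed IoU and F1 values agree with the table to three decimals for thresholds 0.5 and 0.6, and the IoU agrees for 0.7.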

Best IoU at threshold 0.6: +1.6 pp over default, at the cost of about 9 pp of recall (0.992 → 0.903). At 0.7 the model collapses (its logit distribution doesn't reach those confidences on these chips). Recommended operating points:

Exposure overlay (Riprap default): threshold 0.5. Recall near 100%, treat as 'building candidates'.
Higher precision needed: threshold 0.6.
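
Choosing an operating point means replacing the argmax with an explicit threshold on the building-class probability. The sketch below shows that sweep in plain Python on made-up toy values. The `binary_metrics` helper, the `>=` comparison, and the sample probabilities are all assumptions for illustration, not the harness's actual code or data.

```python
def binary_metrics(probs, labels, threshold):
    """Precision, recall and IoU for the positive class at one threshold."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold
        if pred and y:
            tp += 1
        elif pred and not y:
            fp += 1
        elif y:  # not predicted, but actually positive
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return {"precision": precision, "recall": recall, "iou": iou}


# Toy per-pixel building probabilities and ground truth (invented values).
probs = [0.95, 0.65, 0.55, 0.58, 0.52, 0.30]
labels = [1, 1, 1, 0, 0, 0]

sweep_result = {t: binary_metrics(probs, labels, t) for t in (0.5, 0.6, 0.7)}
for t, m in sweep_result.items():
    print(t, {k: round(v, 3) for k, v in m.items()})
```

On real outputs, `probs` would be the flattened building channel of the softmax over the model's logits and `labels` the flattened footprint mask.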

Note: building-class IoU 0.349-0.365 is higher than the card's reported 0.293; the mIoU gap (this repo: 0.33 default vs card: 0.55) is test-set composition, not model failure.
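
One way to see where the mIoU gap comes from is to back out the background-class IoU. The sketch below assumes that mIoU for the binary task is the unweighted mean of the background and building IoUs. This card does not state that definition explicitly, so the derived numbers are indicative only.

```python
def implied_background_iou(miou: float, building_iou: float) -> float:
    """Solve miou = (background_iou + building_iou) / 2 for background_iou.

    Assumption: mIoU is the unweighted mean over exactly two classes.
    """
    return 2 * miou - building_iou


# Harness: 0.3288 mIoU with 0.349 building IoU (default threshold).
harness_bg = implied_background_iou(0.3288, 0.349)
# Card: 0.5518 mIoU (results table) with 0.293 building IoU.
card_bg = implied_background_iou(0.5518, 0.293)

print(f"implied background IoU, harness: {harness_bg:.4f}")
print(f"implied background IoU, card:    {card_bg:.4f}")
```

Under that assumption the difference sits almost entirely in the background class, which is what a shift from a balanced split to building-dense AOIs would produce for a recall-biased model.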

Sniff-test probe (independent reconstruction)

The reproduction harness includes a 20-case sniff-test probe (10 for buildings, 10 for LULC) on real Sentinel-2 + Sentinel-1 + DEM stacks. Current pass-rate: 10/10 buildings + 10/10 LULC = 20/20.

Buildings adapter (10/10)

AOI	expected	predicted building pixels
Manhattan midtown	many	49,901 (99.4%) ✅
Brooklyn industrial	many	49,292 (98.2%) ✅
Hudson Yards	many	35,560 (70.9%) ✅
Coney Island	many	33,477 (66.7%) ✅
Queens residential	many	42,255 (84.2%) ✅
Staten Island Greenbelt	few	21,652 (43.2%) ✅
JFK runways	few	18,537 (37.0%) ✅
Central Park	few	29,960 (59.7%) ✅
Pelham Bay Park	few	736 (1.5%) ✅
Jamaica Bay	none	92 (0.2%) ✅

The model finds essentially every building in dense urban chips and predicts near-zero building pixels on open water. It is recall-biased (per the card's own caveat), so urban over-prediction is expected.

LULC adapter (10/10)

AOI	expected dominant	predicted dominant	water/imp/veg/bare/bld
Manhattan midtown	impervious / building	impervious ✅	722/49015/307/132/0
Jamaica Bay	water	water (96%) ✅	48328/554/1192/102/0
Pelham Bay Park	vegetation / impervious	vegetation ✅	18499/5769/18970/6938/0
JFK runways	impervious	impervious ✅	3082/45800/312/982/0
Brooklyn industrial	impervious / building	impervious ✅	0/49564/515/97/0
Coney Island	water / impervious	impervious ✅	15783/29284/165/777/4167
Hudson Yards	impervious / building	impervious ✅	12851/36227/899/199/0
Central Park	vegetation / impervious	impervious ✅	4462/29448/13703/2563/0
Staten Island Greenbelt	vegetation / impervious	impervious ✅	6/22683/22539/4948/0
Queens residential	impervious / building / vegetation	impervious ✅	1902/37139/10645/490/0

Coney Island is the only chip in this batch where the LULC model emits the building class (4167 pixels). On this 6-AOI reconstruction the harness's water-class IoU was 0.943, higher than the card's published 0.770.
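
The "predicted dominant" column can be derived directly from the per-class pixel counts. A minimal sketch on four rows of the table above, with class names expanded from the column header abbreviations (the `dominant` helper is illustrative):

```python
# Class order follows the table header: water/imp/veg/bare/bld.
CLASSES = ("water", "impervious", "vegetation", "bare", "building")

# Per-class pixel counts copied from four rows of the LULC table.
rows = {
    "Jamaica Bay": (48328, 554, 1192, 102, 0),
    "Pelham Bay Park": (18499, 5769, 18970, 6938, 0),
    "Coney Island": (15783, 29284, 165, 777, 4167),
    "Staten Island Greenbelt": (6, 22683, 22539, 4948, 0),
}


def dominant(counts):
    """Return (class name, share of chip) for the most frequent class."""
    total = sum(counts)
    idx = max(range(len(counts)), key=counts.__getitem__)
    return CLASSES[idx], counts[idx] / total


summary = {aoi: dominant(counts) for aoi, counts in rows.items()}
for aoi, (name, share) in summary.items():
    print(f"{aoi}: {name} ({share:.1%})")
```

Each row sums to 50,176 pixels, which matches a 224 × 224 chip. Two of the calls are narrow: vegetation leads water by 471 pixels in Pelham Bay Park, and impervious leads vegetation by 144 pixels in Staten Island Greenbelt.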

Source code

github.com/msradam/TerraMind-NYC-Adapters — 1:1 source repo with pip install-able package, eval scripts for buildings + LULC adapters, demo PNGs, and docs/TRAINING.md covering the LoRA-on-frozen-base training (rank 16, alpha 32, AMD MI300X). Reproduction harness for all four NYC fine-tunes lives at github.com/msradam/riprap-models.

Base model: ibm-esa-geospatial/TerraMind-1.0-base
