TerraMind-NYC-Adapters

A LoRA-adapter family that specializes IBM-ESA's TerraMind 1.0 on three New York City Earth-Observation tasks. Built and fine-tuned on AMD Instinct MI300X via AMD Developer Cloud. Apache 2.0.

TL;DR. One TerraMind base model on disk + three small LoRA adapters (~325 MB each, 5 MB of which is LoRA Ξ”; the rest is the task-specific UNet decoder). All three adapters beat the full fine-tune baselines they replace, at ~half the storage and ~5Γ— faster training.

Results

All metrics are on held-out test splits with seed=42, identical to the Phase 2/3/4 full-fine-tune baselines for byte-for-byte comparison.

Adapter Task Test mIoU (this LoRA) Test mIoU (full-FT baseline) Ξ”
lulc_nyc 5-class NYC LULC 0.5866 0.5253 (Phase 2) +6.13 pp
tim_nyc 5-class NYC LULC w/ Thinking-in-Modalities 0.6023 0.5380 (Phase 3) +6.43 pp
buildings_nyc binary NYC building footprints 0.5518 0.5324 (Phase 4) +1.94 pp

All three are stored as adapter_model.safetensors (LoRA Ξ” matrices, attention qkv + proj across 24 transformer blocks) plus decoder_head.safetensors (UNet decoder + head + neck, trained from scratch per adapter). The frozen TerraMind base is referenced by ID, not redistributed.

Why a LoRA family

Earlier work in this repo (Phase 2/3/4) shipped three independent full fine-tunes, each ~640 MB–2.2 GB. Three near-identical encoders sat on disk because only the decoder + a small fraction of attention weights actually changed per task. This consolidation:

  • One TerraMind base file (~1.6 GB), kept fresh from the official IBM release. Re-downloaded once across all adapters.
  • Three adapters totalling ~1 GB on disk (vs ~3.5 GB previously).
  • Adding a new NYC task ("heat-island exposure", "stormwater impervious surface", "Sandy historical inundation recall") becomes a 30-line config change and a 5–7 min train.
  • Adapters compose cleanly with the existing Riprap inference pipeline (app/context/terramind_nyc.py).

Architecture rationale, ADRs, and the eval-methodology lock are in the source repo.

Quick start

from huggingface_hub import snapshot_download
from peft import LoraConfig, inject_adapter_in_model
from terratorch.tasks import SemanticSegmentationTask
from safetensors.torch import load_file
import torch

# 1. Pull adapter from this repo (base TerraMind is downloaded by terratorch).
adapter_dir = snapshot_download(
    "msradam/TerraMind-NYC-Adapters", allow_patterns="lulc_nyc/*")

# 2. Build TerraMind + LoRA scaffolding.
task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",
    model_args=dict(
        backbone="terramind_v1_base",
        backbone_pretrained=True,
        backbone_modalities=["S2L2A", "S1RTC", "DEM"],
        backbone_use_temporal=True,
        backbone_temporal_pooling="concat",
        backbone_temporal_n_timestamps=4,
        necks=[
            {"name": "SelectIndices", "indices": [2, 5, 8, 11]},
            {"name": "ReshapeTokensToImage", "remove_cls_token": False},
            {"name": "LearnedInterpolateToPyramidal"},
        ],
        decoder="UNetDecoder",
        decoder_channels=[512, 256, 128, 64],
        head_dropout=0.1,
        num_classes=5,
    ),
    loss="ce", lr=1e-4, freeze_backbone=False, freeze_decoder=False,
)
inject_adapter_in_model(LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["attn.qkv", "attn.proj"], bias="none",
), task.model.encoder)

# 3. Load adapter weights.
lora = load_file(f"{adapter_dir}/lulc_nyc/adapter_model.safetensors")
head = load_file(f"{adapter_dir}/lulc_nyc/decoder_head.safetensors")
task.model.encoder.load_state_dict(
    {k.removeprefix("encoder."): v for k, v in lora.items()
     if k.startswith("encoder.")}, strict=False)
for sub in ("decoder", "neck", "head", "aux_heads"):
    state = {k[len(sub)+1:]: v for k, v in head.items()
             if k.startswith(sub + ".")}
    if state and hasattr(task.model, sub):
        getattr(task.model, sub).load_state_dict(state, strict=False)

task.eval().cuda()

# 4. Inference.
with torch.no_grad():
    out = task.model({
        "S2L2A": s2l2a.cuda(),
        "S1RTC": s1rtc.cuda(),
        "DEM":   dem.cuda(),
    })
preds = out.output.argmax(dim=1)

For the ensemble interface that loads the base once and swaps adapters between task calls, see shared/inference_ensemble.py.

Repo layout

lulc_nyc/
    adapter_config.json
    adapter_model.safetensors      LoRA Ξ” on attention qkv + proj
    decoder_head.safetensors       UNet decoder + head + neck
    eval/metrics_lora.json         test-set metrics
    splits/test.txt                held-out test split chip IDs
    README.md                      per-adapter MODEL_CARD
tim_nyc/...
buildings_nyc/...
README.md                          this file

Hardware and budget

All adapters trained on a single AMD Instinct MI300X (192 GB HBM3) on AMD Developer Cloud, ROCm 4.0.0. Wall-clock per adapter:

  • LULC-NYC: ~5 min
  • TiM-NYC: ~6 min
  • Buildings-NYC: ~7 min

Total: ~18 min for the full family. Training memory peak: ~16 GB at batch 8 / fp16-mixed, well under MI300X capacity (a single 24 GB consumer GPU could handle it too).

License

Apache 2.0. Underlying training data:

  • ESA Sentinel-2 L2A / Sentinel-1 RTC / Copernicus DEM via Major-TOM Core β€” Copernicus Open Data License (CC-BY-equivalent, attribution required).
  • ESA WorldCover 2021 v200 β€” CC-BY-4.0.
  • NYC DOITT Building Footprints β€” public domain via NYC OpenData.

Detailed attribution in DATA.md.

Source

github.com/msradam/riprap-nyc/tree/main/experiments/18_terramind_nyc_lora

Citation

@misc{terramind-nyc-adapters-2026,
  title={TerraMind-NYC-Adapters: A LoRA family specializing TerraMind 1.0
         on New York City Earth-Observation tasks},
  author={Rahman, Adam Munawar},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/msradam/TerraMind-NYC-Adapters},
}

Independent reproduction

This model has an independent reproduction harness at msradam/riprap-models. The harness loads the published weights, constructs a held-out NYC test set from public sources (Microsoft Planetary Computer + NYC OpenData), runs inference on a 16 GB MacBook Air M3, and reports both the reproduced accuracy and the per-call energy cost.

Card metric Reproduced (this card) Method M3
0.5511 mIoU (buildings) 0.3288 mIoU; building-class IoU itself 0.349 (slightly higher than the card's 0.293). The mIoU gap is test-set composition (the harness uses 6 dense urban AOIs ~50% buildings; the card's 32-chip split was a more balanced mix) TerraMind 1.0 base + buildings LoRA + UNet decoder over 6 NYC AOIs (S2L2A 12-band Γ— 4 timesteps + S1RTC 2-band Γ— 4 timesteps + DEM), DOITT building footprints as labels yes

Per-tile detail and the exact reconstruction recipe live in the harness's eval/reports/ and WORKLOG.md. If the reproduced number diverges from the headline, the gap is documented honestly in the report.

Updated reproduction findings (gap analysis)

Threshold sweep on the 6-AOI buildings reconstruction: the default argmax (threshold 0.5) is recall-biased (recall 99%, precision 35%) β€” matches the card's 'recall-biased, over-segments' note. Sweeping softmax thresholds:

threshold building IoU precision recall F1
0.5 (default) 0.349 0.350 0.992 0.517
0.6 (best IoU) 0.365 0.380 0.903 0.535
0.7 0.092 0.475 0.103 (collapses)

Best IoU at threshold 0.6: +1.6 pp over default with negligible recall loss. Above 0.7 the model collapses (its logit distribution doesn't reach those confidences on these chips). Recommended operating points:

  • Exposure overlay (Riprap default): threshold 0.5. Recall near 100%, treat as 'building candidates'.
  • Higher precision needed: threshold 0.6.

Note: building-class IoU 0.349-0.365 is higher than the card's reported 0.293; the mIoU gap (this repo: 0.33 default vs card: 0.55) is test-set composition, not model failure.

Sniff-test probe (independent reconstruction)

The reproduction harness includes a 20-case sniff-test probe (10 for buildings, 10 for LULC) on real Sentinel-2 + Sentinel-1 + DEM stacks. Current pass-rate: 10/10 buildings + 10/10 LULC = 20/20.

Buildings adapter (10/10)

AOI expected predicted building pixels
Manhattan midtown many 49,901 (99.4%) βœ…
Brooklyn industrial many 49,292 (98.2%) βœ…
Hudson Yards many 35,560 (70.9%) βœ…
Coney Island many 33,477 (66.7%) βœ…
Queens residential many 42,255 (84.2%) βœ…
Staten Island Greenbelt few 21,652 (43.2%) βœ…
JFK runways few 18,537 (37.0%) βœ…
Central Park few 29,960 (59.7%) βœ…
Pelham Bay Park few 736 (1.5%) βœ…
Jamaica Bay none 92 (0.2%) βœ…

The model finds essentially every building in dense urban chips, near-zero on open water. Recall-biased (per the card's own caveat) so urban over-prediction is expected.

LULC adapter (10/10)

AOI expected dominant predicted dominant water/imp/veg/bare/bld
Manhattan midtown impervious / building impervious βœ… 722/49015/307/132/0
Jamaica Bay water water (96%) βœ… 48328/554/1192/102/0
Pelham Bay Park vegetation / impervious vegetation βœ… 18499/5769/18970/6938/0
JFK runways impervious impervious βœ… 3082/45800/312/982/0
Brooklyn industrial impervious / building impervious βœ… 0/49564/515/97/0
Coney Island water / impervious impervious βœ… 15783/29284/165/777/4167
Hudson Yards impervious / building impervious βœ… 12851/36227/899/199/0
Central Park vegetation / impervious impervious βœ… 4462/29448/13703/2563/0
Staten Island Greenbelt vegetation / impervious impervious βœ… 6/22683/22539/4948/0
Queens residential impervious / building / vegetation impervious βœ… 1902/37139/10645/490/0

Coney Island is the only chip in this batch where the LULC model emits the building class (4167 pixels). On this 6-AOI reconstruction the harness's water-class IoU was 0.943, higher than the card's published 0.770.

Source code

github.com/msradam/TerraMind-NYC-Adapters β€” 1:1 source repo with pip install-able package, eval scripts for buildings + LULC adapters, demo PNGs, and docs/TRAINING.md covering the LoRA-on-frozen-base training (rank 16, alpha 32, AMD MI300X). Reproduction harness for all four NYC fine-tunes lives at github.com/msradam/riprap-models.

Downloads last month
-
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for msradam/TerraMind-NYC-Adapters

Adapter
(1)
this model

Spaces using msradam/TerraMind-NYC-Adapters 3