SpanMarker-INDUS for Climate Research NER

This SpanMarker model is fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain and extracts 28 distinct entity types. It uses the domain-specific nasa-impact/indus-sde-v0.2 model as its encoder.

📌 Model Details

Model Type: SpanMarker
Encoder: nasa-impact/indus-sde-v0.2
Maximum Sequence Length: 512 tokens
Maximum Entity Length: 14 words
Language: English
License: cc-by-sa-4.0
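
To make the 14-word entity limit concrete, the sketch below enumerates the word-level spans a span-based tagger could consider for one sentence. It assumes candidates are contiguous word ranges capped at the maximum entity length. `candidate_spans` is a hypothetical helper written for illustration and is not part of the span_marker API.

```python
def candidate_spans(words, max_entity_length=14):
    """Return (start, end) word-index pairs, end exclusive, for every
    contiguous span of at most max_entity_length words."""
    spans = []
    for start in range(len(words)):
        last_end = min(start + max_entity_length, len(words))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


# Illustrative sentence (not taken from the dataset).
words = "Harmful algal blooms affect California sea lions".split()
spans = candidate_spans(words)
print(len(spans), "candidate spans for", len(words), "words")
```

Under this assumption, a mention longer than 14 words never appears in the candidate set, so it cannot be returned as a single entity.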

Model Labels

Label	Examples
Asset	"raw material", "mental health", "water resources"
Body Part	"leaves", "deep tissue compartment", "plant leaves"
Body of Water	"rivers", "peripheral rivers", "Dhaleshwari river"
Chemical	"cathode materials", "domoic acid", "marine algal toxin"
Disease	"chronic epileptic syndrome", "seizures", "acute neurologic signs"
Ecosystem	"Tropical montane cloud forest", "polluted environment", "cloud forests"
Energy Source	"fossil fuels", "12-cell series battery-pack prototype", "battery cells"
Field of Study	"veterinary medicine", "study", "reference laboratory"
Geographical Feature	"heterogenous topography", "low point", "mountainous regions"
Intellectual Artefact	"Veterinary medical records", "data", "Daily husbandry records"
Location	"beaches", "Westbrook", "wild"
Mathematical Expression	"Stepwise machine hour constraints", "difference", "gradient"
Measuring Device	"MRI scan", "EEG", "station"
Meteorological Phenomenon	"rainfall", "climatic variability", "climate change"
Method	"clinical efficacy", "dosing", "serum monitoring"
Natural Disaster	"environmental pollution", "seasonal air pollution", "heavy metal contamination"
Natural Phenomenon	"algal blooms", "changing ocean conditions", "biochemical changes"
Organism	"California sea lions", "species", "Zalophus californianus"
Organization	"NOAA National Marine Fisheries Service", "long-term care facility", "reference laboratory"
Other	"normal eating", "reports", "marine mammal health"
Person	"staff", "clinicians", "Clinicians"
Physical Artefact	"paved east – west road", "EVs", "electric vehicle"
Physical Phenomenon	"normal food intake", "structural abnormalities", "seasonal changes"
Policy	"energy security", "pollution", "safety"
Quantity	"200 mAhg − 1", "energy density", ">"
Satellite	"Tropical Rainfall Measuring Mission", "satellites", "TRMM"
System	"global overturning circulation", "system structure", "climate"
Time Period	"several decades", "101 days", "periods of prolonged anorexia"

🚀 Main Results (Selected Checkpoint)

This repository provides the best-performing checkpoint selected from 5 runs with different random seeds. The internal training logs tracked performance on the validation split of CliReNER_silver, but the final checkpoint was selected, and the metrics below were computed, on the independent, expert-annotated CliReNER_gold dataset.

Metric	Score (%)
Precision	50.02
Recall	46.61
Strict F1	48.26

This checkpoint corresponds to the seed with the highest strict F1 on the gold evaluation set (Seed 4, seed value 33).

📊 Results Across Seeds

We fine-tuned the model using 5 different random seeds to assess the stability and robustness of the architecture on the domain-specific text.

Seed	Precision	Recall	Strict F1
1	48.28	43.06	45.52
2	41.78	33.66	37.29
3	47.44	44.73	46.05
4	50.02	46.61	48.26
5	48.41	44.28	46.26

Summary:

F1: mean = 44.67, std = 4.26
Precision: mean = 47.19, std = 3.16
Recall: mean = 42.47, std = 5.09
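
The summary figures can be checked against the per-seed table. The sketch below assumes the sample standard deviation (n - 1 denominator), which is what `statistics.stdev` computes. That choice is an assumption, since the card does not state which estimator was used.

```python
from statistics import mean, stdev

# Per-seed scores copied from the table above (seeds 1 to 5).
scores = {
    "Precision": [48.28, 41.78, 47.44, 50.02, 48.41],
    "Recall": [43.06, 33.66, 44.73, 46.61, 44.28],
    "F1": [45.52, 37.29, 46.05, 48.26, 46.26],
}

# Mean and sample standard deviation for each metric.
summary = {name: (mean(vals), stdev(vals)) for name, vals in scores.items()}
for name, (m, s) in summary.items():
    print(f"{name}: mean = {m:.2f}, std = {s:.2f}")
```

Recomputing from the rounded table values can differ from the reported figures in the last digit, for example if the originals were computed from unrounded scores.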

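Model Selection Strategy: The uploaded checkpoint is the single best seed, i.e. the one with the highest strict F1 on the gold dataset. Because the same gold set is used for both selection and reporting, the selected checkpoint's scores may be optimistic; the cross-seed means above give a more conservative estimate.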

📂 Dataset & Evaluation

Training Dataset: CliReNER_silver
- Splits used: Stratified 80:10:10 ratio (Train/Validation/Test). The 80% split was used for training.
Evaluation Dataset: CliReNER_gold
- Splits used: Evaluated on the combined 192 sentences (expert-annotated via Weighted Expert Voting).
Preprocessing:
- Texts were tokenized using the domain-specific INDUS tokenizer.
- The dataset utilizes a flat NER schema (nested entities are excluded, and overlapping entities are resolved to the most relevant span).
Metric Details:
- F1 type: Strict F1 (Entity-level exact match).
- A predicted entity counts as correct only if both its exact span boundaries and its label match the gold annotation.
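
The strict matching rule can be sketched as follows. This is a minimal illustration and not the evaluation script used for the reported numbers. It assumes micro-averaging over all sentences, which the card does not specify. `strict_prf` is a hypothetical helper, and the example spans are invented.

```python
def strict_prf(gold, pred):
    """Micro-averaged strict precision, recall and F1.

    gold and pred are lists (one item per sentence) of sets of
    (start, end, label) tuples. A prediction is a true positive only
    if the identical tuple appears in the gold set.
    """
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    n_pred = sum(len(p) for p in pred)
    n_gold = sum(len(g) for g in gold)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# One sentence: one exact match, one boundary error, one spurious entity.
gold = [{(0, 2, "Organism"), (5, 6, "Location")}]
pred = [{(0, 2, "Organism"), (5, 7, "Location"), (8, 9, "Method")}]
precision, recall, f1 = strict_prf(gold, pred)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

In the example, the "Location" prediction overlaps the gold span but ends one word late, so under strict matching it counts as both a false positive and a false negative.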

⚖️ Precision vs Recall Behavior

The model favors precision over recall. Precision exceeds recall for every seed (50.02 vs. 46.61 for the selected checkpoint; 47.19 vs. 42.47 on average), so the model misses more entities than it predicts spuriously.

⚙️ Usage

Direct Use for Inference

Because this model was trained using the SpanMarker framework, it requires the span_marker library for inference.

pip install span_marker

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-indus-sde-v0.2")

# Run inference
text = "As shown in these specialised domains, general-domain, off-the-shelf NER systems often fail to capture domain-specific terminology, whereas models adapted or fine-tuned on in-domain corpora consistently achieve superior performance (Lee et al. 2019; Gururangan et al. 2020)."
entities = model.predict(text)

for entity in entities:
    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")

# Entity: specialised domains | Label: Other | Score: 0.4236
# Entity: general-domain | Label: Other | Score: 0.6351
# Entity: off-the-shelf | Label: Other | Score: 0.3454
# Entity: NER systems | Label: Method | Score: 0.5303
# Entity: domain-specific terminology | Label: Other | Score: 0.4407
# Entity: models | Label: Method | Score: 0.8604
# Entity: in-domain corpora | Label: Intellectual Artefact | Score: 0.8342
# Entity: Lee et al. | Label: Person | Score: 0.7713
# Entity: Gururangan et al. | Label: Person | Score: 0.7211
# Entity: 2020 | Label: Time Period | Score: 0.9387
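
Since each prediction carries a confidence score, a simple post-processing step is to drop low-scoring entities and group the rest by label. The sketch below works on plain dictionaries in the shape shown above. The 0.5 threshold is an arbitrary illustrative value, not a recommendation from the authors, and `group_confident` is a hypothetical helper.

```python
from collections import defaultdict

# A few predictions copied from the example output above.
entities = [
    {"span": "NER systems", "label": "Method", "score": 0.5303},
    {"span": "off-the-shelf", "label": "Other", "score": 0.3454},
    {"span": "models", "label": "Method", "score": 0.8604},
    {"span": "in-domain corpora", "label": "Intellectual Artefact", "score": 0.8342},
]


def group_confident(entities, min_score=0.5):
    """Group entity spans by label, dropping predictions below min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


grouped = group_confident(entities)
print(grouped)
# {'Method': ['NER systems', 'models'], 'Intellectual Artefact': ['in-domain corpora']}
```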

Downstream Use

You can easily continue fine-tuning this model on your own dataset.

from span_marker import SpanMarkerModel, Trainer
from datasets import load_dataset

# Download this model from the 🤗 Hub as the starting point
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-indus-sde-v0.2")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("your_custom_dataset")

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")

📉 Training Details

Training Set Metrics

Training set	Min	Mean	Max
Sentence length	3	31.4819	97
Entities per sentence	1	7.0100	22

Training Hyperparameters

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 33
gradient_accumulation_steps: 2
total_train_batch_size: 16
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
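
The effective batch size and the shape of the learning-rate schedule follow from these values. The sketch below is a plain-Python approximation of linear warmup followed by linear decay. It assumes a single device, 62 optimizer steps per epoch (the step count listed in the training results table), and that all 20 configured epochs are scheduled. `linear_warmup_lr` is a hypothetical helper, and the exact rounding of warmup steps in the Transformers implementation may differ.

```python
def linear_warmup_lr(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


per_device_batch = 8
accumulation_steps = 2
# Matches total_train_batch_size above when training on one device.
effective_batch = per_device_batch * accumulation_steps

steps_per_epoch = 62                 # from the training results table
total_steps = steps_per_epoch * 20   # assumes all 20 epochs are scheduled

for step in (0, 62, 124, 620, 1240):
    print(step, f"{linear_warmup_lr(step, total_steps):.2e}")
```

Under these assumptions the warmup covers the first 124 steps, i.e. the first two epochs, after which the rate decays linearly from 5e-05.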

Training Results (CliReNER_silver Validation Split)

Epoch	Step	Validation Loss	Validation Precision	Validation Recall	Validation F1	Validation Accuracy
1.0	62	0.1564	0.0	0.0	0.0	0.6075
2.0	124	0.1363	0.0	0.0	0.0	0.6075
3.0	186	0.0993	0.0	0.0	0.0	0.6072
4.0	248	0.0653	0.5089	0.2453	0.3311	0.7044
5.0	310	0.0510	0.5383	0.4935	0.5150	0.7901
6.0	372	0.0475	0.5821	0.5495	0.5653	0.8113
7.0	434	0.0448	0.6149	0.6026	0.6087	0.8316
8.0	496	0.0478	0.6161	0.6055	0.6107	0.8286
9.0	558	0.0491	0.6112	0.6112	0.6112	0.8328
10.0	620	0.0493	0.6399	0.6169	0.6282	0.8419

Framework Versions

Python: 3.10.19
SpanMarker: 1.7.0
Transformers: 4.50.0
PyTorch: 2.9.1+cu126
Datasets: 3.0.0
Tokenizers: 0.21.4

📚 Citation

If you use this model or the CliReNER datasets in your research, please cite the project:

@misc{poleksic2026named,
  author       = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  title        = {Named Entity Recognition for Climate Change Research},
  year         = {2026},
  howpublished = {Research Square},
  note         = {Preprint}
}

Please also acknowledge the SpanMarker framework:

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}

Model size: 0.1B params (Safetensors, tensor type F32)
