ToMMeR-Llama-3.2-3B_L1_R64

ToMMeR is a lightweight probing model that extracts emergent mention-detection capabilities from the early-layer representations of any LLM backbone, achieving high zero-shot recall across a wide set of 13 NER benchmarks.

Checkpoint Details

| Property | Value |
|----------|-------|
| Base LLM | meta-llama/Llama-3.2-3B |
| Layer | 1 |
| #Params | 396.3K |

Usage

Installation

Our code can be installed with pip+git; please visit the repository for more details.

pip install git+https://github.com/VictorMorand/llm2ner.git

Fancy Outputs

import llm2ner
from llm2ner import ToMMeR

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-3B_L1_R64")
# load backbone LLM, optionally cutting unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

text = "Large language models are awesome. While trained on language modeling, they exhibit emergent Zero Shot abilities that make them suitable for a wide range of tasks, including Named Entity Recognition (NER). "

# fancy interactive output
outputs = llm2ner.plotting.demo_inference(
    text, tommer, llm,
    decoding_strategy="threshold",  # or "greedy" for flat segmentation
    threshold=0.5,  # default 50%
    show_attn=True,
)
(Rendered output: the input text is displayed interactively with the predicted mention spans highlighted.)
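For intuition, here is a minimal, hypothetical sketch of the difference between the two decoding strategies as suggested by the comments above (the library's actual implementation may differ): "threshold" keeps every span whose probability exceeds the cutoff, while "greedy" additionally enforces a flat, non-overlapping segmentation by picking spans in order of decreasing score.

from typing import Dict, List, Tuple

def greedy_flat_decode(span_scores: Dict[Tuple[int, int], float],
                       threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Hypothetical illustration: keep the best-scoring spans, skipping any that overlap."""
    chosen: List[Tuple[int, int]] = []
    for (start, end), score in sorted(span_scores.items(), key=lambda kv: -kv[1]):
        if score < threshold:
            break  # remaining spans all score below the threshold
        if all(end < s or start > e for s, e in chosen):  # no overlap with kept spans
            chosen.append((start, end))
    return sorted(chosen)

# spans are (start, end) token indices with inclusive bounds
scores = {(0, 2): 0.93, (1, 2): 0.61, (4, 4): 0.40}
print(greedy_flat_decode(scores))  # [(0, 2)]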

Raw inference

By default, ToMMeR outputs span probabilities, but built-in decoding options are also provided for extracting entity mentions directly.

  • Inputs:
    • tokens (batch, seq): token ids to process,
    • model: the LLM backbone to extract representations from.
  • Outputs: a (batch, seq, seq) matrix of span probabilities (masked outside valid spans).

tommer = ToMMeR.from_pretrained("llm2ner/ToMMeR-Llama-3.2-3B_L1_R64")
# load backbone LLM, optionally cutting unused layers to save GPU memory
llm = llm2ner.utils.load_llm(tommer.llm_name, cut_to_layer=tommer.layer)
tommer.to(llm.device)

#### Raw Inference
text = ["Large language models are awesome"]
print(f"Input text: {text[0]}")

# tokenize in shape (1, seq_len)
tokens = llm.tokenizer(text, return_tensors="pt")["input_ids"].to(llm.device)
# output raw span probabilities
output = tommer.forward(tokens, llm)  # (batch_size, seq_len, seq_len)
print(f"Raw Output shape: {output.shape}")

# use the chosen decoding strategy to infer entities
entities = tommer.infer_entities(tokens=tokens, model=llm, threshold=0.5, decoding_strategy="greedy")
str_entities = [llm.tokenizer.decode(tokens[0, b:e + 1]) for b, e in entities[0]]
print(f"Predicted entities: {str_entities}")

>>> Input text: Large language models are awesome
>>> Raw Output shape: torch.Size([1, 6, 6])
>>> Predicted entities: ['Large language models']
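If you want to work with the raw matrix directly rather than calling infer_entities, a minimal sketch follows. It assumes entry [0, start, end] holds the probability of the (inclusive) span start..end, consistent with the masked (batch, seq, seq) output described above; check the repository for the exact convention.

import torch

probs = output[0]                        # (seq_len, seq_len) span probabilities
starts, ends = torch.where(probs > 0.5)  # candidate spans above a 0.5 threshold
for b, e in zip(starts.tolist(), ends.tolist()):
    span = llm.tokenizer.decode(tokens[0, b:e + 1])
    print(f"({b}, {e}) p={probs[b, e]:.2f} -> {span!r}")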

Please visit the repository for more details and a demo notebook.

Evaluation Results

| Dataset | Precision | Recall | F1 | #Samples |
|---------|-----------|--------|----|----------|
| MultiNERD | 0.2029 | 0.9852 | 0.3364 | 154144 |
| CoNLL 2003 | 0.2823 | 0.9502 | 0.4353 | 16493 |
| CrossNER_politics | 0.2934 | 0.9661 | 0.4501 | 1389 |
| CrossNER_AI | 0.3115 | 0.9614 | 0.4706 | 879 |
| CrossNER_literature | 0.3505 | 0.9350 | 0.5098 | 916 |
| CrossNER_science | 0.3389 | 0.9622 | 0.5012 | 1193 |
| CrossNER_music | 0.3780 | 0.9332 | 0.5380 | 945 |
| ncbi | 0.1145 | 0.9124 | 0.2034 | 3952 |
| FabNER | 0.2886 | 0.7301 | 0.4136 | 13681 |
| WikiNeural | 0.1962 | 0.9874 | 0.3273 | 92672 |
| GENIA_NER | 0.2180 | 0.9517 | 0.3547 | 16563 |
| ACE 2005 | 0.2583 | 0.4032 | 0.3149 | 8230 |
| Ontonotes | 0.2397 | 0.7331 | 0.3613 | 42193 |
| Aggregated | 0.2155 | 0.9277 | 0.3497 | 353250 |
| Mean | 0.2671 | 0.8778 | 0.4013 | 353250 |
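As a quick sanity check, the F1 column is the usual harmonic mean of precision and recall; for example, recomputing the CoNLL 2003 row:

p, r = 0.2823, 0.9502      # CoNLL 2003 precision and recall from the table above
f1 = 2 * p * r / (p + r)   # harmonic mean of precision and recall
print(round(f1, 4))        # 0.4353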

Citation

If you use this model or the approach, please cite the associated paper:

@misc{morand2025tommerefficiententity,
      title={ToMMeR -- Efficient Entity Mention Detection from Large Language Models}, 
      author={Victor Morand and Nadi Tomeh and Josiane Mothe and Benjamin Piwowarski},
      year={2025},
      eprint={2510.19410},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.19410}, 
}

License

Apache-2.0 (see repository for full text).
