SemCSE-Multi-Invasion-Biology Model Card
SemCSE-Multi is a multifaceted embedding model that predicts multiple aspect-specific embeddings for a given scientific text. This version of the model targets the domain of invasion biology. It encodes six aspects: hypothesis, species, ecosystem, research question, methodology, and recommendation.
Each aspect-specific embedding can be used to evaluate the similarity of two studies with regard to that aspect alone. For details, please see our paper.
Model Details
Model Description
- Developed by: CLAUSE group at Bielefeld University
- Model type: DeBERTa
- Languages: English
- Finetuned from model: KISTI-AI/Scideberta-full with additional projection heads
Model Sources
- Repository: github.com/inas-argumentation/SemCSE-Multi
- Paper: https://arxiv.org/abs/2510.11599
How to Get Started with the Model
A minimal example of how to create embeddings with our model:
from transformers import AutoTokenizer, AutoModel
# Invasion biology model
model = AutoModel.from_pretrained("CLAUSE-Bielefeld/SemCSE-Multi-Invasion-Biology", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("CLAUSE-Bielefeld/SemCSE-Multi-Invasion-Biology")
text = "This is a scientific abstract from the domain of invasion biology."
batch = tokenizer([text], return_tensors='pt')
# Get the embedding for the "species" aspect. Other options are: "hypothesis", "ecosystem", "researchquestion", "methodology" and "recommendation".
output = model(**batch)["species"]
# The resulting embeddings can be used for similarity assessments using cosine similarity.
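The similarity step mentioned above can be sketched as follows. This is a dependency-free illustration rather than part of the released code: `cosine_similarity` and `aspect_similarities` are hypothetical helper names, and the toy vectors stand in for real model outputs. To use it with the model, we assume each aspect output is a tensor of shape (batch size, embedding dimension), so that something like `model(**batch)["species"][0].tolist()` yields one vector per text; that shape is an assumption and should be checked against the actual output.

```python
import math

# Aspect keys as listed in the example above.
ASPECTS = ["hypothesis", "species", "ecosystem",
           "researchquestion", "methodology", "recommendation"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def aspect_similarities(emb_a, emb_b, aspects=ASPECTS):
    """Compare two studies aspect by aspect.

    emb_a and emb_b map aspect names to vectors; only aspects present
    in both mappings are compared.
    """
    return {
        name: cosine_similarity(emb_a[name], emb_b[name])
        for name in aspects
        if name in emb_a and name in emb_b
    }


# Toy vectors (not real model outputs): the two studies are close on
# "species" but far apart on "ecosystem".
study_a = {"species": [0.9, 0.1, 0.0], "ecosystem": [0.2, 0.7, 0.1]}
study_b = {"species": [0.8, 0.2, 0.1], "ecosystem": [0.0, 0.1, 0.9]}
scores = aspect_similarities(study_a, study_b)
print(scores)
```

When working directly with tensors, `torch.nn.functional.cosine_similarity` computes the same quantity without converting to lists.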
Training Details
This model was trained on a dataset of summaries for approximately 37,000 scientific abstracts from the domain of invasion biology. We used a contrastive loss to encourage summaries of the same abstract to be placed close together in the embedding space. This is done for each aspect separately, and the individual models are then distilled into a single, multifaceted embedding model. The dataset and exact training procedure can be found in our GitHub repo.
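To illustrate the idea of pulling summaries of the same abstract together, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is not the training code: this card does not specify the exact loss, temperature, or negative sampling, so the function name `info_nce_loss`, the `temperature` value, and the use of in-batch negatives are all assumptions made for illustration. The exact procedure is in the GitHub repo.

```python
import math


def _normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def info_nce_loss(view_a, view_b, temperature=0.05):
    """Generic InfoNCE loss over a batch (illustrative assumption, see above).

    view_a[i] and view_b[i] are embeddings of two different summaries of
    abstract i. Every other entry of view_b acts as an in-batch negative.
    """
    a = [_normalize(v) for v in view_a]
    b = [_normalize(v) for v in view_b]
    total = 0.0
    for i, anchor in enumerate(a):
        # Temperature-scaled cosine similarity of the anchor to every candidate.
        logits = [sum(x * y for x, y in zip(anchor, other)) / temperature
                  for other in b]
        # Numerically stable log-sum-exp for the softmax denominator.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching summary (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(a)


# Toy batch: matched pairs give a much lower loss than mismatched pairs.
summaries = [[1.0, 0.0], [0.0, 1.0]]
matched = info_nce_loss(summaries, [[1.0, 0.0], [0.0, 1.0]])
mismatched = info_nce_loss(summaries, [[0.0, 1.0], [1.0, 0.0]])
print(matched < mismatched)
```

The loss is low when each summary is most similar to the other summary of the same abstract and high when it is closer to summaries of different abstracts, which is the behavior the paragraph above describes.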
Evaluation
Our model achieves state-of-the-art results on precise, aspect-specific similarity assessment. The evaluations are reported in our paper.
Citation
BibTeX:
@misc{brinner2025semcsemultimultifaceteddecodableembeddings,
title={SemCSE-Multi: Multifaceted and Decodable Embeddings for Aspect-Specific and Interpretable Scientific Domain Mapping},
author={Marc Brinner and Sina Zarrieß},
year={2025},
eprint={2510.11599},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2510.11599},
}