# Model Card for quinex-context-v0-77M
quinex-context-v0-77M is based on FLAN-T5 small, a pre-trained and instruction-finetuned encoder-decoder transformer model. We further fine-tuned this model to extract the measurement context of quantities in text (i.e., the entity and property being measured, as well as qualifiers). For more details, please refer to our paper "Quinex: Quantitative Information Extraction from Text using Open and Lightweight LLMs" (to appear).
## Usage
This model is intended for extracting the measurement context of quantities in text using multi-turn question-answering. To first identify the quantities, you can use our quantity identification models.
This model assumes the use of specific question templates, and that already extracted information is highlighted in the context. Inputs follow the format `question: {question} context: {highlighted_text}`. First, the measured property is extracted, then the entity, and then, in no particular order, further context such as the temporal scope, spatial scope, references, determination method, and other qualifiers. The quantity, property, and entity spans are highlighted in the context using `$...$`, `**...**`, and `[[...]]`, respectively.
### Example
Let's say we want to extract the property that is measured by the quantity "6 MW" in the sentence "The nominal power and rotor diameter of the Enercon E-175 EP5 are 6 MW and 175 m, respectively". The procedure would be as follows:
- First, the measured property is extracted using the following input:

  `question: Which property or quality is characterized by 6 MW? context: The nominal power and rotor diameter of the Enercon E-175 EP5 are $6 MW$ and 175 m, respectively.`

- Given the answer "nominal power", the entity is extracted using the following input:

  `question: Which entity's nominal power is characterized by 6 MW? context: The **nominal power** and rotor diameter of the Enercon E-175 EP5 are $6 MW$ and 175 m, respectively.`

  In case the property could not be extracted, a fallback question is used instead.

- Given the answer "Enercon E-175 EP5", further context can be extracted, e.g., the temporal scope using the following input:

  `question: For which point in time is the statement true that nominal power of Enercon E-175 EP5 is 6 MW? context: The **nominal power** and rotor diameter of the [[Enercon E-175 EP5]] are $6 MW$ and 175 m, respectively.`

  For each of the context types (temporal scope, spatial scope, references, determination method, and other qualifiers), there is also a fallback question in case either the property or entity could not be extracted.
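The turn-by-turn input construction above can be sketched in Python. The helper names below are our own for illustration, not from the Quinex codebase; only the resulting strings matter, and each one is fed to the model as a regular seq2seq input (e.g., via `transformers`).

```python
# Build the per-turn inputs for the example above. Helper names are
# illustrative (not from the Quinex codebase).

def highlight(context, span, left, right):
    """Wrap the first occurrence of `span` in the given highlight markers."""
    return context.replace(span, f"{left}{span}{right}", 1)

def format_input(question, context):
    """Serialize one turn in the expected 'question: ... context: ...' format."""
    return f"question: {question} context: {context}"

text = ("The nominal power and rotor diameter of the Enercon E-175 EP5 "
        "are 6 MW and 175 m, respectively.")

# Turn 1: highlight the quantity with $...$ and ask for the property.
ctx = highlight(text, "6 MW", "$", "$")
turn1 = format_input("Which property or quality is characterized by 6 MW?", ctx)

# Turn 2: given the answer "nominal power", highlight it with **...**.
ctx = highlight(ctx, "nominal power", "**", "**")
turn2 = format_input("Which entity's nominal power is characterized by 6 MW?", ctx)

# Turn 3: given the answer "Enercon E-175 EP5", highlight it with [[...]].
ctx = highlight(ctx, "Enercon E-175 EP5", "[[", "]]")
turn3 = format_input(
    "For which point in time is the statement true that "
    "nominal power of Enercon E-175 EP5 is 6 MW?", ctx)
```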
Full list of question templates:
- Property:
  - Default: `Which property or quality is characterized by {quantity_span}?`
- Entity:
  - Default: `Which entity's {property_span} {is_or_are} characterized by {quantity_span}?`
  - Fallback: `Which entity does {quantity_span} characterize?`
- Qualifiers:
  - Temporal scope:
    - Default: `For which point in time is the statement true that {property_span} of {entity_span} {is_or_are} {quantity_span}?`
    - Fallback: `For which point in time is the statement true that {entity_or_property_span} {is_or_are} {quantity_span}?`
  - Spatial scope:
    - Default: `For which location is the statement true that {property_span} of {entity_span} {is_or_are} {quantity_span}?`
    - Fallback: `For which location is the statement true that {entity_or_property_span} {is_or_are} {quantity_span}?`
  - Reference:
    - Default: `According to whom or which reference is the statement true that {property_span} of {entity_span} {is_or_are} {quantity_span}?`
    - Fallback: `According to whom or which reference is the statement true that {entity_or_property_span} {is_or_are} {quantity_span}?`
  - Method:
    - Default: `What methods and instruments were used in determining that {property_span} of {entity_span} {is_or_are} {quantity_span}?`
    - Fallback: `What methods and instruments were used in determining that {entity_or_property_span} {is_or_are} {quantity_span}?`
  - Other qualifiers:
    - Default: `Under which constraints is the statement true that {property_span} of {entity_span} {is_or_are} {quantity_span}?`
    - Fallback: `Under which constraints is the statement true that {entity_or_property_span} {is_or_are} {quantity_span}?`
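Choosing between a default and a fallback template can be sketched as follows. The dictionary, function name, and the number-agreement heuristic for `{is_or_are}` are our own assumptions for illustration (the card does not specify how `{is_or_are}` is resolved), not the Quinex implementation; only the temporal-scope templates are shown.

```python
# Illustrative template selection/filling for the temporal-scope qualifier.
# TEMPLATES keys, build_question, and the number-agreement heuristic are
# assumptions, not taken from the Quinex codebase.
TEMPLATES = {
    ("temporal_scope", "default"): (
        "For which point in time is the statement true that "
        "{property_span} of {entity_span} {is_or_are} {quantity_span}?"
    ),
    ("temporal_scope", "fallback"): (
        "For which point in time is the statement true that "
        "{entity_or_property_span} {is_or_are} {quantity_span}?"
    ),
}

def build_question(qualifier, quantity_span, property_span=None, entity_span=None):
    """Use the default template when both spans were extracted, else fall back."""
    subject = property_span or entity_span
    # Crude singular/plural agreement (assumption; the card leaves this open).
    is_or_are = "are" if subject is not None and subject.endswith("s") else "is"
    if property_span and entity_span:
        return TEMPLATES[(qualifier, "default")].format(
            property_span=property_span, entity_span=entity_span,
            is_or_are=is_or_are, quantity_span=quantity_span)
    # Fallback: use whichever of the two spans is available.
    return TEMPLATES[(qualifier, "fallback")].format(
        entity_or_property_span=subject,
        is_or_are=is_or_are, quantity_span=quantity_span)
```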
## Model details
- Base Model: FLAN-T5 small (`google/flan-t5-small`)
- Tokenizer: T5 tokenizer
- Parameters: 77M
## Fine-tuning data
The model was fine-tuned sequentially on all non-curated "silver" examples and all curated "gold" examples from a combination of datasets for measurement context extraction that includes:
- Wiki-Measurements
- MeasEval (relabeled)
- Materials Science Procedural Text Corpus (relabeled)
- MuLMS (relabeled)
- orkg-R0 (relabeled)
- PolyIE (relabeled)
- PHEE (relabeled)
- SuperMat (subset, relabeled)
- Quinex-Hydrogen-TechData
## Evaluation results
Evaluation results on the test set as described in the paper:
| Concept | Aggregation | F1 (SQuAD overlap) |
|---|---|---|
| Overall | Micro average | 83.32 |
| Entity and property | Macro average | 80.13 |
| Measured Property | | 81.68 |
| Measured Entity | | 78.58 |
| Qualifiers | Macro average | 85.41 |
| Spatial Scope | | 97.40 |
| Temporal Scope | | 92.52 |
| Method | | 86.65 |
| Reference | | 86.41 |
| Other Qualifiers | | 64.08 |
Note that the scores for the qualifier questions are higher because abstaining is often the correct answer for them.
## Citation
If you use this model in your research, please cite the following paper:
```bibtex
@article{quinex2025,
  title  = {{Quinex: Quantitative Information Extraction from Text using Open and Lightweight LLMs}},
  author = {Göpfert, Jan and Kuckertz, Patrick and Müller, Gian and Lütz, Luna and Körner, Celine and Khuat, Hang and Stolten, Detlef and Weinand, Jann M.},
  month  = oct,
  year   = {2025},
}
```
## Framework versions
- Transformers 4.36.2
- Pytorch 2.1.2
- Datasets 2.16.1
- Tokenizers 0.15.0