jakelever committed · Commit 2f30fe6 (verified) · 1 Parent(s): 8b7df4f

Upload folder using huggingface_hub

README.md ADDED
@@ -0,0 +1,158 @@
1
+ ---
2
+ task: token-classification
3
+ tags:
4
+ - biomedical
5
+ - bionlp
6
+ license: mit
7
+ base_model: microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext
8
+ ---
9
+
10
+ # bioner_medmentions_st21pv_finegrain
11
+
12
+ This is a named entity recognition model fine-tuned from the [microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext) model. It predicts spans with 91 possible labels. The labels are **Acquired Abnormality, Amino Acid Sequence, Amino Acid, Peptide, or Protein, Amphibian, Anatomical Abnormality, Anatomical Structure, Animal, Antibiotic, Bacterium, Biologic Function, Biologically Active Substance, Biomedical Occupation or Discipline, Biomedical or Dental Material, Bird, Body Location or Region, Body Part, Organ, or Organ Component, Body Space or Junction, Body Substance, Body System, Cell, Cell Component, Cell Function, Cell or Molecular Dysfunction, Chemical, Chemical Viewed Functionally, Chemical Viewed Structurally, Classification, Clinical Attribute, Congenital Abnormality, Diagnostic Procedure, Disease or Syndrome, Drug Delivery Device, Element, Ion, or Isotope, Embryonic Structure, Enzyme, Eukaryote, Experimental Model of Disease, Finding, Fish, Food, Fully Formed Anatomical Structure, Fungus, Gene or Genome, Genetic Function, Geographic Area, Hazardous or Poisonous Substance, Health Care Activity, Health Care Related Organization, Hormone, Human, Idea or Concept, Immunologic Factor, Indicator, Reagent, or Diagnostic Aid, Injury or Poisoning, Inorganic Chemical, Intellectual Product, Laboratory Procedure, Laboratory or Test Result, Mammal, Medical Device, Mental Process, Mental or Behavioral Dysfunction, Molecular Biology Research Technique, Molecular Function, Molecular Sequence, Neoplastic Process, Nucleic Acid, Nucleoside, or Nucleotide, Nucleotide Sequence, Organ or Tissue Function, Organic Chemical, Organism Function, Organization, Pathologic Function, Pharmacologic Substance, Physiologic Function, Plant, Population Group, Professional Society, Professional or Occupational Group, Receptor, Regulation or Law, Reptile, Research Activity, Self-help or Relief Organization, Sign or Symptom, Spatial Concept, Therapeutic or Preventive Procedure, Tissue, Vertebrate, Virus and Vitamin**.
13
+
14
+ The code used for training this model can be found at https://github.com/Glasgow-AI4BioMed/bioner along with links to other biomedical NER models trained on well-known biomedical corpora. The source dataset information is below.
15
+
16
+ ## Example Usage
17
+
18
+ The code below loads the model and applies it to the provided text. It uses the `max` aggregation strategy to merge individual tokens into larger multi-token entities where needed.
19
+
20
+ ```python
21
+ from transformers import pipeline
22
+
23
+ # Load the model as part of an NER pipeline
24
+ ner_pipeline = pipeline("token-classification",
25
+ model="Glasgow-AI4BioMed/bioner_medmentions_st21pv_finegrain",
26
+ aggregation_strategy="max")
27
+
28
+ # Apply it to some text
29
+ ner_pipeline("EGFR T790M mutations have been known to affect treatment outcomes for NSCLC patients receiving erlotinib.")
30
+
31
+ # Output:
32
+ # [ {"entity_group": "Gene or Genome", "score": 0.96229, "word": "egfr", "start": 0, "end": 4},
33
+ # {"entity_group": "Genetic Function", "score": 0.91988, "word": "t790m mutations", "start": 5, "end": 20},
34
+ # {"entity_group": "Neoplastic Process", "score": 0.99883, "word": "nsclc", "start": 51, "end": 56},
35
+ # {"entity_group": "Pharmacologic Substance", "score": 0.99931, "word": "erlotinib", "start": 76, "end": 85} ]
36
+ ```
37
+
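The pipeline returns a plain list of dictionaries, so downstream filtering does not need the model itself. The sketch below groups predictions by label and drops low-confidence ones. The helper name `group_entities` and the 0.95 threshold are illustrative assumptions, and the sample list is copied from the example output above rather than produced by a fresh run.

```python
from collections import defaultdict


def group_entities(predictions, min_score=0.0):
    """Group pipeline predictions by label, keeping those with score >= min_score."""
    grouped = defaultdict(list)
    for pred in predictions:
        if pred["score"] >= min_score:
            grouped[pred["entity_group"]].append(pred["word"])
    return dict(grouped)


# Sample predictions copied from the example output in the model card
predictions = [
    {"entity_group": "Gene or Genome", "score": 0.96229, "word": "egfr", "start": 0, "end": 4},
    {"entity_group": "Genetic Function", "score": 0.91988, "word": "t790m mutations", "start": 5, "end": 20},
    {"entity_group": "Neoplastic Process", "score": 0.99883, "word": "nsclc", "start": 51, "end": 56},
    {"entity_group": "Pharmacologic Substance", "score": 0.99931, "word": "erlotinib", "start": 76, "end": 85},
]

# Assumed threshold; tune it against the per-label precision reported for this model
confident = group_entities(predictions, min_score=0.95)
print(confident)
```

Because the base model is uncased, the `word` field is lower-cased; the `start` and `end` character offsets are the usual way to recover the original surface form from the input string.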
38
+ ## Dataset Info
39
+
40
+ **Source:** The ST21pv version of MedMentions was downloaded from: https://github.com/chanzuckerberg/MedMentions/tree/master/st21pv
41
+
42
+ The dataset should be cited with: Mohan, Sunil, and Donghui Li. "MedMentions: A Large Biomedical Corpus Annotated with UMLS Concepts." Automated Knowledge Base Construction (AKBC), 2019, https://openreview.net/forum?id=SylxCx5pTQ. DOI: [10.24432/C5G59C](https://doi.org/10.24432/C5G59C)
43
+
44
+ An overview of semantic types can be found at: https://www.nlm.nih.gov/research/umls/META3_current_semantic_types.html
45
+
46
+ **Preprocessing:** The training, validation and test splits were maintained from the original dataset. Concept identifiers (CUIs) were used to map each annotation to its associated UMLS entry to recover semantic types (from the MRSTY.RRF UMLS file). Semantic types provided in MedMentions were not used. Annotations were mapped to specific *semantic type* names using the Semantic Groups file available at: https://www.nlm.nih.gov/research/umls/knowledge_sources/semantic_network/index.html. This contrasts with the coarse-grained version, which mapped annotations to *semantic groups*. The preprocessing script for this dataset is [prepare_medmentions.py](https://github.com/Glasgow-AI4BioMed/bioner/blob/main/prepare_medmentions.py.py) with the `--finegrain` flag.
47
+
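The CUI-to-semantic-type lookup described in the preprocessing step can be sketched as follows. This is not the code from `prepare_medmentions.py`; it only assumes the standard pipe-delimited MRSTY.RRF layout (`CUI|TUI|STN|STY|ATUI|CVF|`). The function name `load_semantic_types` and the sample rows, including their CUI and ATUI identifiers, are made up for illustration.

```python
from collections import defaultdict


def load_semantic_types(mrsty_lines):
    """Map each CUI to the set of semantic type names (STY column) listed for it."""
    cui_to_types = defaultdict(set)
    for line in mrsty_lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("|")
        cui, sty = fields[0], fields[3]  # columns: CUI, TUI, STN, STY, ATUI, CVF
        cui_to_types[cui].add(sty)
    return dict(cui_to_types)


# Placeholder rows in the MRSTY.RRF layout (identifiers are not real UMLS entries)
sample = """\
C9000001|T116|A1.4.1.2.1.7|Amino Acid, Peptide, or Protein|AT90000001||
C9000001|T121|A1.4.1.1.1|Pharmacologic Substance|AT90000002||
C9000002|T047|B2.2.1.2.1|Disease or Syndrome|AT90000003||
"""

cui_to_types = load_semantic_types(sample.splitlines())
print(cui_to_types["C9000002"])
```

A concept can carry more than one semantic type, which is why the values are sets; how the actual script resolves such cases into a single label is not stated here.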
48
+ ## Performance
49
+
50
+ The span-level performance on the test split for each label is shown in the table below. The full performance results are available in the model repo in Markdown format for viewing and JSON format for easier loading. These include the performance at token level (with individual B- and I- labels, as the token classifier uses IOB2 token labelling).
51
+
52
+ | Label | Precision | Recall | F1-score | Support |
53
+ | --- | --- | --- | --- | --- |
54
+ | Acquired Abnormality | 0.273 | 0.240 | 0.255 | 50 |
55
+ | Amino Acid Sequence | 0.303 | 0.357 | 0.328 | 84 |
56
+ | Amino Acid, Peptide, or Protein | 0.258 | 0.277 | 0.267 | 166 |
57
+ | Anatomical Abnormality | 0.190 | 0.145 | 0.164 | 76 |
58
+ | Anatomical Structure | 0.160 | 0.308 | 0.211 | 13 |
59
+ | Animal | 0.605 | 0.742 | 0.667 | 93 |
60
+ | Antibiotic | 0.835 | 0.784 | 0.808 | 148 |
61
+ | Bacterium | 0.752 | 0.732 | 0.742 | 448 |
62
+ | Biologic Function | 0.341 | 0.369 | 0.355 | 157 |
63
+ | Biologically Active Substance | 0.586 | 0.640 | 0.612 | 2080 |
64
+ | Biomedical Occupation or Discipline | 0.437 | 0.444 | 0.441 | 196 |
65
+ | Biomedical or Dental Material | 0.369 | 0.452 | 0.406 | 197 |
66
+ | Bird | 0.810 | 0.819 | 0.814 | 83 |
67
+ | Body Location or Region | 0.386 | 0.461 | 0.420 | 232 |
68
+ | Body Part, Organ, or Organ Component | 0.565 | 0.604 | 0.584 | 1092 |
69
+ | Body Space or Junction | 0.272 | 0.311 | 0.290 | 90 |
70
+ | Body Substance | 0.542 | 0.731 | 0.622 | 212 |
71
+ | Body System | 0.554 | 0.511 | 0.532 | 90 |
72
+ | Cell | 0.680 | 0.737 | 0.708 | 924 |
73
+ | Cell Component | 0.589 | 0.637 | 0.612 | 311 |
74
+ | Cell Function | 0.484 | 0.599 | 0.535 | 499 |
75
+ | Cell or Molecular Dysfunction | 0.607 | 0.657 | 0.631 | 99 |
76
+ | Chemical | 0.348 | 0.333 | 0.340 | 72 |
77
+ | Chemical Viewed Functionally | 0.286 | 0.432 | 0.344 | 37 |
78
+ | Chemical Viewed Structurally | 0.443 | 0.427 | 0.435 | 82 |
79
+ | Classification | 0.520 | 0.544 | 0.532 | 309 |
80
+ | Clinical Attribute | 0.596 | 0.625 | 0.610 | 323 |
81
+ | Congenital Abnormality | 0.438 | 0.443 | 0.440 | 79 |
82
+ | Diagnostic Procedure | 0.672 | 0.648 | 0.660 | 735 |
83
+ | Disease or Syndrome | 0.757 | 0.774 | 0.766 | 2199 |
84
+ | Element, Ion, or Isotope | 0.713 | 0.657 | 0.684 | 385 |
85
+ | Embryonic Structure | 0.587 | 0.509 | 0.545 | 53 |
86
+ | Enzyme | 0.766 | 0.761 | 0.763 | 681 |
87
+ | Eukaryote | 0.745 | 0.793 | 0.768 | 397 |
88
+ | Experimental Model of Disease | 0.286 | 0.356 | 0.317 | 45 |
89
+ | Finding | 0.391 | 0.388 | 0.389 | 2759 |
90
+ | Fish | 1.000 | 0.947 | 0.973 | 19 |
91
+ | Food | 0.556 | 0.455 | 0.501 | 336 |
92
+ | Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 1 |
93
+ | Fungus | 0.819 | 0.798 | 0.809 | 119 |
94
+ | Gene or Genome | 0.551 | 0.539 | 0.545 | 912 |
95
+ | Genetic Function | 0.598 | 0.646 | 0.621 | 652 |
96
+ | Geographic Area | 0.673 | 0.712 | 0.692 | 598 |
97
+ | Hazardous or Poisonous Substance | 0.513 | 0.522 | 0.518 | 293 |
98
+ | Health Care Activity | 0.487 | 0.458 | 0.472 | 1061 |
99
+ | Health Care Related Organization | 0.531 | 0.642 | 0.581 | 296 |
100
+ | Hormone | 0.806 | 0.746 | 0.775 | 189 |
101
+ | Human | 0.799 | 0.880 | 0.837 | 158 |
102
+ | Idea or Concept | 0.000 | 0.000 | 0.000 | 1 |
103
+ | Immunologic Factor | 0.674 | 0.606 | 0.638 | 434 |
104
+ | Indicator, Reagent, or Diagnostic Aid | 0.427 | 0.451 | 0.439 | 182 |
105
+ | Injury or Poisoning | 0.617 | 0.703 | 0.657 | 357 |
106
+ | Inorganic Chemical | 0.611 | 0.680 | 0.643 | 256 |
107
+ | Intellectual Product | 0.495 | 0.485 | 0.490 | 2075 |
108
+ | Laboratory Procedure | 0.445 | 0.452 | 0.448 | 908 |
109
+ | Laboratory or Test Result | 0.183 | 0.196 | 0.190 | 112 |
110
+ | Mammal | 0.778 | 0.838 | 0.807 | 456 |
111
+ | Medical Device | 0.434 | 0.437 | 0.435 | 355 |
112
+ | Mental Process | 0.546 | 0.546 | 0.546 | 740 |
113
+ | Mental or Behavioral Dysfunction | 0.710 | 0.774 | 0.741 | 518 |
114
+ | Molecular Biology Research Technique | 0.500 | 0.539 | 0.519 | 206 |
115
+ | Molecular Function | 0.504 | 0.555 | 0.528 | 719 |
116
+ | Molecular Sequence | 0.417 | 0.556 | 0.476 | 9 |
117
+ | Neoplastic Process | 0.761 | 0.745 | 0.753 | 918 |
118
+ | Nucleic Acid, Nucleoside, or Nucleotide | 0.331 | 0.450 | 0.381 | 109 |
119
+ | Nucleotide Sequence | 0.320 | 0.491 | 0.387 | 110 |
120
+ | Organ or Tissue Function | 0.482 | 0.425 | 0.452 | 247 |
121
+ | Organic Chemical | 0.396 | 0.464 | 0.427 | 511 |
122
+ | Organism Function | 0.462 | 0.518 | 0.488 | 471 |
123
+ | Organization | 0.270 | 0.442 | 0.335 | 77 |
124
+ | Pathologic Function | 0.541 | 0.541 | 0.541 | 669 |
125
+ | Pharmacologic Substance | 0.577 | 0.623 | 0.599 | 1258 |
126
+ | Physiologic Function | 0.283 | 0.286 | 0.284 | 182 |
127
+ | Plant | 0.603 | 0.618 | 0.610 | 403 |
128
+ | Population Group | 0.710 | 0.711 | 0.711 | 1263 |
129
+ | Professional Society | 0.000 | 0.000 | 0.000 | 7 |
130
+ | Professional or Occupational Group | 0.599 | 0.725 | 0.656 | 360 |
131
+ | Receptor | 0.614 | 0.686 | 0.648 | 271 |
132
+ | Regulation or Law | 0.182 | 0.125 | 0.148 | 16 |
133
+ | Reptile | 1.000 | 0.318 | 0.483 | 22 |
134
+ | Research Activity | 0.559 | 0.540 | 0.549 | 1653 |
135
+ | Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 2 |
136
+ | Sign or Symptom | 0.629 | 0.647 | 0.638 | 340 |
137
+ | Spatial Concept | 0.474 | 0.483 | 0.479 | 1282 |
138
+ | Therapeutic or Preventive Procedure | 0.609 | 0.617 | 0.613 | 2036 |
139
+ | Tissue | 0.562 | 0.525 | 0.543 | 259 |
140
+ | Vertebrate | 0.000 | 0.000 | 0.000 | 1 |
141
+ | Virus | 0.678 | 0.831 | 0.747 | 172 |
142
+ | Vitamin | 0.706 | 0.511 | 0.593 | 47 |
143
+ | macro avg | 0.507 | 0.525 | 0.512 | 40144 |
144
+ | weighted avg | 0.571 | 0.589 | 0.578 | 40144 |
145
+
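The two summary rows differ in how they weight labels. As a rough sketch, assuming the usual definitions (macro average is the unweighted mean over labels, weighted average weights each label by its support), the calculation looks like this. The function name is hypothetical and only three rows from the table are used, so the result does not reproduce the reported totals.

```python
def macro_and_weighted_f1(rows):
    """rows: iterable of (f1, support) pairs. Returns (macro_f1, weighted_f1)."""
    rows = list(rows)
    macro = sum(f1 for f1, _ in rows) / len(rows)
    total_support = sum(support for _, support in rows)
    weighted = sum(f1 * support for f1, support in rows) / total_support
    return macro, weighted


# (F1-score, Support) for Fish, Finding, and Disease or Syndrome, taken from the table
rows = [(0.973, 19), (0.389, 2759), (0.766, 2199)]
macro, weighted = macro_and_weighted_f1(rows)
print(round(macro, 3), round(weighted, 3))
```

Under the macro average, labels with a support of 1 or 2 and an F1 of 0.000 count as much as Finding with 2759 spans, which helps explain why the macro figures in the table sit below the weighted ones.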
146
+
147
+ ## Hyperparameters
148
+
149
+ Hyperparameter tuning was done with [optuna](https://optuna.org/) and the [hyperparameter_search](https://huggingface.co/docs/transformers/en/hpo_train) functionality. 100 trials were run. Early stopping was applied during training. The best performing model was selected using the macro F1 performance on the validation set. The selected hyperparameters are in the table below.
150
+
151
+ | Hyperparameter | Value |
152
+ |----------------|-------|
153
+ | epochs | 25.0 |
154
+ | learning_rate | 7.86794379743531e-05 |
155
+ | per_device_train_batch_size | 16 |
156
+ | weight_decay | 0.06816454557507429 |
157
+ | warmup_ratio | 0.07903396276412193 |
158
+
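A search space over the five tuned hyperparameters might look like the sketch below. The ranges, the batch-size choices and the `_StubTrial` class are assumptions made for illustration; the actual search space used for this model is defined in the training repository. The keys follow `TrainingArguments` field names, and the `suggest_*` calls follow the optuna trial interface.

```python
def hp_space(trial):
    """Hypothetical search space; ranges are assumed, not taken from the training code."""
    return {
        "num_train_epochs": trial.suggest_int("num_train_epochs", 1, 30),
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [8, 16, 32]
        ),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "warmup_ratio": trial.suggest_float("warmup_ratio", 0.0, 0.1),
    }


class _StubTrial:
    """Stand-in for an optuna trial so the sketch runs without optuna installed."""

    def suggest_int(self, name, low, high):
        return low

    def suggest_float(self, name, low, high, log=False):
        return low

    def suggest_categorical(self, name, choices):
        return choices[0]


space = hp_space(_StubTrial())
print(sorted(space))
```

With a `Trainer` constructed using `model_init`, a function like this would typically be passed as `trainer.hyperparameter_search(hp_space=hp_space, backend="optuna", n_trials=100, direction="maximize")`, with the objective set to the validation macro F1.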
best_hyperparameters.json ADDED
@@ -0,0 +1,7 @@
1
+ {
2
+ "epochs": 25.0,
3
+ "learning_rate": 7.86794379743531e-05,
4
+ "per_device_train_batch_size": 16,
5
+ "weight_decay": 0.06816454557507429,
6
+ "warmup_ratio": 0.07903396276412193
7
+ }
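To reuse these values for retraining, the JSON can be turned into keyword arguments. The sketch below assumes the `epochs` key corresponds to the `num_train_epochs` field of `TrainingArguments`; that mapping is a guess based on the name, not something the file states.

```python
import json

# Contents of best_hyperparameters.json as shown above
best = json.loads("""
{
  "epochs": 25.0,
  "learning_rate": 7.86794379743531e-05,
  "per_device_train_batch_size": 16,
  "weight_decay": 0.06816454557507429,
  "warmup_ratio": 0.07903396276412193
}
""")

# Assumed rename: "epochs" -> "num_train_epochs"; the other keys already match field names
training_kwargs = dict(best)
training_kwargs["num_train_epochs"] = training_kwargs.pop("epochs")
print(training_kwargs)
```

The resulting dictionary could then be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`.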
config.json ADDED
@@ -0,0 +1,210 @@
1
+ {
2
+ "_name_or_path": "microsoft/BiomedNLP-BiomedBERT-base-uncased-abstract-fulltext",
3
+ "architectures": [
4
+ "BertForTokenClassification"
5
+ ],
6
+ "attention_probs_dropout_prob": 0.1,
7
+ "classifier_dropout": null,
8
+ "hidden_act": "gelu",
9
+ "hidden_dropout_prob": 0.1,
10
+ "hidden_size": 768,
11
+ "id2label": {
12
+ "0": "O",
13
+ "1": "B-Acquired Abnormality",
14
+ "2": "I-Acquired Abnormality",
15
+ "3": "B-Amino Acid Sequence",
16
+ "4": "I-Amino Acid Sequence",
17
+ "5": "B-Amino Acid, Peptide, or Protein",
18
+ "6": "I-Amino Acid, Peptide, or Protein",
19
+ "7": "B-Amphibian",
20
+ "8": "I-Amphibian",
21
+ "9": "B-Anatomical Abnormality",
22
+ "10": "I-Anatomical Abnormality",
23
+ "11": "B-Anatomical Structure",
24
+ "12": "I-Anatomical Structure",
25
+ "13": "B-Animal",
26
+ "14": "I-Animal",
27
+ "15": "B-Antibiotic",
28
+ "16": "I-Antibiotic",
29
+ "17": "B-Bacterium",
30
+ "18": "I-Bacterium",
31
+ "19": "B-Biologic Function",
32
+ "20": "I-Biologic Function",
33
+ "21": "B-Biologically Active Substance",
34
+ "22": "I-Biologically Active Substance",
35
+ "23": "B-Biomedical Occupation or Discipline",
36
+ "24": "I-Biomedical Occupation or Discipline",
37
+ "25": "B-Biomedical or Dental Material",
38
+ "26": "I-Biomedical or Dental Material",
39
+ "27": "B-Bird",
40
+ "28": "I-Bird",
41
+ "29": "B-Body Location or Region",
42
+ "30": "I-Body Location or Region",
43
+ "31": "B-Body Part, Organ, or Organ Component",
44
+ "32": "I-Body Part, Organ, or Organ Component",
45
+ "33": "B-Body Space or Junction",
46
+ "34": "I-Body Space or Junction",
47
+ "35": "B-Body Substance",
48
+ "36": "I-Body Substance",
49
+ "37": "B-Body System",
50
+ "38": "I-Body System",
51
+ "39": "B-Cell",
52
+ "40": "I-Cell",
53
+ "41": "B-Cell Component",
54
+ "42": "I-Cell Component",
55
+ "43": "B-Cell Function",
56
+ "44": "I-Cell Function",
57
+ "45": "B-Cell or Molecular Dysfunction",
58
+ "46": "I-Cell or Molecular Dysfunction",
59
+ "47": "B-Chemical",
60
+ "48": "I-Chemical",
61
+ "49": "B-Chemical Viewed Functionally",
62
+ "50": "I-Chemical Viewed Functionally",
63
+ "51": "B-Chemical Viewed Structurally",
64
+ "52": "I-Chemical Viewed Structurally",
65
+ "53": "B-Classification",
66
+ "54": "I-Classification",
67
+ "55": "B-Clinical Attribute",
68
+ "56": "I-Clinical Attribute",
69
+ "57": "B-Congenital Abnormality",
70
+ "58": "I-Congenital Abnormality",
71
+ "59": "B-Diagnostic Procedure",
72
+ "60": "I-Diagnostic Procedure",
73
+ "61": "B-Disease or Syndrome",
74
+ "62": "I-Disease or Syndrome",
75
+ "63": "B-Drug Delivery Device",
76
+ "64": "I-Drug Delivery Device",
77
+ "65": "B-Element, Ion, or Isotope",
78
+ "66": "I-Element, Ion, or Isotope",
79
+ "67": "B-Embryonic Structure",
80
+ "68": "I-Embryonic Structure",
81
+ "69": "B-Enzyme",
82
+ "70": "I-Enzyme",
83
+ "71": "B-Eukaryote",
84
+ "72": "I-Eukaryote",
85
+ "73": "B-Experimental Model of Disease",
86
+ "74": "I-Experimental Model of Disease",
87
+ "75": "B-Finding",
88
+ "76": "I-Finding",
89
+ "77": "B-Fish",
90
+ "78": "I-Fish",
91
+ "79": "B-Food",
92
+ "80": "I-Food",
93
+ "81": "B-Fully Formed Anatomical Structure",
94
+ "82": "I-Fully Formed Anatomical Structure",
95
+ "83": "B-Fungus",
96
+ "84": "I-Fungus",
97
+ "85": "B-Gene or Genome",
98
+ "86": "I-Gene or Genome",
99
+ "87": "B-Genetic Function",
100
+ "88": "I-Genetic Function",
101
+ "89": "B-Geographic Area",
102
+ "90": "I-Geographic Area",
103
+ "91": "B-Hazardous or Poisonous Substance",
104
+ "92": "I-Hazardous or Poisonous Substance",
105
+ "93": "B-Health Care Activity",
106
+ "94": "I-Health Care Activity",
107
+ "95": "B-Health Care Related Organization",
108
+ "96": "I-Health Care Related Organization",
109
+ "97": "B-Hormone",
110
+ "98": "I-Hormone",
111
+ "99": "B-Human",
112
+ "100": "I-Human",
113
+ "101": "B-Idea or Concept",
114
+ "102": "I-Idea or Concept",
115
+ "103": "B-Immunologic Factor",
116
+ "104": "I-Immunologic Factor",
117
+ "105": "B-Indicator, Reagent, or Diagnostic Aid",
118
+ "106": "I-Indicator, Reagent, or Diagnostic Aid",
119
+ "107": "B-Injury or Poisoning",
120
+ "108": "I-Injury or Poisoning",
121
+ "109": "B-Inorganic Chemical",
122
+ "110": "I-Inorganic Chemical",
123
+ "111": "B-Intellectual Product",
124
+ "112": "I-Intellectual Product",
125
+ "113": "B-Laboratory Procedure",
126
+ "114": "I-Laboratory Procedure",
127
+ "115": "B-Laboratory or Test Result",
128
+ "116": "I-Laboratory or Test Result",
129
+ "117": "B-Mammal",
130
+ "118": "I-Mammal",
131
+ "119": "B-Medical Device",
132
+ "120": "I-Medical Device",
133
+ "121": "B-Mental Process",
134
+ "122": "I-Mental Process",
135
+ "123": "B-Mental or Behavioral Dysfunction",
136
+ "124": "I-Mental or Behavioral Dysfunction",
137
+ "125": "B-Molecular Biology Research Technique",
138
+ "126": "I-Molecular Biology Research Technique",
139
+ "127": "B-Molecular Function",
140
+ "128": "I-Molecular Function",
141
+ "129": "B-Molecular Sequence",
142
+ "130": "I-Molecular Sequence",
143
+ "131": "B-Neoplastic Process",
144
+ "132": "I-Neoplastic Process",
145
+ "133": "B-Nucleic Acid, Nucleoside, or Nucleotide",
146
+ "134": "I-Nucleic Acid, Nucleoside, or Nucleotide",
147
+ "135": "B-Nucleotide Sequence",
148
+ "136": "I-Nucleotide Sequence",
149
+ "137": "B-Organ or Tissue Function",
150
+ "138": "I-Organ or Tissue Function",
151
+ "139": "B-Organic Chemical",
152
+ "140": "I-Organic Chemical",
153
+ "141": "B-Organism Function",
154
+ "142": "I-Organism Function",
155
+ "143": "B-Organization",
156
+ "144": "I-Organization",
157
+ "145": "B-Pathologic Function",
158
+ "146": "I-Pathologic Function",
159
+ "147": "B-Pharmacologic Substance",
160
+ "148": "I-Pharmacologic Substance",
161
+ "149": "B-Physiologic Function",
162
+ "150": "I-Physiologic Function",
163
+ "151": "B-Plant",
164
+ "152": "I-Plant",
165
+ "153": "B-Population Group",
166
+ "154": "I-Population Group",
167
+ "155": "B-Professional Society",
168
+ "156": "I-Professional Society",
169
+ "157": "B-Professional or Occupational Group",
170
+ "158": "I-Professional or Occupational Group",
171
+ "159": "B-Receptor",
172
+ "160": "I-Receptor",
173
+ "161": "B-Regulation or Law",
174
+ "162": "I-Regulation or Law",
175
+ "163": "B-Reptile",
176
+ "164": "I-Reptile",
177
+ "165": "B-Research Activity",
178
+ "166": "I-Research Activity",
179
+ "167": "B-Self-help or Relief Organization",
180
+ "168": "I-Self-help or Relief Organization",
181
+ "169": "B-Sign or Symptom",
182
+ "170": "I-Sign or Symptom",
183
+ "171": "B-Spatial Concept",
184
+ "172": "I-Spatial Concept",
185
+ "173": "B-Therapeutic or Preventive Procedure",
186
+ "174": "I-Therapeutic or Preventive Procedure",
187
+ "175": "B-Tissue",
188
+ "176": "I-Tissue",
189
+ "177": "B-Vertebrate",
190
+ "178": "I-Vertebrate",
191
+ "179": "B-Virus",
192
+ "180": "I-Virus",
193
+ "181": "B-Vitamin",
194
+ "182": "I-Vitamin"
195
+ },
196
+ "initializer_range": 0.02,
197
+ "intermediate_size": 3072,
198
+ "layer_norm_eps": 1e-12,
199
+ "max_position_embeddings": 512,
200
+ "model_type": "bert",
201
+ "num_attention_heads": 12,
202
+ "num_hidden_layers": 12,
203
+ "pad_token_id": 0,
204
+ "position_embedding_type": "absolute",
205
+ "torch_dtype": "float32",
206
+ "transformers_version": "4.48.0.dev0",
207
+ "type_vocab_size": 2,
208
+ "use_cache": true,
209
+ "vocab_size": 30522
210
+ }
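The `id2label` map uses IOB2 labelling: one `O` label plus a `B-` and an `I-` label per entity type, so 183 entries correspond to 91 types. A small sketch of recovering the entity types from such a map is shown below; the function name is hypothetical and the dictionary is a five-entry excerpt of the config above.

```python
def entity_types(id2label):
    """Return the sorted entity types behind an IOB2 label map (drops 'O' and B-/I- prefixes)."""
    return sorted({label[2:] for label in id2label.values() if label != "O"})


# Excerpt of the id2label map from config.json
id2label = {
    "0": "O",
    "1": "B-Acquired Abnormality",
    "2": "I-Acquired Abnormality",
    "3": "B-Amino Acid Sequence",
    "4": "I-Amino Acid Sequence",
}

types = entity_types(id2label)
print(types)

# A map with 1 + 2 * N entries yields N types
assert len(id2label) == 1 + 2 * len(types)
```

When the config is loaded through `transformers` (for example with `AutoConfig.from_pretrained`), the keys of `id2label` are integers rather than strings, which does not affect this function.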
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8699f32b06c3650dedba2d05fab150b749339f2eab1c7ae58189892f0ffd5462
3
+ size 436152844
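The three lines above are a Git LFS pointer rather than the weights themselves: they record the SHA-256 digest and byte size of the real file. A minimal sketch of parsing the pointer and checking a downloaded file against it follows; the function names and the local path in the usage note are illustrative assumptions.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Check that the file at `path` has the size and SHA-256 digest named in the pointer."""
    hasher = hashlib.sha256()
    size = 0
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer("""\
version https://git-lfs.github.com/spec/v1
oid sha256:8699f32b06c3650dedba2d05fab150b749339f2eab1c7ae58189892f0ffd5462
size 436152844
""")
print(pointer["algo"], pointer["size"])
```

After downloading the weights, `matches_pointer("model.safetensors", pointer)` would confirm the file is intact.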
performance_report.json ADDED
The diff for this file is too large to render. See raw diff
 
performance_report.md ADDED
@@ -0,0 +1,868 @@
1
+ # Performance on Training Set
2
+
3
+ ## Span Level
4
+
5
+ | Label | Precision | Recall | F1-score | Support |
6
+ | --- | --- | --- | --- | --- |
7
+ | Acquired Abnormality | 1.000 | 1.000 | 1.000 | 105 |
8
+ | Amino Acid Sequence | 0.995 | 0.986 | 0.990 | 207 |
9
+ | Amino Acid, Peptide, or Protein | 0.994 | 0.971 | 0.982 | 659 |
10
+ | Amphibian | 0.952 | 1.000 | 0.976 | 20 |
11
+ | Anatomical Abnormality | 0.993 | 0.951 | 0.971 | 143 |
12
+ | Anatomical Structure | 0.952 | 0.952 | 0.952 | 83 |
13
+ | Animal | 1.000 | 0.996 | 0.998 | 238 |
14
+ | Antibiotic | 0.997 | 0.997 | 0.997 | 364 |
15
+ | Bacterium | 0.988 | 0.964 | 0.976 | 1125 |
16
+ | Biologic Function | 0.992 | 0.989 | 0.990 | 622 |
17
+ | Biologically Active Substance | 0.993 | 0.987 | 0.990 | 5343 |
18
+ | Biomedical Occupation or Discipline | 0.996 | 0.985 | 0.990 | 522 |
19
+ | Biomedical or Dental Material | 0.995 | 0.994 | 0.995 | 845 |
20
+ | Bird | 0.978 | 0.938 | 0.958 | 242 |
21
+ | Body Location or Region | 0.993 | 0.986 | 0.989 | 702 |
22
+ | Body Part, Organ, or Organ Component | 0.996 | 0.991 | 0.993 | 3887 |
23
+ | Body Space or Junction | 0.997 | 0.989 | 0.993 | 356 |
24
+ | Body Substance | 0.996 | 0.994 | 0.995 | 774 |
25
+ | Body System | 0.977 | 0.973 | 0.975 | 300 |
26
+ | Cell | 0.993 | 0.992 | 0.993 | 3312 |
27
+ | Cell Component | 0.998 | 0.998 | 0.998 | 929 |
28
+ | Cell Function | 0.996 | 0.994 | 0.995 | 1941 |
29
+ | Cell or Molecular Dysfunction | 0.994 | 0.997 | 0.996 | 350 |
30
+ | Chemical | 0.994 | 0.987 | 0.990 | 156 |
31
+ | Chemical Viewed Functionally | 0.994 | 0.994 | 0.994 | 176 |
32
+ | Chemical Viewed Structurally | 0.993 | 0.986 | 0.989 | 284 |
33
+ | Classification | 0.995 | 0.988 | 0.991 | 738 |
34
+ | Clinical Attribute | 0.991 | 0.980 | 0.986 | 1046 |
35
+ | Congenital Abnormality | 0.967 | 0.989 | 0.978 | 177 |
36
+ | Diagnostic Procedure | 0.995 | 0.987 | 0.991 | 2426 |
37
+ | Disease or Syndrome | 0.998 | 0.988 | 0.993 | 6911 |
38
+ | Drug Delivery Device | 1.000 | 0.846 | 0.917 | 13 |
39
+ | Element, Ion, or Isotope | 0.998 | 0.982 | 0.990 | 934 |
40
+ | Embryonic Structure | 1.000 | 1.000 | 1.000 | 174 |
41
+ | Enzyme | 0.991 | 0.982 | 0.986 | 2192 |
42
+ | Eukaryote | 0.979 | 0.953 | 0.966 | 989 |
43
+ | Experimental Model of Disease | 1.000 | 1.000 | 1.000 | 189 |
44
+ | Finding | 0.992 | 0.986 | 0.989 | 8218 |
45
+ | Fish | 0.974 | 0.955 | 0.964 | 155 |
46
+ | Food | 0.995 | 0.972 | 0.983 | 851 |
47
+ | Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 1 |
48
+ | Fungus | 0.979 | 0.946 | 0.963 | 353 |
49
+ | Gene or Genome | 0.993 | 0.988 | 0.991 | 2776 |
50
+ | Genetic Function | 0.993 | 0.997 | 0.995 | 2000 |
51
+ | Geographic Area | 0.995 | 0.991 | 0.993 | 1712 |
52
+ | Hazardous or Poisonous Substance | 0.993 | 0.981 | 0.987 | 702 |
53
+ | Health Care Activity | 0.994 | 0.986 | 0.990 | 2858 |
54
+ | Health Care Related Organization | 0.995 | 0.988 | 0.992 | 1041 |
55
+ | Hormone | 1.000 | 0.992 | 0.996 | 504 |
56
+ | Human | 1.000 | 0.998 | 0.999 | 437 |
57
+ | Immunologic Factor | 0.996 | 0.992 | 0.994 | 1722 |
58
+ | Indicator, Reagent, or Diagnostic Aid | 0.991 | 0.986 | 0.988 | 649 |
59
+ | Injury or Poisoning | 0.992 | 0.992 | 0.992 | 1062 |
60
+ | Inorganic Chemical | 0.995 | 0.996 | 0.995 | 765 |
61
+ | Intellectual Product | 0.996 | 0.990 | 0.993 | 5358 |
62
+ | Laboratory Procedure | 0.987 | 0.980 | 0.984 | 2761 |
63
+ | Laboratory or Test Result | 0.986 | 0.982 | 0.984 | 509 |
64
+ | Mammal | 0.996 | 0.987 | 0.992 | 1350 |
65
+ | Medical Device | 0.994 | 0.989 | 0.991 | 1157 |
66
+ | Mental Process | 0.994 | 0.990 | 0.992 | 1431 |
67
+ | Mental or Behavioral Dysfunction | 0.997 | 0.997 | 0.997 | 1099 |
68
+ | Molecular Biology Research Technique | 0.991 | 0.986 | 0.989 | 654 |
69
+ | Molecular Function | 0.994 | 0.994 | 0.994 | 2007 |
70
+ | Molecular Sequence | 0.974 | 0.974 | 0.974 | 39 |
71
+ | Neoplastic Process | 0.996 | 0.987 | 0.991 | 2924 |
72
+ | Nucleic Acid, Nucleoside, or Nucleotide | 0.998 | 0.992 | 0.995 | 508 |
73
+ | Nucleotide Sequence | 0.986 | 0.975 | 0.980 | 360 |
74
+ | Organ or Tissue Function | 0.992 | 0.988 | 0.990 | 904 |
75
+ | Organic Chemical | 0.994 | 0.988 | 0.991 | 1411 |
76
+ | Organism Function | 0.994 | 0.987 | 0.990 | 1639 |
77
+ | Organization | 0.983 | 0.992 | 0.988 | 240 |
78
+ | Pathologic Function | 0.999 | 0.991 | 0.995 | 2325 |
79
+ | Pharmacologic Substance | 0.997 | 0.986 | 0.991 | 4127 |
80
+ | Physiologic Function | 0.992 | 0.971 | 0.981 | 758 |
81
+ | Plant | 0.982 | 0.969 | 0.975 | 1113 |
82
+ | Population Group | 0.998 | 0.990 | 0.994 | 3580 |
83
+ | Professional Society | 1.000 | 0.957 | 0.978 | 23 |
84
+ | Professional or Occupational Group | 0.998 | 0.991 | 0.995 | 1053 |
85
+ | Receptor | 0.993 | 0.990 | 0.992 | 829 |
86
+ | Regulation or Law | 1.000 | 0.961 | 0.980 | 51 |
87
+ | Reptile | 0.941 | 0.889 | 0.914 | 18 |
88
+ | Research Activity | 0.996 | 0.989 | 0.992 | 4787 |
89
+ | Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 4 |
90
+ | Sign or Symptom | 0.997 | 0.997 | 0.997 | 1131 |
91
+ | Spatial Concept | 0.995 | 0.989 | 0.992 | 4126 |
92
+ | Therapeutic or Preventive Procedure | 0.994 | 0.986 | 0.990 | 6659 |
93
+ | Tissue | 0.991 | 0.983 | 0.987 | 991 |
94
+ | Vertebrate | 0.923 | 0.923 | 0.923 | 13 |
95
+ | Virus | 0.991 | 0.980 | 0.986 | 702 |
96
+ | Vitamin | 1.000 | 0.954 | 0.977 | 262 |
97
+ | macro avg | 0.969 | 0.959 | 0.964 | 122183 |
98
+ | weighted avg | 0.994 | 0.987 | 0.990 | 122183 |
99
+
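Span-level scores are stricter than token-level ones because a predicted span only counts when its label and both boundaries match exactly. The sketch below illustrates that with IOB2 tags. It is an assumed, simplified scorer and not the evaluation code used for this report; in particular, stray `I-` tags without a matching `B-` are ignored here, which evaluation libraries may handle differently.

```python
def iob2_to_spans(tags):
    """Convert IOB2 tags into a set of (label, start, end) spans, end exclusive."""
    spans = set()
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing sentinel flushes the last span
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.add((label, start, i))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def span_prf(gold_tags, pred_tags):
    """Exact-match span precision, recall and F1 for one tag sequence."""
    gold, pred = iob2_to_spans(gold_tags), iob2_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


gold = ["B-Gene or Genome", "O", "B-Disease or Syndrome", "I-Disease or Syndrome"]
pred = ["B-Gene or Genome", "O", "B-Disease or Syndrome", "O"]
scores = span_prf(gold, pred)
print(scores)
```

In this example three of the four tokens are tagged correctly, yet only one of the two gold spans is recovered, so the span-level scores come out lower than the token-level accuracy.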
100
+ ## Token Level
101
+
102
+ | Label | Precision | Recall | F1-score | Support |
103
+ | --- | --- | --- | --- | --- |
104
+ | O | 1.000 | 1.000 | 1.000 | 557052 |
105
+ | B-Acquired Abnormality | 1.000 | 1.000 | 1.000 | 105 |
106
+ | I-Acquired Abnormality | 1.000 | 1.000 | 1.000 | 138 |
107
+ | B-Amino Acid Sequence | 1.000 | 1.000 | 1.000 | 206 |
108
+ | I-Amino Acid Sequence | 1.000 | 1.000 | 1.000 | 345 |
109
+ | B-Amino Acid, Peptide, or Protein | 0.997 | 0.991 | 0.994 | 649 |
110
+ | I-Amino Acid, Peptide, or Protein | 1.000 | 0.997 | 0.998 | 968 |
111
+ | B-Amphibian | 1.000 | 1.000 | 1.000 | 20 |
112
+ | I-Amphibian | 0.906 | 1.000 | 0.951 | 29 |
113
+ | B-Anatomical Abnormality | 1.000 | 0.951 | 0.975 | 143 |
114
+ | I-Anatomical Abnormality | 1.000 | 0.985 | 0.992 | 201 |
115
+ | B-Anatomical Structure | 0.976 | 0.976 | 0.976 | 83 |
116
+ | I-Anatomical Structure | 1.000 | 0.972 | 0.986 | 71 |
117
+ | B-Animal | 1.000 | 0.996 | 0.998 | 238 |
118
+ | I-Animal | 1.000 | 1.000 | 1.000 | 95 |
119
+ | B-Antibiotic | 0.997 | 1.000 | 0.999 | 363 |
120
+ | I-Antibiotic | 1.000 | 1.000 | 1.000 | 294 |
121
+ | B-Bacterium | 1.000 | 1.000 | 1.000 | 1111 |
122
+ | I-Bacterium | 1.000 | 1.000 | 1.000 | 2449 |
123
+ | B-Biologic Function | 0.995 | 0.997 | 0.996 | 619 |
124
+ | I-Biologic Function | 0.997 | 0.997 | 0.997 | 305 |
125
+ | B-Biologically Active Substance | 0.999 | 0.998 | 0.998 | 5337 |
126
+ | I-Biologically Active Substance | 0.999 | 1.000 | 0.999 | 6779 |
127
+ | B-Biomedical Occupation or Discipline | 1.000 | 0.990 | 0.995 | 518 |
128
+ | I-Biomedical Occupation or Discipline | 0.990 | 1.000 | 0.995 | 393 |
129
+ | B-Biomedical or Dental Material | 0.995 | 0.998 | 0.996 | 842 |
130
+ | I-Biomedical or Dental Material | 1.000 | 1.000 | 1.000 | 1080 |
131
+ | B-Bird | 0.987 | 1.000 | 0.994 | 232 |
132
+ | I-Bird | 0.997 | 1.000 | 0.999 | 338 |
133
+ | B-Body Location or Region | 0.999 | 0.997 | 0.998 | 697 |
134
+ | I-Body Location or Region | 0.997 | 1.000 | 0.998 | 643 |
135
+ | B-Body Part, Organ, or Organ Component | 0.999 | 0.999 | 0.999 | 3868 |
136
+ | I-Body Part, Organ, or Organ Component | 1.000 | 1.000 | 1.000 | 3015 |
137
+ | B-Body Space or Junction | 0.997 | 0.997 | 0.997 | 353 |
138
+ | I-Body Space or Junction | 1.000 | 1.000 | 1.000 | 425 |
139
+ | B-Body Substance | 0.996 | 0.999 | 0.997 | 770 |
140
+ | I-Body Substance | 1.000 | 1.000 | 1.000 | 378 |
141
+ | B-Body System | 0.997 | 0.997 | 0.997 | 298 |
142
+ | I-Body System | 0.993 | 0.997 | 0.995 | 297 |
143
+ | B-Cell | 0.998 | 0.999 | 0.999 | 3301 |
144
+ | I-Cell | 0.999 | 1.000 | 0.999 | 3622 |
145
+ | B-Cell Component | 0.999 | 0.999 | 0.999 | 929 |
146
+ | I-Cell Component | 1.000 | 0.999 | 0.999 | 825 |
147
+ | B-Cell Function | 0.996 | 0.996 | 0.996 | 1937 |
148
+ | I-Cell Function | 0.999 | 0.999 | 0.999 | 1716 |
149
+ | B-Cell or Molecular Dysfunction | 0.997 | 1.000 | 0.999 | 350 |
150
+ | I-Cell or Molecular Dysfunction | 1.000 | 0.998 | 0.999 | 423 |
151
+ | B-Chemical | 0.994 | 1.000 | 0.997 | 154 |
152
+ | I-Chemical | 1.000 | 1.000 | 1.000 | 22 |
153
+ | B-Chemical Viewed Functionally | 0.994 | 0.994 | 0.994 | 176 |
154
+ | I-Chemical Viewed Functionally | 1.000 | 1.000 | 1.000 | 98 |
155
+ | B-Chemical Viewed Structurally | 0.996 | 1.000 | 0.998 | 281 |
156
+ | I-Chemical Viewed Structurally | 0.996 | 1.000 | 0.998 | 282 |
157
+ | B-Classification | 0.997 | 0.995 | 0.996 | 736 |
158
+ | I-Classification | 0.998 | 1.000 | 0.999 | 447 |
159
+ | B-Clinical Attribute | 0.996 | 0.999 | 0.998 | 1030 |
160
+ | I-Clinical Attribute | 0.995 | 0.999 | 0.997 | 854 |
161
+ | B-Congenital Abnormality | 0.967 | 1.000 | 0.983 | 177 |
162
+ | I-Congenital Abnormality | 0.988 | 1.000 | 0.994 | 331 |
163
+ | B-Diagnostic Procedure | 0.999 | 0.997 | 0.998 | 2412 |
164
+ | I-Diagnostic Procedure | 0.998 | 1.000 | 0.999 | 3191 |
165
+ | B-Disease or Syndrome | 0.999 | 0.999 | 0.999 | 6843 |
166
+ | I-Disease or Syndrome | 0.999 | 1.000 | 1.000 | 7295 |
167
+ | B-Drug Delivery Device | 1.000 | 1.000 | 1.000 | 11 |
168
+ | I-Drug Delivery Device | 1.000 | 1.000 | 1.000 | 3 |
169
+ | B-Element, Ion, or Isotope | 1.000 | 0.995 | 0.997 | 924 |
170
+ | I-Element, Ion, or Isotope | 0.999 | 1.000 | 0.999 | 922 |
171
+ | B-Embryonic Structure | 1.000 | 1.000 | 1.000 | 174 |
172
+ | I-Embryonic Structure | 1.000 | 1.000 | 1.000 | 73 |
173
+ | B-Enzyme | 0.998 | 1.000 | 0.999 | 2182 |
174
+ | I-Enzyme | 1.000 | 1.000 | 1.000 | 3095 |
175
+ | B-Eukaryote | 1.000 | 0.998 | 0.999 | 985 |
176
+ | I-Eukaryote | 1.000 | 1.000 | 1.000 | 2332 |
177
+ | B-Experimental Model of Disease | 1.000 | 1.000 | 1.000 | 189 |
178
+ | I-Experimental Model of Disease | 1.000 | 1.000 | 1.000 | 213 |
179
+ | B-Finding | 0.997 | 0.998 | 0.997 | 8158 |
180
+ | I-Finding | 0.996 | 0.999 | 0.998 | 7036 |
181
+ | B-Fish | 0.994 | 1.000 | 0.997 | 154 |
182
+ | I-Fish | 1.000 | 1.000 | 1.000 | 364 |
183
+ | B-Food | 0.999 | 0.998 | 0.998 | 832 |
184
+ | I-Food | 0.997 | 0.999 | 0.998 | 705 |
185
+ | B-Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 1 |
186
+ | I-Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 0 |
187
+ | B-Fungus | 1.000 | 1.000 | 1.000 | 348 |
188
+ | I-Fungus | 1.000 | 1.000 | 1.000 | 748 |
189
+ | B-Gene or Genome | 0.998 | 0.999 | 0.998 | 2768 |
190
+ | I-Gene or Genome | 1.000 | 1.000 | 1.000 | 4238 |
191
+ | B-Genetic Function | 0.995 | 0.999 | 0.997 | 1999 |
192
+ | I-Genetic Function | 0.999 | 1.000 | 1.000 | 1245 |
193
+ | B-Geographic Area | 0.998 | 1.000 | 0.999 | 1702 |
194
+ | I-Geographic Area | 0.996 | 0.998 | 0.997 | 1370 |
195
+ | B-Hazardous or Poisonous Substance | 0.997 | 0.997 | 0.997 | 697 |
196
+ | I-Hazardous or Poisonous Substance | 0.999 | 1.000 | 0.999 | 882 |
197
+ | B-Health Care Activity | 0.997 | 0.997 | 0.997 | 2833 |
198
+ | I-Health Care Activity | 0.998 | 0.999 | 0.999 | 2092 |
199
+ | B-Health Care Related Organization | 0.998 | 0.999 | 0.999 | 1031 |
200
+ | I-Health Care Related Organization | 0.999 | 1.000 | 0.999 | 1447 |
201
+ | B-Hormone | 1.000 | 1.000 | 1.000 | 500 |
202
+ | I-Hormone | 1.000 | 1.000 | 1.000 | 454 |
203
+ | B-Human | 1.000 | 1.000 | 1.000 | 436 |
204
+ | I-Human | 1.000 | 1.000 | 1.000 | 75 |
205
+ | B-Idea or Concept | 0.000 | 0.000 | 0.000 | 0 |
206
+ | I-Idea or Concept | 0.000 | 0.000 | 0.000 | 0 |
207
+ | B-Immunologic Factor | 0.997 | 0.998 | 0.997 | 1714 |
208
+ | I-Immunologic Factor | 1.000 | 1.000 | 1.000 | 2675 |
209
+ | B-Indicator, Reagent, or Diagnostic Aid | 0.995 | 0.998 | 0.997 | 644 |
210
+ | I-Indicator, Reagent, or Diagnostic Aid | 1.000 | 1.000 | 1.000 | 952 |
211
+ | B-Injury or Poisoning | 0.994 | 1.000 | 0.997 | 1056 |
212
+ | I-Injury or Poisoning | 0.998 | 1.000 | 0.999 | 1028 |
213
+ | B-Inorganic Chemical | 0.996 | 1.000 | 0.998 | 763 |
214
+ | I-Inorganic Chemical | 0.998 | 1.000 | 0.999 | 588 |
215
+ | B-Intellectual Product | 0.998 | 0.997 | 0.998 | 5336 |
216
+ | I-Intellectual Product | 0.999 | 0.999 | 0.999 | 6776 |
217
+ | B-Laboratory Procedure | 0.996 | 0.993 | 0.995 | 2745 |
218
+ | I-Laboratory Procedure | 0.999 | 1.000 | 0.999 | 4182 |
219
+ | B-Laboratory or Test Result | 0.994 | 0.992 | 0.993 | 505 |
220
+ | I-Laboratory or Test Result | 0.997 | 0.999 | 0.998 | 684 |
221
+ | B-Mammal | 0.999 | 0.999 | 0.999 | 1338 |
222
+ | I-Mammal | 0.999 | 0.999 | 0.999 | 1075 |
223
+ | B-Medical Device | 1.000 | 0.997 | 0.998 | 1152 |
224
+ | I-Medical Device | 0.999 | 1.000 | 0.999 | 1599 |
225
+ | B-Mental Process | 0.994 | 0.996 | 0.995 | 1423 |
226
+ | I-Mental Process | 1.000 | 0.997 | 0.999 | 749 |
227
+ | B-Mental or Behavioral Dysfunction | 0.997 | 1.000 | 0.999 | 1096 |
228
+ | I-Mental or Behavioral Dysfunction | 1.000 | 1.000 | 1.000 | 867 |
229
+ | B-Molecular Biology Research Technique | 0.994 | 0.992 | 0.993 | 651 |
230
+ | I-Molecular Biology Research Technique | 0.998 | 0.998 | 0.998 | 1121 |
231
+ | B-Molecular Function | 0.997 | 0.997 | 0.997 | 2004 |
232
+ | I-Molecular Function | 0.998 | 1.000 | 0.999 | 2015 |
233
+ | B-Molecular Sequence | 0.974 | 0.974 | 0.974 | 39 |
234
+ | I-Molecular Sequence | 1.000 | 1.000 | 1.000 | 42 |
235
+ | B-Neoplastic Process | 0.999 | 1.000 | 0.999 | 2900 |
236
+ | I-Neoplastic Process | 0.999 | 0.999 | 0.999 | 2938 |
237
+ | B-Nucleic Acid, Nucleoside, or Nucleotide | 1.000 | 0.992 | 0.996 | 508 |
238
+ | I-Nucleic Acid, Nucleoside, or Nucleotide | 0.997 | 1.000 | 0.998 | 609 |
239
+ | B-Nucleotide Sequence | 0.997 | 0.994 | 0.996 | 360 |
240
+ | I-Nucleotide Sequence | 0.998 | 0.998 | 0.998 | 641 |
241
+ | B-Organ or Tissue Function | 0.996 | 0.993 | 0.994 | 901 |
242
+ | I-Organ or Tissue Function | 0.996 | 0.995 | 0.996 | 845 |
243
+ | B-Organic Chemical | 0.999 | 0.999 | 0.999 | 1404 |
244
+ | I-Organic Chemical | 0.999 | 1.000 | 1.000 | 2838 |
245
+ | B-Organism Function | 0.995 | 0.996 | 0.996 | 1626 |
246
+ | I-Organism Function | 0.999 | 0.996 | 0.997 | 944 |
247
+ | B-Organization | 0.983 | 1.000 | 0.992 | 238 |
248
+ | I-Organization | 0.996 | 1.000 | 0.998 | 260 |
249
+ | B-Pathologic Function | 1.000 | 0.998 | 0.999 | 2309 |
250
+ | I-Pathologic Function | 0.999 | 1.000 | 0.999 | 1677 |
251
+ | B-Pharmacologic Substance | 0.999 | 1.000 | 0.999 | 4080 |
252
+ | I-Pharmacologic Substance | 0.999 | 1.000 | 1.000 | 6760 |
253
+ | B-Physiologic Function | 0.997 | 0.989 | 0.993 | 746 |
254
+ | I-Physiologic Function | 0.996 | 1.000 | 0.998 | 507 |
255
+ | B-Plant | 0.995 | 0.998 | 0.996 | 1104 |
256
+ | I-Plant | 0.998 | 0.999 | 0.999 | 1821 |
257
+ | B-Population Group | 0.999 | 0.999 | 0.999 | 3547 |
258
+ | I-Population Group | 0.997 | 0.999 | 0.998 | 1310 |
259
+ | B-Professional Society | 1.000 | 0.957 | 0.978 | 23 |
260
+ | I-Professional Society | 1.000 | 1.000 | 1.000 | 55 |
261
+ | B-Professional or Occupational Group | 0.998 | 0.999 | 0.999 | 1045 |
262
+ | I-Professional or Occupational Group | 1.000 | 1.000 | 1.000 | 700 |
263
+ | B-Receptor | 0.994 | 0.998 | 0.996 | 824 |
264
+ | I-Receptor | 0.998 | 0.998 | 0.998 | 1291 |
265
+ | B-Regulation or Law | 1.000 | 0.961 | 0.980 | 51 |
266
+ | I-Regulation or Law | 1.000 | 1.000 | 1.000 | 54 |
267
+ | B-Reptile | 1.000 | 1.000 | 1.000 | 18 |
268
+ | I-Reptile | 1.000 | 1.000 | 1.000 | 23 |
269
+ | B-Research Activity | 0.997 | 0.994 | 0.995 | 4768 |
270
+ | I-Research Activity | 0.999 | 0.999 | 0.999 | 3780 |
271
+ | B-Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 4 |
272
+ | I-Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 4 |
273
+ | B-Sign or Symptom | 0.996 | 0.999 | 0.998 | 1129 |
274
+ | I-Sign or Symptom | 0.999 | 0.998 | 0.998 | 807 |
275
+ | B-Spatial Concept | 0.997 | 0.997 | 0.997 | 4103 |
276
+ | I-Spatial Concept | 0.996 | 0.999 | 0.997 | 1473 |
277
+ | B-Therapeutic or Preventive Procedure | 0.998 | 0.998 | 0.998 | 6613 |
278
+ | I-Therapeutic or Preventive Procedure | 1.000 | 1.000 | 1.000 | 6523 |
279
+ | B-Tissue | 0.996 | 0.994 | 0.995 | 982 |
280
+ | I-Tissue | 0.995 | 0.998 | 0.997 | 597 |
281
+ | B-Vertebrate | 1.000 | 1.000 | 1.000 | 13 |
282
+ | I-Vertebrate | 0.000 | 0.000 | 0.000 | 3 |
283
+ | B-Virus | 0.999 | 1.000 | 0.999 | 696 |
284
+ | I-Virus | 1.000 | 1.000 | 1.000 | 1027 |
285
+ | B-Vitamin | 1.000 | 0.992 | 0.996 | 252 |
286
+ | I-Vitamin | 0.998 | 1.000 | 0.999 | 447 |
287
+ | macro avg | 0.959 | 0.959 | 0.959 | 805394 |
288
+ | weighted avg | 0.999 | 0.999 | 0.999 | 805394 |
289
+
290
+
291
+ # Performance on Validation Set
292
+
293
+ ## Span Level
294
+
295
+ | Label | Precision | Recall | F1-score | Support |
296
+ | --- | --- | --- | --- | --- |
297
+ | Acquired Abnormality | 0.109 | 0.182 | 0.136 | 33 |
298
+ | Amino Acid Sequence | 0.373 | 0.328 | 0.349 | 58 |
299
+ | Amino Acid, Peptide, or Protein | 0.310 | 0.278 | 0.293 | 187 |
300
+ | Amphibian | 1.000 | 0.143 | 0.250 | 7 |
301
+ | Anatomical Abnormality | 0.034 | 0.037 | 0.035 | 54 |
302
+ | Anatomical Structure | 0.625 | 0.263 | 0.370 | 38 |
303
+ | Animal | 0.615 | 0.747 | 0.675 | 75 |
304
+ | Antibiotic | 0.825 | 0.810 | 0.817 | 210 |
305
+ | Bacterium | 0.709 | 0.755 | 0.731 | 470 |
306
+ | Biologic Function | 0.389 | 0.479 | 0.429 | 142 |
307
+ | Biologically Active Substance | 0.583 | 0.641 | 0.611 | 1882 |
308
+ | Biomedical Occupation or Discipline | 0.584 | 0.475 | 0.524 | 198 |
309
+ | Biomedical or Dental Material | 0.395 | 0.428 | 0.411 | 304 |
310
+ | Bird | 0.846 | 0.759 | 0.800 | 116 |
311
+ | Body Location or Region | 0.288 | 0.393 | 0.332 | 163 |
312
+ | Body Part, Organ, or Organ Component | 0.639 | 0.665 | 0.652 | 1259 |
313
+ | Body Space or Junction | 0.442 | 0.476 | 0.458 | 145 |
314
+ | Body Substance | 0.641 | 0.725 | 0.681 | 269 |
315
+ | Body System | 0.611 | 0.629 | 0.620 | 105 |
316
+ | Cell | 0.713 | 0.733 | 0.723 | 1205 |
317
+ | Cell Component | 0.609 | 0.630 | 0.619 | 338 |
318
+ | Cell Function | 0.524 | 0.606 | 0.562 | 619 |
319
+ | Cell or Molecular Dysfunction | 0.635 | 0.595 | 0.614 | 111 |
320
+ | Chemical | 0.452 | 0.459 | 0.455 | 61 |
321
+ | Chemical Viewed Functionally | 0.362 | 0.382 | 0.372 | 55 |
322
+ | Chemical Viewed Structurally | 0.561 | 0.517 | 0.538 | 116 |
323
+ | Classification | 0.532 | 0.520 | 0.526 | 304 |
324
+ | Clinical Attribute | 0.545 | 0.522 | 0.534 | 404 |
325
+ | Congenital Abnormality | 0.481 | 0.694 | 0.568 | 36 |
326
+ | Diagnostic Procedure | 0.582 | 0.564 | 0.573 | 684 |
327
+ | Disease or Syndrome | 0.765 | 0.762 | 0.764 | 2365 |
328
+ | Drug Delivery Device | 0.000 | 0.000 | 0.000 | 6 |
329
+ | Element, Ion, or Isotope | 0.626 | 0.682 | 0.653 | 280 |
330
+ | Embryonic Structure | 0.698 | 0.769 | 0.732 | 39 |
331
+ | Enzyme | 0.647 | 0.728 | 0.685 | 569 |
332
+ | Eukaryote | 0.656 | 0.685 | 0.670 | 409 |
333
+ | Experimental Model of Disease | 0.377 | 0.547 | 0.446 | 53 |
334
+ | Finding | 0.355 | 0.348 | 0.351 | 2670 |
335
+ | Fish | 0.857 | 0.884 | 0.870 | 95 |
336
+ | Food | 0.585 | 0.641 | 0.611 | 259 |
337
+ | Fungus | 0.729 | 0.647 | 0.685 | 133 |
338
+ | Gene or Genome | 0.611 | 0.611 | 0.611 | 769 |
339
+ | Genetic Function | 0.644 | 0.680 | 0.662 | 519 |
340
+ | Geographic Area | 0.749 | 0.736 | 0.743 | 678 |
341
+ | Hazardous or Poisonous Substance | 0.598 | 0.620 | 0.609 | 221 |
342
+ | Health Care Activity | 0.450 | 0.446 | 0.448 | 981 |
343
+ | Health Care Related Organization | 0.532 | 0.559 | 0.545 | 367 |
344
+ | Hormone | 0.661 | 0.796 | 0.723 | 157 |
345
+ | Human | 0.766 | 0.883 | 0.820 | 137 |
346
+ | Immunologic Factor | 0.694 | 0.654 | 0.674 | 535 |
347
+ | Indicator, Reagent, or Diagnostic Aid | 0.342 | 0.385 | 0.363 | 231 |
348
+ | Injury or Poisoning | 0.621 | 0.625 | 0.623 | 435 |
349
+ | Inorganic Chemical | 0.688 | 0.688 | 0.688 | 285 |
350
+ | Intellectual Product | 0.480 | 0.465 | 0.472 | 1665 |
351
+ | Laboratory Procedure | 0.438 | 0.494 | 0.464 | 852 |
352
+ | Laboratory or Test Result | 0.144 | 0.189 | 0.163 | 132 |
353
+ | Mammal | 0.796 | 0.815 | 0.806 | 465 |
354
+ | Medical Device | 0.553 | 0.569 | 0.561 | 487 |
355
+ | Mental Process | 0.527 | 0.492 | 0.509 | 590 |
356
+ | Mental or Behavioral Dysfunction | 0.668 | 0.700 | 0.684 | 530 |
357
+ | Molecular Biology Research Technique | 0.510 | 0.667 | 0.578 | 195 |
358
+ | Molecular Function | 0.438 | 0.547 | 0.487 | 539 |
359
+ | Molecular Sequence | 0.071 | 0.125 | 0.091 | 8 |
360
+ | Neoplastic Process | 0.737 | 0.749 | 0.743 | 806 |
361
+ | Nucleic Acid, Nucleoside, or Nucleotide | 0.379 | 0.482 | 0.424 | 139 |
362
+ | Nucleotide Sequence | 0.404 | 0.423 | 0.414 | 130 |
363
+ | Organ or Tissue Function | 0.419 | 0.428 | 0.423 | 306 |
364
+ | Organic Chemical | 0.499 | 0.476 | 0.487 | 540 |
365
+ | Organism Function | 0.445 | 0.522 | 0.480 | 510 |
366
+ | Organization | 0.259 | 0.367 | 0.304 | 79 |
367
+ | Pathologic Function | 0.564 | 0.579 | 0.572 | 877 |
368
+ | Pharmacologic Substance | 0.634 | 0.682 | 0.657 | 1296 |
369
+ | Physiologic Function | 0.391 | 0.299 | 0.339 | 264 |
370
+ | Plant | 0.688 | 0.691 | 0.690 | 405 |
371
+ | Population Group | 0.661 | 0.620 | 0.640 | 1303 |
372
+ | Professional Society | 1.000 | 0.200 | 0.333 | 5 |
373
+ | Professional or Occupational Group | 0.643 | 0.734 | 0.685 | 365 |
374
+ | Receptor | 0.688 | 0.623 | 0.654 | 361 |
375
+ | Regulation or Law | 0.167 | 0.222 | 0.190 | 18 |
376
+ | Reptile | 1.000 | 0.222 | 0.364 | 18 |
377
+ | Research Activity | 0.590 | 0.525 | 0.556 | 1666 |
378
+ | Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 2 |
379
+ | Sign or Symptom | 0.581 | 0.634 | 0.607 | 350 |
380
+ | Spatial Concept | 0.431 | 0.439 | 0.435 | 1378 |
381
+ | Therapeutic or Preventive Procedure | 0.576 | 0.607 | 0.591 | 2131 |
382
+ | Tissue | 0.583 | 0.625 | 0.603 | 320 |
383
+ | Vertebrate | 1.000 | 0.500 | 0.667 | 6 |
384
+ | Virus | 0.690 | 0.777 | 0.731 | 224 |
385
+ | Vitamin | 0.860 | 0.607 | 0.712 | 61 |
386
+ | macro avg | 0.553 | 0.538 | 0.532 | 40864 |
387
+ | weighted avg | 0.575 | 0.588 | 0.580 | 40864 |
388
+
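The F1-score column in the table above can be sanity-checked against the Precision and Recall columns. The sketch below assumes the usual definition of F1 as the harmonic mean of precision and recall; the helper name `f1_score` is hypothetical and not taken from the model's evaluation code. Because the table values are rounded to three decimals, a recomputed score may differ from the reported one by about 0.001.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the validation span-level table.
rows = {
    "Antibiotic": (0.825, 0.810, 0.817),
    "Bird": (0.846, 0.759, 0.800),
    "Drug Delivery Device": (0.000, 0.000, 0.000),
}

for label, (p, r, reported) in rows.items():
    print(f"{label}: computed={f1_score(p, r):.3f} reported={reported:.3f}")
```
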
389
+ ## Token Level
390
+
391
+ | Label | Precision | Recall | F1-score | Support |
392
+ | --- | --- | --- | --- | --- |
393
+ | O | 0.938 | 0.939 | 0.939 | 187057 |
394
+ | B-Acquired Abnormality | 0.185 | 0.303 | 0.230 | 33 |
395
+ | I-Acquired Abnormality | 0.200 | 0.297 | 0.239 | 37 |
396
+ | B-Amino Acid Sequence | 0.522 | 0.414 | 0.462 | 58 |
397
+ | I-Amino Acid Sequence | 0.316 | 0.294 | 0.305 | 102 |
398
+ | B-Amino Acid, Peptide, or Protein | 0.346 | 0.292 | 0.317 | 185 |
399
+ | I-Amino Acid, Peptide, or Protein | 0.212 | 0.182 | 0.196 | 291 |
400
+ | B-Amphibian | 1.000 | 0.143 | 0.250 | 7 |
401
+ | I-Amphibian | 1.000 | 0.028 | 0.054 | 36 |
402
+ | B-Anatomical Abnormality | 0.041 | 0.038 | 0.039 | 53 |
403
+ | I-Anatomical Abnormality | 0.029 | 0.053 | 0.037 | 38 |
404
+ | B-Anatomical Structure | 0.625 | 0.263 | 0.370 | 38 |
405
+ | I-Anatomical Structure | 0.600 | 0.200 | 0.300 | 30 |
406
+ | B-Animal | 0.622 | 0.778 | 0.691 | 72 |
407
+ | I-Animal | 0.400 | 0.435 | 0.417 | 23 |
408
+ | B-Antibiotic | 0.856 | 0.832 | 0.844 | 208 |
409
+ | I-Antibiotic | 0.799 | 0.634 | 0.707 | 194 |
410
+ | B-Bacterium | 0.786 | 0.818 | 0.802 | 468 |
411
+ | I-Bacterium | 0.854 | 0.843 | 0.848 | 1112 |
412
+ | B-Biologic Function | 0.410 | 0.479 | 0.442 | 142 |
413
+ | I-Biologic Function | 0.328 | 0.367 | 0.346 | 60 |
414
+ | B-Biologically Active Substance | 0.633 | 0.674 | 0.653 | 1862 |
415
+ | I-Biologically Active Substance | 0.532 | 0.614 | 0.570 | 2209 |
416
+ | B-Biomedical Occupation or Discipline | 0.606 | 0.477 | 0.534 | 197 |
417
+ | I-Biomedical Occupation or Discipline | 0.615 | 0.370 | 0.462 | 173 |
418
+ | B-Biomedical or Dental Material | 0.438 | 0.486 | 0.461 | 292 |
419
+ | I-Biomedical or Dental Material | 0.427 | 0.262 | 0.325 | 447 |
420
+ | B-Bird | 0.875 | 0.798 | 0.835 | 114 |
421
+ | I-Bird | 0.938 | 0.822 | 0.876 | 219 |
422
+ | B-Body Location or Region | 0.323 | 0.429 | 0.368 | 163 |
423
+ | I-Body Location or Region | 0.333 | 0.417 | 0.370 | 144 |
424
+ | B-Body Part, Organ, or Organ Component | 0.690 | 0.712 | 0.701 | 1252 |
425
+ | I-Body Part, Organ, or Organ Component | 0.644 | 0.677 | 0.660 | 969 |
426
+ | B-Body Space or Junction | 0.482 | 0.472 | 0.477 | 142 |
427
+ | I-Body Space or Junction | 0.508 | 0.682 | 0.583 | 176 |
428
+ | B-Body Substance | 0.647 | 0.710 | 0.677 | 269 |
429
+ | I-Body Substance | 0.464 | 0.593 | 0.520 | 118 |
430
+ | B-Body System | 0.642 | 0.648 | 0.645 | 105 |
431
+ | I-Body System | 0.711 | 0.667 | 0.688 | 81 |
432
+ | B-Cell | 0.758 | 0.761 | 0.760 | 1200 |
433
+ | I-Cell | 0.727 | 0.855 | 0.786 | 1263 |
434
+ | B-Cell Component | 0.687 | 0.663 | 0.675 | 338 |
435
+ | I-Cell Component | 0.544 | 0.511 | 0.527 | 266 |
436
+ | B-Cell Function | 0.590 | 0.663 | 0.624 | 614 |
437
+ | I-Cell Function | 0.510 | 0.518 | 0.514 | 533 |
438
+ | B-Cell or Molecular Dysfunction | 0.687 | 0.613 | 0.648 | 111 |
439
+ | I-Cell or Molecular Dysfunction | 0.524 | 0.581 | 0.551 | 93 |
440
+ | B-Chemical | 0.483 | 0.483 | 0.483 | 60 |
441
+ | I-Chemical | 0.500 | 0.083 | 0.143 | 24 |
442
+ | B-Chemical Viewed Functionally | 0.362 | 0.382 | 0.372 | 55 |
443
+ | I-Chemical Viewed Functionally | 0.476 | 0.400 | 0.435 | 50 |
444
+ | B-Chemical Viewed Structurally | 0.614 | 0.534 | 0.571 | 116 |
445
+ | I-Chemical Viewed Structurally | 0.171 | 0.128 | 0.146 | 47 |
446
+ | B-Classification | 0.577 | 0.560 | 0.569 | 300 |
447
+ | I-Classification | 0.460 | 0.388 | 0.421 | 147 |
448
+ | B-Clinical Attribute | 0.607 | 0.579 | 0.593 | 401 |
449
+ | I-Clinical Attribute | 0.470 | 0.345 | 0.398 | 275 |
450
+ | B-Congenital Abnormality | 0.481 | 0.694 | 0.568 | 36 |
451
+ | I-Congenital Abnormality | 0.562 | 0.581 | 0.571 | 62 |
452
+ | B-Diagnostic Procedure | 0.697 | 0.635 | 0.665 | 680 |
453
+ | I-Diagnostic Procedure | 0.638 | 0.648 | 0.643 | 1000 |
454
+ | B-Disease or Syndrome | 0.790 | 0.788 | 0.789 | 2335 |
455
+ | I-Disease or Syndrome | 0.754 | 0.779 | 0.766 | 2454 |
456
+ | B-Drug Delivery Device | 0.000 | 0.000 | 0.000 | 6 |
457
+ | I-Drug Delivery Device | 0.000 | 0.000 | 0.000 | 3 |
458
+ | B-Element, Ion, or Isotope | 0.691 | 0.716 | 0.703 | 278 |
459
+ | I-Element, Ion, or Isotope | 0.608 | 0.734 | 0.665 | 192 |
460
+ | B-Embryonic Structure | 0.698 | 0.833 | 0.759 | 36 |
461
+ | I-Embryonic Structure | 0.889 | 0.889 | 0.889 | 9 |
462
+ | B-Enzyme | 0.693 | 0.767 | 0.728 | 566 |
463
+ | I-Enzyme | 0.704 | 0.758 | 0.730 | 770 |
464
+ | B-Eukaryote | 0.782 | 0.830 | 0.805 | 406 |
465
+ | I-Eukaryote | 0.828 | 0.878 | 0.852 | 950 |
466
+ | B-Experimental Model of Disease | 0.431 | 0.596 | 0.500 | 52 |
467
+ | I-Experimental Model of Disease | 0.360 | 0.300 | 0.327 | 60 |
468
+ | B-Finding | 0.410 | 0.383 | 0.396 | 2640 |
469
+ | I-Finding | 0.343 | 0.278 | 0.307 | 2209 |
470
+ | B-Fish | 0.825 | 0.904 | 0.863 | 94 |
471
+ | I-Fish | 0.788 | 0.732 | 0.759 | 142 |
472
+ | B-Food | 0.663 | 0.715 | 0.688 | 256 |
473
+ | I-Food | 0.625 | 0.689 | 0.655 | 244 |
474
+ | B-Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 0 |
475
+ | I-Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 0 |
476
+ | B-Fungus | 0.842 | 0.722 | 0.777 | 133 |
477
+ | I-Fungus | 0.881 | 0.776 | 0.825 | 210 |
478
+ | B-Gene or Genome | 0.682 | 0.677 | 0.679 | 764 |
479
+ | I-Gene or Genome | 0.734 | 0.658 | 0.694 | 1317 |
480
+ | B-Genetic Function | 0.695 | 0.706 | 0.701 | 517 |
481
+ | I-Genetic Function | 0.585 | 0.478 | 0.526 | 316 |
482
+ | B-Geographic Area | 0.815 | 0.789 | 0.802 | 674 |
483
+ | I-Geographic Area | 0.703 | 0.790 | 0.744 | 542 |
484
+ | B-Hazardous or Poisonous Substance | 0.661 | 0.658 | 0.659 | 219 |
485
+ | I-Hazardous or Poisonous Substance | 0.590 | 0.630 | 0.610 | 311 |
486
+ | B-Health Care Activity | 0.506 | 0.496 | 0.501 | 970 |
487
+ | I-Health Care Activity | 0.523 | 0.451 | 0.484 | 730 |
488
+ | B-Health Care Related Organization | 0.589 | 0.610 | 0.599 | 359 |
489
+ | I-Health Care Related Organization | 0.649 | 0.647 | 0.648 | 519 |
490
+ | B-Hormone | 0.691 | 0.828 | 0.754 | 157 |
491
+ | I-Hormone | 0.474 | 0.688 | 0.561 | 80 |
492
+ | B-Human | 0.778 | 0.898 | 0.834 | 137 |
493
+ | I-Human | 0.419 | 0.464 | 0.441 | 28 |
494
+ | B-Idea or Concept | 0.000 | 0.000 | 0.000 | 0 |
495
+ | I-Idea or Concept | 0.000 | 0.000 | 0.000 | 0 |
496
+ | B-Immunologic Factor | 0.766 | 0.680 | 0.721 | 535 |
497
+ | I-Immunologic Factor | 0.688 | 0.680 | 0.684 | 753 |
498
+ | B-Indicator, Reagent, or Diagnostic Aid | 0.450 | 0.476 | 0.463 | 227 |
499
+ | I-Indicator, Reagent, or Diagnostic Aid | 0.512 | 0.533 | 0.522 | 450 |
500
+ | B-Injury or Poisoning | 0.654 | 0.614 | 0.633 | 433 |
501
+ | I-Injury or Poisoning | 0.543 | 0.698 | 0.611 | 315 |
502
+ | B-Inorganic Chemical | 0.704 | 0.711 | 0.708 | 284 |
503
+ | I-Inorganic Chemical | 0.718 | 0.685 | 0.701 | 257 |
504
+ | B-Intellectual Product | 0.539 | 0.506 | 0.522 | 1653 |
505
+ | I-Intellectual Product | 0.557 | 0.486 | 0.519 | 2089 |
506
+ | B-Laboratory Procedure | 0.516 | 0.544 | 0.530 | 847 |
507
+ | I-Laboratory Procedure | 0.574 | 0.600 | 0.587 | 1237 |
508
+ | B-Laboratory or Test Result | 0.217 | 0.250 | 0.232 | 132 |
509
+ | I-Laboratory or Test Result | 0.199 | 0.224 | 0.211 | 152 |
510
+ | B-Mammal | 0.831 | 0.838 | 0.834 | 457 |
511
+ | I-Mammal | 0.840 | 0.849 | 0.844 | 345 |
512
+ | B-Medical Device | 0.604 | 0.603 | 0.604 | 486 |
513
+ | I-Medical Device | 0.605 | 0.604 | 0.605 | 685 |
514
+ | B-Mental Process | 0.567 | 0.523 | 0.544 | 583 |
515
+ | I-Mental Process | 0.609 | 0.533 | 0.568 | 304 |
516
+ | B-Mental or Behavioral Dysfunction | 0.711 | 0.735 | 0.723 | 529 |
517
+ | I-Mental or Behavioral Dysfunction | 0.700 | 0.670 | 0.685 | 397 |
518
+ | B-Molecular Biology Research Technique | 0.576 | 0.718 | 0.639 | 195 |
519
+ | I-Molecular Biology Research Technique | 0.712 | 0.724 | 0.718 | 409 |
520
+ | B-Molecular Function | 0.501 | 0.596 | 0.544 | 539 |
521
+ | I-Molecular Function | 0.449 | 0.483 | 0.466 | 532 |
522
+ | B-Molecular Sequence | 0.143 | 0.250 | 0.182 | 8 |
523
+ | I-Molecular Sequence | 0.222 | 0.286 | 0.250 | 7 |
524
+ | B-Neoplastic Process | 0.778 | 0.793 | 0.786 | 794 |
525
+ | I-Neoplastic Process | 0.774 | 0.788 | 0.781 | 905 |
526
+ | B-Nucleic Acid, Nucleoside, or Nucleotide | 0.453 | 0.600 | 0.517 | 130 |
527
+ | I-Nucleic Acid, Nucleoside, or Nucleotide | 0.392 | 0.396 | 0.394 | 197 |
528
+ | B-Nucleotide Sequence | 0.446 | 0.425 | 0.435 | 127 |
529
+ | I-Nucleotide Sequence | 0.316 | 0.524 | 0.394 | 103 |
530
+ | B-Organ or Tissue Function | 0.459 | 0.447 | 0.453 | 304 |
531
+ | I-Organ or Tissue Function | 0.478 | 0.481 | 0.480 | 270 |
532
+ | B-Organic Chemical | 0.537 | 0.484 | 0.509 | 539 |
533
+ | I-Organic Chemical | 0.450 | 0.363 | 0.402 | 1105 |
534
+ | B-Organism Function | 0.466 | 0.540 | 0.500 | 506 |
535
+ | I-Organism Function | 0.389 | 0.379 | 0.384 | 272 |
536
+ | B-Organization | 0.287 | 0.403 | 0.335 | 77 |
537
+ | I-Organization | 0.250 | 0.330 | 0.284 | 91 |
538
+ | B-Pathologic Function | 0.598 | 0.595 | 0.597 | 872 |
539
+ | I-Pathologic Function | 0.467 | 0.487 | 0.477 | 532 |
540
+ | B-Pharmacologic Substance | 0.670 | 0.718 | 0.693 | 1274 |
541
+ | I-Pharmacologic Substance | 0.602 | 0.666 | 0.633 | 1991 |
542
+ | B-Physiologic Function | 0.437 | 0.317 | 0.367 | 262 |
543
+ | I-Physiologic Function | 0.271 | 0.172 | 0.211 | 186 |
544
+ | B-Plant | 0.781 | 0.779 | 0.780 | 403 |
545
+ | I-Plant | 0.824 | 0.845 | 0.834 | 676 |
546
+ | B-Population Group | 0.712 | 0.659 | 0.684 | 1289 |
547
+ | I-Population Group | 0.489 | 0.467 | 0.478 | 486 |
548
+ | B-Professional Society | 0.500 | 0.200 | 0.286 | 5 |
549
+ | I-Professional Society | 1.000 | 0.250 | 0.400 | 8 |
550
+ | B-Professional or Occupational Group | 0.683 | 0.786 | 0.731 | 359 |
551
+ | I-Professional or Occupational Group | 0.631 | 0.613 | 0.622 | 302 |
552
+ | B-Receptor | 0.755 | 0.657 | 0.702 | 361 |
553
+ | I-Receptor | 0.686 | 0.647 | 0.666 | 541 |
554
+ | B-Regulation or Law | 0.286 | 0.333 | 0.308 | 18 |
555
+ | I-Regulation or Law | 0.640 | 0.627 | 0.634 | 51 |
556
+ | B-Reptile | 1.000 | 0.222 | 0.364 | 18 |
557
+ | I-Reptile | 1.000 | 0.115 | 0.207 | 52 |
558
+ | B-Research Activity | 0.639 | 0.559 | 0.597 | 1650 |
559
+ | I-Research Activity | 0.605 | 0.542 | 0.572 | 1274 |
560
+ | B-Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 2 |
561
+ | I-Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 4 |
562
+ | B-Sign or Symptom | 0.616 | 0.646 | 0.630 | 350 |
563
+ | I-Sign or Symptom | 0.504 | 0.513 | 0.509 | 232 |
564
+ | B-Spatial Concept | 0.485 | 0.481 | 0.483 | 1371 |
565
+ | I-Spatial Concept | 0.434 | 0.389 | 0.410 | 570 |
566
+ | B-Therapeutic or Preventive Procedure | 0.618 | 0.637 | 0.628 | 2112 |
567
+ | I-Therapeutic or Preventive Procedure | 0.611 | 0.623 | 0.617 | 1917 |
568
+ | B-Tissue | 0.620 | 0.649 | 0.634 | 319 |
569
+ | I-Tissue | 0.571 | 0.563 | 0.567 | 215 |
570
+ | B-Vertebrate | 1.000 | 0.500 | 0.667 | 6 |
571
+ | I-Vertebrate | 0.000 | 0.000 | 0.000 | 2 |
572
+ | B-Virus | 0.760 | 0.856 | 0.805 | 222 |
573
+ | I-Virus | 0.699 | 0.843 | 0.764 | 267 |
574
+ | B-Vitamin | 0.886 | 0.639 | 0.743 | 61 |
575
+ | I-Vitamin | 0.889 | 0.681 | 0.771 | 94 |
576
+ | macro avg | 0.560 | 0.533 | 0.533 | 269146 |
577
+ | weighted avg | 0.839 | 0.840 | 0.839 | 269146 |
578
+
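The gap between the `macro avg` and `weighted avg` rows (0.533 versus 0.839 F1 on the validation tokens) is easier to read with the two averages written out. The sketch below assumes the rows follow the convention used by scikit-learn style classification reports: the macro average is the unweighted mean over labels, and the weighted average weights each label by its support. It uses only three rows from the table above for illustration, so its output does not reproduce the reported averages; the function names are hypothetical.

```python
def macro_average(scores):
    """Unweighted mean: every label counts equally, however rare."""
    return sum(scores) / len(scores)


def weighted_average(scores, supports):
    """Support-weighted mean: frequent labels dominate the result."""
    total = sum(supports)
    return sum(s * n for s, n in zip(scores, supports)) / total


# (F1-score, support) for three rows of the validation token-level table.
subset = {
    "O": (0.939, 187057),
    "B-Acquired Abnormality": (0.230, 33),
    "B-Antibiotic": (0.844, 208),
}

f1_scores = [f1 for f1, _ in subset.values()]
supports = [n for _, n in subset.values()]

print(f"macro:    {macro_average(f1_scores):.3f}")
print(f"weighted: {weighted_average(f1_scores, supports):.3f}")
```

Under this assumption, the `O` label (187057 of 269146 validation tokens) pulls the weighted average toward its own score, while rare and zero-support labels pull the macro average down.
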
579
+
580
+ # Performance on Testing Set
581
+
582
+ ## Span Level
583
+
584
+ | Label | Precision | Recall | F1-score | Support |
585
+ | --- | --- | --- | --- | --- |
586
+ | Acquired Abnormality | 0.273 | 0.240 | 0.255 | 50 |
587
+ | Amino Acid Sequence | 0.303 | 0.357 | 0.328 | 84 |
588
+ | Amino Acid, Peptide, or Protein | 0.258 | 0.277 | 0.267 | 166 |
589
+ | Anatomical Abnormality | 0.190 | 0.145 | 0.164 | 76 |
590
+ | Anatomical Structure | 0.160 | 0.308 | 0.211 | 13 |
591
+ | Animal | 0.605 | 0.742 | 0.667 | 93 |
592
+ | Antibiotic | 0.835 | 0.784 | 0.808 | 148 |
593
+ | Bacterium | 0.752 | 0.732 | 0.742 | 448 |
594
+ | Biologic Function | 0.341 | 0.369 | 0.355 | 157 |
595
+ | Biologically Active Substance | 0.586 | 0.640 | 0.612 | 2080 |
596
+ | Biomedical Occupation or Discipline | 0.437 | 0.444 | 0.441 | 196 |
597
+ | Biomedical or Dental Material | 0.369 | 0.452 | 0.406 | 197 |
598
+ | Bird | 0.810 | 0.819 | 0.814 | 83 |
599
+ | Body Location or Region | 0.386 | 0.461 | 0.420 | 232 |
600
+ | Body Part, Organ, or Organ Component | 0.565 | 0.604 | 0.584 | 1092 |
601
+ | Body Space or Junction | 0.272 | 0.311 | 0.290 | 90 |
602
+ | Body Substance | 0.542 | 0.731 | 0.622 | 212 |
603
+ | Body System | 0.554 | 0.511 | 0.532 | 90 |
604
+ | Cell | 0.680 | 0.737 | 0.708 | 924 |
605
+ | Cell Component | 0.589 | 0.637 | 0.612 | 311 |
606
+ | Cell Function | 0.484 | 0.599 | 0.535 | 499 |
607
+ | Cell or Molecular Dysfunction | 0.607 | 0.657 | 0.631 | 99 |
608
+ | Chemical | 0.348 | 0.333 | 0.340 | 72 |
609
+ | Chemical Viewed Functionally | 0.286 | 0.432 | 0.344 | 37 |
610
+ | Chemical Viewed Structurally | 0.443 | 0.427 | 0.435 | 82 |
611
+ | Classification | 0.520 | 0.544 | 0.532 | 309 |
612
+ | Clinical Attribute | 0.596 | 0.625 | 0.610 | 323 |
613
+ | Congenital Abnormality | 0.438 | 0.443 | 0.440 | 79 |
614
+ | Diagnostic Procedure | 0.672 | 0.648 | 0.660 | 735 |
615
+ | Disease or Syndrome | 0.757 | 0.774 | 0.766 | 2199 |
616
+ | Element, Ion, or Isotope | 0.713 | 0.657 | 0.684 | 385 |
617
+ | Embryonic Structure | 0.587 | 0.509 | 0.545 | 53 |
618
+ | Enzyme | 0.766 | 0.761 | 0.763 | 681 |
619
+ | Eukaryote | 0.745 | 0.793 | 0.768 | 397 |
620
+ | Experimental Model of Disease | 0.286 | 0.356 | 0.317 | 45 |
621
+ | Finding | 0.391 | 0.388 | 0.389 | 2759 |
622
+ | Fish | 1.000 | 0.947 | 0.973 | 19 |
623
+ | Food | 0.556 | 0.455 | 0.501 | 336 |
624
+ | Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 1 |
625
+ | Fungus | 0.819 | 0.798 | 0.809 | 119 |
626
+ | Gene or Genome | 0.551 | 0.539 | 0.545 | 912 |
627
+ | Genetic Function | 0.598 | 0.646 | 0.621 | 652 |
628
+ | Geographic Area | 0.673 | 0.712 | 0.692 | 598 |
629
+ | Hazardous or Poisonous Substance | 0.513 | 0.522 | 0.518 | 293 |
630
+ | Health Care Activity | 0.487 | 0.458 | 0.472 | 1061 |
631
+ | Health Care Related Organization | 0.531 | 0.642 | 0.581 | 296 |
632
+ | Hormone | 0.806 | 0.746 | 0.775 | 189 |
633
+ | Human | 0.799 | 0.880 | 0.837 | 158 |
634
+ | Idea or Concept | 0.000 | 0.000 | 0.000 | 1 |
635
+ | Immunologic Factor | 0.674 | 0.606 | 0.638 | 434 |
636
+ | Indicator, Reagent, or Diagnostic Aid | 0.427 | 0.451 | 0.439 | 182 |
637
+ | Injury or Poisoning | 0.617 | 0.703 | 0.657 | 357 |
638
+ | Inorganic Chemical | 0.611 | 0.680 | 0.643 | 256 |
639
+ | Intellectual Product | 0.495 | 0.485 | 0.490 | 2075 |
640
+ | Laboratory Procedure | 0.445 | 0.452 | 0.448 | 908 |
641
+ | Laboratory or Test Result | 0.183 | 0.196 | 0.190 | 112 |
642
+ | Mammal | 0.778 | 0.838 | 0.807 | 456 |
643
+ | Medical Device | 0.434 | 0.437 | 0.435 | 355 |
644
+ | Mental Process | 0.546 | 0.546 | 0.546 | 740 |
645
+ | Mental or Behavioral Dysfunction | 0.710 | 0.774 | 0.741 | 518 |
646
+ | Molecular Biology Research Technique | 0.500 | 0.539 | 0.519 | 206 |
647
+ | Molecular Function | 0.504 | 0.555 | 0.528 | 719 |
648
+ | Molecular Sequence | 0.417 | 0.556 | 0.476 | 9 |
649
+ | Neoplastic Process | 0.761 | 0.745 | 0.753 | 918 |
650
+ | Nucleic Acid, Nucleoside, or Nucleotide | 0.331 | 0.450 | 0.381 | 109 |
651
+ | Nucleotide Sequence | 0.320 | 0.491 | 0.387 | 110 |
652
+ | Organ or Tissue Function | 0.482 | 0.425 | 0.452 | 247 |
653
+ | Organic Chemical | 0.396 | 0.464 | 0.427 | 511 |
654
+ | Organism Function | 0.462 | 0.518 | 0.488 | 471 |
655
+ | Organization | 0.270 | 0.442 | 0.335 | 77 |
656
+ | Pathologic Function | 0.541 | 0.541 | 0.541 | 669 |
657
+ | Pharmacologic Substance | 0.577 | 0.623 | 0.599 | 1258 |
658
+ | Physiologic Function | 0.283 | 0.286 | 0.284 | 182 |
659
+ | Plant | 0.603 | 0.618 | 0.610 | 403 |
660
+ | Population Group | 0.710 | 0.711 | 0.711 | 1263 |
661
+ | Professional Society | 0.000 | 0.000 | 0.000 | 7 |
662
+ | Professional or Occupational Group | 0.599 | 0.725 | 0.656 | 360 |
663
+ | Receptor | 0.614 | 0.686 | 0.648 | 271 |
664
+ | Regulation or Law | 0.182 | 0.125 | 0.148 | 16 |
665
+ | Reptile | 1.000 | 0.318 | 0.483 | 22 |
666
+ | Research Activity | 0.559 | 0.540 | 0.549 | 1653 |
667
+ | Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 2 |
668
+ | Sign or Symptom | 0.629 | 0.647 | 0.638 | 340 |
669
+ | Spatial Concept | 0.474 | 0.483 | 0.479 | 1282 |
670
+ | Therapeutic or Preventive Procedure | 0.609 | 0.617 | 0.613 | 2036 |
671
+ | Tissue | 0.562 | 0.525 | 0.543 | 259 |
672
+ | Vertebrate | 0.000 | 0.000 | 0.000 | 1 |
673
+ | Virus | 0.678 | 0.831 | 0.747 | 172 |
674
+ | Vitamin | 0.706 | 0.511 | 0.593 | 47 |
675
+ | macro avg | 0.507 | 0.525 | 0.512 | 40144 |
676
+ | weighted avg | 0.571 | 0.589 | 0.578 | 40144 |
677
+
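Span-level scores are lower than token-level scores throughout these tables. One plausible reason, assuming the span-level evaluation requires an exact match of both boundaries and label (a common convention, not confirmed by this model card), is that a single wrong token invalidates the whole span. The sketch below shows how spans might be recovered from `B-`/`I-` tags; `extract_spans` is a hypothetical helper, not the evaluation code used for this model.

```python
def extract_spans(tags):
    """Return (label, start, end_exclusive) for each span in a BIO tag list."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        continues = tag.startswith("I-") and label == tag[2:]
        if continues:
            continue
        if label is not None:
            spans.append((label, start, i))
        start, label = None, None
        if tag.startswith(("B-", "I-")):
            # Lenient choice: a stray I- tag without a matching B- opens a new span.
            start, label = i, tag[2:]
    return spans


gold = ["B-Cell", "I-Cell", "O", "B-Enzyme"]
pred = ["B-Cell", "O", "O", "B-Enzyme"]

gold_spans = set(extract_spans(gold))
pred_spans = set(extract_spans(pred))

token_matches = sum(g == p for g, p in zip(gold, pred))
print(f"tokens correct: {token_matches}/{len(gold)}")
print(f"spans correct:  {len(gold_spans & pred_spans)}/{len(gold_spans)}")
```

In this toy case three of four tokens are correct, but only one of the two gold spans is matched exactly, because the truncated `Cell` span counts as a miss.
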
678
+ ## Token Level
679
+
680
+ | Label | Precision | Recall | F1-score | Support |
681
+ | --- | --- | --- | --- | --- |
682
+ | O | 0.940 | 0.938 | 0.939 | 188640 |
683
+ | B-Acquired Abnormality | 0.279 | 0.240 | 0.258 | 50 |
684
+ | I-Acquired Abnormality | 0.371 | 0.283 | 0.321 | 46 |
685
+ | B-Amino Acid Sequence | 0.368 | 0.390 | 0.379 | 82 |
686
+ | I-Amino Acid Sequence | 0.525 | 0.503 | 0.514 | 189 |
+ | B-Amino Acid, Peptide, or Protein | 0.321 | 0.325 | 0.323 | 166 |
+ | I-Amino Acid, Peptide, or Protein | 0.173 | 0.211 | 0.190 | 161 |
+ | B-Amphibian | 0.000 | 0.000 | 0.000 | 0 |
+ | I-Amphibian | 0.000 | 0.000 | 0.000 | 0 |
+ | B-Anatomical Abnormality | 0.231 | 0.158 | 0.188 | 76 |
+ | I-Anatomical Abnormality | 0.296 | 0.154 | 0.203 | 104 |
+ | B-Anatomical Structure | 0.154 | 0.333 | 0.211 | 12 |
+ | I-Anatomical Structure | 0.000 | 0.000 | 0.000 | 11 |
+ | B-Animal | 0.646 | 0.793 | 0.712 | 92 |
+ | I-Animal | 0.500 | 0.512 | 0.506 | 41 |
+ | B-Antibiotic | 0.846 | 0.782 | 0.813 | 147 |
+ | I-Antibiotic | 0.727 | 0.566 | 0.636 | 99 |
+ | B-Bacterium | 0.836 | 0.791 | 0.813 | 446 |
+ | I-Bacterium | 0.904 | 0.822 | 0.861 | 1311 |
+ | B-Biologic Function | 0.381 | 0.394 | 0.387 | 155 |
+ | I-Biologic Function | 0.165 | 0.195 | 0.179 | 77 |
+ | B-Biologically Active Substance | 0.632 | 0.675 | 0.652 | 2070 |
+ | I-Biologically Active Substance | 0.581 | 0.633 | 0.606 | 2623 |
+ | B-Biomedical Occupation or Discipline | 0.532 | 0.508 | 0.520 | 195 |
+ | I-Biomedical Occupation or Discipline | 0.509 | 0.446 | 0.475 | 130 |
+ | B-Biomedical or Dental Material | 0.431 | 0.513 | 0.468 | 195 |
+ | I-Biomedical or Dental Material | 0.416 | 0.537 | 0.469 | 203 |
+ | B-Bird | 0.902 | 0.892 | 0.897 | 83 |
+ | I-Bird | 0.886 | 0.951 | 0.917 | 163 |
+ | B-Body Location or Region | 0.455 | 0.504 | 0.479 | 232 |
+ | I-Body Location or Region | 0.255 | 0.351 | 0.295 | 174 |
+ | B-Body Part, Organ, or Organ Component | 0.610 | 0.643 | 0.626 | 1086 |
+ | I-Body Part, Organ, or Organ Component | 0.579 | 0.630 | 0.604 | 865 |
+ | B-Body Space or Junction | 0.271 | 0.299 | 0.284 | 87 |
+ | I-Body Space or Junction | 0.383 | 0.500 | 0.434 | 108 |
+ | B-Body Substance | 0.546 | 0.722 | 0.622 | 212 |
+ | I-Body Substance | 0.529 | 0.618 | 0.570 | 102 |
+ | B-Body System | 0.605 | 0.557 | 0.580 | 88 |
+ | I-Body System | 0.629 | 0.619 | 0.624 | 63 |
+ | B-Cell | 0.721 | 0.765 | 0.743 | 920 |
+ | I-Cell | 0.747 | 0.750 | 0.748 | 1142 |
+ | B-Cell Component | 0.658 | 0.688 | 0.673 | 311 |
+ | I-Cell Component | 0.637 | 0.594 | 0.615 | 293 |
+ | B-Cell Function | 0.545 | 0.643 | 0.590 | 498 |
+ | I-Cell Function | 0.549 | 0.491 | 0.518 | 483 |
+ | B-Cell or Molecular Dysfunction | 0.684 | 0.677 | 0.680 | 99 |
+ | I-Cell or Molecular Dysfunction | 0.656 | 0.468 | 0.546 | 126 |
+ | B-Chemical | 0.371 | 0.361 | 0.366 | 72 |
+ | I-Chemical | 0.000 | 0.000 | 0.000 | 24 |
+ | B-Chemical Viewed Functionally | 0.340 | 0.459 | 0.391 | 37 |
+ | I-Chemical Viewed Functionally | 0.149 | 0.412 | 0.219 | 17 |
+ | B-Chemical Viewed Structurally | 0.545 | 0.512 | 0.528 | 82 |
+ | I-Chemical Viewed Structurally | 0.162 | 0.107 | 0.129 | 56 |
+ | B-Classification | 0.556 | 0.582 | 0.569 | 306 |
+ | I-Classification | 0.292 | 0.226 | 0.255 | 177 |
+ | B-Clinical Attribute | 0.612 | 0.637 | 0.624 | 322 |
+ | I-Clinical Attribute | 0.558 | 0.502 | 0.529 | 219 |
+ | B-Congenital Abnormality | 0.500 | 0.481 | 0.490 | 79 |
+ | I-Congenital Abnormality | 0.463 | 0.479 | 0.471 | 169 |
+ | B-Diagnostic Procedure | 0.731 | 0.686 | 0.708 | 732 |
+ | I-Diagnostic Procedure | 0.775 | 0.692 | 0.732 | 1102 |
+ | B-Disease or Syndrome | 0.788 | 0.803 | 0.795 | 2185 |
+ | I-Disease or Syndrome | 0.742 | 0.755 | 0.749 | 2326 |
+ | B-Drug Delivery Device | 0.000 | 0.000 | 0.000 | 0 |
+ | I-Drug Delivery Device | 0.000 | 0.000 | 0.000 | 0 |
+ | B-Element, Ion, or Isotope | 0.737 | 0.669 | 0.702 | 381 |
+ | I-Element, Ion, or Isotope | 0.714 | 0.622 | 0.665 | 225 |
+ | B-Embryonic Structure | 0.595 | 0.472 | 0.526 | 53 |
+ | I-Embryonic Structure | 0.545 | 0.486 | 0.514 | 37 |
+ | B-Enzyme | 0.791 | 0.781 | 0.786 | 680 |
+ | I-Enzyme | 0.823 | 0.779 | 0.801 | 1051 |
+ | B-Eukaryote | 0.820 | 0.899 | 0.858 | 396 |
+ | I-Eukaryote | 0.894 | 0.938 | 0.916 | 892 |
+ | B-Experimental Model of Disease | 0.353 | 0.400 | 0.375 | 45 |
+ | I-Experimental Model of Disease | 0.392 | 0.328 | 0.357 | 61 |
+ | B-Finding | 0.449 | 0.424 | 0.436 | 2741 |
+ | I-Finding | 0.353 | 0.323 | 0.337 | 2236 |
+ | B-Fish | 0.857 | 0.947 | 0.900 | 19 |
+ | I-Fish | 0.944 | 1.000 | 0.971 | 17 |
+ | B-Food | 0.650 | 0.499 | 0.564 | 335 |
+ | I-Food | 0.580 | 0.375 | 0.455 | 299 |
+ | B-Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 1 |
+ | I-Fully Formed Anatomical Structure | 0.000 | 0.000 | 0.000 | 1 |
+ | B-Fungus | 0.913 | 0.890 | 0.901 | 118 |
+ | I-Fungus | 0.917 | 0.948 | 0.933 | 328 |
+ | B-Gene or Genome | 0.653 | 0.629 | 0.641 | 911 |
+ | I-Gene or Genome | 0.558 | 0.577 | 0.567 | 1329 |
+ | B-Genetic Function | 0.659 | 0.683 | 0.671 | 652 |
+ | I-Genetic Function | 0.523 | 0.372 | 0.435 | 489 |
+ | B-Geographic Area | 0.727 | 0.763 | 0.744 | 594 |
+ | I-Geographic Area | 0.672 | 0.698 | 0.685 | 557 |
+ | B-Hazardous or Poisonous Substance | 0.589 | 0.570 | 0.579 | 291 |
+ | I-Hazardous or Poisonous Substance | 0.427 | 0.527 | 0.472 | 300 |
+ | B-Health Care Activity | 0.548 | 0.498 | 0.522 | 1055 |
+ | I-Health Care Activity | 0.576 | 0.462 | 0.513 | 795 |
+ | B-Health Care Related Organization | 0.582 | 0.682 | 0.628 | 296 |
+ | I-Health Care Related Organization | 0.633 | 0.754 | 0.688 | 411 |
+ | B-Hormone | 0.831 | 0.757 | 0.792 | 189 |
+ | I-Hormone | 0.744 | 0.782 | 0.762 | 119 |
+ | B-Human | 0.824 | 0.892 | 0.856 | 157 |
+ | I-Human | 0.535 | 0.719 | 0.613 | 32 |
+ | B-Idea or Concept | 0.000 | 0.000 | 0.000 | 1 |
+ | I-Idea or Concept | 0.000 | 0.000 | 0.000 | 2 |
+ | B-Immunologic Factor | 0.738 | 0.653 | 0.693 | 432 |
+ | I-Immunologic Factor | 0.741 | 0.690 | 0.715 | 720 |
+ | B-Indicator, Reagent, or Diagnostic Aid | 0.469 | 0.462 | 0.465 | 182 |
+ | I-Indicator, Reagent, or Diagnostic Aid | 0.598 | 0.699 | 0.645 | 196 |
+ | B-Injury or Poisoning | 0.672 | 0.749 | 0.708 | 347 |
+ | I-Injury or Poisoning | 0.683 | 0.709 | 0.695 | 398 |
+ | B-Inorganic Chemical | 0.656 | 0.707 | 0.680 | 256 |
+ | I-Inorganic Chemical | 0.599 | 0.809 | 0.689 | 220 |
+ | B-Intellectual Product | 0.547 | 0.517 | 0.532 | 2065 |
+ | I-Intellectual Product | 0.552 | 0.541 | 0.547 | 2378 |
+ | B-Laboratory Procedure | 0.538 | 0.513 | 0.525 | 907 |
+ | I-Laboratory Procedure | 0.586 | 0.600 | 0.593 | 1390 |
+ | B-Laboratory or Test Result | 0.283 | 0.255 | 0.268 | 110 |
+ | I-Laboratory or Test Result | 0.189 | 0.174 | 0.181 | 132 |
+ | B-Mammal | 0.832 | 0.859 | 0.845 | 455 |
+ | I-Mammal | 0.649 | 0.633 | 0.641 | 278 |
+ | B-Medical Device | 0.508 | 0.460 | 0.483 | 354 |
+ | I-Medical Device | 0.451 | 0.470 | 0.460 | 566 |
+ | B-Mental Process | 0.582 | 0.585 | 0.583 | 727 |
+ | I-Mental Process | 0.419 | 0.545 | 0.474 | 299 |
+ | B-Mental or Behavioral Dysfunction | 0.741 | 0.798 | 0.768 | 515 |
+ | I-Mental or Behavioral Dysfunction | 0.670 | 0.670 | 0.670 | 412 |
+ | B-Molecular Biology Research Technique | 0.570 | 0.576 | 0.573 | 205 |
+ | I-Molecular Biology Research Technique | 0.621 | 0.598 | 0.609 | 353 |
+ | B-Molecular Function | 0.583 | 0.610 | 0.597 | 716 |
+ | I-Molecular Function | 0.526 | 0.526 | 0.526 | 741 |
+ | B-Molecular Sequence | 0.455 | 0.556 | 0.500 | 9 |
+ | I-Molecular Sequence | 0.600 | 0.833 | 0.698 | 18 |
+ | B-Neoplastic Process | 0.807 | 0.778 | 0.792 | 914 |
+ | I-Neoplastic Process | 0.792 | 0.772 | 0.782 | 1007 |
+ | B-Nucleic Acid, Nucleoside, or Nucleotide | 0.394 | 0.495 | 0.439 | 109 |
+ | I-Nucleic Acid, Nucleoside, or Nucleotide | 0.204 | 0.320 | 0.249 | 97 |
+ | B-Nucleotide Sequence | 0.365 | 0.532 | 0.433 | 109 |
+ | I-Nucleotide Sequence | 0.401 | 0.525 | 0.455 | 158 |
+ | B-Organ or Tissue Function | 0.551 | 0.457 | 0.500 | 247 |
+ | I-Organ or Tissue Function | 0.522 | 0.422 | 0.467 | 256 |
+ | B-Organic Chemical | 0.442 | 0.491 | 0.465 | 509 |
+ | I-Organic Chemical | 0.396 | 0.485 | 0.436 | 1003 |
+ | B-Organism Function | 0.495 | 0.537 | 0.515 | 471 |
+ | I-Organism Function | 0.314 | 0.338 | 0.326 | 207 |
+ | B-Organization | 0.293 | 0.442 | 0.352 | 77 |
+ | I-Organization | 0.327 | 0.467 | 0.385 | 75 |
+ | B-Pathologic Function | 0.569 | 0.558 | 0.564 | 661 |
+ | I-Pathologic Function | 0.517 | 0.469 | 0.492 | 512 |
+ | B-Pharmacologic Substance | 0.613 | 0.659 | 0.635 | 1248 |
+ | I-Pharmacologic Substance | 0.618 | 0.620 | 0.619 | 2121 |
+ | B-Physiologic Function | 0.288 | 0.282 | 0.285 | 181 |
+ | I-Physiologic Function | 0.393 | 0.315 | 0.350 | 111 |
+ | B-Plant | 0.702 | 0.712 | 0.707 | 403 |
+ | I-Plant | 0.650 | 0.763 | 0.702 | 528 |
+ | B-Population Group | 0.752 | 0.738 | 0.745 | 1261 |
+ | I-Population Group | 0.523 | 0.520 | 0.521 | 456 |
+ | B-Professional Society | 0.000 | 0.000 | 0.000 | 7 |
+ | I-Professional Society | 0.000 | 0.000 | 0.000 | 23 |
+ | B-Professional or Occupational Group | 0.639 | 0.760 | 0.695 | 359 |
+ | I-Professional or Occupational Group | 0.600 | 0.707 | 0.649 | 208 |
+ | B-Receptor | 0.648 | 0.730 | 0.686 | 270 |
+ | I-Receptor | 0.491 | 0.684 | 0.572 | 367 |
+ | B-Regulation or Law | 0.273 | 0.188 | 0.222 | 16 |
+ | I-Regulation or Law | 0.000 | 0.000 | 0.000 | 14 |
+ | B-Reptile | 1.000 | 0.136 | 0.240 | 22 |
+ | I-Reptile | 1.000 | 0.160 | 0.276 | 50 |
+ | B-Research Activity | 0.607 | 0.572 | 0.589 | 1642 |
+ | I-Research Activity | 0.658 | 0.619 | 0.638 | 1276 |
+ | B-Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 2 |
+ | I-Self-help or Relief Organization | 0.000 | 0.000 | 0.000 | 2 |
+ | B-Sign or Symptom | 0.676 | 0.684 | 0.680 | 335 |
+ | I-Sign or Symptom | 0.457 | 0.500 | 0.478 | 204 |
+ | B-Spatial Concept | 0.504 | 0.503 | 0.503 | 1273 |
+ | I-Spatial Concept | 0.480 | 0.512 | 0.495 | 512 |
+ | B-Therapeutic or Preventive Procedure | 0.660 | 0.656 | 0.658 | 2017 |
+ | I-Therapeutic or Preventive Procedure | 0.687 | 0.660 | 0.673 | 2012 |
+ | B-Tissue | 0.633 | 0.559 | 0.593 | 256 |
+ | I-Tissue | 0.620 | 0.444 | 0.517 | 239 |
+ | B-Vertebrate | 1.000 | 1.000 | 1.000 | 1 |
+ | I-Vertebrate | 0.000 | 0.000 | 0.000 | 1 |
+ | B-Virus | 0.731 | 0.901 | 0.807 | 172 |
+ | I-Virus | 0.723 | 0.923 | 0.811 | 209 |
+ | B-Vitamin | 0.758 | 0.532 | 0.625 | 47 |
+ | I-Vitamin | 0.582 | 0.561 | 0.571 | 57 |
+ | macro avg | 0.525 | 0.527 | 0.519 | 270152 |
+ | weighted avg | 0.842 | 0.841 | 0.841 | 270152 |
+
+
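The `macro avg` F1 (0.519) sits well below the `weighted avg` F1 (0.841) in the table above. A minimal sketch of why, assuming the numeric columns are precision, recall, F1 and support in that order: the macro average gives every tag equal weight, while the weighted average scales each tag by its support. The four rows below are copied from the table; the helper names `macro_avg` and `weighted_avg` are illustrative and not part of this repository.

```python
# (F1, support) pairs copied from four rows of the per-tag table.
ROWS = {
    "B-Bacterium": (0.813, 446),
    "I-Bacterium": (0.861, 1311),
    "B-Chemical": (0.366, 72),
    "I-Chemical": (0.000, 24),
}


def macro_avg(rows):
    """Unweighted mean of F1: every tag counts equally, however rare."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_avg(rows):
    """Support-weighted mean of F1: frequent tags dominate the result."""
    total_support = sum(n for _, n in rows.values())
    return sum(f1 * n for f1, n in rows.values()) / total_support


if __name__ == "__main__":
    print(f"macro avg F1:    {macro_avg(ROWS):.3f}")
    print(f"weighted avg F1: {weighted_avg(ROWS):.3f}")
```

On this subset the rare, poorly predicted `Chemical` tags pull the macro average down while barely moving the weighted one. The same effect likely explains the gap in the full table, where several tags have little or no support and an F1 of 0.000.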
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f0e2ec907ee89aeff3b80d487b543a220d4abdba877500be5e721c1d7460494e
+ size 14244
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,58 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "4": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "model_max_length": 512,
+   "never_split": null,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "unk_token": "[UNK]"
+ }
trainer_state.json ADDED
@@ -0,0 +1,403 @@
+ {
+   "best_metric": 0.5446548032758927,
+   "best_model_checkpoint": "tmp_ner_damsay_304130/run-36/checkpoint-4125",
+   "epoch": 25.0,
+   "eval_steps": 500,
+   "global_step": 4125,
+   "is_hyper_param_search": true,
+   "is_local_process_zero": true,
+   "is_world_process_zero": true,
+   "log_history": [
+     {
+       "epoch": 1.0,
+       "eval_accuracy": 0.7470146314639639,
+       "eval_loss": 1.2898669242858887,
+       "eval_macro_f1": 0.060475942039181926,
+       "eval_macro_precision": 0.09351501221756969,
+       "eval_macro_recall": 0.06591547068187953,
+       "eval_runtime": 3.9883,
+       "eval_samples_per_second": 220.142,
+       "eval_steps_per_second": 27.58,
+       "step": 165
+     },
+     {
+       "epoch": 2.0,
+       "eval_accuracy": 0.8084088190052983,
+       "eval_loss": 0.8025009632110596,
+       "eval_macro_f1": 0.24340420766720258,
+       "eval_macro_precision": 0.3139630221874961,
+       "eval_macro_recall": 0.24833685017157406,
+       "eval_runtime": 3.8063,
+       "eval_samples_per_second": 230.671,
+       "eval_steps_per_second": 28.9,
+       "step": 330
+     },
+     {
+       "epoch": 3.0,
+       "eval_accuracy": 0.8305009177175213,
+       "eval_loss": 0.6665005683898926,
+       "eval_macro_f1": 0.37091081081224914,
+       "eval_macro_precision": 0.4480773602522756,
+       "eval_macro_recall": 0.36393297647770273,
+       "eval_runtime": 3.7531,
+       "eval_samples_per_second": 233.938,
+       "eval_steps_per_second": 29.309,
+       "step": 495
+     },
+     {
+       "epoch": 3.0303030303030303,
+       "grad_norm": 1.6078789234161377,
+       "learning_rate": 7.735247090033069e-05,
+       "loss": 1.4056,
+       "step": 500
+     },
+     {
+       "epoch": 4.0,
+       "eval_accuracy": 0.8344504469693029,
+       "eval_loss": 0.6422820091247559,
+       "eval_macro_f1": 0.43071408069652817,
+       "eval_macro_precision": 0.48878382034336376,
+       "eval_macro_recall": 0.42148708788382616,
+       "eval_runtime": 3.8104,
+       "eval_samples_per_second": 230.421,
+       "eval_steps_per_second": 28.868,
+       "step": 660
+     },
+     {
+       "epoch": 5.0,
+       "eval_accuracy": 0.8393288401090858,
+       "eval_loss": 0.6265012621879578,
+       "eval_macro_f1": 0.47492811430942733,
+       "eval_macro_precision": 0.5136390951479204,
+       "eval_macro_recall": 0.46552778262584993,
+       "eval_runtime": 3.749,
+       "eval_samples_per_second": 234.199,
+       "eval_steps_per_second": 29.342,
+       "step": 825
+     },
+     {
+       "epoch": 6.0,
+       "eval_accuracy": 0.8411568442406724,
+       "eval_loss": 0.6596290469169617,
+       "eval_macro_f1": 0.4931684511043183,
+       "eval_macro_precision": 0.5359461122068587,
+       "eval_macro_recall": 0.48533559234357204,
+       "eval_runtime": 3.8138,
+       "eval_samples_per_second": 230.215,
+       "eval_steps_per_second": 28.842,
+       "step": 990
+     },
+     {
+       "epoch": 6.0606060606060606,
+       "grad_norm": 1.48198664188385,
+       "learning_rate": 6.926120825385258e-05,
+       "loss": 0.4174,
+       "step": 1000
+     },
+     {
+       "epoch": 7.0,
+       "eval_accuracy": 0.8377386251328275,
+       "eval_loss": 0.6801306009292603,
+       "eval_macro_f1": 0.5105261331060333,
+       "eval_macro_precision": 0.5298621091537271,
+       "eval_macro_recall": 0.5130394897405378,
+       "eval_runtime": 3.743,
+       "eval_samples_per_second": 234.574,
+       "eval_steps_per_second": 29.389,
+       "step": 1155
+     },
+     {
+       "epoch": 8.0,
+       "eval_accuracy": 0.8405512249856955,
+       "eval_loss": 0.7128350138664246,
+       "eval_macro_f1": 0.5145736070824514,
+       "eval_macro_precision": 0.5338232435701135,
+       "eval_macro_recall": 0.5148104452511267,
+       "eval_runtime": 3.8105,
+       "eval_samples_per_second": 230.418,
+       "eval_steps_per_second": 28.868,
+       "step": 1320
+     },
+     {
+       "epoch": 9.0,
+       "eval_accuracy": 0.8356208154681846,
+       "eval_loss": 0.7495563626289368,
+       "eval_macro_f1": 0.5078761754368146,
+       "eval_macro_precision": 0.5215628937215542,
+       "eval_macro_recall": 0.510509104208083,
+       "eval_runtime": 3.8142,
+       "eval_samples_per_second": 230.191,
+       "eval_steps_per_second": 28.839,
+       "step": 1485
+     },
+     {
+       "epoch": 9.090909090909092,
+       "grad_norm": 1.2219234704971313,
+       "learning_rate": 6.116994560737448e-05,
+       "loss": 0.207,
+       "step": 1500
+     },
+     {
+       "epoch": 10.0,
+       "eval_accuracy": 0.8404137531302713,
+       "eval_loss": 0.7915265560150146,
+       "eval_macro_f1": 0.5284150089162629,
+       "eval_macro_precision": 0.5652985249581868,
+       "eval_macro_recall": 0.5173389157983858,
+       "eval_runtime": 3.8115,
+       "eval_samples_per_second": 230.354,
+       "eval_steps_per_second": 28.86,
+       "step": 1650
+     },
+     {
+       "epoch": 11.0,
+       "eval_accuracy": 0.8365311020784258,
+       "eval_loss": 0.8357340097427368,
+       "eval_macro_f1": 0.5241371167696747,
+       "eval_macro_precision": 0.5418612467571973,
+       "eval_macro_recall": 0.5268409732620435,
+       "eval_runtime": 3.8104,
+       "eval_samples_per_second": 230.422,
+       "eval_steps_per_second": 28.868,
+       "step": 1815
+     },
+     {
+       "epoch": 12.0,
+       "eval_accuracy": 0.83948488924227,
+       "eval_loss": 0.860895574092865,
+       "eval_macro_f1": 0.5274542995604197,
+       "eval_macro_precision": 0.5642301005555995,
+       "eval_macro_recall": 0.5189445167674911,
+       "eval_runtime": 3.7927,
+       "eval_samples_per_second": 231.498,
+       "eval_steps_per_second": 29.003,
+       "step": 1980
+     },
+     {
+       "epoch": 12.121212121212121,
+       "grad_norm": 1.4894951581954956,
+       "learning_rate": 5.307868296089637e-05,
+       "loss": 0.1106,
+       "step": 2000
+     },
+     {
+       "epoch": 13.0,
+       "eval_accuracy": 0.837367079577627,
+       "eval_loss": 0.8975165486335754,
+       "eval_macro_f1": 0.5324199411604539,
+       "eval_macro_precision": 0.545792242605528,
+       "eval_macro_recall": 0.5377469764932828,
+       "eval_runtime": 3.7913,
+       "eval_samples_per_second": 231.585,
+       "eval_steps_per_second": 29.014,
+       "step": 2145
+     },
+     {
+       "epoch": 14.0,
+       "eval_accuracy": 0.831667570760851,
+       "eval_loss": 0.9421281218528748,
+       "eval_macro_f1": 0.5262611445373023,
+       "eval_macro_precision": 0.5347627543444692,
+       "eval_macro_recall": 0.5417363003088691,
+       "eval_runtime": 3.804,
+       "eval_samples_per_second": 230.807,
+       "eval_steps_per_second": 28.917,
+       "step": 2310
+     },
+     {
+       "epoch": 15.0,
+       "eval_accuracy": 0.8381064552324761,
+       "eval_loss": 0.9438627362251282,
+       "eval_macro_f1": 0.5348984394876934,
+       "eval_macro_precision": 0.546671824271427,
+       "eval_macro_recall": 0.5379298790499474,
+       "eval_runtime": 3.8089,
+       "eval_samples_per_second": 230.516,
+       "eval_steps_per_second": 28.88,
+       "step": 2475
+     },
+     {
+       "epoch": 15.151515151515152,
+       "grad_norm": 0.9380698800086975,
+       "learning_rate": 4.498742031441827e-05,
+       "loss": 0.0622,
+       "step": 2500
+     },
+     {
+       "epoch": 16.0,
+       "eval_accuracy": 0.835973783745625,
+       "eval_loss": 0.9927621483802795,
+       "eval_macro_f1": 0.5346150876236929,
+       "eval_macro_precision": 0.5412591223724303,
+       "eval_macro_recall": 0.5415646974890974,
+       "eval_runtime": 3.8119,
+       "eval_samples_per_second": 230.328,
+       "eval_steps_per_second": 28.857,
+       "step": 2640
+     },
+     {
+       "epoch": 17.0,
+       "eval_accuracy": 0.8385114398876446,
+       "eval_loss": 1.00917387008667,
+       "eval_macro_f1": 0.5375529028664181,
+       "eval_macro_precision": 0.5691165677941541,
+       "eval_macro_recall": 0.5349126486369854,
+       "eval_runtime": 3.8075,
+       "eval_samples_per_second": 230.599,
+       "eval_steps_per_second": 28.891,
+       "step": 2805
+     },
+     {
+       "epoch": 18.0,
+       "eval_accuracy": 0.8392285228091816,
+       "eval_loss": 1.0304529666900635,
+       "eval_macro_f1": 0.5373729743137114,
+       "eval_macro_precision": 0.5576924127069764,
+       "eval_macro_recall": 0.539334821981497,
+       "eval_runtime": 3.8101,
+       "eval_samples_per_second": 230.437,
+       "eval_steps_per_second": 28.87,
+       "step": 2970
+     },
+     {
+       "epoch": 18.181818181818183,
+       "grad_norm": 1.10272216796875,
+       "learning_rate": 3.689615766794016e-05,
+       "loss": 0.0365,
+       "step": 3000
+     },
+     {
+       "epoch": 19.0,
+       "eval_accuracy": 0.8382290652656922,
+       "eval_loss": 1.0565452575683594,
+       "eval_macro_f1": 0.5382182396542954,
+       "eval_macro_precision": 0.5617383574248755,
+       "eval_macro_recall": 0.5399115920507231,
+       "eval_runtime": 3.8119,
+       "eval_samples_per_second": 230.33,
+       "eval_steps_per_second": 28.857,
+       "step": 3135
+     },
+     {
+       "epoch": 20.0,
+       "eval_accuracy": 0.8372110304444428,
+       "eval_loss": 1.0832605361938477,
+       "eval_macro_f1": 0.5426999278930784,
+       "eval_macro_precision": 0.5539906546541987,
+       "eval_macro_recall": 0.5517989175520424,
+       "eval_runtime": 3.8083,
+       "eval_samples_per_second": 230.549,
+       "eval_steps_per_second": 28.884,
+       "step": 3300
+     },
+     {
+       "epoch": 21.0,
+       "eval_accuracy": 0.839499751064478,
+       "eval_loss": 1.0868653059005737,
+       "eval_macro_f1": 0.53917903306677,
+       "eval_macro_precision": 0.5749497192899481,
+       "eval_macro_recall": 0.5373986303762837,
+       "eval_runtime": 3.8141,
+       "eval_samples_per_second": 230.199,
+       "eval_steps_per_second": 28.84,
+       "step": 3465
+     },
+     {
+       "epoch": 21.21212121212121,
+       "grad_norm": 0.46956342458724976,
+       "learning_rate": 2.8804895021462055e-05,
+       "loss": 0.0224,
+       "step": 3500
+     },
+     {
+       "epoch": 22.0,
+       "eval_accuracy": 0.8391096282315175,
+       "eval_loss": 1.1133960485458374,
+       "eval_macro_f1": 0.5443083566401578,
+       "eval_macro_precision": 0.5641662348298077,
+       "eval_macro_recall": 0.5481264782508153,
+       "eval_runtime": 3.813,
+       "eval_samples_per_second": 230.266,
+       "eval_steps_per_second": 28.849,
+       "step": 3630
+     },
+     {
+       "epoch": 23.0,
+       "eval_accuracy": 0.838050723399196,
+       "eval_loss": 1.1350845098495483,
+       "eval_macro_f1": 0.544210275681084,
+       "eval_macro_precision": 0.5622845529634525,
+       "eval_macro_recall": 0.5482077780076627,
+       "eval_runtime": 3.8169,
+       "eval_samples_per_second": 230.032,
+       "eval_steps_per_second": 28.82,
+       "step": 3795
+     },
+     {
+       "epoch": 24.0,
+       "eval_accuracy": 0.838050723399196,
+       "eval_loss": 1.1593064069747925,
+       "eval_macro_f1": 0.5416925123679783,
+       "eval_macro_precision": 0.5680966862603153,
+       "eval_macro_recall": 0.5448186956610225,
+       "eval_runtime": 3.8224,
+       "eval_samples_per_second": 229.699,
+       "eval_steps_per_second": 28.778,
+       "step": 3960
+     },
+     {
+       "epoch": 24.242424242424242,
+       "grad_norm": 0.9432898759841919,
+       "learning_rate": 2.071363237498395e-05,
+       "loss": 0.0149,
+       "step": 4000
+     },
+     {
+       "epoch": 25.0,
+       "eval_accuracy": 0.8398712966196785,
+       "eval_loss": 1.1621172428131104,
+       "eval_macro_f1": 0.5446548032758927,
+       "eval_macro_precision": 0.572924667254898,
+       "eval_macro_recall": 0.5453082375999836,
+       "eval_runtime": 3.8202,
+       "eval_samples_per_second": 229.832,
+       "eval_steps_per_second": 28.794,
+       "step": 4125
+     }
+   ],
+   "logging_steps": 500,
+   "max_steps": 5280,
+   "num_input_tokens_seen": 0,
+   "num_train_epochs": 32,
+   "save_steps": 500,
+   "stateful_callbacks": {
+     "EarlyStoppingCallback": {
+       "args": {
+         "early_stopping_patience": 3,
+         "early_stopping_threshold": 0.001
+       },
+       "attributes": {
+         "early_stopping_patience_counter": 3
+       }
+     },
+     "TrainerControl": {
+       "args": {
+         "should_epoch_stop": false,
+         "should_evaluate": false,
+         "should_log": false,
+         "should_save": true,
+         "should_training_stop": true
+       },
+       "attributes": {}
+     }
+   },
+   "total_flos": 1.745821766419731e+16,
+   "train_batch_size": 16,
+   "trial_name": null,
+   "trial_params": {
+     "learning_rate": 7.86794379743531e-05,
+     "per_device_train_batch_size": 16,
+     "warmup_ratio": 0.07903396276412193,
+     "weight_decay": 0.06816454557507429
+   }
+ }
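Two things in `trainer_state.json` can be cross-checked with a few lines of Python: the logged `learning_rate` values, and why training stopped at epoch 25 of 32. The sketch below is not taken from the training code. It assumes a linear warmup followed by linear decay to zero, with the warmup length rounded up from `warmup_ratio * max_steps`, and it assumes early stopping was driven by `eval_macro_f1`. The helper names `linear_lr` and `stopping_epoch` are hypothetical, and `stopping_epoch` is a simplified re-implementation of the patience logic rather than the library's own callback.

```python
import math

# Values copied from "trial_params" and "max_steps" in trainer_state.json.
PEAK_LR = 7.86794379743531e-05
WARMUP_RATIO = 0.07903396276412193
MAX_STEPS = 5280

# eval_macro_f1 for epochs 1..25, rounded to six decimals from "log_history".
MACRO_F1 = [
    0.060476, 0.243404, 0.370911, 0.430714, 0.474928,
    0.493168, 0.510526, 0.514574, 0.507876, 0.528415,
    0.524137, 0.527454, 0.532420, 0.526261, 0.534898,
    0.534615, 0.537553, 0.537373, 0.538218, 0.542700,
    0.539179, 0.544308, 0.544210, 0.541693, 0.544655,
]


def linear_lr(step):
    """Assumed schedule: linear warmup to PEAK_LR, then linear decay to 0."""
    warmup_steps = math.ceil(WARMUP_RATIO * MAX_STEPS)
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    return PEAK_LR * (MAX_STEPS - step) / (MAX_STEPS - warmup_steps)


def stopping_epoch(scores, patience=3, threshold=0.001):
    """Return (epoch, best score) at which patience runs out, else (None, best).

    The counter resets only when a score beats the best so far by more than
    `threshold`; the best score itself is updated on any strict improvement.
    """
    best = None
    counter = 0
    for epoch, score in enumerate(scores, start=1):
        if best is None or (score > best and score - best > threshold):
            counter = 0
        else:
            counter += 1
        if best is None or score > best:
            best = score
        if counter >= patience:
            return epoch, best
    return None, best


if __name__ == "__main__":
    print(f"lr at step 500:  {linear_lr(500):.6e}")
    print(f"lr at step 4000: {linear_lr(4000):.6e}")
    print("stopped at (epoch, best F1):", stopping_epoch(MACRO_F1))
```

Under these assumptions the sketch reproduces the logged learning rates at steps 500 and 4000, and the patience counter reaches 3 at epoch 25. The epoch-25 score is the best seen, but it beats epoch 22 by less than the 0.001 threshold, so it does not reset the counter. That would be consistent with `best_model_checkpoint` pointing at `checkpoint-4125` even though `should_training_stop` is true.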
vocab.txt ADDED
The diff for this file is too large to render. See raw diff