Model Card for cisco-ai/SecureBERT2.0-NER
The Secure Modern BERT NER Model is a fine-tuned transformer based on SecureBERT 2.0, designed for Named Entity Recognition (NER) in cybersecurity text.
It extracts domain-specific entities such as Indicators, Malware, Organizations, Systems, and Vulnerabilities from unstructured data sources like threat reports, incident analyses, advisories, and blogs.
NER in cybersecurity enables:
- Automated extraction of indicators of compromise (IOCs)
- Structuring of unstructured threat intelligence text
- Improved situational awareness for analysts
- Faster incident response and vulnerability triage
Model Details
Model Description
- Developed by: Cisco AI
- Model Type: ModernBertForTokenClassification
- Framework: TensorFlow / Transformers
- Tokenizer Type: PreTrainedTokenizerFast
- Number of Labels: 11
- Task: Named Entity Recognition (NER)
- License: Apache-2.0
- Language: English
- Base Model: cisco-ai/SecureBERT2.0
Supported Entity Labels
| Entity | Description |
|---|---|
| B-Indicator, I-Indicator | Indicators of Compromise (e.g., IPs, domains, hashes) |
| B-Malware, I-Malware | Malware or exploit names |
| B-Organization, I-Organization | Companies or groups mentioned |
| B-System, I-System | Affected software or platforms |
| B-Vulnerability, I-Vulnerability | Specific CVEs or flaw descriptions |
| O | Outside token |
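To make the BIO scheme concrete, here is a hand-constructed tagging of a short sentence. The sentence and labels are hypothetical illustration only, not model output:

```python
# Hand-written illustration of the BIO label scheme above; the sentence
# and tags are hypothetical examples, not predictions from the model.
words = ["Stealc", "exploits", "a", "flaw", "in", "Chrome", "to", "target", "Acme", "Corp"]
tags  = ["B-Malware", "O", "O", "O", "O", "B-System", "O", "O", "B-Organization", "I-Organization"]
for word, tag in zip(words, tags):
    print(f"{word:10s} {tag}")
```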
Model Configuration
| Parameter | Value |
|---|---|
| Hidden size | 768 |
| Intermediate size | 1152 |
| Hidden layers | 22 |
| Attention heads | 12 |
| Max sequence length | 8192 |
| Vocabulary size | 50368 |
| Activation | GELU |
| Dropout | 0.0 (embedding, attention, MLP, classifier) |
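These values can be read back from the released configuration. A quick check, assuming the checkpoint's `config.json` matches the table above:

```python
from transformers import AutoConfig

# Load the published configuration and compare it against the table above.
# Values are read from the hub config, not hard-coded here.
config = AutoConfig.from_pretrained("cisco-ai/SecureBERT2.0-NER")
print(config.hidden_size)              # expected: 768
print(config.intermediate_size)        # expected: 1152
print(config.num_hidden_layers)        # expected: 22
print(config.num_attention_heads)      # expected: 12
print(config.max_position_embeddings)  # expected: 8192
print(config.vocab_size)               # expected: 50368
print(config.num_labels)               # expected: 11
print(config.id2label)                 # BIO label mapping shipped with the model
```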
Uses
Direct Use
- Named Entity Recognition (NER) on cybersecurity text
- Threat intelligence enrichment
- IOC extraction and normalization
- Incident report analysis
- Vulnerability mention detection
Downstream Use
This model can be integrated into:
- Threat intelligence platforms (TIPs)
- SOC automation tools
- Cybersecurity knowledge graphs
- Vulnerability management and CVE monitoring systems
Out-of-Scope Use
- Non-technical or general-domain NER tasks
- Generative or conversational AI applications
Benchmark Cybersecurity NER Corpus
Dataset Overview
| Aspect | Description |
|---|---|
| Purpose | Benchmark dataset for extracting cybersecurity entities from unstructured reports |
| Data Source | Curated threat intelligence documents emphasizing malware and system analysis |
| Annotation Methodology | Fully hand-labeled by domain experts |
| Entity Types | Malware, Indicator, System, Organization, Vulnerability |
| Size | 3.4k training samples + 717 test samples |
How to Get Started with the Model
Example Usage (Transformers)
```python
from transformers import AutoTokenizer, TFAutoModelForTokenClassification, pipeline

model_name = "cisco-ai/SecureBERT2.0-NER"

# Load the tokenizer and the fine-tuned token-classification model.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = TFAutoModelForTokenClassification.from_pretrained(model_name)

# Build an NER pipeline and run it on a sample sentence.
ner_pipeline = pipeline("ner", model=model, tokenizer=tokenizer)

text = "Stealc malware targets browser cookies and passwords."
entities = ner_pipeline(text)
print(entities)
```
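By default the pipeline returns per-token predictions. For downstream use (e.g., feeding a TIP or knowledge graph) it is often convenient to merge subword pieces into whole-entity spans. Continuing from the snippet above, a sketch using the standard Transformers aggregation option (not specific to this model); the printed record is illustrative:

```python
# Group subword tokens into whole-entity spans, then keep only the fields
# a downstream system typically needs.
ner_grouped = pipeline("ner", model=model, tokenizer=tokenizer, aggregation_strategy="simple")
for ent in ner_grouped(text):
    record = {
        "type": ent["entity_group"],
        "value": ent["word"],
        "score": round(float(ent["score"]), 3),
    }
    print(record)
# e.g. {'type': 'Malware', 'value': 'Stealc', 'score': 0.99}  (illustrative output)
```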
Training Details
Training Objective and Procedure
SecureBERT2.0-NER was fine-tuned for token-level classification on cybersecurity text using a cross-entropy loss.
Training focused on accurately classifying entity boundaries and types across five cybersecurity-specific categories: Malware, Indicator, System, Organization, and Vulnerability.
The AdamW optimizer was used with a linear learning rate scheduler, and gradient clipping ensured stability during fine-tuning.
Training Configuration
| Setting | Value |
|---|---|
| Objective | Token-wise Cross Entropy |
| Optimizer | AdamW |
| Learning Rate | 1e-5 |
| Weight Decay | 0.001 |
| Batch Size per GPU | 8 |
| Epochs | 20 |
| Max Sequence Length | 1024 |
| Gradient Clipping Norm | 1.0 |
| Scheduler | Linear |
| Mixed Precision | fp16 |
| Framework | TensorFlow / Transformers |
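For reference, the recipe in the table corresponds roughly to the following Transformers `TrainingArguments`. This is an illustrative reconstruction, not the published training script: the card lists TensorFlow as the framework, while `TrainingArguments` targets the PyTorch `Trainer`, whose default optimizer is AdamW and whose default token-classification loss is cross-entropy.

```python
from transformers import TrainingArguments

# Illustrative mapping of the fine-tuning recipe above onto TrainingArguments.
# Hyperparameters are copied from the table; dataset loading and the Trainer
# call are omitted because the original training script is not published.
args = TrainingArguments(
    output_dir="securebert2-ner",
    learning_rate=1e-5,               # AdamW is the Trainer's default optimizer
    weight_decay=0.001,
    per_device_train_batch_size=8,
    num_train_epochs=20,
    max_grad_norm=1.0,                # gradient clipping
    lr_scheduler_type="linear",
    fp16=True,                        # mixed precision (requires a CUDA device)
)
```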
Training Dataset
The model was fine-tuned on a cybersecurity-specific NER corpus, containing annotated threat intelligence reports, advisories, and technical documentation.
| Property | Description |
|---|---|
| Dataset Type | Manually annotated corpus |
| Language | English |
| Entity Types | Malware, Indicator, System, Organization, Vulnerability |
| Train Size | 3,400 samples |
| Test Size | 717 samples |
| Annotation Method | Expert hand-labeling for accuracy and consistency |
Preprocessing
- Texts were tokenized using the `PreTrainedTokenizerFast` tokenizer from SecureBERT 2.0.
- All sequences were truncated or padded to 1024 tokens.
- Labels were aligned with subword tokens to maintain token–label consistency.
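A common way to implement the alignment step is via the fast tokenizer's `word_ids()`, as sketched below. The exact scheme used during training is not specified beyond the bullet above, so the masking of continuation subwords with `-100` (ignored by cross-entropy loss) and the label ids are assumptions for illustration:

```python
from transformers import AutoTokenizer

# Sketch of word-level label alignment to subword tokens.
tokenizer = AutoTokenizer.from_pretrained("cisco-ai/SecureBERT2.0-NER")

words = ["Stealc", "malware", "targets", "browser", "cookies"]
word_labels = [1, 0, 0, 0, 0]  # e.g. 1 = B-Malware, 0 = O (illustrative ids)

enc = tokenizer(words, is_split_into_words=True, truncation=True, max_length=1024)
aligned = []
previous = None
for word_id in enc.word_ids():
    if word_id is None:            # special tokens and padding
        aligned.append(-100)
    elif word_id != previous:      # first subword of a word keeps the label
        aligned.append(word_labels[word_id])
    else:                          # continuation subwords are masked out
        aligned.append(-100)
    previous = word_id
print(aligned)
```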
Hardware and Training Setup
| Component | Description |
|---|---|
| GPUs Used | 8× NVIDIA A100 |
| Precision | Mixed precision (fp16) |
| Batch Size | 8 per GPU |
| Framework | Transformers (TensorFlow backend) |
Optimization Summary
The model converged after approximately 20 epochs, with loss stabilizing at a low level.
Validation metrics (F1, precision, recall) showed steady improvement from epoch 3 onward, confirming effective domain-specific adaptation.
Evaluation
Testing Data, Factors & Metrics
Testing Data
Evaluation was conducted on a cybersecurity-specific NER benchmark corpus containing annotated threat reports, advisories, and incident analysis texts.
This benchmark includes five key entity types: Malware, Indicator, System, Organization, and Vulnerability.
Metrics
The following metrics were used to assess model performance:
- F1-score: Harmonic mean of precision and recall
- Recall: Measures how many true entities were correctly identified
- Precision: Measures how many predicted entities were correct
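The card does not state which toolkit computed these scores; entity-level precision, recall, and F1 for BIO-tagged output are commonly computed with `seqeval` (`pip install seqeval`), as in this illustrative sketch:

```python
from seqeval.metrics import precision_score, recall_score, f1_score

# Entity-level metrics over BIO tag sequences. The lists below are
# illustrative; in practice they come from decoding model predictions
# on the test split.
y_true = [["B-Malware", "O", "O", "B-System", "I-System"]]
y_pred = [["B-Malware", "O", "O", "B-System", "O"]]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("f1:       ", f1_score(y_true, y_pred))
```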
Results
| Model | F1 | Recall | Precision |
|---|---|---|---|
| CyBERT | 0.351 | 0.281 | 0.467 |
| SecureBERT | 0.734 | 0.759 | 0.717 |
| SecureBERT 2.0 (Ours) | 0.945 | 0.965 | 0.927 |
Summary
The SecureBERT 2.0 NER model significantly outperforms both CyBERT and the original SecureBERT across all metrics.
- It achieves an F1-score of 0.945, an absolute improvement of roughly 21 points over SecureBERT.
- Its recall (0.965) indicates excellent coverage of cybersecurity entities.
- Its precision (0.927) shows strong accuracy and low false-positive rates.
This demonstrates that domain-adaptive pretraining and fine-tuning on cybersecurity corpora dramatically improves NER performance compared to general or earlier models.
Reference
```bibtex
@article{aghaei2025securebert,
  title={SecureBERT 2.0: Advanced Language Model for Cybersecurity Intelligence},
  author={Aghaei, Ehsan and Jain, Sarthak and Arun, Prashanth and Sambamoorthy, Arjun},
  journal={arXiv preprint arXiv:2510.00240},
  year={2025}
}
```
Model Card Authors
Cisco AI
Model Card Contact
For inquiries, please contact [email protected]