# Phishing Detection Model (BERT-Large-Uncased)
A transformer-based model fine-tuned to detect phishing content across multiple formats, including emails, URLs, SMS messages, and scripts.
Built on bert-large-uncased, it leverages deep contextual understanding of language to classify text as phishing or benign (97.2% accuracy on the held-out evaluation set; see Evaluation Results below).
## Model Details
- Base model: bert-large-uncased
- Architecture: 24 layers • 1024 hidden size • 16 attention heads • ~336M parameters
- License: Apache 2.0
- Language: English
- Pipeline tag: text-classification
## Model Description
This model was trained to identify phishing-related content by analyzing linguistic and structural patterns commonly found in malicious communications.
By leveraging BERT's bidirectional transformer architecture, it can detect phishing attempts even when a message appears legitimate and well-written; a minimal raw-inference sketch follows the feature list below.
### Key Features
- Detects phishing attempts in text, emails, URLs, and scripts
- Useful for cybersecurity applications, such as email gateways or web filtering systems
- Capable of identifying varied phishing tactics (impersonation, link manipulation, credential harvesting, etc.)
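As a lower-level companion to the pipeline example later in this card, the following is a minimal sketch of the raw classification step: it tokenizes one message, runs a single forward pass, and applies a softmax over the two logits. The model ID is taken from this card; the label names are read from `model.config.id2label` rather than assumed, and the sample text is made up.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_ID = "TestingCapstone/phishing-email-detector-capstone"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

text = "Your account has been suspended. Verify your password at http://example-login.net"

# Tokenize and run one forward pass (BERT accepts at most 512 tokens).
inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the two classes; label names come from the model config.
probs = torch.softmax(logits, dim=-1)[0]
pred_id = int(probs.argmax())
print(model.config.id2label[pred_id], float(probs[pred_id]))
```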
## Intended Uses
Recommended use cases:
- Classify messages, emails, and URLs as phishing or benign
- Integrate into automated security pipelines, email filtering tools, or chat moderation systems (see the filtering sketch at the end of this section)
- Aid in phishing research or awareness programs
Limitations:
- May trigger false positives on legitimate content with financial or urgent language
- Optimized for English text only
- Should be part of a multi-layered defense strategy, not a standalone cybersecurity control
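As a sketch of the multi-layered-defense point above, the wrapper below routes messages to quarantine, human review, or delivery based on the model's label and score. The `PHISHING_LABEL` string, the 0.90 threshold, and the `triage_email` helper are illustrative assumptions, not part of this model card; check `model.config.id2label` for the actual label names and tune the threshold on your own traffic.

```python
from transformers import pipeline

PHISHING_LABEL = "phishing"   # assumption: adjust to the model's actual label name
FLAG_THRESHOLD = 0.90         # assumption: tune on your own data

classifier = pipeline(
    "text-classification",
    model="TestingCapstone/phishing-email-detector-capstone",
)

def triage_email(text: str) -> str:
    """Return 'quarantine', 'review', or 'deliver' for a single message."""
    result = classifier(text, truncation=True, max_length=512)[0]
    is_phishing = result["label"].lower() == PHISHING_LABEL
    if is_phishing and result["score"] >= FLAG_THRESHOLD:
        return "quarantine"   # high-confidence phishing
    if is_phishing:
        return "review"       # low-confidence hit: hand off to another control or a human
    return "deliver"          # treated as benign by this model

print(triage_email("Your mailbox is over quota. Re-validate at http://mail-verify-login.example"))
```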
## Evaluation Results
| Metric | Score |
|---|---|
| Loss | 0.1953 |
| Accuracy | 0.9717 |
| Precision | 0.9658 |
| Recall | 0.9670 |
| False Positive Rate | 0.0249 |
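For reference, the helper below shows how these four metrics are computed from raw predictions, assuming phishing is treated as the positive class (an assumption of this sketch, not stated in the card). The toy labels at the bottom are made up and are not the evaluation data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and FPR with `positive` as the phishing class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p != positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp),            # of everything flagged as phishing, how much really was
        "recall": tp / (tp + fn),               # of all real phishing, how much was caught
        "false_positive_rate": fp / (fp + tn),  # benign messages incorrectly flagged
    }

# Toy example (not the actual evaluation data)
print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 1, 0, 0, 1]))
```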
## Training Details
### Hyperparameters
| Parameter | Value |
|---|---|
| Learning rate | 2e-05 |
| Train batch size | 16 |
| Eval batch size | 16 |
| Seed | 42 |
| Optimizer | Adam (β₁=0.9, β₂=0.999, ε=1e-08) |
| LR scheduler | Linear |
| Epochs | 4 |
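The card does not include the training script itself; the fragment below is only a sketch of how the hyperparameters above map onto the transformers `TrainingArguments` API. The `output_dir` value is a placeholder, and the dataset and `Trainer` wiring are omitted.

```python
from transformers import TrainingArguments

# Mirrors the hyperparameter table above; everything else is left at library defaults.
training_args = TrainingArguments(
    output_dir="phishing-bert-large",  # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    num_train_epochs=4,
    lr_scheduler_type="linear",        # linear LR decay, as listed above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```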
### Training Results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | Precision | Recall | False Positive Rate |
|---|---|---|---|---|---|---|---|
| 0.1487 | 1.0 | 3866 | 0.1454 | 0.9596 | 0.9709 | 0.9320 | 0.0203 |
| 0.0805 | 2.0 | 7732 | 0.1389 | 0.9691 | 0.9663 | 0.9601 | 0.0243 |
| 0.0389 | 3.0 | 11598 | 0.1779 | 0.9683 | 0.9778 | 0.9461 | 0.0156 |
| 0.0091 | 4.0 | 15464 | 0.1953 | 0.9717 | 0.9658 | 0.9670 | 0.0249 |
## Example Inference
Try the model in Python using the transformers library:
```python
from transformers import pipeline

# Load the phishing detection model
classifier = pipeline(
    "text-classification",
    model="TestingCapstone/phishing-email-detector-capstone",
)

# Example texts
examples = [
    "Dear colleague, your email storage is full. Click here to verify your account: https://secure-update-login.com",
    "Hi team, the meeting starts at 2 PM today.",
    "You have won a free gift card! Claim now at http://bit.ly/3xYzabc",
]

# Run inference
for text in examples:
    result = classifier(text)[0]
    print(f"Text: {text}\nPrediction: {result['label']} (score: {result['score']:.4f})\n")
```
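For longer emails or larger volumes, one option (an assumption about usage, not part of the original example) is to truncate inputs to BERT's 512-token limit and let the pipeline process the list in batches:

```python
# Reuse `classifier` and `examples` from above; truncation keeps inputs within
# BERT's 512-token limit, and batch_size controls throughput.
results = classifier(examples, truncation=True, max_length=512, batch_size=8)
for text, result in zip(examples, results):
    print(text[:60], "->", result["label"], round(result["score"], 4))
```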