---
license: mit
language:
- en
metrics:
- precision
- recall
- f1
- accuracy
pipeline_tag: text-classification
tags:
- classification
- security
---

# Model Card for Infinitode/SMCM-OPEN-ARC

Repository: https://github.com/Infinitode/OPEN-ARC/

## Model Description

OPEN-ARC-SMC is a MultinomialNB model developed as part of Infinitode's OPEN-ARC initiative. It classifies text, particularly emails, as either spam or legitimate (ham).

**Architecture**:

- **MultinomialNB**: Multinomial Naive Bayes classifier, used with its default parameters.
- **Framework**: scikit-learn.
- **Training Setup**: Trained with the default parameters.
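
As a minimal sketch of what "default parameters" means here, the snippet below instantiates scikit-learn's `MultinomialNB` with no arguments and prints its settings. It is illustrative only and is not taken from the project's training script.

```python
from sklearn.naive_bayes import MultinomialNB

# No arguments: every hyperparameter keeps its scikit-learn default.
model = MultinomialNB()

# alpha=1.0 is Laplace (add-one) smoothing for word counts;
# fit_prior=True learns the class priors from the training labels.
print(model.get_params())
```

With these defaults, a word that never appeared in spam during training still gets a small non-zero probability, so a single unseen word does not rule a class out.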

## Uses

- Determining whether emails or SMS messages are spam or legitimate.
- Supporting research and the development of defensive measures against spammers.

## Limitations

The model can produce false positives (legitimate messages flagged as spam) and false negatives (spam classified as legitimate) because of the nature of the training data and its inherent limitations.

## Training Data

- Dataset: Spam Mail Classifier dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/mosapabdelghany/spam-mail-classifier/
- Content: Messages categorized as either spam or ham (legitimate emails or SMS).
- Size: 1000 email/SMS messages labeled as spam or ham.
- Preprocessing: Rows with missing values were removed, and the message text was converted into vectors.
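
The two preprocessing steps above can be sketched as follows. This is an illustration under assumptions: the card does not name the vectorizer, so `CountVectorizer` (bag-of-words counts) is assumed, and the `rows` list is a hypothetical stand-in for the Kaggle data.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy rows standing in for the dataset; None marks a missing value.
rows = [
    ("spam", "Win a free prize now"),
    ("ham", "See you at the meeting tomorrow"),
    ("ham", None),         # missing message text
    (None, "Free entry"),  # missing label
]

# Step 1: drop rows with a missing label or a missing message.
clean = [(label, text) for label, text in rows
         if label is not None and text is not None]
labels = [label for label, _ in clean]
texts = [text for _, text in clean]

# Step 2: convert the remaining text into a sparse word-count matrix.
vectorizer = CountVectorizer()  # assumption: vectorizer type is not stated in the card
X = vectorizer.fit_transform(texts)

print(labels)
print(X.shape)  # (number of messages, vocabulary size)
```

Word counts pair naturally with `MultinomialNB`, which models non-negative feature counts; a TF-IDF vectorizer would be another plausible choice.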

## Training Procedure

- Metrics: accuracy, precision, recall, F1
- Train/Test Split: 80% train, 20% test.
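
A hedged sketch of an 80/20 split followed by the four metrics is shown below. The toy corpus, `random_state=42`, stratified splitting, and `CountVectorizer` are all assumptions made for illustration; none of them is confirmed by the card.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus; substitute the real messages and labels.
spam = [
    "win a free prize now",
    "claim your free cash reward today",
    "urgent offer click the link to win",
    "free entry in a weekly prize draw",
    "you have won cash call now",
]
ham = [
    "are we still meeting tomorrow morning",
    "thanks for sending the report",
    "can you call me after lunch",
    "see you at the office at ten",
    "happy birthday hope you have a great day",
]
texts = (spam + ham) * 4
labels = (["spam"] * len(spam) + ["ham"] * len(ham)) * 4

# 80% train / 20% test; random_state and stratify are assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Fit the vectorizer on the training split only, then reuse it on the test split.
vectorizer = CountVectorizer()
model = MultinomialNB()
model.fit(vectorizer.fit_transform(X_train), y_train)
y_pred = model.predict(vectorizer.transform(X_test))

# Precision, recall, and F1 are computed for the `spam` class.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, pos_label="spam", zero_division=0)
recall = recall_score(y_test, y_pred, pos_label="spam", zero_division=0)
f1 = f1_score(y_test, y_pred, pos_label="spam", zero_division=0)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary out of training, so the test metrics are not inflated by leakage.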

## Evaluation Results

| Metric | Value |
| ------ | ----- |
| Testing Accuracy | 98.48% |
| Testing Precision (`spam`) | 96.15% |
| Testing Recall (`spam`) | 93.17% |
| Testing F1 (`spam`) | 94.64% |
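
The reported F1 can be cross-checked against the reported precision and recall, since F1 is their harmonic mean. The short calculation below uses only the values from the table.

```python
# Values for the `spam` class, taken from the table above.
precision = 0.9615
recall = 0.9317

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 (spam): {f1:.2%}")  # prints: F1 (spam): 94.64%
```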

## How to Use

```python
# Assumes `model` (the trained MultinomialNB) and `vectorizer` (the vectorizer
# fitted on the training data) are already available in the current session.
new_emails = [
    "Congratulations! You've won a free prize. Click the link to claim.",  # Likely spam
    "Hi, just confirming our meeting for tomorrow at 10 AM. Thanks.",  # Likely not spam
]

# Vectorize the new emails using the fitted vectorizer
new_emails_vectorized = vectorizer.transform(new_emails)

# Make predictions
predictions = model.predict(new_emails_vectorized)

# Print each email next to its predicted label
for email, prediction in zip(new_emails, predictions):
    print(f"\nEmail: '{email}'")
    print(f"Prediction: {prediction}")
```

## Contact

For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.