---
license: mit
language:
- en
metrics:
- precision
- recall
- f1
- accuracy
pipeline_tag: text-classification
tags:
- classification
- security
---
# Model Card for Infinitode/SMCM-OPEN-ARC
Repository: https://github.com/Infinitode/OPEN-ARC/
## Model Description
OPEN-ARC-SMC is a multinomial Naive Bayes (`MultinomialNB`) model developed as part of Infinitode's OPEN-ARC initiative. It classifies text, particularly emails and SMS messages, as either spam or legitimate (ham).
**Architecture**:
- **MultinomialNB**: Multinomial Naive Bayes classifier.
- **Framework**: scikit-learn.
- **Training Setup**: Trained with default parameters.
## Uses
- Determining whether emails or SMS messages are spam or legitimate.
- Supporting spam research and the development of defensive measures against spammers.
## Limitations
The model can produce false positives (legitimate messages flagged as spam) and false negatives (spam classified as legitimate) because of the nature of the training data and its inherent limitations.
## Training Data
- Dataset: Spam Mail Classifier dataset from Kaggle.
- Source URL: https://www.kaggle.com/datasets/mosapabdelghany/spam-mail-classifier/
- Content: Messages categorized as either spam or ham (legitimate emails or SMS).
- Size: 1000 email/SMS messages labeled as spam or ham.
- Preprocessing: The preprocessing steps included removing missing values and converting text into vectors.
## Training Procedure
- Metrics: accuracy, precision, recall, F1
- Train/Test Split: 80% train, 20% test.
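
The setup described above can be sketched roughly as follows. This is a minimal illustration, not the exact training script: the toy messages are made up, `CountVectorizer` is an assumption (the card only says the text was converted into vectors), and `random_state=42` and `stratify` are choices made here so the example is reproducible on a tiny sample.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy data standing in for the Kaggle dataset.
texts = [
    "Win a free prize now, click here",
    "Claim your cash reward today",
    "Cheap loans approved instantly",
    "You have been selected for a free gift",
    "Urgent: verify your account to win",
    "Are we still meeting at 10 tomorrow?",
    "Please find the report attached",
    "Thanks for dinner last night",
    "Can you send me the slides?",
    "See you at the gym later",
    None,  # a missing message, removed below
]
labels = ["spam"] * 5 + ["ham"] * 6

# Preprocessing step 1: remove missing values.
pairs = [(t, y) for t, y in zip(texts, labels) if t]
texts, labels = zip(*pairs)

# 80% train, 20% test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.2, random_state=42, stratify=labels
)

# Preprocessing step 2: convert text into vectors (assumed: bag-of-words counts).
vectorizer = CountVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)
X_test_vec = vectorizer.transform(X_test)

# MultinomialNB with default parameters.
model = MultinomialNB()
model.fit(X_train_vec, y_train)

# Evaluate with the metrics listed above, treating `spam` as the positive class.
y_pred = model.predict(X_test_vec)
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, pos_label="spam", average="binary", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Scores from a toy sample like this are not meaningful; the point is only the order of the steps: drop missing rows, split, fit the vectorizer on the training split, then fit and evaluate the classifier.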
## Evaluation Results
| Metric | Value |
| ------ | ----- |
| Testing Accuracy | 98.48% |
| Testing Precision (`spam`) | 96.15% |
| Testing Recall (`spam`) | 93.17% |
| Testing F1 (`spam`) | 94.64% |
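
As a quick consistency check on the table, F1 is the harmonic mean of precision and recall, so the reported F1 can be recomputed from the other two rows. The snippet below is only a worked calculation using the values from the table; the variable names are illustrative.

```python
# Values reported in the table for the `spam` class.
precision = 0.9615
recall = 0.9317

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(f"F1 (spam): {f1:.2%}")  # prints "F1 (spam): 94.64%"
```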
## How to Use
```python
# Assumes `vectorizer` (the fitted vectorizer) and `model` (the trained
# MultinomialNB classifier) are already loaded or trained in this session.
new_emails = [
    "Congratulations! You've won a free prize. Click the link to claim.",  # Likely spam
    "Hi, just confirming our meeting for tomorrow at 10 AM. Thanks.",  # Likely not spam
]

# Vectorize the new emails using the fitted vectorizer
new_emails_vectorized = vectorizer.transform(new_emails)

# Make predictions
predictions = model.predict(new_emails_vectorized)

for email, prediction in zip(new_emails, predictions):
    print(f"\nEmail: '{email}'")
    print(f"Prediction: {prediction}")
```
## Contact
For questions or issues, open a GitHub issue or reach out at https://infinitode.netlify.app/forms/contact.