# Toxicity Classifier
A fine-tuned DistilRoBERTa model for detecting toxic comments in text. Trained on the Civil Comments and Real Toxicity Prompts datasets with a balanced class distribution (50k toxic + 50k non-toxic examples).
## Model Details
- Base Model: distilroberta-base
- Task: Binary text classification (Toxic / Non-toxic)
- Training Data: 100k balanced samples from Civil Comments + Real Toxicity Prompts
- Accuracy: 90.88%
- Precision: 90.50%
- Recall: 91.16%
- F1-Score: 90.83%
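The training script itself is not published with this card. As a rough illustration only, a fine-tune of `distilroberta-base` for binary classification could look like the sketch below; the dataset files, column names, and hyperparameters are assumptions, not the settings used for this model.

```python
# Illustrative fine-tuning sketch, NOT the exact script used for this model.
# Assumes a balanced dataset with "text" and "label" (0 = non-toxic, 1 = toxic) columns.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("distilroberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilroberta-base", num_labels=2  # binary: non-toxic vs. toxic
)

# Hypothetical local CSV files; the real model used Civil Comments + Real Toxicity Prompts.
dataset = load_dataset("csv", data_files={"train": "train.csv", "test": "test.csv"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="toxicity-classifier",
    per_device_train_batch_size=16,
    num_train_epochs=3,
    learning_rate=2e-5,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset["train"],
    eval_dataset=dataset["test"],
    tokenizer=tokenizer,  # enables dynamic padding via the default data collator
)
trainer.train()
```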
To Load:
```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("Jaanhavi/toxicity-classifier")
tokenizer = AutoTokenizer.from_pretrained("Jaanhavi/toxicity-classifier")
```
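With the model and tokenizer loaded as above, you can also run inference manually in PyTorch. A minimal sketch; the index-to-label mapping in the comment is taken from the Label Meanings section below.

```python
import torch

text = "You are so stupid and worthless"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

# Softmax over the two classes; index 0 = non-toxic, index 1 = toxic (see Label Meanings)
probs = torch.softmax(logits, dim=-1)[0]
print(f"non-toxic: {probs[0].item():.4f}, toxic: {probs[1].item():.4f}")
```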
To Use (Simple Pipeline):
```python
from transformers import pipeline

classifier = pipeline("text-classification", model="Jaanhavi/toxicity-classifier")

text = "You are so stupid and worthless"
result = classifier(text)
print(result)
# Output: [{'label': 'LABEL_1', 'score': 0.9980}]
# LABEL_0 = Non-toxic, LABEL_1 = Toxic
```
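The pipeline also accepts a list of texts. As a small sketch, reusing the `classifier` from above and assuming a reasonably recent transformers release where `top_k=None` returns scores for both labels:

```python
texts = [
    "You are so stupid and worthless",
    "Have a nice day",
]

# Scores for both labels per input; truncation guards against very long comments.
results = classifier(texts, top_k=None, truncation=True)
for text, scores in zip(texts, results):
    print(text, scores)
```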
## Label Meanings
- LABEL_0: Non-toxic text
- LABEL_1: Toxic text (includes insults, threats, harmful content)
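If you prefer readable names over the raw `LABEL_0` / `LABEL_1` strings, you can map them yourself. A small sketch, reusing the `classifier` pipeline from the usage example above and the mapping just listed:

```python
# Map raw pipeline labels to readable names (mapping taken from the list above).
LABEL_NAMES = {"LABEL_0": "non-toxic", "LABEL_1": "toxic"}

result = classifier("You are so stupid and worthless")[0]
print(LABEL_NAMES[result["label"]], round(result["score"], 4))
```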
## Performance
Tested on 10,000 diverse examples with 90.88% overall accuracy.
Examples:
- "you are so stupid" β TOXIC β
- "I love your work" β NON-TOXIC β
- "go commit suicide" β TOXIC β
- "have a nice day" β NON-TOXIC β