Toxicity Classifier

A fine-tuned DistilRoBERTa model for detecting toxic comments in text. Trained on the Civil Comments and Real Toxicity Prompts datasets with a balanced class distribution (50k toxic + 50k non-toxic examples).

Model Details

  • Base Model: distilroberta-base
  • Task: Binary text classification (Toxic / Non-toxic)
  • Training Data: 100k balanced samples from Civil Comments + Real Toxicity Prompts
  • Model Size: 82.1M parameters (F32, safetensors)
  • Accuracy: 90.88%
  • Precision: 90.50%
  • Recall: 91.16%
  • F1-Score: 90.83%
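
The architecture and label count can be checked directly from the hosted config. This is a quick sketch using the standard transformers AutoConfig API; the exact values printed depend on how the checkpoint was saved.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Jaanhavi/toxicity-classifier")
print(config.model_type)   # base architecture family (DistilRoBERTa reuses the RoBERTa config)
print(config.num_labels)   # 2 for this binary classifier
print(config.id2label)     # generic LABEL_0 / LABEL_1 names (see Label Meanings below)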

To Load:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("Jaanhavi/toxicity-classifier")
tokenizer = AutoTokenizer.from_pretrained("Jaanhavi/toxicity-classifier")
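
Once loaded, a prediction is a single forward pass. This is a minimal sketch continuing from the loading snippet above, assuming PyTorch and the standard transformers API; the 0/1 mapping follows the Label Meanings section below.

import torch

inputs = tokenizer("You are so stupid and worthless", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
prediction = int(probs.argmax())  # 0 = non-toxic, 1 = toxic (see Label Meanings)
print(prediction, round(float(probs[prediction]), 4))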

To Use (Simple Pipeline):

from transformers import pipeline

classifier = pipeline("text-classification", model="Jaanhavi/toxicity-classifier")

text = "You are so stupid and worthless"
result = classifier(text)
print(result)
# Output: [{'label': 'LABEL_1', 'score': 0.9980}]
# LABEL_0 = Non-toxic, LABEL_1 = Toxic

Label Meanings

  • LABEL_0: Non-toxic text
  • LABEL_1: Toxic text (includes insults, threats, harmful content)
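
Because the checkpoint exposes the generic LABEL_0 / LABEL_1 names, it can help to map them to readable strings in application code. A small sketch follows; the label_map dict is written out by hand from the list above rather than read from the model config.

from transformers import pipeline

classifier = pipeline("text-classification", model="Jaanhavi/toxicity-classifier")
label_map = {"LABEL_0": "Non-toxic", "LABEL_1": "Toxic"}  # mapping taken from the list above

result = classifier("have a nice day")[0]
print(f"{label_map[result['label']]} ({result['score']:.4f})")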

Performance

Tested on 10,000 diverse examples with 90.88% overall accuracy.

Examples:

  • "you are so stupid" β†’ TOXIC βœ…
  • "I love your work" β†’ NON-TOXIC βœ…
  • "go commit suicide" β†’ TOXIC βœ…
  • "have a nice day" β†’ NON-TOXIC βœ…