Toxicity Classifier

A fine-tuned DistilRoBERTa model for detecting toxic comments in text. Trained on the Civil Comments and Real Toxicity Prompts datasets with a balanced class distribution (50k toxic + 50k non-toxic examples).

Model Details

  • Base Model: distilroberta-base
  • Task: Binary text classification (Toxic / Non-toxic)
  • Training Data: 100k balanced samples from Civil Comments + Real Toxicity Prompts
  • Model Size: 82.1M parameters (F32, safetensors)
  • Accuracy: 90.88%
  • Precision: 90.50%
  • Recall: 91.16%
  • F1-Score: 90.83%
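
The architecture and label count can be checked directly from the hosted config. This is a quick sketch using the standard transformers AutoConfig API; the exact values printed depend on how the checkpoint was saved.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("Jaanhavi/toxicity-classifier")
print(config.model_type)   # base architecture family (DistilRoBERTa reuses the RoBERTa config)
print(config.num_labels)   # 2 for this binary classifier
print(config.id2label)     # generic LABEL_0 / LABEL_1 names (see Label Meanings below)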

To Load:

from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("Jaanhavi/toxicity-classifier")
tokenizer = AutoTokenizer.from_pretrained("Jaanhavi/toxicity-classifier")
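
Once loaded, a prediction is a single forward pass. This is a minimal sketch continuing from the loading snippet above, assuming PyTorch and the standard transformers API; the 0/1 mapping follows the Label Meanings section below.

import torch

inputs = tokenizer("You are so stupid and worthless", return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
probs = torch.softmax(logits, dim=-1)[0]
prediction = int(probs.argmax())  # 0 = non-toxic, 1 = toxic (see Label Meanings)
print(prediction, round(float(probs[prediction]), 4))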

To Use (Simple Pipeline):

from transformers import pipeline

classifier = pipeline("text-classification", model="Jaanhavi/toxicity-classifier")

text = "You are so stupid and worthless"
result = classifier(text)
print(result)
# Output: [{'label': 'LABEL_1', 'score': 0.9980}]
# LABEL_0 = Non-toxic, LABEL_1 = Toxic

Label Meanings

  • LABEL_0: Non-toxic text
  • LABEL_1: Toxic text (includes insults, threats, harmful content)
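
Because the checkpoint exposes the generic LABEL_0 / LABEL_1 names, it can help to map them to readable strings in application code. A small sketch follows; the label_map dict is written out by hand from the list above rather than read from the model config.

from transformers import pipeline

classifier = pipeline("text-classification", model="Jaanhavi/toxicity-classifier")
label_map = {"LABEL_0": "Non-toxic", "LABEL_1": "Toxic"}  # mapping taken from the list above

result = classifier("have a nice day")[0]
print(f"{label_map[result['label']]} ({result['score']:.4f})")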

Performance

Tested on 10,000 diverse examples with 90.88% overall accuracy.

Examples:

  • "you are so stupid" β†’ TOXIC βœ…
  • "I love your work" β†’ NON-TOXIC βœ…
  • "go commit suicide" β†’ TOXIC βœ…
  • "have a nice day" β†’ NON-TOXIC βœ…