---
license: apache-2.0
datasets:
- nayan90k/cyberbullying-tweets-balanced
language:
- en
base_model:
- FacebookAI/roberta-base
---

# RoBERTa for Cyberbullying Detection

This is a `roberta-base` model fine-tuned for the specific task of detecting cyberbullying and toxic language in text. The model was trained on a diverse, balanced dataset aggregated from multiple public sources, making it robust for real-world chat and social media conversations.

This model is intended to be used as part of a privacy-first system where analysis is performed locally on a user's device.

## Model Description

- **Base Model**: `roberta-base`
- **Fine-tuning Task**: Binary Text Classification (Cyberbullying vs. Not Cyberbullying)
- **Language**: English

## How to Use

The easiest way to use this model is with a `pipeline` from the `transformers` library.

```bash
pip install transformers
```

```python
from transformers import pipeline

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"
classifier = pipeline("text-classification", model=model_name)

results = classifier([
    "I love this project, it's so helpful!",
    "You are a total loser and everyone knows it."
])

print(results)
# Expected Output:
# [
#   {'label': 'LABEL_0', 'score': 0.99...},  # Not bullying
#   {'label': 'LABEL_1', 'score': 0.98...}   # Bullying
# ]
```

## Training Data

This model was trained on a custom, aggregated dataset compiled from several public sources to ensure diversity. The final, cleaned, and balanced dataset is available on the Hub:

- **Dataset**: [nayan90k/cyberbullying-tweets-balanced](https://huggingface.co/datasets/nayan90k/cyberbullying-tweets-balanced)

The dataset contains **136,440** samples, evenly balanced between two classes:

- `0`: Not Cyberbullying
- `1`: Cyberbullying

Data was sourced from Twitter, Wikipedia talk pages, and YouTube comments, among others.

## Training Procedure

The model was fine-tuned for **1 epoch** with the `transformers` `Trainer`, using the following hyperparameters:

- **Learning Rate**: `2e-5`
- **Batch Size**: `16`
- **Optimizer**: AdamW
- **Warmup Steps**: `500`
- **Weight Decay**: `0.01`

The full training script and environment setup can be found in the project's GitHub repository: [github.com/Kamal-Nayan-Kumar/GuardianAI](https://github.com/Kamal-Nayan-Kumar/GuardianAI/tree/master/model).

## Evaluation Results

The model was evaluated on a held-out test set of **13,644** samples, achieving the following results:

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.9000 |
| F1-Score  | 0.9025 |
| Precision | 0.8803 |
| Recall    | 0.9258 |

## Intended Use and Limitations

This model is designed to be a component in a larger system for monitoring online conversations for potential harm, particularly for the safety of younger users.

### Intended Use

- As a backend service for a chat application to flag potentially harmful content in real time.
- To be run locally on a user's device to preserve privacy.

### Limitations and Bias

- The model is trained primarily on English text and will not perform well on other languages or code-mixed text.
- While the dataset is diverse, it may not capture all forms of slang, sarcasm, or context-specific bullying, which can lead to both false positives and false negatives.
- The definition of "cyberbullying" is subjective and can vary culturally. The model's predictions reflect the biases of the original dataset annotators.
- It should be used as a tool to *flag* potential issues for human review, not as a final arbiter of what constitutes bullying; a minimal flagging sketch is shown below.
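
As an illustration of the flag-for-review pattern described above, here is a minimal sketch. The `flag_for_review` helper and the `0.90` threshold are assumptions for demonstration purposes, not part of the released model; only messages the model labels as cyberbullying with high confidence are queued for a human moderator.

```python
from transformers import pipeline

# Illustrative threshold (an assumption, not a value shipped with the model):
# lower it to flag more aggressively, raise it to reduce false positives.
FLAG_THRESHOLD = 0.90

classifier = pipeline(
    "text-classification",
    model="nayan90k/roberta-finetuned-cyberbullying-detection",
)

def flag_for_review(message: str) -> bool:
    """Return True if the message should be queued for human review."""
    result = classifier(message)[0]
    # LABEL_1 corresponds to the 'Cyberbullying' class described in this card.
    return result["label"] == "LABEL_1" and result["score"] >= FLAG_THRESHOLD

if __name__ == "__main__":
    for text in [
        "See you at practice tomorrow!",
        "Nobody wants you here, just quit.",
    ]:
        status = "flag for review" if flag_for_review(text) else "ok"
        print(f"{status}: {text}")
```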