---
license: apache-2.0
datasets:
- nayan90k/cyberbullying-tweets-balanced
language:
- en
base_model:
- FacebookAI/roberta-base
---

# RoBERTa for Cyberbullying Detection

This is a `roberta-base` model fine-tuned for the specific task of detecting cyberbullying and toxic language in text. The model was trained on a diverse, balanced dataset aggregated from multiple public sources, making it robust for real-world chat and social media conversations.

This model is intended to be used as part of a privacy-first system where analysis is performed locally on a user's device.

## Model Description

- **Base Model**: `roberta-base`
- **Fine-tuning Task**: Binary Text Classification (Cyberbullying vs. Not Cyberbullying)
- **Language**: English

## How to Use

The easiest way to use this model is with a `pipeline` from the `transformers` library.

```bash
pip install transformers
```

```python
from transformers import pipeline

model_name = "nayan90k/roberta-finetuned-cyberbullying-detection"
classifier = pipeline("text-classification", model=model_name)

results = classifier([
    "I love this project, it's so helpful!",
    "You are a total loser and everyone knows it."
])

print(results)
# Expected Output:
# [
#   {'label': 'LABEL_0', 'score': 0.99...},  # Not bullying
#   {'label': 'LABEL_1', 'score': 0.98...}   # Bullying
# ]
```

## Training Data

This model was trained on a custom, aggregated dataset compiled from several public sources to ensure diversity. The final, cleaned, and balanced dataset is available on the Hub:

- **Dataset**: [nayan90k/cyberbullying-tweets-balanced](https://huggingface.co/datasets/nayan90k/cyberbullying-tweets-balanced)

The dataset contains **136,440** samples, evenly balanced between two classes:

- `0`: Not Cyberbullying
- `1`: Cyberbullying

Data was sourced from Twitter, Wikipedia talk pages, and YouTube comments, among others.

## Training Procedure

The model was fine-tuned for **1 epoch** with the `transformers` `Trainer`, using the following hyperparameters:

- **Learning Rate**: `2e-5`
- **Batch Size**: `16`
- **Optimizer**: AdamW
- **Warmup Steps**: `500`
- **Weight Decay**: `0.01`

The full training script and environment setup can be found in the project's GitHub repository: [github.com/Kamal-Nayan-Kumar/GuardianAI](https://github.com/Kamal-Nayan-Kumar/GuardianAI/tree/master/model).

## Evaluation Results

The model was evaluated on a held-out test set of **13,644** samples, achieving the following results:

| Metric    | Score  |
|-----------|--------|
| Accuracy  | 0.9000 |
| F1-Score  | 0.9025 |
| Precision | 0.8803 |
| Recall    | 0.9258 |

## Intended Use and Limitations

This model is designed to be a component in a larger system for monitoring online conversations for potential harm, particularly for the safety of younger users.

### Intended Use

- As a backend service for a chat application to flag potentially harmful content in real time.
- To be run locally on a user's device to preserve privacy.

### Limitations and Bias

- The model is trained primarily on English text and will not perform well on other languages or code-mixed text.
- While the dataset is diverse, it may not capture all forms of slang, sarcasm, or context-specific bullying, which can lead to both false positives and false negatives.
- The definition of "cyberbullying" is subjective and can vary culturally. The model's predictions reflect the biases of the original dataset annotators.
- It should be used as a tool to *flag* potential issues for human review, not as a final arbiter of what constitutes bullying; a minimal flagging sketch is shown below.
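
As an illustration of the flag-for-review pattern described above, here is a minimal sketch. The `flag_for_review` helper and the `0.90` threshold are assumptions for demonstration purposes, not part of the released model; only messages the model labels as cyberbullying with high confidence are queued for a human moderator.

```python
from transformers import pipeline

# Illustrative threshold (an assumption, not a value shipped with the model):
# lower it to flag more aggressively, raise it to reduce false positives.
FLAG_THRESHOLD = 0.90

classifier = pipeline(
    "text-classification",
    model="nayan90k/roberta-finetuned-cyberbullying-detection",
)

def flag_for_review(message: str) -> bool:
    """Return True if the message should be queued for human review."""
    result = classifier(message)[0]
    # LABEL_1 corresponds to the 'Cyberbullying' class described in this card.
    return result["label"] == "LABEL_1" and result["score"] >= FLAG_THRESHOLD

if __name__ == "__main__":
    for text in [
        "See you at practice tomorrow!",
        "Nobody wants you here, just quit.",
    ]:
        status = "flag for review" if flag_for_review(text) else "ok"
        print(f"{status}: {text}")
```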