prompt-hate-speech-binary (moderation)
Tiny guardrails for 'prompt-hate-speech-binary' trained on https://huggingface.co/datasets/enguard/multi-lingual-prompt-moderation.
This model is a fine-tuned Model2Vec classifier based on minishlab/potion-base-4m, trained for the prompt-hate-speech-binary classification task in the enguard/multi-lingual-prompt-moderation dataset.
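```bash
pip install model2vec[inference]
```

```python
from model2vec.inference import StaticModelPipeline

# Load the fine-tuned classifier from the Hugging Face Hub.
model = StaticModelPipeline.from_pretrained(
    "enguard/tiny-guard-4m-en-prompt-hate-speech-binary-moderation"
)

# The model classifies single texts; pass each input as one string in a list.
text = "Example sentence"

model.predict([text])        # predicted label for each input
model.predict_proba([text])  # class probabilities for each input
```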
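As a guardrail, the predicted label is typically used to decide whether a prompt is forwarded or held back. The following is a minimal sketch of that step, not part of the model's API: `split_by_verdict` and `stand_in_predict` are hypothetical names, and the sketch assumes the classifier returns the labels `FAIL` and `PASS` shown in the tables of this card.

```python
from typing import Callable, List, Sequence, Tuple


def split_by_verdict(
    prompts: Sequence[str],
    predict: Callable[[List[str]], Sequence[str]],
    blocked_label: str = "FAIL",  # assumption: label names as shown in this card
) -> Tuple[List[str], List[str]]:
    """Return (allowed, blocked) prompts according to the predicted labels."""
    labels = predict(list(prompts))
    allowed: List[str] = []
    blocked: List[str] = []
    for prompt, label in zip(prompts, labels):
        # str() also handles NumPy string scalars returned by a real classifier.
        if str(label) == blocked_label:
            blocked.append(prompt)
        else:
            allowed.append(prompt)
    return allowed, blocked


def stand_in_predict(texts: List[str]) -> List[str]:
    """Hypothetical stand-in so the sketch runs offline; not the real model."""
    return ["FAIL" if "blocked-example" in t else "PASS" for t in texts]


if __name__ == "__main__":
    allowed, blocked = split_by_verdict(
        ["Thanks for the suggestion.", "this is a blocked-example prompt"],
        stand_in_predict,
    )
    print("allowed:", allowed)
    print("blocked:", blocked)
```

With the real classifier loaded as above, `model.predict` can be passed in place of `stand_in_predict`.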
Below is a quick overview of the model variant and core metrics.
| Field | Value |
|---|---|
| Classifies | prompt-hate-speech-binary |
| Base Model | minishlab/potion-base-4m |
| Precision (FAIL class) | 0.9198 |
| Recall (FAIL class) | 0.7831 |
| F1 (FAIL class) | 0.8460 |
| True \ Predicted | FAIL | PASS |
|---|---|---|
| FAIL | 195 | 54 |
| PASS | 17 | 233 |
```json
{
  "FAIL": {
    "precision": 0.9198113207547169,
    "recall": 0.7831325301204819,
    "f1-score": 0.8459869848156182,
    "support": 249.0
  },
  "PASS": {
    "precision": 0.8118466898954704,
    "recall": 0.932,
    "f1-score": 0.8677839851024208,
    "support": 250.0
  },
  "accuracy": 0.8577154308617234,
  "macro avg": {
    "precision": 0.8658290053250937,
    "recall": 0.8575662650602409,
    "f1-score": 0.8568854849590195,
    "support": 499.0
  },
  "weighted avg": {
    "precision": 0.8657208243322486,
    "recall": 0.8577154308617234,
    "f1-score": 0.8569073256406696,
    "support": 499.0
  }
}
```
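The FAIL-class scores in the report can be recomputed from raw counts. The sketch below uses the counts implied by the report's support and recall values (195 of 249 FAIL prompts flagged, 233 of 250 PASS prompts passed, which leaves 54 missed and 17 falsely flagged). The function name `binary_metrics` is hypothetical and chosen for illustration only.

```python
from typing import Dict


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for the positive (here: FAIL) class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Counts derived from the report: FAIL support 249, PASS support 250.
    metrics = binary_metrics(tp=195, fp=17, fn=54, tn=233)
    for name, value in metrics.items():
        print(f"{name}: {value:.4f}")
    # precision: 0.9198
    # recall: 0.7831
    # f1: 0.8460
    # accuracy: 0.8577
```

The gap between precision and recall shows the trade-off of this variant: flagged prompts are rarely false alarms, while roughly one in five FAIL prompts in the test set is not caught.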
| Text | True Label | Predicted Label |
|---|---|---|
| It really only works on bitchy women because they get insulted that I think they look old | FAIL | FAIL |
| You mean the Master of Mesopotamia | PASS | PASS |
| Oh fuck that's so stupid. You believed his father to be dead and you acted like a decent person. Fuck this gay earth | FAIL | FAIL |
| I'm taking it. Thanks for suggestion. | PASS | PASS |
| I think you fell for a person, not a gender. | PASS | PASS |
| Dataset Size | Time (seconds) | Predictions/Second |
|---|---|---|
| 1 | 0.0004 | 2493.64 |
| 500 | 0.0402 | 12439.95 |
| 500 | 0.0312 | 16042.59 |
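Predictions per second is the dataset size divided by the elapsed time; for example, 500 / 0.0402 ≈ 12,438, which is close to the reported 12,439.95 (the small difference presumably comes from the time being rounded to four decimals). How the figures above were measured is not stated in this card, so the following is only a sketch of one way to take a comparable measurement. `measure_throughput` is a hypothetical helper, and `predict` can be any callable that accepts a list of texts.

```python
import time
from typing import Callable, List, Sequence


def measure_throughput(
    predict: Callable[[List[str]], Sequence],
    texts: List[str],
    repeats: int = 3,
) -> float:
    """Return predictions per second, using the fastest of `repeats` runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        predict(texts)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very coarse clocks.
    return len(texts) / max(best, 1e-9)


if __name__ == "__main__":
    # Stand-in predictor so the sketch runs offline; not the real model.
    def stand_in_predict(batch: List[str]) -> List[str]:
        return ["PASS" for _ in batch]

    rate = measure_throughput(stand_in_predict, ["Example sentence"] * 500)
    print(f"{rate:.2f} predictions/second")
```

Passing `model.predict` instead of the stand-in times the real classifier; results will vary with hardware and batch size.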
Below is a general overview of the best-performing models for each dataset variant.
If you use this model, please cite Model2Vec:
```bibtex
@software{minishlab2024model2vec,
  author    = {Stephan Tulkens and {van Dongen}, Thomas},
  title     = {Model2Vec: Fast State-of-the-Art Static Embeddings},
  year      = {2024},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.17270888},
  url       = {https://github.com/MinishLab/model2vec},
  license   = {MIT}
}
```