Part of the prompt-harmfulness-multilabel (moderation) collection: tiny guardrails for prompt-harmfulness-multilabel trained on https://huggingface.co/datasets/enguard/multi-lingual-prompt-moderation.
This model is a fine-tuned Model2Vec classifier based on minishlab/potion-base-4m for the prompt-harmfulness-multilabel task found in the enguard/multi-lingual-prompt-moderation dataset.
pip install model2vec[inference]
from model2vec.inference import StaticModelPipeline
model = StaticModelPipeline.from_pretrained(
"enguard/tiny-guard-4m-en-prompt-harmfulness-multilabel-moderation"
)
# The pipeline expects a list of texts; here we pass a single text:
text = "Example sentence"
model.predict([text])
model.predict_proba([text])
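Since this is a multilabel classifier, a common pattern is to threshold the per-label scores from `predict_proba` to trade precision for recall. The snippet below is a minimal sketch of that pattern, assuming `predict_proba` returns one score per label for each input text; the 0.5 threshold and the example texts are illustrative choices, not values from this card.

```python
from model2vec.inference import StaticModelPipeline

model = StaticModelPipeline.from_pretrained(
    "enguard/tiny-guard-4m-en-prompt-harmfulness-multilabel-moderation"
)

texts = ["Example sentence", "Another example sentence"]

# predict() returns the labels assigned to each text.
print(model.predict(texts))

# predict_proba() is assumed here to return a (n_texts, n_labels) array of
# scores; thresholding it yourself lets you tune precision vs. recall.
THRESHOLD = 0.5  # illustrative choice, not a value from the model card
for text, scores in zip(texts, model.predict_proba(texts)):
    flagged = [i for i, score in enumerate(scores) if score >= THRESHOLD]
    print(text, "->", flagged)
```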
Below is a quick overview of the model variant and core metrics.
| Field | Value |
|---|---|
| Classifies | prompt-harmfulness-multilabel |
| Base Model | minishlab/potion-base-4m |
| Precision (micro avg) | 0.7924 |
| Recall (micro avg) | 0.5663 |
| F1 (micro avg) | 0.6606 |
Full classification report by label index:

| Label | Precision | Recall | F1 | Support |
|---|---|---|---|---|
| 0 | 0.8861 | 0.5230 | 0.6578 | 1979 |
| 1 | 0.6036 | 0.5382 | 0.5690 | 249 |
| 2 | 0.3404 | 0.4571 | 0.3902 | 35 |
| 3 | 0.8079 | 0.7022 | 0.7514 | 581 |
| 4 | 0.6042 | 0.6304 | 0.6170 | 276 |
| micro avg | 0.7924 | 0.5663 | 0.6606 | 3120 |
| macro avg | 0.6484 | 0.5702 | 0.5971 | 3120 |
| weighted avg | 0.8180 | 0.5663 | 0.6615 | 3120 |
| samples avg | 0.2888 | 0.2837 | 0.2821 | 3120 |
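The report above matches the structure of scikit-learn's `classification_report` for multilabel data. As a rough sketch of how such a report can be produced, assuming binary indicator matrices `y_true` and `y_pred` for the five labels (the matrices below are placeholders, not the actual evaluation data):

```python
import numpy as np
from sklearn.metrics import classification_report

# Placeholder multilabel indicator matrices: one row per example,
# one column per label (five labels, as in the report above).
y_true = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 0, 1, 1],
                   [0, 0, 0, 0, 0]])
y_pred = np.array([[1, 0, 0, 0, 0],
                   [0, 0, 0, 1, 0],
                   [0, 1, 0, 0, 0]])

# output_dict=True yields the same structure shown above, including
# per-label support and micro/macro/weighted/samples averages.
report = classification_report(y_true, y_pred, output_dict=True, zero_division=0)
print(report["micro avg"])
```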
Sample predictions:

| Text | True Label | Predicted Label |
|---|---|---|
| If you can fold your penis into the shape of a vulva, you deserve not to be caught. In fact you deserve an ace origami prize. | harassment | harassment |
| I'm surprised they let you stay without buying anything. | | |
| Pirates flew the Jolly Roger.\n\nPimps blew the roger jolly. | | |
| Seriously? You're "depressed" because of a story? Oh, Reddit. | | |
| Yeah what kind of community manager publicly mocks their players achievements? With no clarification, straight douche. | harassment | harassment |
Inference speed at different dataset sizes:

| Dataset Size | Time (seconds) | Predictions/Second |
|---|---|---|
| 1 | 0.0002 | 4466.78 |
| 1000 | 0.1078 | 9276.19 |
| 5452 | 0.3721 | 14653.66 |
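These throughput numbers can be approximated with a simple wall-clock benchmark. The sketch below uses a placeholder corpus named `texts`; actual timings depend on hardware and will differ from the table above.

```python
import time

from model2vec.inference import StaticModelPipeline

model = StaticModelPipeline.from_pretrained(
    "enguard/tiny-guard-4m-en-prompt-harmfulness-multilabel-moderation"
)

# Placeholder corpus; substitute the actual evaluation texts.
texts = ["Example sentence"] * 1000

for size in (1, 1000, len(texts)):
    batch = texts[:size]
    start = time.perf_counter()
    model.predict(batch)
    elapsed = time.perf_counter() - start
    print(f"{size} texts in {elapsed:.4f}s -> {size / elapsed:.2f} predictions/second")
```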
If you use this model, please cite Model2Vec:
@software{minishlab2024model2vec,
author = {Stephan Tulkens and {van Dongen}, Thomas},
title = {Model2Vec: Fast State-of-the-Art Static Embeddings},
year = {2024},
publisher = {Zenodo},
doi = {10.5281/zenodo.17270888},
url = {https://github.com/MinishLab/model2vec},
license = {MIT}
}