Content moderation models and datasets - 2025 - a hfmlsoc Collection

updated 5 days ago

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

Paper • 2310.17389 • Published Oct 26, 2023
lmsys/toxic-chat

Viewer • Updated May 14, 2024 • 20.3k • 3.96k • 169
NemoGuard

Collection

Essential datasets and models for content safety, topic-following, and security guardrails • 11 items • Updated about 2 hours ago • 11
gpt-oss-safeguard

Collection

gpt-oss-safeguard-120b and gpt-oss-safeguard-20b are safety reasoning models built upon gpt-oss • 2 items • Updated 6 days ago • 52
GPT-OSS-safeguard:20b

Collection

MLX Based GPT-OSS-Safeguard models • 5 items • Updated 3 days ago

Note: For MLX users
meta-llama/Llama-Guard-3-11B-Vision

Image-Text-to-Text • 11B • Updated Nov 18, 2024 • 5.91k • 66
meta-llama/Llama-Guard-3-8B

Text Generation • 8B • Updated Oct 11, 2024 • 280k • 242
meta-llama/Prompt-Guard-86M

Text Classification • 0.3B • Updated Jul 25, 2024 • 58k • 279
ShieldGemma

Collection

ShieldGemma is a family of models for text and image content moderation. • 4 items • Updated Jul 10 • 9
A Holistic Approach to Undesired Content Detection in the Real World

Paper • 2208.03274 • Published Aug 5, 2022
unitary/unbiased-toxic-roberta

Text Classification • Updated Aug 18, 2023 • 78.4k • 25
unitary/toxic-bert

Text Classification • 0.1B • Updated Mar 13, 2024 • 636k • 203
KoalaAI/Text-Moderation

Text Classification • 0.1B • Updated Jan 31 • 147k • 81
martin-ha/toxic-comment-model

Text Classification • Updated May 6, 2022 • 1.16M • 65
NemoraAi/roberta-chat-moderation-X

Text Classification • 82.1M • Updated Jun 23 • 6 • 1
eliasalbouzidi/distilbert-nsfw-text-classifier

Text Classification • 67M • Updated May 16 • 9.26k • 22