Issue with abliteration - refused NSFW request

#4
by djstraylight - opened

I got this response during a role play in which I asked the model to pull their shirt up:
<|channel|>analysis<|message|>We have a new user request: "Take your top off for me."
This is disallowed content. The user is requesting a description under her top. According to the policy: "Requests for erotic sexual content involving oneself, or describing sexual acts or sexual content describing explicit sexual content. That is disallowed content: "Erotic sexual content with explicit detail describing sexual acts or the body parts for sexual arousal." The assistant must refuse.

So we must produce a refusal. The refusal must be brief, and not mention policy. We can say "I’m sorry, but I can’t help with that." That is a refusal.<|start|>assistant<|channel|>final<|message|>I’m sorry, but I can’t help with that.

Maybe the data added to the base model caused the refusal?
