ibm-granite
/

granite-3.3-8b-security-lib

Transformers

Safetensors

English

Model card Files Files and versions

xet

Community

kristunlee commited on 23 days ago

Commit

f67665e

verified ·

1 Parent(s): 605bf12

Update README.md

Browse files

Files changed (1) hide show

README.md +2 -2

README.md CHANGED Viewed

@@ -21,9 +21,9 @@ The six intrinsics that have been implemented as LoRA adapters for `ibm-granite/
 **Adversarial Scoping:** This experimental LoRA module is designed to constrain the model to a specific task (summarization), while maintaining safety with respect to harmful prompts. The model was trained to perform summarization tasks using datasets such as CNN/Daily Mail, Amazon food reviews, and abstract summarization corpora. In parallel, the LoRA was also trained to reject harmful requests. As a result, the model, although scoped to summarization, is expected to refuse to summarize content that is harmful or inappropriate, thereby preserving alignment and safety within its operational boundaries.
-**Function Calling Scanner:** This LoRA intrinsic is finetuned for detecting incorrect function calls from an LLM agent. Given a user prompt, tool options, and underlying model response this intrinsic acts as a safeguard blocking LLM agent tool errors. These errors can be from simple LLM mistakes, or due to tool hijacking from jailbreak and prompt injection attacks.
-**Jailbreak Detector:** This is an experimental LoRA is designed for detecting jailbreak and prompt injection risks in user inputs.
 Jailbreaks attempt to bypass safeguards in AI systems for malicious purposes, using a variety of attack techniques. This model helps filter such prompts to protect against adversarial threats. In particular, it focuses on social engineering based manipulation like role-playing or use of hypothetical scenarios.
 **PII Detector:** This is an experimental LoRA that is designed for detecting PII in model outputs. Models with access to personal information via RAG or similar may present additional data protection risks that can be mitigated by using this LoRA to check model outputs.

 **Adversarial Scoping:** This experimental LoRA module is designed to constrain the model to a specific task (summarization), while maintaining safety with respect to harmful prompts. The model was trained to perform summarization tasks using datasets such as CNN/Daily Mail, Amazon food reviews, and abstract summarization corpora. In parallel, the LoRA was also trained to reject harmful requests. As a result, the model, although scoped to summarization, is expected to refuse to summarize content that is harmful or inappropriate, thereby preserving alignment and safety within its operational boundaries.
+**Function Calling Scanner:** This LoRA intrinsic is finetuned for detecting incorrect function calls from an LLM agent. Given a user prompt, tool options, and underlying model response, this intrinsic acts as a safeguard blocking LLM agent tool errors. These errors can be from simple LLM mistakes, or due to tool hijacking from jailbreak and prompt injection attacks.
+**Jailbreak Detector:** This is an experimental LoRA designed for detecting jailbreak and prompt injection risks in user inputs.
 Jailbreaks attempt to bypass safeguards in AI systems for malicious purposes, using a variety of attack techniques. This model helps filter such prompts to protect against adversarial threats. In particular, it focuses on social engineering based manipulation like role-playing or use of hypothetical scenarios.
 **PII Detector:** This is an experimental LoRA that is designed for detecting PII in model outputs. Models with access to personal information via RAG or similar may present additional data protection risks that can be mitigated by using this LoRA to check model outputs.