Update README.md
Browse files
README.md
CHANGED
|
@@ -21,9 +21,9 @@ The six intrinsics that have been implemented as LoRA adapters for `ibm-granite/
|
|
| 21 |
|
| 22 |
**Adversarial Scoping:** This experimental LoRA module is designed to constrain the model to a specific task (summarization), while maintaining safety with respect to harmful prompts. The model was trained to perform summarization tasks using datasets such as CNN/Daily Mail, Amazon food reviews, and abstract summarization corpora. In parallel, the LoRA was also trained to reject harmful requests. As a result, the model, although scoped to summarization, is expected to refuse to summarize content that is harmful or inappropriate, thereby preserving alignment and safety within its operational boundaries.
|
| 23 |
|
| 24 |
-
**Function Calling Scanner:** This LoRA intrinsic is finetuned for detecting incorrect function calls from an LLM agent. Given a user prompt, tool options, and underlying model response this intrinsic acts as a safeguard blocking LLM agent tool errors. These errors can be from simple LLM mistakes, or due to tool hijacking from jailbreak and prompt injection attacks.
|
| 25 |
|
| 26 |
-
**Jailbreak Detector:** This is an experimental LoRA
|
| 27 |
Jailbreaks attempt to bypass safeguards in AI systems for malicious purposes, using a variety of attack techniques. This model helps filter such prompts to protect against adversarial threats. In particular, it focuses on social engineering based manipulation like role-playing or use of hypothetical scenarios.
|
| 28 |
|
| 29 |
**PII Detector:** This is an experimental LoRA that is designed for detecting PII in model outputs. Models with access to personal information via RAG or similar may present additional data protection risks that can be mitigated by using this LoRA to check model outputs.
|
|
|
|
| 21 |
|
| 22 |
**Adversarial Scoping:** This experimental LoRA module is designed to constrain the model to a specific task (summarization), while maintaining safety with respect to harmful prompts. The model was trained to perform summarization tasks using datasets such as CNN/Daily Mail, Amazon food reviews, and abstract summarization corpora. In parallel, the LoRA was also trained to reject harmful requests. As a result, the model, although scoped to summarization, is expected to refuse to summarize content that is harmful or inappropriate, thereby preserving alignment and safety within its operational boundaries.
|
| 23 |
|
| 24 |
+
**Function Calling Scanner:** This LoRA intrinsic is finetuned for detecting incorrect function calls from an LLM agent. Given a user prompt, tool options, and underlying model response, this intrinsic acts as a safeguard blocking LLM agent tool errors. These errors can be from simple LLM mistakes, or due to tool hijacking from jailbreak and prompt injection attacks.
|
| 25 |
|
| 26 |
+
**Jailbreak Detector:** This is an experimental LoRA designed for detecting jailbreak and prompt injection risks in user inputs.
|
| 27 |
Jailbreaks attempt to bypass safeguards in AI systems for malicious purposes, using a variety of attack techniques. This model helps filter such prompts to protect against adversarial threats. In particular, it focuses on social engineering based manipulation like role-playing or use of hypothetical scenarios.
|
| 28 |
|
| 29 |
**PII Detector:** This is an experimental LoRA that is designed for detecting PII in model outputs. Models with access to personal information via RAG or similar may present additional data protection risks that can be mitigated by using this LoRA to check model outputs.
|