---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
---

# Granite 3.3 8B Instruct - RAG Data Leakage LoRA

Welcome to Granite Experiments!

Think of Experiments as a preview of what's to come. These projects are still under development, but we wanted to let the open-source community take them for a spin! Use them, break them, and help us build what's next for Granite - we'll keep an eye out for feedback and questions. Happy exploring!

Just a heads-up: Experiments are forever evolving, so we can't commit to ongoing support or guarantee performance.

## Model Summary

This is a LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct), adding the capability to detect cases in which parts of the context documents are repeated in the model response.

- **Developer:** IBM Research
- **Model type:** LoRA adapter for [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct)
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Intended use

This experimental safeguard is designed to detect and mitigate the risk of sensitive data leakage from RAG documents into model outputs. RAG systems enhance AI responses by retrieving relevant documents from external databases, but this introduces the potential for unintended disclosure of private, proprietary, or sensitive information. This model monitors generated responses to prevent such leaks, especially in scenarios where the retrieved content may be sensitive or confidential.

**RAG data leakage risk detection**: The model identifies potential RAG context leakage when the special role `<|start_of_role|>leakage_detector<|end_of_role|>` is included in the prompt. Without this role, the model behaves like the base model.

### Quickstart Example

The following code shows how to use the LoRA adapter to detect data leakage in a model output.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"


def generate_batch_lora(batch, invocation_prompt, max_new_tokens=1):
    batch["eval_text"] = [x + "\n" + invocation_prompt for x in batch["eval_text"]]
    inputs = tokenizer(
        batch["eval_text"], return_tensors="pt", padding=True, truncation=True
    ).to(device)
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.pad_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    batch["predictions"] = [
        x.split(invocation_prompt)[1]
        for x in tokenizer.batch_decode(outputs, skip_special_tokens=False)
    ]
    return batch


model_name = "ibm-granite/granite-3.3-8b-instruct"
# LoRA download location. We assume the directory layout shown in the
# top-level README.md example for the library.
lora_path = "intrinsics/granite-3.3-8b-instruct-lora-rag-data-leakage"
INVOCATION_PROMPT = "<|start_of_role|>rag_leakage<|end_of_role|>"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
base_model = AutoModelForCausalLM.from_pretrained(
    model_name, trust_remote_code=True, torch_dtype=torch.float16
).to(device)
model = PeftModel.from_pretrained(base_model, lora_path, torch_dtype=torch.float16)

test_sample = {
    "question": "...",
    "output": "...",
    "documents": [{"doc_id": 1, "text": "..."}, {"doc_id": 2, "text": "..."}],
}
test_sample["eval_text"] = [
    tokenizer.apply_chat_template(
        [
            {"role": "user", "content": test_sample["question"]},
            {"role": "assistant", "content": test_sample["output"]},
        ],
        documents=test_sample["documents"],
        tokenize=False,
    )
]

generate_batch_lora(test_sample, INVOCATION_PROMPT)
# The prediction is a number:
# 0     - no leakage
# 1 - 4 - between 1 and 4 sentences were leaked from a single document
# 5     - 5 or more sentences were leaked from a single document
```

## Training Details

The training data is based on three datasets:

1. glaiveai-rag1 - https://huggingface.co/datasets/glaiveai/RAG-v1
2.
golaxy/rag-bench-asqa-mistral - https://huggingface.co/datasets/golaxy/rag-bench/viewer/asqa/mistral_7b?views%5B%5D=asqa_mistral_7b
3. golaxy/rag-bench-webglm-llama2 - https://huggingface.co/datasets/golaxy/rag-bench/viewer/webglm

Each data sample underwent a processing step that generated at least five new samples, each containing an answer that reveals different parts of the context documents.

## Evaluation

A held-out subset of 15% of the datasets above was used as the test set.

| Model | Accuracy | MAE |
| --- | --- | --- |
| rag_leakage_lora | 0.86 | 0.14 |

Confusion matrix:

| Actual \ Pred | 0 | 1 | 2 | 3 | 4 | 5 |
|---------------|------|------|------|------|------|------|
| 0 | 1576 | 78 | 1 | 0 | 0 | 0 |
| 1 | 112 | 1391 | 87 | 12 | 1 | 0 |
| 2 | 3 | 196 | 1262 | 119 | 8 | 2 |
| 3 | 0 | 5 | 189 | 1271 | 151 | 11 |
| 4 | 0 | 0 | 3 | 91 | 1256 | 138 |
| 5 | 0 | 0 | 0 | 3 | 56 | 1352 |

## Contact

Guy Amit, Natalia Razinkov, Shlomit Shachor, Abigail Goldsteen, Kristjan Greenwald
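As a quick sanity check, the Accuracy and MAE reported in the Evaluation section can be recomputed directly from the confusion matrix (rows are actual labels 0-5, columns are predicted labels). This is a plain-Python sketch, not part of the adapter's tooling:

```python
# Confusion matrix copied from the Evaluation section:
# rows = actual leakage label (0-5), columns = predicted label (0-5).
confusion = [
    [1576,   78,    1,    0,    0,    0],
    [ 112, 1391,   87,   12,    1,    0],
    [   3,  196, 1262,  119,    8,    2],
    [   0,    5,  189, 1271,  151,   11],
    [   0,    0,    3,   91, 1256,  138],
    [   0,    0,    0,    3,   56, 1352],
]

total = sum(sum(row) for row in confusion)
correct = sum(confusion[i][i] for i in range(6))
abs_error = sum(
    abs(actual - pred) * count
    for actual, row in enumerate(confusion)
    for pred, count in enumerate(row)
)

print(f"accuracy={correct / total:.2f}, mae={abs_error / total:.2f}")
# accuracy=0.86, mae=0.14
```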