Self-Corrective Llama 3.1 8B
This is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct, augmented with a novel self-correction mechanism designed to mitigate hallucinations. The LoRA adapter has been merged into the base model for easy deployment.
This model features a custom hallucination detection head that works in parallel with the main language model. When it detects a potential error in its own generated text, it can insert special instructions like `[rewrite sentence]` or `[rewrite response]` into the output, effectively flagging its own mistakes for correction. This makes the model more reliable for tasks requiring factual accuracy.
How it Works
The model, an instance of the custom `SelfCorrectiveLlama` class, adds a small, efficient hallucination detection module to the standard Llama architecture. This module analyzes the model's internal states (hidden states) at each generation step to predict the likelihood of a hallucination.
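For intuition, the detection module can be pictured as a small classifier head over the decoder's hidden states. The sketch below is illustrative only; the layer sizes, structure, and names are assumptions, and the actual implementation lives in this repository's `modeling.py`:

```python
import torch
import torch.nn as nn

class HallucinationHeadSketch(nn.Module):
    """Illustrative per-token hallucination classifier (not the repo's exact module)."""

    def __init__(self, hidden_size: int = 4096, head_dim: int = 256):
        super().__init__()
        # A small MLP is assumed here; the real head may differ.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_size, head_dim),
            nn.GELU(),
            nn.Linear(head_dim, 1),
        )

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size) from the final decoder layer
        # returns: (batch, seq_len) probability that each position is hallucinated
        return torch.sigmoid(self.classifier(hidden_states).squeeze(-1))
```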
The model's custom generate method then uses these predictions: when a hallucination is judged likely, it overrides the standard token selection and inserts a corrective instruction instead. Because detection happens inside the same forward pass as generation, this is significantly more efficient than multi-step, agent-based correction pipelines that require separate LLM calls.
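Conceptually, each decoding step looks something like the sketch below. The threshold, the choice of instruction, and the `hallucination_head` attribute name are assumptions made for illustration; the real logic is in `modeling.py`:

```python
import torch

def sketch_decode_step(model, tokenizer, input_ids, threshold=0.5):
    """One illustrative decoding step with a hallucination override (assumed logic)."""
    with torch.no_grad():
        outputs = model(input_ids, output_hidden_states=True)
    # Probability that the most recent position is hallucinated
    p_halluc = model.hallucination_head(outputs.hidden_states[-1][:, -1:, :])
    if p_halluc.item() > threshold:
        # Override normal sampling: splice in the corrective instruction tokens
        return tokenizer("[rewrite sentence]", add_special_tokens=False).input_ids
    # Otherwise fall back to standard greedy selection
    return [outputs.logits[:, -1, :].argmax(dim=-1).item()]
```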
Intended Use & Prompting
This model is intended for tasks where factual accuracy and faithfulness to a source context are critical, such as question answering or summarization.
While it can be used with standard prompts, its self-correction behavior was reinforced during training using a specific instruction. To achieve the best results and fully leverage the self-correction mechanism, include the following note in your system prompt or at the beginning of your input (a chat-template sketch follows the note):
Note on Self-Correction: As you generate your response, you may encounter an automated instruction. This indicates a potential error was detected.
- If you see the instruction `[rewrite sentence]`, it means the preceding sentence is incorrect. You must immediately provide a new, corrected version of that sentence.
- If you see the instruction `[rewrite response]`, it means the entire preceding response is incorrect. You must immediately provide a new, complete response from the beginning.
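If you build prompts with the tokenizer's chat template, the note fits naturally into the system message. A minimal sketch (the user message content is illustrative):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("MathBite/self_corrective_llama_3.1_8B")

self_correction_note = (
    "Note on Self-Correction: As you generate your response, you may encounter "
    "an automated instruction. This indicates a potential error was detected.\n"
    "- If you see the instruction [rewrite sentence], it means the preceding sentence "
    "is incorrect. You must immediately provide a new, corrected version of that sentence.\n"
    "- If you see the instruction [rewrite response], it means the entire preceding "
    "response is incorrect. You must immediately provide a new, complete response "
    "from the beginning."
)
messages = [
    {"role": "system", "content": self_correction_note},
    {"role": "user", "content": "Who designed the Eiffel Tower?"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
```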
How to Use
Because this model uses a custom architecture with a modified `generate` method, you must pass `trust_remote_code=True` when loading it. The required `modeling.py` file is included in this repository.
Note: the custom `generate` method currently only supports a batch size of 1; see the sequential-processing sketch after the example below.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "MathBite/self_corrective_llama_3.1_8B"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Important: you must trust the remote code so the custom class can load
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,  # or your preferred dtype
).to("cuda")  # move the model to the GPU

# Example prompt with the self-correction instruction
prompt = """
...
Note on Self-Correction: As you generate your response, you may encounter an automated instruction. This indicates a potential error was detected.
- If you see the instruction `[rewrite sentence]`, it means the preceding sentence is incorrect. You must immediately provide a new, corrected version of that sentence.
- If you see the instruction `[rewrite response]`, it means the entire preceding response is incorrect. You must immediately provide a new, complete response from the beginning.
---
Context: The Eiffel Tower is a wrought-iron lattice tower on the Champ de Mars in Paris, France. It is named after the engineer Gustave Eiffel, whose company designed and built the tower.
Question: Who was the first person to climb the Eiffel Tower?
"""

inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

# The custom generate method requires the tokenizer instance
generated_ids = model.generate(
    inputs.input_ids,
    tokenizer=tokenizer,
    max_new_tokens=100,
    temperature=0.7,
)

generated_text = tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(generated_text)
```
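Because the custom `generate` method only supports a batch size of 1, multiple prompts have to be processed sequentially. A simple loop (the prompt strings are placeholders):

```python
prompts = ["First prompt ...", "Second prompt ..."]
completions = []
for p in prompts:
    # Process one prompt at a time, mirroring the single-prompt example above
    ids = tokenizer(p, return_tensors="pt").input_ids.to("cuda")
    out = model.generate(ids, tokenizer=tokenizer, max_new_tokens=100, temperature=0.7)
    completions.append(tokenizer.decode(out[0], skip_special_tokens=True))
```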
Model Details
This model was programmatically merged and uploaded using a deployment script. The custom class `SelfCorrectiveLlama` can be found in the `modeling.py` file included in this repository.
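The merge step was presumably done with something like PEFT's `merge_and_unload`; a sketch under that assumption (the adapter path is hypothetical, since the adapter itself is not published here):

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Hypothetical reconstruction of the merge; the actual deployment script is not included.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
merged = PeftModel.from_pretrained(base, "path/to/lora_adapter").merge_and_unload()
merged.save_pretrained("self_corrective_llama_3.1_8B")
```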
The code in `modeling.py` is licensed under the Apache 2.0 License. The model weights are subject to the original license of the base model.