# Qwen-4B-Instruct-2505-Self-correct

This is a debiased variant of Qwen3-4B-Instruct-2507, trained with the Sherlock self-correction approach to detect and revise biased reasoning.
## Model Description
- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Training Method: LoRA (Low-Rank Adaptation) with QLoRA (4-bit quantization), then merged
- Task: Bias mitigation and self-correction
- Framework: PyTorch + Transformers + PEFT
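The LoRA + QLoRA setup listed above can be sketched as a configuration fragment. This is an illustrative reconstruction, not the actual training config: the rank, alpha, dropout, and target modules are assumptions typical of QLoRA fine-tunes, not values confirmed for this model.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit (QLoRA-style) quantization for the frozen base weights
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter configuration (r, alpha, dropout, target_modules are assumptions)
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# After training, the adapter would be folded into the base weights
# (e.g. via PeftModel.merge_and_unload()) to produce this merged checkpoint.
```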
## Training Details

This model was trained with the Sherlock framework, which proceeds in two stages:
- Stage I (SFT): Supervised fine-tuning on bias correction examples
- Stage II (Offline): Preference learning with DPO + Self-Correction loss
## Key Features
- Self-correction capability for biased reasoning
- Trajectory-level preference learning
- Dynamic β adaptation based on divergence points
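The preference objective behind these features can be sketched in plain Python. This is an illustrative reconstruction, not the actual Sherlock training code: the β schedule and the toy log-probabilities are assumptions, and `dynamic_beta` is a hypothetical helper showing one way β could depend on the divergence point.

```python
import math

def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))

def dynamic_beta(base_beta: float, divergence_step: int, traj_len: int) -> float:
    """Hypothetical schedule: weight the preference more strongly when the
    chosen and rejected trajectories diverge early in the reasoning."""
    return base_beta * (1.0 - divergence_step / max(traj_len, 1))

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float) -> float:
    """Standard DPO loss on summed trajectory-level log-probabilities."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(sigmoid(margin))

# Toy example: trajectories diverge at step 3 of 12, chosen one preferred.
beta = dynamic_beta(base_beta=0.1, divergence_step=3, traj_len=12)
loss = dpo_loss(logp_chosen=-20.0, logp_rejected=-24.0,
                ref_logp_chosen=-21.0, ref_logp_rejected=-23.0, beta=beta)
```

With β = 0 the loss reduces to log 2 regardless of the margin; an earlier divergence point yields a larger β and hence a sharper preference signal.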
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained(
    "fenffef/Qwen-4B-Instruct-2505-Self-correct",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("fenffef/Qwen-4B-Instruct-2505-Self-correct")

# Generate
messages = [
    {"role": "user", "content": "Your prompt here"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True)
print(response)
```
## Citation
If you use this model, please cite:
```bibtex
@article{sherlock2024,
  title={Sherlock: Self-Correcting Framework for Bias Mitigation},
  author={Your Name},
  year={2024}
}
```
## License
This model is released under the Apache 2.0 license.