---
license: apache-2.0
base_model: Qwen/Qwen3-4B-Instruct-2507
tags:
- text-generation
- bias-mitigation
- self-correction
- sherlock
- lora
- peft
library_name: transformers
---

# Qwen-4B-Instruct-2505-Self-correct

This is a **Sherlock-style debiasing model** trained with a self-correction approach for bias mitigation.

## Model Description

- **Base Model**: [Qwen/Qwen3-4B-Instruct-2507](https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507)
- **Training Method**: LoRA (Low-Rank Adaptation) with QLoRA (4-bit quantization), adapters merged into the base weights after training
- **Task**: Bias mitigation and self-correction
- **Framework**: PyTorch + Transformers + PEFT

## Training Details

This model was trained using the Sherlock framework, which includes:

1. **Stage I (SFT)**: Supervised fine-tuning on bias-correction examples
2. **Stage II (Offline)**: Preference learning with DPO plus a self-correction loss

### Key Features

- Self-correction capability for biased reasoning
- Trajectory-level preference learning
- Dynamic β adaptation based on divergence points

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the merged model and its tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "fenffef/Qwen-4B-Instruct-2505-Self-correct",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained("fenffef/Qwen-4B-Instruct-2505-Self-correct")

# Build a chat prompt and generate
messages = [
    {"role": "user", "content": "Your prompt here"}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)

# Decode only the newly generated tokens, skipping the echoed prompt
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```

## Training Configuration

The original training configuration file is not included in this repository; a hypothetical QLoRA setup consistent with the description above is sketched in the appendix at the end of this card.

## Citation

If you use this model, please cite:

```bibtex
@article{sherlock2024,
  title={Sherlock: Self-Correcting Framework for Bias Mitigation},
  author={Your Name},
  year={2024}
}
```

## License

This model is released under the Apache 2.0 license.
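
## Appendix: Hypothetical QLoRA Setup

Since the actual configuration file is not available, the following is a minimal sketch of a QLoRA setup matching the training description: the base model loaded in 4-bit, LoRA adapters attached via PEFT, and the adapters merged after training. All hyperparameter values (rank, alpha, dropout, target modules) are illustrative assumptions, not the values actually used.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# 4-bit (QLoRA-style) quantization of the base model
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)

# Hypothetical LoRA settings; the actual rank/alpha/targets were not published
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()

# After Stage I (SFT) and Stage II (DPO + self-correction) training,
# the adapters would be merged back into the base weights, e.g.:
# merged = model.merge_and_unload()
```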