---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceH4/ultrafeedback_binarized
language:
- en
base_model:
- Qwen/Qwen2-0.5B-Instruct
pipeline_tag: text-generation
---

# Model Card for Qwen2-0.5B-DPO

## Model Details

- **Base Model:** Qwen/Qwen2-0.5B-Instruct
- **Fine-tuning Method:** Direct Preference Optimization (DPO)
- **Framework:** Unsloth
- **Quantization:** 4-bit QLoRA (during training)

## Uses

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "VinitT/Qwen2-0.5B-DPO",
    dtype = None,          # auto-detect the best dtype for the GPU
    load_in_4bit = False,
)
FastLanguageModel.for_inference(model)  # enable Unsloth's fast inference mode

messages = [{"role": "user", "content": "Hello, how can I develop a habit of drawing daily?"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
)
inputs = {k: v.to(model.device) for k, v in inputs.items()}

# Generate a response
outputs = model.generate(
    **inputs,
    max_new_tokens=100,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)

# Decode only the newly generated tokens (not the prompt)
prompt_len = inputs["input_ids"].shape[-1]
response = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(response.strip())
```
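To illustrate the Direct Preference Optimization objective named under Model Details, here is a minimal, self-contained sketch of the per-pair DPO loss. This is not the training code used for this model (that ran through Unsloth with 4-bit QLoRA); the function name and the default `beta=0.1` are illustrative assumptions. The inputs are summed token log-probabilities of the chosen and rejected responses under the policy and the frozen reference model.

```python
import math

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """DPO loss for one preference pair: -log sigmoid(beta * margin),
    where the margin compares policy-vs-reference log-ratios of the
    chosen and rejected responses."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed stably with log1p
    return math.log1p(math.exp(-logits))
```

When the policy prefers the chosen response more strongly than the reference does, the margin grows and the loss falls toward zero; with no preference margin the loss sits at `log 2`.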