---
license: apache-2.0
base_model: google/gemma-2b
tags:
- trl
- sft
- generated_from_trainer
- gemma
- conversational
model-index:
- name: gemma-bpo-sft
  results: []
---

# Gemma-2B Fine-tuned with SFT (400 Steps)

This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the [won-bae/bpo_preference_hh_data](https://huggingface.co/datasets/won-bae/bpo_preference_hh_data) dataset using Supervised Fine-Tuning (SFT).

## Model Details

### Model Description
- **Developed by:** MohamadBazzi
- **Model type:** Conversational Language Model
- **Language(s):** English
- **License:** Apache 2.0
- **Finetuned from model:** google/gemma-2b

## Training Details

### Training Data
- **Dataset:** [won-bae/bpo_preference_hh_data](https://huggingface.co/datasets/won-bae/bpo_preference_hh_data)
- **Dataset Size:** BPO preference data from Anthropic HH-RLHF
- **Format:** Conversational (prompt-response pairs)

### Training Procedure

#### Training Hyperparameters
- **Training regime:** Best-of-n Preference Optimization (BPO) + Supervised Fine-tuning (SFT)
- **Training steps:** 400 (checkpoint from 750 total steps)
- **Learning rate:** 1e-5
- **Batch size:** 4 per device
- **Gradient accumulation:** 4 steps
- **Sequence length:** 512 tokens
- **Optimizer:** AdamW
- **LoRA configuration:** r=32, alpha=64, dropout=0.05
- **Precision:** bfloat16 with 4-bit quantization

## Usage

### Direct Use
This model is intended for conversational AI applications and can generate helpful, harmless responses to user queries.

### Inference

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the model
model = AutoModelForCausalLM.from_pretrained(
    "MohamadBazzi/gemma-bpo-sft",
    torch_dtype=torch.bfloat16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("MohamadBazzi/gemma-bpo-sft")

# Format your prompt
prompt = "Human: Hello, can you help me with Python programming?

Assistant: "

# Generate response
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_length=512,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
    pad_token_id=tokenizer.eos_token_id
)

response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

### Chat Format
The model expects input in conversational format similar to Anthropic's HH format:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("MohamadBazzi/gemma-bpo-sft")
model = AutoModelForCausalLM.from_pretrained("MohamadBazzi/gemma-bpo-sft")

# Example usage
prompt = "Human: What are the key principles of machine learning?

Assistant: "
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=200, temperature=0.7)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## Model Performance

This checkpoint represents an intermediate stage in the training process:
- **Checkpoint:** 400/750 steps
- **Training method:** SFT
- **Alignment:** Optimized for helpful and harmless responses

## Training Infrastructure

This model was trained using:
- **Compute Canada's Digital Research Alliance (Narval cluster)**
- **TRL (Transformer Reinforcement Learning) library**
- **Hugging Face Transformers**
- **LoRA/PEFT for efficient fine-tuning**

## Limitations

- This is an intermediate checkpoint and may not represent the fully converged model
- Intended primarily for research purposes
- May exhibit biases present in the training data

## Citation

```bibtex
@model{gemma-bpo-sft-2024,
  title={Gemma-2B Fine-tuned with BPO and SFT},
  author={MohamadBazzi},
  year={2024},
  url={https://huggingface.co/MohamadBazzi/gemma-bpo-sft}
}
```