# 🎓 Model Training Guide

This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) to create custom character models like our Pepe generator.

---

## 📖 Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Dataset Preparation](#dataset-preparation)
- [Training Configuration](#training-configuration)
- [Running the Training](#running-the-training)
- [Model Upload](#model-upload)

---

## 🎯 Overview

### What is LoRA?

**LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:

- ✅ Trains only a small fraction of parameters (~0.5% of the full model)
- ✅ Requires significantly less VRAM (~10GB vs 40GB+)
- ✅ Maintains base model quality while adding custom styles
- ✅ Produces small, portable adapter files (~100MB vs 4GB+)
- ✅ Can be combined with other LoRAs

### Our Training Setup

**Model**: Pepe the Frog LoRA
**Base**: Stable Diffusion v1.5
**Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
**Result**: [MJaheen/Pepe_The_Frog_model_v1_lora](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
**Training Time**: ~2-3 hours on a T4 GPU (Google Colab)

---

## 🛠️ Prerequisites

### Hardware Requirements

**Minimum**:
- GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
- RAM: 16GB system RAM
- Storage: 20GB free space

**Recommended**:
- GPU: NVIDIA A100, V100, or RTX 4090
- RAM: 32GB system RAM
- Storage: 50GB+ SSD

### Software Requirements

```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install "peft>=0.11.0"  # quoted so the shell doesn't treat >= as a redirect
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # for the 8-bit Adam optimizer (optional)
```

---

## 📂 Dataset Preparation

### Dataset Structure

Your dataset should follow this structure:

```
dataset/
├── image_1.png
├── image_2.png
├── image_3.png
└── metadata.jsonl  # or metadata.csv
```

### Metadata Format

**Option 1: JSONL (Recommended)**

```jsonl
{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
```

**Option 2: CSV**

```csv
file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"
```

### Dataset Best Practices

1. **Image Quality**
   - Resolution: 512x512 or higher
   - Format: PNG or JPG
   - Clear, well-lit images
   - Varied poses and expressions

2. **Caption Quality**
   - Include the trigger word (e.g., `pepe_style_frog`)
   - Describe key features and actions
   - Be consistent in naming conventions
   - 5-15 words per caption is optimal

3. **Dataset Size**
   - Minimum: 20-50 images
   - Optimal: 100-500 images
   - More images = better generalization

4. **Diversity**
   - Various angles and poses
   - Different expressions
   - Multiple backgrounds
   - Different lighting conditions
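Before training, it helps to confirm that the dataset actually loads with the captions attached. Below is a minimal sketch using the 🤗 `datasets` `imagefolder` loader; the `dataset/` path and `prompt` column assume the layout shown above.

```python
from datasets import load_dataset

# "imagefolder" picks up metadata.jsonl automatically and attaches its
# extra columns (here: "prompt") to each image.
dataset = load_dataset("imagefolder", data_dir="dataset/", split="train")

print(dataset)                   # columns should include "image" and "prompt"
print(dataset[0]["prompt"])      # first caption, e.g. "pepe_style_frog, ..."
print(dataset[0]["image"].size)  # (width, height) of the first image (PIL)
```

If the `prompt` column is missing, double-check that `metadata.jsonl` sits next to the images and that its `file_name` values match the actual filenames.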
### Our Pepe Dataset

We used **[iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)**, which contains:

- ~200 high-quality Pepe images
- Consistent 512x512 resolution
- Varied expressions and styles
- Pre-captioned with the trigger word

---

## ⚙️ Training Configuration

### Training Hyperparameters

Here's the exact configuration we used for the Pepe model:

```bash
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --caption_column="prompt" \
  --image_column="image" \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir="./output" \
  --rank=16 \
  --validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
  --validation_epochs=5 \
  --seed=42 \
  --mixed_precision="fp16" \
  --checkpointing_steps=150
```

### Parameter Explanation

| Parameter | Value | Description |
|-----------|-------|-------------|
| `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
| `train_data_dir` | `/path/to/data` | Path to your dataset |
| `resolution` | `512` | Image resolution (512x512) |
| `train_batch_size` | `1` | Batch size per GPU |
| `gradient_accumulation_steps` | `4` | Effective batch size = 1 × 4 = 4 |
| `max_train_steps` | `2000` | Total training steps |
| `learning_rate` | `1e-4` | Initial learning rate |
| `lr_scheduler` | `cosine` | Learning rate schedule |
| `rank` | `16` | LoRA rank (higher = more parameters) |
| `mixed_precision` | `fp16` | Use 16-bit precision for speed |
| `checkpointing_steps` | `150` | Save a checkpoint every N steps |

### Hyperparameter Tuning Tips

**Learning Rate**:
- Too high: training is unstable, poor quality
- Too low: slow convergence, underfitting
- Recommended: `1e-4` to `1e-5`

**LoRA Rank**:
- Lower (4-8): faster training, smaller files, less expressive
- Medium (16-32): balanced (recommended)
- Higher (64-128): more expressive, larger files, risk of overfitting

**Training Steps**:
- Small dataset (20-50 images): 500-1000 steps
- Medium dataset (50-200 images): 1000-2000 steps
- Large dataset (200+ images): 2000-5000 steps

**Batch Size**:
- Depends on available VRAM
- Effective batch size = `batch_size × gradient_accumulation_steps`
- Recommended effective batch size: 4-8

---

## 🚀 Running the Training

### Option 1: Google Colab (Recommended for Beginners)

1. **Open the Notebook**:
   - Use our provided notebook: `diffusion_model_finetuning.ipynb`
   - Or create a new Colab notebook

2. **Set Up the GPU**:
   ```
   Runtime → Change runtime type → GPU (T4)
   ```

3. **Mount Google Drive** (optional):
   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   ```

4. **Install Dependencies**:
   ```python
   !pip install -q diffusers transformers accelerate peft
   ```

5. **Upload the Dataset**:
   - Upload to Google Drive
   - Or download from Hugging Face

6. **Run Training**:
   ```python
   !accelerate launch train_text_to_image_lora.py \
     --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
     --train_data_dir="/content/drive/MyDrive/pepe-data" \
     --max_train_steps=2000 \
     --learning_rate=1e-4 \
     --output_dir="./output"
   ```

7. **Monitor Progress** (see the TensorBoard snippet after this list):
   - Watch the loss decrease
   - Check the validation images
   - Save checkpoints to Drive
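One convenient way to watch the loss and validation images from inside Colab is TensorBoard. This is a sketch under two assumptions: the training script logs to TensorBoard (the default `--report_to="tensorboard"` in the `diffusers` example scripts) and writes its logs under `<output_dir>/logs`.

```python
# Load the TensorBoard notebook extension and point it at the run logs.
# Assumes --output_dir="./output" and the script's default logging subdirectory.
%load_ext tensorboard
%tensorboard --logdir ./output/logs
```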
### Generate a Test Image

Once training finishes, load the adapter and generate a quick validation image. The pipeline-loading lines below are filled in to make this fragment runnable; adjust the LoRA path to match your `--output_dir`:

```python
from diffusers import StableDiffusionPipeline

# Load the base model and attach the trained LoRA weights
# (path matches --output_dir="./output" from the training command)
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights("./output")

# Generate test image
image = pipe("pepe_style_frog, wizard casting spells").images[0]
image.save("validation.png")
```

---

## 📤 Model Upload

### Prepare for Upload

1. **Test Locally**:
   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("./output")

   # Test
   image = pipe("pepe_style_frog, happy pepe").images[0]
   image.save("test.png")
   ```

2. **Prepare Files**:
   ```
   output/
   ├── pytorch_lora_weights.safetensors  # Main file
   ├── README.md                         # Model card
   └── sample_images/                    # Example outputs
   ```

### Upload to Hugging Face

1. **Install the Hub CLI**:
   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```

2. **Create a Model Card** (`README.md`):
   ````markdown
   ---
   license: creativeml-openrail-m
   base_model: runwayml/stable-diffusion-v1-5
   tags:
   - stable-diffusion
   - lora
   - text-to-image
   ---

   # Pepe LoRA Model

   Fine-tuned LoRA for generating Pepe the Frog images.

   ## Usage

   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
   image = pipe("pepe_style_frog, happy pepe").images[0]
   ```
   ````

3. **Upload**:
   ```python
   from huggingface_hub import HfApi

   api = HfApi()
   api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
   api.upload_folder(
       folder_path="./output",
       repo_id="YOUR_USERNAME/pepe-lora",
       repo_type="model"
   )
   ```

### Common Issues

**Out of Memory**:
- Reduce `train_batch_size` to 1
- Enable `--gradient_checkpointing`
- Use `--mixed_precision="fp16"`
- Reduce the image resolution
- A combined example is sketched below
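Putting these options together, a lower-memory variant of the training command might look like the following sketch. `--gradient_checkpointing` and `--use_8bit_adam` are flags of the `diffusers` `train_text_to_image_lora.py` example script (`--use_8bit_adam` requires `bitsandbytes`), and the reduced `--resolution=384` is an illustrative choice:

```bash
# Memory-saving variant of the earlier training command (sketch)
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --resolution=384 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --use_8bit_adam \
  --mixed_precision="fp16" \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --output_dir="./output"
```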