# 🎓 Model Training Guide
This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) for creating custom character models like our Pepe generator.

---
## 📖 Table of Contents
- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Dataset Preparation](#dataset-preparation)
- [Training Configuration](#training-configuration)
- [Running the Training](#running-the-training)
- [Model Upload](#model-upload)
---
## 🎯 Overview
### What is LoRA?
**LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:
- ✅ Trains only a small fraction of the parameters (~0.5% of the full model)
- ✅ Requires significantly less VRAM than full fine-tuning (~10 GB vs. 40 GB+)
- ✅ Keeps the base model frozen, so its general quality is preserved while a custom style is added
- ✅ Produces small, portable adapter files (from a few MB up to ~100 MB depending on rank, vs. 4 GB+ for a full checkpoint)
- ✅ Can be combined with other LoRAs
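To make the parameter savings concrete, here is a minimal pure-Python sketch of the LoRA idea for a single linear layer: the frozen weight `W` is augmented with a low-rank update `B @ A`, scaled by `alpha / rank`. The helper names and the 768x768 layer size are illustrative assumptions, not taken from the training script; real implementations (such as PEFT) work on tensors and patch many layers at once.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters of a LoRA adapter for one d_out x d_in weight.

    The update is delta_W = B @ A with B: d_out x rank and A: rank x d_in.
    """
    return rank * (d_out + d_in)


def matvec(m: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def lora_forward(w: Matrix, a: Matrix, b: Matrix, x: Vector,
                 alpha: float, rank: int) -> Vector:
    """y = W x + (alpha / rank) * B (A x); W itself is never modified."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = alpha / rank
    return [yb + scale * yu for yb, yu in zip(base, update)]


# Hypothetical 768 x 768 projection adapted with rank 16
full = 768 * 768                          # frozen weights in the layer
lora = lora_param_count(768, 768, 16)     # trainable adapter weights
print(f"{lora:,} / {full:,} = {lora / full:.1%}")

# In standard LoRA initialisation B starts at zero,
# so before training the adapted layer behaves exactly like the base layer.
w = [[1.0, 2.0], [3.0, 4.0]]
a = [[0.5, -0.5]]          # rank 1: A is 1 x 2
b = [[0.0], [0.0]]         # B is 2 x 1, zero-initialised
print(lora_forward(w, a, b, [1.0, 1.0], alpha=1.0, rank=1))
```

The per-layer share here (about 4%) is larger than the whole-model figure above because only a subset of layers, typically the attention projections, receives adapters while everything else stays untouched.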
### Our Training Setup
- **Model**: Pepe the Frog LoRA
- **Base**: Stable Diffusion v1.5
- **Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
- **Result**: [MJaheen/Pepe_The_Frog_model_v1_lora](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
- **Training Time**: ~2-3 hours on a T4 GPU (Google Colab)
---
## 🛠️ Prerequisites
### Hardware Requirements
**Minimum**:
- GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
- RAM: 16GB system RAM
- Storage: 20GB free space
**Recommended**:
- GPU: NVIDIA A100, V100, or RTX 4090
- RAM: 32GB system RAM
- Storage: 50GB+ SSD
### Software Requirements
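```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install "peft>=0.11.0"  # quoted so the shell does not treat ">" as a redirect
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # Optional: enables the 8-bit Adam optimizer (--use_8bit_adam)
```
The `train_text_to_image_lora.py` script used below is not part of the pip package; it ships with the diffusers repository under `examples/text_to_image/`.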
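Before launching a long run, it can help to confirm which of the pinned packages are actually installed. A minimal sketch using only the standard library (`importlib.metadata`); the `PINS` dictionary mirrors the versions above, and treating `None` as "any version" is our own convention:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Exact pins from the install commands above; None means "any version"
PINS: Dict[str, Optional[str]] = {
    "diffusers": "0.31.0",
    "transformers": "4.45.1",
    "accelerate": "0.34.2",
    "safetensors": "0.4.4",
    "peft": None,      # installed with a lower bound (>=0.11.0), not checked here
    "datasets": None,
}


def check_pins(pins: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Return 'ok (<version>)', 'missing' or 'mismatch (...)' for each package."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        if expected is None or installed == expected:
            report[name] = f"ok ({installed})"
        else:
            report[name] = f"mismatch (installed {installed}, expected {expected})"
    return report


if __name__ == "__main__":
    for pkg, status in check_pins(PINS).items():
        print(f"{pkg:>13}: {status}")
```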
---
## 📂 Dataset Preparation
### Dataset Structure
Your dataset should follow this structure:
```
dataset/
├── image_1.png
├── image_2.png
├── image_3.png
└── metadata.jsonl  # or metadata.csv
```
### Metadata Format
**Option 1: JSONL (Recommended)**
```jsonl
{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
```
**Option 2: CSV**
```csv
file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"
```
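If your captions live in a Python dictionary (or a spreadsheet you can load into one), a small helper can write `metadata.jsonl` in the format shown above. This is a minimal sketch; `write_metadata` and its prepend-the-trigger-word behaviour are our own illustrative choices, not part of any library:

```python
import json
import tempfile
from pathlib import Path
from typing import Dict

TRIGGER = "pepe_style_frog"


def write_metadata(data_dir: str, captions: Dict[str, str],
                   trigger: str = TRIGGER) -> Path:
    """Write <data_dir>/metadata.jsonl with one {"file_name", "prompt"} row per image.

    The trigger word is prepended to any caption that does not already start with it.
    """
    out = Path(data_dir) / "metadata.jsonl"
    with out.open("w", encoding="utf-8") as f:
        for file_name, caption in sorted(captions.items()):
            caption = caption.strip()
            if not caption.startswith(trigger):
                caption = f"{trigger}, {caption}"
            f.write(json.dumps({"file_name": file_name, "prompt": caption}) + "\n")
    return out


if __name__ == "__main__":
    # Demo in a throwaway folder; point data_dir at your real dataset instead
    with tempfile.TemporaryDirectory() as d:
        path = write_metadata(d, {
            "image_1.png": "happy pepe smiling",
            "image_2.png": "pepe_style_frog, sad pepe crying",
        })
        print(path.read_text(encoding="utf-8"))
```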
### Dataset Best Practices
1. **Image Quality**
   - Resolution: 512x512 or higher
   - Format: PNG or JPG
   - Clear, well-lit images
   - Varied poses and expressions
2. **Caption Quality**
   - Include the trigger word (e.g., `pepe_style_frog`) in every caption
   - Describe key features and actions
   - Use consistent naming conventions
   - Aim for roughly 5-15 words per caption
3. **Dataset Size**
   - Minimum: 20-50 images
   - Optimal: 100-500 images
   - More varied images generally improve generalization
4. **Diversity**
   - Various angles and poses
   - Different expressions
   - Multiple backgrounds
   - Different lighting conditions
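Part of this checklist can be automated. The sketch below is a hypothetical helper (not part of diffusers or datasets) that cross-checks `metadata.jsonl` against the image files in the folder and flags captions without the trigger word. It only handles the JSONL option, and image quality and diversity still need a human eye:

```python
import json
import tempfile
from pathlib import Path
from typing import List

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def validate_dataset(data_dir: str, trigger: str = "pepe_style_frog") -> List[str]:
    """Return human-readable problems found in an imagefolder-style dataset."""
    root = Path(data_dir)
    images = {p.name for p in root.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES}
    meta = root / "metadata.jsonl"
    if not meta.exists():
        return ["metadata.jsonl not found"]
    problems: List[str] = []
    listed = set()
    for line_no, line in enumerate(meta.read_text(encoding="utf-8").splitlines(), 1):
        if not line.strip():
            continue
        row = json.loads(line)
        name, prompt = row.get("file_name"), row.get("prompt") or ""
        listed.add(name)
        if name not in images:
            problems.append(f"line {line_no}: {name} has no matching image")
        if trigger not in prompt:
            problems.append(f"line {line_no}: caption lacks trigger word '{trigger}'")
    # Images present on disk but never mentioned in the metadata
    for name in sorted(images - listed):
        problems.append(f"{name}: image has no caption")
    return problems


if __name__ == "__main__":
    # Demo with empty placeholder images; use your dataset path in practice
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        for name in ("image_1.png", "image_2.png"):
            (root / name).write_bytes(b"")
        (root / "metadata.jsonl").write_text(
            json.dumps({"file_name": "image_1.png", "prompt": "happy pepe"}) + "\n",
            encoding="utf-8",
        )
        for problem in validate_dataset(d):
            print(problem)
```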
### Our Pepe Dataset
We used **[iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)** which contains:
- ~200 high-quality Pepe images
- Consistent 512x512 resolution
- Varied expressions and styles
- Pre-captioned with trigger word
---
## ⚙️ Training Configuration
### Training Hyperparameters
Here's the exact configuration we used for the Pepe model:
```bash
accelerate launch train_text_to_image_lora.py \
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
--train_data_dir="/path/to/pepe-data" \
--caption_column="prompt" \
--image_column="image" \
--resolution=512 \
--center_crop \
--random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--max_train_steps=2000 \
--learning_rate=1e-4 \
--lr_scheduler="cosine" \
--lr_warmup_steps=0 \
--output_dir="./output" \
--rank=16 \
--validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
--validation_epochs=5 \
--seed=42 \
--mixed_precision="fp16" \
--checkpointing_steps=150
```
### Parameter Explanation
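| Parameter | Value | Description |
|-----------|-------|-------------|
| `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
| `train_data_dir` | `/path/to/pepe-data` | Path to your dataset folder (images + metadata file) |
| `caption_column` | `prompt` | Metadata column holding the captions (the script defaults to `text`) |
| `image_column` | `image` | Column holding the images |
| `resolution` | `512` | Training resolution; images are resized to 512x512 |
| `center_crop` / `random_flip` | (flags) | Center-crop instead of random-crop; randomly flip images horizontally |
| `train_batch_size` | `1` | Batch size per GPU |
| `gradient_accumulation_steps` | `4` | Effective batch size = 1 × 4 = 4 |
| `max_train_steps` | `2000` | Total number of optimizer steps |
| `learning_rate` | `1e-4` | Peak learning rate |
| `lr_scheduler` | `cosine` | Learning rate schedule (decays toward 0) |
| `lr_warmup_steps` | `0` | No warmup phase |
| `rank` | `16` | LoRA rank (higher = more trainable parameters) |
| `validation_prompt` / `validation_epochs` | see command | Generate sample images from this prompt every 5 epochs |
| `seed` | `42` | Random seed for reproducibility |
| `mixed_precision` | `fp16` | 16-bit mixed precision to save memory and speed up training |
| `checkpointing_steps` | `150` | Save a checkpoint every 150 steps |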
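With `lr_scheduler="cosine"` and `lr_warmup_steps=0`, the learning rate falls from its peak to zero along half a cosine wave over the run. The function below is a minimal sketch of that shape, written to mirror the usual cosine-with-warmup formula (as in `diffusers.optimization.get_cosine_schedule_with_warmup`); it is meant for intuition, not as a drop-in replacement for the scheduler object the script builds:

```python
import math


def cosine_lr(step: int, total_steps: int, peak_lr: float,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Settings from the Pepe run: peak 1e-4, 2000 steps, no warmup
    for step in (0, 500, 1000, 1500, 2000):
        print(f"step {step:>4}: lr = {cosine_lr(step, 2000, 1e-4):.2e}")
```

At step 1000 the rate is half the peak (5e-5), and it reaches 0 at step 2000.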
### Hyperparameter Tuning Tips
**Learning Rate**:
- Too high: Training unstable, poor quality
- Too low: Slow convergence, underfitting
- Recommended: `1e-4` to `1e-5`
**LoRA Rank**:
- Lower (4-8): Faster training, smaller files, less expressive
- Medium (16-32): Balanced (recommended)
- Higher (64-128): More expressive, larger files, risk of overfitting
**Training Steps**:
- Small dataset (20-50 images): 500-1000 steps
- Medium dataset (50-200 images): 1000-2000 steps
- Large dataset (200+ images): 2000-5000 steps
**Batch Size**:
- Depends on VRAM availability
- Effective batch size = `batch_size × gradient_accumulation_steps`
- Recommended effective batch size: 4-8
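A quick way to sanity-check these settings is to work out how many images the model sees and how often checkpoints land. The sketch below assumes the training script counts `max_train_steps` in optimizer updates (one update per `gradient_accumulation_steps` batches) and that the dataloader keeps the last partial batch; the `plan_training` helper is hypothetical, and ~200 images is the approximate size of our Pepe dataset:

```python
import math
from dataclasses import dataclass


@dataclass
class TrainingPlan:
    effective_batch_size: int
    updates_per_epoch: int
    epochs: int
    images_seen: int
    checkpoints_saved: int


def plan_training(num_images: int, batch_size: int, grad_accum: int,
                  max_steps: int, checkpointing_steps: int) -> TrainingPlan:
    """Back-of-the-envelope numbers for a LoRA run (one step = one optimizer update)."""
    batches_per_epoch = math.ceil(num_images / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    return TrainingPlan(
        effective_batch_size=batch_size * grad_accum,
        updates_per_epoch=updates_per_epoch,
        epochs=math.ceil(max_steps / updates_per_epoch),
        images_seen=max_steps * batch_size * grad_accum,  # approximate
        checkpoints_saved=max_steps // checkpointing_steps,
    )


if __name__ == "__main__":
    # Pepe run: ~200 images, batch 1, accumulation 4, 2000 steps, checkpoint every 150
    print(plan_training(200, 1, 4, 2000, 150))
```

With these numbers each image is seen about 40 times, which is worth keeping in mind when applying the step-count guidelines above to a smaller dataset.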
---
## 🚀 Running the Training
### Option 1: Google Colab (Recommended for Beginners)
1. **Open the Notebook**:
   - Use our provided notebook: `diffusion_model_finetuning.ipynb`
   - Or create a new Colab notebook
2. **Set Up the GPU**:
   ```
   Runtime → Change runtime type → GPU (T4)
   ```
3. **Mount Google Drive** (optional):
   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   ```
4. **Install Dependencies**:
   ```python
   !pip install -q diffusers transformers accelerate peft datasets
   ```
5. **Upload the Dataset**:
   - Upload it to Google Drive
   - Or download it from Hugging Face
6. **Run Training** (the script comes from the diffusers repository's `examples/text_to_image/` folder):
   ```python
   !accelerate launch train_text_to_image_lora.py \
     --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
     --train_data_dir="/content/drive/MyDrive/pepe-data" \
     --caption_column="prompt" \
     --resolution=512 \
     --rank=16 \
     --max_train_steps=2000 \
     --learning_rate=1e-4 \
     --mixed_precision="fp16" \
     --output_dir="./output"
   ```
7. **Monitor Progress**:
   - Watch the training loss decrease
   - Check the validation images
   - Save checkpoints to Drive (for example, point `--output_dir` at a folder under `/content/drive/MyDrive`)
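The training script saves checkpoints as `checkpoint-<step>` folders inside `--output_dir`. When copying them to Drive, pick the newest one by its numeric step, since plain string sorting would rank `checkpoint-300` above `checkpoint-1200`. A minimal sketch; the helper names and any Drive path you pass in are assumptions:

```python
import re
import shutil
import tempfile
from pathlib import Path
from typing import Optional

CKPT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> folder with the highest step, or None."""
    candidates = []
    for p in Path(output_dir).iterdir():
        m = CKPT_PATTERN.match(p.name)
        if m and p.is_dir():
            candidates.append((int(m.group(1)), p))  # compare steps numerically
    return max(candidates)[1] if candidates else None


def backup_latest(output_dir: str, backup_dir: str) -> Optional[Path]:
    """Copy the newest checkpoint into backup_dir (e.g. a folder on Google Drive)."""
    ckpt = latest_checkpoint(output_dir)
    if ckpt is None:
        return None
    dest = Path(backup_dir) / ckpt.name
    shutil.copytree(ckpt, dest, dirs_exist_ok=True)
    return dest


if __name__ == "__main__":
    # Demo with empty folders; in Colab you might call
    # backup_latest("./output", "/content/drive/MyDrive/pepe-checkpoints")
    with tempfile.TemporaryDirectory() as out, tempfile.TemporaryDirectory() as drive:
        for step in (150, 300, 1200):
            (Path(out) / f"checkpoint-{step}").mkdir()
        print(latest_checkpoint(out).name)  # checkpoint-1200
        print(backup_latest(out, drive))
```

To continue training from the newest checkpoint instead of copying it, the script also accepts `--resume_from_checkpoint="latest"`.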
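### Generate a Test Image
After training, load the LoRA weights from `./output` and generate an image to check the result:
```python
from diffusers import StableDiffusionPipeline
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
pipe.load_lora_weights("./output")
# Generate test image
image = pipe("pepe_style_frog, wizard casting spells").images[0]
image.save("validation.png")
```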
## 📤 Model Upload
### Prepare for Upload
1. **Test Locally**:
   ```python
   from diffusers import StableDiffusionPipeline
   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to("cuda")
   pipe.load_lora_weights("./output")
   # Test
   image = pipe("pepe_style_frog, happy pepe").images[0]
   image.save("test.png")
   ```
2. **Prepare Files** (training also leaves `checkpoint-*` folders in `output/`; these do not need to be published):
   ```
   output/
   ├── pytorch_lora_weights.safetensors  # Main file
   ├── README.md                         # Model card
   └── sample_images/                    # Example outputs
   ```
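Before uploading, it can be worth confirming that the weights file holds what you expect. The sketch below reads only the JSON header of a `.safetensors` file (the format starts with an 8-byte little-endian header length followed by a JSON header describing each tensor), so it needs neither torch nor the safetensors package. The `inspect_safetensors` helper is our own; the tensor names and counts you see depend on the script version and on `--rank`, so treat the printout as a sanity check rather than a fixed expectation:

```python
import json
import math
import struct
from pathlib import Path
from typing import Dict, Tuple


def inspect_safetensors(path: str) -> Tuple[int, int, Dict[str, str]]:
    """Return (tensor_count, parameter_count, metadata) by reading only the header."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # u64, little-endian
        header = json.loads(f.read(header_len))
    metadata = header.pop("__metadata__", {})  # optional string-to-string map
    params = sum(math.prod(info["shape"]) for info in header.values())
    return len(header), params, metadata


if __name__ == "__main__":
    weights = Path("./output/pytorch_lora_weights.safetensors")
    if weights.exists():
        count, params, meta = inspect_safetensors(str(weights))
        print(f"{count} tensors, {params:,} parameters, metadata={meta}")
```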
### Upload to Hugging Face
1. **Install the Hub CLI**:
   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```
2. **Create a Model Card** (save it as `output/README.md`):
   ````markdown
   ---
   license: creativeml-openrail-m
   base_model: runwayml/stable-diffusion-v1-5
   tags:
   - stable-diffusion
   - lora
   - text-to-image
   ---
   # Pepe LoRA Model
   Fine-tuned LoRA for generating Pepe the Frog images. Include the trigger word `pepe_style_frog` in your prompts.
   ## Usage
   ```python
   from diffusers import StableDiffusionPipeline
   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("YOUR_USERNAME/pepe-lora")
   image = pipe("pepe_style_frog, happy pepe").images[0]
   ```
   ````
3. **Upload**:
   ```python
   from huggingface_hub import HfApi
   api = HfApi()
   api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model", exist_ok=True)
   api.upload_folder(
       folder_path="./output",
       repo_id="YOUR_USERNAME/pepe-lora",
       repo_type="model",
       ignore_patterns=["checkpoint-*"],  # skip intermediate training checkpoints
   )
   ```
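### Common Issues
**Out of Memory**:
- Reduce `train_batch_size` to 1 (and raise `gradient_accumulation_steps` to keep the same effective batch size)
- Enable `--gradient_checkpointing`
- Use `--mixed_precision="fp16"`
- Use `--use_8bit_adam` (requires `bitsandbytes`)
- Reduce `--resolution`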