Pepe-Meme-Generator / docs /TRAINING.md
MJaheen's picture
Delete unused files , fix documentation
db4bc77
|
raw
history blame
8.73 kB

πŸŽ“ Model Training Guide

This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) for creating custom character models like our Pepe generator.


πŸ“– Table of Contents


🎯 Overview

What is LoRA?

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that:

  • βœ… Trains only a small fraction of parameters (~0.5% of full model)
  • βœ… Requires significantly less VRAM (~10GB vs 40GB+)
  • βœ… Maintains base model quality while adding custom styles
  • βœ… Produces small, portable adapter files (~100MB vs 4GB+)
  • βœ… Can be combined with other LoRAs

Our Training Setup

Model: Pepe the Frog LoRA
Base: Stable Diffusion v1.5
Dataset: iresidentevil/pepe_the_frog
Result: MJaheen/Pepe_The_Frog_model_v1_lora
Training Time: ~2-3 hours on T4 GPU (Google Colab)


πŸ› οΈ Prerequisites

Hardware Requirements

Minimum:

  • GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
  • RAM: 16GB system RAM
  • Storage: 20GB free space

Recommended:

  • GPU: NVIDIA A100, V100, or RTX 4090
  • RAM: 32GB system RAM
  • Storage: 50GB+ SSD

Software Requirements

# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install peft>=0.11.0
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # For 8-bit Adam optimizer (optional)

πŸ“‚ Dataset Preparation

Dataset Structure

Your dataset should follow this structure:

dataset/
β”œβ”€β”€ image_1.png
β”œβ”€β”€ image_2.png
β”œβ”€β”€ image_3.png
└── metadata.jsonl  # or metadata.csv

Metadata Format

Option 1: JSONL (Recommended)

{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}

Option 2: CSV

file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"

Dataset Best Practices

  1. Image Quality

    • Resolution: 512x512 or higher
    • Format: PNG or JPG
    • Clear, well-lit images
    • Varied poses and expressions
  2. Caption Quality

    • Include trigger word (e.g., pepe_style_frog)
    • Describe key features and actions
    • Be consistent in naming conventions
    • 5-15 words per caption optimal
  3. Dataset Size

    • Minimum: 20-50 images
    • Optimal: 100-500 images
    • More images = better generalization
  4. Diversity

    • Various angles and poses
    • Different expressions
    • Multiple backgrounds
    • Different lighting conditions

Our Pepe Dataset

We used iresidentevil/pepe_the_frog which contains:

  • ~200 high-quality Pepe images
  • Consistent 512x512 resolution
  • Varied expressions and styles
  • Pre-captioned with trigger word

βš™οΈ Training Configuration

Training Hyperparameters

Here's the exact configuration we used for the Pepe model:

accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --caption_column="prompt" \
  --image_column="image" \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir="./output" \
  --rank=16 \
  --validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
  --validation_epochs=5 \
  --seed=42 \
  --mixed_precision="fp16" \
  --checkpointing_steps=150

Parameter Explanation

Parameter Value Description
pretrained_model_name_or_path runwayml/stable-diffusion-v1-5 Base model to fine-tune
train_data_dir /path/to/data Path to your dataset
resolution 512 Image resolution (512x512)
train_batch_size 1 Batch size per GPU
gradient_accumulation_steps 4 Effective batch size = 1 * 4 = 4
max_train_steps 2000 Total training steps
learning_rate 1e-4 Initial learning rate
lr_scheduler cosine Learning rate schedule
rank 16 LoRA rank (higher = more parameters)
mixed_precision fp16 Use 16-bit precision for speed
checkpointing_steps 150 Save checkpoint every N steps

Hyperparameter Tuning Tips

Learning Rate:

  • Too high: Training unstable, poor quality
  • Too low: Slow convergence, underfitting
  • Recommended: 1e-4 to 1e-5

LoRA Rank:

  • Lower (4-8): Faster training, smaller files, less expressive
  • Medium (16-32): Balanced (recommended)
  • Higher (64-128): More expressive, larger files, risk of overfitting

Training Steps:

  • Small dataset (20-50 images): 500-1000 steps
  • Medium dataset (50-200 images): 1000-2000 steps
  • Large dataset (200+ images): 2000-5000 steps

Batch Size:

  • Depends on VRAM availability
  • Effective batch size = batch_size Γ— gradient_accumulation_steps
  • Recommended effective batch size: 4-8

πŸš€ Running the Training

Option 1: Google Colab (Recommended for Beginners)

  1. Open the Notebook:

    • Use our provided notebook: diffusion_model_finetuning.ipynb
    • Or create new Colab notebook
  2. Setup GPU:

    Runtime β†’ Change runtime type β†’ GPU (T4)
    
  3. Mount Google Drive (optional):

    from google.colab import drive
    drive.mount('/content/drive')
    
  4. Install Dependencies:

    !pip install -q diffusers transformers accelerate peft
    
  5. Upload Dataset:

    • Upload to Google Drive
    • Or download from Hugging Face
  6. Run Training:

    !accelerate launch train_text_to_image_lora.py \
      --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
      --train_data_dir="/content/drive/MyDrive/pepe-data" \
      --max_train_steps=2000 \
      --learning_rate=1e-4 \
      --output_dir="./output"
    
  7. Monitor Progress:

    • Watch loss decrease
    • Check validation images
    • Save checkpoints to Drive

Generate test image

image = pipe("pepe_style_frog, wizard casting spells").images[0] image.save("validation.png")



## πŸ“€ Model Upload

### Prepare for Upload

1. **Test Locally**:
   ```python
   from diffusers import StableDiffusionPipeline
   
   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("./output")
   
   # Test
   image = pipe("pepe_style_frog, happy pepe").images[0]
   image.save("test.png")
  1. Prepare Files:
    output/
    β”œβ”€β”€ pytorch_lora_weights.safetensors  # Main file
    β”œβ”€β”€ README.md  # Model card
    └── sample_images/  # Example outputs
    

Upload to Hugging Face

  1. Install Hub CLI:

    pip install huggingface_hub
    huggingface-cli login
    
  2. Create Model Card (README.md): ```markdown

    license: creativeml-openrail-m base_model: runwayml/stable-diffusion-v1-5 tags: - stable-diffusion - lora - text-to-image

    Pepe LoRA Model

    Fine-tuned LoRA for generating Pepe the Frog images.

    Usage

    from diffusers import StableDiffusionPipeline
    
    pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
    pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
    
    image = pipe("pepe_style_frog, happy pepe").images[0]
    
    
    
  3. Upload:

    from huggingface_hub import HfApi
    
    api = HfApi()
    api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
    api.upload_folder(
        folder_path="./output",
        repo_id="YOUR_USERNAME/pepe-lora",
        repo_type="model"
    )
    

Common Issues

Out of Memory:

  • Reduce train_batch_size to 1
  • Enable --gradient_checkpointing
  • Use --mixed_precision="fp16"
  • Reduce image resolution