# Model Training Guide
This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) for creating custom character models like our Pepe generator.
## Overview

### What is LoRA?
LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that:

- Trains only a small fraction of parameters (~0.5% of the full model)
- Requires significantly less VRAM (~10GB vs 40GB+)
- Maintains base model quality while adding custom styles
- Produces small, portable adapter files (~100MB vs 4GB+)
- Can be combined with other LoRAs
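The parameter savings follow directly from the low-rank factorization: instead of updating a full `d × k` weight matrix, LoRA trains two thin matrices `B` (`d × r`) and `A` (`r × k`) and adds their product to the frozen weight. A minimal numpy sketch with hypothetical layer dimensions (not SD v1.5's actual layer sizes):

```python
import numpy as np

d, k, r = 768, 768, 16  # hypothetical layer shape and LoRA rank

W = np.random.randn(d, k)         # frozen base weight
A = np.random.randn(r, k) * 0.01  # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection (initialized to zero)

# Forward pass: base output plus the low-rank update
x = np.random.randn(k)
y = W @ x + B @ (A @ x)

# Trainable fraction for this layer: 2*d*r parameters vs d*k
lora_params = A.size + B.size
full_params = W.size
print(f"trainable fraction: {lora_params / full_params:.2%}")
```

Because `B` starts at zero, the adapted layer initially reproduces the base model exactly; training then learns the residual. The ~0.5% figure above is lower than this single-layer fraction because only a subset of the model's layers (typically the attention projections) get adapters.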
### Our Training Setup

- **Model**: Pepe the Frog LoRA
- **Base**: Stable Diffusion v1.5
- **Dataset**: `iresidentevil/pepe_the_frog`
- **Result**: `MJaheen/Pepe_The_Frog_model_v1_lora`
- **Training Time**: ~2-3 hours on a T4 GPU (Google Colab)
## Prerequisites

### Hardware Requirements
**Minimum:**

- GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
- RAM: 16GB system RAM
- Storage: 20GB free space

**Recommended:**

- GPU: NVIDIA A100, V100, or RTX 4090
- RAM: 32GB system RAM
- Storage: 50GB+ SSD
### Software Requirements

```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install "peft>=0.11.0"   # quotes stop the shell from treating >= as a redirect
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # for the 8-bit Adam optimizer (optional)
```
## Dataset Preparation

### Dataset Structure

Your dataset should follow this structure:

```
dataset/
├── image_1.png
├── image_2.png
├── image_3.png
└── metadata.jsonl  # or metadata.csv
```
### Metadata Format

**Option 1: JSONL (Recommended)**

```jsonl
{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
```

**Option 2: CSV**

```csv
file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"
```
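Either format can be generated programmatically. A short sketch (with hypothetical file names and captions) that writes a `metadata.jsonl` and reads it back to confirm every line is valid JSON:

```python
import json

captions = {
    "image_1.png": "pepe_style_frog, happy pepe smiling",
    "image_2.png": "pepe_style_frog, sad pepe crying",
}

# One JSON object per line, matching the JSONL format above
with open("metadata.jsonl", "w") as f:
    for file_name, prompt in captions.items():
        f.write(json.dumps({"file_name": file_name, "prompt": prompt}) + "\n")

# Read back and validate
with open("metadata.jsonl") as f:
    rows = [json.loads(line) for line in f]
print(rows[0])
```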
### Dataset Best Practices

#### Image Quality

- Resolution: 512x512 or higher
- Format: PNG or JPG
- Clear, well-lit images
- Varied poses and expressions
#### Caption Quality

- Include the trigger word (e.g., `pepe_style_frog`)
- Describe key features and actions
- Be consistent in naming conventions
- 5-15 words per caption is optimal
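These caption rules are easy to enforce automatically before training. A small sketch (the checks below are assumptions derived from the guidelines above, not part of any training script):

```python
TRIGGER = "pepe_style_frog"

def check_caption(caption: str) -> list[str]:
    """Return a list of problems with a caption (empty list means it passes)."""
    problems = []
    if TRIGGER not in caption:
        problems.append("missing trigger word")
    n_words = len(caption.split())
    if not 5 <= n_words <= 15:
        problems.append(f"{n_words} words (5-15 recommended)")
    return problems

print(check_caption("pepe_style_frog, happy pepe smiling at the camera"))  # []
print(check_caption("a frog"))  # fails both checks
```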
#### Dataset Size

- Minimum: 20-50 images
- Optimal: 100-500 images
- More images = better generalization
#### Diversity
- Various angles and poses
- Different expressions
- Multiple backgrounds
- Different lighting conditions
### Our Pepe Dataset

We used `iresidentevil/pepe_the_frog`, which contains:

- ~200 high-quality Pepe images
- Consistent 512x512 resolution
- Varied expressions and styles
- Pre-captioned with the trigger word
## Training Configuration

### Training Hyperparameters

Here's the exact configuration we used for the Pepe model:
```bash
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --caption_column="prompt" \
  --image_column="image" \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir="./output" \
  --rank=16 \
  --validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
  --validation_epochs=5 \
  --seed=42 \
  --mixed_precision="fp16" \
  --checkpointing_steps=150
```
### Parameter Explanation

| Parameter | Value | Description |
|---|---|---|
| `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
| `train_data_dir` | `/path/to/data` | Path to your dataset |
| `resolution` | `512` | Image resolution (512x512) |
| `train_batch_size` | `1` | Batch size per GPU |
| `gradient_accumulation_steps` | `4` | Effective batch size = 1 × 4 = 4 |
| `max_train_steps` | `2000` | Total training steps |
| `learning_rate` | `1e-4` | Initial learning rate |
| `lr_scheduler` | `cosine` | Learning rate schedule |
| `rank` | `16` | LoRA rank (higher = more parameters) |
| `mixed_precision` | `fp16` | Use 16-bit precision for speed |
| `checkpointing_steps` | `150` | Save a checkpoint every N steps |
### Hyperparameter Tuning Tips

**Learning Rate:**

- Too high: unstable training, poor quality
- Too low: slow convergence, underfitting
- Recommended: `1e-4` to `1e-5`
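The `cosine` scheduler in the configuration above decays the learning rate smoothly from its initial value toward zero over training. A sketch of the standard cosine formula (the diffusers implementation additionally handles warmup steps, which we set to 0):

```python
import math

def cosine_lr(step: int, max_steps: int, base_lr: float = 1e-4) -> float:
    """Cosine decay from base_lr at step 0 down to 0 at max_steps."""
    return 0.5 * base_lr * (1 + math.cos(math.pi * step / max_steps))

for step in (0, 500, 1000, 1500, 2000):
    print(f"step {step:4d}: lr = {cosine_lr(step, 2000):.2e}")
```

The gentle tail of the cosine curve means the final steps make only small updates, which tends to stabilize the adapter's final quality.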
**LoRA Rank:**

- Lower (4-8): faster training, smaller files, less expressive
- Medium (16-32): balanced (recommended)
- Higher (64-128): more expressive, larger files, risk of overfitting
**Training Steps:**

- Small dataset (20-50 images): 500-1000 steps
- Medium dataset (50-200 images): 1000-2000 steps
- Large dataset (200+ images): 2000-5000 steps
**Batch Size:**

- Depends on available VRAM
- Effective batch size = `batch_size × gradient_accumulation_steps`
- Recommended effective batch size: 4-8
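Putting these numbers together for our run (assuming the ~200-image Pepe dataset):

```python
batch_size = 1
grad_accum = 4
max_steps = 2000
dataset_size = 200  # approximate size of the Pepe dataset

effective_batch = batch_size * grad_accum
images_seen = max_steps * effective_batch
epochs = images_seen / dataset_size

print(f"effective batch size: {effective_batch}")  # 4
print(f"images seen: {images_seen}")               # 8000
print(f"approx. epochs: {epochs:.0f}")             # 40
```

Roughly 40 passes over a 200-image dataset is well within the "medium dataset" regime above; with substantially fewer images, the same 2000 steps would mean many more epochs and a higher risk of overfitting.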
## Running the Training

### Option 1: Google Colab (Recommended for Beginners)

1. **Open the notebook**:
   - Use our provided notebook: `diffusion_model_finetuning.ipynb`
   - Or create a new Colab notebook
2. **Set up the GPU**: Runtime → Change runtime type → GPU (T4)
3. **Mount Google Drive (optional)**:
   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   ```
4. **Install dependencies**:
   ```bash
   !pip install -q diffusers transformers accelerate peft
   ```
5. **Upload the dataset**:
   - Upload to Google Drive
   - Or download from Hugging Face
6. **Run training**:
   ```bash
   !accelerate launch train_text_to_image_lora.py \
     --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
     --train_data_dir="/content/drive/MyDrive/pepe-data" \
     --max_train_steps=2000 \
     --learning_rate=1e-4 \
     --output_dir="./output"
   ```
7. **Monitor progress**:
   - Watch the loss decrease
   - Check validation images
   - Save checkpoints to Drive
```python
# Generate a test image (pipe is the pipeline with the LoRA weights loaded,
# as in the "Test Locally" step below)
image = pipe("pepe_style_frog, wizard casting spells").images[0]
image.save("validation.png")
```
## π€ Model Upload
### Prepare for Upload
1. **Test Locally**:

   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("./output")

   # Test
   image = pipe("pepe_style_frog, happy pepe").images[0]
   image.save("test.png")
   ```

2. **Prepare Files**:

   ```
   output/
   ├── pytorch_lora_weights.safetensors  # Main file
   ├── README.md                         # Model card
   └── sample_images/                    # Example outputs
   ```
### Upload to Hugging Face

1. **Install the Hub CLI**:

   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```

2. **Create a Model Card** (`README.md`):

   ```markdown
   ---
   license: creativeml-openrail-m
   base_model: runwayml/stable-diffusion-v1-5
   tags:
     - stable-diffusion
     - lora
     - text-to-image
   ---

   # Pepe LoRA Model

   Fine-tuned LoRA for generating Pepe the Frog images.

   ## Usage

       from diffusers import StableDiffusionPipeline
       pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
       pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
       image = pipe("pepe_style_frog, happy pepe").images[0]
   ```

3. **Upload**:

   ```python
   from huggingface_hub import HfApi

   api = HfApi()
   api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
   api.upload_folder(
       folder_path="./output",
       repo_id="YOUR_USERNAME/pepe-lora",
       repo_type="model",
   )
   ```
## Common Issues

**Out of Memory:**

- Reduce `train_batch_size` to 1
- Enable `--gradient_checkpointing`
- Use `--mixed_precision="fp16"`
- Reduce the image resolution
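Reducing `train_batch_size` while keeping `gradient_accumulation_steps` up doesn't change the optimization: averaging gradients over 4 micro-batches of size 1 is mathematically equivalent to one batch of size 4, while only one sample's activations live in VRAM at a time. A toy numpy illustration (simple squared-error loss on a scalar weight, not the actual diffusion loss):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)  # one "batch" of 4 samples
y = 2.0 * x                 # targets for a toy model y_hat = w * x
w = 0.5

def grad(w, xb, yb):
    """d/dw of the mean squared error 0.5*(w*x - y)^2 over a (micro-)batch."""
    return np.mean((w * xb - yb) * xb)

# Full batch of 4
g_full = grad(w, x, y)

# Four micro-batches of 1, gradients accumulated and averaged
g_accum = np.mean([grad(w, x[i:i + 1], y[i:i + 1]) for i in range(4)])

print(g_full, g_accum)  # identical
```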