# Model Training Guide

This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) to create custom character models like our Pepe generator.

---
## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Dataset Preparation](#dataset-preparation)
- [Training Configuration](#training-configuration)
- [Running the Training](#running-the-training)
- [Model Upload](#model-upload)

---
## Overview

### What is LoRA?

**LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:

- Trains only a small fraction of parameters (~0.5% of the full model)
- Requires significantly less VRAM (~10GB vs. 40GB+)
- Maintains base model quality while adding custom styles
- Produces small, portable adapter files (~100MB vs. 4GB+)
- Can be combined with other LoRAs
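
To make the idea concrete, here is a minimal sketch (not the `peft` implementation the training script actually uses) of how a LoRA adapter wraps a frozen linear layer with a trainable low-rank update:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no effect at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# For a 768x768 attention projection: ~590K frozen weights, only 24,576 trainable
layer = LoRALinear(nn.Linear(768, 768), rank=16)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 24576
```

Only `lora_A` and `lora_B` receive gradients, which is where the VRAM and file-size savings come from.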
### Our Training Setup

**Model**: Pepe the Frog LoRA
**Base**: Stable Diffusion v1.5
**Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
**Result**: [MJaheen/Pepe_The_Frog_model_v1_lora](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
**Training Time**: ~2-3 hours on a T4 GPU (Google Colab)

---
## Prerequisites

### Hardware Requirements

**Minimum**:
- GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
- RAM: 16GB system RAM
- Storage: 20GB free space

**Recommended**:
- GPU: NVIDIA A100, V100, or RTX 4090
- RAM: 32GB system RAM
- Storage: 50GB+ SSD
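
Before committing to a long run, it helps to confirm what PyTorch actually sees (a quick check, assuming PyTorch is installed):

```python
import torch

# Report the visible GPU and its total VRAM
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM")
else:
    print("No CUDA GPU detected")
```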
### Software Requirements

```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install "peft>=0.11.0"  # quoted so the shell doesn't treat >= as a redirect
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # for the optional 8-bit Adam optimizer
```
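
After installing, initialize `accelerate` once so the launcher knows your hardware; the non-interactive default config is fine for a single GPU:

```bash
# Write a default single-machine accelerate config
accelerate config default

# Sanity-check that the key libraries import
python -c "import torch, diffusers, peft; print(torch.__version__, diffusers.__version__)"
```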
---

## Dataset Preparation

### Dataset Structure

Your dataset should follow this structure:

```
dataset/
├── image_1.png
├── image_2.png
├── image_3.png
└── metadata.jsonl   # or metadata.csv
```
### Metadata Format

**Option 1: JSONL (Recommended)**

```jsonl
{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
```

**Option 2: CSV**

```csv
file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"
```
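
Both formats are read by the `imagefolder` loader in Hugging Face `datasets` (which the diffusers training script uses when you pass `--train_data_dir`), so you can verify your metadata parses before training:

```python
from datasets import load_dataset

# imagefolder picks up metadata.jsonl/metadata.csv next to the images
ds = load_dataset("imagefolder", data_dir="dataset", split="train")
print(ds)               # expect an 'image' column plus your 'prompt' column
print(ds[0]["prompt"])  # spot-check one caption
```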
### Dataset Best Practices

1. **Image Quality**
   - Resolution: 512x512 or higher
   - Format: PNG or JPG
   - Clear, well-lit images
   - Varied poses and expressions
2. **Caption Quality**
   - Include the trigger word (e.g., `pepe_style_frog`)
   - Describe key features and actions
   - Be consistent in naming conventions
   - Aim for 5-15 words per caption
3. **Dataset Size**
   - Minimum: 20-50 images
   - Optimal: 100-500 images
   - More images generally mean better generalization
4. **Diversity**
   - Various angles and poses
   - Different expressions
   - Multiple backgrounds
   - Different lighting conditions
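
A small sanity-check script can enforce these rules before training. This sketch (the path and trigger word are examples; adjust them to your dataset) flags low-resolution images and captions missing the trigger word:

```python
import json
from pathlib import Path

from PIL import Image

DATA_DIR = Path("dataset")     # example path, adjust to your dataset
TRIGGER = "pepe_style_frog"    # your trigger word

for line in (DATA_DIR / "metadata.jsonl").read_text().splitlines():
    record = json.loads(line)
    w, h = Image.open(DATA_DIR / record["file_name"]).size
    if min(w, h) < 512:
        print(f"low-res ({w}x{h}): {record['file_name']}")
    if TRIGGER not in record["prompt"]:
        print(f"missing trigger word: {record['file_name']}")
```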
### Our Pepe Dataset

We used **[iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)**, which contains:

- ~200 high-quality Pepe images
- Consistent 512x512 resolution
- Varied expressions and styles
- Pre-captioned with the trigger word

---
## Training Configuration

### Training Hyperparameters

Here's the exact configuration we used for the Pepe model:

```bash
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --caption_column="prompt" \
  --image_column="image" \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir="./output" \
  --rank=16 \
  --validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
  --validation_epochs=5 \
  --seed=42 \
  --mixed_precision="fp16" \
  --checkpointing_steps=150
```
### Parameter Explanation

| Parameter | Value | Description |
|-----------|-------|-------------|
| `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
| `train_data_dir` | `/path/to/data` | Path to your dataset |
| `resolution` | `512` | Image resolution (512x512) |
| `train_batch_size` | `1` | Batch size per GPU |
| `gradient_accumulation_steps` | `4` | Effective batch size = 1 × 4 = 4 |
| `max_train_steps` | `2000` | Total training steps |
| `learning_rate` | `1e-4` | Initial learning rate |
| `lr_scheduler` | `cosine` | Learning rate schedule |
| `rank` | `16` | LoRA rank (higher = more parameters) |
| `mixed_precision` | `fp16` | Use 16-bit precision for speed |
| `checkpointing_steps` | `150` | Save a checkpoint every N steps |
### Hyperparameter Tuning Tips

**Learning Rate**:
- Too high: unstable training, poor quality
- Too low: slow convergence, underfitting
- Recommended: `1e-4` to `1e-5`

**LoRA Rank**:
- Lower (4-8): faster training, smaller files, less expressive
- Medium (16-32): balanced (recommended)
- Higher (64-128): more expressive, larger files, risk of overfitting

**Training Steps**:
- Small dataset (20-50 images): 500-1000 steps
- Medium dataset (50-200 images): 1000-2000 steps
- Large dataset (200+ images): 2000-5000 steps

**Batch Size**:
- Depends on available VRAM
- Effective batch size = `batch_size × gradient_accumulation_steps`
- Recommended effective batch size: 4-8
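
As a worked example with the configuration above: each optimizer step consumes 1 × 4 = 4 images, so 2000 steps see 8000 images, or roughly 40 passes over a ~200-image dataset:

```python
# Rough epoch math for the run above
batch_size = 1
grad_accum = 4
max_steps = 2000
dataset_size = 200  # approximate size of the Pepe dataset

effective_batch = batch_size * grad_accum   # 4 images per optimizer step
images_seen = max_steps * effective_batch   # 8000 images total
epochs = images_seen / dataset_size         # ~40 passes over the data
print(effective_batch, images_seen, epochs)
```

If your dataset is much smaller, scale `max_train_steps` down accordingly to avoid overfitting.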
---

## Running the Training

### Option 1: Google Colab (Recommended for Beginners)

1. **Open the Notebook**:
   - Use our provided notebook: `diffusion_model_finetuning.ipynb`
   - Or create a new Colab notebook
2. **Set Up the GPU**:
   ```
   Runtime → Change runtime type → GPU (T4)
   ```
3. **Mount Google Drive** (optional):
   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   ```
4. **Install Dependencies**:
   ```python
   !pip install -q diffusers transformers accelerate peft
   ```
5. **Upload the Dataset**:
   - Upload to Google Drive
   - Or download from Hugging Face
6. **Run the Training**:
   ```python
   !accelerate launch train_text_to_image_lora.py \
     --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
     --train_data_dir="/content/drive/MyDrive/pepe-data" \
     --max_train_steps=2000 \
     --learning_rate=1e-4 \
     --output_dir="./output"
   ```
7. **Monitor Progress**:
   - Watch the loss decrease
   - Check the validation images
   - Save checkpoints to Drive
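
The diffusers training scripts log to TensorBoard by default, so in Colab you can watch the loss curve inline (assuming the default logging directory under `./output`):

```python
# Load TensorBoard in the notebook and point it at the training logs
%load_ext tensorboard
%tensorboard --logdir ./output/logs
```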
### Generate a Test Image

Once training finishes, load the base pipeline with your new LoRA weights and spot-check the output:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights("./output")

# Generate a test image using the trigger word
image = pipe("pepe_style_frog, wizard casting spells").images[0]
image.save("validation.png")
```

---
## Model Upload

### Prepare for Upload

1. **Test Locally**:
   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("./output")

   # Test
   image = pipe("pepe_style_frog, happy pepe").images[0]
   image.save("test.png")
   ```
2. **Prepare the Files**:
   ```
   output/
   ├── pytorch_lora_weights.safetensors   # Main file
   ├── README.md                          # Model card
   └── sample_images/                     # Example outputs
   ```
### Upload to Hugging Face

1. **Install the Hub CLI**:
   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```
2. **Create a Model Card** (`README.md`):
   ````markdown
   ---
   license: creativeml-openrail-m
   base_model: runwayml/stable-diffusion-v1-5
   tags:
   - stable-diffusion
   - lora
   - text-to-image
   ---

   # Pepe LoRA Model

   Fine-tuned LoRA for generating Pepe the Frog images.

   ## Usage

   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
   image = pipe("pepe_style_frog, happy pepe").images[0]
   ```
   ````
3. **Upload**:
   ```python
   from huggingface_hub import HfApi

   api = HfApi()
   api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
   api.upload_folder(
       folder_path="./output",
       repo_id="YOUR_USERNAME/pepe-lora",
       repo_type="model",
   )
   ```
### Common Issues

**Out of Memory**:
- Reduce `train_batch_size` to 1
- Enable `--gradient_checkpointing`
- Use `--mixed_precision="fp16"`
- Reduce the image resolution
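
Putting these together, a memory-lean variant of the training command might look like the following (`--gradient_checkpointing` and `--use_8bit_adam` are standard flags in the diffusers LoRA script; the latter requires `bitsandbytes`):

```bash
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --max_train_steps=2000 \
  --output_dir="./output"
```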