# Model Training Guide

This guide covers how to fine-tune your own Stable Diffusion model using LoRA (Low-Rank Adaptation) to create custom character models like our Pepe generator.

---
## Table of Contents

- [Overview](#overview)
- [Prerequisites](#prerequisites)
- [Dataset Preparation](#dataset-preparation)
- [Training Configuration](#training-configuration)
- [Running the Training](#running-the-training)
- [Model Upload](#model-upload)

---
## Overview

### What is LoRA?

**LoRA (Low-Rank Adaptation)** is a parameter-efficient fine-tuning technique that:

- Trains only a small fraction of parameters (~0.5% of the full model)
- Requires significantly less VRAM (~10GB vs. 40GB+)
- Maintains base model quality while adding custom styles
- Produces small, portable adapter files (~100MB vs. 4GB+)
- Can be combined with other LoRAs
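
To make the idea concrete, here is a minimal sketch (not the `peft` implementation the training script actually uses) of how a LoRA adapter wraps a frozen linear layer with a trainable low-rank update:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # the base weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no effect at step 0
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

# For a 768x768 attention projection: ~590K frozen weights, only 24,576 trainable
layer = LoRALinear(nn.Linear(768, 768), rank=16)
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # 24576
```

Only `lora_A` and `lora_B` receive gradients, which is where the VRAM and file-size savings come from.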
### Our Training Setup

**Model**: Pepe the Frog LoRA
**Base**: Stable Diffusion v1.5
**Dataset**: [iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)
**Result**: [MJaheen/Pepe_The_Frog_model_v1_lora](https://huggingface.co/MJaheen/Pepe_The_Frog_model_v1_lora)
**Training Time**: ~2-3 hours on a T4 GPU (Google Colab)

---
## Prerequisites

### Hardware Requirements

**Minimum**:
- GPU: NVIDIA GPU with 10GB+ VRAM (e.g., RTX 3080, T4)
- RAM: 16GB system RAM
- Storage: 20GB free space

**Recommended**:
- GPU: NVIDIA A100, V100, or RTX 4090
- RAM: 32GB system RAM
- Storage: 50GB+ SSD
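
Before committing to a long run, it helps to confirm what PyTorch actually sees (a quick check, assuming PyTorch is installed):

```python
import torch

# Report the visible GPU and its total VRAM
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM")
else:
    print("No CUDA GPU detected")
```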
### Software Requirements

```bash
# Core dependencies
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
pip install diffusers==0.31.0
pip install transformers==4.45.1
pip install accelerate==0.34.2
pip install "peft>=0.11.0"  # quoted so the shell doesn't treat >= as a redirect
pip install safetensors==0.4.4
pip install datasets
pip install bitsandbytes  # for the optional 8-bit Adam optimizer
```
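
After installing, initialize `accelerate` once so the launcher knows your hardware; the non-interactive default config is fine for a single GPU:

```bash
# Write a default single-machine accelerate config
accelerate config default

# Sanity-check that the key libraries import
python -c "import torch, diffusers, peft; print(torch.__version__, diffusers.__version__)"
```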
---

## Dataset Preparation

### Dataset Structure

Your dataset should follow this structure:

```
dataset/
├── image_1.png
├── image_2.png
├── image_3.png
└── metadata.jsonl   # or metadata.csv
```
### Metadata Format

**Option 1: JSONL (Recommended)**

```jsonl
{"file_name": "image_1.png", "prompt": "pepe_style_frog, happy pepe smiling"}
{"file_name": "image_2.png", "prompt": "pepe_style_frog, sad pepe crying"}
{"file_name": "image_3.png", "prompt": "pepe_style_frog, pepe drinking coffee"}
```

**Option 2: CSV**

```csv
file_name,prompt
image_1.png,"pepe_style_frog, happy pepe smiling"
image_2.png,"pepe_style_frog, sad pepe crying"
image_3.png,"pepe_style_frog, pepe drinking coffee"
```
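
Both formats are read by the `imagefolder` loader in Hugging Face `datasets` (which the diffusers training script uses when you pass `--train_data_dir`), so you can verify your metadata parses before training:

```python
from datasets import load_dataset

# imagefolder picks up metadata.jsonl/metadata.csv next to the images
ds = load_dataset("imagefolder", data_dir="dataset", split="train")
print(ds)               # expect an 'image' column plus your 'prompt' column
print(ds[0]["prompt"])  # spot-check one caption
```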
### Dataset Best Practices

1. **Image Quality**
   - Resolution: 512x512 or higher
   - Format: PNG or JPG
   - Clear, well-lit images
   - Varied poses and expressions
2. **Caption Quality**
   - Include the trigger word (e.g., `pepe_style_frog`)
   - Describe key features and actions
   - Be consistent in naming conventions
   - Aim for 5-15 words per caption
3. **Dataset Size**
   - Minimum: 20-50 images
   - Optimal: 100-500 images
   - More images generally mean better generalization
4. **Diversity**
   - Various angles and poses
   - Different expressions
   - Multiple backgrounds
   - Different lighting conditions
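
A small sanity-check script can enforce these rules before training. This sketch (the path and trigger word are examples; adjust them to your dataset) flags low-resolution images and captions missing the trigger word:

```python
import json
from pathlib import Path

from PIL import Image

DATA_DIR = Path("dataset")     # example path, adjust to your dataset
TRIGGER = "pepe_style_frog"    # your trigger word

for line in (DATA_DIR / "metadata.jsonl").read_text().splitlines():
    record = json.loads(line)
    w, h = Image.open(DATA_DIR / record["file_name"]).size
    if min(w, h) < 512:
        print(f"low-res ({w}x{h}): {record['file_name']}")
    if TRIGGER not in record["prompt"]:
        print(f"missing trigger word: {record['file_name']}")
```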
### Our Pepe Dataset

We used **[iresidentevil/pepe_the_frog](https://huggingface.co/datasets/iresidentevil/pepe_the_frog)**, which contains:

- ~200 high-quality Pepe images
- Consistent 512x512 resolution
- Varied expressions and styles
- Pre-captioned with the trigger word

---
## Training Configuration

### Training Hyperparameters

Here's the exact configuration we used for the Pepe model:

```bash
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --caption_column="prompt" \
  --image_column="image" \
  --resolution=512 \
  --center_crop \
  --random_flip \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --max_train_steps=2000 \
  --learning_rate=1e-4 \
  --lr_scheduler="cosine" \
  --lr_warmup_steps=0 \
  --output_dir="./output" \
  --rank=16 \
  --validation_prompt="pepe_style_frog, a high-quality, detailed image of pepe the frog smiling and holding a cup of coffee at sunrise" \
  --validation_epochs=5 \
  --seed=42 \
  --mixed_precision="fp16" \
  --checkpointing_steps=150
```
### Parameter Explanation

| Parameter | Value | Description |
|-----------|-------|-------------|
| `pretrained_model_name_or_path` | `runwayml/stable-diffusion-v1-5` | Base model to fine-tune |
| `train_data_dir` | `/path/to/data` | Path to your dataset |
| `resolution` | `512` | Image resolution (512x512) |
| `train_batch_size` | `1` | Batch size per GPU |
| `gradient_accumulation_steps` | `4` | Effective batch size = 1 × 4 = 4 |
| `max_train_steps` | `2000` | Total training steps |
| `learning_rate` | `1e-4` | Initial learning rate |
| `lr_scheduler` | `cosine` | Learning rate schedule |
| `rank` | `16` | LoRA rank (higher = more parameters) |
| `mixed_precision` | `fp16` | Use 16-bit precision for speed |
| `checkpointing_steps` | `150` | Save a checkpoint every N steps |
### Hyperparameter Tuning Tips

**Learning Rate**:
- Too high: unstable training, poor quality
- Too low: slow convergence, underfitting
- Recommended: `1e-4` to `1e-5`

**LoRA Rank**:
- Lower (4-8): faster training, smaller files, less expressive
- Medium (16-32): balanced (recommended)
- Higher (64-128): more expressive, larger files, risk of overfitting

**Training Steps**:
- Small dataset (20-50 images): 500-1000 steps
- Medium dataset (50-200 images): 1000-2000 steps
- Large dataset (200+ images): 2000-5000 steps

**Batch Size**:
- Depends on available VRAM
- Effective batch size = `batch_size × gradient_accumulation_steps`
- Recommended effective batch size: 4-8
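
As a worked example with the configuration above: each optimizer step consumes 1 × 4 = 4 images, so 2000 steps see 8000 images, or roughly 40 passes over a ~200-image dataset:

```python
# Rough epoch math for the run above
batch_size = 1
grad_accum = 4
max_steps = 2000
dataset_size = 200  # approximate size of the Pepe dataset

effective_batch = batch_size * grad_accum   # 4 images per optimizer step
images_seen = max_steps * effective_batch   # 8000 images total
epochs = images_seen / dataset_size         # ~40 passes over the data
print(effective_batch, images_seen, epochs)
```

If your dataset is much smaller, scale `max_train_steps` down accordingly to avoid overfitting.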
---

## Running the Training

### Option 1: Google Colab (Recommended for Beginners)

1. **Open the Notebook**:
   - Use our provided notebook: `diffusion_model_finetuning.ipynb`
   - Or create a new Colab notebook
2. **Set Up the GPU**:
   ```
   Runtime → Change runtime type → GPU (T4)
   ```
3. **Mount Google Drive** (optional):
   ```python
   from google.colab import drive
   drive.mount('/content/drive')
   ```
4. **Install Dependencies**:
   ```python
   !pip install -q diffusers transformers accelerate peft
   ```
5. **Upload the Dataset**:
   - Upload to Google Drive
   - Or download from Hugging Face
6. **Run the Training**:
   ```python
   !accelerate launch train_text_to_image_lora.py \
     --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
     --train_data_dir="/content/drive/MyDrive/pepe-data" \
     --max_train_steps=2000 \
     --learning_rate=1e-4 \
     --output_dir="./output"
   ```
7. **Monitor Progress**:
   - Watch the loss decrease
   - Check the validation images
   - Save checkpoints to Drive
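
The diffusers training scripts log to TensorBoard by default, so in Colab you can watch the loss curve inline (assuming the default logging directory under `./output`):

```python
# Load TensorBoard in the notebook and point it at the training logs
%load_ext tensorboard
%tensorboard --logdir ./output/logs
```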
### Generate a Test Image

Once training finishes, load the base pipeline with your new LoRA weights and spot-check the output:

```python
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.load_lora_weights("./output")

# Generate a test image using the trigger word
image = pipe("pepe_style_frog, wizard casting spells").images[0]
image.save("validation.png")
```

---
## Model Upload

### Prepare for Upload

1. **Test Locally**:
   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("./output")

   # Test
   image = pipe("pepe_style_frog, happy pepe").images[0]
   image.save("test.png")
   ```
2. **Prepare the Files**:
   ```
   output/
   ├── pytorch_lora_weights.safetensors   # Main file
   ├── README.md                          # Model card
   └── sample_images/                     # Example outputs
   ```
### Upload to Hugging Face

1. **Install the Hub CLI**:
   ```bash
   pip install huggingface_hub
   huggingface-cli login
   ```
2. **Create a Model Card** (`README.md`):
   ````markdown
   ---
   license: creativeml-openrail-m
   base_model: runwayml/stable-diffusion-v1-5
   tags:
   - stable-diffusion
   - lora
   - text-to-image
   ---

   # Pepe LoRA Model

   Fine-tuned LoRA for generating Pepe the Frog images.

   ## Usage

   ```python
   from diffusers import StableDiffusionPipeline

   pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
   pipe.load_lora_weights("YOUR_USERNAME/your-model-name")
   image = pipe("pepe_style_frog, happy pepe").images[0]
   ```
   ````
3. **Upload**:
   ```python
   from huggingface_hub import HfApi

   api = HfApi()
   api.create_repo("YOUR_USERNAME/pepe-lora", repo_type="model")
   api.upload_folder(
       folder_path="./output",
       repo_id="YOUR_USERNAME/pepe-lora",
       repo_type="model",
   )
   ```
### Common Issues

**Out of Memory**:
- Reduce `train_batch_size` to 1
- Enable `--gradient_checkpointing`
- Use `--mixed_precision="fp16"`
- Reduce the image resolution
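
Putting these together, a memory-lean variant of the training command might look like the following (`--gradient_checkpointing` and `--use_8bit_adam` are standard flags in the diffusers LoRA script; the latter requires `bitsandbytes`):

```bash
accelerate launch train_text_to_image_lora.py \
  --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" \
  --train_data_dir="/path/to/pepe-data" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --gradient_checkpointing \
  --mixed_precision="fp16" \
  --use_8bit_adam \
  --max_train_steps=2000 \
  --output_dir="./output"
```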