Model Description

This is a Stable Diffusion model fine-tuned to generate realistic pedestrian-perspective images of crosswalks. It was trained on a dataset of 150 first-person view (FPV) images, most of them captured in sunny conditions, to support controlled text-to-image generation for data augmentation in crosswalk segmentation tasks.

Base model: Stable Diffusion v1.4
Fine-tuning method: Text-to-image fine-tuning on a custom FPV crosswalk dataset
Components:
- unet — fine-tuned U-Net weights
- vae — fine-tuned VAE weights
Intended use: Synthetic data generation for semantic segmentation augmentation

Use Cases

- Data augmentation for crosswalk segmentation models
- Generating diverse weather and lighting scenarios (e.g., fog, rain, snow, night) from text prompts
- Research on assistive navigation systems for visually impaired pedestrians
- Benchmarking model generalization across diverse environments
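
The weather and lighting scenarios listed above can be requested by varying the text prompt. The following is a minimal sketch of how such prompt variants might be built; the base prompt wording, the condition phrases, and the `build_prompts` helper are illustrative assumptions, not the captions used during fine-tuning.

```python
# Hypothetical base prompt; the wording used in training is not documented here.
BASE_PROMPT = "a first-person view photo of a crosswalk"

# Condition phrases taken from the scenarios named in the use cases above.
CONDITIONS = ["in fog", "in rain", "in snow", "at night"]


def build_prompts(base=BASE_PROMPT, conditions=CONDITIONS):
    """Return one prompt per condition by appending the condition to the base."""
    return [f"{base} {condition}" for condition in conditions]


prompts = build_prompts()
for p in prompts:
    print(p)
```

Each resulting string can be passed as the prompt for one generation run, which makes it easy to balance the number of synthetic images per condition.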

How to Use

You can generate images with the provided Python inference script:

# Clone the repository
git clone https://huggingface.co/kromic/sd-crosswalk-augmentation
cd sd-crosswalk-augmentation

# Install dependencies
pip install diffusers transformers torch

# Run inference
python generate.py

# To customize the output, change the prompt string (Python)
prompt = "a crosswalk image"
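
As an alternative to the provided script, the model can probably be loaded directly with `diffusers`. The sketch below assumes the repository stores the fine-tuned weights in diffusers format under `unet` and `vae` subfolders (matching the component names listed above) and takes the remaining components from the base model. The `load_pipeline`, `slugify`, and `main` helpers, the step count, and the guidance scale are illustrative choices, not part of `generate.py`.

```python
import re


def slugify(prompt):
    """Turn a prompt into a filesystem-friendly name (illustrative helper)."""
    return re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")


def load_pipeline(
    repo_id="kromic/sd-crosswalk-augmentation",
    base_id="CompVis/stable-diffusion-v1-4",
):
    """Build a Stable Diffusion pipeline with the fine-tuned U-Net and VAE.

    Assumes the repo exposes `unet` and `vae` subfolders in diffusers format.
    """
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from diffusers import (
        AutoencoderKL,
        StableDiffusionPipeline,
        UNet2DConditionModel,
    )

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    unet = UNet2DConditionModel.from_pretrained(
        repo_id, subfolder="unet", torch_dtype=dtype
    )
    vae = AutoencoderKL.from_pretrained(
        repo_id, subfolder="vae", torch_dtype=dtype
    )
    # Text encoder, tokenizer, and scheduler come from the base model.
    pipe = StableDiffusionPipeline.from_pretrained(
        base_id, unet=unet, vae=vae, torch_dtype=dtype
    )
    return pipe.to(device)


def main(prompt="a crosswalk image"):
    """Generate one image for the prompt and save it as a PNG."""
    pipe = load_pipeline()
    image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
    image.save(f"{slugify(prompt)}.png")
```

Calling `main()` downloads the weights on first use and writes `a_crosswalk_image.png` to the current directory; pass a different prompt to vary weather or lighting.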
