| # Image Preprocessing Service | |
| This service automatically processes various image formats during upload to ensure compatibility and optimal storage. | |
| ## Overview | |
| The `ImagePreprocessor` service automatically detects and converts various image formats to PNG or JPEG before storing them in the system. This ensures that all images are in a standard, web-compatible format. | |
| ## Supported Input Formats | |
| ### Direct Storage (No Preprocessing) | |
| - **PNG** (`image/png`) - Already optimal format | |
| - **JPEG** (`image/jpeg`, plus the non-standard alias `image/jpg`) - Already optimal format | |
| ### Formats Requiring Preprocessing | |
| #### HEIC/HEIF Files | |
| - **Input**: HEIC/HEIF files from modern smartphones | |
| - **Processing**: Convert to RGB and flatten alpha channel | |
| - **Output**: PNG or JPEG | |
| #### WebP Files | |
| - **Input**: WebP format (Google's web image format) | |
| - **Processing**: Convert to RGB and flatten alpha channel | |
| - **Output**: PNG or JPEG | |
| #### GIF Files | |
| - **Input**: GIF files (static or animated) | |
| - **Processing**: Extract first frame for animated GIFs, convert to RGB | |
| - **Output**: PNG or JPEG | |
| #### TIFF/GeoTIFF Files | |
| - **Input**: TIFF or GeoTIFF files | |
| - **Processing**: Render RGB view, handle various color spaces | |
| - **Output**: PNG or JPEG | |
| #### PDF Files | |
| - **Input**: PDF documents | |
| - **Processing**: Rasterize first page at 2x zoom for quality | |
| - **Output**: PNG or JPEG | |
| - **Performance Note**: PDF processing is inherently slower due to complex format parsing and rasterization | |
| ## How It Works | |
| ### 1. MIME Type Detection | |
| The service first detects the file format using: | |
| - File extension analysis | |
| - File signature (magic bytes) detection | |
| - Fallback to generic binary if unknown | |
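The detection steps above can be sketched with the standard library alone. This is a minimal illustration, not the service's actual code: the helper name `detect_mime_type` is hypothetical, and checking magic bytes before the extension is an assumption, since the source does not say which check takes priority.

```python
import mimetypes

# Leading byte signatures for the formats listed in this document.
_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"\xff\xd8\xff", "image/jpeg"),
    (b"GIF87a", "image/gif"),
    (b"GIF89a", "image/gif"),
    (b"II*\x00", "image/tiff"),  # little-endian TIFF
    (b"MM\x00*", "image/tiff"),  # big-endian TIFF
    (b"%PDF-", "application/pdf"),
]

# ISO BMFF major brands commonly seen in HEIC/HEIF files.
_HEIF_BRANDS = {
    b"heic": "image/heic",
    b"heix": "image/heic",
    b"mif1": "image/heif",
    b"msf1": "image/heif",
}


def detect_mime_type(content: bytes, filename: str = "") -> str:
    """Hypothetical helper: magic bytes, then extension, then generic binary."""
    for signature, mime in _SIGNATURES:
        if content.startswith(signature):
            return mime
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if content[:4] == b"RIFF" and content[8:12] == b"WEBP":
        return "image/webp"
    # HEIC/HEIF: 4-byte box size, "ftyp", then the major brand.
    if content[4:8] == b"ftyp" and content[8:12] in _HEIF_BRANDS:
        return _HEIF_BRANDS[content[8:12]]
    # Fall back to the file extension, then to generic binary.
    guessed, _ = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"
```

Checking signatures first means a mislabeled upload (for example a WebP file named `photo.png`) is still routed to the right processor.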
| ### 2. Preprocessing Decision | |
| - If format is already PNG/JPEG → No processing needed | |
| - If format requires conversion → Apply appropriate processor | |
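As a rough sketch, the decision reduces to a set lookup followed by a dispatch on MIME type. The standalone `needs_preprocessing` below mirrors the name of the `ImagePreprocessor.needs_preprocessing` method, but its body, the `PROCESSORS` table, and `choose_processor` are assumptions made for illustration.

```python
from typing import Optional

# Formats stored as-is, per "Direct Storage (No Preprocessing)".
PASSTHROUGH_TYPES = {"image/png", "image/jpeg", "image/jpg"}

# Hypothetical mapping from MIME type to a processor label.
PROCESSORS = {
    "image/heic": "heic",
    "image/heif": "heic",
    "image/webp": "webp",
    "image/gif": "gif",
    "image/tiff": "tiff",
    "application/pdf": "pdf",
}


def needs_preprocessing(mime_type: str) -> bool:
    """Return True when the format must be converted before storage."""
    return mime_type.lower() not in PASSTHROUGH_TYPES


def choose_processor(mime_type: str) -> Optional[str]:
    """Return None for pass-through formats, else a processor label.

    Unknown formats map to "generic", matching the fallback described
    under Error Handling.
    """
    if not needs_preprocessing(mime_type):
        return None
    return PROCESSORS.get(mime_type.lower(), "generic")
```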
| ### 3. Format Conversion | |
| Each format has a specialized processor that: | |
| - Opens the file using the appropriate library (Pillow or PyMuPDF) | |
| - Converts to RGB color space | |
| - Flattens alpha channels | |
| - Optimizes output quality | |
| - Generates new filename with correct extension | |
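"Flattening" an alpha channel means compositing each pixel over an opaque background so that the result needs only RGB. The per-pixel arithmetic is sketched below in plain Python. The white background and the name `flatten_pixel` are assumptions; the source does not state which background color the service uses, and in practice an imaging library such as Pillow would perform this step on whole images.

```python
from typing import Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]


def flatten_pixel(rgba: RGBA, background: RGB = (255, 255, 255)) -> RGB:
    """Composite one 8-bit RGBA pixel over an opaque background.

    Uses the "source over" blend out = (fg * a + bg * (255 - a)) / 255,
    computed in integers; adding 127 rounds to the nearest value.
    """
    r, g, b, a = rgba
    return tuple(
        (fg * a + bg * (255 - a) + 127) // 255
        for fg, bg in zip((r, g, b), background)
    )
```

A fully opaque pixel keeps its color, a fully transparent pixel becomes the background color, and a half-transparent black pixel over white comes out mid-gray.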
| ### 4. Storage | |
| - Processed image is stored with new filename | |
| - Original filename is preserved in metadata | |
| - SHA256 hash is calculated from processed content | |
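The storage step can be pictured as building a small record around the processed bytes. This is a sketch under assumptions: the `StoredImage` fields and the `build_storage_record` helper are hypothetical and do not reflect the service's real schema. The one point taken from the source is that the hash covers the processed content, not the original upload.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredImage:
    """Hypothetical metadata record for a stored image."""
    filename: str           # name after conversion, e.g. "photo.png"
    original_filename: str  # name as uploaded, kept in metadata
    sha256: str             # hex digest of the processed bytes
    size_bytes: int


def build_storage_record(
    processed: bytes, new_filename: str, original_filename: str
) -> StoredImage:
    """Hash the processed content and keep the original name alongside it."""
    return StoredImage(
        filename=new_filename,
        original_filename=original_filename,
        sha256=hashlib.sha256(processed).hexdigest(),
        size_bytes=len(processed),
    )
```

Because the digest is computed after conversion, two uploads of the same picture in different source formats share a hash only if their converted bytes are identical.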
| ## Integration Points | |
| ### Upload Endpoint (`/api/images/`) | |
| - All file uploads go through preprocessing | |
| - Supports drag & drop and file picker | |
| - Handles both crisis maps and drone imagery | |
| ### Contribution Endpoint (`/api/contribute/from-url`) | |
| - Images contributed from existing URLs are also preprocessed | |
| - Ensures consistency across all image sources | |
| ## Configuration | |
| ### Target Format | |
| - **Default**: PNG (better quality, lossless) | |
| - **Alternative**: JPEG (smaller file size, lossy) | |
| - **Quality**: 95% for JPEG (configurable) | |
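One way to model these settings is a small validated value object that also derives the output filename and MIME type. The `OutputSettings` class below is a hypothetical sketch, not part of the service; only the defaults (PNG, quality 95) come from the text above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple

# Extension and MIME type for each supported target format.
_FORMATS = {
    "PNG": (".png", "image/png"),
    "JPEG": (".jpg", "image/jpeg"),
}


@dataclass(frozen=True)
class OutputSettings:
    """Hypothetical container for the target format and JPEG quality."""
    target_format: str = "PNG"
    quality: int = 95  # only meaningful when target_format is JPEG

    def __post_init__(self) -> None:
        if self.target_format.upper() not in _FORMATS:
            raise ValueError(f"unsupported target format: {self.target_format}")
        if not 1 <= self.quality <= 100:
            raise ValueError("quality must be between 1 and 100")

    def output_name(self, original_filename: str) -> Tuple[str, str]:
        """Return (new filename, MIME type) for the configured format."""
        extension, mime = _FORMATS[self.target_format.upper()]
        return Path(original_filename).with_suffix(extension).name, mime
```

For example, `OutputSettings().output_name("original.heic")` yields the PNG name and MIME type, while `OutputSettings("JPEG", 90)` switches both to JPEG.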
| ### Error Handling | |
| - If preprocessing fails, falls back to original content | |
| - Logs errors for debugging | |
| - Continues upload process | |
| ## Dependencies | |
| - **Pillow (PIL)**: Core image processing | |
| - **PyMuPDF**: PDF rasterization | |
| - **Python standard library**: MIME type detection, file handling | |
| ## Benefits | |
| 1. **Format Consistency**: All stored images are in web-compatible formats | |
| 2. **Quality Assurance**: Automatic optimization and color space conversion | |
| 3. **User Experience**: Users can upload any common image format | |
| 4. **Storage Efficiency**: Optimized file sizes and formats | |
| 5. **Compatibility**: Ensures images work across all platforms and browsers | |
| ## Example Usage | |
| ```python | |
| from app.services.image_preprocessor import ImagePreprocessor | |
| # Process an image | |
| processed_content, new_filename, mime_type = ImagePreprocessor.preprocess_image( | |
|     file_content, | |
|     "original.heic", | |
|     target_format='PNG', | |
|     quality=95 | |
| ) | |
| # Check if preprocessing is needed | |
| if ImagePreprocessor.needs_preprocessing(mime_type): | |
|     print(f"Converting {mime_type} to PNG...") | |
| ``` | |
| ## Error Handling | |
| The service gracefully handles errors: | |
| - **Unsupported formats**: Falls back to generic processing | |
| - **Corrupted files**: Logs error and continues with original | |
| - **Processing failures**: Maintains upload functionality | |
| - **Memory issues**: Handles large files efficiently | |
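The fallback behavior can be sketched as a thin wrapper around whichever processor was selected. The function name `preprocess_with_fallback`, the logger name, and the processor signature are assumptions for illustration; the source states only that failures are logged and the upload continues with the original content.

```python
import logging
from typing import Callable, Tuple

logger = logging.getLogger("image_preprocessor")

# (content, filename, mime_type), the same shape preprocess_image returns.
Result = Tuple[bytes, str, str]


def preprocess_with_fallback(
    content: bytes,
    filename: str,
    mime_type: str,
    processor: Callable[[bytes, str], Result],
) -> Result:
    """Run a processor; on any failure, log it and return the original."""
    try:
        return processor(content, filename)
    except Exception:
        # logger.exception records the traceback for later debugging.
        logger.exception("Preprocessing failed for %s; storing original", filename)
        return content, filename, mime_type
```

Catching `Exception` broadly is deliberate in this sketch: a corrupted file can fail in many different ways inside an imaging library, and the goal is that none of them block the upload.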
| ## Performance Considerations | |
| ### PDF Processing Performance | |
| PDF conversion is the most computationally expensive operation due to: | |
| - **Complex Format**: PDFs require parsing, interpretation, and rendering | |
| - **Rasterization**: Vector-to-pixel conversion is CPU-intensive | |
| - **Memory Usage**: Large PDFs can consume significant memory | |
| - **Quality vs Speed**: Higher zoom factors increase quality but decrease speed | |
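The cost of a higher zoom factor follows from simple arithmetic: PDF page dimensions are expressed in points (1/72 inch), and both axes are scaled by the zoom, so pixel count and raw memory grow with the square of the zoom. The helpers below are a hypothetical sketch of that estimate and do not call PyMuPDF; actual pixmap dimensions may differ slightly because of rounding.

```python
from typing import Dict, Tuple

POINTS_PER_INCH = 72  # PDF user-space units per inch


def raster_size(width_pt: float, height_pt: float, zoom: float) -> Tuple[int, int]:
    """Approximate pixel dimensions of a page rendered at a given zoom."""
    return round(width_pt * zoom), round(height_pt * zoom)


def raster_cost(
    width_pt: float, height_pt: float, zoom: float, channels: int = 3
) -> Dict[str, float]:
    """Estimate effective DPI, pixel count, and uncompressed size in bytes."""
    width_px, height_px = raster_size(width_pt, height_pt, zoom)
    pixels = width_px * height_px
    return {
        "dpi": POINTS_PER_INCH * zoom,
        "pixels": pixels,
        "bytes": pixels * channels,  # 8 bits per channel, no alpha
    }
```

A US Letter page (612 × 792 points) at the 2x zoom mentioned earlier comes out at 1224 × 1584 pixels, roughly 5.8 MB of uncompressed RGB data, which is four times the pixel count of a 1x render.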
| ### Performance Tuning Options | |
| ```python | |
| from app.services.image_preprocessor import ImagePreprocessor | |
| # Fast mode - lower quality, much faster | |
| ImagePreprocessor.configure_pdf_processing(quality_mode='fast') | |
| # Balanced mode - good quality, reasonable speed (default) | |
| ImagePreprocessor.configure_pdf_processing(quality_mode='balanced') | |
| # Quality mode - highest quality, slower processing | |
| ImagePreprocessor.configure_pdf_processing(quality_mode='quality') | |
| # Custom configuration | |
| ImagePreprocessor.configure_pdf_processing( | |
|     zoom_factor=1.2,  # Lower zoom = faster | |
|     compress_level=4,  # Lower compression = faster | |
|     quality_mode='balanced' | |
| ) | |
| ``` | |
| ### Expected Processing Times | |
| - **Small PDFs (<1MB)**: 2-5 seconds | |
| - **Medium PDFs (1-5MB)**: 5-15 seconds | |
| - **Large PDFs (5-25MB)**: 15-60 seconds | |
| - **Complex PDFs**: May take longer due to graphics complexity | |
| ## Future Enhancements | |
| - **Batch processing**: Process multiple images simultaneously | |
| - **Format preferences**: User-configurable output formats | |
| - **Quality settings**: Adjustable compression levels | |
| - **Metadata preservation**: Keep EXIF and other metadata | |
| - **Progressive processing**: Stream large files | |