---
license: apache-2.0
datasets:
  - usm3d/hoho25k
language:
  - en
tags:
  - hoho25k
  - s23dr2025
---

# S23DR 2025 Challenge - Winning Solution 🏆

This repository contains the **winning solution** for the [Structured Semantic 3D Reconstruction (S23DR) Challenge 2025](https://huggingface.co/spaces/usm3d/S23DR2025) at the CVPR 2025 Workshop.

Our method achieves the winning performance in 3D wireframe reconstruction from multi-view images, combining COLMAP point clouds, semantic segmentation, and deep learning models to predict building wireframes with high accuracy.

## 🎯 Performance

Our solution achieved the best scores across key metrics:

- **HSS (Hybrid Structure Score)**: Superior spatial accuracy
- **F1 Score**: Excellent balance of precision and recall
- **IoU (Intersection over Union)**: High overlap with ground truth

## 🏗️ Method Overview

Our approach consists of three main components:

### 1. Multi-Modal Data Fusion

- **COLMAP Point Clouds**: Dense 3D reconstruction from multi-view images
- **Semantic Segmentation**: ADE20K and Gestalt segmentation for building elements
- **Depth Information**: Fitted dense depth maps aligned with the 3D structure

### 2. Deep Learning Models

We employ two specialized neural networks:

#### FastPointNet (Vertex Prediction)

- **Input**: 11D point cloud patches (xyz + rgb + features)
- **Architecture**: Enhanced PointNet with residual connections, channel attention, and multi-scale pooling
- **Output**: 3D vertex coordinates + confidence scores + classification
- **Model File**: `pnet.pth`
- **Features**:
  - Deeper architecture with 7 conv layers
  - Lightweight channel attention mechanism
  - Group normalization for stability
  - Multi-scale global pooling (max + average)

#### ClassificationPointNet (Edge Classification)

- **Input**: 6D point cloud patches (xyz + rgb)
- **Architecture**: Binary classification PointNet with deep feature extraction
- **Output**: Edge/no-edge classification with confidence
- **Model File**: `pnet_class.pth`
- **Features**:
  - 6-layer convolutional feature extraction
  - Dropout regularization (0.3-0.5)
  - Xavier initialization

### 3. Patch-Based Processing Pipeline

Our pipeline processes local 3D patches around potential vertices:

1. **Initial Vertex Detection**: Extract candidates from semantic segmentation maps
2. **Point Cloud Clustering**: Group nearby 3D points using spatial clustering
3. **Patch Generation**: Create local point cloud patches (1-2 m radius) centered on clusters
4. **Neural Refinement**: Use FastPointNet to refine vertex locations and classify validity
5. **Edge Prediction**: Generate candidate edges between vertices and classify them with ClassificationPointNet
6. **Post-Processing**: Filter and merge results across multiple views

## 🚀 Quick Start

### Training

```bash
# Train vertex prediction model
python train_pnet_v2.py

# Train edge classification model
python train_pnet_class.py
```

### Evaluation

```bash
# Run evaluation on the HoHo25k dataset
python train.py --vertex_threshold 0.59 --edge_threshold 0.65 --only_predicted_connections True
```

### Inference

```bash
# Generate predictions (used in the competition)
# Uses the pnet.pth and pnet_class.pth models
python script.py
```

## 📁 Key Files

### Core Models
- `fast_pointnet_v2.py` - Enhanced PointNet for vertex prediction
- `fast_pointnet_class.py` - PointNet for edge classification
- `end_to_end.py` - VoxelUNet implementation (available but not used in the final solution)

### Pipeline
- `predict.py` - Main wireframe prediction pipeline (2900+ lines)
- `train.py` - Training and evaluation script
- `utils.py` - COLMAP utilities and helper functions
- `visu.py` - 3D visualization tools using Open3D

### Data Processing
- `generate_pcloud_dataset.py` - Dataset generation from HoHo25k
- `create_pcloud()` - Multi-view point cloud fusion with semantic features

### Analysis
- `find_best_results.py` - Hyperparameter optimization and result analysis
- `color_visu.py` - Color legend generation for semantic classes

## 🔧 Technical Details

### Input Data Processing
- **Multi-view RGB images** with camera poses from COLMAP
- **Depth maps** fitted to the COLMAP sparse reconstruction
- **ADE20K segmentation** for building detection
- **Gestalt segmentation** for architectural elements (roof, walls, windows, etc.)
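To make the patch-based processing pipeline above concrete, here is a minimal NumPy sketch of steps 2-3 (spatial clustering of candidate vertices, then extraction of centroid-normalized local patches). The helper names, the greedy clustering scheme, and the default radii are illustrative assumptions, not the repository's actual API; see `predict.py` for the real implementation.

```python
import numpy as np

def cluster_candidates(points, radius=1.0):
    """Greedy spatial clustering: merge candidate vertices that lie
    within `radius` of a seed point and return one centroid per cluster.
    (Hypothetical stand-in for the pipeline's clustering step.)"""
    clusters = []
    remaining = list(range(len(points)))
    while remaining:
        seed = remaining.pop(0)
        members = [seed]
        for idx in remaining[:]:
            if np.linalg.norm(points[idx] - points[seed]) < radius:
                members.append(idx)
                remaining.remove(idx)
        clusters.append(points[members].mean(axis=0))
    return np.array(clusters)

def extract_patch(cloud, center, radius=1.5):
    """Cut a local patch (xyz + extra feature columns) around a cluster
    center and re-center its coordinates at the local centroid, so the
    network receives translation-invariant input."""
    mask = np.linalg.norm(cloud[:, :3] - center, axis=1) < radius
    patch = cloud[mask].copy()
    patch[:, :3] -= patch[:, :3].mean(axis=0)  # center at local centroid
    return patch
```

Each resulting patch would then be fed to FastPointNet for vertex refinement, and pairs of refined vertices to ClassificationPointNet for edge classification.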
### Feature Engineering
- **11D Point Features**: xyz coordinates + rgb colors + semantic labels + multi-view consistency
- **Patch Normalization**: Patches are centered at local centroids, with a 0.5-2.0 m radius
- **Data Augmentation**: Random rotation, translation, scaling, and noise injection

### Training Strategy
- **Multi-task Learning**: Joint prediction of vertex position, confidence, and classification
- **Combined Loss**: SmoothL1 (position) + SoftPlus (confidence) + BCE (classification)
- **Optimization**: AdamW with cosine annealing and gradient clipping
- **Regularization**: Dropout, weight decay, label smoothing

### Hyperparameter Optimization

Our best configuration:
- `vertex_threshold`: 0.59
- `edge_threshold`: 0.65
- `only_predicted_connections`: True

## 📊 Architecture Highlights

### FastPointNet Enhancements
- **Residual Connections**: Improved gradient flow in deep networks
- **Channel Attention**: Focus on important feature channels
- **Multi-Scale Features**: Combine max and average pooling (0.7 + 0.3 weighting)
- **Group Normalization**: Better stability for small batches
- **Leaky ReLU**: Prevents dying neurons (negative_slope=0.01)

### Patch Processing Strategy
- **Hierarchical Clustering**: Group points by spatial proximity
- **Multi-View Consistency**: Aggregate features across camera views
- **Semantic-Aware Sampling**: Prioritize building-relevant regions
- **Edge-Aware Patches**: Generate candidate patches for all vertex pairs

## 🎨 Visualization

The repository includes comprehensive 3D visualization tools:
- **Point Cloud Rendering**: COLMAP reconstructions with semantic colors
- **Wireframe Overlay**: Ground truth vs. predicted wireframes
- **Patch Visualization**: Local point cloud patches with predicted vertices
- **Camera Frustums**: Multi-view camera poses and coverage

## 📈 Evaluation Metrics

We evaluate using three key metrics:
- **HSS (Hybrid Structure Score)**: Measures the spatial accuracy of vertex positions
- **F1 Score**: Harmonic mean of precision and recall for edge detection
- **IoU**: Intersection over Union for overall wireframe quality

## 📄 Citation

Please cite our work if you use this code:

```bibtex
TODO
```

## 📋 Requirements

- Python 3.8+
- PyTorch 1.12+
- CUDA-capable GPU (recommended)
- OpenCV, NumPy, SciPy
- Open3D (visualization)
- PyVista (optional, for advanced visualization)
- HuggingFace Datasets

## 📜 License

Apache 2.0 - see the LICENSE file for details.

## 🤝 Acknowledgments

This research was supported by Czech Science Foundation Grant No. 24-10738M. Access to the computational infrastructure of the OP VVV-funded project CZ.02.1.01/0.0/0.0/16_019/0000765 "Research Center for Informatics" is also gratefully acknowledged. We further acknowledge the support of the Student Grant Competition of the Czech Technical University in Prague, grant No. SGS23/173/OHK3/3T/13.

Thanks to the S23DR Challenge organizers and the HoHo25k dataset creators for providing this excellent benchmark for 3D wireframe reconstruction research.

---

For a detailed technical description, please refer to our paper and the code documentation throughout the repository.
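As an appendix-style illustration of the combined loss listed under Training Strategy (SmoothL1 position term + SoftPlus confidence term + BCE classification term), here is a minimal NumPy sketch. The weighting factors and the exact form of the confidence term are assumptions, and NumPy stand-ins replace the PyTorch losses used in training.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """SmoothL1 (Huber-style) loss: quadratic below `beta`, linear above."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta).mean()

def bce(logits, labels):
    """Binary cross-entropy on raw logits."""
    p = 1.0 / (1.0 + np.exp(-logits))
    eps = 1e-7
    return -(labels * np.log(p + eps) + (1 - labels) * np.log(1 - p + eps)).mean()

def combined_loss(pos_pred, pos_gt, conf_logit, cls_logit, cls_gt,
                  w_pos=1.0, w_conf=0.1, w_cls=1.0):
    """Sketch of the multi-task objective: position + confidence +
    classification. The task weights (w_*) and the softplus penalty
    that rewards high confidence logits are illustrative assumptions."""
    conf_term = np.log1p(np.exp(-conf_logit)).mean()  # softplus(-x)
    return (w_pos * smooth_l1(pos_pred, pos_gt)
            + w_conf * conf_term
            + w_cls * bce(cls_logit, cls_gt))
```

All three terms are differentiable, so the equivalent PyTorch version can be minimized jointly with AdamW as described above.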