---
language:
- en
license: mit
library_name: autogluon
tags:
- image-classification
- computer-vision
- automl
- binary-classification
- autogluon
datasets:
- scottymcgee/duo-image-dataset
metrics:
- accuracy
- f1
model-index:
- name: duo-image-automl-predictor
  results:
  - task:
      type: image-classification
      name: Binary Image Classification
    dataset:
      type: scottymcgee/duo-image-dataset
      name: Duo Image Dataset
      split: test
    metrics:
    - type: accuracy
      value: 0.95
      name: Test Accuracy (Original Data)
    - type: f1
      value: 0.95
      name: Test F1-Score
---

# Model Card for Duo Image Classification AutoML

Binary image classifier trained using AutoGluon's AutoML framework, with neural network architecture search across multiple CNN backbones.

## Model Details

### Model Description

This model performs binary image classification using automated machine learning (AutoML) to select the optimal CNN architecture. The model was trained through a systematic comparison of ResNet, EfficientNet, and MobileNet variants, with selection based on performance on unseen original data rather than augmented validation data.

- **Developed by:** Mary Zhang
- **Model type:** Convolutional Neural Network (CNN) via AutoML
- **Language(s):** English (image classification, not NLP)
- **License:** MIT
- **Finetuned from model:** Pre-trained TIMM models (various CNN architectures)

### Model Sources

- **Repository:** https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor
- **Dataset:** https://huggingface.co/datasets/scottymcgee/duo-image-dataset

## Uses

### Direct Use

This model classifies images into two categories based on the duo-image dataset. Intended applications include:

- Educational demonstrations of AutoML techniques
- Binary image classification for similar visual domains
- Benchmarking CNN architecture performance on binary tasks

### Downstream Use

The model could potentially be fine-tuned for related binary image classification tasks, though performance would depend on visual similarity to the training domain.

### Out-of-Scope Use

- Production systems requiring high reliability without further validation
- Multi-class classification without retraining
- Classification of images significantly different from the training domain
- Applications where model explainability is critical
- Safety-critical applications

## Bias, Risks, and Limitations

**Methodological Concerns:**

- Validation accuracy approached 100% across multiple architectures, suggesting potential data leakage or an overly simplistic task
- The model may have learned to recognize augmentation artifacts rather than true visual features
- High performance across diverse architectures indicates the classification task may not reflect real-world complexity

**Technical Limitations:**

- Trained on a specific image domain with limited diversity
- Binary classification only
- Performance heavily dependent on synthetic data augmentation
- Potential overfitting to augmented data patterns

### Recommendations

Users should focus on the test set performance (95% accuracy) rather than the validation metrics. The model should be validated on additional diverse datasets before real-world deployment; a minimal evaluation sketch follows. The results suggest the dataset may not represent a challenging classification problem, limiting generalizability.
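As a concrete check, the downloaded predictor can be scored against a small labeled sample from the target domain. The snippet below is a minimal sketch, assuming a DataFrame with an `image` column of file paths and a `label` column; the file paths and labels are placeholders, and model loading is covered in How to Get Started below.

```python
import cloudpickle
import pandas as pd
from huggingface_hub import hf_hub_download
from sklearn.metrics import accuracy_score, classification_report

# Download and unpickle the predictor (see "How to Get Started" below).
model_path = hf_hub_download(
    repo_id="maryzhang/24679-image-automl-nn-duo-predictor",
    filename="autogluon_best_image_predictor.pkl",
)
with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Labeled images from the intended deployment domain; paths and labels here are placeholders.
eval_df = pd.DataFrame({
    "image": ["path/to/img_0.jpg", "path/to/img_1.jpg"],
    "label": [0, 1],
})

# Predict from the image paths only, then compare against the held-out labels.
preds = predictor.predict(eval_df.drop(columns=["label"]))
print("Accuracy:", accuracy_score(eval_df["label"], preds))
print(classification_report(eval_df["label"], preds))
```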
## How to Get Started with the Model

```python
import cloudpickle
from huggingface_hub import hf_hub_download
import pandas as pd

# Download and load the model
model_path = hf_hub_download(
    repo_id="maryzhang/24679-image-automl-nn-duo-predictor",
    filename="autogluon_best_image_predictor.pkl"
)

with open(model_path, "rb") as f:
    predictor = cloudpickle.load(f)

# Prepare data (DataFrame with an 'image' column containing file paths)
test_data = pd.DataFrame({'image': ['path/to/your/image.jpg']})

# Make predictions
predictions = predictor.predict(test_data)
probabilities = predictor.predict_proba(test_data)

print(f"Prediction: {predictions[0]}")
print(f"Probabilities: {probabilities.iloc[0].to_dict()}")
```

## Training Details

### Training Data

**Dataset:** scottymcgee/duo-image-dataset

- Training split: 70% of total data (augmented subset)
- Validation split: 20% of the augmented data
- Test split: 30% of total data (original, non-augmented images)
- **Problem type:** Binary classification
- **Preprocessing:** Images materialized to disk; automatic preprocessing by AutoGluon

### Training Procedure

#### Preprocessing

Images were extracted from the Hugging Face dataset format and saved to disk, as required by AutoGluon's MultiModalPredictor. AutoGluon handled all image preprocessing automatically based on the selected CNN architecture.

#### Training Hyperparameters

- **AutoML framework:** AutoGluon MultiModalPredictor
- **Preset:** medium_quality
- **Time budget:** 10 minutes per architecture (40 minutes total)
- **Architectures tested:** ResNet18, ResNet34, EfficientNet-B0, MobileNetV3-Small
- **Model selection:** Based on performance on original (non-augmented) test data
- **Training regime:** Mixed precision (handled automatically by AutoGluon)

A minimal reproduction sketch of this architecture sweep is provided in the Reproduction Sketch section below.

#### Speeds, Sizes, Times

- **Total training time:** ~40 minutes across 4 architectures
- **Architecture evaluation:** 10 minutes per CNN variant
- **Model selection:** Automatic, based on test performance
- **Hardware:** Single-GPU training environment

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- **Primary evaluation:** Original dataset (30% of total data, non-augmented)
- **Secondary evaluation:** Augmented validation set (20% of the augmented data)

The original dataset represents the true test performance, as it contains unmodified images not seen during training.

#### Factors

Evaluation considered performance on both augmented and original data to detect potential overfitting to synthetic augmentation patterns.

#### Metrics

- **Primary:** Accuracy (appropriate for balanced binary classification)
- **Secondary:** F1-score (weighted and binary)
- **Analysis:** Per-class precision and recall via classification reports

### Results

#### Test Set Performance (Original Data) - Primary Metric

- **Accuracy:** 95.0%
- **Weighted F1:** 95.0%
- **Binary F1:** 96.0%

**Per-class breakdown:**

- Class 0: Precision=0.93, Recall=0.93, F1=0.93
- Class 1: Precision=0.96, Recall=0.96, F1=0.96

#### Validation Performance (Augmented Data)

- **Accuracy:** ~100% (concerning; see Bias, Risks, and Limitations)

#### Summary

The model achieves strong performance on original test data (95% accuracy) but showed unrealistically high validation performance across multiple architectures. This pattern suggests the validation methodology may have issues, making the test set results more trustworthy for assessing real-world performance.

## Environmental Impact

Training was conducted efficiently using AutoGluon's automated approach with a limited time budget.
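## Reproduction Sketch

The following is a minimal sketch of how the architecture sweep described under Training Details could be reproduced with AutoGluon's `MultiModalPredictor`. It is not the exact training script: the dataset feature names (`image`, `label`), the split names, the output directories, and the TIMM checkpoint identifiers are illustrative assumptions.

```python
import os

import pandas as pd
from autogluon.multimodal import MultiModalPredictor
from datasets import load_dataset


def materialize(split, out_dir):
    """Save images from a Hugging Face split to disk and return a DataFrame of
    file paths and labels, since MultiModalPredictor expects image file paths."""
    os.makedirs(out_dir, exist_ok=True)
    rows = []
    for i, example in enumerate(split):
        path = os.path.join(out_dir, f"{i}.png")
        example["image"].save(path)  # assumes an 'image' feature holding PIL images
        rows.append({"image": path, "label": example["label"]})  # assumes a 'label' feature
    return pd.DataFrame(rows)


ds = load_dataset("scottymcgee/duo-image-dataset")
train_df = materialize(ds["train"], "images/train")  # split names are assumptions
test_df = materialize(ds["test"], "images/test")

# Candidate TIMM backbones; the checkpoint names are illustrative.
backbones = ["resnet18", "resnet34", "efficientnet_b0", "mobilenetv3_small_100"]

results = {}
for backbone in backbones:
    predictor = MultiModalPredictor(
        label="label",
        problem_type="binary",
        presets="medium_quality",
        eval_metric="accuracy",
    )
    predictor.fit(
        train_data=train_df,
        time_limit=600,  # 10-minute budget per architecture
        hyperparameters={"model.timm_image.checkpoint_name": backbone},
    )
    # Score each candidate on the original (non-augmented) test split.
    results[backbone] = predictor.evaluate(test_df, metrics=["accuracy", "f1"])

best = max(results, key=lambda name: results[name]["accuracy"])
print(f"Best backbone: {best} -> {results[best]}")
```

In this setup, the winning predictor would then be serialized with `cloudpickle`, matching how it is loaded in the How to Get Started section above.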
## Technical Specifications

### Model Architecture and Objective

**Best Architecture:** [To be updated based on AutoML results]

- **Input:** RGB images (automatic resizing by AutoGluon)
- **Output:** Binary classification probabilities
- **Backbone:** Pre-trained TIMM model selected via AutoML
- **Objective:** Cross-entropy loss for binary classification

### Compute Infrastructure

#### Hardware

- GPU-accelerated training (CUDA compatible)
- Sufficient memory for batch processing of images
- Standard Google Colab environment

#### Software

- **AutoGluon:** 1.4.0+
- **Python:** 3.9+
- **PyTorch:** Latest compatible version
- **TIMM:** For pre-trained CNN backbones
- **Dependencies:** pandas, scikit-learn, PIL/Pillow

## Citation

**BibTeX:**

```bibtex
@misc{duo_image_automl_2024,
  title={Duo Image Classification via AutoML Neural Architecture Search},
  author={Zhang, Mary},
  year={2024},
  url={https://huggingface.co/maryzhang/24679-image-automl-nn-duo-predictor},
  note={Educational AutoML demonstration with methodological considerations}
}
```

**Dataset Citation:**

```bibtex
@dataset{scottymcgee_duo_dataset,
  title={Duo Image Dataset},
  author={Scotty McGee},
  year={2024},
  url={https://huggingface.co/datasets/scottymcgee/duo-image-dataset}
}
```

## More Information

This model was developed as part of an educational assignment to explore AutoML techniques for neural network architecture selection in computer vision. The concerning validation results (near-perfect accuracy across architectures) highlight important lessons about:

- Proper experimental design in machine learning
- The importance of realistic evaluation methodologies
- Potential pitfalls in synthetic data augmentation
- The value of honest reporting in academic work

The 95% test accuracy represents a more realistic assessment of model performance and should be the primary metric for evaluating this work.

## Model Card Authors

Mary Zhang

## Model Card Contact

maryzhan@andrew.cmu.edu

## AI Usage

Claude was used to edit functions and debug code.