Wargon Clothing Classifier
A Vision Transformer (ViT)-based model for clothing classification, trained on secondhand clothing images. The model classifies 27 types of clothing items with 73% validation accuracy.
Model Details
Model Description
This is a Vision Transformer model fine-tuned for clothing classification. It was developed to solve real-world clothing categorization challenges in secondhand fashion applications.
- Developed by: Wargon Innovation
- Model type: Image Classification
- Language(s): N/A (Vision model)
- License: Apache 2.0
- Finetuned from model: google/vit-base-patch16-224
Model Sources
- Training Dataset: Wargon Innovation Clothing Dataset (wargoninnovation/clothingdatasetsecondhand)
- Base Model: google/vit-base-patch16-224
Uses
Direct Use
This model can be used for:
- Automatic clothing categorization in e-commerce
- Fashion inventory management
- Secondhand clothing marketplaces
- Fashion recommendation systems
Downstream Use
The model can be fine-tuned for:
- Specific clothing brand recognition
- Size estimation from images
- Style classification
- Multi-label clothing attribute detection
How to Get Started with the Model
from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch
# Load model and processor
processor = AutoImageProcessor.from_pretrained("wargoninnovation/wargon-clothing-classifier")
model = AutoModelForImageClassification.from_pretrained("wargoninnovation/wargon-clothing-classifier")
# Load and preprocess image
image = Image.open("path_to_clothing_image.jpg").convert("RGB")  # ensure a 3-channel input
inputs = processor(image, return_tensors="pt")
# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
# Get top prediction and map it to its label name
predicted_class_id = predictions.argmax().item()
predicted_label = model.config.id2label[predicted_class_id]
print(predicted_label)
Training Details
Training Data
The model was trained on the wargoninnovation/clothingdatasetsecondhand dataset, which contains over 30,000 images of secondhand clothing items across 34+ categories.
Data Preprocessing:
- Removed classes with fewer than 10 samples to ensure a robust stratified train/validation split
- Final dataset contains 27 clothing categories
- Images resized to 224x224 pixels
- Stratified train/validation split (80/20); a sketch of this filtering and splitting step follows below
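A minimal sketch of this filtering and splitting step, assuming the dataset exposes its categories in a ClassLabel column named "label" and a single "train" split (both names are assumptions, not confirmed by the dataset card):

from collections import Counter
from datasets import load_dataset

# Load the raw secondhand clothing dataset (assumed single "train" split)
ds = load_dataset("wargoninnovation/clothingdatasetsecondhand", split="train")

# Drop classes with fewer than 10 samples so a stratified 80/20 split is possible
counts = Counter(ds["label"])
keep = {label for label, n in counts.items() if n >= 10}
ds = ds.filter(lambda example: example["label"] in keep)

# Stratified 80/20 train/validation split (requires a ClassLabel "label" column)
splits = ds.train_test_split(test_size=0.2, stratify_by_column="label", seed=42)
train_ds, val_ds = splits["train"], splits["test"]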
Training Procedure
Preprocessing
- Image Size: 224x224 pixels
- Normalization: ImageNet statistics
- Data Augmentation: standard image transformations (the exact augmentations are not documented; an illustrative setup is sketched below)
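The sketch below shows a typical setup that reuses the base model's ImageNet normalization statistics; the specific augmentations (random resized crop, horizontal flip) are illustrative assumptions rather than the documented pipeline:

from torchvision import transforms
from transformers import AutoImageProcessor

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")

# Normalize with the ImageNet mean/std stored in the base model's processor
normalize = transforms.Normalize(mean=processor.image_mean, std=processor.image_std)

# Illustrative training-time transforms (actual augmentations not documented)
train_transforms = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    normalize,
])

# Deterministic validation-time transforms
val_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize,
])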
Training Hyperparameters
- Training regime: Mixed precision (fp16)
- Learning Rate: 2e-5
- Batch Size: 16
- Epochs: 6
- Optimizer: AdamW
- Weight Decay: 0.01
- Warmup Steps: 500
- Label Smoothing: 0.1
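These hyperparameters map directly onto the standard Hugging Face TrainingArguments API. The sketch below is an assumed reconstruction of the training setup, not the exact script used; the output directory and dataset variables are placeholders, and image collation is omitted:

from transformers import AutoModelForImageClassification, Trainer, TrainingArguments

# Start from the base checkpoint and swap the 1000-class head for a 27-class head
model = AutoModelForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=27,
    ignore_mismatched_sizes=True,
)

training_args = TrainingArguments(
    output_dir="wargon-clothing-classifier",  # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=6,
    weight_decay=0.01,           # AdamW is the Trainer's default optimizer
    warmup_steps=500,
    label_smoothing_factor=0.1,
    fp16=True,                   # mixed-precision training
    report_to="wandb",           # experiment tracking with Weights & Biases
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_ds,      # prepared as in the preprocessing sketch above
    eval_dataset=val_ds,
)
trainer.train()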
Hardware
- GPU: NVIDIA RTX 3060 (12GB VRAM)
- Training Time: ~1.5 hours
Evaluation
Testing Data, Factors & Metrics
The model was evaluated on a stratified validation set (20% of the filtered dataset); a sketch of how the metrics below can be computed follows the list.
Metrics
- Validation Accuracy: 73.0%
- F1 Score: 72.7%
- Precision: 72.8%
- Recall: 73.0%
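A sketch of how these metrics can be computed with scikit-learn, in the form the Hugging Face Trainer expects; the weighted averaging mode is an assumption, since the card does not state how precision, recall, and F1 were aggregated:

import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Hugging Face Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average="weighted", zero_division=0
    )
    return {
        "accuracy": accuracy_score(labels, preds),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }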
Results
The model achieves balanced performance across major clothing categories, with particular strength in:
- Common items (T-shirts, Jeans, Dresses)
- Well-represented categories in the training data
- Clean product photography (as in the training dataset)
Clothing Categories
The model can classify the following 27 clothing types:
- Blazer
- Blouse
- Cardigan
- Dress
- Hoodie
- Jacket
- Jeans
- Nightgown
- Outerwear
- Pajamas
- Rain jacket
- Rain trousers
- Robe
- Shirt
- Shorts
- Skirt
- Sweater
- T-shirt
- Tank top
- Tights
- Top
- Training top
- Trousers
- Tunic
- Vest
- Winter jacket
- Winter trousers
Limitations and Bias
Limitations
- Image Quality: Best performance on clean, well-lit product photos similar to training data
- Background: Optimized for images with minimal background distractions
- Viewpoint: Trained primarily on front-facing clothing images
- Categories: Limited to the 27 categories present in training data
Bias
- Data Source: Trained on secondhand clothing, may not generalize well to new/luxury items
- Cultural Bias: Dataset may reflect specific regional fashion preferences
- Class Imbalance: Some categories had limited representation even after filtering
Environmental Impact
- Hardware Type: NVIDIA RTX 3060
- Hours Used: ~1.5 hours training time
- Cloud Provider: N/A (Local training)
- Compute Region: Local
Technical Specifications
Model Architecture
- Base: Vision Transformer (ViT-Base/16)
- Parameters: ~86M parameters
- Input Size: 224x224x3
- Patch Size: 16x16
- Number of Classes: 27
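These figures can be checked directly against the published checkpoint; the snippet below only reads values from the loaded model and its config:

from transformers import AutoModelForImageClassification

model = AutoModelForImageClassification.from_pretrained(
    "wargoninnovation/wargon-clothing-classifier"
)

# Count parameters and read architecture details from the config
num_params = sum(p.numel() for p in model.parameters())
print(f"Parameters: {num_params / 1e6:.1f}M")    # ~86M for ViT-Base/16
print(f"Classes: {model.config.num_labels}")     # 27
print(f"Image size: {model.config.image_size}")  # 224
print(f"Patch size: {model.config.patch_size}")  # 16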
Software
- Framework: PyTorch
- Libraries: HuggingFace Transformers, Datasets
- Experiment Tracking: Weights & Biases (W&B)
Citation
@misc{wargon_clothing_classifier_2024,
  title={Wargon Clothing Classifier: A Vision Transformer for Secondhand Fashion Classification},
  author={Wargon Innovation},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/wargoninnovation/wargon-clothing-classifier}},
}
Model Card Authors
Wargon Innovation Team
Model Card Contact
For questions and feedback, please open an issue in the model repository or contact the Wargon Innovation team.