Wargon Clothing Classifier

A Vision Transformer (ViT)-based model for clothing classification, fine-tuned on images of secondhand clothing. It classifies 27 clothing types with 73% validation accuracy.

Model Details

Model Description

This is a Vision Transformer model fine-tuned for clothing classification. It was developed to automate clothing categorization in secondhand fashion applications.

  • Developed by: Wargon Innovation
  • Model type: Image Classification
  • Language(s): N/A (Vision model)
  • License: Apache 2.0
  • Finetuned from model: google/vit-base-patch16-224

Uses

Direct Use

This model can be used for:

  • Automatic clothing categorization in e-commerce
  • Fashion inventory management
  • Secondhand clothing marketplaces
  • Fashion recommendation systems

Downstream Use

The model can be fine-tuned for:

  • Specific clothing brand recognition
  • Size estimation from images
  • Style classification
  • Multi-label clothing attribute detection

How to Get Started with the Model

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_id = "wargoninnovation/wargon-clothing-classifier"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

# Load the image as RGB (handles grayscale or RGBA files) and preprocess it
image = Image.open("path_to_clothing_image.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

# Run inference without tracking gradients
with torch.no_grad():
    outputs = model(**inputs)
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get the top prediction and map the class index to its label name
predicted_class_id = predictions.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"{predicted_label}: {predictions[0, predicted_class_id].item():.1%}")
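To inspect several candidate labels instead of only the single best one, the raw logits can be ranked directly. The sketch below uses a hypothetical helper, `top_k_labels`, which is not part of this repository or of Transformers; it assumes `id2label` maps integer class ids to names, as `model.config.id2label` does in Transformers.

```python
import math


def top_k_labels(logits, id2label, k=3):
    """Return the k most likely (label, probability) pairs for one image's logits."""
    # Numerically stable softmax: subtract the max logit before exponentiating
    max_logit = max(logits)
    exps = [math.exp(x - max_logit) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank class indices by probability, highest first, and keep the top k
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(id2label[i], probs[i]) for i in ranked]
```

For example, `top_k_labels(outputs.logits[0].tolist(), model.config.id2label, k=3)` returns the three most likely clothing types for the first image in the batch. Seeing the runner-up labels can help when categories overlap, such as Jacket and Winter jacket.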

Training Details

Training Data

The model was trained on the wargoninnovation/clothingdatasetsecondhand dataset, which contains over 30,000 images of secondhand clothing items across 34+ categories.

Data Preprocessing:

  • Removed classes with fewer than 10 samples so that each remaining class can be split reliably into train and validation sets
  • Final dataset contains 27 clothing categories
  • Images resized to 224x224 pixels
  • Stratified train/validation split (80/20)
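The filtering and split steps above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the script used to build this model: `filter_and_split`, the shuffling, and the seed are assumptions. It takes a list of per-image class labels and returns train and validation index lists, splitting each class separately so both sets keep the same class proportions.

```python
import random
from collections import Counter, defaultdict


def filter_and_split(labels, min_samples=10, val_fraction=0.2, seed=42):
    """Drop classes with too few samples, then split indices per class (stratified)."""
    counts = Counter(labels)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        if counts[label] >= min_samples:  # keep only sufficiently common classes
            by_class[label].append(idx)

    rng = random.Random(seed)
    train_idx, val_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_val = round(len(indices) * val_fraction)  # 20% of each class for validation
        val_idx.extend(indices[:n_val])
        train_idx.extend(indices[n_val:])
    return sorted(train_idx), sorted(val_idx)
```

With 🤗 Datasets, `Dataset.train_test_split(test_size=0.2, stratify_by_column=...)` provides a built-in stratified split when the label column is a `ClassLabel`.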

Training Procedure

Preprocessing

  • Image Size: 224x224 pixels
  • Normalization: ImageNet statistics
  • Data Augmentation: Standard transformations applied
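Normalization standardizes each color channel as (pixel / 255 - mean) / std. The sketch below applies that formula to a single RGB pixel; `normalize_pixel` is a hypothetical helper. The mean and std should be read from the model's image processor (its `image_mean` and `image_std` attributes) rather than hard-coded, because different checkpoints ship different statistics.

```python
def normalize_pixel(rgb, mean, std):
    """Scale 0-255 RGB values to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))
```

For instance, `normalize_pixel((255, 0, 127.5), processor.image_mean, processor.image_std)` shows where white, black, and mid-gray channel values land after preprocessing.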

Training Hyperparameters

  • Training regime: Mixed precision (fp16)
  • Learning Rate: 2e-5
  • Batch Size: 16
  • Epochs: 6
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Warmup Steps: 500
  • Label Smoothing: 0.1
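Warmup Steps and Epochs interact with the dataset size: the learning rate ramps up over the first 500 optimizer steps and then decays. The card does not name the scheduler, so the sketch below assumes linear decay to zero (the default `lr_scheduler_type` of the Transformers `Trainer`) and no gradient accumulation. Both functions are hypothetical helpers.

```python
import math


def total_training_steps(num_train_examples, batch_size=16, epochs=6):
    """Optimizer steps over the whole run, assuming no gradient accumulation."""
    return math.ceil(num_train_examples / batch_size) * epochs


def linear_warmup_lr(step, total_steps, peak_lr=2e-5, warmup_steps=500):
    """Learning rate at a given step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)
```

For a hypothetical training split of 20,000 images, `total_training_steps(20000)` gives 1,250 steps per epoch times 6 epochs = 7,500 steps, so warmup covers the first 500 of them and the rate reaches half its peak at step 4,000.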

Hardware

  • GPU: NVIDIA RTX 3060 (12GB VRAM)
  • Training Time: ~1.5 hours

Evaluation

Testing Data, Factors & Metrics

The model was evaluated on a stratified validation set (20% of the filtered dataset).

Metrics

  • Validation Accuracy: 73.0%
  • F1 Score: 72.7%
  • Precision: 72.8%
  • Recall: 73.0%
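The card does not say how precision, recall, and F1 were averaged over the 27 classes. Recall matching accuracy exactly is what support-weighted averaging always produces for single-label classification, so the sketch below assumes weighted averaging (equivalent to scikit-learn's `average="weighted"` with `zero_division=0`). `classification_metrics` is a hypothetical plain-Python helper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall and F1."""
    n = len(y_true)
    support = Counter(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    precision = recall = f1 = 0.0
    for label in set(y_true) | set(y_pred):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        predicted = sum(p == label for p in y_pred)
        actual = support[label]
        p_c = tp / predicted if predicted else 0.0
        r_c = tp / actual if actual else 0.0
        f_c = 2 * p_c * r_c / (p_c + r_c) if (p_c + r_c) else 0.0
        weight = actual / n  # weight each class by its share of the true labels
        precision += weight * p_c
        recall += weight * r_c
        f1 += weight * f_c
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

On toy labels `["Jeans", "Jeans", "Dress", "T-shirt"]` with predictions `["Jeans", "Dress", "Dress", "T-shirt"]`, weighted recall and accuracy are both 0.75 while weighted precision is 0.875.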

Results

The model performs consistently across the major clothing categories and is strongest on:

  • Common items (T-shirts, Jeans, Dresses)
  • Categories that are well represented in the training data
  • Clean product photography like that in the training dataset

Clothing Categories

The model can classify the following 27 clothing types:

  1. Blazer
  2. Blouse
  3. Cardigan
  4. Dress
  5. Hoodie
  6. Jacket
  7. Jeans
  8. Nightgown
  9. Outerwear
  10. Pajamas
  11. Rain jacket
  12. Rain trousers
  13. Robe
  14. Shirt
  15. Shorts
  16. Skirt
  17. Sweater
  18. T-shirt
  19. Tank top
  20. Tights
  21. Top
  22. Training top
  23. Trousers
  24. Tunic
  25. Vest
  26. Winter jacket
  27. Winter trousers

Limitations and Bias

Limitations

  • Image Quality: Performs best on clean, well-lit product photos similar to the training data
  • Background: Works best on images with minimal background clutter
  • Viewpoint: Trained primarily on front-facing clothing images
  • Categories: Limited to the 27 categories present in the training data
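Because the softmax output always sums to 1 across the 27 classes, the model returns one of them even for an item outside that set. One common mitigation, sketched below, is to accept the top label only when its probability clears a threshold and route the rest to manual review. The 0.5 default and the `route_prediction` helper are hypothetical; softmax confidence is only a rough signal, and any threshold would need tuning on validation data (label smoothing of 0.1 also tends to lower peak confidence).

```python
def route_prediction(probabilities, id2label, threshold=0.5):
    """Return the top label if it is confident enough, otherwise None."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    if probabilities[best] >= threshold:
        return id2label[best]
    return None  # low confidence: possibly out of scope, send for manual review
```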

Bias

  • Data Source: Trained on secondhand clothing, so it may not generalize well to new or luxury items
  • Cultural Bias: The dataset may reflect regional fashion preferences
  • Class Imbalance: Some categories remain underrepresented even after filtering

Environmental Impact

  • Hardware Type: NVIDIA RTX 3060
  • Hours Used: ~1.5 (training only)
  • Cloud Provider: N/A (Local training)
  • Compute Region: Local
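The card does not report energy use or emissions. A rough upper bound for the GPU alone follows from power times time. The sketch below assumes the RTX 3060 draws its 170 W rated board power for the full ~1.5 hours and ignores CPU, memory, and cooling; the grid carbon intensity is a placeholder that depends on where training ran. Both functions are hypothetical helpers.

```python
def training_energy_kwh(power_watts, hours):
    """Upper-bound GPU energy, assuming full rated power for the whole run."""
    return power_watts * hours / 1000.0


def emissions_kg(energy_kwh, grid_kg_per_kwh):
    """CO2-equivalent emissions for a given grid carbon intensity (kg CO2e per kWh)."""
    return energy_kwh * grid_kg_per_kwh
```

`training_energy_kwh(170, 1.5)` gives 0.255 kWh; at a hypothetical 0.1 kg CO2e per kWh, `emissions_kg(0.255, 0.1)` is about 0.026 kg.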

Technical Specifications

Model Architecture

  • Base: Vision Transformer (ViT-Base/16)
  • Parameters: ~86M parameters
  • Input Size: 224x224x3
  • Patch Size: 16x16
  • Number of Classes: 27
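The input size and patch size determine the sequence length the transformer processes: a 224x224 image cut into 16x16 patches yields 14 x 14 = 196 patch tokens, plus one [CLS] token whose final embedding feeds the classifier. The sketch below computes that count and the size of the 27-way classification head, assuming the standard ViT-Base hidden size of 768. Both helpers are hypothetical.

```python
def vit_token_count(image_size=224, patch_size=16):
    """Patch tokens plus one [CLS] token for a square ViT input."""
    patches_per_side = image_size // patch_size
    return patches_per_side ** 2 + 1


def classifier_head_params(hidden_size=768, num_classes=27):
    """Weights plus biases of the final linear classification layer."""
    return hidden_size * num_classes + num_classes
```

The head adds 768 x 27 + 27 = 20,763 parameters, a small fraction of the ~86M total.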

Software

  • Framework: PyTorch
  • Libraries: HuggingFace Transformers, Datasets
  • Experiment Tracking: Weights & Biases (W&B)

Citation

@misc{wargon_clothing_classifier_2024,
  title={Wargon Clothing Classifier: A Vision Transformer for Secondhand Fashion Classification},
  author={Wargon Innovation},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/wargoninnovation/wargon-clothing-classifier}},
}

Model Card Authors

Wargon Innovation Team

Model Card Contact

For questions and feedback, please open an issue in the model repository or contact the Wargon Innovation team.

Checkpoint

  • Format: Safetensors
  • Model size: 85.8M parameters
  • Tensor type: F32