Wargon Clothing Classifier

A Vision Transformer (ViT) based model for clothing classification, trained on secondhand clothing images. It classifies 27 types of clothing items with 73% validation accuracy.

Model Details

Model Description

This is a Vision Transformer model fine-tuned for clothing classification. It was developed to solve real-world clothing categorization challenges in secondhand fashion applications.

Developed by: Wargon Innovation
Model type: Image Classification
Language(s): N/A (Vision model)
License: Apache 2.0
Finetuned from model: google/vit-base-patch16-224

Model Sources

Repository: Wargon Innovation Clothing Dataset
Base Model: google/vit-base-patch16-224

Uses

Direct Use

This model can be used for:

Automatic clothing categorization in e-commerce
Fashion inventory management
Secondhand clothing marketplaces
Fashion recommendation systems

Downstream Use

The model can be fine-tuned for:

Specific clothing brand recognition
Size estimation from images
Style classification
Multi-label clothing attribute detection

How to Get Started with the Model

from transformers import AutoImageProcessor, AutoModelForImageClassification
from PIL import Image
import torch

# Load model and processor
model_id = "wargoninnovation/wargon-clothing-classifier"
processor = AutoImageProcessor.from_pretrained(model_id)
model = AutoModelForImageClassification.from_pretrained(model_id)

# Load and preprocess image (convert to RGB so grayscale or RGBA files also work)
image = Image.open("path_to_clothing_image.jpg").convert("RGB")
inputs = processor(image, return_tensors="pt")

# Make prediction
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)

# Get top prediction and its human-readable label
predicted_class_id = predictions.argmax(dim=-1).item()
predicted_label = model.config.id2label[predicted_class_id]
print(f"{predicted_label}: {predictions[0, predicted_class_id].item():.1%}")
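
When more than one candidate label is useful, the softmax and ranking step can be sketched without any dependencies. This is only an illustration: the three-class logits and label names below are made up, and in practice the inputs would come from outputs.logits[0].tolist() and model.config.id2label.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(logits, id2label, k=3):
    """Return the k most probable (label, probability) pairs, best first."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(id2label[i], probs[i]) for i in ranked[:k]]

# Hypothetical three-class example, not real model output.
example_logits = [2.0, 0.5, -1.0]
example_labels = {0: "Dress", 1: "Skirt", 2: "Tunic"}
print(top_k(example_logits, example_labels, k=2))
```

Showing the top few labels with their probabilities can help a reviewer spot near-ties between visually similar categories.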

Training Details

Training Data

The model was trained on the wargoninnovation/clothingdatasetsecondhand dataset, which contains over 30,000 images of secondhand clothing items across 34+ categories.

Data Preprocessing:

Removed classes with fewer than 10 samples to ensure robust train/validation splits
Final dataset contains 27 clothing categories
Images resized to 224x224 pixels
Stratified train/validation split (80/20)
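
The filtering and splitting steps above could look roughly like the following sketch. It is not the authors' preprocessing script: the function names, the seed, the toy file names, and the rare "Cape" class are all hypothetical and only illustrate the idea of dropping small classes before a per-class 80/20 split.

```python
import random
from collections import Counter, defaultdict

def filter_rare_classes(samples, min_count=10):
    """Drop every (item, label) pair whose label has fewer than min_count samples."""
    counts = Counter(label for _, label in samples)
    return [(item, label) for item, label in samples if counts[label] >= min_count]

def stratified_split(samples, val_fraction=0.2, seed=42):
    """Split per class so each label keeps roughly the same train/validation ratio."""
    by_label = defaultdict(list)
    for item, label in samples:
        by_label[label].append(item)
    rng = random.Random(seed)
    train, val = [], []
    for label in sorted(by_label):
        items = by_label[label]
        rng.shuffle(items)
        n_val = max(1, round(len(items) * val_fraction))
        val += [(item, label) for item in items[:n_val]]
        train += [(item, label) for item in items[n_val:]]
    return train, val

# Toy data: two well-populated classes and one rare class.
toy = [(f"jeans_{i}.jpg", "Jeans") for i in range(50)]
toy += [(f"dress_{i}.jpg", "Dress") for i in range(20)]
toy += [(f"cape_{i}.jpg", "Cape") for i in range(3)]  # hypothetical rare class

kept = filter_rare_classes(toy)
train, val = stratified_split(kept)
print(len(train), len(val))
```

Filtering first matters because a class with only a handful of images cannot be split 80/20 in a way that leaves meaningful examples on both sides.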

Training Procedure

Preprocessing

Image Size: 224x224 pixels
Normalization: ImageNet statistics
Data Augmentation: Standard transformations applied
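
As a rough illustration of per-channel normalization, the sketch below standardizes a single RGB pixel. The mean and standard deviation are the widely used ImageNet values and are an assumption here, since the card does not list them; the values stored in the checkpoint's processor configuration are authoritative and may differ.

```python
# Widely used ImageNet channel statistics (RGB order). Assumption: the card
# does not list the values; the checkpoint's processor config is authoritative.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))

print(normalize_pixel((255, 255, 255)))  # white pixel
print(normalize_pixel((0, 0, 0)))        # black pixel
```

When the image processor is loaded with AutoImageProcessor, it applies resizing and normalization itself, so this step does not need to be repeated by hand at inference time.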

Training Hyperparameters

Training regime: Mixed precision (fp16)
Learning Rate: 2e-5
Batch Size: 16
Epochs: 6
Optimizer: AdamW
Weight Decay: 0.01
Warmup Steps: 500
Label Smoothing: 0.1
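
Two of the settings above, label smoothing and warmup, are easy to misread, so here is a small dependency-free sketch of what they mean numerically. The function names are illustrative, and holding the learning rate constant after warmup is an assumption, because the card does not say which schedule follows the 500 warmup steps.

```python
import math

def smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against a target that puts 1 - epsilon on the true class
    plus epsilon spread uniformly over all classes."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    log_probs = [x - log_norm for x in logits]
    nll = -log_probs[target]
    uniform = -sum(log_probs) / len(log_probs)
    return (1.0 - epsilon) * nll + epsilon * uniform

def warmup_lr(step, peak_lr=2e-5, warmup_steps=500):
    """Linear ramp from 0 to peak_lr over warmup_steps, then hold.
    Assumption: the schedule after warmup is not stated in the card."""
    if step >= warmup_steps:
        return peak_lr
    return peak_lr * step / warmup_steps

confident = [8.0] + [0.0] * 26   # 27 classes, strongly favouring class 0
print(smoothed_cross_entropy(confident, target=0, epsilon=0.0))
print(smoothed_cross_entropy(confident, target=0, epsilon=0.1))
print(warmup_lr(250))
```

With smoothing enabled, a very confident correct prediction is still penalized slightly, which discourages overconfident logits on visually similar categories.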

Hardware

GPU: NVIDIA RTX 3060 (12GB VRAM)
Training Time: ~1.5 hours

Evaluation

Testing Data, Factors & Metrics

The model was evaluated on a stratified validation set (20% of the filtered dataset).

Metrics

Validation Accuracy: 73.0%
F1 Score: 72.7%
Precision: 72.8%
Recall: 73.0%
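
The card does not state how precision, recall, and F1 are averaged across classes. A recall equal to the accuracy is consistent with support-weighted averaging, because weighted recall always reduces to accuracy. The sketch below shows that computation on made-up labels; it is an illustration under that assumption, not the evaluation code used for the numbers above.

```python
from collections import Counter

def weighted_metrics(y_true, y_pred):
    """Accuracy plus support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = n / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    accuracy = sum(correct.values()) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical labels, not the model's actual predictions.
y_true = ["Jeans", "Jeans", "Jeans", "Dress", "Dress", "Skirt"]
y_pred = ["Jeans", "Jeans", "Dress", "Dress", "Skirt", "Skirt"]
print(weighted_metrics(y_true, y_pred))
```

Because weighting follows class frequency, these averages are dominated by the common categories; a macro average would give rare categories equal say.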

Results

The model achieves balanced performance across major clothing categories, with particular strength in:

Common items (T-shirts, Jeans, Dresses)
Well-represented categories in the training data
Clean product photography (as in the training dataset)

Clothing Categories

The model can classify the following 27 clothing types:

Blazer
Blouse
Cardigan
Dress
Hoodie
Jacket
Jeans
Nightgown
Outerwear
Pajamas
Rain jacket
Rain trousers
Robe
Shirt
Shorts
Skirt
Sweater
T-shirt
Tank top
Tights
Top
Training top
Trousers
Tunic
Vest
Winter jacket
Winter trousers
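
For downstream code that needs the label set without loading the model, the list above can be written out as a lookup table. The index order below simply follows the alphabetical listing and is an assumption; the mapping that actually matches the model's outputs is the one in model.config.id2label.

```python
CLOTHING_LABELS = [
    "Blazer", "Blouse", "Cardigan", "Dress", "Hoodie", "Jacket", "Jeans",
    "Nightgown", "Outerwear", "Pajamas", "Rain jacket", "Rain trousers",
    "Robe", "Shirt", "Shorts", "Skirt", "Sweater", "T-shirt", "Tank top",
    "Tights", "Top", "Training top", "Trousers", "Tunic", "Vest",
    "Winter jacket", "Winter trousers",
]

# Assumption: indices follow the order of the list in this card.
id2label = dict(enumerate(CLOTHING_LABELS))
label2id = {label: idx for idx, label in id2label.items()}

print(len(CLOTHING_LABELS), "categories")
```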

Limitations and Bias

Limitations

Image Quality: Best performance on clean, well-lit product photos similar to training data
Background: Optimized for images with minimal background distractions
Viewpoint: Trained primarily on front-facing clothing images
Categories: Limited to the 27 categories present in training data

Bias

Data Source: Trained on secondhand clothing; may not generalize well to new or luxury items
Cultural Bias: Dataset may reflect specific regional fashion preferences
Class Imbalance: Some categories had limited representation even after filtering

Environmental Impact

Hardware Type: NVIDIA RTX 3060
Hours Used: ~1.5 hours training time
Cloud Provider: N/A (Local training)
Compute Region: Local

Technical Specifications

Model Architecture

Base: Vision Transformer (ViT-Base/16)
Parameters: ~86M
Input Size: 224x224x3
Patch Size: 16x16
Number of Classes: 27

Software

Framework: PyTorch
Libraries: HuggingFace Transformers, Datasets
Experiment Tracking: Weights & Biases (W&B)

Citation

@misc{wargon_clothing_classifier_2024,
  title={Wargon Clothing Classifier: A Vision Transformer for Secondhand Fashion Classification},
  author={Wargon Innovation},
  year={2024},
  publisher={Hugging Face},
  howpublished={\url{https://huggingface.co/wargoninnovation/wargon-clothing-classifier}},
}

Model Card Authors

Wargon Innovation Team

Model Card Contact

For questions and feedback, please open an issue in the model repository or contact the Wargon Innovation team.
