---
language: en
license: apache-2.0
tags:
- product-classification
- transformers
- pytorch
- distilbert
datasets:
- lokeshparab/amazon-products-dataset
model-index:
- name: Product Classifier B2
  results: []
---

# Product Classifier B2

This model predicts product categories from a product's name or description...

# 🏍️ Amazon Product Classifier (Balanced B2)

This is a fine-tuned DistilBERT model for **multi-class classification** of product titles into Amazon-like product categories. The model is based on `distilbert-base-uncased` and was trained on a **balanced subset** of the Amazon Products dataset.

## 🧠 Model Architecture

- Base: `distilbert-base-uncased` (6 layers, hidden size 768)
- Classification head: 2 dense layers with dropout + ReLU
- Output: softmax over 19 product categories

## 📊 Training Data

The model was trained on a balanced subset (≈40k samples) of the [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset), which contains product titles and their corresponding categories.

Preprocessing included:

- Removing empty/missing titles
- Keeping top-level categories only
- Balancing the dataset to avoid category bias

## 🍿 Example Categories

- beauty & health
- home & kitchen
- tv, audio & cameras
- computers & accessories
- clothing & accessories
- appliances
- sports & fitness
- grocery & gourmet foods
- ...
(total 19)

## 🧪 Example Usage (Python)

```python
from transformers import pipeline

# Load the classifier; replace the placeholder with the actual model repo id
classifier = pipeline("text-classification", model="your-username/product-classifier-model-B2")

# Classify a single product title
result = classifier("Smartwatch with heart rate monitor and GPS tracking")
print(result)
# [{'label': 'stores', 'score': 0.94}]
```

## 🚀 Intended Use

The model is designed to help developers quickly classify product titles into e-commerce categories. It is useful for:

- Auto-tagging items in online stores
- Cleaning and organizing product catalogs
- Building recommendation engines (in combination with embeddings)

## 📌 Limitations

- English-only (the base model, `distilbert-base-uncased`, is an English model)
- May not perform well on very short or ambiguous product names
- Not suitable for legal/medical/financial applications

## 📄 License & Source

- Model: MIT License
- Training Data: [Amazon Products Dataset](https://www.kaggle.com/datasets/lokeshparab/amazon-products-dataset) on Kaggle (check the license and attribution requirements on the Kaggle page)
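
## 🧹 Preprocessing Sketch

The preprocessing steps listed under Training Data (dropping empty titles, keeping the top-level category, balancing) could be sketched roughly as below. This is a minimal illustration, not the pipeline actually used for this model: the column names `name` and `main_category`, the helper `balance_by_category`, the per-class cap, and the toy rows are all assumptions made for the example, and downsampling is only one possible way to balance the data.

```python
import random
from collections import defaultdict


def balance_by_category(rows, per_class, seed=0):
    """Drop rows with empty titles, then keep at most `per_class` rows per category.

    `rows` is a list of dicts; the keys "name" and "main_category" are
    assumed column names, not confirmed by the model card.
    """
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for row in rows:
        title = (row.get("name") or "").strip()
        category = (row.get("main_category") or "").strip().lower()
        if title and category:  # skip empty/missing titles or labels
            by_category[category].append({"title": title, "label": category})

    balanced = []
    for category in sorted(by_category):
        items = by_category[category]
        rng.shuffle(items)
        balanced.extend(items[:per_class])  # downsample over-represented categories
    rng.shuffle(balanced)  # mix categories before training
    return balanced


# Made-up toy rows, for illustration only
rows = [
    {"name": "Steel water bottle 1L", "main_category": "home & kitchen"},
    {"name": "Non-stick frying pan", "main_category": "home & kitchen"},
    {"name": "Ceramic mug set", "main_category": "home & kitchen"},
    {"name": "", "main_category": "appliances"},
    {"name": "Yoga mat 6mm", "main_category": "sports & fitness"},
]
balanced = balance_by_category(rows, per_class=2)
print(len(balanced))
```

Note that capping each category only trims the large ones; categories with fewer than `per_class` rows stay smaller, so an exactly balanced subset would also need a cap no larger than the smallest category.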