---
library_name: transformers
tags:
- finance
license: mit
datasets:
- Recompense/amazon-appliances-lite-data
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
---

# Model Card for Midas-pricer

Predicts prices based on a product description.

## Model Details

### Model Description

This model predicts the prices of Amazon appliance listings from a product description.

- **Developed by:** https://huggingface.co/Recompense
- **Model type:** Transformer (causal, autoregressive)
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** meta-llama/Llama-3.1-8B-Instruct

### Model Sources

- **Repository:** [Recompense/Midas-pricer](https://huggingface.co/Recompense/Midas-pricer)

## Uses

- Primary use case: generating estimated retail prices for household appliances from textual descriptions.
- Example applications:
  - Assisting e-commerce teams in setting competitive price points
  - Supporting market-analysis dashboards with on-the-fly price estimates
- Not intended for: financial advice or investment decisions

### Out-of-Scope Use

- Predicting prices outside the appliances domain (e.g., electronics, furniture, vehicles) will likely yield unreliable results.
- Using this model for any price-sensitive or regulatory decision without human oversight is discouraged.

## Bias, Risks, and Limitations

- Data biases: the training dataset is drawn exclusively from Amazon appliance listings. Price distributions are skewed toward mid-range consumer appliances; extreme low-end and high-end appliances are underrepresented.
- Input sensitivity: minor changes in phrasing or additional noisy tokens can shift predictions noticeably.
- Generalization: the model does not understand supply-chain disruptions, seasonality, or promotions; it only captures patterns seen in historical listing data.

### Recommendations

- Always validate model outputs against a small set of ground-truth prices before production deployment (see the sketch below).
- Use this model as an assistant, not an oracle: incorporate downstream business rules and domain expertise.
- Regularly retrain or fine-tune on updated listing data to capture shifting market trends.
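As a starting point for the first recommendation, here is a minimal spot-check sketch using the two metrics reported in the Evaluation section below (RMSLE and Hit@$40). The helper functions and the sample numbers are illustrative placeholders, not part of this repository; substitute prices parsed from real model generations and ground truth from your own catalog.

```python
import math

def rmsle(preds, targets):
    """Root Mean Squared Logarithmic Error over paired price lists."""
    assert len(preds) == len(targets) and len(preds) > 0
    se = [(math.log1p(p) - math.log1p(t)) ** 2 for p, t in zip(preds, targets)]
    return math.sqrt(sum(se) / len(se))

def hit_at_40(preds, targets):
    """Fraction of predictions within +/- $40 of the true price."""
    hits = [abs(p - t) <= 40 for p, t in zip(preds, targets)]
    return sum(hits) / len(hits)

# Hypothetical spot-check data: replace with real model outputs and listings.
predicted = [129.0, 45.0, 310.0]  # e.g. parsed from model generations
actual = [119.0, 52.0, 280.0]     # e.g. ground-truth prices

print(f"RMSLE:   {rmsle(predicted, actual):.2f}")
print(f"Hit@$40: {hit_at_40(predicted, actual):.1%}")
```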
## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("Recompense/Midas-pricer")
model = AutoModelForCausalLM.from_pretrained(
    "Recompense/Midas-pricer",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Prepare prompt
product_desc = "How much does this cost to the nearest dollar?\n\nSamsung 7kg top-load washing machine with digital inverter motor"
prompt = f"{product_desc}\n\nPrice is $"

# Tokenize on the model's device and generate a short numeric completion
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
generated = model.generate(**inputs, max_new_tokens=3, num_return_sequences=1)

# Decode only the newly generated tokens (the price digits)
new_tokens = generated[0][inputs["input_ids"].shape[1]:]
price_text = tokenizer.decode(new_tokens, skip_special_tokens=True)
print(f"Estimated price: ${price_text}")
```

## Training Details

### Training Data

- Dataset: Recompense/amazon-appliances-lite-data
- Train/validation/test split: 80/10/10

### Training Procedure

#### Training Hyperparameters

- Fine-tuning framework: PyTorch + Hugging Face Accelerate
- Precision: bf16 mixed precision
- Batch size: 1 sequence
- Learning rate: 1e-5 with linear warmup (10% of total steps)
- Optimizer: AdamW

## Evaluation

### Testing Data, Factors & Metrics

- Test set: held-out 10% of listings (≈5,000 examples)
- Metrics:
  - RMSLE (Root Mean Squared Logarithmic Error)
  - Hit@$40: percentage of predictions within ±$40 of the true price

| Metric   | Value |
| -------- | ----- |
| RMSLE    | 0.61  |
| Hit@\$40 | 85.2% |

#### Summary

The model achieves an RMSLE of 0.61, indicating good alignment between predicted and actual prices on a log scale, and estimates the price to within $40 of the true value in over 85% of test cases. This performance is competitive for rapid prototyping in price-sensitive applications.

## Environmental Impact

Approximate fine-tuning emissions, estimated with the ML CO₂ Impact calculator:

- Hardware: Tesla T4
- Duration: 2 hours (0.06 epoch)
- Cloud provider: Google Cloud, region US-Central
- Estimated CO₂ emitted: 6 kg CO₂e

## Technical Specifications

### Model Architecture

- Base model: Llama-3.1-8B-Instruct (8 billion parameters)
- Objective: autoregressive language modeling with instruction tuning

### Compute Infrastructure

- Hardware: 4× Tesla T4 GPUs
- Software:
  - PyTorch 2.x
  - transformers 5.x
  - accelerate 1.x
  - bitsandbytes (optional, for 8-bit quantized inference)

## Glossary

- RMSLE (Root Mean Squared Logarithmic Error): the square root of the mean squared difference between log-transformed predictions and targets; less sensitive to large absolute errors on expensive items.
- Hit@$40: fraction of predictions whose absolute error is ≤ $40.

## Model Card Authors

Damola Jimoh (Recompense)
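## Optional: 8-bit Quantized Inference

The compute stack above lists bitsandbytes for optional 8-bit inference. Below is a minimal loading sketch, not a tested configuration: it assumes `bitsandbytes` and `accelerate` are installed and a CUDA GPU is available; memory savings and any accuracy impact depend on your hardware.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 8-bit quantization config (requires bitsandbytes and a CUDA GPU)
quant_config = BitsAndBytesConfig(load_in_8bit=True)

tokenizer = AutoTokenizer.from_pretrained("Recompense/Midas-pricer")
model = AutoModelForCausalLM.from_pretrained(
    "Recompense/Midas-pricer",
    quantization_config=quant_config,
    device_map="auto",  # let accelerate place layers across available devices
)
```

Generation then proceeds exactly as in the bf16 example above; only the loading call changes.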