5-Point Sentiment Classifier (Longformer) by spacesedan
A fine-tuned Longformer model for 5-point sentiment classification, optimized to analyze long-form user-generated content like Reddit posts. It assigns each input one of five labels on a spectrum from very negative to very positive.
Labels
| Label Index | Sentiment |
|---|---|
| 0 | Very Negative |
| 1 | Negative |
| 2 | Neutral |
| 3 | Positive |
| 4 | Very Positive |
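
The label table can be turned into a lookup for decoding model outputs. The sketch below is a minimal, dependency-free illustration: it assumes the classifier head emits one logit per label in the index order shown above, and the helper names (`softmax`, `decode`) and the sample logits are made up for the example.

```python
import math

# Label indices and names as listed in the table above.
ID2LABEL = {
    0: "Very Negative",
    1: "Negative",
    2: "Neutral",
    3: "Positive",
    4: "Very Positive",
}


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable form)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits):
    """Return (label name, probability) for the highest-scoring class."""
    if len(logits) != len(ID2LABEL):
        raise ValueError("expected one logit per label")
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits, for illustration only (not real model output).
label, prob = decode([-1.2, 0.3, 0.1, 2.4, 0.9])
print(label, round(prob, 3))
```

Keeping the probability alongside the label makes it possible to flag low-confidence predictions instead of treating every output as equally certain.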
Datasets Used
This model was fine-tuned on a combination of three public datasets:
- GoEmotions by Google: Converted 27 emotion labels into a 5-point sentiment scale.
- Amazon Reviews (fine-grained): Large-scale consumer review dataset with fine-grained sentiment labels.
- Kaggle: Twitter and Reddit Sentimental Analysis Dataset: Adapted into a 3-class and eventually 5-class format for compatibility.
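
The GoEmotions conversion can be pictured as a lookup from emotion names to sentiment indices. The actual mapping used for this model is not published here, so the table below is purely hypothetical: the keys are real GoEmotions label names, but the scores assigned to them, the subset chosen, and the names `EMOTION_TO_SCORE` and `to_five_point` are assumptions made for illustration.

```python
# Hypothetical mapping, for illustration only. This is NOT the table used
# to train the model; only the emotion names come from GoEmotions.
EMOTION_TO_SCORE = {
    "anger": 0, "disgust": 0, "grief": 0,
    "annoyance": 1, "disappointment": 1, "sadness": 1,
    "neutral": 2, "confusion": 2, "curiosity": 2,
    "approval": 3, "optimism": 3, "relief": 3,
    "joy": 4, "love": 4, "gratitude": 4,
}
NEUTRAL = 2


def to_five_point(emotions):
    """Collapse one example's emotion labels into a 0-4 sentiment index.

    GoEmotions is multi-label, so the per-emotion scores are averaged and
    rounded half up. Unmapped or missing labels fall back to neutral.
    """
    scores = [EMOTION_TO_SCORE.get(e, NEUTRAL) for e in emotions]
    if not scores:
        return NEUTRAL
    return int(sum(scores) / len(scores) + 0.5)


print(to_five_point(["joy", "gratitude"]))
print(to_five_point(["anger", "joy"]))
```

In this sketch an example tagged with both `anger` and `joy` collapses to the neutral index, which is one way a heuristic mapping can blur distinct inputs into the same class.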
Training Configuration
| Setting | Value |
|---|---|
| Model Base | Longformer (4096) |
| Max Sequence Length | 1024 tokens |
| Epochs | 4 |
| Batch Size | 8 |
| Gradient Accumulation | 4 |
| Optimizer | adamw_torch |
| Learning Rate | 2e-5 |
| Scheduler | Linear |
| Mixed Precision | FP16 |
| Weight Decay | 0.01 |
| Warmup Proportion | 0.1 |
| Early Stopping | patience=5, threshold=0.01 |
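
For readers who want to reproduce a similar setup, the table can be written as keyword arguments for `transformers.TrainingArguments`. This is a sketch, not the author's training script: it assumes the listed batch size is per device, that training ran on a single device, and that the early-stopping values correspond to `EarlyStoppingCallback`. The helper functions and the dataset size in the example are hypothetical.

```python
import math

# Values from the table above, keyed by the matching
# transformers.TrainingArguments parameter names.
TRAINING_ARGS = {
    "num_train_epochs": 4,
    "per_device_train_batch_size": 8,  # assumption: "Batch Size" is per device
    "gradient_accumulation_steps": 4,
    "optim": "adamw_torch",
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "fp16": True,
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
}
MAX_SEQ_LENGTH = 1024  # passed to the tokenizer, not to TrainingArguments

# Assumed to map onto transformers.EarlyStoppingCallback keyword arguments.
EARLY_STOPPING = {"early_stopping_patience": 5, "early_stopping_threshold": 0.01}


def effective_batch_size(args, num_devices=1):
    """Examples consumed per optimizer step."""
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )


def schedule(args, num_examples, num_devices=1):
    """Approximate (total optimizer steps, warmup steps) for a dataset size."""
    per_epoch = math.ceil(num_examples / effective_batch_size(args, num_devices))
    total = per_epoch * args["num_train_epochs"]
    warmup = math.ceil(total * args["warmup_ratio"])
    return total, warmup


print(effective_batch_size(TRAINING_ARGS))
print(schedule(TRAINING_ARGS, num_examples=32_000))  # hypothetical dataset size

# With transformers installed, the dictionary could be used as:
#   from transformers import TrainingArguments
#   training_args = TrainingArguments(output_dir="out", **TRAINING_ARGS)
```

Under these assumptions each optimizer step sees 8 x 4 = 32 examples, and the first 10% of steps are spent on linear warmup before the learning rate decays.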
Final Evaluation Metrics
| Metric | Score |
|---|---|
| Accuracy | 0.671 |
| F1 Score (Macro) | 0.642 |
| F1 Score (Weighted) | 0.673 |
| Precision (Macro) | 0.642 |
| Recall (Macro) | 0.646 |
| Loss | 0.882 |
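
The table reports both macro and weighted F1. The difference is in how per-class scores are averaged: macro F1 gives every class equal weight, while weighted F1 weights each class by its number of true examples. The sketch below shows both on a tiny made-up example; the function names and the toy labels are illustrative and unrelated to the model's evaluation set.

```python
def per_class_f1(y_true, y_pred, num_classes=5):
    """Return (F1 per class, support per class) for integer labels."""
    f1s, support = [], []
    for c in range(num_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
        support.append(tp + fn)
    return f1s, support


def macro_f1(y_true, y_pred, num_classes=5):
    """Unweighted mean of per-class F1."""
    f1s, _ = per_class_f1(y_true, y_pred, num_classes)
    return sum(f1s) / num_classes


def weighted_f1(y_true, y_pred, num_classes=5):
    """Mean of per-class F1, weighted by each class's true-example count."""
    f1s, support = per_class_f1(y_true, y_pred, num_classes)
    total = sum(support)
    return sum(f * s for f, s in zip(f1s, support)) / total if total else 0.0


# Toy two-class data: class 0 is three times as frequent as class 1.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
print(round(macro_f1(y_true, y_pred, num_classes=2), 3))
print(round(weighted_f1(y_true, y_pred, num_classes=2), 3))
```

A weighted F1 above the macro F1, as in the table, is consistent with the more frequent classes being predicted somewhat better than the rarer ones.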
Use Cases
- Tracking sentiment across Reddit posts, especially for news or trending headlines.
- Analyzing long-form product reviews.
- Building a sentiment dashboard for user forums or blogs.
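
As a rough sketch of the dashboard use case, the snippet below labels a batch of posts with the Hugging Face `pipeline` API and summarizes the share of each label. The model ID is taken from the model tree listed on this page. The function names, the truncation length, and the exact label strings the pipeline returns are assumptions; depending on the hosted config, labels may come back as names or as generic `LABEL_0` to `LABEL_4` identifiers.

```python
from collections import Counter

MODEL_ID = "spacesedan/reddit-sentiment-analysis-longformer"


def classify(texts, max_length=1024):
    """Label each text with the hosted model.

    Requires `transformers` and `torch`, and downloads the weights on
    first use, so the import is kept inside the function.
    """
    from transformers import pipeline

    clf = pipeline("text-classification", model=MODEL_ID)
    results = clf(list(texts), truncation=True, max_length=max_length)
    return [r["label"] for r in results]


def label_shares(labels):
    """Fraction of posts per label, e.g. for one dashboard time bucket."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Hand-written labels standing in for classify() output.
print(label_shares(["Positive", "Positive", "Neutral", "Negative"]))
```

In practice `classify` would be called on each batch of new posts and `label_shares` on the returned labels, grouped by whatever time window or topic the dashboard displays.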
Limitations
- The model is trained on English text only.
- Sentiment can be subjective, especially in edge cases (e.g., sarcasm or dark humor).
- The 5-class mapping from GoEmotions is heuristic and might introduce some overlap between classes.
Acknowledgements
Special thanks to the original dataset creators:
- Google (GoEmotions)
- Yassir Acharki (Amazon Reviews fine-grained)
- Charan Gowda et al. (Kaggle Reddit/Twitter Sentiment Dataset)
License
This model is available under the same license as the base model (Longformer) and is intended for research and educational use.
Created and maintained by spacesedan
Model tree for spacesedan/reddit-sentiment-analysis-longformer
Base model
spacesedan/autotrain-iz7hp-zi6ki