20-Emotion Text Classification Model
A deep learning model for fine-grained emotion classification that can detect 20 distinct emotions in text with high accuracy.
Model Description
This model combines Word2Vec embeddings with a neural network classifier to identify emotions in text. Unlike simple sentiment analysis (positive/negative), it distinguishes between 20 emotional states, giving a more nuanced reading of emotional content.
Architecture
Embedding Layer: Word2Vec (100-dimensional vectors)
- Trained on 79,595 emotion-labeled sentences
- Optimized model size: 2.9MB
Classifier: Feedforward Neural Network
- Input: Sentence embeddings (mean-pooled word vectors)
- Hidden layers with dropout for regularization
- Output: 20-class softmax classification
- Model size: 111KB
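The layer list above can be read as a forward pass. The following is a minimal sketch in plain Python, not the published network: the hidden width (64), the single hidden layer, the ReLU activation, and the random weights are all assumptions, since this card states only the 100-dimensional input, the use of dropout, and the 20-class softmax output.

```python
import math
import random

EMBED_DIM = 100    # from this card: 100-dimensional Word2Vec vectors
NUM_CLASSES = 20   # from this card: 20-class softmax output
HIDDEN_DIM = 64    # assumption: the card does not state the hidden sizes


def dense(x, weights, bias):
    """Fully connected layer: y[j] = sum_i x[i] * weights[i][j] + bias[j]."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def relu(x):
    """ReLU activation (assumed; the card does not name the activation)."""
    return [max(0.0, v) for v in x]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random placeholder weights; the real weights live in best_model.keras."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out


def forward(sentence_vector, params):
    """Inference-time pass. Dropout is only active in training, so it is skipped."""
    (w1, b1), (w2, b2) = params
    hidden = relu(dense(sentence_vector, w1, b1))
    return softmax(dense(hidden, w2, b2))


rng = random.Random(0)
params = [
    init_layer(EMBED_DIM, HIDDEN_DIM, rng),
    init_layer(HIDDEN_DIM, NUM_CLASSES, rng),
]
sentence_vector = [rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)]
probs = forward(sentence_vector, params)
print(len(probs), "class probabilities summing to", round(sum(probs), 6))
```

The output is one probability per emotion; the classifier's prediction is the index with the largest value.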
20 Emotions Detected
The model can classify text into these 20 emotions:
- Happiness
- Sadness
- Fear
- Anger
- Disgust
- Surprise
- Love
- Excitement
- Embarrassment
- Loneliness
- Anxiety
- Frustration
- Guilt
- Disappointment
- Jealousy
- Gratitude
- Pride
- Relief
- Hope
- Confusion
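The classifier outputs class indices rather than names, so the order of the labels matters. scikit-learn's LabelEncoder stores its classes in sorted order. As a rough sketch, assuming the encoder was fit on the lowercase names that appear in the output example (an assumption, not something this card states), the index mapping would look like this:

```python
# Assumption: labels are the lowercase forms of the 20 names listed above.
EMOTIONS = [
    "happiness", "sadness", "fear", "anger", "disgust", "surprise", "love",
    "excitement", "embarrassment", "loneliness", "anxiety", "frustration",
    "guilt", "disappointment", "jealousy", "gratitude", "pride", "relief",
    "hope", "confusion",
]

# LabelEncoder keeps classes sorted, so index i maps to the i-th sorted label.
index_to_label = sorted(EMOTIONS)
label_to_index = {label: i for i, label in enumerate(index_to_label)}

print(index_to_label[0])             # anger
print(label_to_index["excitement"])  # 6
```

With the real files, `label_encoder.classes_` gives the authoritative order and should be preferred over this reconstruction.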
Usage
Installation
pip install tensorflow gensim nltk numpy scikit-learn huggingface_hub
Quick Start
import pickle
import re

import nltk
import numpy as np
from gensim.models import Word2Vec
from huggingface_hub import hf_hub_download
from nltk.tokenize import word_tokenize
from tensorflow import keras

# word_tokenize needs the Punkt tokenizer data. Newer NLTK releases look for
# "punkt_tab" and older ones for "punkt", so both are requested here.
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)

REPO_ID = "shreyaspulle98/emotion-classifier-20-emotions"

# Download model files
model_path = hf_hub_download(repo_id=REPO_ID, filename="best_model.keras")
w2v_path = hf_hub_download(repo_id=REPO_ID, filename="word2vec_optimized.model")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.pkl")

# Load models
w2v_model = Word2Vec.load(w2v_path)
classifier = keras.models.load_model(model_path, compile=False)
with open(encoder_path, 'rb') as f:
    label_encoder = pickle.load(f)


# Preprocessing function
def preprocess_text(text):
    text = str(text).lower()
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)  # URLs
    text = re.sub(r'@\w+', '', text)                     # mentions
    text = re.sub(r'#\w+', '', text)                     # hashtags
    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
    text = text.translate(str.maketrans('', '', harmful_punctuation))
    text = re.sub(r'\s+', ' ', text).strip()
    return text


# Sentence to vector: mean of the vectors of all in-vocabulary tokens
def sentence_to_vector(sentence, w2v_model):
    words = word_tokenize(sentence.lower())
    word_vectors = [w2v_model.wv[word] for word in words if word in w2v_model.wv]
    if len(word_vectors) == 0:
        # No known words: fall back to a zero vector
        return np.zeros(w2v_model.wv.vector_size)
    return np.mean(word_vectors, axis=0)


# Prediction function
def predict_emotion(text, top_k=5):
    # Preprocess
    cleaned = preprocess_text(text)
    # Convert to vector
    vector = sentence_to_vector(cleaned, w2v_model).reshape(1, -1)
    # Predict
    probs = classifier.predict(vector, verbose=0)[0]
    # Get top-k predictions, highest probability first
    top_indices = np.argsort(probs)[-top_k:][::-1]
    results = []
    for idx in top_indices:
        emotion = label_encoder.inverse_transform([idx])[0]
        confidence = float(probs[idx])
        results.append({
            'emotion': emotion,
            'confidence': confidence,
            'percentage': round(confidence * 100, 1)
        })
    return results


# Example usage
text = "I'm so excited about this amazing opportunity!"
predictions = predict_emotion(text)
print(f"Text: {text}")
print("\nTop predictions:")
for pred in predictions:
    print(f" {pred['emotion']}: {pred['percentage']}%")
Output Example
Text: I'm so excited about this amazing opportunity!
Top predictions:
excitement: 78.5%
happiness: 12.3%
hope: 4.2%
gratitude: 2.8%
pride: 2.2%
Training Data
This model was trained on the emotion-dataset-20-emotions dataset, which contains:
- 79,595 sentences with emotion labels
- 20 balanced emotion categories
- Synthetically generated using advanced language models
- Cleaned and preprocessed text
Performance
The model achieves strong performance across all 20 emotion categories:
- Training accuracy: ~95% (measured on the training set; held-out accuracy is not reported here)
- Balanced emotion distribution: Each emotion well-represented
- Fast inference: < 100ms per prediction on CPU
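The latency figure above depends on hardware, so it is worth measuring locally. The helper below is a minimal sketch using only the standard library; `median_latency_ms` and `fake_predict` are hypothetical names introduced for illustration, and `fake_predict` is a stand-in to be swapped for the `predict_emotion` function from the Quick Start.

```python
import statistics
import time


def median_latency_ms(predict_fn, texts, warmup=3):
    """Return the median wall-clock time per call in milliseconds."""
    # Warm-up calls are excluded so one-time setup costs do not skew the result.
    for text in texts[:warmup]:
        predict_fn(text)
    timings = []
    for text in texts:
        start = time.perf_counter()
        predict_fn(text)
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def fake_predict(text):
    """Stand-in for predict_emotion so the sketch runs without the model files."""
    return len(text.split())


latency = median_latency_ms(fake_predict, ["I am thrilled about this"] * 50)
print(f"median latency: {latency:.4f} ms")
```

The median is used rather than the mean because a single slow call, such as the first one after loading, can dominate an average.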
Strengths
- Can distinguish between subtle emotional differences (e.g., anxiety vs. fear, disappointment vs. sadness)
- Works well with everyday conversational language
- Lightweight and fast inference
- No external API calls required
Limitations
- English only: Currently supports only English text
- Synthetic training data: May not capture all real-world emotional expressions
- Single emotion: Assigns one primary emotion, though it also returns confidence scores for the others
- Context-dependent: May struggle with sarcasm, irony, or culturally-specific expressions
- Short text optimized: Best performance on sentence-level text (10-50 words)
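Because confidence scores are returned for every class, the single-emotion limitation can be softened by keeping all emotions above a cutoff. This is a small sketch over the result format returned by the Quick Start's `predict_emotion`; the function name `emotions_above` and the 0.10 threshold are assumptions chosen for illustration, not tuned values.

```python
def emotions_above(predictions, threshold=0.10):
    """Keep every emotion whose confidence meets the (hypothetical) threshold."""
    return [p["emotion"] for p in predictions if p["confidence"] >= threshold]


# Values taken from the Output Example in this card.
example = [
    {"emotion": "excitement", "confidence": 0.785},
    {"emotion": "happiness", "confidence": 0.123},
    {"emotion": "hope", "confidence": 0.042},
    {"emotion": "gratitude", "confidence": 0.028},
    {"emotion": "pride", "confidence": 0.022},
]

print(emotions_above(example))        # ['excitement', 'happiness']
print(emotions_above(example, 0.90))  # []
```

An empty result can be treated as "no confident prediction", which is a reasonable outcome for sarcastic or very short inputs.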
Use Cases
This model is ideal for:
- Mental Health Apps: Detect emotional states in user journals or messages
- Customer Service: Analyze customer sentiment in support tickets and feedback
- Social Media Analytics: Understand emotional tone of posts and comments
- Chatbots: Enable emotion-aware conversational AI
- Content Moderation: Flag content expressing concerning emotions
- UX Research: Analyze user feedback and reviews for emotional insights
- Educational Tools: Help students identify and understand emotions in text
Model Files
- best_model.keras (111KB): Neural network classifier
- word2vec_optimized.model (2.9MB): Word2Vec embeddings
- label_encoder.pkl (457B): Label encoder for emotion categories
Technical Details
Preprocessing Pipeline
- Lowercase conversion
- URL removal
- Mention/hashtag removal
- Special character removal
- Whitespace normalization
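To make the effect of each step visible, the sketch below applies the same regular expressions and punctuation set as the Quick Start's `preprocess_text`, one step at a time, and records the intermediate text. The `trace_preprocessing` helper and the sample sentence are illustrative additions.

```python
import re

# Same character set as in preprocess_text.
HARMFUL_PUNCTUATION = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'

STEPS = [
    ("lowercase", lambda t: t.lower()),
    ("remove URLs", lambda t: re.sub(r'http\S+|www\S+|https\S+', '', t)),
    ("remove mentions", lambda t: re.sub(r'@\w+', '', t)),
    ("remove hashtags", lambda t: re.sub(r'#\w+', '', t)),
    ("remove special characters",
     lambda t: t.translate(str.maketrans('', '', HARMFUL_PUNCTUATION))),
    ("normalize whitespace", lambda t: re.sub(r'\s+', ' ', t).strip()),
]


def trace_preprocessing(text):
    """Apply each step in order and return (step name, text after step) pairs."""
    trace = []
    for name, step in STEPS:
        text = step(text)
        trace.append((name, text))
    return trace


sample = "Thanks @sam!! So HAPPY :) see https://example.com #blessed"
for name, result in trace_preprocessing(sample):
    print(f"{name:>26}: {result!r}")
# The final line shows 'thanks !! so happy see'
```

Note that the removed set leaves `!`, `?`, `.`, `,` and `'` in place, so exclamation marks and contractions still reach the tokenizer.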
Inference Pipeline
- Text preprocessing
- Tokenization (NLTK word_tokenize)
- Word vector lookup
- Mean pooling of word vectors
- Neural network classification
- Softmax probability output
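The lookup and mean-pooling steps can be illustrated with a toy vocabulary. This is a plain-Python sketch of the same logic as `sentence_to_vector` in the Quick Start, using made-up 3-dimensional vectors in place of the real 100-dimensional Word2Vec vectors.

```python
def mean_pool(tokens, vectors, dim):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    found = [vectors[token] for token in tokens if token in vectors]
    if not found:
        # No token is in the vocabulary: fall back to a zero vector.
        return [0.0] * dim
    # zip(*found) iterates column-wise, giving one average per dimension.
    return [sum(column) / len(found) for column in zip(*found)]


# Toy vectors for illustration only.
toy_vectors = {
    "so": [0.0, 2.0, 1.0],
    "happy": [1.0, 0.0, 3.0],
}

print(mean_pool(["so", "happy", "zzz"], toy_vectors, 3))  # [0.5, 1.0, 2.0]
print(mean_pool(["zzz"], toy_vectors, 3))                 # [0.0, 0.0, 0.0]
```

Mean pooling discards word order, which is one reason sarcasm and negation are hard for this kind of model.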
Dependencies
tensorflow>=2.13.0
gensim>=4.3.0
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.3.0
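A quick way to confirm that an environment meets these minimums is to compare installed versions with the standard library's `importlib.metadata`. The sketch below uses a deliberately naive version parser (the helper names are illustrative); for pre-release or unusual version strings, the `packaging` library is the more robust choice.

```python
import re
from importlib import metadata

# Minimum versions copied from the list above.
MINIMUMS = {
    "tensorflow": "2.13.0",
    "gensim": "4.3.0",
    "nltk": "3.8.0",
    "numpy": "1.24.0",
    "scikit-learn": "1.3.0",
}


def version_tuple(version):
    """Naive parse: leading integer of each of the first three dotted parts."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad so "2.13" compares equal to "2.13.0"
    return tuple(parts)


def check_dependencies(minimums):
    """Return a status string per package: 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if version_tuple(installed) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = f"too old ({installed} < {minimum})"
    return report


for package, status in check_dependencies(MINIMUMS).items():
    print(f"{package}: {status}")
```

Comparing integer tuples rather than raw strings matters here: as strings, "2.9.1" sorts after "2.13.0", which would wrongly pass an outdated TensorFlow.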
Ethical Considerations
Responsible Use
- This model should complement, not replace, human judgment in sensitive applications
- Emotion detection has limitations and may not always be accurate
- Consider privacy implications when analyzing personal communications
- Be aware of potential biases in synthetic training data
Not Recommended For
- Clinical mental health diagnosis
- Legal or law enforcement decisions
- Employment decisions
- Automated content removal without human review
Bias Considerations
- The model was trained on synthetically generated data, which may not represent all demographic groups equally
- Emotional expression varies across cultures, age groups, and contexts
- The model may perform differently on various writing styles and dialects
Citation
If you use this model in your research or applications, please cite:
@misc{emotion_classifier_20_2025,
  author = {Shreyas Pulle},
  title = {20-Emotion Text Classification Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions}
}
Dataset
The training dataset is available at: shreyaspulle98/emotion-dataset-20-emotions
License
This model is released under the MIT License. You are free to use, modify, and distribute this model for commercial and non-commercial purposes.
Contact
- HuggingFace: @shreyaspulle98
- Model Repository: emotion-classifier-20-emotions
Acknowledgments
- Training data generated using DeepInfra API
- Built with TensorFlow/Keras and Gensim
- Inspired by advances in emotion AI and affective computing
Try it out! Test the model with your own text and explore the 20 emotions it can detect.