20-Emotion Text Classification Model

A deep learning model for fine-grained emotion classification that detects 20 distinct emotions in text, with a reported training accuracy of about 95%.

Model Description

This model combines Word2Vec embeddings with a feedforward neural network classifier to identify emotions in text. Unlike simple sentiment analysis (positive/negative), it distinguishes between 20 emotional states, giving a more nuanced view of emotional content.

Architecture

Embedding Layer: Word2Vec (100-dimensional vectors)
- Trained on 79,595 emotion-labeled sentences
- Optimized model size: 2.9MB
Classifier: Feedforward Neural Network
- Input: Sentence embeddings (mean-pooled word vectors)
- Hidden layers with dropout for regularization
- Output: 20-class softmax classification
- Model size: 111KB
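
The mean-pooling step that turns word vectors into a single sentence embedding can be illustrated with a minimal sketch. The `toy_vectors` table, its three-dimensional values, and the `mean_pool` helper are assumptions made for illustration; the released model uses 100-dimensional Word2Vec vectors.

```python
# Toy stand-in for the Word2Vec lookup table (hypothetical values, 3 dimensions
# instead of the model's 100).
toy_vectors = {
    "so": [0.0, 1.0, 0.0],
    "excited": [1.0, 0.0, 2.0],
    "today": [2.0, 2.0, 1.0],
}


def mean_pool(tokens, vectors, dim=3):
    """Average the vectors of known tokens; return zeros if none are known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return [0.0] * dim
    # zip(*known) groups the values by dimension so each one can be averaged.
    return [sum(column) / len(known) for column in zip(*known)]


# "unknownword" is out of vocabulary and is skipped.
sentence_vector = mean_pool(["so", "excited", "today", "unknownword"], toy_vectors)
print(sentence_vector)  # [1.0, 1.0, 1.0]
```

Out-of-vocabulary tokens do not contribute to the average, so a sentence made up entirely of unknown words collapses to the zero vector.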

20 Emotions Detected

The model can classify text into these 20 emotions:

Happiness
Sadness
Fear
Anger
Disgust
Surprise
Love
Excitement
Embarrassment
Loneliness
Anxiety
Frustration
Guilt
Disappointment
Jealousy
Gratitude
Pride
Relief
Hope
Confusion

Usage

Installation

pip install tensorflow gensim nltk numpy scikit-learn huggingface_hub

Quick Start

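import numpy as np
import nltk
from tensorflow import keras
from gensim.models import Word2Vec
from nltk.tokenize import word_tokenize
import pickle
import re
from huggingface_hub import hf_hub_download

# word_tokenize needs the NLTK tokenizer data ('punkt_tab' on newer NLTK releases)
nltk.download('punkt', quiet=True)
nltk.download('punkt_tab', quiet=True)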

# Download model files
model_path = hf_hub_download(repo_id="shreyaspulle98/emotion-classifier-20-emotions",
                              filename="best_model.keras")
w2v_path = hf_hub_download(repo_id="shreyaspulle98/emotion-classifier-20-emotions",
                           filename="word2vec_optimized.model")
encoder_path = hf_hub_download(repo_id="shreyaspulle98/emotion-classifier-20-emotions",
                               filename="label_encoder.pkl")

# Load models
w2v_model = Word2Vec.load(w2v_path)
classifier = keras.models.load_model(model_path, compile=False)
with open(encoder_path, 'rb') as f:
    label_encoder = pickle.load(f)

# Preprocessing function
def preprocess_text(text):
    text = str(text).lower()
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)
    text = re.sub(r'@\w+', '', text)
    text = re.sub(r'#\w+', '', text)
    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
    text = text.translate(str.maketrans('', '', harmful_punctuation))
    text = re.sub(r'\s+', ' ', text).strip()
    return text

# Sentence to vector
def sentence_to_vector(sentence, w2v_model):
    words = word_tokenize(sentence.lower())
    word_vectors = [w2v_model.wv[word] for word in words if word in w2v_model.wv]
    if len(word_vectors) == 0:
        return np.zeros(w2v_model.wv.vector_size)
    return np.mean(word_vectors, axis=0)

# Prediction function
def predict_emotion(text, top_k=5):
    # Preprocess
    cleaned = preprocess_text(text)

    # Convert to vector
    vector = sentence_to_vector(cleaned, w2v_model).reshape(1, -1)

    # Predict
    probs = classifier.predict(vector, verbose=0)[0]

    # Get top-k predictions
    top_indices = np.argsort(probs)[-top_k:][::-1]

    results = []
    for idx in top_indices:
        emotion = label_encoder.inverse_transform([idx])[0]
        confidence = float(probs[idx])
        results.append({
            'emotion': emotion,
            'confidence': confidence,
            'percentage': round(confidence * 100, 1)
        })

    return results

# Example usage
text = "I'm so excited about this amazing opportunity!"
predictions = predict_emotion(text)

print(f"Text: {text}")
print("\nTop predictions:")
for pred in predictions:
    print(f"  {pred['emotion']}: {pred['percentage']}%")

Output Example

Text: I'm so excited about this amazing opportunity!

Top predictions:
  excitement: 78.5%
  happiness: 12.3%
  hope: 4.2%
  gratitude: 2.8%
  pride: 2.2%

Training Data

This model was trained on the emotion-dataset-20-emotions dataset, which contains:

79,595 sentences with emotion labels
20 balanced emotion categories
Synthetically generated using advanced language models
Cleaned and preprocessed text

Performance

The model achieves strong performance across all 20 emotion categories:

Training accuracy: ~95% (measured on the training data; no held-out accuracy is reported here)
Balanced emotion distribution: Each emotion well-represented
Fast inference: < 100ms per prediction on CPU
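
Latency depends on hardware, so it is worth measuring locally. A minimal timing sketch follows; `average_latency_ms` is a hypothetical helper, and the lambda is only a stand-in for `predict_emotion` so that the snippet runs without the model files.

```python
import time


def average_latency_ms(predict_fn, texts, repeats=3):
    """Return the mean wall-clock time per call in milliseconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        for text in texts:
            predict_fn(text)
    elapsed = time.perf_counter() - start
    return elapsed * 1000 / (repeats * len(texts))


# Stand-in predictor; replace it with predict_emotion once the model is loaded.
latency = average_latency_ms(lambda text: text.lower(), ["I feel great", "So tired"])
print(f"{latency:.4f} ms per prediction")
```

With the real model, the first call is often slower than later ones, so a warm-up call before timing may give a more representative number.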

Strengths

Can distinguish between subtle emotional differences (e.g., anxiety vs. fear, disappointment vs. sadness)
Works well with everyday conversational language
Lightweight and fast inference
No external API calls required

Limitations

English only: Currently supports only English text
Synthetic training data: May not capture all real-world emotional expressions
Single emotion: Assigns one primary emotion (though it also provides confidence scores for the others)
Context-dependent: May struggle with sarcasm, irony, or culturally-specific expressions
Short text optimized: Best performance on sentence-level text (10-50 words)
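
Because the classifier returns a probability for every class, the single-emotion limitation can be softened by keeping every emotion above a cutoff. The sketch below assumes the list-of-dicts format returned by `predict_emotion` in the Quick Start; the `emotions_above` helper and the 0.10 threshold are hypothetical choices, not part of the released model.

```python
def emotions_above(predictions, threshold=0.10):
    """Return the emotions whose confidence is at least `threshold`."""
    return [p["emotion"] for p in predictions if p["confidence"] >= threshold]


# Values taken from the Output Example section, expressed as fractions.
sample = [
    {"emotion": "excitement", "confidence": 0.785},
    {"emotion": "happiness", "confidence": 0.123},
    {"emotion": "hope", "confidence": 0.042},
]
print(emotions_above(sample))  # ['excitement', 'happiness']
```

A suitable threshold depends on the application and would need to be tuned on real examples.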

Use Cases

This model is suited to applications such as:

Mental Health Apps: Detect emotional states in user journals or messages
Customer Service: Analyze customer sentiment in support tickets and feedback
Social Media Analytics: Understand emotional tone of posts and comments
Chatbots: Enable emotion-aware conversational AI
Content Moderation: Flag content expressing concerning emotions
UX Research: Analyze user feedback and reviews for emotional insights
Educational Tools: Help students identify and understand emotions in text

Model Files

best_model.keras (111KB): Neural network classifier
word2vec_optimized.model (2.9MB): Word2Vec embeddings
label_encoder.pkl (457B): Label encoder for emotion categories

Technical Details

Preprocessing Pipeline

Lowercase conversion
URL removal
Mention/hashtag removal
Special character removal
Whitespace normalization
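
The effect of each stage can be traced on a sample string. This sketch reuses the regular expressions and punctuation set from `preprocess_text` in the Quick Start; the sample input and the `trace_preprocessing` helper are made up for illustration.

```python
import re

# Same punctuation set as preprocess_text in the Quick Start.
HARMFUL_PUNCTUATION = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'


def trace_preprocessing(text):
    """Apply the five stages in order and return every intermediate string."""
    stages = []
    text = str(text).lower()
    stages.append(("lowercase", text))
    text = re.sub(r"http\S+|www\S+|https\S+", "", text)
    stages.append(("urls", text))
    text = re.sub(r"@\w+", "", text)
    text = re.sub(r"#\w+", "", text)
    stages.append(("mentions/hashtags", text))
    text = text.translate(str.maketrans("", "", HARMFUL_PUNCTUATION))
    stages.append(("special characters", text))
    text = re.sub(r"\s+", " ", text).strip()
    stages.append(("whitespace", text))
    return stages


sample = "Thanks @Sam! See https://example.com #blessed (really)"
for name, value in trace_preprocessing(sample):
    print(f"{name}: {value!r}")
# The final stage prints: whitespace: 'thanks ! see really'
```

Note that `!`, `?`, `.`, `,` and apostrophes are not in the removed set, so they survive preprocessing and reach the tokenizer.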

Inference Pipeline

Text preprocessing
Tokenization (NLTK word_tokenize)
Word vector lookup
Mean pooling of word vectors
Neural network classification
Softmax probability output
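
The last two stages, softmax and ranking of the resulting probabilities, can be sketched without the model files. The raw scores and the three-label list below are invented for illustration; the released classifier produces 20 probabilities from its own softmax output layer.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, labels, k=2):
    """Return the k most probable (label, probability) pairs, best first."""
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


labels = ["anger", "hope", "relief"]  # hypothetical subset of the 20 classes
probs = softmax([0.5, 2.0, 1.0])      # made-up raw scores
for label, p in top_k(probs, labels):
    print(f"{label}: {p:.3f}")
```

This is the same ranking that `np.argsort(probs)[-top_k:][::-1]` performs in the Quick Start, written in plain Python.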

Dependencies

tensorflow>=2.13.0
gensim>=4.3.0
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.3.0
huggingface_hub (used by the Quick Start to download the model files)

Ethical Considerations

Responsible Use

This model should complement, not replace, human judgment in sensitive applications
Emotion detection has limitations and may not always be accurate
Consider privacy implications when analyzing personal communications
Be aware of potential biases in synthetic training data

Not Recommended For

Clinical mental health diagnosis
Legal or law enforcement decisions
Employment decisions
Automated content removal without human review

Bias Considerations

The model was trained on synthetically generated data, which may not represent all demographic groups equally
Emotional expression varies across cultures, age groups, and contexts
The model may perform differently on various writing styles and dialects

Citation

If you use this model in your research or applications, please cite:

@misc{emotion_classifier_20_2025,
  author = {Shreyas Pulle},
  title = {20-Emotion Text Classification Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions}
}

Dataset

The training dataset is available at: shreyaspulle98/emotion-dataset-20-emotions

License

This model is released under the MIT License. You are free to use, modify, and distribute this model for commercial and non-commercial purposes.

Contact

HuggingFace: @shreyaspulle98
Model Repository: emotion-classifier-20-emotions

Acknowledgments

Training data generated using DeepInfra API
Built with TensorFlow/Keras and Gensim
Inspired by advances in emotion AI and affective computing

Try it out! Test the model with your own text and explore the 20 emotions it can detect.
