20-Emotion Text Classification Model

A deep learning model for fine-grained emotion classification that detects 20 distinct emotions in text.

Model Description

This model combines Word2Vec word embeddings with a feedforward neural network classifier to identify emotions in text. Unlike simple sentiment analysis (positive/negative), it distinguishes between 20 emotional states, giving a more nuanced picture of emotional content.

Architecture

  • Embedding Layer: Word2Vec (100-dimensional vectors)

    • Trained on 79,595 emotion-labeled sentences
    • Optimized model size: 2.9MB
  • Classifier: Feedforward Neural Network

    • Input: Sentence embeddings (mean-pooled word vectors)
    • Hidden layers with dropout for regularization
    • Output: 20-class softmax classification
    • Model size: 111KB
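The card does not state the hidden-layer widths or activations, so the following is only a shape-level sketch in plain NumPy. The widths (128, 64), the ReLU activation and the random weights are hypothetical. Only the 100-dimensional input and the 20-way softmax output come from the description above. Dropout is left out because it is only active during training.

```python
import numpy as np

EMBED_DIM = 100      # Word2Vec vector size stated in the card
NUM_CLASSES = 20     # one output per emotion
HIDDEN = (128, 64)   # hypothetical hidden-layer widths (not given in the card)

rng = np.random.default_rng(0)

def init_layers(sizes):
    # Random weights only make the shapes concrete; the real weights live in best_model.keras
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def relu(x):
    return np.maximum(x, 0.0)  # hypothetical activation choice

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def forward(x, layers):
    # Inference-time pass: dense + ReLU for hidden layers, dense + softmax at the output
    for W, b in layers[:-1]:
        x = relu(x @ W + b)
    W, b = layers[-1]
    return softmax(x @ W + b)

layers = init_layers([EMBED_DIM, *HIDDEN, NUM_CLASSES])
sentence_vec = rng.normal(size=(1, EMBED_DIM))  # stand-in for one mean-pooled sentence vector
probs = forward(sentence_vec, layers)
print(probs.shape)  # (1, 20)
```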

20 Emotions Detected

The model can classify text into these 20 emotions:

  1. Happiness
  2. Sadness
  3. Fear
  4. Anger
  5. Disgust
  6. Surprise
  7. Love
  8. Excitement
  9. Embarrassment
  10. Loneliness
  11. Anxiety
  12. Frustration
  13. Guilt
  14. Disappointment
  15. Jealousy
  16. Gratitude
  17. Pride
  18. Relief
  19. Hope
  20. Confusion

Usage

Installation

pip install tensorflow gensim nltk numpy scikit-learn huggingface_hub

Quick Start

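import pickle
import re

import nltk
import numpy as np
from gensim.models import Word2Vec
from huggingface_hub import hf_hub_download
from nltk.tokenize import word_tokenize
from tensorflow import keras

# word_tokenize needs NLTK's Punkt tokenizer data ("punkt_tab" on newer NLTK releases, "punkt" on older ones)
nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)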

# Download model files from the Hugging Face Hub (cached locally after the first run)
REPO_ID = "shreyaspulle98/emotion-classifier-20-emotions"
model_path = hf_hub_download(repo_id=REPO_ID, filename="best_model.keras")
w2v_path = hf_hub_download(repo_id=REPO_ID, filename="word2vec_optimized.model")
encoder_path = hf_hub_download(repo_id=REPO_ID, filename="label_encoder.pkl")

# Load models
w2v_model = Word2Vec.load(w2v_path)
classifier = keras.models.load_model(model_path, compile=False)
with open(encoder_path, 'rb') as f:
    label_encoder = pickle.load(f)

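# Preprocessing: lowercase, strip URLs, @mentions and #hashtags, drop special characters
def preprocess_text(text):
    text = str(text).lower()
    text = re.sub(r'http\S+|www\S+|https\S+', '', text)  # URLs
    text = re.sub(r'@\w+', '', text)                      # @mentions
    text = re.sub(r'#\w+', '', text)                      # #hashtags (the tag word is removed too)
    # Apostrophes and . , ! ? are not in this set, so they are kept
    harmful_punctuation = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'
    text = text.translate(str.maketrans('', '', harmful_punctuation))
    text = re.sub(r'\s+', ' ', text).strip()              # collapse whitespace
    return text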

# Sentence to vector: mean of the Word2Vec vectors of in-vocabulary tokens
def sentence_to_vector(sentence, w2v_model):
    words = word_tokenize(sentence.lower())
    word_vectors = [w2v_model.wv[word] for word in words if word in w2v_model.wv]
    if len(word_vectors) == 0:
        # No known words: fall back to an all-zero vector
        return np.zeros(w2v_model.wv.vector_size)
    return np.mean(word_vectors, axis=0)

# Prediction function
def predict_emotion(text, top_k=5):
    # Preprocess
    cleaned = preprocess_text(text)

    # Convert to vector
    vector = sentence_to_vector(cleaned, w2v_model).reshape(1, -1)

    # Predict
    probs = classifier.predict(vector, verbose=0)[0]

    # Get top-k predictions
    top_indices = np.argsort(probs)[-top_k:][::-1]

    results = []
    for idx in top_indices:
        emotion = label_encoder.inverse_transform([idx])[0]
        confidence = float(probs[idx])
        results.append({
            'emotion': emotion,
            'confidence': confidence,
            'percentage': round(confidence * 100, 1)
        })

    return results

# Example usage
text = "I'm so excited about this amazing opportunity!"
predictions = predict_emotion(text)

print(f"Text: {text}")
print("\nTop predictions:")
for pred in predictions:
    print(f"  {pred['emotion']}: {pred['percentage']}%")

Output Example

Text: I'm so excited about this amazing opportunity!

Top predictions:
  excitement: 78.5%
  happiness: 12.3%
  hope: 4.2%
  gratitude: 2.8%
  pride: 2.2%
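To analyze many texts at once (e.g., support tickets or reviews), you can call classifier.predict once on a stacked matrix instead of once per text. This avoids the per-call overhead of predict_emotion. The sketch below is minimal: predict_batch is a hypothetical helper, and vectorize is any function that turns a text into a sentence vector.

```python
import numpy as np

def predict_batch(texts, vectorize, classifier, class_names, top_k=1):
    """Return the top_k (emotion, probability) pairs for each text using one predict() call."""
    # One sentence vector per text, stacked into an (n_texts, embedding_dim) matrix
    X = np.vstack([vectorize(t) for t in texts])
    probs = classifier.predict(X, verbose=0)
    # Column indices sorted by descending probability, truncated to top_k
    top = np.argsort(probs, axis=1)[:, ::-1][:, :top_k]
    return [
        [(class_names[i], float(row[i])) for i in idx]
        for idx, row in zip(top, probs)
    ]
```

With the Quick Start objects loaded, a call would look like predict_batch(texts, lambda t: sentence_to_vector(preprocess_text(t), w2v_model), classifier, label_encoder.classes_).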

Training Data

This model was trained on the shreyaspulle98/emotion-dataset-20-emotions dataset, which contains:

  • 79,595 sentences with emotion labels
  • 20 balanced emotion categories
  • Synthetic text generated with large language models (via the DeepInfra API)
  • Cleaned and preprocessed text

Performance

Reported performance:

  • Training accuracy: ~95% (measured on the training data; accuracy on unseen text may be lower)
  • Balanced training data: each emotion is well represented
  • Fast inference: under 100 ms per prediction on CPU
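Training accuracy does not show how each of the 20 emotions fares on unseen text. Below is a minimal NumPy sketch for scoring held-out predictions. It assumes integer labels y_true (e.g., from label_encoder.transform) and predictions y_pred (e.g., np.argmax of classifier.predict output). evaluate is a hypothetical helper. Its macro-F1 averages over all num_classes, so it assumes every emotion appears in the evaluation set.

```python
import numpy as np

def evaluate(y_true, y_pred, num_classes=20):
    """Accuracy, per-class recall and macro-F1 from integer class indices."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Confusion matrix: rows = true emotion, columns = predicted emotion
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    support = cm.sum(axis=1)    # true examples per class
    predicted = cm.sum(axis=0)  # predictions per class
    zeros = np.zeros(num_classes)
    recall = np.divide(tp, support, out=zeros.copy(), where=support > 0)
    precision = np.divide(tp, predicted, out=zeros.copy(), where=predicted > 0)
    denom = precision + recall
    f1 = np.divide(2 * precision * recall, denom, out=zeros.copy(), where=denom > 0)
    accuracy = tp.sum() / len(y_true)
    return accuracy, recall, f1.mean()
```

Per-class recall shows directly whether the closely related pairs (e.g., anxiety vs. fear) are being confused.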

Strengths

  • Treats closely related emotions (e.g., anxiety vs. fear, disappointment vs. sadness) as separate classes
  • Works well with everyday conversational language
  • Lightweight and fast inference
  • No external API calls at inference time once the model files are downloaded

Limitations

  • English only: Currently supports only English text
  • Synthetic training data: May not capture all real-world emotional expressions
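  • Single-label output: Predicts one primary emotion; the softmax confidence scores across all 20 emotions sum to 100%, so they are not independent per-emotion scores
  • Context-dependent: May struggle with sarcasm, irony, negation, or culturally specific expressions, partly because averaging word vectors discards word order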
  • Short text optimized: Best performance on sentence-level text (10-50 words)
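Because the sentence vector is an average, it does not depend on word order, and out-of-vocabulary tokens are silently dropped. The toy example below uses a hypothetical 3-dimensional vector table (not the real Word2Vec model) with the same averaging logic as sentence_to_vector. It shows that "i am happy not sad" and "i am sad not happy" map to the same vector.

```python
import numpy as np

# Hypothetical 3-dimensional word vectors, for illustration only
toy_wv = {
    "i": np.array([0.1, 0.0, 0.2]),
    "am": np.array([0.0, 0.1, 0.0]),
    "not": np.array([-0.5, 0.2, 0.1]),
    "happy": np.array([0.9, 0.4, -0.1]),
    "sad": np.array([-0.8, -0.3, 0.2]),
}

def mean_pool(tokens, wv, dim=3):
    # Same idea as sentence_to_vector: average in-vocabulary vectors, skip unknown words
    vecs = [wv[t] for t in tokens if t in wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

a = mean_pool(["i", "am", "happy", "not", "sad"], toy_wv)
b = mean_pool(["i", "am", "sad", "not", "happy"], toy_wv)          # opposite meaning
c = mean_pool(["i", "am", "sooo", "happy", "not", "sad"], toy_wv)  # "sooo" is out of vocabulary
print(np.allclose(a, b), np.allclose(a, c))  # True True
```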

Use Cases

Potential applications include:

  • Mental Health Apps: Detect emotional states in user journals or messages
  • Customer Service: Analyze customer sentiment in support tickets and feedback
  • Social Media Analytics: Understand emotional tone of posts and comments
  • Chatbots: Enable emotion-aware conversational AI
  • Content Moderation: Flag content expressing concerning emotions
  • UX Research: Analyze user feedback and reviews for emotional insights
  • Educational Tools: Help students identify and understand emotions in text

Model Files

  • best_model.keras (111KB): Neural network classifier
  • word2vec_optimized.model (2.9MB): Word2Vec embeddings
  • label_encoder.pkl (457B): Label encoder for emotion categories
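scikit-learn's LabelEncoder stores its classes in sorted order (classes_). Output index i of the classifier therefore corresponds to the i-th emotion name alphabetically, not to the numbered list above. The sketch below reproduces that mapping in plain Python. It assumes the encoder was fit on the 20 lowercase names seen in the output example. Printing label_encoder.classes_ after loading is the reliable way to confirm.

```python
# Assumption: the encoder was fit on these 20 lowercase emotion names
EMOTIONS = [
    "happiness", "sadness", "fear", "anger", "disgust",
    "surprise", "love", "excitement", "embarrassment", "loneliness",
    "anxiety", "frustration", "guilt", "disappointment", "jealousy",
    "gratitude", "pride", "relief", "hope", "confusion",
]

# LabelEncoder keeps classes_ sorted, so index i is the i-th name alphabetically
classes = sorted(set(EMOTIONS))
name_to_index = {name: i for i, name in enumerate(classes)}
print(classes[:3])  # ['anger', 'anxiety', 'confusion']
```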

Technical Details

Preprocessing Pipeline

  1. Lowercase conversion
  2. URL removal
  3. Mention/hashtag removal
  4. Special character removal (apostrophes and . , ! ? are kept)
  5. Whitespace normalization
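To see what each step does, the sketch below reruns the regular expressions and punctuation set from the Quick Start's preprocess_text as named stages and prints the intermediate text. The STAGES list and the trace helper are illustrative names, not part of the released files. The URL pattern drops the redundant https\S+ alternative, which http\S+ already covers.

```python
import re

# Punctuation removed by preprocess_text; ' . , ! ? are not in this set
HARMFUL_PUNCTUATION = '"#$%&()*+-/:;<=>@[\\]^_`{|}~'

STAGES = [
    ("lowercase", lambda t: t.lower()),
    ("remove URLs", lambda t: re.sub(r"http\S+|www\S+", "", t)),
    ("remove mentions/hashtags", lambda t: re.sub(r"#\w+", "", re.sub(r"@\w+", "", t))),
    ("remove special characters", lambda t: t.translate(str.maketrans("", "", HARMFUL_PUNCTUATION))),
    ("normalize whitespace", lambda t: re.sub(r"\s+", " ", t).strip()),
]

def trace(text):
    # Apply each stage in order and print the intermediate result
    for name, fn in STAGES:
        text = fn(text)
        print(f"{name:<26} -> {text!r}")
    return text

result = trace("Check out https://example.com @bob #fun I'm (really) happy!!")
# final result: "check out i'm really happy!!"
```

The trace shows that the hashtag word itself is removed, while the apostrophe and exclamation marks survive into tokenization.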

Inference Pipeline

  1. Text preprocessing
  2. Tokenization (NLTK word_tokenize)
  3. Word vector lookup
  4. Mean pooling of word vectors
  5. Neural network classification
  6. Softmax probability output

Dependencies

tensorflow>=2.13.0
gensim>=4.3.0
nltk>=3.8.0
numpy>=1.24.0
scikit-learn>=1.3.0
huggingface_hub

Ethical Considerations

Responsible Use

  • This model should complement, not replace, human judgment in sensitive applications
  • Emotion detection has limitations and may not always be accurate
  • Consider privacy implications when analyzing personal communications
  • Be aware of potential biases in synthetic training data

Not Recommended For

  • Clinical mental health diagnosis
  • Legal or law enforcement decisions
  • Employment decisions
  • Automated content removal without human review

Bias Considerations

  • The model was trained on synthetically generated data, which may not represent all demographic groups equally
  • Emotional expression varies across cultures, age groups, and contexts
  • The model may perform differently on various writing styles and dialects

Citation

If you use this model in your research or applications, please cite:

@misc{emotion_classifier_20_2025,
  author = {Shreyas Pulle},
  title = {20-Emotion Text Classification Model},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/shreyaspulle98/emotion-classifier-20-emotions}
}

Dataset

The training dataset is available at: shreyaspulle98/emotion-dataset-20-emotions

License

This model is released under the MIT License. You are free to use, modify, and distribute this model for commercial and non-commercial purposes.

Contact

Acknowledgments

  • Training data generated using DeepInfra API
  • Built with TensorFlow/Keras and Gensim
  • Inspired by advances in emotion AI and affective computing

Try it out! Test the model with your own text and explore the 20 emotions it can detect.
