PSewmuthu committed on
Commit 7515162 · 1 Parent(s): 32d4156

Update default readme file

Files changed (1)
  1. README.md +259 -0
README.md CHANGED
 
---
language: en
license: apache-2.0
library_name: tensorflow
tags:
  - tensorflow
  - keras
  - tflite
  - emotion-recognition
  - transformer
  - lstm
  - mediapipe
  - computer-vision
  - deep-learning
  - facial-expression
  - affective-computing
  - sequential-data
model-index:
  - name: emotion_landmark_lstm_model
    results:
      - task:
          type: sequence-classification
        dataset:
          type: dataset
          name: Optimized 478-Point 3D Facial Landmark Dataset
        metrics:
          - name: accuracy
            type: float
            value: 0.7289
inference: "Supports TensorFlow and TensorFlow Lite real-time inference"
---

# 🎥 Emotion Sequence Transformer (TensorFlow) – Mediapipe 478 Landmarks (Seq256)

**Version:** v1.0
**Framework:** TensorFlow 2.x
**Optimized format:** TensorFlow Lite
**Input:** 478 Mediapipe Face Mesh landmarks per frame (up to 300 frames)
**Output:** 6-class emotion prediction (`Angry`, `Disgust`, `Fear`, `Happy`, `Neutral`, `Sad`)

---

## 🧠 Model Overview

The **Emotion Sequence Transformer** is a deep learning model built with TensorFlow for recognizing **human emotions** from continuous **video clips**.
It uses **478 Mediapipe facial landmarks per frame** to capture spatiotemporal patterns of facial movement.
The model predicts one of six basic emotions by analyzing both facial geometry and temporal variation across sequences of up to **300 frames**.

The model is suitable for **real-time video-based emotion detection**, **affective computing**, **human-computer interaction**, and **emotion-aware AI systems**.

---

## 📊 Dataset

This model was trained on the **[Optimized 478-Point 3D Facial Landmark Dataset](https://www.kaggle.com/datasets/psewmuthu/optimized-video-facial-landmarks)**, a dataset derived from the **Video Emotion Dataset** and optimized for emotion recognition using Mediapipe's 3D face mesh landmarks.

Each sample in the dataset includes (a minimal shaping sketch follows this list):

- Up to **300 frames per clip**
- **478 facial landmarks per frame**
- Corresponding **emotion label**

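To make the expected input shape concrete, here is a minimal sketch that pads or truncates a clip of per-frame landmark vectors to the fixed `(300, 478)` shape. The `clip` array and the zero-padding scheme are illustrative assumptions; the released preprocessing notebook is the authoritative reference for how clips were actually prepared.

```python
import numpy as np

MAX_FRAMES = 300    # maximum clip length used by the model
NUM_FEATURES = 478  # landmark features per frame, as stated above

def to_model_input(clip: np.ndarray) -> np.ndarray:
    """Pad or truncate a (num_frames, 478) clip to the fixed (300, 478) shape."""
    clip = np.asarray(clip, dtype=np.float32)
    padded = np.zeros((MAX_FRAMES, NUM_FEATURES), dtype=np.float32)
    n = min(len(clip), MAX_FRAMES)
    padded[:n] = clip[:n]
    return padded

# Example: a hypothetical 120-frame clip becomes a (300, 478) array.
example = to_model_input(np.random.rand(120, NUM_FEATURES))
print(example.shape)  # (300, 478)
```
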
---

## 🧩 Model Architecture

The architecture is based on a **Transformer encoder** that processes sequences of facial landmarks.

**Pipeline:**

1. Input normalization using the precomputed global mean and standard deviation
2. Sequence embedding with positional encodings
3. Transformer encoder blocks that capture temporal and spatial dependencies
4. Dense layers for emotion classification (6 output neurons with softmax)

**Core Components** (a minimal Keras sketch follows this list):

- Transformer encoder layers (multi-head self-attention)
- Layer normalization and dropout
- Dense classification head

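For readers who want to see how the pieces above fit together, here is a minimal Keras sketch of this kind of encoder over a `(300, 478)` input with a 6-way softmax head. The layer sizes, head and block counts, learned positional embedding, and pooling choice are illustrative assumptions, not the released model's exact configuration. The released artifacts also reference a BiLSTM variant (the notebook and citation are named around `EmotionFormer-BiLSTM`), so the actual model may combine recurrent layers with the encoder blocks; this sketch covers only the Transformer pipeline described above.

```python
import tensorflow as tf
from tensorflow.keras import layers

SEQ_LEN, NUM_FEATURES, NUM_CLASSES = 300, 478, 6
D_MODEL, NUM_HEADS, FF_DIM, NUM_BLOCKS = 128, 4, 256, 2  # illustrative sizes


class PositionalEmbedding(layers.Layer):
    """Adds a learned positional embedding to each frame (illustrative choice)."""

    def __init__(self, seq_len, d_model, **kwargs):
        super().__init__(**kwargs)
        self.seq_len = seq_len
        self.pos_emb = layers.Embedding(input_dim=seq_len, output_dim=d_model)

    def call(self, x):
        positions = tf.range(start=0, limit=self.seq_len, delta=1)
        return x + self.pos_emb(positions)


def encoder_block(x):
    # Multi-head self-attention with a residual connection and layer normalization
    attn = layers.MultiHeadAttention(num_heads=NUM_HEADS, key_dim=D_MODEL // NUM_HEADS)(x, x)
    x = layers.LayerNormalization()(x + layers.Dropout(0.1)(attn))
    # Position-wise feed-forward network, again with residual + norm
    ff = layers.Dense(FF_DIM, activation="relu")(x)
    ff = layers.Dense(D_MODEL)(ff)
    return layers.LayerNormalization()(x + layers.Dropout(0.1)(ff))


inputs = layers.Input(shape=(SEQ_LEN, NUM_FEATURES))  # normalized landmark sequences
x = layers.Dense(D_MODEL)(inputs)                     # project each frame to the model dimension
x = PositionalEmbedding(SEQ_LEN, D_MODEL)(x)          # inject temporal position information
for _ in range(NUM_BLOCKS):
    x = encoder_block(x)
x = layers.GlobalAveragePooling1D()(x)                # pool over time before classification
outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.summary()
```
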
---

## 📈 Performance

| Metric                | Value      |
| --------------------- | ---------- |
| **Test Accuracy**     | 0.7289     |
| **Test Loss**         | 1.1336     |
| **Macro F1-Score**    | 0.73       |
| **Weighted F1-Score** | 0.73       |
| **Max Clip Length**   | 300 frames |
| **Input Shape**       | (300, 478) |

### 🧾 Classification Report

| Emotion              | Precision | Recall | F1-score            | Support |
| -------------------- | --------- | ------ | ------------------- | ------- |
| Angry                | 0.75      | 0.73   | 0.74                | 139     |
| Disgust              | 0.88      | 0.70   | 0.78                | 128     |
| Fear                 | 0.52      | 0.60   | 0.55                | 114     |
| Happy                | 0.88      | 0.97   | 0.92                | 129     |
| Neutral              | 0.66      | 0.79   | 0.72                | 101     |
| Sad                  | 0.70      | 0.58   | 0.64                | 134     |
| **Overall Accuracy** | **0.73**  |        | **Macro Avg: 0.73** | 745     |

---

## 📊 Visualizations

### 🔹 Training Accuracy and Loss

![Accuracy and Loss](images/Accuracies_and_Losses.png)

### 🔹 Confusion Matrix

![Confusion Matrix](images/Confusion_Matrix.png)

### 🔹 ROC Curves (Per Class)

![ROC Curves](images/ROC_Curves.png)

---

## 📂 Repository Structure

```
TF-Emotion-Sequence-Transformer/
├── tf_emotion_sequence_transformer_mp478_seq256.h5
├── tf_emotion_sequence_transformer_mp478_seq256_optimized.tflite
├── tf_emotion-sequence-transformer-bilstm-usage.ipynb
├── assets/
│   ├── global_mean.npy
│   ├── global_std.npy
│   ├── label_encoder.pkl
│   └── metadata.json
└── README.md
```

### File Descriptions

The table below describes each file; a short snippet for inspecting the bundled assets follows it.

| File                                                             | Description                                                                                           |
| ---------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------- |
| `tf_emotion_sequence_transformer_mp478_seq256.h5`                | Main TensorFlow model trained on 478 landmarks (300 frames max).                                      |
| `tf_emotion_sequence_transformer_mp478_seq256_optimized.tflite`  | Optimized TensorFlow Lite version for deployment (mobile, edge).                                      |
| `tf_emotion-sequence-transformer-bilstm-usage.ipynb`             | Example notebook demonstrating how to use the model for emotion prediction from Mediapipe landmarks.  |
| `assets/global_mean.npy`                                         | Precomputed global mean for normalization.                                                            |
| `assets/global_std.npy`                                          | Precomputed global standard deviation for normalization.                                              |
| `assets/label_encoder.pkl`                                       | Encoder mapping integer labels to emotion names.                                                      |
| `assets/metadata.json`                                           | Model metadata and configuration details.                                                             |

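As a quick sanity check, the snippet below loads the bundled metadata and label encoder and prints what they contain. It assumes the pickled encoder is a scikit-learn `LabelEncoder` (consistent with the `inverse_transform` calls in the usage examples below); the keys inside `metadata.json` are not documented here, so the sketch simply prints the whole file.

```python
import json
import joblib

# Inspect the model metadata shipped with the repository
with open("assets/metadata.json") as f:
    metadata = json.load(f)
print(json.dumps(metadata, indent=2))

# List the emotion classes known to the label encoder
label_encoder = joblib.load("assets/label_encoder.pkl")
print(list(label_encoder.classes_))  # assumes a scikit-learn LabelEncoder
```
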
---

## 🚀 Example Usage

### 🔸 TensorFlow (.h5) Model

```python
import numpy as np
import tensorflow as tf
import joblib

# Load model
model = tf.keras.models.load_model("tf_emotion_sequence_transformer_mp478_seq256.h5")

# Load assets
mean = np.load("assets/global_mean.npy")
std = np.load("assets/global_std.npy")
label_encoder = joblib.load("assets/label_encoder.pkl")

# Preprocess input
input_seq = np.load("example_input.npy")  # shape: (300, 478)
input_seq = (input_seq - mean) / std
input_seq = np.expand_dims(input_seq, axis=0)

# Predict
pred = model.predict(input_seq)
emotion = label_encoder.inverse_transform([np.argmax(pred)])[0]
print("Predicted Emotion:", emotion)
```

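Continuing from the example above, `pred` is a `(1, 6)` array of softmax scores, so the full class distribution can be inspected as well as the top label. This assumes the pickled encoder is a scikit-learn `LabelEncoder` exposing `classes_`:

```python
# Map each class name to its softmax score (continues from the snippet above)
for name, score in zip(label_encoder.classes_, pred[0]):
    print(f"{name}: {score:.3f}")
```
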
---

### 🔸 TensorFlow Lite (Optimized) Model

```python
import numpy as np
import tensorflow as tf
import joblib

# Load TFLite model
interpreter = tf.lite.Interpreter(model_path="tf_emotion_sequence_transformer_mp478_seq256_optimized.tflite")
interpreter.allocate_tensors()

# Get input and output tensors
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Load preprocessing assets
mean = np.load("assets/global_mean.npy")
std = np.load("assets/global_std.npy")
label_encoder = joblib.load("assets/label_encoder.pkl")

# Prepare input
input_seq = np.load("example_input.npy")  # shape: (300, 478)
input_seq = (input_seq - mean) / std
input_seq = np.expand_dims(input_seq, axis=0).astype(np.float32)

# Inference
interpreter.set_tensor(input_details[0]['index'], input_seq)
interpreter.invoke()
pred = interpreter.get_tensor(output_details[0]['index'])

# Decode emotion
emotion = label_encoder.inverse_transform([np.argmax(pred)])[0]
print("Predicted Emotion:", emotion)
```

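If you retrain or fine-tune the Keras model and want your own optimized `.tflite` file, a conversion along the following lines is one common route. The exact converter settings used to produce the released file are not documented here, so the flags below are assumptions; attention-based models sometimes also need TensorFlow Select ops enabled, as shown.

```python
import tensorflow as tf

# Convert the Keras model to an optimized TFLite flatbuffer (illustrative settings)
model = tf.keras.models.load_model("tf_emotion_sequence_transformer_mp478_seq256.h5")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # default size/latency optimization
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # fall back to TF ops if a layer is unsupported
]
tflite_model = converter.convert()

# Hypothetical output path for the converted model
with open("emotion_model_optimized.tflite", "wb") as f:
    f.write(tflite_model)
```
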
---

## 🔖 Version Information

**Version:** v1.0
**Date:** November 2025
**Author:** [P.S. Abewickrama Singhe](https://www.kaggle.com/psewmuthu)
**Framework:** TensorFlow 2.x
**Exported Models:** `.h5`, `.tflite`
**Landmarks per frame:** 478
**Max frames per clip:** 300

---

## 🏷️ Tags

`tensorflow` • `emotion-recognition` • `mediapipe` • `transformer` • `sequence-model` • `facial-landmarks` • `video-analysis` • `tflite` • `human-emotion-ai` • `affective-computing` • `computer-vision` • `deep-learning`

---

## 📚 Citation

If you use this model in your research, please cite it as:

```bibtex
@misc{pasindu_sewmuthu_abewickrama_singhe_2025,
  author    = {Pasindu Sewmuthu Abewickrama Singhe},
  title     = {EmotionFormer-BiLSTM (Revision f329517)},
  year      = 2025,
  url       = {https://huggingface.co/PSewmuthu/EmotionFormer-BiLSTM},
  doi       = {10.57967/hf/6899},
  publisher = {Hugging Face}
}
```

---

## 🪪 License

This model is released under the **Apache 2.0 License** – free for academic and commercial use with attribution.

---