Update README.md
README.md
CHANGED
|
@@ -9,35 +9,22 @@ pinned: false
|
|
| 9 |
|
| 10 |
# BRIGHTER Dataset Organization
|
| 11 |
|
| 12 |
-
## BRIdging the Gap in Human-Annotated Textual Emotion Recognition
|
| 13 |
|
| 14 |
Welcome to the official Hugging Face organization for **BRIGHTER** - a multilingual emotion recognition dataset collection spanning 28 languages from 7 distinct language families.
|
| 15 |
|
| 16 |
-
###
|
| 17 |
-
|
| 18 |
-
BRIGHTER addresses the critical gap in emotion recognition resources for low-resource languages, particularly those spoken in Africa, Asia, and Latin America. Our dataset provides human-annotated emotion labels across diverse linguistic landscapes, enabling more inclusive and representative emotion AI systems.
|
| 19 |
-
|
| 20 |
-
### Key Features
|
| 21 |
|
| 22 |
-
|
| 23 |
-
- **7 Language Families**: Diverse linguistic representation
|
| 24 |
-
- **Human-Annotated**: High-quality annotations for reliable emotion recognition
|
| 25 |
-
- **Research-Ready**: Standardized format for easy integration into ML pipelines
|
| 26 |
|
| 27 |
-
|
| 28 |
|
| 29 |
-
|
| 30 |
-
- Text samples with emotion labels
|
| 31 |
-
- Language-specific preprocessing
|
| 32 |
-
- Train/validation/test splits
|
| 33 |
-
- Detailed documentation
|
| 34 |
|
| 35 |
-
|
| 36 |
|
| 37 |
-
|
| 38 |
-
-
|
| 39 |
-
- **Few-Shot Learning**: Performance improves consistently with increased examples
|
| 40 |
-
- **Prompt Sensitivity**: Different prompt formulations significantly impact model performance
|
| 41 |
|
| 42 |
### Citation
|
| 43 |
|
|
@@ -70,8 +57,4 @@ If you use our datasets, please cite our papers:
|
|
| 70 |
- **Shamsuddeen Hassan Muhammad**: [[email protected]](mailto:[email protected])
|
| 71 |
- **Nedjma Ousidhoum**: [[email protected]](mailto:[email protected])
|
| 72 |
|
| 73 |
-
### Project Website
|
| 74 |
-
|
| 75 |
-
Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)
|
| 76 |
-
|
| 77 |
*Equal contribution by lead authors.
|
|
|
|
| 9 |
|
| 10 |
# BRIGHTER Dataset Organization
|
| 11 |
|
| 12 |
+
## BRIdging the Gap in Human-Annotated Textual Emotion Recognition (BRIGHTER)
|
| 13 |
|
| 14 |
Welcome to the official Hugging Face organization for **BRIGHTER** - a multilingual emotion recognition dataset collection spanning 28 languages from 7 distinct language families.
|
| 15 |
|
| 16 |
+
### Project Website
|
| 17 |
|
| 18 |
+
Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)
|
| 19 |
|
| 20 |
+
## TL;DR
|
| 21 |
|
| 22 |
+
We introduce BRIGHTER: a new emotion recognition dataset collection in 28 languages that originate from 7 distinct language families. Many of these languages are considered low-resource, and are mainly spoken in regions characterised by a limited availability of NLP resources (e.g., Africa, Asia, Latin America).
|
| 23 |
|
| 24 |
+
Our contributions:
|
| 25 |
|
| 26 |
+
- A linguistically diverse multilingual dataset: BRIGHTER consists of nearly 100k emotion-annotated instances in 28 languages, predominantly from Africa, Asia, Eastern Europe, and Latin America. The dataset spans 7 language families and covers a variety of domains, including social media, speeches, news, literature, and reviews. Each instance is multi-labeled with six emotion classes (joy, sadness, anger, fear, surprise, and disgust) plus neutral, and annotated with one of four emotion intensity levels, ranging from 0 to 3.
|
| 27 |
+
- Baseline Evaluation: We provide an initial set of monolingual and crosslingual experiments, benchmarking Large Language Models (LLMs) for multi-label emotion identification and intensity prediction. Our results highlight the performance disparities across languages, showing that LLMs struggle with perceived emotions in text, especially for low-resource languages, and often perform better when prompted in English.
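
As a rough illustration of the labeling scheme described above, the following Python sketch models one instance as a mapping from emotions to intensity levels in the range 0 to 3 and derives its multi-label set. The `Instance` class, its field names, the placeholder text, and the rules that intensity 0 means "not present" and that an instance with no emotions counts as neutral are all assumptions made for illustration; they are not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import Dict, List

# The six emotion classes named in the README.
EMOTIONS = ("joy", "sadness", "anger", "fear", "surprise", "disgust")


@dataclass
class Instance:
    """Hypothetical container for one annotated text (not the real schema)."""

    text: str
    intensity: Dict[str, int]  # emotion -> intensity level in {0, 1, 2, 3}

    def __post_init__(self) -> None:
        # Reject unknown emotions and out-of-range intensity levels.
        for emotion, level in self.intensity.items():
            if emotion not in EMOTIONS:
                raise ValueError(f"unknown emotion: {emotion!r}")
            if level not in range(4):
                raise ValueError(f"intensity must be 0..3, got {level!r}")

    def labels(self) -> List[str]:
        # Assumption: an emotion is present when its intensity is above 0,
        # and an instance with no present emotion is treated as neutral.
        present = [e for e in EMOTIONS if self.intensity.get(e, 0) > 0]
        return present or ["neutral"]


# Placeholder examples; the texts and values are invented for illustration.
example = Instance("placeholder text", {"joy": 2, "surprise": 1})
neutral_example = Instance("placeholder text", {})
```

Calling `example.labels()` yields the emotions with non-zero intensity in the order of `EMOTIONS`, while `neutral_example.labels()` falls back to a single neutral label.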
| 28 |
|
| 29 |
### Citation
|
| 30 |
|
|
|
|
| 57 |
- **Shamsuddeen Hassan Muhammad**: [[email protected]](mailto:[email protected])
|
| 58 |
- **Nedjma Ousidhoum**: [[email protected]](mailto:[email protected])
|
| 59 |
|
| 60 |
*Equal contribution by lead authors.
|