jpwahle committed on
Commit b847ba8 · verified · 1 Parent(s): df768d6

Update README.md

Files changed (1)
  1. README.md +8 -25
README.md CHANGED
@@ -9,35 +9,22 @@ pinned: false
  # BRIGHTER Dataset Organization

- ## 🌍 BRIdging the Gap in Human-Annotated Textual Emotion Recognition

  Welcome to the official Hugging Face organization for **BRIGHTER** - a multilingual emotion recognition dataset collection spanning 28 languages from 7 distinct language families.

- ### 📊 Overview
-
- BRIGHTER addresses the critical gap in emotion recognition resources for low-resource languages, particularly those spoken in Africa, Asia, and Latin America. Our dataset provides human-annotated emotion labels across diverse linguistic landscapes, enabling more inclusive and representative emotion AI systems.
-
- ### 🎯 Key Features

- - **28 Languages**: Comprehensive coverage including many low-resource languages
- - **7 Language Families**: Diverse linguistic representation
- - **Human-Annotated**: High-quality annotations for reliable emotion recognition
- - **Research-Ready**: Standardized format for easy integration into ML pipelines

- ### 📚 Datasets Available

- Browse our collection of emotion-annotated datasets across multiple languages. Each dataset includes:
- - Text samples with emotion labels
- - Language-specific preprocessing
- - Train/validation/test splits
- - Detailed documentation

- ### 🔬 Research Findings

- Our research demonstrates important insights for multilingual emotion recognition:
- - **Language-Specific Prompting**: Models show varying performance when prompted in English vs. target languages
- - **Few-Shot Learning**: Performance improves consistently with increased examples
- - **Prompt Sensitivity**: Different prompt formulations significantly impact model performance

  ### 📖 Citation
@@ -70,8 +57,4 @@ If you use our datasets, please cite our papers:
  - **Shamsuddeen Hassan Muhammad**: [[email protected]](mailto:[email protected])
  - **Nedjma Ousidhoum**: [[email protected]](mailto:[email protected])

- ### 🌐 Project Website
-
- Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)
-
  *Equal contribution by lead authors.
 
  # BRIGHTER Dataset Organization

+ ## 🌍 BRIdging the Gap in Human-Annotated Textual Emotion Recognition (BRIGHTER)

  Welcome to the official Hugging Face organization for **BRIGHTER** - a multilingual emotion recognition dataset collection spanning 28 languages from 7 distinct language families.

+ ### 🌐 Project Website

+ Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)

+ ## TL;DR

+ We introduce BRIGHTER: a new emotion recognition dataset collection in 28 languages that originate from 7 distinct language families. Many of these languages are considered low-resource and are mainly spoken in regions characterised by a limited availability of NLP resources (e.g., Africa, Asia, Latin America).

+ Our contributions:

+ - A linguistically diverse multilingual dataset: BRIGHTER consists of nearly 100k emotion-annotated instances in 28 languages, predominantly from Africa, Asia, Eastern Europe, and Latin America. The dataset spans 7 language families and covers a variety of domains, including social media, speeches, news, literature, and reviews. Each instance is multi-labeled with six emotion classes (joy, sadness, anger, fear, surprise, and disgust) plus neutral, and annotated with emotion intensity on four levels, ranging from 0 to 3.
+ - Baseline Evaluation: We provide an initial set of monolingual and crosslingual experiments, benchmarking large language models (LLMs) for multi-label emotion identification and intensity prediction. Our results highlight the performance disparities across languages, showing that LLMs struggle to identify perceived emotions in text, especially for low-resource languages, and often perform better when prompted in English.

  ### 📖 Citation
 
  - **Shamsuddeen Hassan Muhammad**: [[email protected]](mailto:[email protected])
  - **Nedjma Ousidhoum**: [[email protected]](mailto:[email protected])

  *Equal contribution by lead authors.