Spaces: brighter-dataset/README


jpwahle committed on Jun 15

Commit

b847ba8


1 Parent(s): df768d6

Update README.md


README.md +8 -25


@@ -9,35 +9,22 @@ pinned: false
 # BRIGHTER Dataset Organization
-## 🌍 BRIdging the Gap in Human-Annotated Textual Emotion Recognition
 Welcome to the official Hugging Face organization for **BRIGHTER** - a multilingual emotion recognition dataset collection spanning 28 languages from 7 distinct language families.
-### 📊 Overview
-BRIGHTER addresses the critical gap in emotion recognition resources for low-resource languages, particularly those spoken in Africa, Asia, and Latin America. Our dataset provides human-annotated emotion labels across diverse linguistic landscapes, enabling more inclusive and representative emotion AI systems.
-### 🎯 Key Features
-- **28 Languages**: Comprehensive coverage including many low-resource languages
-- **7 Language Families**: Diverse linguistic representation
-- **Human-Annotated**: High-quality annotations for reliable emotion recognition
-- **Research-Ready**: Standardized format for easy integration into ML pipelines
-### 📚 Datasets Available
-Browse our collection of emotion-annotated datasets across multiple languages. Each dataset includes:
-- Text samples with emotion labels
-- Language-specific preprocessing
-- Train/validation/test splits
-- Detailed documentation
-### 🔬 Research Findings
-Our research demonstrates important insights for multilingual emotion recognition:
-- **Language-Specific Prompting**: Models show varying performance when prompted in English vs. target languages
-- **Few-Shot Learning**: Performance improves consistently with increased examples
-- **Prompt Sensitivity**: Different prompt formulations significantly impact model performance
 ### 📖 Citation
@@ -70,8 +57,4 @@ If you use our datasets, please cite our papers:
 - **Shamsuddeen Hassan Muhammad**: [[email protected]](mailto:[email protected])
 - **Nedjma Ousidhoum**: [[email protected]](mailto:[email protected])
-### 🌐 Project Website
-Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)
 *Equal contribution by lead authors.

 # BRIGHTER Dataset Organization
+## 🌍 BRIdging the Gap in Human-Annotated Textual Emotion Recognition (BRIGHTER)
 Welcome to the official Hugging Face organization for **BRIGHTER** - a multilingual emotion recognition dataset collection spanning 28 languages from 7 distinct language families.
+### 🌐 Project Website
+Visit our official project page for more information: [https://brighter-dataset.github.io/](https://brighter-dataset.github.io/)
+## TL;DR
+We introduce BRIGHTER: a new emotion recognition dataset collection in 28 languages that originate from 7 distinct language families. Many of these languages are considered low-resource, and are mainly spoken in regions characterised by a limited availability of NLP resources (e.g., Africa, Asia, Latin America).
+Our contributions:
+- A linguistically diverse multilingual dataset: BRIGHTER consists of nearly 100k emotion-annotated instances in 28 languages, predominantly from Africa, Asia, Eastern Europe, and Latin America. The dataset spans 7 language families and covers a variety of domains, including social media, speeches, news, literature, and reviews. Each instance is multi-labeled with six emotion classes (joy, sadness, anger, fear, surprise, and disgust) plus a neutral class, and is annotated with one of four emotion intensity levels, ranging from 0 to 3.
+- Baseline Evaluation: We provide an initial set of monolingual and crosslingual experiments, benchmarking Large Language Models (LLMs) for multi-label emotion identification and intensity prediction. Our results highlight the performance disparities across languages, showing that LLMs struggle with perceived emotions in text, especially for low-resource languages, and often perform better when prompted in English.
 ### 📖 Citation
 - **Shamsuddeen Hassan Muhammad**: [[email protected]](mailto:[email protected])
 - **Nedjma Ousidhoum**: [[email protected]](mailto:[email protected])
 *Equal contribution by lead authors.
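
The labeling scheme described in the TL;DR (six emotion classes, each with an intensity from 0 to 3) can be illustrated with a minimal Python sketch. This is not the dataset's actual schema or loading code: the record layout, the helper names `to_multilabel` and `is_neutral`, and the rule that an instance with no emotion above intensity 0 counts as neutral are all assumptions made for illustration.

```python
from typing import Dict, List

# The six emotion classes named in the README.
EMOTIONS: List[str] = ["joy", "sadness", "anger", "fear", "surprise", "disgust"]
MAX_INTENSITY = 3  # intensities range from 0 to 3 (four levels)


def to_multilabel(intensities: Dict[str, int]) -> Dict[str, int]:
    """Turn per-emotion intensities into binary multi-label flags.

    Hypothetical helper: emotions missing from the input are treated
    as intensity 0, and any intensity above 0 marks the emotion as present.
    """
    labels: Dict[str, int] = {}
    for emotion in EMOTIONS:
        level = intensities.get(emotion, 0)
        if not 0 <= level <= MAX_INTENSITY:
            raise ValueError(f"{emotion}: intensity {level} is outside 0..{MAX_INTENSITY}")
        labels[emotion] = int(level > 0)
    return labels


def is_neutral(intensities: Dict[str, int]) -> bool:
    """Assumption: an instance is neutral when no emotion is present."""
    return not any(to_multilabel(intensities).values())


# Hypothetical instance: moderate joy with slight surprise.
example = {"joy": 2, "surprise": 1}
print(to_multilabel(example))
print(is_neutral(example))
```

Keeping intensities as the primary representation and deriving the binary flags from them means one record can serve both tasks mentioned in the README, multi-label emotion identification and intensity prediction.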