Commit 2d9a5d7 (parent: 27b518a): Update README.md

README.md CHANGED
@@ -1,4 +1,5 @@
 # AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages
+- [GitHub Repository of the Paper](https://github.com/bonaventuredossou/MLM_AL)
 
 This repository contains the model for our paper `AfroLM: A Self-Active Learning-based Multilingual Pretrained Language Model for 23 African Languages`, which will appear at the Third Simple and Efficient Natural Language Processing Workshop at EMNLP 2022.
 
@@ -9,7 +10,7 @@ This repository contains the model for our paper `AfroLM: A Self-Active Learning
 AfroLM has been pretrained from scratch on 23 African Languages: Amharic, Afan Oromo, Bambara, Ghomalá, Éwé, Fon, Hausa, Ìgbò, Kinyarwanda, Lingala, Luganda, Luo, Mooré, Chewa, Naija, Shona, Swahili, Setswana, Twi, Wolof, Xhosa, Yorùbá, and Zulu.
 
 ## Evaluation Results
-AfroLM was evaluated on MasakhaNER1.0 (10 African Languages) and MasakhaNER2.0 (21 African Languages) datasets; on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data efficient because it was pretrained on a dataset 14x+ smaller than its competitors' datasets. Below is the average
+AfroLM was evaluated on the MasakhaNER1.0 (10 African languages) and MasakhaNER2.0 (21 African languages) datasets, as well as on text classification and sentiment analysis. AfroLM outperformed AfriBERTa, mBERT, and XLMR-base, and was very competitive with AfroXLMR. AfroLM is also very data-efficient, having been pretrained on a dataset 14x+ smaller than those of its competitors. Below is the average performance of the various models across the datasets; please consult our paper for language-level performance.
 
 Model | MasakhaNER | MasakhaNER2.0* | Text Classification (Yoruba/Hausa) | Sentiment Analysis (YOSM) | OOD Sentiment Analysis (Twitter -> YOSM) |
 |:---: |:---: |:---: | :---: |:---: | :---: |