EYEDOL committed on
Commit cee8898 · verified · 1 Parent(s): cd7c5a4

Update README.md

Files changed (1)
  1. README.md +130 -65
README.md CHANGED
@@ -20,116 +20,181 @@ metrics:
  pipeline_tag: text-generation
  ---

- # Uploaded model

- - **Developed by:** EYEDOL
- - **License:** apache-2.0
- - **Finetuned from model :** unsloth/llama-3.2-3b-instruct

- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.

- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

- # Model Card: SALAMA LLM

- **Model Name:** SALAMA LLM
- **Developed by:** [Your Team or Organization Name]
- **Model Type:** Large Language Model (LLM)
- **Base Models:** UlizaLlama-7B, Llama 3.2, Google Gemma (2B–9B)
- **Language(s):** Swahili, English
- **License:** Apache 2.0
- **Repository:** [Hugging Face Link Here]

  ---

- ## Overview

- SALAMA LLM is the central **language understanding and generation module** within the **SALAMA Framework** — a scalable, end-to-end **speech-to-speech AI system** for African languages.
- It interprets transcribed speech, performs reasoning, and generates contextually appropriate responses in Swahili and English.

- This model was fine-tuned on Swahili-centric instruction data to enhance fluency, comprehension, and cultural relevance for conversational and task-based applications.

  ---

- ## ✳️ Architecture

- SALAMA LLM builds on top of **UlizaLlama (7B)** and leverages **Parameter-Efficient Fine-Tuning (PEFT)** using **LoRA/QLoRA** for resource-efficient adaptation.
- Training was conducted on a mixture of:
- - Instructional and dialogue datasets in Swahili and English
- - Domain-specific corpora for comprehension, summarization, question answering, and translation

  ---

- ## 🧾 Training Data

- | Dataset | Source | Tokens / Examples | Purpose |
- |----------|---------|------------------|----------|
- | Jacaranda/kiswallama-pretrained | Hugging Face | 321M Swahili tokens | Base pretraining |
- | Google Gemma Swahili Fine-tuning | Internal dataset | 20+ prompt-response pairs | Instruction tuning |
- | Custom Swahili QA corpus | Local compilation | 50K examples | Conversational fine-tuning |

  ---

- ## ⚙️ Training Details

- - **Technique:** QLoRA Fine-tuning
- - **Precision:** 4-bit quantization
- - **Optimizer:** AdamW
- - **Learning Rate:** 2e-5
- - **Batch Size:** 8
- - **Epochs:** 3–5
- - **Hardware:** 1x A100 (24GB)

  ---

- ## 🧠 Capabilities

- - Contextual understanding of Swahili and English queries
- - Instruction following and summarization
- - Question answering and translation
- - Conversational generation
- - Named entity recognition and sentiment analysis

  ---

- ## 📊 Evaluation Metrics

- | Task | Precision | Recall | F1 | BLEU | ROUGE | Accuracy |
- |------|------------|--------|----|------|--------|----------|
- | Question Answering | 0.955 | 0.782 | 0.879 | 0.50 | 0.61 | — |
- | Translation | — | — | — | 0.49 | 0.59 | — |
- | Sentiment Analysis | 0.968 | 0.943 | 0.954 | — | — | 97.9% |
- | Entity Recognition | 0.853 | 0.847 | 0.887 | — | — | — |

  ---

- ## 🚀 Applications

- - Conversational voice assistants for Swahili
- - Educational bots and content summarizers
- - Low-resource multilingual chat systems
- - Research in African LLM adaptation

  ---

- ## 🧩 Limitations

- - Performance declines for code-mixed (Swahili-English) slang
- - May misinterpret rare dialectal expressions
- - Dependent on STT transcription accuracy in full STS pipeline

  ---

- ## 🤝 Citation

- If you use this model, please cite:

- > Adegoke Israel et al. (2025). *SALAMA: Scalable African Language Multimodal AI Framework*. Technical Report.

  ---

- ## 🔗 Related Models

- - [`SALAMA-STT`](https://huggingface.co/yourname/salama-stt) — Swahili Whisper Fine-tuned
- - [`SALAMA-TTS`](https://huggingface.co/yourname/salama-tts) — Swahili VITS-based TTS
+ # 🧠 SALAMA LLM — Swahili Instruction-Tuned Text Generation Model
+
+ **Developer:** DressMatic AI Labs / EYEDOL Research
+ **Authors:** Israel Adegoke et al.
+ **Version:** v1.0
+ **License:** Apache 2.0
+ **Model Type:** Instruction-Tuned Large Language Model
+ **Base Model:** `unsloth/llama-3.2-3b-instruct`
+
+ ---
+
+ ## 🌍 Overview
+
+ **SALAMA LLM** is the **language understanding and generation engine** of the **SALAMA Framework** — a modular Speech-to-Speech (STS) AI pipeline built for African languages.
+ The model is fine-tuned on Swahili instruction datasets to enable natural, culturally relevant responses in text generation, summarization, question answering, and translation.
+
+ It is a step toward bridging the linguistic digital divide, providing high-quality Swahili text generation within an open, scalable framework.
+
+ ---
+
+ ## 🧱 Model Architecture
+
+ SALAMA LLM is based on **Unsloth's optimized Llama-3.2-3B-Instruct**, fine-tuned with **Parameter-Efficient Fine-Tuning (PEFT)** via **LoRA/QLoRA**.
+ The model accepts mixed Swahili-English text inputs while focusing on fluent Swahili generation for both casual and formal domains.
+
+ | Parameter | Value |
+ |------------|--------|
+ | Base Model | `unsloth/llama-3.2-3b-instruct` |
+ | Fine-Tuning | QLoRA / LoRA (PEFT) |
+ | Precision | 4-bit quantization |
+ | Optimizer | AdamW |
+ | Learning Rate | 2e-5 |
+ | Epochs | 3–5 |
+ | Frameworks | Transformers, TRL, PEFT, Unsloth |
+ | Languages | Swahili (sw), English (en) |
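+
+ This card does not ship the authors' training script; the following is a minimal QLoRA fine-tuning sketch assembled from the table above (base model, 4-bit precision, AdamW, learning rate 2e-5, Unsloth/TRL/PEFT stack). The LoRA rank, alpha, target modules, sequence length, batch size, and the `text` field name are illustrative assumptions, and exact `SFTTrainer` argument names vary across TRL versions.
+
+ ```python
+ # Minimal QLoRA fine-tuning sketch -- NOT the authors' exact script.
+ # Values marked "assumption" are not stated in this card.
+ from unsloth import FastLanguageModel
+ from trl import SFTTrainer
+ from transformers import TrainingArguments
+ from datasets import load_dataset
+
+ # Load the 4-bit quantized base model listed in the table above
+ model, tokenizer = FastLanguageModel.from_pretrained(
+     model_name="unsloth/llama-3.2-3b-instruct",
+     max_seq_length=2048,      # assumption
+     load_in_4bit=True,        # 4-bit quantization per the table
+ )
+
+ # Attach LoRA adapters (rank, alpha, and target modules are assumptions)
+ model = FastLanguageModel.get_peft_model(
+     model,
+     r=16,
+     lora_alpha=16,
+     target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
+                     "gate_proj", "up_proj", "down_proj"],
+ )
+
+ # Swahili instruction data (see the Datasets section below)
+ dataset = load_dataset("saillab/alpaca_swahili_taco", split="train")
+
+ trainer = SFTTrainer(
+     model=model,
+     tokenizer=tokenizer,
+     train_dataset=dataset,
+     dataset_text_field="text",   # assumption: adjust to the dataset schema
+     max_seq_length=2048,
+     args=TrainingArguments(
+         per_device_train_batch_size=8,   # assumption
+         learning_rate=2e-5,              # from the table above
+         num_train_epochs=3,              # card reports 3-5 epochs
+         bf16=True,                       # AdamW is the TrainingArguments default optimizer
+         output_dir="salama-llm-qlora",
+     ),
+ )
+ trainer.train()
+ ```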

  ---

+ ## 📚 Datasets
+
+ | Dataset | Description | Purpose |
+ |----------|--------------|----------|
+ | `saillab/alpaca_swahili_taco` | Swahili Alpaca-style instruction-response dataset | Instruction tuning |
+ | `Jacaranda/kiswallama-pretrained` | 321M Swahili tokens, custom tokenizer (20K vocab) | Base Swahili adaptation |
+ | Custom Swahili QA corpus | Curated Q&A and summarization samples | Conversational fine-tuning |
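+
+ Since `saillab/alpaca_swahili_taco` is described above as an Alpaca-style instruction-response dataset, each record can be rendered into the Llama 3.2 chat format before fine-tuning. This is a sketch only: the field names `instruction`, `input`, and `output` follow the usual Alpaca convention and should be checked against the actual dataset schema.
+
+ ```python
+ # Sketch: convert an Alpaca-style Swahili record into a chat-formatted
+ # training string. Field names are assumed from the Alpaca convention.
+ from datasets import load_dataset
+ from transformers import AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3.2-3b-instruct")
+ dataset = load_dataset("saillab/alpaca_swahili_taco", split="train")
+
+ def to_chat_text(example):
+     user_content = example["instruction"]
+     if example.get("input"):
+         user_content += "\n\n" + example["input"]
+     messages = [
+         {"role": "user", "content": user_content},
+         {"role": "assistant", "content": example["output"]},
+     ]
+     # Render with the tokenizer's chat template into one training string
+     return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}
+
+ dataset = dataset.map(to_chat_text)
+ print(dataset[0]["text"][:300])
+ ```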

+ ---
+
+ ## 🧠 Model Capabilities
+
+ - Text generation in **Swahili and English**
+ - Instruction-following, summarization, and dialogue
+ - Question answering and translation (EN ↔ SW)
+ - Sentiment and named-entity recognition
+ - Contextually and culturally aligned text generation

  ---

+ ## 📊 Evaluation Metrics
+
+ | Metric | Score | Description |
+ |---------|-------|-------------|
+ | **BLEU** | 0.49 | Measures fluency and translation accuracy |
+ | **ROUGE-L** | 0.61 | Summarization recall and overlap |
+ | **Accuracy (QA)** | 95.5% | Accuracy on Swahili QA tasks |
+ | **CER** | 0.28 | Character Error Rate |
+ | **F1 (avg)** | 0.90+ | Weighted average across tasks |
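+
+ Scores like the BLEU and ROUGE-L figures above can be computed for your own test sets with the Hugging Face `evaluate` library. The snippet below is illustrative only: the prediction/reference pair is made up, and this is not the authors' evaluation harness. Note that `sacrebleu` reports scores on a 0-100 scale.
+
+ ```python
+ # Sketch: scoring model outputs with BLEU and ROUGE-L via `evaluate`.
+ # The example sentences are illustrative, not from the SALAMA test set.
+ import evaluate
+
+ predictions = ["Elimu ni muhimu kwa maendeleo ya jamii."]
+ references = [["Elimu ni msingi wa maendeleo ya jamii."]]
+
+ bleu = evaluate.load("sacrebleu")
+ rouge = evaluate.load("rouge")
+
+ bleu_score = bleu.compute(predictions=predictions, references=references)
+ rouge_score = rouge.compute(predictions=predictions,
+                             references=[r[0] for r in references])
+
+ print(f"BLEU: {bleu_score['score']:.2f}")       # 0-100 scale
+ print(f"ROUGE-L: {rouge_score['rougeL']:.3f}")  # 0-1 scale
+ ```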

  ---

+ ## ⚙️ Usage (Python Example)
+
+ Below is a quick example of loading and using **SALAMA LLM** for Swahili text generation:
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForCausalLM
+ import torch
+
+ # Load model and tokenizer
+ model_name = "EYEDOL/salama-llm"  # Change to your Hugging Face repo name
+ tokenizer = AutoTokenizer.from_pretrained(model_name)
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.bfloat16,
+     device_map="auto"
+ )
+
+ # Swahili text prompt
+ prompt = "Andika sentensi fupi kuhusu umuhimu wa elimu."
+
+ inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+ outputs = model.generate(
+     **inputs,
+     max_new_tokens=120,
+     temperature=0.7,
+     top_p=0.9,
+     repetition_penalty=1.05
+ )
+
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ **Example Output:**
+
+ > *“Elimu ni msingi wa maendeleo, humwezesha mtu kuelewa dunia na kuboresha maisha yake na jamii kwa ujumla.”*
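+
+ Because the base model is instruction-tuned, wrapping the prompt in the model's chat template generally matches its training format more closely than a raw string. The variant below reuses the `model` and `tokenizer` loaded above; the Swahili system prompt is an illustrative assumption, not part of the released card.
+
+ ```python
+ # Optional variant: chat-template formatting (reuses model/tokenizer above).
+ # The system prompt is illustrative only.
+ messages = [
+     {"role": "system", "content": "Wewe ni msaidizi unayejibu kwa Kiswahili fasaha."},
+     {"role": "user", "content": "Andika sentensi fupi kuhusu umuhimu wa elimu."},
+ ]
+ input_ids = tokenizer.apply_chat_template(
+     messages,
+     add_generation_prompt=True,
+     return_tensors="pt",
+ ).to(model.device)
+
+ outputs = model.generate(input_ids, max_new_tokens=120, temperature=0.7, top_p=0.9)
+ # Decode only the newly generated tokens, skipping the prompt
+ print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
+ ```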

  ---

+ ## 🔍 Model Performance Summary
+
+ | Task | Model | F1 | BLEU | ROUGE-L | Accuracy |
+ |------|--------|----|-------|----------|-----------|
+ | Sentiment Analysis | SALAMA LLM | 0.96 | — | — | 97.9% |
+ | Translation | SALAMA LLM | — | 0.49 | 0.61 | — |
+ | Q&A | SALAMA LLM | 0.88 | 0.50 | 0.59 | 95.5% |
+ | Named Entity Recognition | SALAMA LLM | 0.89 | — | — | — |

  ---

+ ## ⚡ Key Features
+
+ - 🧩 **Optimized for African low-resource NLP contexts**
+ - 💬 **Instruction-following in Swahili and English**
+ - ⚙️ **Lightweight and efficient** (QLoRA fine-tuned; runs on a single 24 GB GPU)
+ - 🌍 **Culturally aligned text generation**
+ - 🪶 **Open-source and extendable** to other African languages

  ---

+ ## 🚫 Limitations
+
+ - ⚠️ May underperform with heavy code-switching (Swahili-English mix)
+ - 🗣️ Not yet optimized for rare dialects or poetic forms
+ - 📚 Limited exposure to specialized (medical/legal) corpora
+ - 🔊 Relies on accurate STT transcription in end-to-end speech-to-speech use

  ---

+ ## 🔗 Related Models
+
+ | Model | Description |
+ |--------|-------------|
+ | [`EYEDOL/salama-stt`](https://huggingface.co/EYEDOL/salama-stt) | Swahili Speech-to-Text model (Whisper-small fine-tuned) |
+ | [`EYEDOL/salama-tts`](https://huggingface.co/EYEDOL/salama-tts) | Swahili Text-to-Speech model (VITS architecture) |
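+
+ The Overview describes SALAMA as a speech-to-speech pipeline; a rough sketch of chaining the three components with `transformers` pipelines follows. The task tags, repo IDs, and audio handling are assumptions based on the table above and each model's stated architecture (Whisper for STT, VITS for TTS); check the individual model cards before relying on this.
+
+ ```python
+ # Sketch of the STS flow: SALAMA-STT -> SALAMA LLM -> SALAMA-TTS.
+ # Repo IDs, task tags, and file names are assumptions; verify against each card.
+ from transformers import pipeline
+ import soundfile as sf
+
+ stt = pipeline("automatic-speech-recognition", model="EYEDOL/salama-stt")
+ llm = pipeline("text-generation", model="EYEDOL/salama-llm")
+ tts = pipeline("text-to-speech", model="EYEDOL/salama-tts")
+
+ # 1. Transcribe Swahili speech (input file name is illustrative)
+ question = stt("swali.wav")["text"]
+
+ # 2. Generate a Swahili response
+ response = llm(question, max_new_tokens=120)[0]["generated_text"]
+
+ # 3. Synthesize the response back to speech
+ speech = tts(response)
+ sf.write("jibu.wav", speech["audio"].squeeze(), speech["sampling_rate"])
+ ```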

  ---

+ ## 📜 Citation
+
+ If you use this model in your research or development, please cite:
+
+ > **Adegoke, I., et al. (2025).** *SALAMA: Scalable African Language Multimodal AI Framework.*
+ > DressMatic AI Labs / EYEDOL Research. Technical Report.

  ---

+ ## 🤝 Acknowledgements
+
+ We acknowledge the contributions of:
+
+ - **Masakhane** — for advancing open African NLP research
+ - **Jacaranda AI** — for UlizaLlama and Swahili pretraining corpora
+ - **Google Research** — for the Gemma multilingual models
+ - **Meta AI** — for the open-weight Llama foundation models

  ---

+ ## 🪄 License
+
+ This model is released under the **Apache 2.0 License**.
+ You are free to use, modify, and distribute it for research and commercial purposes with proper attribution.
+
+ ---
+
+ **Model Family:** *SALAMA — Scalable African LAnguage Multimodal AI Framework*
+ **Maintainer:** EYEDOL Research / DressMatic AI Labs
+ **Contact:** [email protected]
+ **Repository:** [https://huggingface.co/EYEDOL/salama-llm](https://huggingface.co/EYEDOL/salama-llm)