---
library_name: transformers
license: gemma
language:
- ti
base_model: luel/gemma-3-4b-tigrinya
pipeline_tag: text-generation
tags:
- tigrinya
- gemma
- qa
- instruct
- low-resource
inference: true
model-index:
- name: "gemma-3-4b-tigrinya-qa"
  results:
  - task:
      name: "Question Answering"
      type: "question-answering"
    dataset:
      name: "Tigrinya Q&A"
      type: "other"
      split: "validation (5%)"
    metrics:
    - name: "Perplexity"
      type: "perplexity"
      value: 2.79
    - name: "Eval Loss"
      type: "loss"
      value: 1.025
---

# Gemma-3-4B-Tigrinya-QA

Gemma-3-4B-Tigrinya-QA is a two-stage fine-tuned adaptation of Google's [Gemma-3-4B](https://huggingface.co/google/gemma-3-4b-pt), optimized for question answering in *Tigrinya* (ትግርኛ).

The model answers questions in Tigrinya across a range of domains, including history, culture, and general knowledge.

**Purpose:** Tigrinya is a low-resource language with few high-performing open models available. This release aims to lower the barrier to entry for research and application development in the Tigrinya language space.

## Model Details

- **Model Type:** Instruction-tuned Causal Language Model
- **Base Model:** [luel/gemma-3-4b-tigrinya](https://huggingface.co/luel/gemma-3-4b-tigrinya) (stage 1: 60M tokens)
- **Parameters:** 4 billion
- **Architecture:** Gemma 3 with `Gemma3ForCausalLM`
- **Training Precision:** BF16 with TF32 acceleration
- **Max Sequence Length:** 1024 tokens

## Training Process

### Stage 1: General Text Generation

- **Base:** [Gemma-3-4B](https://huggingface.co/google/gemma-3-4b-pt) -> [luel/gemma-3-4b-tigrinya](https://huggingface.co/luel/gemma-3-4b-tigrinya)
- **Data:** 60M tokens of mixed-domain Tigrinya text (news, web, literature)
- **Purpose:** Language adaptation and vocabulary expansion

### Stage 2: Instruction Fine-tuning (This Model)

- **Base:** [luel/gemma-3-4b-tigrinya](https://huggingface.co/luel/gemma-3-4b-tigrinya) -> [luel/gemma-3-4b-tigrinya-qa](https://huggingface.co/luel/gemma-3-4b-tigrinya-qa)
- **Data:** 67.5k curated Q&A pairs covering governance, society, politics, culture, history, proverbs, and more
- **Format:** Gemma chat template with user/assistant turns

## Dataset (Stage 2)

- **Size:** 67.5k question-answer pairs
- **Language:** Tigrinya (ትግርኛ)
- **Domains:** Geography, culture, history, politics, general knowledge
- **Format:** Chat template with `<start_of_turn>user` and `<start_of_turn>model` markers
- **Split:** 95% training / 5% validation

## Training Details (Stage 2)

- **Training Framework:** Hugging Face Transformers with TRL's `SFTTrainer` (a configuration sketch follows this list)
- **Optimizer:** AdamW with a cosine learning-rate schedule
- **Learning Rate:** 2e-5 with 3% warmup
- **Weight Decay:** 0.01
- **Batch Size:** 6 per device with 2 gradient accumulation steps (effective batch size: 12)
- **Epochs:** 3
- **Evaluation:** Every 500 steps
- **Mixed Precision:** BF16 with gradient checkpointing
- **Hardware:** NVIDIA GH200 120GB

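The hyperparameters above map onto a TRL `SFTConfig` roughly as follows. This is a minimal sketch for orientation only: the actual training script has not been released, and the dataset file, column layout, and split seed below are hypothetical.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical local JSONL file with one chat-formatted Q&A pair per line;
# the real 67.5k-pair dataset is not public.
dataset = load_dataset("json", data_files="tigrinya_qa.jsonl", split="train")
splits = dataset.train_test_split(test_size=0.05, seed=42)  # 95% train / 5% validation

args = SFTConfig(
    output_dir="gemma-3-4b-tigrinya-qa",
    per_device_train_batch_size=6,
    gradient_accumulation_steps=2,      # effective batch size 12
    num_train_epochs=3,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,                  # 3% warmup
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=500,
)

trainer = SFTTrainer(
    model="luel/gemma-3-4b-tigrinya",   # Stage 1 checkpoint as the starting point
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
)
trainer.train()
```
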
## Usage

First, install the Transformers library (version 4.50 or higher):

```bash
pip install -U transformers
```

Then you can run inference as follows:

```python
from transformers import Gemma3ForCausalLM, AutoTokenizer
import torch

model_id = "luel/gemma-3-4b-tigrinya-qa"

tok = AutoTokenizer.from_pretrained(model_id)
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

question = "ትግራይ ኣበይ ትርከብ?"  # "Where is Tigray located?"
messages = [{"role": "user", "content": question}]
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

inputs = tok(prompt, return_tensors="pt").to(model.device)

# You can also set and tune max_length, temperature, repetition_penalty, top_p, ...
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode the full conversation, then keep only the model's turn.
full_response = tok.decode(outputs[0], skip_special_tokens=False)
model_response = full_response.split("<start_of_turn>model\n")[-1].split("<end_of_turn>")[0].strip()

print(model_response)
```

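Alternatively, the high-level `pipeline` API handles the chat template and decoding for you. A minimal sketch, assuming a recent Transformers version that accepts chat-style message inputs:

```python
from transformers import pipeline
import torch

generator = pipeline(
    "text-generation",
    model="luel/gemma-3-4b-tigrinya-qa",
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

messages = [{"role": "user", "content": "ትግራይ ኣበይ ትርከብ?"}]
result = generator(messages, max_new_tokens=256)

# For chat-style inputs, the last message in generated_text is the model's reply.
print(result[0]["generated_text"][-1]["content"])
```
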
### Chat Template

The model was fine-tuned on conversations in the following turn format (`apply_chat_template` in the usage example above produces it for you):

```text
<start_of_turn>user
{user_message}<end_of_turn>
<start_of_turn>model
{model_response}<end_of_turn>
```

## Examples

Below is a short sample showcasing the model's response quality (generated with `max_new_tokens=512`, `temperature=0.6`, `top_p=0.9`).

**Q (user):** `ሃገራት ቀርኒ ኣፍሪካ ዝኾኑ ኣየኖት እዮም?`<br>
**A (model):** `ኢትዮጵያ፣ ጅቡቲ፣ ኤርትራን ሶማልያን።`

**Q (user):** `ዓወት ኲናት ዓድዋ ንሃገራት ኣፍሪካ እንታይ መልእኽቲ እዩ ዘመሓላልፍ?`<br>
**A (model):** `ዓወት ኲናት ዓድዋ ኣብ ልዕሊ ባዕዳዊ መግዛእቲ ዝተፈፀመን ሃገራት ኣፍሪካ ነፃነተን ክቃለሳ ከምዝኽእላ ዘርኣየሉን ታሪክ ፍፃመ እዩ።`

**Q (user):** `ነዚ ምስላ እዚ ምልኣዮ። ምስ ቆልዓ ኣይትምከር`<br>
**A (model):** `ምስ ከልቢ ኣይትታሓባእ።`

**Q (user):** `ብዛዕባ ትግራይ ብሓፂሩ ንገረኒ`<br>
**A (model):** `ትግራይ ኣብ ሰሜን ኢትዮጵያ እትርከብ ክልል እያ። ርእሰ ከተማኣ መቐለ ኮይና ካልኦት ዓበይቲ ከተማታት ድማ ኣክሱም፡ ሽረ፡ ማይጨው እየን። ትግራይ ዝተፈላለዩ ህዝቢ፣ ታሪኽ ዘለዎምን ሃፍታም ባህሊ ዘለዎን ብዙሕነት ዝነብረላ እያ። ብዘደንቕ ተፈጥሮን ስነ-ጥበባዊ ምልክታትን ዝመልአት ቦታ እያ።`

**Q (user):** `ሰናይ ምምሕዳር እንታይ ማለት እዩ?`<br>
**A (model):** `ሰናይ ምምሕዳር ንዜጋታት ብዘይምጭፍልቕ፣ ብዘይኣድልዎን ብዘይ ኣፈላላይን ብማዕረ ግልጋሎት ዝህብ መንግስታዊ ኣሰራርሓ የመላኽት። ሰናይ ምምሕዳር ኩሉ ዜጋ ማዕረ መሰላት ከምዝወሃቦ ይገብር።`

## Evaluation

| Metric          | Split      | Value |
|-----------------|------------|-------|
| Evaluation loss | validation | 1.025 |
| Perplexity      | validation | 2.79  |
| Token accuracy  | validation | 75%   |
| Training loss   | final      | 0.963 |

*Validation corpus: 5% held-out split of the 67.5k Q&A pairs.*

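For reference, the reported perplexity is the exponential of the evaluation loss: exp(1.025) ≈ 2.79.
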
## Limitations

- **Language Mixing:** Responses may, in rare cases, mix in Amharic or English words
- **Domain Scope:** Optimized for general Q&A; highly specialized technical queries may not be handled well
- **Factual Accuracy:** Generated answers should be verified for factual correctness
- **Context Length:** Limited to 1024 tokens for input and output
- **Base Model Limitations:** Inherits the limitations of the base Gemma-3-4B architecture
- **No Multimodal Support:** Text-only model; cannot process images, audio, or other media
- **Bias:** May reflect societal biases present in the training data

## Citation

```bibtex
@misc{gemma-3-4b-tigrinya-qa,
  author       = {Luel},
  title        = {Gemma-3-4B-Tigrinya-QA: A Fine-tuned Question-Answering Model for Tigrinya},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/luel/gemma-3-4b-tigrinya-qa}}
}
```

## Acknowledgements

This model builds upon Google's Gemma 3 4B foundation model and the Stage 1 Tigrinya language adaptation. We acknowledge Google for making their foundation models available to the community, enabling the development of language-specific instruction-tuned models like this one.