---
library_name: transformers
license: gemma
language:
- ti
base_model: luel/gemma-3-4b-tigrinya
pipeline_tag: text-generation
tags:
- tigrinya
- gemma
- qa
- instruct
- low-resource
inference: true
model-index:
- name: gemma-3-4b-tigrinya-qa
  results:
  - task:
      name: Question Answering
      type: question-answering
    dataset:
      name: Tigrinya Q&A
      type: other
      split: validation (5%)
    metrics:
    - name: Perplexity
      type: perplexity
      value: 2.79
    - name: Eval Loss
      type: loss
      value: 1.025
---
Gemma-3-4B-Tigrinya-QA
Gemma-3-4B-Tigrinya-QA is a two-stage fine-tuned adaptation of Google's Gemma-3-4B specifically optimized for question-answering in Tigrinya (ትግርኛ).
The model demonstrates good capabilities in answering Tigrinya questions across various domains, including history, culture, and general knowledge.
Purpose: Tigrinya is a low-resource language with few high-performing open models available. This release aims to lower the barrier to entry for research and application development in the Tigrinya language space.
Model Details
- Model Type: Instruction-tuned Causal Language Model
- Base Model: luel/gemma-3-4b-tigrinya (stage 1: 60M tokens)
- Parameters: 4 billion
- Architecture: Gemma 3 (Gemma3ForCausalLM)
- Training Precision: BF16 with TF32 acceleration
- Max Sequence Length: 1024 tokens
Training Process
Stage 1: General Text Generation
- Base: Gemma-3-4B -> luel/gemma-3-4b-tigrinya
- Data: 60M tokens of mixed-domain Tigrinya (news, web, literature)
- Purpose: Language adaptation and vocabulary expansion
Stage 2: Instruction Fine-tuning (This Model)
- Base: luel/gemma-3-4b-tigrinya -> luel/gemma-3-4b-tigrinya-qa
- Data: 67.5k curated Q&A pairs across governance, society, politics, culture, history, proverbs, etc.
- Format: Gemma chat template with user/assistant turns
Dataset (Stage 2)
- Size: 67.5k question-answer pairs
- Language: Tigrinya (ትግርኛ)
- Domains: Geography, culture, history, politics, general knowledge
- Format: Chat template with <start_of_turn>user and <start_of_turn>model markers (see the sketch after this list)
- Split: 95% training / 5% validation
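As a rough illustration of this format, the sketch below renders a single Q&A pair (adapted from the Usage and Examples sections of this card) with the tokenizer's chat template. The exact preprocessing used during training is not published, so treat this only as an approximation.

```python
from transformers import AutoTokenizer

# Illustrative only: render one Q&A pair with the Gemma chat template.
# The pair below is adapted from the Usage/Examples sections of this card.
tok = AutoTokenizer.from_pretrained("luel/gemma-3-4b-tigrinya-qa")
messages = [
    {"role": "user", "content": "ትግራይ ኣበይ ትርከብ?"},
    {"role": "assistant", "content": "ትግራይ ኣብ ሰሜን ኢትዮጵያ እትርከብ ክልል እያ።"},
]
# Produces <start_of_turn>user ... <end_of_turn> followed by <start_of_turn>model ... <end_of_turn>
print(tok.apply_chat_template(messages, tokenize=False))
```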
Training Details (Stage 2)
- Training Framework: Hugging Face Transformers with SFTTrainer (a configuration sketch follows this list)
- Optimizer: AdamW with cosine learning rate schedule
- Learning Rate: 2e-5 with 3% warmup
- Weight Decay: 0.01
- Batch Size: 6 per device, 2 gradient accumulation steps (effective batch size: 12)
- Epochs: 3
- Evaluation: Every 500 steps
- Mixed Precision: BF16 with gradient checkpointing
- Hardware: NVIDIA GH200 120GB
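For reference, below is a minimal sketch of a comparable setup with TRL's SFTTrainer, mirroring the hyperparameters listed above. Dataset preparation is omitted and exact argument names can differ across TRL versions, so this is an illustration rather than the original training script.

```python
from trl import SFTConfig, SFTTrainer

args = SFTConfig(
    output_dir="gemma-3-4b-tigrinya-qa",
    num_train_epochs=3,
    per_device_train_batch_size=6,
    gradient_accumulation_steps=2,   # effective batch size: 12
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,               # 3% warmup
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=500,
)

trainer = SFTTrainer(
    model="luel/gemma-3-4b-tigrinya",  # stage-1 base model
    args=args,
    train_dataset=train_ds,  # placeholder: 95% split of the chat-formatted Q&A pairs
    eval_dataset=eval_ds,    # placeholder: 5% validation split
)
trainer.train()
```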
Usage
First, install the Transformers library (version 4.50 or higher):
```bash
pip install -U transformers
```
Then, you can use it for inference as follows:
```python
from transformers import Gemma3ForCausalLM, AutoTokenizer
import torch

model_id = "luel/gemma-3-4b-tigrinya-qa"
tok = AutoTokenizer.from_pretrained(model_id)
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda"
)

question = "ትግራይ ኣበይ ትርከብ?"
messages = [{"role": "user", "content": question}]
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# You can adjust generation parameters such as max_new_tokens, temperature, repetition_penalty, top_p...
outputs = model.generate(**inputs, max_new_tokens=1024)

# Decode with special tokens kept, then extract the text of the model's turn.
full_response = tok.decode(outputs[0], skip_special_tokens=False)
model_response = full_response.split("<start_of_turn>model\n")[-1].split("<end_of_turn>")[0].strip()
print(model_response)
```
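The snippet above uses the model's default generation settings. To reproduce the sampling settings quoted in the Examples section below, pass the corresponding generation parameters (the values here are simply the ones used for those examples, not tuned recommendations):

```python
# Sampled generation with the settings used for the Examples section.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
```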
Chat Template
```text
<start_of_turn>user
{user_message}<end_of_turn>
<start_of_turn>model
{model_response}<end_of_turn>
```
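The apply_chat_template call in the usage snippet produces prompts in this format, ending with an open model turn so that generation continues as the assistant. If you need to build prompts by hand, a minimal (hypothetical) helper could look like the one below; the tokenizer's own template remains the source of truth and also handles special tokens such as <bos>.

```python
def build_prompt(user_message: str) -> str:
    # Hand-rolled equivalent of the single-turn template above (illustrative only).
    return f"<start_of_turn>user\n{user_message}<end_of_turn>\n<start_of_turn>model\n"
```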
Examples
Below are a few short samples showcasing the model's response quality (generated with max_new_tokens=512, temperature=0.6, top_p=0.9).
Q (user): ሃገራት ቀርኒ ኣፍሪካ ዝኾኑ ኣየኖት እዮም?
A (model): ኢትዮጵያ፣ ጅቡቲ፣ ኤርትራን ሶማልያን።
Q (user): ዓወት ኲናት ዓድዋ ንሃገራት ኣፍሪካ እንታይ መልእኽቲ እዩ ዘመሓላልፍ?
A (model): ዓወት ኲናት ዓድዋ ኣብ ልዕሊ ባዕዳዊ መግዛእቲ ዝተፈፀመን ሃገራት ኣፍሪካ ነፃነተን ክቃለሳ ከምዝኽእላ ዘርኣየሉን ታሪክ ፍፃመ እዩ።
Q (user): ነዚ ምስላ እዚ ምልኣዮ። ምስ ቆልዓ ኣይትምከር
A (model): ምስ ከልቢ ኣይትታሓባእ።
Q (user): ብዛዕባ ትግራይ ብሓፂሩ ንገረኒ
A (model): ትግራይ ኣብ ሰሜን ኢትዮጵያ እትርከብ ክልል እያ። ርእሰ ከተማኣ መቐለ ኮይና ካልኦት ዓበይቲ ከተማታት ድማ ኣክሱም፡ ሽረ፡ ማይጨው እየን። ትግራይ ዝተፈላለዩ ህዝቢ፣ ታሪኽ ዘለዎምን ሃፍታም ባህሊ ዘለዎን ብዙሕነት ዝነብረላ እያ። ብዘደንቕ ተፈጥሮን ስነ-ጥበባዊ ምልክታትን ዝመልአት ቦታ እያ።
Q (user): ሰናይ ምምሕዳር እንታይ ማለት እዩ?
A (model): ሰናይ ምምሕዳር ንዜጋታት ብዘይምጭፍልቕ፣ ብዘይኣድልዎን ብዘይ ኣፈላላይን ብማዕረ ግልጋሎት ዝህብ መንግስታዊ ኣሰራርሓ የመላኽት። ሰናይ ምምሕዳር ኩሉ ዜጋ ማዕረ መሰላት ከምዝወሃቦ ይገብር።
Evaluation
| Metric | Split | Value |
|---|---|---|
| Evaluation Loss | validation | 1.025 |
| Perplexity | validation | 2.79 |
| Token Accuracy | validation | 75% |
| Training Loss | final | 0.963 |
Validation corpus: 5% held-out split from 67.5k Q&A pairs
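The reported perplexity is consistent with the evaluation loss, assuming perplexity is computed as the exponential of the mean token-level cross-entropy:

```python
import math

# exp(eval loss) ≈ perplexity: exp(1.025) ≈ 2.79
print(round(math.exp(1.025), 2))  # 2.79
```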
Limitations
- Language Mixing: May occasionally (though rarely) mix Amharic or English words into responses
- Domain Scope: Optimized for general Q&A; may not handle highly specialized technical queries optimally
- Factual Accuracy: Generated answers should be verified for factual correctness
- Context Length: Limited to 1024 tokens for both input and output
- Base Model Limitations: Inherits limitations from the base Gemma-3-4B architecture
- No Multimodal: Text-only model; cannot process images, audio, or other media
- Bias: May reflect societal biases present in training data
Citation
```bibtex
@misc{gemma-3-4b-tigrinya-qa,
  author       = {Luel},
  title        = {Gemma-3-4B-Tigrinya-QA: A Fine-tuned Question-Answering Model for Tigrinya},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/luel/gemma-3-4b-tigrinya-qa}}
}
```
Acknowledgements
This model builds upon Google's Gemma 3 4B foundation model and the stage-1 Tigrinya language adaptation (luel/gemma-3-4b-tigrinya). We acknowledge Google for making their foundation models available to the community, enabling the development of language-specific instruction-tuned models like this one.