---
library_name: transformers
license: gemma
language:
  - ti
base_model: luel/gemma-3-4b-tigrinya
pipeline_tag: text-generation
tags:
  - tigrinya
  - gemma
  - qa
  - instruct
  - low-resource
inference: true
model-index:
  - name: gemma-3-4b-tigrinya-qa
    results:
      - task:
          name: Question Answering
          type: question-answering
        dataset:
          name: Tigrinya Q&A
          type: other
          split: validation (5%)
        metrics:
          - name: Perplexity
            type: perplexity
            value: 2.79
          - name: Eval Loss
            type: loss
            value: 1.025
---

Gemma-3-4B-Tigrinya-QA

Gemma-3-4B-Tigrinya-QA is a two-stage fine-tuned adaptation of Google's Gemma-3-4B, optimized for question answering in Tigrinya (ትግርኛ).

The model answers questions in Tigrinya across various domains, including history, culture, and general knowledge.

Purpose: Tigrinya is a low-resource language with limited high-performance open models available. This release aims to reduce barriers to entry for research and application development in the Tigrinya language space.

Model Details

  • Model Type: Instruction-tuned Causal Language Model
  • Base Model: luel/gemma-3-4b-tigrinya (stage 1: 60M tokens)
  • Parameters: 4 billion
  • Architecture: Gemma 3 with Gemma3ForCausalLM
  • Training Precision: BF16 with TF32 acceleration
  • Max Sequence Length: 1024 tokens

Training Process

Stage 1: General Text Generation

The base Gemma-3-4B model was first fine-tuned on roughly 60M tokens of general Tigrinya text; that stage was released separately as luel/gemma-3-4b-tigrinya.

Stage 2: Instruction Fine-tuning (This Model)

The stage-1 model was then instruction-tuned on Tigrinya question-answer pairs formatted with the Gemma chat template, as described below.

Dataset (Stage 2)

  • Size: 67.5k question-answer pairs
  • Language: Tigrinya (ትግርኛ)
  • Domains: Geography, culture, history, politics, general knowledge
  • Format: Chat template with <start_of_turn>user and <start_of_turn>model markers (see the rendering sketch after this list)
  • Split: 95% training / 5% validation
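
The dataset itself is not publicly released, so the snippet below is only a minimal sketch of how one pair would be rendered into the chat format; the question and answer are copied from the Examples section, and the rendering relies on the tokenizer's built-in chat template, which maps the "assistant" role to the model turn.

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("luel/gemma-3-4b-tigrinya-qa")

# illustrative pair taken from the Examples section; the real dataset is not public
question = "ሃገራት ቀርኒ ኣፍሪካ ዝኾኑ ኣየኖት እዮም?"
answer = "ኢትዮጵያ፣ ጅቡቲ፣ ኤርትራን ሶማልያን።"

messages = [
    {"role": "user", "content": question},
    {"role": "assistant", "content": answer},  # "assistant" is rendered as the model turn
]

# produces (apart from a leading <bos>) the <start_of_turn>user ... <start_of_turn>model ... format
print(tok.apply_chat_template(messages, tokenize=False))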

Training Details (Stage 2)

  • Training Framework: Hugging Face Transformers with SFTTrainer (see the configuration sketch after this list)
  • Optimizer: AdamW with cosine learning rate schedule
  • Learning Rate: 2e-5 with 3% warmup
  • Weight Decay: 0.01
  • Batch Size: 6 per device, 2 gradient accumulation steps (effective batch size: 12)
  • Epochs: 3
  • Evaluation: Every 500 steps
  • Mixed Precision: BF16 with gradient checkpointing
  • Hardware: NVIDIA GH200 120GB
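
The exact training script is not part of this release; the following is a minimal sketch of how the settings above map onto TRL's SFTTrainer. The dataset variables are placeholders, and argument names may differ slightly across TRL versions.

from trl import SFTConfig, SFTTrainer

train_ds = ...  # placeholder: 95% training split of the Q&A pairs (not released)
eval_ds = ...   # placeholder: 5% validation split

args = SFTConfig(
    output_dir="gemma-3-4b-tigrinya-qa",
    num_train_epochs=3,
    per_device_train_batch_size=6,
    gradient_accumulation_steps=2,   # effective batch size 12
    learning_rate=2e-5,              # AdamW is the default optimizer
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    bf16=True,
    gradient_checkpointing=True,
    eval_strategy="steps",
    eval_steps=500,
    # max sequence length during training was 1024; the argument name varies across TRL versions, so it is omitted here
)

trainer = SFTTrainer(
    model="luel/gemma-3-4b-tigrinya",  # stage-1 base model
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()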

Usage

First, install the Transformers library (version 4.50 or higher):

pip install -U transformers

Then, you can use it for inference as follows:

from transformers import Gemma3ForCausalLM, AutoTokenizer
import torch

model_id = "luel/gemma-3-4b-tigrinya-qa"

tok = AutoTokenizer.from_pretrained(model_id)
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16, 
    device_map="cuda"
)
question = "ትግራይ ኣበይ ትርከብ?"
messages = [{"role": "user", "content": question}]
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)

inputs = tok(prompt, return_tensors="pt").to(model.device)

# adjust max_new_tokens and repetition_penalty as needed; set do_sample=True to use temperature and top_p
outputs = model.generate(**inputs, max_new_tokens=1024)

# decode the full sequence, then keep only the model's turn
full_response = tok.decode(outputs[0], skip_special_tokens=False)
model_response = full_response.split("<start_of_turn>model\n")[-1].split("<end_of_turn>")[0].strip()

print(model_response)
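
The samples in the Examples section below were generated with sampling enabled. To reproduce that behaviour, reuse inputs and tok from the snippet above and pass do_sample=True; the values shown match the settings quoted there, and everything else is left at its default.

# sampled generation, matching the settings used for the examples below
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)

full_response = tok.decode(outputs[0], skip_special_tokens=False)
print(full_response.split("<start_of_turn>model\n")[-1].split("<end_of_turn>")[0].strip())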

Chat Template

<start_of_turn>user
{user_message}<end_of_turn>
<start_of_turn>model
{model_response}<end_of_turn>

Examples

Below are short samples of the model's responses (generated with max_new_tokens=512, temperature=0.6, top_p=0.9).

Q (user): ሃገራት ቀርኒ ኣፍሪካ ዝኾኑ ኣየኖት እዮም?
A (model): ኢትዮጵያ፣ ጅቡቲ፣ ኤርትራን ሶማልያን።

Q (user): ዓወት ኲናት ዓድዋ ንሃገራት ኣፍሪካ እንታይ መልእኽቲ እዩ ዘመሓላልፍ?
A (model): ዓወት ኲናት ዓድዋ ኣብ ልዕሊ ባዕዳዊ መግዛእቲ ዝተፈፀመን ሃገራት ኣፍሪካ ነፃነተን ክቃለሳ ከምዝኽእላ ዘርኣየሉን ታሪክ ፍፃመ እዩ።

Q (user): ነዚ ምስላ እዚ ምልኣዮ። ምስ ቆልዓ ኣይትምከር
A (model): ምስ ከልቢ ኣይትታሓባእ።

Q (user): ብዛዕባ ትግራይ ብሓፂሩ ንገረኒ
A (model): ትግራይ ኣብ ሰሜን ኢትዮጵያ እትርከብ ክልል እያ። ርእሰ ከተማኣ መቐለ ኮይና ካልኦት ዓበይቲ ከተማታት ድማ ኣክሱም፡ ሽረ፡ ማይጨው እየን። ትግራይ ዝተፈላለዩ ህዝቢ፣ ታሪኽ ዘለዎምን ሃፍታም ባህሊ ዘለዎን ብዙሕነት ዝነብረላ እያ። ብዘደንቕ ተፈጥሮን ስነ-ጥበባዊ ምልክታትን ዝመልአት ቦታ እያ።

Q (user): ሰናይ ምምሕዳር እንታይ ማለት እዩ?
A (model): ሰናይ ምምሕዳር ንዜጋታት ብዘይምጭፍልቕ፣ ብዘይኣድልዎን ብዘይ ኣፈላላይን ብማዕረ ግልጋሎት ዝህብ መንግስታዊ ኣሰራርሓ የመላኽት። ሰናይ ምምሕዳር ኩሉ ዜጋ ማዕረ መሰላት ከምዝወሃቦ ይገብር።

Evaluation

| Metric          | Split      | Value |
|-----------------|------------|-------|
| Evaluation Loss | validation | 1.025 |
| Perplexity      | validation | 2.79  |
| Token Accuracy  | validation | 75%   |
| Training Loss   | final      | 0.963 |

Validation corpus: 5% held-out split from 67.5k Q&A pairs
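
As a consistency check, the perplexity figure is the exponential of the evaluation loss, both computed on the same validation split:

import math

# perplexity = exp(cross-entropy loss): exp(1.025) ≈ 2.79
print(round(math.exp(1.025), 2))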

Limitations

  • Language Mixing: May occasionally (though rarely) mix Amharic or English words into responses
  • Domain Scope: Optimized for general Q&A; may not handle highly specialized technical queries optimally
  • Factual Accuracy: Generated answers should be verified for factual correctness
  • Context Length: Limited to 1024 tokens for both input and output (see the length-check sketch after this list)
  • Base Model Limitations: Inherits limitations from the base Gemma-3-4B architecture
  • No Multimodal: Text-only model; cannot process images, audio, or other media
  • Bias: May reflect societal biases present in training data
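
Since the model was trained with a 1024-token maximum sequence length, long prompts plus long generations can exceed what it saw during training. Below is a minimal sketch of a length check before generation, reusing tok, inputs, and model from the Usage snippet; the 256-token generation budget is an arbitrary illustration, not a documented setting.

MAX_CONTEXT = 1024   # training-time maximum sequence length
max_new = 256        # arbitrary budget reserved for the answer

prompt_len = inputs["input_ids"].shape[-1]
if prompt_len + max_new > MAX_CONTEXT:
    print(f"Warning: {prompt_len} prompt tokens + {max_new} new tokens exceeds the {MAX_CONTEXT}-token training context.")

outputs = model.generate(**inputs, max_new_tokens=max_new)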

Citation

@misc{gemma-3-4b-tigrinya-qa,
  author = {Luel},
  title = {Gemma-3-4B-Tigrinya-QA: A Fine-tuned Question-Answering Model for Tigrinya},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/luel/gemma-3-4b-tigrinya-qa}}
}

Acknowledgements

This model builds upon Google's Gemma 3 4B foundation model and the stage-1 Tigrinya adaptation (luel/gemma-3-4b-tigrinya). We thank Google for making their foundation models available to the community, enabling the development of language-specific instruction-tuned models like this one.