---
library_name: transformers
license: gemma
language:
- ti
base_model: luel/gemma-3-4b-tigrinya
pipeline_tag: text-generation
tags:
- tigrinya
- gemma
- qa
- instruct
- low-resource
inference: true
model-index:
- name: gemma-3-4b-tigrinya-qa
  results:
  - task:
      name: Question Answering
      type: question-answering
    dataset:
      name: Tigrinya Q&A
      type: other
      split: validation (5%)
    metrics:
    - name: Perplexity
      type: perplexity
      value: 2.79
    - name: Eval Loss
      type: loss
      value: 1.025
---
# Gemma-3-4B-Tigrinya-QA
Gemma-3-4B-Tigrinya-QA is a two-stage fine-tuned adaptation of Google's [Gemma-3-4B](https://huggingface.co/google/gemma-3-4b-pt) specifically optimized for question-answering in *Tigrinya* (ትግርኛ).
The model answers questions across a range of domains, including history, culture, and general knowledge.
**Purpose:** Tigrinya is a low-resource language with limited high-performance open models available. This release aims to reduce barriers to entry for research and application development in the Tigrinya language space.
## Model Details
- **Model Type:** Instruction-tuned Causal Language Model
- **Base Model:** [luel/gemma-3-4b-tigrinya](https://huggingface.co/luel/gemma-3-4b-tigrinya) (stage 1: 60M tokens)
- **Parameters:** 4 billion
- **Architecture:** Gemma 3 with `Gemma3ForCausalLM`
- **Training Precision:** BF16 with TF32 acceleration
- **Max Sequence Length:** 1024 tokens
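For quick verification, here is a minimal sketch that reads these details from the published config (the printed values assume the hub config matches this card):
```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("luel/gemma-3-4b-tigrinya-qa")
print(cfg.architectures)  # expected: ['Gemma3ForCausalLM']
print(cfg.torch_dtype)    # expected: torch.bfloat16 (the training precision)
```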
## Training Process
### Stage 1: General Text Generation
- **Base:** [Gemma-3-4B](https://huggingface.co/google/gemma-3-4b-pt) -> [luel/gemma-3-4b-tigrinya](https://huggingface.co/luel/gemma-3-4b-tigrinya)
- **Data:** 60M tokens of mixed-domain Tigrinya (news, web, literature)
- **Purpose:** Language adaptation and vocabulary expansion
### Stage 2: Instruction Fine-tuning (This Model)
- **Base:** [luel/gemma-3-4b-tigrinya](https://huggingface.co/luel/gemma-3-4b-tigrinya) -> [luel/gemma-3-4b-tigrinya-qa](https://huggingface.co/luel/gemma-3-4b-tigrinya-qa)
- **Data:** 67.5k curated Q&A pairs across governance, society, politics, culture, history, proverbs, etc.
- **Format:** Gemma chat template with user/assistant turns
## Dataset (Stage 2)
- **Size:** 67.5k question-answer pairs
- **Language:** Tigrinya (ትግርኛ)
- **Domains:** Geography, culture, history, politics, general knowledge
- **Format:** Chat template with `<start_of_turn>user` and `<start_of_turn>model` markers (see the sketch after this list)
- **Split:** 95% training / 5% validation
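To make the format concrete, here is a minimal sketch of how one pair renders through the tokenizer's chat template (the pair is illustrative, not taken from the training set):
```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("luel/gemma-3-4b-tigrinya-qa")

# Illustrative Q&A pair; the "assistant" role maps to Gemma's "model" turn.
messages = [
    {"role": "user", "content": "ትግራይ ኣበይ ትርከብ?"},
    {"role": "assistant", "content": "ትግራይ ኣብ ሰሜን ኢትዮጵያ እትርከብ ክልል እያ።"},
]

# Produces the <start_of_turn>user ... <start_of_turn>model markup
# shown under "Chat Template" below.
print(tok.apply_chat_template(messages, tokenize=False))
```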
## Training Details (Stage 2)
- **Training Framework:** Hugging Face Transformers with TRL's `SFTTrainer` (sketched after this list)
- **Optimizer:** AdamW with cosine learning rate schedule
- **Learning Rate:** 2e-5 with 3% warmup
- **Weight Decay:** 0.01
- **Batch Size:** 6 per device, 2 gradient accumulation steps (effective batch size: 12)
- **Epochs:** 3
- **Evaluation:** Every 500 steps
- **Mixed Precision:** BF16 with gradient checkpointing
- **Hardware:** NVIDIA GH200 120GB
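The exact training script is not published; as a rough sketch, the hyperparameters above map onto TRL's `SFTConfig`/`SFTTrainer` approximately as follows (`model`, `train_ds`, and `eval_ds` are assumed to be prepared separately):
```python
from trl import SFTConfig, SFTTrainer

config = SFTConfig(
    output_dir="gemma-3-4b-tigrinya-qa",
    learning_rate=2e-5,              # AdamW is the Trainer default optimizer
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    per_device_train_batch_size=6,
    gradient_accumulation_steps=2,   # effective batch size: 12
    num_train_epochs=3,
    eval_strategy="steps",
    eval_steps=500,
    bf16=True,
    tf32=True,                       # TF32 acceleration
    gradient_checkpointing=True,
    max_seq_length=1024,             # named max_length in recent TRL releases
)

trainer = SFTTrainer(
    model=model,                     # the stage-1 model, loaded as in Usage below
    args=config,
    train_dataset=train_ds,          # 95% chat-formatted split
    eval_dataset=eval_ds,            # 5% held-out split
)
trainer.train()
```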
## Usage
First, install the Transformers library (version 4.50 or higher):
```bash
pip install -U "transformers>=4.50.0"
```
Then, you can use it for inference as follows:
```python
from transformers import Gemma3ForCausalLM, AutoTokenizer
import torch

model_id = "luel/gemma-3-4b-tigrinya-qa"
tok = AutoTokenizer.from_pretrained(model_id)
model = Gemma3ForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="cuda",
)

question = "ትግራይ ኣበይ ትርከብ?"
messages = [{"role": "user", "content": question}]

# Render the conversation with the Gemma chat template and append the
# generation prompt so the model continues as the "model" turn.
prompt = tok.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding by default; see the sampling snippet below for
# temperature / top_p / max_new_tokens settings.
outputs = model.generate(**inputs, max_new_tokens=1024)

# Keep only the model's turn from the decoded transcript.
full_response = tok.decode(outputs[0], skip_special_tokens=False)
model_response = full_response.split("<start_of_turn>model\n")[-1].split("<end_of_turn>")[0].strip()
print(model_response)
```
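The answers in the Examples section below were sampled rather than decoded greedily; a call like the following reproduces those settings:
```python
# Sampling settings used for the Examples section.
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=True,   # required for temperature/top_p to take effect
    temperature=0.6,
    top_p=0.9,
)
```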
### Chat Template
```text
<start_of_turn>user
{user_message}<end_of_turn>
<start_of_turn>model
{model_response}<end_of_turn>
```
## Examples
Below are short samples showcasing the model's response quality (generated with `max_new_tokens=512`, `temperature=0.6`, `top_p=0.9`).
**Q (user):** `ሃገራት ቀርኒ ኣፍሪካ ዝኾኑ ኣየኖት እዮም?`<br>
**A (model):** `ኢትዮጵያ፣ ጅቡቲ፣ ኤርትራን ሶማልያን።`
**Q (user):** `ዓወት ኲናት ዓድዋ ንሃገራት ኣፍሪካ እንታይ መልእኽቲ እዩ ዘመሓላልፍ?`<br>
**A (model):** `ዓወት ኲናት ዓድዋ ኣብ ልዕሊ ባዕዳዊ መግዛእቲ ዝተፈፀመን ሃገራት ኣፍሪካ ነፃነተን ክቃለሳ ከምዝኽእላ ዘርኣየሉን ታሪክ ፍፃመ እዩ።`
**Q (user):** `ነዚ ምስላ እዚ ምልኣዮ። ምስ ቆልዓ ኣይትምከር`<br>
**A (model):** `ምስ ከልቢ ኣይትታሓባእ።`
**Q (user):** `ብዛዕባ ትግራይ ብሓፂሩ ንገረኒ`<br>
**A (model):** `ትግራይ ኣብ ሰሜን ኢትዮጵያ እትርከብ ክልል እያ። ርእሰ ከተማኣ መቐለ ኮይና ካልኦት ዓበይቲ ከተማታት ድማ ኣክሱም፡ ሽረ፡ ማይጨው እየን። ትግራይ ዝተፈላለዩ ህዝቢ፣ ታሪኽ ዘለዎምን ሃፍታም ባህሊ ዘለዎን ብዙሕነት ዝነብረላ እያ። ብዘደንቕ ተፈጥሮን ስነ-ጥበባዊ ምልክታትን ዝመልአት ቦታ እያ።`
**Q (user):** `ሰናይ ምምሕዳር እንታይ ማለት እዩ?`<br>
**A (model):** `ሰናይ ምምሕዳር ንዜጋታት ብዘይምጭፍልቕ፣ ብዘይኣድልዎን ብዘይ ኣፈላላይን ብማዕረ ግልጋሎት ዝህብ መንግስታዊ ኣሰራርሓ የመላኽት። ሰናይ ምምሕዳር ኩሉ ዜጋ ማዕረ መሰላት ከምዝወሃቦ ይገብር።`
## Evaluation
| Metric          | Split      | Value |
|-----------------|------------|-------|
| Evaluation Loss | validation | 1.025 |
| Perplexity      | validation | 2.79  |
| Token Accuracy  | validation | 75%   |
| Training Loss   | final      | 0.963 |
*Validation corpus: 5% held-out split from 67.5k Q&A pairs*
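Perplexity here is simply the exponential of the evaluation loss, which is easy to check:
```python
import math

eval_loss = 1.025
print(round(math.exp(eval_loss), 2))  # 2.79
```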
## Limitations
- **Language Mixing:** Rarely, may mix Amharic or English words into responses
- **Domain Scope:** Optimized for general Q&A; may not handle highly specialized technical queries optimally
- **Factual Accuracy:** Generated answers should be verified for factual correctness
- **Context Length:** Limited to a 1024-token sequence length (input and output combined)
- **Base Model Limitations:** Inherits limitations from the base Gemma-3-4B architecture
- **No Multimodal:** Text-only model; cannot process images, audio, or other media
- **Bias:** May reflect societal biases present in training data
## Citation
```bibtex
@misc{gemma-3-4b-tigrinya-qa,
author = {Luel},
title = {Gemma-3-4B-Tigrinya-QA: A Fine-tuned Question-Answering Model for Tigrinya},
year = {2025},
publisher = {Hugging Face},
howpublished = {\url{https://huggingface.co/luel/gemma-3-4b-tigrinya-qa}}
}
```
## Acknowledgements
This model builds on Google's Gemma 3 4B foundation model and the stage-1 Tigrinya language adaptation. We thank Google for making their foundation models available to the community, enabling the development of language-specific instruction-tuned models like this one.