---
language:
- lg
- en
library_name: unsloth
pipeline_tag: text-generation
license: gemma
base_model: unsloth/gemma-2-2b-it
tags:
- luganda
- gemma
- pretrained
- wikipedia
- unsloth
datasets:
- wikimedia/wikipedia
---

# Gemma-2-2b-it Pretrained for Luganda

## Model Description

This model is a continued pretraining of Gemma-2-2b-it on Luganda text. It was trained on Luganda Wikipedia articles to adapt it for Luganda language understanding and generation.

## Model Details

- **Base Model**: unsloth/gemma-2-2b-it
- **Pretraining Data**: Luganda Wikipedia articles (wikimedia/wikipedia, config 20231101.lg)
- **Training Method**: LoRA with unsloth optimization
- **Context Length**: 2048 tokens
- **Training Hardware**: Tesla T4 GPU

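
For orientation, a continued-pretraining setup matching these details would start by loading the base model with unsloth. The snippet below is a minimal sketch rather than the original training script; in particular, `load_in_4bit` is an assumption (a common choice for LoRA training on a Tesla T4), not something stated above.

```python
from unsloth import FastLanguageModel

# Load the base model at the context length listed above.
# load_in_4bit is an assumption, not taken from the model card.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/gemma-2-2b-it",
    max_seq_length = 2048,
    dtype = None,        # auto-detect (float16 on a Tesla T4)
    load_in_4bit = True,
)
```
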
## Training Process

The model was trained using the following configuration:

### LoRA Configuration

- LoRA rank (r): 128
- Target modules:
  - q_proj, k_proj, v_proj, o_proj
  - gate_proj, up_proj, down_proj
  - embed_tokens, lm_head
- LoRA alpha: 32
- LoRA dropout: 0
- Uses RS-LoRA (Rank-Stabilized LoRA)

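
In unsloth, this configuration corresponds roughly to the `FastLanguageModel.get_peft_model` call sketched below. It is reconstructed from the bullet points above rather than copied from the original training code, and `use_gradient_checkpointing` is an assumption.

```python
# Sketch: apply the LoRA configuration above to the `model` loaded earlier.
# Values mirror the bullet list; anything else is an assumption.
model = FastLanguageModel.get_peft_model(
    model,
    r = 128,
    target_modules = [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",           # trained because this is continued pretraining
    ],
    lora_alpha = 32,
    lora_dropout = 0,
    use_rslora = True,                       # Rank-Stabilized LoRA
    use_gradient_checkpointing = "unsloth",  # assumption: memory-saving default on a T4
)
```
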
### Training Parameters

- Batch size: 2, with gradient accumulation steps of 8 (effective batch size 16)
- Learning rates:
  - General: 5e-5
  - Embeddings: 1e-6 (reduced for stability)
- Training epochs: 10
- Warmup steps: 10
- Warmup ratio: 0.1
- Weight decay: 0.01
- Optimizer: AdamW 8-bit
- LR scheduler: linear

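
Unsloth's `UnslothTrainingArguments` exposes a separate `embedding_learning_rate`, so a trainer matching these parameters might look roughly like the sketch below. The `train_dataset` variable, `output_dir`, and the `fp16` flag are placeholders or assumptions rather than values taken from the actual run.

```python
from unsloth import UnslothTrainer, UnslothTrainingArguments

# Sketch of a trainer matching the parameters above.
trainer = UnslothTrainer(
    model = model,
    tokenizer = tokenizer,
    train_dataset = train_dataset,       # formatted Wikipedia split (see Data Processing below)
    dataset_text_field = "text",
    max_seq_length = 2048,
    args = UnslothTrainingArguments(
        per_device_train_batch_size = 2,
        gradient_accumulation_steps = 8,
        learning_rate = 5e-5,
        embedding_learning_rate = 1e-6,  # lower LR for embed_tokens / lm_head
        num_train_epochs = 10,
        warmup_steps = 10,
        warmup_ratio = 0.1,
        weight_decay = 0.01,
        optim = "adamw_8bit",
        lr_scheduler_type = "linear",
        fp16 = True,                     # assumption: Tesla T4 has no bfloat16 support
        output_dir = "outputs",          # placeholder
    ),
)
trainer.train()
```

Note that when both `warmup_steps` and `warmup_ratio` are set, the Hugging Face trainer gives precedence to `warmup_steps`.
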
### Data Processing

The training data was processed using the following template:

```
Ekyawandiikibwa kya Wikipedia
### Omutwe: {title}

### Akawayiro:
{text}
```

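
A preprocessing step consistent with this template could look like the sketch below. The `title` and `text` columns come from the wikimedia/wikipedia dataset; the mapping function and the appended EOS token are illustrations rather than the original preprocessing code, and the `tokenizer` is the one from the loading sketch above.

```python
from datasets import load_dataset

# Luganda Wikipedia dump listed in Model Details.
dataset = load_dataset("wikimedia/wikipedia", "20231101.lg", split = "train")

TEMPLATE = "Ekyawandiikibwa kya Wikipedia\n### Omutwe: {title}\n\n### Akawayiro:\n{text}"

def format_articles(examples):
    # Appending the EOS token separates documents during packing (assumption).
    texts = [
        TEMPLATE.format(title = title, text = text) + tokenizer.eos_token
        for title, text in zip(examples["title"], examples["text"])
    ]
    return {"text": texts}

train_dataset = dataset.map(format_articles, batched = True)
```
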
## Checkpoints

This repository contains multiple checkpoints from the pretraining process:

- checkpoint-500
- checkpoint-1000
- checkpoint-1500
- checkpoint-2000
- checkpoint-2500
- checkpoint-2530 (final)

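
To experiment with a single intermediate checkpoint without downloading the whole repository, one option is to fetch just that folder and load it locally. The sketch below uses `huggingface_hub.snapshot_download`; it assumes the checkpoint folder contains a complete, loadable adapter/model snapshot, and the choice of checkpoint-1000 is arbitrary.

```python
from huggingface_hub import snapshot_download
from unsloth import FastLanguageModel

# Download only the checkpoint-1000 folder.
local_dir = snapshot_download(
    repo_id = "Bronsn/gemma-2-2b-it-pretrained",
    allow_patterns = ["checkpoint-1000/*"],
)

# Assumes the folder holds everything needed for loading (adapter/model + tokenizer files).
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = f"{local_dir}/checkpoint-1000",
    max_seq_length = 2048,
    load_in_4bit = True,
)
```
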
## Usage

```python
from unsloth import FastLanguageModel
import torch

# Load the model (4-bit for low-memory inference)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "Bronsn/gemma-2-2b-it-pretrained",
    max_seq_length = 2048,
    dtype = None,  # auto-detect
    load_in_4bit = True,
)
FastLanguageModel.for_inference(model)  # enable unsloth's faster inference mode

# Example usage with the pretraining template
text = "Ekyawandiikibwa kya Wikipedia\n### Omutwe: Uganda\n\n### Akawayiro:\n"
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```

## Limitations

- The model is adapted specifically for Luganda text understanding and generation
- Performance may vary on dialectal variations or code-mixed text
- The model inherits the limitations of the base Gemma-2-2b-it model

## Citation

If you use this model, please cite:

```
@misc{luganda-gemma-pretrained,
  author    = {Bronsn},
  title     = {Gemma-2-2b-it Pretrained for Luganda},
  year      = {2025},
  publisher = {HuggingFace}
}
```

## License

This model inherits the licensing terms from the base Gemma-2-2b-it model. For more details, please refer to [Gemma's license](https://ai.google.dev/gemma/terms).