GPT-2 Medium Fine-tuned on WikiText-2 with LoRA
Model Description
This is a GPT-2 Medium (354M parameters) model fine-tuned on the WikiText-2 dataset using LoRA (Low-Rank Adaptation).
- Base Model: gpt2-medium
- Fine-tuning Method: LoRA (r=16, alpha=32)
- Dataset: WikiText-2 (23,767 training samples)
- Training Time: 1.81 hours on 2x Tesla T4 GPUs
- Final Validation Perplexity: 20.73
Training Configuration
LoRA Configuration:
- Rank (r): 16
- Alpha: 32
- Dropout: 0.05
- Target Modules: c_attn, c_proj, c_fc
- Trainable Parameters: 6.29M (1.74%)
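As an illustration, the adapter described above could be constructed with peft roughly as follows; this is a sketch built from the listed values, not the exact training script.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Adapter settings taken from the list above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "c_fc"],
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports roughly 6.29M trainable parameters (~1.74%)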
Training Hyperparameters:
- Learning Rate: 3e-4
- Scheduler: Cosine
- Batch Size: 16 per GPU
- Gradient Accumulation: 4 steps
- Effective Batch Size: 128
- Epochs: 5
- Mixed Precision: FP16
Performance
| Metric | Value |
|---|---|
| Validation Perplexity | 20.73 |
| Training Loss | 2.96 |
| Training Time | 1.81h |
| GPU Memory | ~8GB per GPU |
Usage
Installation
pip install transformers peft accelerate torch
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"gpt2-medium",
torch_dtype=torch.float16,
device_map="auto"
)
# Load LoRA weights
model = PeftModel.from_pretrained(
base_model,
"shiva9876/gpt2-medium-wikitext2-lora"
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token; reuse EOS to avoid a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Merging LoRA Weights (Optional)
For faster inference, merge the LoRA weights into the base model:
# Merge and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")
# Load merged model directly
model = AutoModelForCausalLM.from_pretrained("./merged_model")
Training Details
Dataset
WikiText-2 is a language-modeling dataset of high-quality, verified Good and Featured articles from Wikipedia. As used here, the dataset contains:
- Training: 23,767 samples
- Validation: 2,461 samples
- Test: 2,891 samples
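If needed, the splits can be pulled from the Hub with the datasets library; the sketch below assumes the raw-text configuration, and the sample counts above reflect this card's preprocessing rather than the raw line counts.

from datasets import load_dataset

# WikiText-2, raw-text configuration (assumed; a pre-tokenized "wikitext-2-v1" config also exists)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)  # DatasetDict with train / validation / test splits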
Training Procedure
- Preprocessing: Tokenization with max length 512
- Optimization: AdamW with fused implementation
- Regularization: Weight decay 0.01, gradient clipping 1.0
- Learning Rate Schedule: Cosine decay with 5% warmup
- Early Stopping: Patience of 3 evaluations
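A sketch of how this procedure, together with the hyperparameters listed under Training Configuration, could be wired up with the Hugging Face Trainer. The dataset configuration, output directory, evaluation cadence, and blank-line filtering are assumptions rather than details from the original run; only the numeric settings come from this card.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Tokenize WikiText-2 with the 512-token maximum length used for training
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop blank lines

training_args = TrainingArguments(
    output_dir="./gpt2-medium-wikitext2-lora",   # assumed path
    # hyperparameters from the Training Configuration section
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=16,              # x4 accumulation x2 GPUs = effective 128
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    fp16=True,
    # procedure settings listed above
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=1.0,
    optim="adamw_torch_fused",
    # needed for early stopping; the per-epoch cadence is an assumption
    eval_strategy="epoch",                       # "evaluation_strategy" on older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,  # the PEFT-wrapped gpt2-medium from the LoRA sketch above
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()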
Training Curves
The model showed smooth convergence:
- Epoch 0: Loss 3.43 → PPL ~31
- Epoch 1: Loss 3.03 → PPL ~21
- Epoch 3: Loss 2.92 → PPL ~19
- Epoch 5: Loss 2.87 → PPL ~18
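These perplexities follow directly from the losses, since perplexity is the exponential of the mean cross-entropy loss:

import math

# Perplexity = exp(cross-entropy loss)
print(math.exp(3.43))  # ~30.9, matching the epoch-0 value above
print(math.exp(2.87))  # ~17.6, matching the final ~18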
Limitations
- Fine-tuned on English Wikipedia text only
- May not generalize well to other domains
- LoRA adapters add a small inference overhead unless merged into the base model
- Inherits biases from GPT-2 and Wikipedia
Intended Use
This model is intended for:
- Text generation experiments
- Research on parameter-efficient fine-tuning
- Educational purposes
- Transfer learning baselines
Citation
If you use this model, please cite:
@misc{gpt2-wikitext2-lora,
author = {Shiva Jaiswal},
title = {GPT-2 Medium Fine-tuned on WikiText-2 with LoRA},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/shiva9876/gpt2-medium-wikitext2-lora}
}
Acknowledgments
- Base model: OpenAI's GPT-2
- LoRA: Microsoft Research
- Training: 2x Tesla T4 GPUs on Kaggle
- Framework: HuggingFace Transformers, PEFT
Contact
For questions or issues, please open an issue on the model repository.