GPT-2 Medium Fine-tuned on WikiText-2 with LoRA
Model Description
This is a GPT-2 Medium (354M parameters) model fine-tuned on the WikiText-2 dataset using LoRA (Low-Rank Adaptation).
- Base Model: gpt2-medium
- Fine-tuning Method: LoRA (r=16, alpha=32)
- Dataset: WikiText-2 (23,767 training samples)
- Training Time: 1.81 hours on 2x Tesla T4 GPUs
- Final Validation Perplexity: 20.73
Training Configuration
LoRA Configuration:
- Rank (r): 16
- Alpha: 32
- Dropout: 0.05
- Target Modules: c_attn, c_proj, c_fc
- Trainable Parameters: 6.29M (1.74%)
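As an illustration, the adapter described above could be constructed with peft roughly as follows; this is a sketch built from the listed values, not the exact training script.

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Adapter settings taken from the list above
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "c_fc"],
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # reports roughly 6.29M trainable parameters (~1.74%)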
Training Hyperparameters:
- Learning Rate: 3e-4
- Scheduler: Cosine
- Batch Size: 16 per GPU
- Gradient Accumulation: 4 steps
- Effective Batch Size: 128
- Epochs: 5
- Mixed Precision: FP16
Performance
| Metric | Value |
|---|---|
| Validation Perplexity | 20.73 |
| Training Loss | 2.96 |
| Training Time | 1.81h |
| GPU Memory | ~8GB per GPU |
Usage
Installation
pip install transformers peft accelerate torch
Loading the Model
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch
# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
"gpt2-medium",
torch_dtype=torch.float16,
device_map="auto"
)
# Load LoRA weights
model = PeftModel.from_pretrained(
base_model,
"shiva9876/gpt2-medium-wikitext2-lora"
)
# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id  # GPT-2 has no pad token; reuse EOS to avoid a warning
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Merging LoRA Weights (Optional)
For faster inference, merge the LoRA weights into the base model:
# Merge and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")
# Load merged model directly
model = AutoModelForCausalLM.from_pretrained("./merged_model")
Training Details
Dataset
WikiText-2 is a language-modeling dataset of high-quality, verified Good and Featured articles from Wikipedia. As used here, the dataset contains:
- Training: 23,767 samples
- Validation: 2,461 samples
- Test: 2,891 samples
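If needed, the splits can be pulled from the Hub with the datasets library; the sketch below assumes the raw-text configuration, and the sample counts above reflect this card's preprocessing rather than the raw line counts.

from datasets import load_dataset

# WikiText-2, raw-text configuration (assumed; a pre-tokenized "wikitext-2-v1" config also exists)
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)  # DatasetDict with train / validation / test splits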
Training Procedure
- Preprocessing: Tokenization with max length 512
- Optimization: AdamW with fused implementation
- Regularization: Weight decay 0.01, gradient clipping 1.0
- Learning Rate Schedule: Cosine decay with 5% warmup
- Early Stopping: Patience of 3 evaluations
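A sketch of how this procedure, together with the hyperparameters listed under Training Configuration, could be wired up with the Hugging Face Trainer. The dataset configuration, output directory, evaluation cadence, and blank-line filtering are assumptions rather than details from the original run; only the numeric settings come from this card.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 ships without a pad token

# Tokenize WikiText-2 with the 512-token maximum length used for training
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 0)  # drop blank lines

training_args = TrainingArguments(
    output_dir="./gpt2-medium-wikitext2-lora",   # assumed path
    # hyperparameters from the Training Configuration section
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=16,              # x4 accumulation x2 GPUs = effective 128
    gradient_accumulation_steps=4,
    num_train_epochs=5,
    fp16=True,
    # procedure settings listed above
    warmup_ratio=0.05,
    weight_decay=0.01,
    max_grad_norm=1.0,
    optim="adamw_torch_fused",
    # needed for early stopping; the per-epoch cadence is an assumption
    eval_strategy="epoch",                       # "evaluation_strategy" on older transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
)

trainer = Trainer(
    model=model,  # the PEFT-wrapped gpt2-medium from the LoRA sketch above
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()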
Training Curves
The model showed smooth convergence:
- Epoch 0: Loss 3.43 → PPL ~31
- Epoch 1: Loss 3.03 → PPL ~21
- Epoch 3: Loss 2.92 → PPL ~19
- Epoch 5: Loss 2.87 → PPL ~18
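These perplexities follow directly from the losses, since perplexity is the exponential of the mean cross-entropy loss:

import math

# Perplexity = exp(cross-entropy loss)
print(math.exp(3.43))  # ~30.9, matching the epoch-0 value above
print(math.exp(2.87))  # ~17.6, matching the final ~18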
Limitations
- Fine-tuned on English Wikipedia text only
- May not generalize well to other domains
- LoRA adapters add a small inference overhead unless merged into the base model
- Inherits biases from GPT-2 and Wikipedia
Intended Use
This model is intended for:
- Text generation experiments
- Research on parameter-efficient fine-tuning
- Educational purposes
- Transfer learning baselines
Citation
If you use this model, please cite:
@misc{gpt2-wikitext2-lora,
author = {Shiva Jaiswal},
title = {GPT-2 Medium Fine-tuned on WikiText-2 with LoRA},
year = {2025},
publisher = {HuggingFace},
url = {https://huggingface.co/shiva9876/gpt2-medium-wikitext2-lora}
}
Acknowledgments
- Base model: OpenAI's GPT-2
- LoRA: Microsoft Research
- Training: 2x Tesla T4 GPUs on Kaggle
- Framework: HuggingFace Transformers, PEFT
Contact
For questions or issues, please open an issue on the model repository.