GPT-2 Medium Fine-tuned on WikiText-2 with LoRA

Model Description

This is a GPT-2 Medium (354M parameters) model fine-tuned on the WikiText-2 dataset using LoRA (Low-Rank Adaptation).

  • Base Model: gpt2-medium
  • Fine-tuning Method: LoRA (r=16, alpha=32)
  • Dataset: WikiText-2 (23,767 training samples)
  • Training Time: 1.81 hours on 2x Tesla T4 GPUs
  • Final Validation Perplexity: 20.73

Training Configuration

LoRA Configuration:
  - Rank (r): 16
  - Alpha: 32
  - Dropout: 0.05
  - Target Modules: c_attn, c_proj, c_fc
  - Trainable Parameters: 6.29M (1.74%)
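
As a reference, this configuration corresponds roughly to the following PEFT setup (a minimal sketch; the bias and task-type settings are assumed defaults, not taken from the actual training script):

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Sketch of the LoRA setup described above; bias and task_type are assumed defaults
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn", "c_proj", "c_fc"],
    bias="none",
    task_type="CAUSAL_LM",
)

base_model = AutoModelForCausalLM.from_pretrained("gpt2-medium")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # ~6.29M trainable parameters (~1.74%), as reported above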

Training Hyperparameters:
  - Learning Rate: 3e-4
  - Scheduler: Cosine
  - Batch Size: 16 per GPU
  - Gradient Accumulation: 4 steps
  - Effective Batch Size: 128
  - Epochs: 5
  - Mixed Precision: FP16
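
Expressed as HuggingFace TrainingArguments, these settings look roughly like the sketch below. The output directory, evaluation/save cadence, and best-model settings are assumptions (the last three are only there to support the early stopping described under Training Procedure):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./gpt2-medium-wikitext2-lora",  # placeholder path
    learning_rate=3e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=16,
    gradient_accumulation_steps=4,   # 16 per GPU x 4 steps x 2 GPUs = 128 effective
    num_train_epochs=5,
    fp16=True,
    eval_strategy="epoch",           # assumed cadence
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
)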

Performance

Metric                 | Value
-----------------------|---------------
Validation Perplexity  | 20.73
Training Loss          | 2.96
Training Time          | 1.81 h
GPU Memory             | ~8 GB per GPU
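
Validation perplexity is the exponential of the validation cross-entropy loss. A minimal way to reproduce the figure from a Trainer evaluation (the eval_loss value below is illustrative, not taken from the actual run):

import math

# Perplexity = exp(cross-entropy loss in nats per token)
eval_loss = 3.03  # illustrative value; exp(3.03) is roughly 20.7
print(f"Validation perplexity: {math.exp(eval_loss):.2f}")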

Usage

Installation

pip install transformers peft torch

Loading the Model

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
base_model = AutoModelForCausalLM.from_pretrained(
    "gpt2-medium",
    torch_dtype=torch.float16,
    device_map="auto"
)

# Load LoRA weights
model = PeftModel.from_pretrained(
    base_model,
    "shiva9876/gpt2-medium-wikitext2-lora"
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")

# Generate text
prompt = "The future of artificial intelligence"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_length=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Merging LoRA Weights (Optional)

For faster inference, merge the LoRA weights into the base model:

# Merge and save
merged_model = model.merge_and_unload()
merged_model.save_pretrained("./merged_model")
tokenizer.save_pretrained("./merged_model")

# Load merged model directly
model = AutoModelForCausalLM.from_pretrained("./merged_model")
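
Once merged, the checkpoint behaves like a plain GPT-2 model with no PEFT dependency at inference time; for example, it can be used with the standard text-generation pipeline (a sketch, with sampling parameters mirroring the example above):

from transformers import pipeline

generator = pipeline("text-generation", model="./merged_model", tokenizer="./merged_model")
result = generator(
    "The future of artificial intelligence",
    max_length=100,
    temperature=0.8,
    top_p=0.9,
    do_sample=True,
)
print(result[0]["generated_text"])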

Training Details

Dataset

WikiText-2 is a language modeling dataset built from verified Good and Featured articles on Wikipedia. The dataset contains:

  • Training: 23,767 samples
  • Validation: 2,461 samples
  • Test: 2,891 samples
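
The dataset is available through the datasets library. A minimal loading sketch (the exact configuration, wikitext-2-raw-v1 vs. wikitext-2-v1, and any filtering of empty lines are assumptions):

from datasets import load_dataset

# WikiText-2; the "raw" config keeps the original text without <unk> replacements
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
print(dataset)  # DatasetDict with train / validation / test splits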

Training Procedure

  1. Preprocessing: Tokenization with max length 512
  2. Optimization: AdamW with fused implementation
  3. Regularization: Weight decay 0.01, gradient clipping 1.0
  4. Learning Rate Schedule: Cosine decay with 5% warmup
  5. Early Stopping: Patience of 3 evaluations
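
Put together, these steps map onto a Trainer setup roughly like the sketch below, reusing the dataset and training_args sketches above; the tokenization helper, data collator, and the exact way the optimizer and regularization settings are wired in are assumptions:

from transformers import (
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    EarlyStoppingCallback,
    Trainer,
)

tokenizer = AutoTokenizer.from_pretrained("gpt2-medium")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 defines no pad token by default

# Step 1: tokenize with a maximum sequence length of 512
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,                      # the PEFT-wrapped model from the LoRA sketch
    args=training_args,               # plus optim="adamw_torch_fused", weight_decay=0.01,
                                      # max_grad_norm=1.0, warmup_ratio=0.05 (steps 2-4)
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    # Step 5: stop once validation loss fails to improve for 3 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)
trainer.train()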

Training Curves

The model showed smooth convergence:

  • Epoch 0: Loss 3.43 → PPL ~31
  • Epoch 1: Loss 3.03 → PPL ~21
  • Epoch 3: Loss 2.92 → PPL ~19
  • Epoch 5: Loss 2.87 → PPL ~18

Limitations

  • Fine-tuned on English Wikipedia text only
  • May not generalize well to other domains
  • LoRA adapters add a small overhead during inference unless merged into the base model
  • Inherits biases from GPT-2 and Wikipedia

Intended Use

This model is intended for:

  • Text generation experiments
  • Research on parameter-efficient fine-tuning
  • Educational purposes
  • Transfer learning baselines

Citation

If you use this model, please cite:

@misc{gpt2-wikitext2-lora,
  author = {Shiva Jaiswal},
  title = {GPT-2 Medium Fine-tuned on WikiText-2 with LoRA},
  year = {2025},
  publisher = {HuggingFace},
  url = {https://huggingface.co/shiva9876/gpt2-medium-wikitext2-lora}
}

Acknowledgments

  • Base model: OpenAI's GPT-2
  • LoRA: Microsoft Research
  • Training hardware: 2x Tesla T4 GPUs on Kaggle
  • Framework: HuggingFace Transformers, PEFT

Contact

For questions or issues, please open an issue on the model repository.
