FLAN-T5-Base Fine-tuned on XSum with LoRA

This model adapts google/flan-t5-base to the XSum summarization dataset using LoRA (Low-Rank Adaptation), a parameter-efficient fine-tuning method.

Model Description

  • Base Model: google/flan-t5-base
  • Task: Extreme Summarization (one-sentence summaries)
  • Dataset: XSum (BBC news articles)
  • Training Method: LoRA (Low-Rank Adaptation)
  • Parameters: only the LoRA adapter weights are trainable (base model: 249.35M parameters in total; see the LoRA configuration below)

Training Details

LoRA Configuration

  • Rank (r): 16
  • Alpha: 32
  • Target modules: q, v (attention query and value projections)
  • Dropout: 0.05
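The adapter setup can be reproduced with the peft library. The sketch below rebuilds the same configuration on top of the base model and prints the actual trainable-parameter count; it is a minimal example, not the original training script.

from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

# Recreate the LoRA configuration listed above on top of the base model
base = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # reports trainable vs. total parameter counts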

Training Hyperparameters

  • Learning rate: 3e-4
  • Batch size: 8
  • Epochs: 3
  • Optimizer: AdamW
  • Mixed precision: FP16
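For reference, these hyperparameters map onto transformers' Seq2SeqTrainingArguments roughly as below; the output directory is illustrative, and settings not listed above (warmup, weight decay, evaluation schedule) from the original run are not recorded here.

from transformers import Seq2SeqTrainingArguments

# Approximate training arguments matching the hyperparameters above;
# "flan-t5-base-xsum-lora" is only an illustrative output directory.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-xsum-lora",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    optim="adamw_torch",
    fp16=True,
)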

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

# Load base model and tokenizer
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("AKG2/flan-t5-base-xsum-lora")

# Load LoRA adapters
model = PeftModel.from_pretrained(base_model, "AKG2/flan-t5-base-xsum-lora")

# Generate summary
text = "Your article text here..."
inputs = tokenizer("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=4, length_penalty=2.0)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
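For latency-sensitive deployment, the adapter can optionally be folded into the base model with peft's merge_and_unload(), which returns a plain transformers model (the output path below is illustrative):

# Optionally merge the LoRA weights into the base model for faster inference
merged_model = model.merge_and_unload()
merged_model.save_pretrained("flan-t5-base-xsum-merged")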

Performance

Evaluation metrics on the XSum test set (a sketch for computing them follows the list):

  • ROUGE-1: [Add your score]
  • ROUGE-2: [Add your score]
  • ROUGE-L: [Add your score]
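The scores can be computed with the evaluate library, reusing the model and tokenizer loaded in the Usage section. The sketch below scores a small sample of the test split; the dataset ID and sample size are illustrative.

import evaluate
from datasets import load_dataset

# Score a small sample of the XSum test split with ROUGE
rouge = evaluate.load("rouge")
test_set = load_dataset("EdinburghNLP/xsum", split="test[:100]")

predictions = []
for article in test_set["document"]:
    inputs = tokenizer("summarize: " + article, return_tensors="pt",
                       max_length=512, truncation=True)
    outputs = model.generate(**inputs, max_length=64, num_beams=4, length_penalty=2.0)
    predictions.append(tokenizer.decode(outputs[0], skip_special_tokens=True))

scores = rouge.compute(predictions=predictions, references=test_set["summary"])
print(scores)  # rouge1, rouge2, rougeL, rougeLsum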

Citation

If you use this model, please cite the original FLAN-T5 paper and the XSum dataset:

@article{chung2022scaling,
  title={Scaling instruction-finetuned language models},
  author={Chung, Hyung Won and others},
  journal={arXiv preprint arXiv:2210.11416},
  year={2022}
}

@inproceedings{narayan2018don,
  title={Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization},
  author={Narayan, Shashi and others},
  booktitle={EMNLP},
  year={2018}
}

License

This model inherits the Apache 2.0 license from the base model (google/flan-t5-base).


Trained by: AKG2
