---
language:
- en
license: apache-2.0
base_model: google/flan-t5-base
tags:
- text2text-generation
- summarization
- xsum
- lora
- peft
datasets:
- EdinburghNLP/xsum
metrics:
- rouge
---

# FLAN-T5-Base Fine-tuned on XSum with LoRA

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the [XSum dataset](https://huggingface.co/datasets/EdinburghNLP/xsum) using **LoRA (Low-Rank Adaptation)** for parameter-efficient fine-tuning.

## Model Description

- **Base Model:** google/flan-t5-base
- **Task:** Extreme Summarization (one-sentence summaries)
- **Dataset:** XSum (BBC news articles)
- **Training Method:** LoRA (Low-Rank Adaptation)
- **Parameters:** ~1.77M trainable (~0.71% of 249.35M total)

## Training Details

### LoRA Configuration

- **Rank (r):** 16
- **Alpha:** 32
- **Target modules:** q, v
- **Dropout:** 0.05

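For reference, the values above correspond roughly to the following PEFT setup. This is a sketch reconstructed from the listed hyperparameters, not the exact training script; `q` and `v` are the T5 query/value projection names that PEFT matches by suffix.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

# Approximate reconstruction of the adapter configuration described above.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                        # LoRA rank
    lora_alpha=32,               # scaling factor (alpha / r = 2)
    target_modules=["q", "v"],   # wrap the query and value projections
    lora_dropout=0.05,
)

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```
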
### Training Hyperparameters

- **Learning rate:** 3e-4
- **Batch size:** 8
- **Epochs:** 3
- **Optimizer:** AdamW
- **Mixed precision:** FP16

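If you want to reproduce the run with the Hugging Face `Seq2SeqTrainer`, these hyperparameters map roughly onto the following training arguments. This is a hypothetical sketch (the output path and the choice of trainer are assumptions, not taken from the actual training code):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical arguments mirroring the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-xsum-lora",   # placeholder output path
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    optim="adamw_torch",                   # AdamW optimizer
    fp16=True,                             # mixed-precision training
    predict_with_generate=True,
)
```
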
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("Ekansh112/flan-t5-base-xsum-lora")

# Load the LoRA adapters on top of the base model
model = PeftModel.from_pretrained(base_model, "Ekansh112/flan-t5-base-xsum-lora")

# Generate a one-sentence summary
text = "Your article text here..."
inputs = tokenizer("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=4, length_penalty=2.0)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
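
Optionally, the adapters can be merged into the base weights so the model runs without the PEFT wrapper at inference time. This is the standard PEFT pattern rather than anything specific to this checkpoint; continuing from the snippet above (the save path is just an example):

```python
# Optional: merge the LoRA weights into the base model for standalone inference.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("flan-t5-base-xsum-merged")  # hypothetical local path
tokenizer.save_pretrained("flan-t5-base-xsum-merged")
```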

## Performance

Evaluation metrics on the XSum test set:

- **ROUGE-1:** [Add your score]
- **ROUGE-2:** [Add your score]
- **ROUGE-L:** [Add your score]

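To compute ROUGE yourself, one option is the `evaluate` library. The snippet below is a minimal sketch with placeholder strings; in practice you would generate predictions for the XSum test split with the usage code above and pass the dataset's reference summaries.

```python
import evaluate

# Minimal ROUGE sketch: compare generated summaries against reference summaries.
rouge = evaluate.load("rouge")

predictions = ["A generated one-sentence summary of the article."]  # model outputs
references = ["The reference one-sentence summary from XSum."]      # gold summaries

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```
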
## Citation

If you use this model, please cite the original FLAN-T5 paper and the XSum dataset:

```bibtex
@article{chung2022scaling,
  title={Scaling instruction-finetuned language models},
  author={Chung, Hyung Won and others},
  journal={arXiv preprint arXiv:2210.11416},
  year={2022}
}

@inproceedings{narayan2018don,
  title={Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization},
  author={Narayan, Shashi and others},
  booktitle={EMNLP},
  year={2018}
}
```

## License

This model inherits the Apache 2.0 license from the base model, google/flan-t5-base.

---

**Trained by:** Ekansh112