---
language:
- en
license: apache-2.0
base_model: google/flan-t5-base
tags:
- text2text-generation
- summarization
- xsum
- lora
- peft
datasets:
- EdinburghNLP/xsum
metrics:
- rouge
---

# FLAN-T5-Base Fine-tuned on XSum with LoRA

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the [XSum dataset](https://huggingface.co/datasets/EdinburghNLP/xsum), using **LoRA (Low-Rank Adaptation)** for parameter-efficient fine-tuning.

## Model Description

- **Base model:** google/flan-t5-base
- **Task:** extreme summarization (one-sentence summaries)
- **Dataset:** XSum (BBC news articles)
- **Training method:** LoRA (Low-Rank Adaptation)
- **Parameters:** [Add trainable count] trainable ([Add %] of 249.35M total)

## Training Details

### LoRA Configuration

- **Rank (r):** 16
- **Alpha:** 32
- **Target modules:** q, v
- **Dropout:** 0.05

A PEFT sketch of this configuration is included in the appendix at the end of this card.

### Training Hyperparameters

- **Learning rate:** 3e-4
- **Batch size:** 8
- **Epochs:** 3
- **Optimizer:** AdamW
- **Mixed precision:** FP16

The appendix also shows these values expressed as `Seq2SeqTrainingArguments`.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the base model and the tokenizer
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("Ekansh112/flan-t5-base-xsum-lora")

# Attach the LoRA adapters
model = PeftModel.from_pretrained(base_model, "Ekansh112/flan-t5-base-xsum-lora")

# Generate a summary
text = "Your article text here..."
inputs = tokenizer("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=4, length_penalty=2.0)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```

For adapter-free deployment, see the merging sketch in the appendix.

## Performance

Evaluation metrics on the XSum test set:

- **ROUGE-1:** [Add your score]
- **ROUGE-2:** [Add your score]
- **ROUGE-L:** [Add your score]

A sketch for computing these scores with the `evaluate` library is included in the appendix.

## Citation

If you use this model, please cite the original FLAN-T5 paper and the XSum dataset:

```bibtex
@article{chung2022scaling,
  title={Scaling instruction-finetuned language models},
  author={Chung, Hyung Won and others},
  journal={arXiv preprint arXiv:2210.11416},
  year={2022}
}

@inproceedings{narayan2018don,
  title={Don't Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization},
  author={Narayan, Shashi and others},
  booktitle={EMNLP},
  year={2018}
}
```

## License

This model inherits the Apache 2.0 license from the base model.

---

**Trained by:** Ekansh112
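
## Appendix: Reproduction and Evaluation Sketches

The snippets below are minimal sketches, not the exact training script used for this model. The first expresses the LoRA configuration from the Training Details section in PEFT; `print_trainable_parameters()` can be used to fill in the parameter counts in the Model Description.

```python
from transformers import AutoModelForSeq2SeqLM
from peft import LoraConfig, TaskType, get_peft_model

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

# LoRA settings from this card: r=16, alpha=32, dropout=0.05,
# applied to the q and v attention projections of T5.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q", "v"],
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints trainable vs. total parameter counts
```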
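
The training hyperparameters map onto Hugging Face `Seq2SeqTrainingArguments` roughly as follows. The output directory is a placeholder, and scheduler or warmup settings of the original run are not recorded in this card.

```python
from transformers import Seq2SeqTrainingArguments

# Values mirror the "Training Hyperparameters" section; output_dir is hypothetical.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-xsum-lora",
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    optim="adamw_torch",  # AdamW optimizer
    fp16=True,            # mixed-precision training
    predict_with_generate=True,
)
```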
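
If an adapter-free checkpoint is preferred for deployment, PEFT's `merge_and_unload` folds the LoRA weights into the base model so it can be used without the `peft` dependency at inference time. The save path here is a placeholder.

```python
# `model` is the PeftModel built in the Usage section above.
merged_model = model.merge_and_unload()  # fold LoRA weights into the base model
merged_model.save_pretrained("flan-t5-base-xsum-merged")  # hypothetical local path
tokenizer.save_pretrained("flan-t5-base-xsum-merged")
```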
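
The ROUGE scores in the Performance section can be computed along these lines with the `evaluate` and `datasets` libraries. This sketch evaluates a small slice of the test set for speed; use `split="test"` for the full benchmark, and note that depending on your `datasets` version, loading XSum may require `trust_remote_code=True`.

```python
import torch
import evaluate
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the adapter-equipped model as in the Usage section
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("Ekansh112/flan-t5-base-xsum-lora")
model = PeftModel.from_pretrained(base_model, "Ekansh112/flan-t5-base-xsum-lora")
model.eval()

rouge = evaluate.load("rouge")
# A small slice keeps the sketch fast; replace with split="test" for full evaluation.
test_set = load_dataset("EdinburghNLP/xsum", split="test[:100]")

predictions = []
for article in test_set["document"]:
    inputs = tokenizer("summarize: " + article, return_tensors="pt",
                       max_length=512, truncation=True)
    with torch.no_grad():
        output = model.generate(**inputs, max_length=64, num_beams=4, length_penalty=2.0)
    predictions.append(tokenizer.decode(output[0], skip_special_tokens=True))

scores = rouge.compute(predictions=predictions, references=test_set["summary"])
print(scores)  # keys: rouge1, rouge2, rougeL, rougeLsum
```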