---
language:
- en
license: apache-2.0
base_model: google/flan-t5-base
tags:
- text2text-generation
- summarization
- xsum
- lora
- peft
datasets:
- EdinburghNLP/xsum
metrics:
- rouge
---

# FLAN-T5-Base Fine-tuned on XSum with LoRA

This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the [XSum dataset](https://huggingface.co/datasets/EdinburghNLP/xsum) using **LoRA (Low-Rank Adaptation)** for parameter-efficient fine-tuning.

## Model Description

- **Base Model:** google/flan-t5-base
- **Task:** Extreme Summarization (one-sentence summaries)
- **Dataset:** XSum (BBC news articles)
- **Training Method:** LoRA (Low-Rank Adaptation)
- **Parameters:** ~1.77M trainable (~0.71% of 249.35M total)

## Training Details

### LoRA Configuration

- **Rank (r):** 16
- **Alpha:** 32
- **Target modules:** q, v
- **Dropout:** 0.05

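For reference, the values above correspond roughly to the following PEFT setup. This is a sketch reconstructed from the listed hyperparameters, not the exact training script; `q` and `v` are the T5 query/value projection names that PEFT matches by suffix.

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

# Approximate reconstruction of the adapter configuration described above.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,                        # LoRA rank
    lora_alpha=32,               # scaling factor (alpha / r = 2)
    target_modules=["q", "v"],   # wrap the query and value projections
    lora_dropout=0.05,
)

base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```
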
### Training Hyperparameters

- **Learning rate:** 3e-4
- **Batch size:** 8
- **Epochs:** 3
- **Optimizer:** AdamW
- **Mixed precision:** FP16

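If you want to reproduce the run with the Hugging Face `Seq2SeqTrainer`, these hyperparameters map roughly onto the following training arguments. This is a hypothetical sketch (the output path and the choice of trainer are assumptions, not taken from the actual training code):

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical arguments mirroring the hyperparameters listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-base-xsum-lora",   # placeholder output path
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    optim="adamw_torch",                   # AdamW optimizer
    fp16=True,                             # mixed-precision training
    predict_with_generate=True,
)
```
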
## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from peft import PeftModel

# Load the base model and tokenizer
base_model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")
tokenizer = AutoTokenizer.from_pretrained("Ekansh112/flan-t5-base-xsum-lora")

# Load the LoRA adapters on top of the base model
model = PeftModel.from_pretrained(base_model, "Ekansh112/flan-t5-base-xsum-lora")

# Generate a one-sentence summary
text = "Your article text here..."
inputs = tokenizer("summarize: " + text, return_tensors="pt", max_length=512, truncation=True)
outputs = model.generate(**inputs, max_length=64, num_beams=4, length_penalty=2.0)
summary = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(summary)
```
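
Optionally, the adapters can be merged into the base weights so the model runs without the PEFT wrapper at inference time. This is the standard PEFT pattern rather than anything specific to this checkpoint; continuing from the snippet above (the save path is just an example):

```python
# Optional: merge the LoRA weights into the base model for standalone inference.
merged_model = model.merge_and_unload()
merged_model.save_pretrained("flan-t5-base-xsum-merged")  # hypothetical local path
tokenizer.save_pretrained("flan-t5-base-xsum-merged")
```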

## Performance

Evaluation metrics on the XSum test set:

- **ROUGE-1:** [Add your score]
- **ROUGE-2:** [Add your score]
- **ROUGE-L:** [Add your score]

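To compute ROUGE yourself, one option is the `evaluate` library. The snippet below is a minimal sketch with placeholder strings; in practice you would generate predictions for the XSum test split with the usage code above and pass the dataset's reference summaries.

```python
import evaluate

# Minimal ROUGE sketch: compare generated summaries against reference summaries.
rouge = evaluate.load("rouge")

predictions = ["A generated one-sentence summary of the article."]  # model outputs
references = ["The reference one-sentence summary from XSum."]      # gold summaries

scores = rouge.compute(predictions=predictions, references=references)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```
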
## Citation

If you use this model, please cite the original FLAN-T5 paper and the XSum dataset:

```bibtex
@article{chung2022scaling,
  title={Scaling instruction-finetuned language models},
  author={Chung, Hyung Won and others},
  journal={arXiv preprint arXiv:2210.11416},
  year={2022}
}

@inproceedings{narayan2018don,
  title={Don't give me the details, just the summary! Topic-aware convolutional neural networks for extreme summarization},
  author={Narayan, Shashi and others},
  booktitle={EMNLP},
  year={2018}
}
```

## License

This model inherits the Apache 2.0 license from the base model, google/flan-t5-base.

---

**Trained by:** Ekansh112