This model is google/flan-t5-small fine-tuned on the grammarly/coedit dataset for English grammatical error correction (GEC).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("xhimanshuz/flan-t5-small-grammar-correction")
model = AutoModelForSeq2SeqLM.from_pretrained("xhimanshuz/flan-t5-small-grammar-correction")
text = "Fix the grammar: I goes to school yesterday and learn many thing."
inputs = tokenizer(text, return_tensors="pt", max_length=128, truncation=True)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: I went to school yesterday and learned many things.
Use instruction prefixes from the CoEdIT format:
- "Fix the grammar: <text>"
- "Fix grammatical errors in this sentence: <text>"
- "Improve the grammaticality: <text>"
- "Remove all grammatical errors from this text: <text>"

| Input | Output |
|---|---|
| I goes to school yesterday and learn many thing. | I went to school yesterday and learned many things. |
| She don't know what are she doing. | She doesn't know what she is doing. |
| The informations was very helpfull for our researchs. | The information was very helpful for our research. |
| He have went to the market and buyed some apple. | He has gone to the market and bought some apple. |
| The childs was playing in park when it start raining. | The children were playing in the park when it started raining. |
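
The instruction prefixes listed above can be applied programmatically. The following is a minimal sketch, not part of the original card: `PREFIXES`, `build_prompt`, and `correct` are hypothetical helper names, and the batching settings simply mirror the single-sentence example earlier in this card.

```python
# Hypothetical helpers (not from the original card) for building CoEdIT-style prompts.
PREFIXES = (
    "Fix the grammar: ",
    "Fix grammatical errors in this sentence: ",
    "Improve the grammaticality: ",
    "Remove all grammatical errors from this text: ",
)


def build_prompt(sentence, prefix=PREFIXES[0]):
    """Prepend one of the instruction prefixes to a sentence."""
    if prefix not in PREFIXES:
        raise ValueError(f"Unknown prefix: {prefix!r}")
    return prefix + sentence.strip()


def correct(sentences, prefix=PREFIXES[0],
            model_id="xhimanshuz/flan-t5-small-grammar-correction"):
    """Correct a batch of sentences; returns one decoded string per input."""
    # Imported lazily so build_prompt works without transformers installed.
    from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
    prompts = [build_prompt(s, prefix) for s in sentences]
    # padding=True is needed because the prompts in a batch differ in length.
    inputs = tokenizer(prompts, return_tensors="pt", padding=True,
                       max_length=128, truncation=True)
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)
```

Calling `correct(["She don't know what are she doing."])` should return a one-element list with the model's correction. Which prefix works best for a given input is not stated in this card, so it may be worth trying more than one.
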

Training loss:

| Step | Loss | Epoch |
|---|---|---|
| 1 | 0.669 | 0.00 |
| 100 | 0.484 | 0.40 |
| 250 | 0.448 | 1.00 |
| 500 | 0.325 | 2.00 |
| 750 | 0.292 | 3.00 |
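
As a quick sanity check on the table above, the sketch below derives the number of optimizer steps per epoch and the relative drop in training loss. The `ROWS` list is copied from the table; the function names are illustrative only.

```python
# (step, loss, epoch) rows copied from the training table above.
ROWS = [
    (1, 0.669, 0.00),
    (100, 0.484, 0.40),
    (250, 0.448, 1.00),
    (500, 0.325, 2.00),
    (750, 0.292, 3.00),
]


def steps_per_epoch(rows):
    """Return the set of step/epoch ratios implied by the logged rows."""
    # Skip rows logged at epoch 0.00 to avoid dividing by zero.
    return {round(step / epoch) for step, _, epoch in rows if epoch > 0}


def relative_reduction(first, last):
    """Fraction by which the loss fell between two logged values."""
    return (first - last) / first


print(steps_per_epoch(ROWS))                                 # {250}
print(f"{relative_reduction(ROWS[0][1], ROWS[-1][1]):.1%}")  # 56.4%
```

Every row is consistent with 250 steps per epoch. Combined with the 2000-example training subset mentioned in this card, that would imply an effective batch size of about 8, assuming each example is seen once per epoch. These are training-loss values only; no held-out evaluation figures are given here.
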
This model was trained on a 2000-example subset on CPU as a demonstration. For better performance:

- Train on more data from grammarly/coedit: a larger GEC subset, or all 69K examples (including simplification, paraphrasing, etc.)
- Use a larger base model: google/flan-t5-base (250M) or google/flan-t5-large (770M)

@inproceedings{raheja2023coedit,
title={CoEdIT: Text Editing by Task-Specific Instruction Tuning},
author={Raheja, Vipul and Kumar, Dhruv and Koo, Ryan and Kang, Dongyeop},
booktitle={EMNLP 2023},
year={2023}
}
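
The suggestions for better performance given earlier (more training data, a larger base model) could be tried with a fine-tuning script along the following lines. This is a rough sketch, not the script used to train this model. It assumes the grammarly/coedit dataset exposes `task`, `src`, and `tgt` columns, that grammar-correction rows are labeled `gec`, and that `src` already contains the instruction prefix. The hyperparameters and the `output_dir` name are placeholders, and `keep_row` and `finetune` are hypothetical names.

```python
def keep_row(row, tasks=("gec",)):
    """Decide whether a dataset row is kept; tasks=None keeps every task."""
    # Assumption: each row carries a 'task' label such as 'gec'.
    return tasks is None or row["task"] in tasks


def finetune(base_model="google/flan-t5-base", tasks=("gec",),
             output_dir="flan-t5-base-gec"):
    """Fine-tune a FLAN-T5 checkpoint on (a task subset of) grammarly/coedit."""
    # Heavy imports are kept inside the function so keep_row stays lightweight.
    from datasets import load_dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(base_model)
    model = AutoModelForSeq2SeqLM.from_pretrained(base_model)

    dataset = load_dataset("grammarly/coedit", split="train")
    dataset = dataset.filter(lambda row: keep_row(row, tasks))

    def tokenize(batch):
        # Assumption: 'src' holds instruction + input text, 'tgt' the edited text.
        encoded = tokenizer(batch["src"], max_length=128, truncation=True)
        targets = tokenizer(text_target=batch["tgt"], max_length=128,
                            truncation=True)
        encoded["labels"] = targets["input_ids"]
        return encoded

    tokenized = dataset.map(tokenize, batched=True,
                            remove_columns=dataset.column_names)

    # Placeholder hyperparameters; tune them for your hardware and data size.
    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        per_device_train_batch_size=8,
        num_train_epochs=3,
        learning_rate=3e-4,
        logging_steps=100,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=tokenized,
        # Pads inputs and labels per batch; label padding is ignored by the loss.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    trainer.save_model(output_dir)
```

Calling `finetune()` trains on the grammar-correction rows only; `finetune(tasks=None)` would use every task in the dataset. Training the larger checkpoints on CPU is likely to be slow, so a GPU is advisable.
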