Commit · 97af73d
1 Parent(s): cdb2de7

Update README.md
README.md CHANGED
@@ -1,3 +1,26 @@
 ---
 license: apache-2.0
 ---
+
+
+This model can be used to generate a SMILES string from an input caption.
+
+## Example Usage
+```python
+from transformers import T5Tokenizer, T5ForConditionalGeneration
+tokenizer = T5Tokenizer.from_pretrained("laituan245/molt5-small-caption2smiles", model_max_length=512)
+model = T5ForConditionalGeneration.from_pretrained('laituan245/molt5-small-caption2smiles')
+input_text = 'The molecule is a monomethoxybenzene that is 2-methoxyphenol substituted by a hydroxymethyl group at position 4. It has a role as a plant metabolite. It is a member of guaiacols and a member of benzyl alcohols.'
+input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+outputs = model.generate(input_ids, num_beams=5, max_length=512)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+# The model will generate "COC1=C(C=CC(=C1)CCCO)O". The ground-truth is "COC1=C(C=CC(=C1)CO)O".
+```
+
+## Paper
+
+For more information, please take a look at our paper.
+
+Paper: [Translation between Molecules and Natural Language](https://arxiv.org/abs/2204.11817)
+
+Authors: *Carl Edwards\*, Tuan Lai\*, Kevin Ros, Garrett Honke, Heng Ji*
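
The usage example added in this commit notes that the generated SMILES differs from the ground truth. One rough way to quantify such a mismatch is a string-level comparison: an exact-match check plus the Levenshtein edit distance. The sketch below is only an illustration, not the authors' evaluation code. The helper name `levenshtein` is hypothetical, and the two strings are copied from the README comment rather than produced by running the model. A string-level difference is also not the same as a chemical difference, since one molecule can be written as several valid SMILES strings.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character edits turning a into b."""
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


# Strings quoted in the README comment (not regenerated here).
generated = "COC1=C(C=CC(=C1)CCCO)O"
ground_truth = "COC1=C(C=CC(=C1)CO)O"

exact_match = generated == ground_truth
distance = levenshtein(generated, ground_truth)
print(f"exact match: {exact_match}, edit distance: {distance}")
```

For the pair quoted in the README, this reports no exact match and an edit distance of 2, corresponding to the two extra carbon atoms in the generated side chain.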