laituan245 committed on
Commit
97af73d
·
1 Parent(s): cdb2de7

Update README.md

Files changed (1)
  1. README.md +23 -0
README.md CHANGED
@@ -1,3 +1,26 @@
  ---
  license: apache-2.0
  ---
+
+
+ This model can be used to generate a SMILES string from an input caption.
+
+ ## Example Usage
+ ```python
+ from transformers import T5Tokenizer, T5ForConditionalGeneration
+ tokenizer = T5Tokenizer.from_pretrained("laituan245/molt5-small-caption2smiles", model_max_length=512)
+ model = T5ForConditionalGeneration.from_pretrained('laituan245/molt5-small-caption2smiles')
+ input_text = 'The molecule is a monomethoxybenzene that is 2-methoxyphenol substituted by a hydroxymethyl group at position 4. It has a role as a plant metabolite. It is a member of guaiacols and a member of benzyl alcohols.'
+ input_ids = tokenizer(input_text, return_tensors="pt").input_ids
+ outputs = model.generate(input_ids, num_beams=5, max_length=512)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ # The model will generate "COC1=C(C=CC(=C1)CCCO)O". The ground-truth is "COC1=C(C=CC(=C1)CO)O".
+ ```
+
+ ## Paper
+
+ For more information, please take a look at our paper.
+
+ Paper: [Translation between Molecules and Natural Language](https://arxiv.org/abs/2204.11817)
+
+ Authors: *Carl Edwards\*, Tuan Lai\*, Kevin Ros, Garrett Honke, Heng Ji*
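
The usage example in the README quotes a generated SMILES string that differs from the ground truth. One simple way to quantify that gap is an exact-match check plus a character-level edit distance. The sketch below is only an illustration and is not taken from the model card: the `levenshtein` helper and the variable names are hypothetical, and it compares raw strings only. Two different SMILES strings can describe the same molecule, so a string mismatch does not by itself prove the molecules differ; a chemistry-aware comparison would need a cheminformatics toolkit to canonicalize both strings first.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance; insert, delete, and substitute each cost 1."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


# Strings quoted in the README's usage example.
generated = "COC1=C(C=CC(=C1)CCCO)O"
ground_truth = "COC1=C(C=CC(=C1)CO)O"

exact_match = generated == ground_truth
distance = levenshtein(generated, ground_truth)
print(exact_match, distance)  # prints: False 2
```

Here the distance of 2 corresponds to the two extra carbon characters in the generated side chain (`CCCO` instead of `CO`).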