RetroDFM-R: Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning

RetroDFM-R is a reasoning-driven large language model designed for chemical retrosynthesis. Unlike traditional graph-based or sequence models, it incorporates large-scale reinforcement learning with chemically verifiable rewards, enabling stronger generalization, higher prediction reliability, and improved interpretability. Comprehensive evaluations show that RetroDFM-R outperforms existing state-of-the-art approaches across standard benchmarks. Double-blind human assessments further confirm the chemical plausibility and practical usefulness of its predictions. The model also successfully reconstructs multistep routes for real drug molecules and complex materials reported in the literature. Its explicit reasoning process offers clear, human-interpretable insights, enhancing trust and real-world applicability in retrosynthesis planning.

News

2025-11-22: The weights of RetroDFM-R-8B are now open-sourced!
2025-07-23: The RetroDFM-R paper is released on arXiv: Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning.

Training Details

RetroDFM-R is trained through a three-stage pipeline: (1) continual pretraining on retrosynthesis-focused chemical data, (2) supervised fine-tuning on distilled chain-of-thought reasoning samples, and (3) reinforcement learning to further enhance step-by-step reasoning and prediction quality.
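
This model card does not spell out the reward used in stage (3). As a rough illustration of what a "chemically verifiable" reward can look like, the sketch below scores a prediction by exact match between the predicted and reference reactant sets. It is an assumption for illustration, not RetroDFM-R's actual reward: the names `reactant_set` and `exact_match_reward` are hypothetical, and the `canonicalize` argument stands in for a real canonicalizer such as an RDKit-based one.

```python
from typing import Callable, FrozenSet, Optional

Canonicalizer = Callable[[str], Optional[str]]


def reactant_set(smiles: str, canonicalize: Canonicalizer) -> FrozenSet[Optional[str]]:
    """Split a dot-separated SMILES string and canonicalize each component."""
    return frozenset(canonicalize(part) for part in smiles.split(".") if part.strip())


def exact_match_reward(
    predicted: str,
    reference: str,
    canonicalize: Canonicalizer = str.strip,  # placeholder; swap in a chemistry-aware canonicalizer
) -> float:
    """Return 1.0 if both strings describe the same set of reactants, else 0.0.

    Hypothetical reward for illustration only. Order of reactants is ignored,
    and duplicate reactants collapse because a set is used.
    """
    predicted_set = reactant_set(predicted, canonicalize)
    # An empty or unparsable prediction earns no reward.
    if not predicted_set or None in predicted_set:
        return 0.0
    return 1.0 if predicted_set == reactant_set(reference, canonicalize) else 0.0
```

Because the check is a deterministic comparison against known reactants, it can be computed automatically for every sampled response, which is what makes this kind of reward verifiable.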

Usage Details

Local Inference

The following example loads RetroDFM-R and runs inference locally:

import re
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name_or_id = "OpenDFM/RetroDFM-R-8B"
tokenizer = AutoTokenizer.from_pretrained(model_name_or_id)
model = AutoModelForCausalLM.from_pretrained(model_name_or_id, torch_dtype=torch.bfloat16, device_map="auto")

target_smiles = "<target mol in SMILES format>"
instruction = f"<SMILES> {target_smiles} </SMILES> Given the product SMILES, your task is to predict the reactants SMILES using your experienced chemical Retrosynthesis knowledge. Please reason step by step, and put your final answer within <answer> answer here </answer>."

message = [
    {"role": "user", "content": instruction}
]

input_text = tokenizer.apply_chat_template(message, tokenize=False, add_generation_prompt=True)
# Move inputs to wherever device_map="auto" placed the model instead of hard-coding "cuda".
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
generation_config = GenerationConfig(
    do_sample=True,
    top_k=20,
    top_p=0.9,
    temperature=0.6,
    max_new_tokens=1024,
    eos_token_id=tokenizer.eos_token_id
)
outputs = model.generate(**inputs, generation_config=generation_config)

# Decode only the newly generated tokens (everything after the prompt).
prompt_length = inputs["input_ids"].shape[1]
generated_text = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()
print(f"{generated_text=}")

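# The response is expected to look like "<think>...</think> <answer>...</answer>".
match = re.search(r'<think>(.*?)</think>\s*<answer>(.*?)</answer>', generated_text, re.DOTALL)
if match is None:
    raise ValueError("No <think>/<answer> blocks found; the output may have been cut off by max_new_tokens.")
thinking, answer = (part.strip() for part in match.groups())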
print(f"{thinking=}")
print(f"{answer=}")
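
When running many predictions, it can be convenient to wrap the parsing step in a helper that returns `None` for malformed or truncated outputs instead of raising. The sketch below is one way to do that; `parse_response` and `ParsedResponse` are hypothetical names, and splitting the answer on "." assumes that multiple reactants are joined with the standard SMILES component separator.

```python
import re
from typing import List, NamedTuple, Optional

# Same tag layout as in the example above: reasoning first, then the final answer.
RESPONSE_PATTERN = re.compile(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", re.DOTALL)


class ParsedResponse(NamedTuple):
    thinking: str
    reactants: List[str]


def parse_response(generated_text: str) -> Optional[ParsedResponse]:
    """Return the reasoning trace and reactant list, or None if the tags are missing."""
    match = RESPONSE_PATTERN.search(generated_text)
    if match is None:
        return None
    thinking, answer = (part.strip() for part in match.groups())
    # Assumption: multiple reactants are separated by "." as in standard SMILES.
    reactants = [piece.strip() for piece in answer.split(".") if piece.strip()]
    return ParsedResponse(thinking, reactants)
```

A caller can then skip or retry any sample for which `parse_response(generated_text)` returns `None`.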

SMILES Preprocessing

When your input contains SMILES notation, we recommend canonicalizing it with the RDKit package first. Here is an example:

from rdkit import Chem
def canonicalize_smiles(smiles):
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    return Chem.MolToSmiles(mol, isomericSmiles=True, kekuleSmiles=False)

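Or more directly (note that this variant raises an error on an invalid SMILES string instead of returning None):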

from rdkit import Chem
def canonicalize_smiles(smiles):
    return Chem.CanonSmiles(smiles, useChiral=True)
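
To connect canonicalization with the prompt from the local inference example, the sketch below builds the instruction string from a target SMILES. `build_instruction` is a hypothetical helper, not part of the released code. It accepts any canonicalizer callable that returns `None` for invalid input (such as the first `canonicalize_smiles` above), so the block itself does not depend on RDKit.

```python
from typing import Callable, Optional

# Instruction text copied from the local inference example.
INSTRUCTION_TEMPLATE = (
    "<SMILES> {smiles} </SMILES> Given the product SMILES, your task is to predict "
    "the reactants SMILES using your experienced chemical Retrosynthesis knowledge. "
    "Please reason step by step, and put your final answer within "
    "<answer> answer here </answer>."
)


def build_instruction(
    target_smiles: str,
    canonicalize: Optional[Callable[[str], Optional[str]]] = None,
) -> str:
    """Build the user prompt, canonicalizing the target first when a canonicalizer is given."""
    smiles = target_smiles.strip()
    if canonicalize is not None:
        canonical = canonicalize(smiles)
        if canonical is None:
            # Fail early rather than sending an unparsable molecule to the model.
            raise ValueError(f"Invalid SMILES: {target_smiles!r}")
        smiles = canonical
    return INSTRUCTION_TEMPLATE.format(smiles=smiles)
```

For example, `build_instruction(target_smiles, canonicalize=canonicalize_smiles)` would produce the `instruction` string used in the inference example, with the target canonicalized beforehand.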

Citation

@misc{zhang2025retrodfmr,
  title={Reasoning-Driven Retrosynthesis Prediction with Large Language Models via Reinforcement Learning},
  author={Zhang, Situo and Li, Hanqi and Chen, Lu and Zhao, Zihan and Lin, Xuanze and Zhu, Zichen and Chen, Bo and Chen, Xin and Yu, Kai},
  year={2025},
  eprint={2507.17448},
  archivePrefix={arXiv},
  primaryClass={cs.CE},
  url={https://arxiv.org/abs/2507.17448}, 
}

Disclaimer

The current version of RetroDFM-R may generate incorrect or misleading information. Please use it with caution and have domain experts verify its predictions before making any decisions based on them.
