Issue 12, 2023

Reagent prediction with a molecular transformer improves reaction data quality

Abstract

Automated synthesis planning is key for efficient generative chemistry. Since reactions of given reactants may yield different products depending on conditions such as the chemical context imposed by specific reagents, computer-aided synthesis planning should benefit from recommendations of reaction conditions. Traditional synthesis planning software, however, typically proposes reactions without specifying such conditions, relying on human organic chemists who know the conditions to carry out suggested reactions. In particular, reagent prediction for arbitrary reactions, a crucial aspect of condition recommendation, has been largely overlooked in cheminformatics until recently. Here we employ the Molecular Transformer, a state-of-the-art model for reaction prediction and single-step retrosynthesis, to tackle this problem. We train the model on the US patents dataset (USPTO) and test it on Reaxys to demonstrate its out-of-distribution generalization capabilities. Our reagent prediction model also improves the quality of product prediction: the Molecular Transformer is able to substitute the reagents in the noisy USPTO data with reagents that enable product prediction models to outperform those trained on plain USPTO. This makes it possible to improve upon the state-of-the-art in reaction product prediction on the USPTO MIT benchmark.

Graphical abstract: Reagent prediction with a molecular transformer improves reaction data quality

Supplementary files

Article information

Article type
Edge Article
Submitted
09 Dec 2022
Accepted
12 Feb 2023
First published
01 Mar 2023
This article is Open Access

All publication charges for this article have been paid for by the Royal Society of Chemistry
Creative Commons BY-NC license

Chem. Sci., 2023,14, 3235-3246

Reagent prediction with a molecular transformer improves reaction data quality

M. Andronov, V. Voinarovska, N. Andronova, M. Wand, D. Clevert and J. Schmidhuber, Chem. Sci., 2023, 14, 3235 DOI: 10.1039/D2SC06798F

This article is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported Licence. You can use material from this article in other publications, without requesting further permission from the RSC, provided that the correct acknowledgement is given and it is not used for commercial purposes.

To request permission to reproduce material from this article in a commercial publication, please go to the Copyright Clearance Center request page.

If you are an author contributing to an RSC publication, you do not need to request permission provided correct acknowledgement is given.

If you are the author of this article, you do not need to request permission to reproduce figures and diagrams provided correct acknowledgement is given. If you want to reproduce the whole article in a third-party commercial publication (excluding your thesis/dissertation for which permission is not required) please go to the Copyright Clearance Center request page.

Read more about how to correctly acknowledge RSC content.

Social activity

Spotlight

Advertisements