Wednesday, November 30, 2022

Quantum Chemical Data Generation as Fill-In for Reliability Enhancement of Machine-Learning Reaction and Retrosynthesis Planning

Alessandra Toniato, Jan P. Unsleber, Alain C. Vaucher, Thomas Weymuth, Daniel Probst, Teodoro Laino, and Markus Reiher (2022)
Highlighted by Jan Jensen

Part of Figure 7 from the paper. (c) The authors 2022. Reproduced under the CC BY-NC-ND 4.0 license.

This is the first paper I have seen that combines automated QM-reaction prediction with ML-based retrosynthesis prediction. The idea itself is simple: for ML predictions with low confidence (i.e. few examples in the training data), can automated QM-reaction prediction be used to check whether the proposed reaction is feasible, i.e. whether it is the reaction path with the lowest barrier? If so, it could also be used to augment the training data.
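To make the idea concrete, here is a minimal sketch of such a triage loop. It is my own illustration and not the authors' code: the `Prediction` class, the `explore` callback (standing in for an automated QM exploration that returns a barrier per product found), and the confidence threshold of 0.5 are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Prediction:
    reactants: str
    product: str
    confidence: float  # ML model confidence in [0, 1] (assumed scale)


def is_lowest_barrier(product: str, barriers: dict[str, float]) -> bool:
    """True if `product` was found and has the lowest barrier of all explored paths."""
    return product in barriers and min(barriers, key=barriers.get) == product


def triage(
    predictions: list[Prediction],
    explore: Callable[[str], dict[str, float]],  # hypothetical QM exploration
    threshold: float = 0.5,                      # hypothetical cutoff
) -> list[Prediction]:
    """Return the low-confidence predictions that the QM exploration confirms."""
    confirmed = []
    for p in predictions:
        if p.confidence >= threshold:
            continue  # high confidence: accepted without a QM check
        barriers = explore(p.reactants)  # maps product -> barrier height
        if is_lowest_barrier(p.product, barriers):
            confirmed.append(p)  # candidate for augmenting the training data
    return confirmed
```

The confirmed predictions are the ones that could be fed back into the training set. Everything expensive is hidden inside `explore`.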

The paper considers two examples using the Chemoton 2.0 method: one where the reaction is an elementary reaction and one where there are two steps (the Friedel-Crafts reaction shown above). It works pretty well for the former, but runs into problems for the latter.

One problem for non-elementary reactions is that one can't predict which atoms are chemically active from the overall reaction. Chemoton must therefore consider reactions involving all atom pairs, and preferably several pairs of atoms simultaneously. The number of required calculations quickly gets out of hand and the authors conclude that "For such multistep reactions, new methods to identify the individual elementary steps will have to be developed to maintain the exploration within tight bounds, and hence, within reasonable computing time."
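A back-of-the-envelope count shows how fast this grows. The sketch below is a naive upper bound of my own, assuming every atom pair is a candidate and that k pairs can be combined freely; it is not how Chemoton actually selects its trial coordinates, and the function name is hypothetical.

```python
from math import comb


def candidate_counts(n_atoms: int, max_pairs: int = 2) -> dict[int, int]:
    """Naive upper bound on trial reaction coordinates.

    Key k is the number of atom pairs driven simultaneously; the value is
    the number of ways to choose k distinct pairs from all C(n_atoms, 2) pairs.
    """
    n_pairs = comb(n_atoms, 2)
    return {k: comb(n_pairs, k) for k in range(1, max_pairs + 1)}


for n in (10, 20, 40):
    print(n, candidate_counts(n))
# 10 {1: 45, 2: 990}
# 20 {1: 190, 2: 17955}
# 40 {1: 780, 2: 303810}
```

Single pairs scale as n², but two simultaneous pairs already scale as roughly n⁴, and each candidate is a separate reaction path search.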

However, even when they specify the two elementary steps for the Friedel-Crafts reaction, their method fails to find the second elementary step. The reason for this failure is not clear, but it could be due to the semiempirical xTB method, which is used for efficiency.

So the paper presents an interesting and important challenge to the computational chemistry community. I wish more papers did this.



This work is licensed under a Creative Commons Attribution 4.0 International License.