Tuesday, March 29, 2022

Machine Learning May Sometimes Simply Capture Literature Popularity Trends: A Case Study of Heterocyclic Suzuki−Miyaura Coupling

Wiktor Beker, Rafał Roszak, Agnieszka Wołos, Nicholas H. Angello, Vandana Rathore, Martin D. Burke, and Bartosz A. Grzybowski (2022)
Highlighted by Jan Jensen

What do you infer from this quote from the paper (emphasis added)?

Another important problem, tackled herein, deals with the prediction of optimal conditions for a particular reaction in which there are generally multiple viable choices of solvents or reagents. Several works[21−24] have attempted to use ML for the prediction of reaction conditions, and the overall message they seem to convey is that ML can, in fact, offer accurate predictions provided adequate numbers of literature examples on which to build the models (but see also critical ref 6). However, here, we demonstrate with a case study that this may have been an overoptimistic interpretation, and that even with large quantities of carefully curated literature data, ML approaches may not perform considerably better than estimates based on the popularity of reaction conditions reported in the literature. In other words, these ML models do not provide significantly more insights than just suggesting the most popular conditions which could be obtained by simple statistics over literature examples[25,26] and no “machine intelligence.”
I can tell you what I inferred. References 21-24 used ML models to predict optimal reaction conditions, but failed to check whether their models "provide significantly more insights than just suggesting the most popular conditions". I also inferred that the results from this study suggest that, had the authors checked, they would have found that not to be the case.

However, the four references comprise two papers (21 and 23) by Doyle and co-workers on the prediction of reaction yields (not conditions), and two papers, one by Coley and co-workers and one by Reisman and co-workers (22 and 24, respectively), on the prediction of reaction conditions, which include comparisons to popularity baselines.

The paper looks at the prediction of solvent and base (and not catalysts and temperature, as implied by the TOC graphic above) for ca. 10,000 Suzuki coupling reactions from Reaxys. The best top-1 accuracies for base and solvent with ML are 80.6% and 51.7%, compared to popularity baseline values of 76.8% and 29.8%. The authors use the term "significantly" (and related terms) without ever quantifying what they deem significant, but to me the ML solvent predictions seem significantly better than the popularity baseline.
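
For readers unfamiliar with the term, a popularity baseline simply predicts the most frequently reported choice for every reaction. The Python sketch below is my own minimal illustration of how such a top-k accuracy could be computed; it is not the procedure used in the paper, the function name is hypothetical, and the labels are made-up toy data.

```python
from collections import Counter


def popularity_top_k_accuracy(train_labels, test_labels, k=1):
    """Fraction of test labels that are among the k most common training labels."""
    top_k = {label for label, _ in Counter(train_labels).most_common(k)}
    hits = sum(1 for label in test_labels if label in top_k)
    return hits / len(test_labels)


# Toy, made-up labels purely for illustration (not data from the paper).
train_bases = ["K2CO3"] * 6 + ["Na2CO3"] * 3 + ["Cs2CO3"]
test_bases = ["K2CO3", "K2CO3", "Na2CO3", "K2CO3", "Cs2CO3"]

baseline_top1 = popularity_top_k_accuracy(train_bases, test_bases, k=1)
baseline_top2 = popularity_top_k_accuracy(train_bases, test_bases, k=2)
print(f"top-1: {baseline_top1:.2f}, top-2: {baseline_top2:.2f}")
```

Any ML model evaluated on the same test set would need to beat these numbers to add something beyond literature statistics.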

Furthermore, as Coley and co-workers point out, the true metric is the accuracy of the combined prediction, e.g. correct solvent and base. For example, in the case of correct catalyst, solvent, and reagent, Coley and co-workers found an accuracy of 57.3% compared to a popularity baseline of only 5.7%. However, I am not even certain whether Grzybowski and co-workers would deem that a significant improvement.
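
To see why the combined metric is so much harder on a popularity baseline, note that a joint prediction only counts as correct if every component is correct, so the joint accuracy can never exceed the lowest per-component accuracy. The sketch below is my own toy illustration with made-up reactions and hypothetical function names; it is not taken from either paper.

```python
def joint_accuracy(predicted, actual):
    """Fraction of reactions for which every predicted component is correct."""
    hits = sum(1 for p, a in zip(predicted, actual) if p == a)
    return hits / len(actual)


def component_accuracy(predicted, actual, index):
    """Fraction of reactions for which the component at `index` is correct."""
    hits = sum(1 for p, a in zip(predicted, actual) if p[index] == a[index])
    return hits / len(actual)


# Toy, made-up (solvent, base) pairs purely for illustration.
actual = [
    ("dioxane", "K2CO3"),
    ("DMF", "K2CO3"),
    ("dioxane", "Na2CO3"),
    ("toluene", "K2CO3"),
]
# A popularity baseline predicts the same most common pair for every reaction.
predicted = [("dioxane", "K2CO3")] * len(actual)

solvent_acc = component_accuracy(predicted, actual, 0)
base_acc = component_accuracy(predicted, actual, 1)
joint_acc = joint_accuracy(predicted, actual)
print(f"solvent: {solvent_acc:.2f}, base: {base_acc:.2f}, joint: {joint_acc:.2f}")
```

In this toy case the baseline looks respectable for each component separately but gets only one reaction in four fully right, which is the kind of gap the combined metric is meant to expose.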

On a more constructive note, the topic of the paper does relate to an interesting fundamental question in ML on how to deal with imbalanced data, i.e. where there is a single very popular choice. One would perhaps naively suspect that this would be easier for a machine to learn, i.e. you just have to learn a few exceptions. But how do you typically learn exceptions? By memorising them, and we tend to employ many ML techniques to avoid just this.

This work is licensed under a Creative Commons Attribution 4.0 International License.