Saturday, June 27, 2026

Developing Pharmaceutically Relevant Pd-Catalyzed C−N Coupling Reactivity Models Leveraging High-Throughput Experimentation

Seung Kyun Ha, Dipannita Kalyani, Michael S. West, Jessica Xu, Yu-hong Lam, Thomas Struble, Spencer Dreher, Shane W. Krska, Stephen L. Buchwald, and Klavs F. Jensen (2025)
Highlighted by Jan Jensen

Yield prediction is one of the most difficult and important challenges for machine learning applied to chemistry. This paper is a useful contribution because it provides a relatively large and systematic high-throughput dataset of ca. 4000 Pd-catalyzed C−N coupling reactions, spanning a wide variety of secondary amines and aryl bromides relevant to medicinal chemistry.

One important caveat is that the study does not address full reaction-condition optimization. All scope reactions are run using a single set of reaction conditions: one catalyst, one base, and one solvent system. The task is therefore better described as substrate-scope prediction under fixed conditions, rather than general prediction of reaction yield across arbitrary reaction conditions.

Significantly, the authors provide a useful reality check on the quality of yield data. For 32 repeated reactions, the measured product Liquid Chromatography Area Percent (LCAP) values correlate poorly, with R^2 = 0.35. This experimental variability motivates their decision to treat the problem as binary classification rather than regression. A threshold of 20% product LCAP is chosen to define a “successful” reaction, and the repeated reactions are then consistent under this classification scheme in 27 of the 32 cases. This supports a broader cautionary point: if yields from carefully controlled HTE experiments are already noisy at the level of absolute values, then predicting precise yield values from heterogeneous literature or web-scraped data is likely to be extremely difficult, and perhaps unrealistic in many settings.
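To make the replicate check concrete, here is a minimal sketch of how agreement under a 20% LCAP cutoff could be computed. The function names and the replicate values are made up for illustration; only the 20% threshold comes from the paper, and I am assuming "successful" means strictly above the cutoff.

```python
from typing import Sequence

THRESHOLD = 20.0  # product LCAP (%) cutoff quoted in the text


def is_success(lcap: float, threshold: float = THRESHOLD) -> bool:
    """Label a reaction successful when its product LCAP exceeds the cutoff."""
    return lcap > threshold


def replicate_agreement(first: Sequence[float], second: Sequence[float],
                        threshold: float = THRESHOLD) -> float:
    """Fraction of replicate pairs that receive the same binary label."""
    if len(first) != len(second) or len(first) == 0:
        raise ValueError("need two non-empty runs of equal length")
    matches = sum(
        is_success(a, threshold) == is_success(b, threshold)
        for a, b in zip(first, second)
    )
    return matches / len(first)


# Hypothetical replicate LCAP values (not data from the paper).
run_1 = [5.0, 45.0, 18.0, 60.0, 2.0]
run_2 = [12.0, 25.0, 23.0, 80.0, 0.0]
agreement = replicate_agreement(run_1, run_2)
```

In this toy example the absolute values differ a lot between runs, yet only the pair straddling the cutoff (18 vs. 23) disagrees. For the paper's numbers, 27 consistent pairs out of 32 corresponds to about 84% agreement.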

The authors construct four different test sets to ascertain whether ML models can extrapolate to unseen amines (Amine OOS), aryl bromides (ArX OOS), or both (Both OOS), in addition to standard interpolation (DRS(n), where n is the percentage of the dataset used for training).
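The out-of-sample splits can be pictured with a short sketch. This is one plausible way to partition an amine × aryl bromide grid, not necessarily the authors' exact protocol; the substrate labels and the function name `oos_splits` are hypothetical.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple

Reaction = Tuple[str, str]  # (amine, aryl bromide)


def oos_splits(reactions: Iterable[Reaction],
               held_out_amines: Set[str],
               held_out_arx: Set[str]) -> Dict[str, List[Reaction]]:
    """Assign each reaction to train or to one of three out-of-sample sets."""
    splits: Dict[str, List[Reaction]] = {
        "train": [], "amine_oos": [], "arx_oos": [], "both_oos": []
    }
    for amine, arx in reactions:
        new_amine = amine in held_out_amines
        new_arx = arx in held_out_arx
        if new_amine and new_arx:
            key = "both_oos"   # neither substrate seen in training
        elif new_amine:
            key = "amine_oos"  # unseen amine, familiar aryl bromide
        elif new_arx:
            key = "arx_oos"    # familiar amine, unseen aryl bromide
        else:
            key = "train"
        splits[key].append((amine, arx))
    return splits


# Hypothetical 3 x 3 grid of substrates.
grid = list(product(["A1", "A2", "A3"], ["B1", "B2", "B3"]))
splits = oos_splits(grid, held_out_amines={"A3"}, held_out_arx={"B3"})
```

Holding out one amine and one aryl bromide from a 3 × 3 grid leaves four training reactions, two in each single-substrate OOS set, and one in the Both OOS set, which illustrates why the Both OOS set tends to be the smallest and hardest.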

The authors compare several model classes and molecular representations, including random forests, decision trees, AdaBoost, fully connected neural networks, and MPNNs using Chemprop. Input features include one-hot encodings, Morgan fingerprints, quantum-mechanical fingerprints, molecular graphs, and combinations of these. Overall, the best models are usually either random forests with fingerprint-based descriptors, sometimes augmented with QM descriptors for the reacting components, or MPNNs. However, the optimal model and representation depend on the data split, which is itself an important result: there is no single universally best model for all generalization tasks.

The best model for each split is then used to design a corresponding prospective validation library of 96 reactions. The models are first retrained on the full experimental dataset using the best architecture, input features, and hyperparameters identified from the retrospective modeling. For the DRS validation library, the DRS25 settings are used. Each validation library is constructed so that approximately half of the reactions are predicted to give >20% LCAP and half are predicted to give <20% LCAP. The confidence threshold is >0.9 for the Amine OOS, ArX OOS, and DRS libraries, and >0.8 for the Both OOS library. For OOS amines or aryl halides, the selected substrates must also have a maximum Tanimoto similarity <0.7 to the corresponding substrates used in the model-building dataset. Thus, the validation libraries are not random samples of chemical space; they are enriched for reactions where the model is sufficiently confident.
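The two selection filters (model confidence and dissimilarity to the training substrates) can be sketched as follows. I am assuming here that "confidence" means the predicted probability of the assigned class, which the highlight does not spell out, and I represent fingerprints as plain sets of on-bits rather than calling a cheminformatics library. All names and fingerprints are hypothetical.

```python
from typing import FrozenSet, Iterable


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of on-bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def max_similarity(query: FrozenSet[int],
                   training_fps: Iterable[FrozenSet[int]]) -> float:
    """Highest similarity between the query and any training substrate."""
    return max((tanimoto(query, fp) for fp in training_fps), default=0.0)


def eligible(prob_success: float, query_fp: FrozenSet[int],
             training_fps: Iterable[FrozenSet[int]],
             confidence: float = 0.9, max_sim: float = 0.7) -> bool:
    """Keep a candidate only if the model is confident AND the substrate is novel."""
    # Assumption: confidence is the probability of whichever class is predicted.
    confident = max(prob_success, 1.0 - prob_success) > confidence
    novel = max_similarity(query_fp, training_fps) < max_sim
    return confident and novel


# Hypothetical fingerprints.
training_fps = [frozenset({1, 2, 3, 4}), frozenset({2, 5, 6})]
query_a = frozenset({1, 2, 3, 7})     # max similarity 3/5 = 0.6, counts as novel
query_b = frozenset({1, 2, 3, 4, 8})  # max similarity 4/5 = 0.8, too similar
```

Note that a confident predicted failure (for example a success probability of 0.05) passes the filter just like a confident predicted success, which is what allows a library with roughly equal numbers of each.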

The prospective validation results are impressive. For the Amine OOS library, the RF model gives 11 false positives and no false negatives. For the ArX OOS library, the MPNN gives 3 false positives and 2 false negatives. For the Both OOS library, the RF model performs less well but still gives useful enrichment, with most errors arising from false positives rather than false negatives. For the DRS25 library, the RF model performs extremely well, with essentially perfect precision and only one false negative. Overall, the models are especially good at avoiding false negatives, which is important in a medicinal chemistry setting because false negatives could cause chemists to discard reactions that would actually work.
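As a rough back-of-the-envelope check, the error counts above can be turned into overall accuracies. This sketch assumes that all 96 reactions in each library gave a classifiable outcome, which I have not verified against the paper.

```python
def accuracy(n_total: int, false_pos: int, false_neg: int) -> float:
    """Fraction of correct predictions given the two error counts."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return (n_total - false_pos - false_neg) / n_total


LIBRARY_SIZE = 96  # reactions per validation library, from the text

amine_oos = accuracy(LIBRARY_SIZE, false_pos=11, false_neg=0)  # 85 of 96 correct
arx_oos = accuracy(LIBRARY_SIZE, false_pos=3, false_neg=2)     # 91 of 96 correct
```

Under that assumption the Amine OOS and ArX OOS libraries come out at roughly 89% and 95% accuracy, respectively.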

Having said that, this study represents something close to a best-case scenario for reaction-outcome prediction. The dataset is large by the standards of synthetic chemistry, with around 4000 systematically generated reactions. The reactions are all run under the same conditions, reducing experimental heterogeneity. The positive rate is also relatively high: about 35% of the reactions exceed the 20% LCAP threshold. This makes the classification task easier than many realistic discovery settings where successful reactions are much rarer. Finally, because the dataset is large and the hit rate is high, the models can make a substantial number of high-confidence predictions, which enables the construction of balanced validation libraries with 50% predicted successes and 50% predicted failures. In smaller, noisier, or more imbalanced datasets, this level of prospective performance would likely be much harder to achieve.


This work is licensed under a Creative Commons Attribution 4.0 International License.

Sunday, May 31, 2026

Harnessing AtomisticSkills for Agentic Atomistic Research

Bowen Deng, Bohan Li, Matthew Cox, Hoje Chun, Juno Nam, Artur Lyssenko, Sathya Edamadaka, Jurgis Ruza, Xiaochen Du, Nofit Segal, Jesus Diaz Sanchez, Mingrou Xie, Ty Perez, Yu Yao, Miguel Steiner, Sauradeep Majumdar, Charles B. Musgrave III, Anirban Chandra, Abhirup Patra, Detlef Hohl, Connor W. Coley, Ju Li, Rafael Gómez-Bombarelli (2026)
Highlighted by Jan Jensen




AtomisticSkills is a hierarchical research framework in which skills encode reusable, mid-level scientific workflows, while tools provide low-level, type-checked computational operations that agents can reliably call to execute those workflows. While LLMs can in principle assemble such workflows from package documentation and first principles, in practice their performance degrades as context length grows. Put another way, trying to keep the manuals and execution details for RDKit, ORCA, and related tools in context at the same time is likely to increase hallucinations in the proposed workflow. Instead, complicated workflows are distilled by experts into SKILL.md files that outline how the tools are to be used, and these can be loaded into general-purpose coding agents such as Claude Code and Codex.

I especially like this last point. AtomisticSkills lets researchers use a tool they may already be familiar with, but apply it to new scientific problems. It looks like an interesting way to share robust workflows with non-experts. Take, for example, the installation of AtomisticSkills itself: it basically amounts to downloading the repository and telling Codex to “Install AtomisticSkills according to its docs/setup.md guide,” after which the agent interactively guides the user through comparatively complicated steps such as creating environments, configuring API keys, and registering MCP servers.

For example, while we have made the xTB version of our EsNuEl workflow available through a web server, making the DFT versions available there was not practical. Installing it locally from the repo is of course possible, but perhaps a little intimidating for the target group of synthetic chemists. An approach like this might be more palatable: package the workflow as a skill, provide tested scripts and examples, and let a general-purpose coding agent guide the user through local setup and execution.


Thursday, April 30, 2026

Density Functional Theory Surrogate Enables Fast and Broad Computational Evaluation of Homogeneous Transition Metal Catalytic Energy Landscapes

Kevin P. Quirion, Wang-Yeuk Kong, Britton Stanley, Jyothish Joy, and Daniel H. Ess (2026)
Highlighted by Jan Jensen


It has been about 10 months since Meta FAIR released the Universal Models for Atoms, or UMA, machine-learning interatomic potentials. Since then, the first independent benchmarking studies have begun to appear, and this paper by Quirion and co-workers asks a very practical question: can UMA be used as a fast surrogate for DFT in homogeneous organometallic catalysis?

The authors examine seven catalytic/organometallic case studies taken from the literature, including Ir pincer alkane dehydrogenation, Rh hydroformylation, Ru olefin metathesis, Pd Buchwald–Hartwig amination, Cu-catalyzed difluorocarbene insertion, Ni asymmetric radical capture/reductive elimination, and a dinuclear Ni–Ni naphthyridine-diimine cycloaddition.

For literature geometries, they recompute reaction energies using ωB97M-V/def2-TZVPD single points, which is close to the level of theory that UMA is trained to reproduce. They then compare these values to UMA-S and UMA-M single-point energies, and in many cases also to UMA-optimized structures and energies. 

The headline result is encouraging: in most cases, UMA tracks ωB97M-V very well, often within a few kcal/mol and with good agreement in relative barriers and reaction-profile shapes. This is particularly impressive because the systems include different metals, oxidation-state changes, large ligands, charged species, and transition states. For routine conformer screening, preliminary mechanism mapping, or fast evaluation of many candidate catalysts, this suggests UMA could be genuinely useful.

There are, however, two important problem cases.

The first is the Cu-catalyzed difluorocarbene insertion, where the key issue is an open-shell singlet intermediate. UMA could not locate the TS1e transition state during optimization or NEB, gave unphysical conformational changes when optimizing the singlet 3e, and predicted the triplet state of 3e to be much lower than the singlet. At first glance this looks like a UMA failure, but ωB97M-V itself has similar problems with the singlet–triplet energetics. So this is not simply a machine-learning-potential problem. UMA is trained to reproduce ωB97M-V-like energies and forces; it should not be expected to magically repair failures of the underlying DFT reference method. The more specific concern is that UMA also has practical difficulties optimizing the open-shell singlet surface and locating the associated transition state. It was not tested whether ωB97M-V had the same problem.

The second problem case is the dinuclear Ni–Ni naphthyridine-diimine diene cycloaddition. Here UMA struggles with the relative spin states and barriers. In particular, it does not reproduce the same doublet/quartet ordering as ωB97M-V, and it overstabilizes some parts of the profile. This is perhaps less surprising because OMol25 did not include multinuclear transition-metal complexes, and the authors note that the naphthyridine-diimine ligand is not represented in the training set. Interestingly, the optimized geometries are not disastrous: UMA-S gives heavy-atom RMSDs of roughly 0.22 Å for the doublet and 0.36 Å for the quartet relative to the reported M06-L structures. So the failure is more severe for relative energetics and spin-state ordering than for generating plausible structures.

Overall, the study is a strong endorsement of UMA as a practical tool for organometallic mechanism work, provided it is used with the same caution one would apply to DFT. UMA appears especially promising for rapid conformer screening, approximate reaction-profile generation, and preoptimization before higher-level single-point calculations.

One unresolved issue is training-set overlap. The authors write that the OMol25 training database is so large that it “cannot be easily queried,” and that UMA does not provide an intrinsic nearest-neighbor or structure-comparison analysis for new inputs. That is a real limitation: if a benchmark system, or something very close to it, is already in the training data, the benchmark is much less informative about out-of-distribution generalization.

At the same time, the paper also states that the authors queried the dataset for the naphthyridine-diimine ligand and provide code in the Supporting Information. So the situation is somewhat unclear. The database may be inconvenient to search, but it does not seem impossible to search. For future UMA benchmark studies, it would be very useful to include at least a basic training-set check: for example, filtering OMol25 by metal, composition, charge, spin state, ligand identity, and local coordination environment. This would help distinguish cases where UMA is genuinely extrapolating from cases where it is interpolating within a familiar chemical neighborhood.
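A basic training-set check of the kind suggested above might look like the sketch below. The `Record` fields are a hypothetical, stripped-down schema invented for illustration; they are not the actual OMol25 metadata format, and a real check would also need ligand identity and coordination environment.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal metadata for one training structure."""
    metals: Tuple[str, ...]  # one entry per metal atom, e.g. ("Ni", "Ni")
    charge: int
    multiplicity: int


def matching_records(records: Iterable[Record], metals: Sequence[str],
                     charge: int, multiplicity: int) -> List[Record]:
    """Return records with the same metal content, charge, and spin multiplicity."""
    wanted = tuple(sorted(metals))
    return [
        r for r in records
        if tuple(sorted(r.metals)) == wanted
        and r.charge == charge
        and r.multiplicity == multiplicity
    ]


# Toy database containing only mononuclear complexes.
toy_db = [Record(("Pd",), 0, 1), Record(("Ni",), 0, 2), Record(("Cu",), 1, 1)]
hits = matching_records(toy_db, metals=("Ni", "Ni"), charge=0, multiplicity=2)
```

An empty result for the dinuclear Ni–Ni query would flag that benchmark case as a likely extrapolation, whereas a long list of hits would suggest interpolation.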

Wednesday, March 25, 2026

Stochastic tensor contraction for quantum chemistry

Jiace Sun and Garnet Kin-Lic Chan (2026)
Highlighted by Jan Jensen


What this paper lacks in terms of punchy title, it makes up for in content. I guess I would have gone with something like "Monte Carlo Meets Coupled Cluster: Slashing the Cost of CCSD(T)" or "Stochastic Tensor Contraction Pushes CCSD(T) Toward Mean-Field Cost". 

Anyway, tensor contraction is the algebraic core of much of quantum chemistry: large multidimensional arrays representing amplitudes and integrals are multiplied and summed over shared indices to produce energies and intermediates. It matters because these contractions set the scaling wall for methods like CCSD(T), where the formal cost rises far faster than Hartree–Fock. 

This study uses importance sampling to evaluate the tensor contractions. Importance sampling means drawing the most important terms in a sum more often than the unimportant ones, while reweighting so the final estimator stays unbiased. Here, Sun and Chan use it to evaluate high-order tensor contractions stochastically.
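The basic idea can be illustrated with a toy, single-index contraction. This is not the algorithm from the paper, just a minimal sketch of importance sampling: indices are drawn with probability proportional to |a_i| (my choice of a cheap proxy for importance), and each sampled term is divided by its probability so the estimator stays unbiased.

```python
import random
from typing import Sequence


def exact_contraction(a: Sequence[float], b: Sequence[float]) -> float:
    """Deterministic sum over the shared index: sum_i a_i * b_i."""
    return sum(x * y for x, y in zip(a, b))


def sampled_contraction(a: Sequence[float], b: Sequence[float],
                        n_samples: int, rng: random.Random) -> float:
    """Unbiased estimate of sum_i a_i * b_i using p_i proportional to |a_i|."""
    weights = [abs(x) for x in a]
    total = sum(weights)
    if total == 0.0:
        return 0.0
    probs = [w / total for w in weights]
    indices = rng.choices(range(len(a)), weights=probs, k=n_samples)
    # Reweight each sampled term by 1/p_i; zero-probability indices are never drawn.
    return sum(a[i] * b[i] / probs[i] for i in indices) / n_samples


rng = random.Random(0)
a = [2.0 ** -k for k in range(20)]              # rapidly decaying "amplitudes"
b = [1.0 + 0.1 * (-1) ** k for k in range(20)]  # slowly varying "integrals"
exact = exact_contraction(a, b)
estimate = sampled_contraction(a, b, n_samples=2000, rng=rng)
```

Because most of the weight sits in the first few terms, the sampler spends almost all of its effort there, and the spread of the estimate is set only by the mild variation in `b`. The cost is controlled by the number of samples rather than by the length of the sum, which is the property that makes the approach attractive for high-order contractions.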

The headline result is that stochastic tensor contraction (STC) drives the scaling of CCSD(T) down dramatically: from the usual O(N^6) and O(N^7) down to O(N^4). In practice, water-cluster tests show very large FLOP reductions and wall-time crossovers at surprisingly small sizes. 

Figure 7 in the paper is the real selling point, because it compares against the incumbent approximate workhorse, DLPNO-CCSD(T), on 20 realistic molecules. STC is faster than DLPNO for every system in the set, with speedups ranging from 2.5× to 32×, while also delivering smaller errors than all DLPNO/Normal results and 15 of 20 DLPNO/Tight results. Just as importantly, the STC errors stay tightly clustered around the chosen target of 0.2 kcal/mol, whereas DLPNO errors vary much more from system to system. That makes STC look not just fast, but controllable. 

Table 3 sharpens that message. Averaged over the benchmark set, STC has a mean absolute error of 0.2 kcal/mol at a geometric mean runtime of 10.7 min, compared with 3.00 kcal/mol / 58 min for DLPNO/Normal, 0.70 kcal/mol / 159 min for DLPNO/Tight, and 773 min for exact CCSD(T). So the paper’s central claim is not merely better asymptotic scaling, but a roughly order-of-magnitude win in both time and error relative to state-of-the-art local correlation in this benchmark. 
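The ratios behind that statement are easy to work out from the numbers quoted above; the variable names in this sketch are mine.

```python
# (mean absolute error in kcal/mol, geometric-mean runtime in minutes), as quoted above
stc_error, stc_time = 0.2, 10.7
dlpno = {
    "DLPNO/Normal": (3.00, 58.0),
    "DLPNO/Tight": (0.70, 159.0),
}
exact_ccsdt_time = 773.0

# How many times larger the DLPNO error and runtime are compared with STC.
error_ratio = {name: err / stc_error for name, (err, _) in dlpno.items()}
time_ratio = {name: t / stc_time for name, (_, t) in dlpno.items()}
speedup_vs_exact = exact_ccsdt_time / stc_time

for name in dlpno:
    print(f"{name}: {error_ratio[name]:.1f}x error, {time_ratio[name]:.1f}x runtime")
print(f"exact CCSD(T): {speedup_vs_exact:.1f}x runtime")
```

So STC is about 5× faster and 15× more accurate than DLPNO/Normal, about 15× faster and 3.5× more accurate than DLPNO/Tight, and roughly 72× faster than exact CCSD(T) on this benchmark.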

One caveat: while the speed-up is undeniably impressive, another likely limiting factor is memory. The paper notes the use of density fitting “to reduce memory requirements,” but does not really quantify memory use or memory scaling in the same systematic way as FLOPs and wall time. Given that modern CC implementations are often limited as much by storage and movement of intermediates as by raw arithmetic, that omission stands out. 

Overall, this is prototype code, but very exciting prototype code. It will be very interesting to see whether this stochastic route can mature into something that genuinely displaces DLPNO-CCSD(T) as the default reduced-cost gold-standard method. Code: GitHub repository






Saturday, February 28, 2026

Classical solution of the FeMo-cofactor model to chemical accuracy and its implications

Huanchen Zhai, Chenghan Li, Xing Zhang, Zhendong Li, Seunghoon Lee, and Garnet Kin-Lic Chan (2026)
Highlighted by Jan Jensen



The FeMo cofactor in nitrogenase enzymes is often mentioned as the killer application of quantum computing (QC) in chemistry. That is due to its complex electronic structure, which has made it difficult to model accurately. However, Chan and co-workers now claim to have computed the electronic energy to, by their estimate, chemical accuracy by conventional means.

They have done so by a series of calculations as indicated in the figure above. The CPU requirements are not given in detail, but the authors point out that no supercomputer was needed. 

Interestingly, the authors found that the ground state wavefunction is not inherently strongly multireference. Rather the main challenge is to identify the correct (mostly) single-reference state.

Where does that leave chemical applications of QC? For one thing, it moves the goalpost further back. The active space is the one typically used to estimate QC requirements, but it may have to be expanded to include MOs from the surrounding protein to accurately capture the chemistry, which would require an even larger quantum computer. But that will be even further into the future, with plenty of time for conventional approaches to get there first.

In my opinion, the case for QC-based quantum chemistry was never very strong, and this study is just another blow.

Wednesday, January 28, 2026

Predicting Enantioselectivity via Kinetic Simulations on Gigantic Reaction Path Networks

Yu Harabuchi, Ruben Staub, Min Gao, Nobuya Tsuji, Benjamin List, Alexandre Varnek, and Satoshi Maeda (2026)
Highlighted by Jan Jensen



The automated prediction of chemical reaction networks has thus far been limited to relatively small systems, typically with fewer than 50 atoms (including Hs), due to computational expense. This study goes significantly beyond this by studying a system with 228 atoms.

This is made possible by three things: 

1. While the system is big, the reaction is relatively simple, so the reaction network is relatively small. 

The reaction is an acid-catalysed cyclisation involving a relatively small and chemically simple molecule. It is the (chiral) acid catalyst that contributes most of the atoms. The reaction itself has three steps: protonation of the alkene group, intramolecular C-O bond formation on the activated alkene, and deprotonation of the O to regenerate the catalyst. Most of the atoms are chemically inert, and there are 12 chemically active atoms (defined by the user). In all, the study identified 74 possible intermediates/products, and only about half of those are chemically distinct if you ignore chirality.

2. Cheap surrogate energy function

They use a Δ-ML approach that corrects the xTB energy and gradient to obtain better accuracy. The ML model is trained on-the-fly against DFT calculations. 

3. Massive computational resources 

In spite of points 1 and 2, this study required massive computational resources. The authors don't address this point specifically, other than to mention that it requires millions of gradient evaluations, but Maeda stressed this point during his talk at WATOC last year.

So this is not exactly a routine application. 
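For readers unfamiliar with Δ-ML (point 2 above), the idea is to learn the difference between an expensive and a cheap method and add it back to the cheap result. The sketch below uses one-dimensional toy functions as stand-ins for xTB and DFT and a simple linear fit as the "ML model"; none of this reflects the authors' actual model, which works on molecular geometries, corrects gradients as well, and is retrained on the fly.

```python
from typing import Sequence, Tuple


def cheap_energy(x: float) -> float:
    """Toy stand-in for the inexpensive method (xTB in the paper)."""
    return x ** 2


def reference_energy(x: float) -> float:
    """Toy stand-in for the expensive reference method (DFT in the paper)."""
    return x ** 2 + 0.5 * x + 0.1


def fit_linear(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Train the correction on the residual (reference minus cheap) at a few points.
train_x = [-1.0, 0.0, 1.0, 2.0]
residuals = [reference_energy(x) - cheap_energy(x) for x in train_x]
slope, intercept = fit_linear(train_x, residuals)


def corrected_energy(x: float) -> float:
    """Cheap energy plus the learned correction."""
    return cheap_energy(x) + slope * x + intercept
```

The residual is usually a smoother, smaller quantity than the total energy, so it can be learned from far fewer reference calculations. That is what makes millions of corrected gradient evaluations feasible, even if still very expensive.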

Wednesday, December 31, 2025

One step retrosynthesis of drugs from commercially available chemical building blocks and conceivable coupling reactions

Babak Mahjour, Felix Katzenburg, Emil Lammi, and Tim Cernak (2025)
Highlighted by Jan Jensen

What are important reactions that we currently can't perform? I asked myself this a few years ago and found that there were very few papers in the literature that addressed this. It turns out that I possessed the skills to figure it out for myself if I had only had the idea. The idea being that "the most valuable couplings would utilize the most abundant building blocks to form the most common types of bonds found in [a] target dataset."

As an example, the authors took a list of 9028 known drugs and asked how many could potentially be made in a single step from molecules in the MilliporeSigma catalog by hypothetical coupling reactions. The answer turns out to be 2573 (28%), which is a surprisingly large number. The most common reaction was the coupling of alkyl alcohols and alkyl amines, followed by alkyl acid-alkyl amine and alkyl acid-alkyl alcohol couplings. These are all reactions for which, AFAIK, there's no robust and generally applicable synthetic protocol, although Zhang and Cernak took a stab at the alkyl acid-alkyl amine coupling.

I really wish there were more papers like this. Identifying important questions to work on is just as important as solving them, and the latter is almost always a communal effort.

