Thursday, April 30, 2026

Density Functional Theory Surrogate Enables Fast and Broad Computational Evaluation of Homogeneous Transition Metal Catalytic Energy Landscapes

Kevin P. Quirion, Wang-Yeuk Kong, Britton Stanley, Jyothish Joy, and Daniel H. Ess (2026)
Highlighted by Jan Jensen


It has been about 10 months since Meta FAIR released the Universal Models for Atoms, or UMA, machine-learning interatomic potentials. Since then, the first independent benchmarking studies have begun to appear, and this paper by Quirion and co-workers asks a very practical question: can UMA be used as a fast surrogate for DFT in homogeneous organometallic catalysis?

The authors examine seven catalytic/organometallic case studies taken from the literature: Ir pincer alkane dehydrogenation, Rh hydroformylation, Ru olefin metathesis, Pd Buchwald–Hartwig amination, Cu-catalyzed difluorocarbene insertion, Ni asymmetric radical capture/reductive elimination, and a dinuclear Ni–Ni naphthyridine-diimine cycloaddition.

For literature geometries, they recompute reaction energies using ωB97M-V/def2-TZVPD single points, which is close to the level of theory that UMA is trained to reproduce. They then compare these values to UMA-S and UMA-M single-point energies, and in many cases also to UMA-optimized structures and energies. 

The headline result is encouraging: in most cases, UMA tracks ωB97M-V very well, often within a few kcal/mol and with good agreement in relative barriers and reaction-profile shapes. This is particularly impressive because the systems include different metals, oxidation-state changes, large ligands, charged species, and transition states. For routine conformer screening, preliminary mechanism mapping, or fast evaluation of many candidate catalysts, this suggests UMA could be genuinely useful.
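To make "tracks ωB97M-V within a few kcal/mol" concrete, here is a minimal sketch of how one might quantify the agreement between two reaction profiles. It is not the authors' analysis script. The function names are my own, and the sketch assumes both profiles list the same stationary points in the same order, with energies in kcal/mol.

```python
from typing import List, Sequence, Tuple


def relative_profile(energies: Sequence[float], ref_index: int = 0) -> List[float]:
    """Shift a profile so that the chosen reference point sits at zero."""
    ref = energies[ref_index]
    return [e - ref for e in energies]


def profile_deviation(
    dft: Sequence[float], mlip: Sequence[float], ref_index: int = 0
) -> Tuple[float, float, List[float]]:
    """Compare two profiles after referencing both to the same point.

    Returns (mean absolute error, maximum absolute error, signed errors).
    The reference point is zero by construction, so it is left out of
    the two summary statistics but kept in the list of signed errors.
    """
    if len(dft) != len(mlip):
        raise ValueError("profiles must contain the same stationary points")
    if len(dft) < 2:
        raise ValueError("need at least one point besides the reference")
    rel_dft = relative_profile(dft, ref_index)
    rel_mlip = relative_profile(mlip, ref_index)
    errors = [m - d for d, m in zip(rel_dft, rel_mlip)]
    scored = [abs(e) for i, e in enumerate(errors) if i != ref_index]
    return sum(scored) / len(scored), max(scored), errors


# Placeholder numbers for illustration only; they are NOT from the paper.
mae, max_abs, signed = profile_deviation([0.0, 10.0, -5.0], [1.0, 12.0, -3.0])
print(f"MAE = {mae:.2f}, max |error| = {max_abs:.2f} kcal/mol")
```

Referencing both profiles to the same stationary point matters because absolute energies from the two methods need not share a common zero; only the relative energies are being compared.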

There are, however, two important problem cases.

The first is the Cu-catalyzed difluorocarbene insertion, where the key issue is an open-shell singlet intermediate. UMA could not locate the TS1e transition state during optimization or NEB, gave unphysical conformational changes when optimizing the singlet 3e, and predicted the triplet state of 3e to be much lower than the singlet. At first glance this looks like a UMA failure, but ωB97M-V itself has similar problems with the singlet–triplet energetics. So this is not simply a machine-learning-potential problem. UMA is trained to reproduce ωB97M-V-like energies and forces; it should not be expected to magically repair failures of the underlying DFT reference method. The more specific concern is that UMA also has practical difficulties optimizing the open-shell singlet surface and locating the associated transition state. The paper does not test whether ωB97M-V geometry optimizations run into the same difficulties.

The second problem case is the dinuclear Ni–Ni naphthyridine-diimine diene cycloaddition. Here UMA struggles with the relative spin states and barriers. In particular, it does not reproduce the same doublet/quartet ordering as ωB97M-V, and it overstabilizes some parts of the profile. This is perhaps less surprising because OMol25 did not include multinuclear transition-metal complexes, and the authors note that the naphthyridine-diimine ligand is not represented in the training set. Interestingly, the optimized geometries are not disastrous: UMA-S gives heavy-atom RMSDs of roughly 0.22 Å for the doublet and 0.36 Å for the quartet relative to the reported M06-L structures. So the failure is more severe for relative energetics and spin-state ordering than for generating plausible structures.
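For readers who want to run this kind of structural comparison themselves, here is a minimal sketch of a heavy-atom RMSD after optimal superposition, using the standard Kabsch algorithm with NumPy. I do not know exactly how the authors computed their RMSDs, so treat this as a generic illustration. It assumes both structures list the same atoms in the same order; the function name is my own.

```python
import numpy as np


def heavy_atom_rmsd(symbols, coords_a, coords_b):
    """RMSD over non-hydrogen atoms after Kabsch superposition.

    Assumes identical atom ordering in both structures; coordinates
    are N x 3 arrays and the result has the same length unit as the input.
    """
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.shape[0] != len(symbols):
        raise ValueError("symbols and coordinate arrays must match")
    mask = np.array([s != "H" for s in symbols])
    # Remove translation by centering both heavy-atom sets.
    p = a[mask] - a[mask].mean(axis=0)
    q = b[mask] - b[mask].mean(axis=0)
    # Kabsch: SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt(np.mean(np.sum(diff**2, axis=1))))
```

Hydrogens are excluded because their positions are often dominated by soft torsions, so a heavy-atom RMSD is usually a better measure of whether the metal core and ligand framework agree.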

Overall, the study is a strong endorsement of UMA as a practical tool for organometallic mechanism work, provided it is used with the same caution one would apply to DFT. UMA appears especially promising for rapid conformer screening, approximate reaction-profile generation, and preoptimization before higher-level single-point calculations.

One unresolved issue is training-set overlap. The authors write that the OMol25 training database is so large that it “cannot be easily queried,” and that UMA does not provide an intrinsic nearest-neighbor or structure-comparison analysis for new inputs. That is a real limitation: if a benchmark system, or something very close to it, is already in the training data, the benchmark is much less informative about out-of-distribution generalization.

At the same time, the paper also states that the authors queried the dataset for the naphthyridine-diimine ligand, and that the code for this is provided in the Supporting Information. So the situation is somewhat unclear. The database may be inconvenient to search, but it does not seem impossible to search. For future UMA benchmark studies, it would be very useful to include at least a basic training-set check: for example, filtering OMol25 by metal, composition, charge, spin state, ligand identity, and local coordination environment. This would help distinguish cases where UMA is genuinely extrapolating from cases where it is interpolating within a familiar chemical neighborhood.
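As a rough idea of what the cheapest version of such a check could look like, here is a sketch that filters records by metal count, charge, and spin multiplicity. The `Entry` schema and the toy records are hypothetical, invented purely for illustration; this is not the OMol25 data format, not an OMol25 API, and not the code from the Supporting Information.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical minimal record; NOT the actual OMol25 schema."""
    entry_id: str
    symbols: Tuple[str, ...]
    charge: int
    multiplicity: int


def find_candidates(
    entries: Iterable[Entry],
    metal: str,
    min_metal_count: int = 1,
    charge: Optional[int] = None,
    multiplicity: Optional[int] = None,
) -> List[Entry]:
    """Return entries with enough atoms of `metal` that also match the
    optional charge and spin-multiplicity filters."""
    hits = []
    for entry in entries:
        if sum(1 for s in entry.symbols if s == metal) < min_metal_count:
            continue
        if charge is not None and entry.charge != charge:
            continue
        if multiplicity is not None and entry.multiplicity != multiplicity:
            continue
        hits.append(entry)
    return hits


# Invented toy records, only to show how the filter would be called.
toy_entries = [
    Entry("toy-001", ("Ni", "N", "N", "C", "H"), 0, 1),
    Entry("toy-002", ("Ni", "Ni", "N", "N", "C", "H"), 0, 2),
    Entry("toy-003", ("Pd", "P", "C", "H"), 1, 1),
]
dinuclear_ni = find_candidates(toy_entries, "Ni", min_metal_count=2)
print([entry.entry_id for entry in dinuclear_ni])
```

A filter like this only narrows the search. Checking ligand identity or the local coordination environment would need a cheminformatics step on top, such as substructure matching on the surviving candidates.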
