Important recent papers in computational and theoretical chemistry
A free resource for scientists run by scientists
Yu Harabuchi, Ruben Staub, Min Gao, Nobuya Tsuji, Benjamin List, Alexandre Varnek, and Satoshi Maeda (2026)
Highlighted by Jan Jensen

Babak Mahjour, Felix Katzenburg, Emil Lammi, and Tim Cernak (2025)
Highlighted by Jan Jensen
What are important reactions that we currently can't perform? I asked myself this a few years ago and found that there were very few papers in the literature that addressed this. It turns out that I possessed the skills to figure it out for myself if I had only had the idea. The idea being that "the most valuable couplings would utilize the most abundant building blocks to form the most common types of bonds found in [a] target dataset."
As an example, the authors took a list of 9028 known drugs and asked how many could potentially be made in a single step from molecules in the MilliporeSigma catalog by hypothetical coupling reactions. The answer turns out to be 2573 (28%), which is a surprisingly large number. The most common reaction was the coupling of alkyl alcohols and alkyl amines, followed by alkyl acid-alkyl amine and alkyl acid-alkyl alcohol couplings. These are all reactions for which there is no robust and generally applicable synthetic protocol, although AFAIK Zhang and Cernak took a stab at the alkyl acid-alkyl amine coupling.
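The bond-census side of the analysis lends itself to a simple sketch. Here is a deliberately crude toy version (my own illustration, not the authors' code): count how often bond types that hypothetical couplings could form occur across a small "target set". The SMARTS patterns and the three-molecule drug set are placeholder assumptions.

```python
# A toy bond census in the spirit of the paper (my illustration, not the
# authors' code): count bond types that hypothetical couplings could form
# across a small "target set". Patterns and molecules are placeholders.
from collections import Counter
from rdkit import Chem

drugs = [
    "CC(=O)Nc1ccc(O)cc1",            # paracetamol
    "CC(C)Cc1ccc(cc1)C(C)C(=O)O",    # ibuprofen
    "COc1ccc2cc(ccc2c1)C(C)C(=O)O",  # naproxen
]

patterns = {
    "alkyl C-N (amine coupling)": Chem.MolFromSmarts("[CX4]-[NX3;!$(N-C=O)]"),
    "alkyl C-O (ether coupling)": Chem.MolFromSmarts("[CX4]-[OX2]"),
    "amide C-N (acid-amine)":     Chem.MolFromSmarts("[CX3](=O)-[NX3]"),
    "ester C-O (acid-alcohol)":   Chem.MolFromSmarts("[CX3](=O)-[OX2]-[#6]"),
}

census = Counter()
for smi in drugs:
    mol = Chem.MolFromSmiles(smi)
    for name, patt in patterns.items():
        census[name] += len(mol.GetSubstructMatches(patt))

for name, count in census.most_common():
    print(f"{name}: {count}")
```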
I really wish there were more papers like this. Identifying important questions to work on is just as important as solving them, and the latter is almost always a communal effort.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Hao Zhang and Matthew Otten (2025)
Highlighted by Jan Jensen
The paper introduces a method called TrimCI that very efficiently finds a relatively small set of determinants that accurately describes strongly correlated systems. (Well, it actually works for any system, but the main advantage is for strongly correlated ones.)
Unlike most new correlation methods, this one is actually simple enough to describe in a few sentences. TrimCI starts by constructing a set of orthogonal (non-optimised!) MOs (e.g. by diagonalising the AO overlap matrix). From these MOs you construct a small number of random determinants (e.g. 100) and construct the wavefunction (i.e. construct the Hamiltonian matrix and diagonalise, as per usual). Then you compute all the Hamiltonian matrix elements $H_{ij}$ between this wavefunction and the remaining determinants, and add the determinants with sufficiently large $|H_{ij}|$ to the wavefunction. Finally, there is the trimming step, "which removes negligible basis states by first diagonalising randomised blocks of the core and then performing a global diagonalising step on the surviving set." And repeat.
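My reading of that loop, as a toy numpy sketch on a random symmetric "Hamiltonian" (this is my own simplified reconstruction, not the authors' code; it does a single global trim instead of the randomised block trimming described in the quote, and all thresholds are illustrative):

```python
# Toy sketch of a TrimCI-style select-and-trim loop (my reconstruction, not
# the authors' code). "Determinants" are basis states of a random symmetric
# matrix with a dominant diagonal; thresholds are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
n = 400                                       # size of the full toy basis
H = rng.normal(size=(n, n)) * 0.01
H = (H + H.T) / 2
H[np.diag_indices(n)] = np.sort(rng.normal(size=n))  # dominant diagonal

core = list(rng.choice(n, size=20, replace=False))   # random starting set
eps_add, eps_trim = 0.02, 0.005                      # selection/trim cutoffs

for it in range(8):
    # Diagonalise in the current determinant subspace
    evals, evecs = np.linalg.eigh(H[np.ix_(core, core)])
    c = evecs[:, 0]                                  # ground-state coefficients

    # Selection: coupling of the wavefunction to each external determinant,
    # <Psi|H|j> = sum_i c_i H_ij; add those with large |coupling|
    external = np.setdiff1d(np.arange(n), core)
    coupling = c @ H[np.ix_(core, external)]
    core += list(external[np.abs(coupling) > eps_add])

    # Trimming: re-diagonalise and drop determinants with negligible weight
    evals, evecs = np.linalg.eigh(H[np.ix_(core, core)])
    keep = np.abs(evecs[:, 0]) > eps_trim
    core = list(np.array(core)[keep])
    print(f"iter {it}: {len(core)} determinants, E = {evals[0]:.6f}")
```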
The authors find that this approach converges much more quickly than other, similar methods, using many fewer determinants. Another big advantage is that the method does not require a single-determinant ground state as a starting point and is thus not sensitive to how much such a single determinant deviates from the actual wavefunction.
So, what's the catch here? To be practically useful, we need to compute energy differences with mHa accuracy, and I did not see any TrimCI results for chemical systems where the energy had converged to that kind of accuracy. It's possible that error cancellation can help here, but that needs to be investigated. The authors do look at extrapolation, which seems promising but needs to be studied systematically. Yet another option is to use the (compact) TrimCI wavefunction as an ansatz for dynamic-correlation methods.
It's also not clear what AO basis set was used for some of these calculations (including the one shown above). I suspect small basis sets were used, and even FCI energies with very small basis sets are of limited practical use. Are the TrimCI calculations on large systems still practical with more realistic basis sets?
Nevertheless, this seems like a very promising step in the right direction.
This work is licensed under a Creative Commons Attribution 4.0 International License.
Joonyoung F. Joung, Mun Hong Fong, Nicholas Casetti, Jordan P. Liles, Ne S. Dassanayake & Connor W. Coley (2025)
Highlighted by Jan Jensen

William B. Hughes, Mihai V. Popescu, Robert S. Paton (2025)
Highlighted by Jan Jensen
While this paper presents an interesting and useful benchmark and dataset of barrier heights involving organic molecules in the triplet state, I am highlighting it for a different reason.
While compiling the dataset, the authors "observed a common tendency for triplet SCF calculations to converge non-Aufbau solutions, resulting in catastrophic predictions in both thermochemistry and activation energy barriers and leading to errors as high as 26.4 kcal/mol." They go on to note that "Since such errors cannot be predicted a priori, manual inspection of spin densities for triplet-state calculations can be helpful to ensure the lowest triplet state has been converged with KS-DFT."
I remember the days when the SCF would routinely fail to converge even for simple singlet ground state molecules. So when they did converge and you got odd results, one of the first things you checked was the orbitals. But those days are long gone and I don't think it would occur to me now. I'd be much more likely to ascribe it to some deficiency in the functional.
I now wonder how many such wrong conclusions are scattered throughout the literature, especially for molecules with "funky" electronic structure, such as transition metal complexes. Manual inspection of the MOs is not going to be a practical option for many of these studies, and SCF stability checks did not identify all problems!
However, most QM packages have several options for the MO guess and it might be a good idea to use more than one of them and check whether they all converge to the same SCF solution. It'll be just like the old days.
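As a minimal illustration (my sketch, not from the paper), here is how one might do that in PySCF for triplet O2: converge the same UKS calculation from a few different initial-guess schemes and compare the final energies. A spread in the converged energies flags multiple SCF solutions worth inspecting.

```python
# A minimal sketch (not from the paper): converge the same triplet UKS
# calculation from several initial guesses and compare final energies.
# Differing energies flag multiple SCF solutions; internal stability
# analysis (mf.stability()) is a useful complement, though, as noted
# above, it does not catch everything.
from pyscf import gto, dft

mol = gto.M(atom="O 0 0 0; O 0 0 1.21", spin=2, basis="def2-svp")  # triplet O2

for guess in ("minao", "atom", "huckel"):
    mf = dft.UKS(mol, xc="b3lyp")
    mf.init_guess = guess
    mf.kernel()
    print(f"init_guess={guess:6s}  E = {mf.e_tot:.8f} Ha  converged={mf.converged}")
```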

This work is licensed under a Creative Commons Attribution 4.0 International License.
Brandon M. Wood, Misko Dzamba, Xiang Fu, Meng Gao, Muhammed Shuaibi, Luis Barroso-Luque, Kareem Abdelmaqsoud, Vahe Gharakhanyan, John R. Kitchin, Daniel S. Levine, Kyle Michel, Anuroop Sriram, Taco Cohen, Abhishek Das, Ammar Rizvi, Sushree Jagriti Sahoo, Zachary W. Ulissi, C. Lawrence Zitnick (2025)
Highlighted by Jan Jensen
I use xTB extensively in my research and I am often asked why I don't switch to machine learning potentials (MLPs) instead. My answer has always been that they have too many limitations: limited atom types, no charged molecules, can't handle reactions, efficiency on CPUs, solvent effects, etc. I know these can be overcome by making bespoke MLPs, but then it is not really a simple replacement for xTB, but a whole new workflow.
However, the new UMA MLP from Meta seems to address all but one of my concerns (more on that below). UMA is trained to reproduce DFT energies and gradients calculated for nearly half a billion 3D structures, spanning molecules, surfaces, reactions, etc., and containing atoms from virtually all of the periodic table. It is also possible to specify the charge and multiplicity, and, when interfaced with ORCA, the cost seems to be comparable to that of xTB running on CPUs. So this is all very encouraging.
Two main questions remain. One is the accuracy, and by that I mean how faithfully it reproduces ωB97M-V/def2-TZVPD results (in the case of molecules) for systems outside its training set. AFAIK nothing is published yet, but encouraging results are being shared online.
The other main question is how to include implicit solvent effects. In cases where it is OK to optimise in the gas phase, one option is to compute the solvation energy with some other method and add it to the gas-phase UMA results. Even if you do that at the DFT level, UMA has still saved you a lot of time. However, if the problem requires optimisation in solvent, then you have to use a faster method like xTB to compute the solvent effects on the gradient in order to get any time savings. Depending on how well xTB does on the system of interest, this could "contaminate" the UMA results. Alternatively, a purely ML approach would basically amount to retraining UMA on molecules with continuum solvation included. Explicit solvation is fine in principle, but impractical for routine applications.
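To make the first option concrete, here is a sketch of that composite scheme (my construction, not from the paper): a gas-phase UMA energy plus a GFN2-xTB implicit-solvation correction. The fairchem and xtb-python entry points follow their respective docs at the time of writing and should be treated as assumptions to check against the installed versions.

```python
# Sketch of the composite scheme above (my construction, not from the paper):
# E(solution) ~ E_UMA(gas) + [E_xTB(solvent) - E_xTB(gas)].
# Package entry points are assumptions based on the fairchem and xtb-python
# docs; check them against the installed versions.
from ase.build import molecule
from xtb.ase.calculator import XTB
from fairchem.core import pretrained_mlip, FAIRChemCalculator

atoms = molecule("H2O")
atoms.info.update({"charge": 0, "spin": 1})  # UMA takes charge/multiplicity

predictor = pretrained_mlip.get_predict_unit("uma-s-1", device="cpu")
atoms.calc = FAIRChemCalculator(predictor, task_name="omol")
e_uma_gas = atoms.get_potential_energy()

atoms.calc = XTB(method="GFN2-xTB")                   # gas-phase xTB
e_xtb_gas = atoms.get_potential_energy()
atoms.calc = XTB(method="GFN2-xTB", solvent="water")  # implicit solvent
e_xtb_solv = atoms.get_potential_energy()

e_solution = e_uma_gas + (e_xtb_solv - e_xtb_gas)     # composite estimate
print(f"UMA(gas) + xTB solvation correction: {e_solution:.4f} eV")
```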
Anyway, until this is resolved there could be some fairly routine applications that still cannot be addressed satisfactorily with MLPs.

This work is licensed under a Creative Commons Attribution 4.0 International License.
J. Harry Moore, Daniel J. Cole, and Gábor Csányi (2025)
Highlighted by Jan Jensen
