Monday, February 10, 2020

On the Completeness of Atomic Structure Representations

Here, I highlight an interesting recent preprint that tries to formalize and quantify a topic I have previously posted about here at Computational Chemistry Highlights (see the post on Atomistic Fingerprints here), namely how best to describe atomic environments in all their many-body glory. A widely held perception among practitioners of the "art" of molecular simulation is that, while we usually restrict ourselves to 2-body effects for efficiency, 3-body descriptions uniquely specify the atomic environment (up to a rotation and a permutation of like atoms). This is not the case (!), and the authors debunk this belief with several concrete counter-examples.

FIG. 1: "(a) Two structures with the same histogram of triangles (angles 45, 45, 90, 135, 135, 180 degrees). (b) A manifold of degenerate pairs of environments: In addition to three points A, B, B′ a fourth point C+ or C− is added leading to two degenerate environments, X+ and X−. (c) Degeneracies induce a transformation of feature space so that structures that should be far apart are brought close together."
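The degeneracy in panel (a) is easy to reproduce with a toy model. The following is a minimal sketch under my own assumptions, not the authors' actual structures: four neighbours sit on a unit circle around the central atom, so all radial (2-body) information is identical, and the polar angles in `env_a` and `env_b` are hypothetical values chosen so that the pairwise angles match the list in the caption. The helper names are illustrative only.

```python
from itertools import combinations

def neighbor_angles(polar_degrees):
    """Sorted angles (degrees) subtended at the centre by every pair of neighbours."""
    angles = []
    for a, b in combinations(polar_degrees, 2):
        d = abs(a - b) % 360
        angles.append(min(d, 360 - d))  # angle between two directions lies in [0, 180]
    return sorted(angles)

def gap_pattern(polar_degrees):
    """Cyclic sequence of gaps between neighbours, canonicalised over
    rotations and reflections. Two planar environments on the same circle
    are congruent exactly when their gap patterns agree."""
    pts = sorted(p % 360 for p in polar_degrees)
    n = len(pts)
    gaps = [(pts[(i + 1) % n] - pts[i]) % 360 for i in range(n)]
    candidates = []
    for seq in (gaps, gaps[::-1]):          # original and mirrored order
        for k in range(n):                  # every cyclic rotation
            candidates.append(tuple(seq[k:] + seq[:k]))
    return min(candidates)

# Hypothetical neighbour positions (polar angles in degrees, unit radius).
env_a = [0, 45, 135, 180]
env_b = [0, 45, 90, 225]

print(neighbor_angles(env_a))  # [45, 45, 90, 135, 135, 180]
print(neighbor_angles(env_b))  # [45, 45, 90, 135, 135, 180]
print(gap_pattern(env_a) == gap_pattern(env_b))  # False: not related by rotation or reflection
```

Both environments give the same multiset of triangles seen from the central atom, yet no rotation or reflection maps one onto the other, which is the sense in which a 3-body description is incomplete.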

Perhaps the most important implication of the work is that it helps us understand, at least in part, why modern machine-learning (ML) force fields appear to be so successful. At first sight the conclusion we face is daunting: for arbitrarily high accuracy, no n-point correlation cutoff may suffice to reconstruct the environment faithfully. Why, then, can recent ML force fields be used to calculate extensive properties such as the molecular energy so accurately? According to the results of Pozdnyakov, Willatt et al., low-correlation-order representations often suffice in practice because, as they state, "the presence of many neighbors or of different species (that provide distinct “labels” to associate groups of distances and angles to specific atoms), and the possibility of using representations centred on nearby atoms to lift the degeneracy of environments reduces the detrimental effects of the lack of uniqueness of the power spectrum [the power spectrum is equivalent to the 3-body correlation, Madsen], when learning extensive properties such as the energy." However, the authors do suggest that introducing higher-order invariants that lift the detrimental degeneracies might be a better approach in general. In any case, the preprint raises many technical and highly relevant issues, and it is well worth going over if you don't mind getting into the weeds with the maths.
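To make the idea of a higher-order invariant concrete, here is a small sketch under stated assumptions. The two environments are hypothetical (four neighbours on a unit circle, given as polar angles in degrees), and `triple_histogram` is a toy 4-body descriptor of my own choosing, not the invariant proposed in the preprint: for every triple of neighbours it records the three angles they subtend at the central atom.

```python
from collections import Counter
from itertools import combinations

def pair_angle(a, b):
    """Angle in degrees (0 to 180) between two neighbour directions."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def pair_histogram(env):
    """Toy 3-body descriptor: centre plus every pair of neighbours."""
    return Counter(pair_angle(a, b) for a, b in combinations(env, 2))

def triple_histogram(env):
    """Toy 4-body descriptor: centre plus every triple of neighbours,
    each triple summarised by its three sorted pairwise angles."""
    return Counter(
        tuple(sorted(pair_angle(a, b) for a, b in combinations(trio, 2)))
        for trio in combinations(env, 3)
    )

# Hypothetical environments that share the same 3-body histogram.
env_a = [0, 45, 135, 180]
env_b = [0, 45, 90, 225]

print(pair_histogram(env_a) == pair_histogram(env_b))      # True: degenerate at 3-body level
print(triple_histogram(env_a) == triple_histogram(env_b))  # False: degeneracy lifted
```

In this toy case one extra body order is enough to tell the two environments apart; the preprint's point is that no fixed order is guaranteed to do so in general.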

Wednesday, January 29, 2020

Discovery of a Difluoroglycine Synthesis Method through Quantum Chemical Calculations

Tsuyoshi Mita, Yu Harabuchi and Satoshi Maeda (2020)
Highlighted by Jan Jensen

TOC graphic. © The Authors 2020. Reproduced under the CC-BY-NC-ND 4.0 license.

In this paper the authors use DFT calculations to identify a synthetic route to difluoroglycine. 

They started by applying the single-component artificial force induced reaction (SC-AFIR) method to difluoroglycine. In SC-AFIR, artificial forces are introduced between functional groups, forcing them either to react with or to dissociate from one another.
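For readers unfamiliar with AFIR, the artificial force is usually written as a distance-weighted penalty added to the potential energy, F(Q) = E(Q) + alpha * sum(w_ij * r_ij) / sum(w_ij), with w_ij = ((R_i + R_j) / r_ij)^p. The sketch below implements only that penalty term. It reflects my reading of the AFIR literature rather than anything stated in the paper highlighted here, so treat the functional form, the default p = 6, and all names and input values as assumptions; it is not the GRRM17 implementation.

```python
import math

def afir_penalty(frag_a, frag_b, radii_a, radii_b, alpha, p=6):
    """Artificial-force term between two fragments (assumed AFIR form).

    frag_a, frag_b   : lists of (x, y, z) atomic positions
    radii_a, radii_b : covalent radii of the atoms, same order and units
    alpha            : force strength; positive pushes the fragments
                       together, negative pulls them apart
    """
    numerator = 0.0
    denominator = 0.0
    for pos_i, rad_i in zip(frag_a, radii_a):
        for pos_j, rad_j in zip(frag_b, radii_b):
            r_ij = math.dist(pos_i, pos_j)
            # Close pairs dominate the weighted average distance.
            w_ij = ((rad_i + rad_j) / r_ij) ** p
            numerator += w_ij * r_ij
            denominator += w_ij
    return alpha * numerator / denominator

# Hypothetical two-atom check: with a single pair the weights cancel
# and the penalty is simply alpha times the interatomic distance.
single_pair = afir_penalty([(0.0, 0.0, 0.0)], [(0.0, 0.0, 2.0)], [0.7], [0.7], alpha=1.5)
print(single_pair)
```

Minimising E(Q) plus this term with a positive alpha drives the two groups towards each other, and the path followed gives approximate transition states and products that are then refined without the artificial force.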

This yielded 288 equilibrium structures and 309 transition states. They selected NH3 + :CF2 + CO2 for further study because 1) the reaction is predicted to be very exothermic (i.e. high yield), 2) it has a low barrier, and 3) NH3 and CO2 are readily available.

:CF2 can be generated by a variety of methods and the authors initially chose Me3SiCF3, which generates CF3-, which in turn dissociates to :CF2 and F-. They then generated the reaction network for NH3 + CF3- + CO2 and performed a kinetic analysis, which predicted that "the calculated yield of difluoroglycine is almost zero because the equilibrium between CF3- and :CF2 + F- favours the former. As a result, CF3CO2-, in which CF3- is directly bound to CO2, was obtained as the main product (99.8%)."

A similar analysis was performed for NH3 + CF2Br- + CO2, which predicted a higher yield for difluoroglycine, but also a minor by-product NH2CO2CHF2 due to proton transfer from NH3 to :CF2. Thus, to increase the yield, the authors repeated the analysis for NMe3 + CF2Br- + CO2, which predicted a >99% yield.

Finally, the predicted synthetic route was tested experimentally and the reaction conditions (such as solvent, temperature, and silane activator) were optimised, resulting in a 96% yield. However, it was only possible to purify the ester.

This is the first study I have seen where a synthetic route (of an admittedly very simple molecule) is predicted from DFT calculations. Hopefully the first of many. However, as the authors note, "It is undeniable that the experience and intuition of chemists, or even luck, contributed to appropriate choices being made."

The calculations were performed with the GRRM17 program. It appears to be free, but I don't believe it is open source.

This work is licensed under a Creative Commons Attribution 4.0 International License.