Tuesday, October 24, 2017

An Atomistic Fingerprint Algorithm for Learning Ab Initio Molecular Force Fields

Yu-Hang Tang, Dongkun Zhang, and George Em Karniadakis, arXiv:1709.09235
Contributed by Jesper Madsen

Modeling the potential energy landscapes of complex atomic environments is challenging. Conventional interatomic potentials are very useful when the potential energy surface is well approximated by an appropriate smooth function of the nuclear coordinates. However, choosing a functional form that is too simple and closed comes with severe limitations, because the true potential energy surface may not be (easily) decomposable.

Instead of committing to an explicit functional form, one can represent and compare atomistic neighborhoods using continuous density fields, formed by superimposing a smoothing kernel on each atom of the configuration. Here, I highlight a recent example of such a method, the Density-Encoded Canonically Aligned Fingerprint (DECAF).

Figure 1: (A) “Two 1D density profiles, ρ1 and ρ2, are generated from two different atomistic configurations using atom-centered smoothing kernel functions. The ‘distance’ between them is measured as the L2 norm of their difference, which corresponds to the highlighted area in the middle plot.” (B) “Shown here is a 2D density field using smoothing kernels whose widths depend on the distances of the atoms from the origin. Darker shades indicate higher density.” 
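To make the idea in Fig. 1A concrete, here is a minimal Python sketch of two 1D density profiles and the L2 distance between them. It is an illustration only, not the DECAF algorithm itself: the Gaussian kernel, the fixed width of 0.3, the atom positions, and the function names `density_profile` and `l2_distance` are all assumptions chosen for this example.

```python
import numpy as np

def density_profile(positions, grid, width=0.3):
    """Sum normalized Gaussian kernels centered on each atom (assumed kernel)."""
    positions = np.asarray(positions, dtype=float)
    diff = grid[:, None] - positions[None, :]
    kernels = np.exp(-0.5 * (diff / width) ** 2) / (width * np.sqrt(2.0 * np.pi))
    return kernels.sum(axis=1)

def l2_distance(rho_a, rho_b, grid):
    """L2 norm of the difference, approximated by a Riemann sum on a uniform grid."""
    dx = grid[1] - grid[0]
    return np.sqrt(np.sum((rho_a - rho_b) ** 2) * dx)

# Two hypothetical 1D configurations with three atoms each.
grid = np.linspace(-5.0, 5.0, 2001)
rho1 = density_profile([-1.0, 0.5, 2.0], grid)
rho2 = density_profile([-1.2, 0.7, 1.5], grid)

d12 = l2_distance(rho1, rho2, grid)  # distance between different configurations
d11 = l2_distance(rho1, rho1, grid)  # distance of a configuration to itself
print(d12, d11)
```

Because the comparison is made between smooth fields rather than between lists of coordinates, the distance does not depend on the order in which the atoms are listed and varies continuously as atoms move.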

The preprint by Tang et al. describes the DECAF algorithm (Fig. 1) and briefly reviews and critically compares it with similar methods from the recent literature [such as Smooth Overlap of Atomic Positions (SOAP), the Coulomb Matrix, Graph Approximated Energy (GRAPE), and Atom-Centered Symmetry Functions].

The work rests on the key idea of splitting conventional functional forms into two separate problems, one of representation and one of interpolation, which appears particularly powerful. Molecular fingerprint algorithms such as DECAF show promise for representing atomic neighborhoods faithfully in a form suited to kernel regression methods. All the tools and analyses of modern statistics come into play, but open questions remain. For instance, it is not clear which smoothing kernel, distance metric, and so on is best for relating atomic configurations to one another, either in general or in specific situations. It is conceivable that no one-size-fits-all option exists. Furthermore, there will, as always, be tradeoffs between resolution and computational cost. For an introductory discussion of these topics, the preprint by Tang et al. (and the references within) is a good place to start.
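The split between representation and interpolation can be sketched in a few lines of Python. This is a toy under stated assumptions, not the regression scheme of the preprint: the fingerprint is a plain 1D Gaussian density, the interpolation is generic kernel ridge regression with a Gaussian kernel on the L2 distance, and `toy_energy`, the length scale, and the ridge term are hypothetical choices made for illustration.

```python
import numpy as np

def fingerprint(positions, grid, width=0.3):
    """Representation step: 1D density field from Gaussian kernels (assumed form)."""
    diff = grid[:, None] - np.asarray(positions, dtype=float)[None, :]
    return np.exp(-0.5 * (diff / width) ** 2).sum(axis=1)

def kernel_matrix(fa, fb, dx, length_scale=1.0):
    """Gaussian kernel on the squared L2 distance between fingerprints."""
    d2 = ((fa[:, None, :] - fb[None, :, :]) ** 2).sum(axis=2) * dx
    return np.exp(-0.5 * d2 / length_scale ** 2)

def toy_energy(r):
    """Hypothetical Lennard-Jones-like pair energy used as reference data."""
    return 4.0 * (r ** -12 - r ** -6)

grid = np.linspace(-3.0, 3.0, 601)
dx = grid[1] - grid[0]

# Training set: two atoms separated by r, placed symmetrically about the origin.
r_train = np.linspace(0.95, 2.5, 30)
F_train = np.array([fingerprint([-r / 2, r / 2], grid) for r in r_train])
E_train = toy_energy(r_train)

# Interpolation step: kernel ridge regression with a small ridge term.
K = kernel_matrix(F_train, F_train, dx)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(r_train)), E_train)

# Predict energies for separations that are not in the training set.
r_test = np.array([1.12, 1.6])
F_test = np.array([fingerprint([-r / 2, r / 2], grid) for r in r_test])
E_pred = kernel_matrix(F_test, F_train, dx) @ alpha
print(E_pred, toy_energy(r_test))
```

Swapping the smoothing kernel, the distance metric, or the regression kernel changes only one function each, which is what makes it practical to compare the choices discussed above.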