Tuesday, December 31, 2019

Schrödinger-ANI: An Eight-Element Neural Network Interaction Potential with Greatly Expanded Coverage of Druglike Chemical Space

James M. Stevenson, Leif D. Jacobson, Yutong Zhao, Chuanjie Wu, Jon Maple, Karl Leswing, Edward Harder, and Robert Abel (2019)
Highlighted by Jan Jensen

There are a lot of ML models trained on HCNO data sets. This is fine for proof-of-concept studies, but severely limits applications to real-world problems. For example, HCNO-only molecules comprise only 46% of the molecules in the CHEMBL database of drug-like molecules.

The main problem with extending these methods to other elements is that the size of the chemical space grows non-linearly with respect to the number of different elements. Furthermore, there aren't any generally available and comprehensive sets of molecules similar to QMx or GDB-x.

The current study extends the ANI-1 method to H, C, N, O, S, F, Cl, and P, which covers 94% of CHEMBL. The authors use a combination of stochastic sampling and extensive pre-screening to distill the training/validation set to only 10 million DFT single point calculations on relatively small molecules, which took "just a few days and at a very reasonable compute cost".
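Coverage numbers like these come down to a simple element filter over the database. The sketch below uses a handful of made-up SMILES strings in place of an actual CHEMBL export, with a deliberately crude regex that ignores hydrogens, charges, and isotopes (so "HCNO-only" reduces to "elements ⊆ {C, N, O}"):

```python
import re

# Hypothetical mini-dataset standing in for a CHEMBL SMILES export.
SMILES = ["CCO", "c1ccccc1Cl", "CC(=O)N", "CS(=O)(=O)c1ccc(F)cc1", "OP(=O)(O)OC"]

# Two-letter symbols must be matched before one-letter ones (Cl before C).
ELEMENT_RE = re.compile(r"Cl|Br|Si|Se|[BCNOPSFI]|b|c|n|o|p|s")

def elements(smiles):
    """Crude element extraction from a SMILES string (ignores H, charges, isotopes)."""
    return {sym.capitalize() for sym in ELEMENT_RE.findall(smiles)}

def coverage(allowed):
    """Fraction of molecules whose elements all fall within the allowed set."""
    covered = sum(1 for s in SMILES if elements(s) <= allowed)
    return covered / len(SMILES)

print(coverage({"C", "N", "O"}))                       # HCNO-only coverage: 0.4
print(coverage({"C", "N", "O", "S", "F", "Cl", "P"}))  # eight-element coverage: 1.0
```

For the toy set, 2 of 5 molecules are HCNO-only while all 5 fall within the eight-element set, mirroring the 46% vs 94% jump reported for CHEMBL.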

The main focus was on relative conformer energies, since the bulk of CPU time for many studies is typically spent on conformational searches. The RMSE for this data is 0.70 kcal/mol relative to DFT, which is quite impressive.

As the name suggests, the work was done at Schrödinger, so the method is not open-sourced. However, an earlier version covering four elements is available here. More importantly, the methodology behind the dataset generation is well described and appears to be practically feasible for academic labs.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, November 27, 2019

Unifying machine learning and quantum chemistry with a deep neural network for molecular wavefunctions

K. T. Schütt, M. Gastegger, A. Tkatchenko, K.-R. Müller & R. J. Maurer  (2019)
Highlighted by Jan Jensen

Figure 2 from the paper. © The Authors 2019. Reproduced under the CC-BY license.

This paper has received a lot of attention, so I had to see what the fuss was about. The method (SchNOrb) uses a deep neural network to create a Hamiltonian matrix from 3D coordinates, which can then be diagonalised to yield orbital energies and orbitals. SchNOrb is trained to reproduce Hamiltonian and overlap matrix elements, total energies (computed in the ML model as the sum of orbital energies), and gradients, all taken from AIMD trajectories.
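The diagonalisation step is just the generalized symmetric eigenvalue problem HC = SCε familiar from Hartree-Fock theory. A minimal sketch, using random symmetric matrices in place of the network-predicted Hamiltonian and overlap (the real SchNOrb matrices are of course not random):

```python
import numpy as np
from scipy.linalg import eigh

# Toy stand-ins for the network-predicted Hamiltonian and overlap matrices;
# S is made positive definite, as a real overlap matrix must be.
rng = np.random.default_rng(0)
n = 6  # number of atomic orbitals
H = rng.normal(size=(n, n)); H = 0.5 * (H + H.T)
A = rng.normal(size=(n, n)); S = A @ A.T + n * np.eye(n)

# Solve H C = S C eps: orbital energies eps and MO coefficients C.
eps, C = eigh(H, S)

# The orbitals come out S-orthonormal: C^T S C = I.
print(np.allclose(C.T @ S @ C, np.eye(n)))  # True
```

Once the orbitals are in hand, other one-electron properties follow without additional parameterisation, which is the selling point over energy-only models.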

So it's a bit like the ANI-1 method except that you also get orbitals, which can be used to compute other properties without additional parameterisation. One crucial difference though is that, as far as I can tell, SchNOrb is parameterised for each molecule separately. 

The model uses about 93 million parameters for a >100 AO Hamiltonian and requires 25,000 structures and 80 GPU hours to train. Once trained it can predict an energy with meV accuracy in about 50 ms on a GPU.

The software is "available upon request".

The reviews are made available and are well worth reading.

Thursday, October 31, 2019

The minimum parameterization of the wave function for the many-body electronic Schrödinger equation. I. Theory and ansatz

Lasse Kragh Sørensen (2019)
Highlighted by Jan Jensen

It is well known that Full CI (FCI) scales exponentially with the basis set size. However, Sørensen claims that the "whole notion of the exponentially scaling nature of quantum mechanics simply stems from expanding the wavefunction in a sub-optimal basis", i.e. one-electron functions. Sørensen goes on to argue that if two-electron functions (geminals) are used instead, the scaling is reduced to m(m−1)/2, where m is the number of basis functions. Furthermore, because "the number of parameters is independent of the electronic structure the multi-configurational problem is a mere phantom summoned by a poor choice of basis for the wave function".
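To get a feel for the claimed scaling, one can compare the number of determinants in a conventional FCI expansion with the m(m−1)/2 geminal coefficients. The counting below is a sketch: half filling is an arbitrary illustrative choice, and the FCI count is the usual product of binomial coefficients for the α and β electrons:

```python
from math import comb

def fci_determinants(m, n_alpha, n_beta):
    """Number of Slater determinants in an FCI expansion over m spatial orbitals."""
    return comb(m, n_alpha) * comb(m, n_beta)

def geminal_params(m):
    """The m(m-1)/2 two-electron (geminal) coefficients claimed in the paper."""
    return m * (m - 1) // 2

for m in (10, 20, 40):
    n = m // 2  # half filling, purely for illustration
    print(m, fci_determinants(m, n, n), geminal_params(m))
```

Already at m = 20 the determinant count is in the tens of billions while the geminal count is 190, which is exactly the gap the paper's argument hinges on.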

I don't have the background to tell whether the arguments in this paper are correct, and the main point of this post is to see if I can get some feedback from people who do have the background.

In principle one could simply compare the new approach to FCI calculations but the new method isn't quite there yet:
A straight forward minimization of Eq. 84 unfortunately gives the solution for the two-electron problem of a +(N−2) charged system N/2 times so additional constraints must be introduced. These constraints can be found by the property of the wave function in limiting cases. The problem of finding the constraints for the geminals in the AGP ansatz is therefore closely related to finding the N-representability constraints for the two-body reduced density matrix (2-RDM). For the N-representability Mazziotti have showed a constructive solution[74] though the exact conditions are still illusive.[75, 76]  

Sunday, September 29, 2019

Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning

Frank Noé, Simon Olsson, Jonas Köhler, Hao Wu (2019)
Highlighted by Jan Jensen

Figure 1A from the paper (Copyright © 2019 The Authors, some rights reserved)

The paper presents a novel method for predicting free energy differences much more efficiently. Currently, this is usually done by running MD simulations and observing how often each state is visited along the trajectory. However, transitions between the states are rare, which means very long and costly trajectories, even when using various tricks to force the transitions.

The solution Noé et al. present is to "train a deep invertible neural network to learn a coordinate transformation from x to a so-called “latent” representation z, in which the low-energy configurations of different states are close to each other and can be easily sampled."

For example, the user supplies a few examples of each state [px(x)] and trains the NN to find a new set of variables (z) with a much simpler probability distribution [pz(z)], by minimising the difference using the Kullback-Leibler divergence as a loss function.

Since the NN is invertible, one can now sample repeatedly from pz(z) to get a more accurate px(x), which can then be reweighted to give the Boltzmann distribution. Since pz(z) is a simple Gaussian, most of the sampled structures will have a high Boltzmann probability, so you don't have to sample that many structures.

The main technical advance is the use of an invertible NN, which allows you to go both from x to z and from z to x. This is done by using an NN architecture that allows only simple mathematical operations (addition and multiplication) that can be reversed (subtraction and division).
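A minimal sketch of such an invertible layer, in the spirit of the affine coupling layers used in this kind of normalising flow. Here the scale and shift "networks" s() and t() are toy fixed functions, not trained models; the point is that the layer is exactly invertible regardless of what s() and t() are:

```python
import numpy as np

def s(h):  # stand-in for a learned scale network
    return 0.1 * np.tanh(h)

def t(h):  # stand-in for a learned shift network
    return 0.5 * np.sin(h)

def forward(x):
    """x -> z: one half of the variables is scaled/shifted by functions of the other half."""
    x1, x2 = np.split(x, 2)
    z1 = x1
    z2 = x2 * np.exp(s(x1)) + t(x1)  # multiply and add ...
    return np.concatenate([z1, z2])

def inverse(z):
    """z -> x: the same functions s and t, applied with division and subtraction."""
    z1, z2 = np.split(z, 2)
    x1 = z1
    x2 = (z2 - t(z1)) * np.exp(-s(z1))  # ... so divide and subtract inverts it exactly
    return np.concatenate([x1, x2])

x = np.array([0.3, -1.2, 0.7, 2.1])
assert np.allclose(inverse(forward(x)), x)  # exact round trip
```

Stacking many such layers, alternating which half is transformed, gives an expressive yet exactly invertible map between configuration space and the latent Gaussian.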

It would be very interesting to see if a similar approach can be used for inverse design.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, September 25, 2019

Deflate to Understand Complex Molecular Kinetics

Contributed by Jesper Madsen

Dimensionality reduction is at the core of understanding and making intuitive sense of complex dynamic phenomena in chemistry.  It is usually assumed that the slowest mode is the one of primary interest; however, it is critical to realize that this is not always so! A conceptual example hereof is a protein folding simulation (Lindorff-Larsen et al. Science 334, 517-520, 2011) where the slowest dynamical mode is not the folding itself (see Figure). What is the influence, then, of “non-slowest” modes in this process and how can it most appropriately be elucidated?

FIG: Figure 2 from the preprint: "(A) Sampled villin structures from the MD trajectory analyzed. Helical secondary structure is colored and coils are white. Each image represents five structures sampled from similar locations in TIC space as determined by a 250-center k-means model built upon the first three original TICs. The purple structure represents the folded state, and the blue structure represents the denatured state. The green structure is a rare helical misfolded state that we assert is an artifact. (B) Two-dimensional histograms for TICA transformations constructed from villin contact distances. Dashed lines indicate the regions corresponding to the sampled structures of the same color. The first TIC tracks the conversion to and from the rare artifact only. The second TIC tracks the majority of the folding process and correlates well with RMSD to the folded structure."

This work by Husic and Noé shows how deflation can provide an answer to these questions. Technically speaking, deflation is a collection of methods for modifying a matrix, once its largest eigenvalue is known, in order to find the rest. In their example of the folding simulation, the dominant Time-lagged Independent Component (TIC) encapsulates the "artifact" variation that we are not really interested in. Thus, a kinetic (Markov-state) model constructed from it will be contaminated in several undesirable ways, as discussed by the authors in great detail.

In principle, this should be a very common problem since chemical systems have complex Hamiltonians. Perhaps the reason why we don’t see it discussed more is that ultra-rare events – real or artifact – may not usually be sampled during conventional simulations. So, with the increasing computational power available to us, and simulations approaching ever-longer timescales, this is likely something that we need to be able to handle. This preprint describes well how one can think about attacking these potential difficulties.   
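The core idea of deflation can be illustrated on a toy symmetric matrix: once the dominant eigenpair is found, subtract its contribution and iterate again. This is a generic Hotelling-deflation sketch, not the authors' actual TICA machinery, but the analogy holds: the first eigenpair plays the role of the uninteresting slow TIC that gets projected out.

```python
import numpy as np

def power_iteration(A, iters=500, seed=0):
    """Dominant eigenpair of a symmetric matrix via repeated multiplication."""
    v = np.random.default_rng(seed).normal(size=A.shape[0])
    for _ in range(iters):
        v = A @ v
        v /= np.linalg.norm(v)
    return v @ A @ v, v  # Rayleigh quotient and eigenvector

A = np.diag([5.0, 3.0, 1.0])  # toy symmetric matrix with a known spectrum
lam1, v1 = power_iteration(A)

# Hotelling deflation: remove the converged component from the matrix,
# so the next-largest eigenvalue becomes dominant.
A_deflated = A - lam1 * np.outer(v1, v1)
lam2, v2 = power_iteration(A_deflated)

print(round(lam1, 6), round(lam2, 6))  # 5.0 3.0
```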

Saturday, August 31, 2019

Physical machine learning outperforms "human learning" in Quantum Chemistry

Highlighted by Jan Jensen

Figure 1 from the paper

The paper presents a method to estimate DFT or CCSD(T) energies (computed using large basis sets) based only on HF/cc-pVDZ densities and energies. In order to avoid overfitting, the method must also estimate the corresponding DFT or CCSD densities. The model is first trained on DFT data (89K molecules) and then retrained on CC data for a smaller subset (3.6K molecules), both subsets of the QM9 data set (i.e. relatively small molecules). The input density is evaluated on a 3D grid that is big enough to accommodate the largest molecule in the data set, so a new model would have to be trained for significantly larger molecules.

This “physical machine learning in Quantum Chemistry” (PML-QC) model reaches a mean absolute error in energies of molecules with up to eight non-hydrogen atoms as low as 0.9 kcal/mol relative to CCSD(T) values, which is quite impressive. In fact the authors speculate that 
With ML, it may become not required that an accurate quantum chemical method works fast enough for every new molecule that an end user may be interested in. Instead, the focus shifts to generating highly accurate results only for a finite dataset to be used for training, while the efficiency in practical applications is to be achieved via improvements in DNNs to make them faster and more accurate.
Much will depend on how much the number of outliers can be reduced. For example, for PML-QCDFT ca 5% of molecules have errors greater than 2.6 kcal/mol, and this effect can be magnified for relative energies if the individual errors have opposite signs.
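The sign-cancellation point is easy to illustrate with made-up numbers: two individual errors of 2.6 kcal/mol with opposite signs combine to a 5.2 kcal/mol error in the relative energy.

```python
# Hypothetical total energies in kcal/mol, chosen only to show the effect:
# the model overshoots molecule A by 2.6 kcal/mol and undershoots B by 2.6.
e_pred_A, e_true_A = 100.0 + 2.6, 100.0
e_pred_B, e_true_B = 150.0 - 2.6, 150.0

# Error in the relative energy B - A: the individual errors add up.
rel_error = (e_pred_B - e_pred_A) - (e_true_B - e_true_A)
print(round(rel_error, 6))  # -5.2, double the individual error
```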

Wednesday, July 31, 2019

Popular Integration Grids Can Result in Large Errors in DFT-Computed Free Energies

Highlighted by Jan Jensen

 Figure 1B from the paper (CC BY-NC-ND 4.0)

 This paper has already been highlighted here and here, so I'll just briefly summarise.

The grid used for the numerical integration in DFT calculations is defined relative to the Cartesian axes, so rotating the molecule will change the integration grid and, hence, the energy. This has been known for some time, and, e.g., Gaussian09 uses a default grid size (Fine, 75,302) for which the resulting variation in the electronic energy is usually negligible.

Bootsma and Wheeler show that the vibrational entropy and, hence, the free energy are significantly more sensitive to the grid size than the electronic energy. Using the Fine grid, the differences in relative free energies can be as large as 4 kcal/mol, which could significantly change conclusions regarding mechanisms, etc. The effect comes from the variation in low-frequency vibrational modes, and it can be reduced somewhat by scaling these frequencies.

However, the errors really only become acceptable when using the UltraFine grid size, which is the default in Gaussian16, especially combined with frequency scaling (which one should do anyway to get consistent results). If you are using Gaussian09 or some other quantum program to compute relative free energies it is definitely a good idea to look at the default grid size and perform some tests.

Note that if you want to perform such tests yourself, you need to re-optimise the molecule after you rotate it because the gradient is also affected by the rotation.
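A sketch of the coordinate-rotation part of such a test (the water geometry and rotation angle are made up; the re-optimisation and frequency job would then be run on the rotated geometry in the quantum chemistry program itself):

```python
import numpy as np

def rotation_matrix_z(theta):
    """Rigid rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

coords = np.array([  # water, Angstrom (illustrative geometry)
    [0.000,  0.000,  0.117],
    [0.000,  0.757, -0.469],
    [0.000, -0.757, -0.469],
])

rotated = coords @ rotation_matrix_z(np.deg2rad(30)).T

# A rigid rotation preserves all interatomic distances, so the molecule is
# physically unchanged; only its orientation relative to the grid differs.
dmat = lambda x: np.linalg.norm(x[:, None] - x[None, :], axis=-1)
assert np.allclose(dmat(coords), dmat(rotated))
```

Any free energy difference between the original and rotated (and re-optimised) geometries is then purely a grid artifact.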