Wednesday, November 30, 2016

ANI-1: An extensible neural network potential with DFT accuracy at force field computational cost

Justin S. Smith, Olexandr Isayev, Adrian E. Roitberg (2016)
Contributed by Jan Jensen

This paper basically presents a neural network force field, which the authors call a neural network potential (NNP).  The authors heavily modify the Behler-Parinello symmetry functions (also used in this CCH) to improve the transferability and train it against 13.8 million ωB97X/6-31G(d) energies computed for CHON-containing molecules with 8 or less non-hydrogen atoms. This huge training set made it possible to parameterise a neural net with three hidden layers with a total of 320 nodes and 124,033 optimisable parameters.  Deep learning indeed.  

What makes this work particularly exiting is that the NNP appears to be transferable to larger molecules. For example, the figure above shows that the NNP can reproduce the relative ωB97X/6-31G(d) energies of retinol conformers with en RMSE of 0.6 kcal/mol.  For comparison the corresponding value for DFTB (not clear if it's DFTB2 or DFTB3) is 1.2 kcal/mol, although ωB97X/6-31G(d) is not the definitive reference by which to judge DFTB accuracy.

I think this work holds a lot of promise. One of the key challenges is to reduce the size of the training set to a point where high level calculations can be used to compute the energies. Alternatively, perhaps approaches like ∆-machine learning can be used to correct the NNP using a smaller representative training set.

This work is licensed under a Creative Commons Attribution 4.0

Wednesday, November 16, 2016

Calculation of NMR Spin–Spin Coupling Constants in Strychnine

Helgaker, T.; Jaszuński, M.; Świder, P. J. Org. Chem. 2016
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Helgaker, Jaszunski, and Swider1 have examined the use of B3LYP with four different basis sets to compute the spin-spin coupling constants in strychnine 1.

They used previously optimized coordinates of the two major conformations of strychnine, shown in Figure 1.

Conformer A

Conformer B
Figure 1. Confrmations of strychnine 1.

They tested four basis sets designed for NMR computations: pcJ-0,2 pcJ-1,2 6-31G-J,3 and 6-311G-J.3 pCJ-0 and 6-31G-J are relatively small basis sets, while the other two are considerably larger.

All four basis sets provide values of the 122 J(C-H) with a root mean square deviation of less than 0.6 Hz. J(HH) and J(CC) coupling constants are also well predicted, especially with the larger pcJ-1 basis set. They also examined the four Ramsey terms in the coupling model. The Fermi contact term dominates, and if the large pcJ-1 basis set is used to calculate it, and the smaller pcJ-0 basis set is used for the other three terms, the RMS error only increases from 0.18 to 0.20 Hz. Taking this to the extreme, they omitted calculating any of the non-Fermi contact terms, with again only small increases in the RMS – even with the small pcJ-0 basis set. Considering the computational costs, one should seriously consider whether the non-Fermi contact terms and a small basis set might be satisfactory for your own problem(s) at hand.


1) Helgaker, T.; Jaszuński, M.; Świder, P., "Calculation of NMR Spin–Spin Coupling Constants in Strychnine." J. Org. Chem. 2016, ASAP, DOI: 10.1021/acs.joc.6b02157.
2) Jensen, F., "The Basis Set Convergence of Spin−Spin Coupling Constants Calculated by Density Functional Methods." J. Chem. Theor. Comput. 2006, 2, 1360-1369, DOI: 10.1021/ct600166u.
3) Kjær, H.; Sauer, S. P. A., "Pople Style Basis Sets for the Calculation of NMR Spin–Spin Coupling Constants: the 6-31G-J and 6-311G-J Basis Sets." J. Chem. Theor. Comput. 2011, 7, 4070-4076, DOI: 10.1021/ct200546q.


Strychnine 1: InChI=1S/C21H22N2O2/c24-18-10-16-19-13-9-17-21(6-7-22(17)11-12(13)5-8-25-16)14-3-1-2-4-15(14)23(18)20(19)21/h1-5,13,16-17,19-20H,6-11H2/t13-,16-,17-,19-,20-,21+/m0/s1

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Sunday, October 30, 2016

Automatic chemical design using a data-driven continuous representation of molecules

Rafael Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik (2016)
Contributed by Jan Jensen

Chemical space is discrete which makes it hard to search with standard techniques such as gradient-based minimisation.  This paper used a standard machine learning tool called an autoencoder to help solve that problem.  One way to think of an autoencoder is as a data-compressor where one neural network is trained to describe a data set such as an image in some compressed representation and another network is trained to recover the image from the compressed format.

The interesting thing in the context of chemical space is that the compressed format can be a continuous function such as a real-valued vector (latent space). (Another use of autoencoders is dimensionality reduction for data visualization, e.g. as an alternative to principal component analysis.)  This latent space is therefore a continuous representation of the chemical space (a set of SMILES strings) that the autoencoder was trained on.  Another neural net can then be trained to map some chemical property, such as logP values, on this latent space and the space can be searched for regions with desired logP values with techniques as simple as interpolation.

One problem with autoencoders is that they are "lossy" which in this case translates to the fact that not all points in latent space can be decoded to a valid molecule (SMILES string) but the failure rate is relatively low for the two proof-of-concept applications in the paper.

This is a very interesting new tool in the hunt for molecules with new properties.

This work is licensed under a Creative Commons Attribution 4.0

Wednesday, October 26, 2016

More examples of structure determination with computed NMR chemical shifts

Nguyen, Q. N. N.; Tantillo, D. J. “Using quantum chemical computations of NMR chemical shifts to assign relative configurations of terpenes from an engineered Streptomyces host,” J. Antibiotics 2016, 69, 534–540
Khokhar, S.; Pierens, G. K.; Hooper, J. N. A.; Ekins, M. G.; Feng, Y.; Rohan A. Davis, R. A. “Rhodocomatulin-Type Anthraquinones from the Australian Marine Invertebrates Clathria hirsuta and Comatula rotalaria,” J. Nat. Prod., 2016, 79, 946–953
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Use of computed NMR chemical shifts in structure determination is really growing fast. Presented here are a couple of recent examples.

Nguyen and Tantillo used computed chemical shifts with the DP4 analysis to identify the structure of three terpenes 1-3.1 They optimized the geometries of all of the diastereomers of each compound, along with multiple conformations of each diastereomer, at B3LYP/6-31+G(d,p) and then computed the chemical shifts at SMD(CHCl3)–mPW1PW91/6-311+G(2d,p). The chemical shifts were Boltzmann weighted including all conformations within 3 kcal mol-1 of the lowest energy structure.

For 1, the DP4 analysis using just the proton shifts predicted a different isomer than using the carbon shifts, but when combined, DP4 predicted the structure, with 98.8% confidence, shown in the scheme above, and in Figure 1. For 2, the combined proton and carbon shift analysis with DP4 indicated a 100% confidence of the structure shown in the scheme and Figure 1. Lastly, for 3, which is more complicated due to the conformations of the 9-member ring, DP4 predicts with 100% confidence the structure shown in the scheme and Figure 1.



Figure 1. Optimized geometries of 1-3.

Feng, Davis and coworkers have examined a series of anthroquionones from Australian marine sponges.2The structure of one compound was a choice of two options: 4 or 5. Initial geometries were obtain by molecular mechanics and the low energy isomers were then reoptimized at B3LYP/6-31+G(d,p). The chemical shifts were computed using PCM/MPW1PW91/6-311+G(2d,p). Application of the DP4 method indicate the structure to be 4 with a 100% confidence level. The lowest energy conformer of 4 is shown in Figure 2.

Figure 2. Optimized geometry of 4.


1) Nguyen, Q. N. N.; Tantillo, D. J. “Using quantum chemical computations of NMR chemical shifts to assign relative configurations of terpenes from an engineered Streptomyces host,” J. Antibiotics 201669, 534–540, DOI: 10.1038/ja.2016.51.
2) Khokhar, S.; Pierens, G. K.; Hooper, J. N. A.; Ekins, M. G.; Feng, Y.; Rohan A. Davis, R. A. “Rhodocomatulin-Type Anthraquinones from the Australian Marine Invertebrates Clathria hirsuta andComatula rotalaria,” J. Nat. Prod., 2016, 79, 946–953, DOI: 10.1021/acs.jnatprod.5b01029.


1: InChI=1S/C15H24/c1-10-5-6-15(4)8-11-7-14(2,3)9-12(11)13(10)15/h9-11,13H,5-8H2,1-4H3/t10-,11+,13-,15+/m1/s1
2: InChI=1S/C15H24/c1-10-5-6-15(4)8-11-7-14(2,3)9-12(11)13(10)15/h5,11-13H,6-9H2,1-4H3/t11-,12-,13+,15-/m0/s1
3: InChI=1S/C20H32/c1-14-6-9-18-19(3,4)10-11-20(18,5)13-17-15(2)7-8-16(17)12-14/h6,13,15-16,18H,7-12H2,1-5H3/b14-6-,17-13-/t15-,16-,18-,20+/m0/s1
4: InChI=1S/C18H14O7/c1-7(19)13-10(20)6-11(21)15-16(13)17(22)9-4-8(24-2)5-12(25-3)14(9)18(15)23/h4-6,20-21H,1-3H3
5: InChI=1S/C18H14O7/c1-7(19)13-10(20)6-11(21)15-16(13)14-9(17(22)18(15)23)4-8(24-2)5-12(14)25-3/h4-6,20-21H,1-3H3

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Wednesday, October 12, 2016

Expanding DP4: application to drug compounds and automation

Ermanis, K.; Parkes, K. E. B.; Agback, T.; Goodman, J. M. Org. Biomol. Chem., 2016, 14, 3943-3949
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Computational chemistry has had a remarkable impact on the field of structure determination by NMR spectroscopy. The ability to efficiently compute 13C and 1H chemical shifts allows for comparison of the computed chemical shifts of potential structures against the experimental values, a tremendous aid in structure determination (see some examples in previous posts). Goodman and Smith developed the DP4 method1 (see this post) to assist in identifying proper structures by means of statistical distribution of errors and Bayes Theorem.

The Goodman group now reports on workflow solutions to structure prediction using DP4.2 They explore the use of open source computational tools both for predicting conformations and for computing the chemical shifts. They use a set of 10 drugs to test the performance. In general, the original DP4 method works very well in predicting drug structure, despite the fact that DP4 parameters were developed for natural products. The only failure is for simvastatin, where the large number of diastereomers and conformational flexibility prove to be too complex. The open source tools perform just slightly less effectively than the commercial packages, but are certainly a viable route for those with limited resources. The authors also provide a series of python scripts that allow users to create a seamless workflow; these should prove most helpful to the structure determination community.



1) Smith, S. G.; Goodman, J. M. "Assigning Stereochemistry to Single Diastereoisomers by GIAO
NMR Calculation: The DP4 Probability," J. Am. Chem. Soc. 2010132, 12946-12959, DOI:10.1021/ja105035r.
2) Ermanis, K.; Parkes, K. E. B.; Agback, T.; Goodman, J. M. “Expanding DP4: application to drug compounds and automation,” Org. Biomol. Chem.201614, 3943-3949, DOI: 10.1039/c6ob00015k.


Simvastatin: InChI=1S/C25H38O5/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3/t15-,16-,18+,19+,20-,21-,23-/m0/s1

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Thursday, September 29, 2016

Multiscale Quantum Mechanics/Molecular Mechanics Simulations with Neural Networks

Lin Shen, Jingheng Wu, and Weitao Yang (2016)
Contributed by Jan Jensen

There has been a lot of work on estimating high level energies from low level energies, most of which have focussed some kind of interpolation or extrapolation.  However, this has been particularly challenging for QM/MM-MD studies where thousands of semiempirical energies contribute to the PMF but only hundreds of high level calculations are feasible. 

Yang and co-workers offer an interesting solution to this problem by using the high level calculations to train a neutral network to estimate the energy correction, which can then be used to estimate the high level energies for all thousands of geometries from the semiempirical QM/MM-MD.  Thus, the PMF can be computed "from scratch" at the high level rather than correcting the low level PMF.

The approach is based on work by Behler and Parrinello, but with three tweaks geared towards this particular problem.

1. The neural network is trained to reproduce an energy correction rather than a total energy.

2. Mulliken charges are used as input, in addition to atomic coordinates of the QM region, to provide some information about the interaction with the MM environment.

3. A subnet is added to the neutral net that depends on the reaction coordinate and the potential of mean force obtained at the low level to improve accuracy.  

The method is tested for three relatively small systems in water, so that high level PMFs can be computed rigorously for comparison. The results are quite impressive. For example, the free energy of glycine zwiterion is ca 8 kcal/mol higher in energy than the neutral form at the SCC-DFT/MM level but ca 8 kcal/mol lower in free energy at the B3LYP/6-31G(d)//MM level of theory. This latter value is reproduced to within 0.1 kcal/mol by the machine learning approach using only 500 B3LYP/6-31G(d)//MM energies to train the neural net. For comparison, the PMF requires 50,000 energy evaluations.

This work is licensed under a Creative Commons Attribution 4.0

Wednesday, September 28, 2016


Allen, W. D.; Quanz, H.; Schreiner, P. R. J. Chem. Theory Comput. 2016, 12, 4707–4716
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Cyclopropyl rings can be joined together in a spiro fashion to form triangulanes. An interesting topology can be made by joining the rings to form a helical pattern, as shown in the [9]triangulane 1 below. Allen, Quanz, and Schreiner1 have examined the notion of an infinite helical molecule formed in this way.

First, they describe how one can generate the coordinates of such a beast using a closed analytical expression, which is a really nice demonstration of applied geometry. Next, they compute the geometry of a series of [n]triangulanes at M06-2x/6-31G(d). The geometries of [9]triangulane and their largest example, [42]triangulane 2 are shown in Figure 1.


Figure 1. M06-2x/6-31G(d) optimized geometries of 1 and 2.

They show that the geometry of 2 exhibits a structure that has two different C-C distances: one between the spiro carbons, and the second between the spiro carbon and the methylene carbon. The distance between the spiro carbons is rather short (1.458 Å), suggesting that the bonding here is between carbons that are nearly sp2-hybridized.

Lastly, they discuss the thermodynamics of polytriangulane. They employ a series of homodesmotic reactions to attempt to determine the enthalpy for adding another cyclopropyl ring to an extended triangulane. Unfortunately, the computed enthalpy is quite dependent on functional used. Similar attempts to define the strain energy is also flawed in this way. However, regardless of the functional the enthalpy for adding a cyclopropane ring appears to reach an asymptote rather quickly. So, using [3]triangulane they estimate that the strain energy per mole of cyclopropane in triangulane is about 42.7 kcal mol-1, or about 14 kcal mol-1 of strain due to the spiroannulation.


(1) Allen, W. D.; Quanz, H.; Schreiner, P. R. “Polytriangulane,” J. Chem. Theory Comput. 201612, 4707–4716, DOI: 10.1021/acs.jctc.6b00669.


1: InChI=1S/C19H22/c1-2-12(1)5-14(12)7-16(14)9-18(16)11-19(18)10-17(19)8-15(17)6-13(15)3-4-13/h1-11H2/t14-,15-,16-,17-,18-,19-/m0/s1

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.