## Saturday, March 31, 2012

### Finding Density Functionals with Machine Learning

John C. Snyder, Matthias Rupp, Katja Hansen, Klaus-Robert Müller, and Kieron Burke arXiv:1112.5441v1 2012 (Open Access)

The study uses standard machine learning (ML) techniques to construct a density-based (i.e. orbital-free) expression for the electronic kinetic energy $(T^{ML})$ for a model 1-dimensional system of up to four non-interacting electrons subject to a smooth potential.  Sub-kcal/mol accuracy can be achieved with fewer than 100 Gaussian training densities.

Numerical noise in the functional derivative of $T^{ML}$ is removed by a principal component analysis and while variational minimization of the energy fails to find a unique density "the search does not produce a unique minimum, it produces a range of similar but valid approximate densities, each with a small error."

While the application of the method does not require any physical intuition about the form of the functional, physically motivated potentials can be included in the method to improve results.  It will be very interesting indeed to see this and similar approaches being trained against highly accurate electronic structure data computed for for atoms and molecules, not just for the kinetic energy but for the exchange-correlation functional itself.

Acknowledgement: I thank Matteo Cavalleri for bringing this paper to my attention via twitter.

## Wednesday, March 28, 2012

### Synthesis and Properties of [9]Cyclo-1,4-naphthylene: A π-Extended Carbon Nanoring

Akiko Yagi, Yasutomo Segawa, and Kenichiro Itami Journal of the America Chemical Society 2012, 134, 2962–2965 (Paywall)
Contributed by Steven Bachrach.
Reposted from Computational Organic Chemistry with permission

Itami continues to design novel macrocycles containing aromatic rings (see this post). This latest paper reports the synthesis of the first nanohoops containing naphthylenes, namely [9]cyclo-1,4-naphthylene 1. Since the macrocycle contains an odd number of naphthylene units, the lowest energy conformation is of C2 symmetry with one of the naphthylene rings in the plane of the macrocycle. This conformation gives rise to 27 peaks in the proton NMR, and while the value of the computed chemical shifts differ from the experimental ones by about 0.5 to 1 ppm, their relative ordering is in very nice agreement.

Itami also notes that 1 is chiral and computed the barrier for racemization of 19.9 kcal mol-1¸ through the transition state 2. This racemization process is compared with the racemization of 1,1’-binaphthyl.

## Sunday, March 25, 2012

### Linear Scaling Self-Consistent Field Calculations with Millions of Atoms in the Condensed Phase

Joost VandeVondele, Urban Borštnik, and Jürg Hutter Journal of Chemical Theory and Computation 2012, ASAP. (Paywall)

This paper presents orbital-based self-consistent field calculations on three-dimensional dihydrogen and water clusters containing up to two million atoms. The main new feature is the use of a mathematical technique called the matrix sign function to express the density matrix and the inverse overlap matrix. This is combined with more conventional linear techniques: a static mixing approach to solve the self-consistent equation, parallel sparse matrix multiplication, use of plane waves and Fast Fourier Transforms to compute the Coulomb energy, and use of energy functions that do not require Hartree-Fock exchange.

The method is implemented in the open source CP2K/Quickstep software package.

The LDA-DFT/minimial basis set energy of one million dihydrogen molecules is computed in 8 minutes of wall clock time using 46,656 cores of an Cray XT5. For comparison, an LDA-DFT/double zeta single point calculation on a 96,000-atom water cluster (736,000 basis functions) requires 16 h using  46,656 cores.  Results for NDDO and DFT tight binding energy functions are also presented.

It will be very interesting to see how this method performs with heterogeneous systems and GGA functionals.

## Wednesday, March 21, 2012

### Effects of Ethynyl Substitution on Cyclobutadiene

B. J. Esselman, R. J. McMahon Journal of Physical Chemistry A 2012, 116, 483-490 (Paywall)
Contributed by Steven Bachrach.
Reposted from Computational Organic Chemistry with permission

Cyclobutadiene is the prototypical antiaromatic compound. McMahon has examined the effect of ethynyl substitution on this ring, with a long term eye towards the possibility of these types of species being involved in the synthesis of fullerenes.1

All of the possible ethynyl-substituted cyclobutadiene species (1-7) were optimized at B3LYP/6-31G(d) and CCSD/cc-pVDZ in their singlet and triplet states.
The structures of singlet and triplet 7 are shown in Figure 1. The geometries provided by the two different methods are quite similar. They show a rectangular form for the singlets and a delocalized, nearly square ring for the triplets.

The computed singlet-triplet gap decreases with each ethynyl substituent. B3LYP, which overestimates the stability of triplets, predicts that 6 and 7 will be ground state triplets, while CCSD predicts a singlet ground state for all 7 species, with the gap decreasing steadily from 11.5 to 8.2 kcal mol-1, a value that is also probably underestimated.

This change in the singlet-triplet gap reflects a stronger stabilizing effect of each ethynyl group to the cycnobutadiene ring for the triplet than for the singlet state. This is seen in the homodesmotic stabilization energies.

Lastly, NICS(1)zz values are positive for all of the singlets and negative for the triplets. The positive values for the singlets reflect their antiaromatic character, also seen in the alternant bond distances around the ring. The NICS values of the singlets decrease with increasing substitution. The negative NICS values of the triplets reflects aromatic character, as seen in the non-alternant distances around the ring. Interestingly, the triplet NICS values decrease with increasing ethynyl substitution, suggesting decreased aromaticity, even though the homodesmotic reactions suggest increasing stabilization with substitution.

References

(1) Esselman, B. J.; McMahon, R. J., "Effects of Ethynyl Substitution on Cyclobutadiene," J. Phys. Chem. A 2012, 116, 483-490, DOI: 10.1021/jp206478q

## Sunday, March 18, 2012

### A consistent and accurate ab initio parametrization of density functional dispersion correction (DFT-D) for the 94 elements H-Pu

S. Grimme, J. Antony, S. Ehrlich, and H. Krieg Journal of Chemical Physics 2010, 132, 154104 (Paywall)

With 245 Web of Science citations this paper by Grimme and co-workers must be one of the most cited density functional paper published recently.  The dispersion correction to DFT is based on average atomic dipole polarizability tensors at imaginary frequency computed for 227 representative hydrides of the first 94 elements.  These polarizability tensors are computed using TD-DFT using the PBE38 functional and a doubly augmented def2-QZVP basis set.  Hydrides with different atomic hybridization are included to yield hybridization-specific dispersion energies and an interpolation method handles atoms judged to have intermediate hybridization.

The correction consists of $r^{-6}$ and $r^{-8}$ two-body terms and an $r^{-9}$ three-body term, which are all damped at short-range using a damping function.  The method contains a total of two adjusted parameters, which are obtained by fitting to conformational and interaction energies computed using CCSD(T)/CBS for several structural datasets.  Optimized parameters are presented for 11 different functionals.

The method, termed DFT-D3, typically achieves an accuracy that is within 10% of CCSD(T) results based on the data presented in the paper.  However, higher errors were observed for "some chemical reaction energies, in particular those involving many carbon atoms."

A FORTRAN implementation has been made freely available, and the method is already implemented in several software packages such as GAMESS and ADF.

## Wednesday, March 14, 2012

### Numerical Errors and Chaotic Behavior in Docking Simulations

Miklos Feher and Christopher I. Williams, Journal of Chemical Information and Modeling, DOI: 10.1021/ci200598m (paywall)

The authors have made a very interesting study of the effect of numerical problems in molecular docking. Specifically, they have studied the two docking software GOLD and GLIDE, both with three different accuracy settings.

As the authors point out in the conclusion:
"This study clearly demonstrates that seemingly insignificant differences in ligand input, such as small coordinate perturbations or permuting the atom order in an input file, can have a dramatic effect on the final top-scoring docked pose."

The authors have investigated how robust the docking algorithms are with respect to poses and scores for two different input variations. First, very small variations to the input structure (max 0.1 degree of torsional angle changes, total max 0.1 Å RMSD). Second, permutations of atom order in the ligand input files. Both are variations that the normal user would expect to have no effect on the final output of a docking run.

The results are highly interesting and show that the virtual screening settings in the software seem to correlate with low robustness. Hence, using standard settings (or higher) improves robustness for these two methods.
While one could expect that GLIDE which is a deterministic approach (empirical scoring function) perhaps would be less prone to robustness problems than GOLD (genetic algorithm), this does not seem to be the case. The only major difference between the two softwares is that GOLD seem to generate more normal based distributions of scores and RMSDs than GLIDE.

Both small changes in the torsions of the input structures and the permutation of atom order in the input files lead to many cases where the docking protocols simply fail to be robust and can result in vastly different top scoring poses, especially for the low accuracy settings. Thus, applications of docking software with virtual screening settings are more prone to problems with reproducibility, but even with the highest accuracy robustness cannot be guaranteed.

For the future, it would be highly interesting to see if these problems are the same for other docking software, and how they can be alleviated.

[edit: changed the comment on CPU time relative to robustness, since there is no statistical significant difference of the robustness of the two more accurate settings used in the two software]

### Regiolone and Isosclerone, Two Enantiomeric Phytotoxic Naphthalenone Pentaketides: Computational Assignment of Absolute Configuration and Its Relationship with Phytotoxic Activity

A. Evidente, S. Superchi, A. Cimmino, G. Mazzeo, L. Mugnai, D. Rubiales, A. Andolfi, A. M. Villegas-Fernández, European Journal of Organic Chemistry 2011, 5564 (Paywall)
Contributed by Steven Bachrach.
Reposted from Computational Organic Chemistry with permission

It is striking to me that the absolute configuration of relatively simple compounds remains problematic even today. The structure of two naturally-occurring phytotoxic enantiomers 1, called regiolone and isosclerone, are finally definitively defined using a computational approach.

(R)-1: R = OH, R’ = H
(S)-1: R = H, R’ = OH

Isosclerone is the dextrorotatory isomer, while regiolone is the levorotatory isomer. The question though is which one is R and which one is S? Evidente and co-workers arbitrarily decided to compute the spectral properties of the S isomer.1 They located four low energy conformers at B3LYP/6-31G* and B3LYP/TZVP. (These conformers are not shown here as the authors did not deposit the coordinates. Reviewers and editors – please insist that this computational data be mandatory for publication!) The conformer relative energies, listed in Table 1, are dependent on the method, however, the two lowest energy structures will dominate the population and both will be present to a significant extent, regardless of which energy set is used. The optical rotation [α]D was computed at B3LYP/6-31G*//B3LYP/TZVP, and these too are listed in Table 1. The Boltzmann-weighted [α]D is 21.8. Even though the lowest energy conformer contributes a negative rotation, the much larger positive rotation due to the second-lowest energy conformer, along with the two other conformers, will dominate to dictate the OR value. This suggests that the enantiomers are (S)(+)-1 and (R)(-)-1. Computed ECD spectra confirm this assignment; the computed ECD of the (S) isomer is a near mirror image of the experimental ECD of the (-)-1 compound. Therefore, regiolone is (R)(-)-1 and isosclerone is (S)(+)-1.

Table 1. Relative free energies (kcal mol-1) and [α]D of the conformers of (S)-1.a
 conformer ΔG, 6-31G* ΔG, TZVP [α]Db A 0.43 0.0 -17.50 B 0.0 0.32 67.92 C 1.21 1.03 95.72 D 1.84 1.48 17.72
aAll computations performed with B3LYP. bAt B3LYP/6-31G*//B3LYP/TZVP

### References

(1) Evidente, A.; Superchi, S.; Cimmino, A.; Mazzeo, G.; Mugnai, L.; Rubiales, D.; Andolfi, A.; Villegas-Fernández, A. M., "Regiolone and Isosclerone, Two Enantiomeric Phytotoxic Naphthalenone Pentaketides: Computational Assignment of Absolute Configuration and Its Relationship with Phytotoxic Activity," Eur. J. Org. Chem., 2011, 5564-5570, DOI: 10.1002/ejoc.201100941

## Saturday, March 10, 2012

### The Energy Computation Paradox and ab initio Protein Folding

J. C. Faver, M. L. Benson, X. He, B. P. Roberts, B. Wang, M. S. Marshall, C. D. Sherrill, K. M. Merz Jr. PLoS ONE 2011, 6(4): e18868 (Open Access)

Protein structure prediction faces two challenges: sampling the vast conformational space and computing accurate energies for the structures.  Actually, as this paper nicely demonstrates, the latter part of this statement is inaccurate (pun intended): the main challenge is computing precise energies the structures.

The accuracy of the energy reports on the systematic error of the energy function which can be corrected and will largely cancel for related structures, but the precision reports on the random error which cannot be corrected for and must therefore be kept low.

The authors first determine the accuracy and precision of 20 different energy functions ranging from force fields to MP2 calculations with triple zeta basis sets for small gas phase models of 42 van der Waals interactions and 50 hydrogen bonds in the protein ubiquitin by comparison to MP2/CBS [or CCSD(T)/CBS for systems containing aromatic groups] calculations.  From this we learn, for example, that the mean error of interaction is 0.73 kcal/mol for the FF99FB force field and 1.67 kcal/mol for PM6.

FF99FB and PM6, as well as PM6-DH2, are then used to compute energies for the Rosetta decoy set consisting of 49 different proteins.  FF99FB fails to predict the lowest energy for the native structure for 19 proteins, whereas PM6 (and PM6-DH2) identifies the native structure in all cases, despite FF99FB being more accurate.  The reason is that FF99FB, with a variance of 4.04 kcal$^2$/mol$^2$, is less precise than PM6, with a variance of 2.24 kcal$^2$/mol$^2$.  For comparison, PM6-DH2 has a mean error of interaction of only 0.30 kcal/mol and a variance of 1.23 kcal$^2$/mol$^2$.

The authors also convincingly demonstrates how the random error increases with protein size, so that correctly folding larger proteins puts more demands on the precision of the energy function.

Acknowledgment: I thank Kenneth Merz, Jr for helpful discussions.

## Tuesday, March 6, 2012

### Entropic Intermediates and Hidden Rate-Limiting Steps in Seemingly Concerted Cycloadditions. Observation, Prediction, and Origin of an Isotope Effect on Recrossing

O. M. Gonzalez-James, E. E. Kwan, D. A. Singleton  Journal of the American Chemical Society 2012, 134, 1914 (Paywall)
Contributed by Steven Bachrach.
Reposted from Computational Organic Chemistry with permission

Once again the Singleton group reports experiments and computations that require serious reconsideration of our notions of reaction mechanisms.1 In this paper they examine the reaction of dichloroketene with labeled cis-2-butene. With 13C at the 2 position of 2-butene, two products are observed, 1 and 1’, in a ratio of 1’:1 = 0.993 ± 0.001. This is the opposite what one might have imagined based on the carbonyl carbon acting as an electrophile

The first interesting item is that B3LYP/6-31+G** fails to predict the proper structure of the transition state. It predicts an asymmetric structure 2, while MPW1k/6-31+G**, M06, and MP2 predict a Cs transition structure 3. The Cs TS is confirmed by a grid search of M06-2x geometries with CCSD(T)/6-311++G88/PCM(CH2Cl2) energies.

The PES using proper computational methods is bifurcating past TS 3, falling downhill to product 1 or 1’. Lying on the Cs plane is a second transition state that interconverts 1 and 1’. On such a surface, conventional transition state theory would predict equal amounts of 1 and 1’, i.e. no isotope effect! So they must resort to a trajectory study – which would be impossibly long if not for the trick of making the labeled carbon super-heavy – like 28C,44C, 76C and 140C and then extrapolating back to just ordinary 13C. These trajectories indicate a ratio of 1’:1 of 0.990 in excellent agreement with the experimental value of 0.993.

Interestingly, most trajectories recross the TS, usually by reaching into the region near the second TS. However, the recrossing decreases with increasing isotopic mass, and this leads to the isotope effect. It turns out the vibrational mode 3 breaks the Cs symmetry; movement in one direction along mode 3 has no mass dependence but in the opposite direction, increased mass leads to decreased recrossing – or put in another way, in this direction, increased mass leads more often to product.

But one can understand this reaction from a statistical point of view as well. If one looks at the free energy surface, there is a variational TS near 3, but then there is a second set of variational transition states (one leading to 1 and one to 1’) which are associated with the formation of the second C-C bond. In a sense there is an intermediate past 3 that leads to two entropic barriers, one on a path to 1 and one on the path to 1’. RRKM using this model gives a ratio of 0.992 – again in agreement with experiment! It is as Singleton notes “perplexing”; how do you reconcile the statistical view with the dynamical (trajectory) view? Singleton has no full explanation.

Lastly, they point out that a similar situation occurs in the organocatalyzed Diels-Alder reaction of MacMillan shown below.2 (This reaction is also discussed in a previous post.) Now Singleton finds that the “substituent effects, selectivity, solvent effects, isotope effects and activation parameters” are all dictated by a second variational TS far removed from the conventional electronic TS.

References

(1) Gonzalez-James, O. M.; Kwan, E. E.; Singleton, D. A., "Entropic Intermediates and Hidden Rate-Limiting Steps in Seemingly Concerted Cycloadditions. Observation, Prediction, and Origin of an Isotope Effect on Recrossing," J. Am. Chem. Soc. 2011, 134, 1914-1917, DOI: 10.1021/ja208779k

(2) Ahrendt, K. A.; Borths, C. J.; MacMillan, D. W. C., "New Strategies for Organic Catalysis: The First Highly Enantioselective Organocatalytic Diels-Alder Reaction," J. Am. Chem. Soc. 2000, 122, 4243-4244, DOI: 10.1021/ja000092s.

## Saturday, March 3, 2012

### Prediction of cyclin-dependent kinase 2 inhibitor potency using the fragment molecular orbital method

M. P. Mazanetz, O. Ichihara, R. J. Law, M. Whittaker Journal of Cheminformatics 2011, 3:2 (Open Access)

The paper describes a surprisingly accurate method for predicting ligand-protein binding energies based, in part, on MP2/6-31G(d) ligand-protein interactions computed using the Fragment Molecular Orbital (FMO) method.  The correlation between predicted and observed binding free energies are $r^2=0.939$ and $r^2=0.842$ for the training and testing set, respectively.  The method is sufficiently efficient to be used within an industrial setting.  The method performs better than more conventional scoring function implemented in the MOE program.

The binding free energy is given by$$\Delta G_{bind}=X_H \Delta H_{bind}+X_{TS} (T \Delta S_{bind})+X_{psolv} \Delta\Delta G_{psolv}+X_{npsolv} \Delta\Delta G_{npsolv}+X_{residual}$$
$\Delta H_{bind}$ is the ligand-protein interaction energy computed from an FMO energy decomposition analysis using a small structural model of the protein. $T\Delta S_{bind}$ is set to 1 kcal/mol per rotateable bond, $\Delta\Delta G_{psolv}$ and $\Delta\Delta G_{npsolv}$ are computed using the Poisson-Boltzmann equation and the solvent accessible surface area, respectively.

The four coefficients are obtained using a partial least squares to latent structures analysis on 14 ligand-protein complexes (for which x-ray coordinates were available).  Surprisingly, $X_{TS}$ came out negative, so it is doubtful whether this term actually reports on an entropy change. The method is then applied to a test set of 14 new ligands for which x-ray structures where not available.

Another interesting feature of the method is that only the hydrogen atoms of the x-ray structures were geometry optimized (using MMFF94x).  In the case where x-ray structures were not available, the structures of the ligand-protein complexes "were built by manually modifying the reference ligand and then energy minimising the ligand whilst keeping the reference heavy atoms fixed".

Results where  $\Delta H_{bind}$ is computed using the MMFF94x force field instead of FMO was not reported.  This would answer the question whether FMO is truly needed.

Acknowledgments: I thank Dmitri Fedorov for alerting me to this paper, and Michael Mazanetz for helpful discussions.  Note that in Table 2 the column labels for the experimental and computational binding free energies should be switched.

Update (4/3/2012): $X_H=0.018613, X_{TS}=-0.485403, X_{psolv}=-0.024463, X_{npsolv}=1.729698, X_{residual}=0.258$.  The latter parameter is in kcal/mol.