Sunday, March 26, 2017

The Elephant in the Room of Density Functional Theory Calculations

Stig Rune Jensen, Santanu Saha, José A. Flores-Livas, William Huhn, Volker Blum, Stefan Goedecker, and Luca Frediani (2017)
Contributed by Jan Jensen

While basis set convergence sounds straightforward (though time-consuming), it is hard to rule out that underlying assumptions in the design of the basis set influence the results. However, basis-set-converged DFT results are needed to separate basis set errors from errors due to the functional. Multiwavelets, a systematic and adaptive multiresolution numerical solution of the one-electron problem, appear to be a way around this.

The paper presents PBE and PBE0 total energies, atomization energies, and dipole moments for 211 molecules that are converged with respect to basis set to μHartree accuracy, and uses them to benchmark Gaussian-type orbital (GTO), all-electron numeric atom-centered orbital (NAO), and full-potential augmented plane wave (APW) calculations.

In the case of atomization energies, a quintuple-zeta GTO basis set (aug-cc-pV5Z) is needed to reach 1 kcal/mol accuracy in both MAE and RMSE. For aug-cc-pVQZ the MAE is below 1 kcal/mol, but the RMSE is about 1.5 kcal/mol. Perhaps more importantly, the maxAE goes from ca. 10 to 2-5 kcal/mol on going from the quadruple- to the quintuple-zeta basis set. So even aug-cc-pV5Z cannot consistently reach the basis set limit for atomization energies! It would have been very interesting to see whether extrapolated-CBS values are able to do this.

This dataset will be an important resource for developers of both DFT and basis sets.

This work is licensed under a Creative Commons Attribution 4.0

Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram Generation of Complex Molecules and Ligand–Protein Interactions

Frączek, T. J. Chem. Inform. Model. 2016, 56, 2320-2335
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Making a good drawing of a chemical structure can be a difficult task. One wants to prepare a drawing that provides a variety of different information in a clean and clear way. We tend to want equal bond lengths, angles that are representative of the atom’s hybridization, symmetrical rings, no bond crossings, and no overlapping groups. These ideals may be difficult to manage. Sometimes we might also want to represent something about the actual 3-dimensional shape. So for example, the drawing on the left of Figure 1 properly represents the atom connectivity with no bond crossing, but the figure on the right is probably the image all organic chemists would want to see for cubane.

Figure 1. Two drawings of cubane.

For another example, the drawing on the left of Figure 2 nicely captures the relative stereo relationships within D-glucose, but the drawing on the right adds in the fact that the cyclohexyl ring is in a chair conformation. Which drawing is better? Well, it likely is in the eye of the beholder, and the context of the chemistry at hand.
Figure 2. Two drawings of D-glucose.

Frączek has reported on an automated procedure for creating aesthetically pleasing 2-D drawings of chemical structures.¹ The method involves optimizing distances between atoms projected onto a 2-D plane, along with rules that try to keep bond lengths and angles similar, keep rings symmetrical, and minimize overlapping bonds. He shows a number of examples, especially of natural products, where his automated procedure PSM (physical simulation method) provides very nice drawings, often noticeably superior to those generated by previously proposed schemes.

Using the web site he has developed, I recreated the structures of some of the molecules I have discussed in this blog. In Figure 3, these are shown side by side with my drawings. My drawings were generally done with MDL/Isis/Accelrys/Biovia Draw (available for free for academic users) with an eye towards representing what I think is a suitable view of the molecule based on what I am discussing in the blog post. For many molecules, PSM does a very nice job, sometimes better than what I have drawn, but in some cases PSM produces an inferior drawing. Nonetheless, creating nice chemical drawings can be tedious and PSM offers a rapid option, worthy of at least trying out. Ultimately, what we decide to draw and publish is often an aesthetic choice and each individual must decide on one’s own how best to present one’s work.

Figure 3. Comparison of my drawings vs. drawings made by PSM.


1) Frączek, T., "Simulation-Based Algorithm for Two-Dimensional Chemical Structure Diagram Generation of Complex Molecules and Ligand–Protein Interactions." J. Chem. Inform. Model. 2016, 56, 2320-2335, DOI: 10.1021/acs.jcim.6b00391.

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Sunday, February 26, 2017

Towards full Quantum Mechanics based Protein-Ligand Binding Affinities

Stephan Ehrlich, Andreas H. Göller, and Stefan Grimme (2017)
Contributed by Jan Jensen

Ehrlich et al. present absolute binding free energies for activated serine protease factor X (FXa) and tyrosine-protein kinase 2 predicted using DFT. Here I'll focus on FXa. The calculations are based on truncated model systems consisting of ca. 1000 atoms. The geometries are optimised using HF-3c/C-PCM with select constraints, the RRHO free energy correction is computed with DFTB3-D3, the electronic energy with PBE-3c, and the solvation free energy with COSMO-RS and PBE0/def-SVP. The energy terms are simply added together to give a total free energy, and the binding free energy is simply the change in free energy upon binding without any additional corrections.
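The additive scheme described above can be written down in a few lines. This is only a bookkeeping sketch: the class and function names are my own, and the numbers in the usage example are made-up placeholders, not values from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreeEnergyTerms:
    """Energy contributions for one species, all in kcal/mol."""
    electronic: float  # e.g. a PBE-3c single-point energy
    rrho: float        # rigid-rotor/harmonic-oscillator free energy correction
    solvation: float   # continuum-model solvation free energy

    def total(self) -> float:
        # The terms are simply added; no further corrections are applied.
        return self.electronic + self.rrho + self.solvation


def binding_free_energy(complex_: FreeEnergyTerms,
                        protein: FreeEnergyTerms,
                        ligand: FreeEnergyTerms) -> float:
    """Free energy change on binding: G(complex) - G(protein) - G(ligand)."""
    return complex_.total() - protein.total() - ligand.total()


# Placeholder numbers chosen only to illustrate the arithmetic.
example_complex = FreeEnergyTerms(electronic=-150.0, rrho=20.0, solvation=-30.0)
example_protein = FreeEnergyTerms(electronic=-100.0, rrho=12.0, solvation=-25.0)
example_ligand = FreeEnergyTerms(electronic=-20.0, rrho=3.0, solvation=-10.0)

example_dg = binding_free_energy(example_complex, example_protein, example_ligand)
print(f"Binding free energy: {example_dg:.1f} kcal/mol")
```

Keeping the three contributions separate makes it easy to see which term dominates an outlier, for instance the solvation term for a charged ligand.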

The MAD is similar to that found for host-guest complexes but there are clearly some outliers. The authors ascribe L19 and L27 to errors in the structures due to HF-3c artefacts, while L23 is ascribed to the movement of a crystal water molecule and L10 is the only charged ligand where the error in the solvation free energy is likely higher. The error is below 1.5 kcal/mol for 14 of the 25 ligands.

Clearly there is room for improvement but I do think the results are quite encouraging. An MM-PB(GB)SA study in which five different solvation models are tested for the same ligands found maximum $r$ values of 0.28 and 0.60 using ensemble-averaged and energy-minimised structures, respectively. Furthermore, that study determined the relative binding free energies using thermodynamic integration (TI), which is generally considered the current gold standard in drug design, for five ligand pairs (see table, energies in kcal/mol). Given that there are only five points any statistical analysis of the accuracy would be suspect, but I don't think TI can be said to outperform DFT.

Pair    DFT    TI     exp
5->18   4.0   -0.4    1.1
5->12   2.1    0.4    0.4
5->21   7.5    1.3    4.4
5->17   4.1   -0.2   -0.3
5->24   4.1    0.4    3.6
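For readers who want to look at the per-pair deviations themselves, here is a small sketch that tabulates them from the values in the table. The variable and function names are my own, and with only five pairs the summary numbers are descriptive at best.

```python
# Relative binding free energies from the table above (kcal/mol).
pairs = ["5->18", "5->12", "5->21", "5->17", "5->24"]
dft = [4.0, 2.1, 7.5, 4.1, 4.1]
ti = [-0.4, 0.4, 1.3, -0.2, 0.4]
expt = [1.1, 0.4, 4.4, -0.3, 3.6]


def mean_abs_error(predicted, reference):
    """Mean absolute deviation of predicted values from reference values."""
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Per-pair signed errors (prediction minus experiment).
for pair, d, t, e in zip(pairs, dft, ti, expt):
    print(f"{pair}: DFT error {d - e:+.1f}, TI error {t - e:+.1f}")

mae_dft = mean_abs_error(dft, expt)
mae_ti = mean_abs_error(ti, expt)
print(f"MAE DFT: {mae_dft:.2f} kcal/mol")
print(f"MAE TI:  {mae_ti:.2f} kcal/mol")
```

The signed errors are arguably more informative than the averages here, since they show which ligand pairs each method gets wrong and in which direction.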

The real question is whether the DFT results can be systematically improved and the main sticking point here will ultimately be the solvation free energy, especially for charged ligands. The continuum model ultimately relies on a fit to experimental data so there is some degree of empiricism that is hard to remove. In principle it can be done by adding explicit water molecules but then the question is how to deal with the sampling in a cost effective way.

This work is licensed under a Creative Commons Attribution 4.0

Wednesday, February 22, 2017

Preparation of an ion with the highest calculated proton affinity: ortho-diethynylbenzene dianion

Poad, B. L. J.; Reed, N. D.; Hansen, C. S.; Trevitt, A. J.; Blanksby, S. J.; Mackay, E. G.; Sherburn, M. S.; Chan, B.; Radom, L., Chem. Sci. 2016, 7, 6245-6250
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

A new benchmark has been set for superbases. The previous record holder was LiO⁻, with a computed proton affinity of 424.9 kcal mol⁻¹. A new study by Poad et al. examines the dianions of the three isomeric phenyldiacetylides: 1o, 1m, and 1p.¹ Their computed proton affinities (G4(MP2)-6X) are 440.6, 427.0, and 425.6 kcal mol⁻¹, respectively. The optimized geometries of these dianions are shown in Figure 1.



Figure 1. Optimized geometries of 1o, 1m, and 1p.

The authors also prepared these bases inside a mass spectrometer. All three deprotonate water, but do not deprotonate methane, though that might be a kinetic issue.
The authors speculate that 1o will be hard to beat as a base since loss of an electron is always a concern with small dianions.


1) Poad, B. L. J.; Reed, N. D.; Hansen, C. S.; Trevitt, A. J.; Blanksby, S. J.; Mackay, E. G.; Sherburn, M. S.; Chan, B.; Radom, L., "Preparation of an ion with the highest calculated proton affinity: ortho-diethynylbenzene dianion." Chem. Sci. 2016, 7, 6245-6250, DOI: 10.1039/C6SC01726F.


1o: InChI=1S/C10H4/c1-3-9-7-5-6-8-10(9)4-2/h5-8H/q-2
1m: InChI=1S/C10H4/c1-3-9-6-5-7-10(4-2)8-9/h5-8H/q-2
1p: InChI=1S/C10H4/c1-3-9-5-7-10(4-2)8-6-9/h5-8H/q-2

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Sunday, February 12, 2017

Conformer-specific hydrogen atom tunnelling in trifluoromethylhydroxycarbene

Mardyukov, A.; Quanz, H.; Schreiner, P. R., Nat. Chem. 2017, 9, 71–76
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

The Schreiner group has again reported an amazing experimental and computational study demonstrating a fascinating quantum mechanical tunneling effect, this time for trifluoromethylhydroxycarbene (CF3COH) 2.¹ (I have made a number of posts discussing a series of important studies in this field by Schreiner.) Carbene 2 is formed, in analogy to many other hydroxycarbenes, by flash vapor pyrolysis of the appropriate oxoacid 1 and capturing the products on a noble gas matrix.

The trans conformer 2t is observed by IR spectroscopy, and its structure is identified by comparison with the computed CCSD(T)/cc-pVTZ frequencies. When 2t is subjected to 465 nm light, the signals for 2t disappear within 30 s, and two new species are observed. The first species is the cis conformer 2c, confirmed by comparison with its computed CCSD(T)/cc-pVTZ frequencies. This cis conformer remains even with continued photolysis. The other product is determined to be trifluoroacetaldehyde 3. Perhaps most interesting is that 2t will convert to 3 in the absence of light at temperatures between 3 and 30 K, with a half-life of about 144 h. There is little rate difference at these temperatures. These results are quite indicative of quantum mechanical tunneling.

To aid in confirming tunneling, they computed the potential energy surface at CCSD(T)/cc-pVTZ. The trans isomer is 0.8 kcal mol⁻¹ lower in energy than the cis isomer, and this difference is much smaller than for other hydroxycarbenes they have examined. The rotational barrier TS1 between the two isomers is quite large, 26.4 kcal mol⁻¹, precluding their interchange by classical means at matrix temperatures. The barrier for conversion of 2t to 3 (TS2) is also quite large, 30.7 kcal mol⁻¹, and insurmountable at 10 K by classical means. No transition state connecting 2c to 3 could be located. These geometries and energies are shown in Figure 1.
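To get a feel for what "insurmountable by classical means" implies, here is a rough Eyring-type estimate. It is my own back-of-the-envelope sketch, not a calculation from the paper: it treats the quoted barrier heights as activation free energies, assumes a transmission coefficient of 1, and works in log10 because the rate constant itself underflows double precision at 10 K.

```python
import math

KB_OVER_H = 2.083661912e10  # Boltzmann constant / Planck constant, s^-1 K^-1
R_KCAL = 1.987204e-3        # gas constant, kcal mol^-1 K^-1


def log10_classical_rate(barrier_kcal: float, temperature_k: float) -> float:
    """log10 of the Eyring rate constant k = (kB*T/h) * exp(-barrier / (R*T)).

    Assumption: the barrier is used directly as the activation free energy.
    """
    log10_prefactor = math.log10(KB_OVER_H * temperature_k)
    log10_boltzmann = barrier_kcal / (R_KCAL * temperature_k * math.log(10.0))
    return log10_prefactor - log10_boltzmann


def log10_half_life_seconds(barrier_kcal: float, temperature_k: float) -> float:
    """log10 of the first-order half-life ln(2)/k, in seconds."""
    return math.log10(math.log(2.0)) - log10_classical_rate(barrier_kcal, temperature_k)


# Barriers quoted in the text: TS1 (rotation) and TS2 (2t -> 3), at 10 K.
for label, barrier in [("TS1", 26.4), ("TS2", 30.7)]:
    log10_k = log10_classical_rate(barrier, 10.0)
    log10_t = log10_half_life_seconds(barrier, 10.0)
    print(f"{label}: log10(k / s^-1) = {log10_k:.0f}, log10(half-life / s) = {log10_t:.0f}")
```

Under these assumptions the classical half-lives come out hundreds of orders of magnitude longer than the observed one, which is the sense in which over-the-barrier reaction is ruled out at matrix temperatures.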





Figure 1. Optimized geometries at CCSD(T)/cc-pVTZ. Relative energies (kcal mol⁻¹) of each species are listed as well.

WKB computations at M06-2X/6-311++G(d,p) predict a half-life of 172 h, in nice agreement with experiment. The computed half-life for deuterated 2t is 106 years, and the experiment on the deuterated analogue revealed no formation of deuterated 3.
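The half-lives quoted here and above can be put on a common footing by converting them to first-order rate constants, k = ln 2 / t½. The following is just that unit conversion, with function names of my own choosing; it assumes simple first-order decay of 2t.

```python
import math

SECONDS_PER_HOUR = 3600.0


def rate_constant_from_half_life(half_life_hours: float) -> float:
    """First-order rate constant in s^-1 for a half-life given in hours."""
    return math.log(2.0) / (half_life_hours * SECONDS_PER_HOUR)


def fraction_remaining(rate_constant: float, elapsed_hours: float) -> float:
    """Fraction of starting material left after the elapsed time."""
    return math.exp(-rate_constant * elapsed_hours * SECONDS_PER_HOUR)


k_experiment = rate_constant_from_half_life(144.0)  # measured half-life
k_wkb = rate_constant_from_half_life(172.0)         # WKB-computed half-life

print(f"k(experiment) = {k_experiment:.2e} s^-1")
print(f"k(WKB)        = {k_wkb:.2e} s^-1")
print(f"k(WKB) / k(experiment) = {k_wkb / k_experiment:.2f}")
print(f"Fraction of 2t left after 24 h (experiment): {fraction_remaining(k_experiment, 24.0):.2f}")
```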

The novel component of this study is that tunneling is conformationally selective. The CF3 group stabilizes the cis form, probably through some weak H···F interaction, so that the cis isomer can be observed, but no tunneling is observed from this isomer. Only the trans isomer has the migrating hydrogen atom properly arranged for a short hop over to the carbon, allowing the tunneling process to take place.


1) Mardyukov, A.; Quanz, H.; Schreiner, P. R., "Conformer-specific hydrogen atom tunnelling in trifluoromethylhydroxycarbene." Nat. Chem. 2017, 9, 71–76, DOI: 10.1038/nchem.2609.


1: InChI=1S/C3HF3O3/c4-3(5,6)1(7)2(8)9/h(H,8,9)
2: InChI=1S/C2HF3O/c3-2(4,5)1-6/h6H
3: InChI=1S/C2HF3O/c3-2(4,5)1-6/h1H

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Sunday, January 22, 2017

Crystal Structure Determination of the Pentagonal-Pyramidal Hexamethylbenzene Dication C₆(CH₃)₆²⁺

Malischewski, M.; Seppelt, K., Angew. Chem. Int. Ed. 2017, 56, 368-370
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Hypercoordinated carbon has fascinated chemists since the development of the concept of the tetravalent carbon. The advent of superacids has opened up the world of hypercoordinated species, and now a crystal structure of a hexacoordinated carbon has been reported for the C₆(CH₃)₆²⁺ species 1.¹

The molecule is prepared by epoxidation of hexamethyl Dewar benzene, followed by reaction with Magic acid, and is crystallized by the addition of HF. The crystal structure shows a pentamethylcyclopentadienyl base capped by a carbon with a methyl group. The X-ray structure is well reproduced by the B3LYP/def2-TZVP structure shown in Figure 1. (While this DFT method predicts a six-membered isomer to be slightly lower in energy, MP2 does predict the cage as the lowest energy isomer.)

Figure 1. B3LYP/def2-TZVP optimized geometry of 1.

The Wiberg bond order for the bond between the capping carbon and each carbon of the five-membered base is about 0.54, so the sum of the bond orders to the apical carbon is less than 4. The carbon is therefore not hypervalent, but it appears to truly be hypercoordinate. (A topological electron density analysis (AIM) study would have been interesting here.) NICS analysis indicates the cage formed by the apical carbon and the five-membered ring expresses 3-D aromaticity. This can be thought of as coming from the C5(CH3)5+ fragment with its 4 electrons and the CCH3+ fragment with 2 electrons, providing 4 + 2 = 6 electrons for the aromatic cage.


1) Malischewski, M.; Seppelt, K., "Crystal Structure Determination of the Pentagonal-Pyramidal Hexamethylbenzene Dication C₆(CH₃)₆²⁺." Angew. Chem. Int. Ed. 2017, 56, 368-370, DOI: 10.1002/anie.201608795.

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Acetyl-CoA carboxylase inhibition by ND-630 reduces hepatic steatosis, improves insulin sensitivity, and modulates dyslipidemia in rats

Harriman, G., Greenwood, J., Bhat, S., Huang, X., Wang, R., Paul, D., Tong, L., Saha, A.K., Westlin, W.F., Kapeller, R. and Harwood, H.J., (2016)
Contributed by Jan Jensen

This paper describes the development of ND-630 (aka NDI-010976) which is currently in Phase 2 clinical trials and could help cure a serious liver disease called non-alcoholic steatohepatitis and potentially other diseases. I am highlighting it here because computational chemistry had a lot to do with its discovery both directly and indirectly.

The development of ND-630 is spearheaded by Nimbus Therapeutics, which is basically an off-shoot of Schrödinger, i.e. a company that uses Schrödinger's software to discover new drugs. One of the co-founders (at the VC company Atlas) writes:
Back in the spring of 2009, Atlas (where I'm a partner) founded the company with Schrödinger, a leading computational chemistry software company, after almost a year-long dialogue between myself and Ramy Farid, Schrödinger’s president. At this time, Schrödinger was launching a novel computational tool called WaterMap, an apt name for a technology that maps the energetics of water sites at the receptor-ligand interface, providing a potential roadmap for efficient ligand-receptor interactions. As this cutting-edge technology catalyzed some of our initial thinking, we called it Project Troubled Water Inc (PTW) for the first year or so. 
So in a way, this is also a highlight of this article. To summarise: the company was founded because these people believed in computational chemistry as the main driving force behind drug discovery. Did the success of ND-630 prove them right?

Here's how they discovered ND-630 according to the article. They started with the crystal structure of Acetyl-CoA carboxylase with the natural product Soraphen A bound and identified two pockets with high-energy hydration sites using SiteMap and then WaterMap. Then they did a structure-based virtual screen of commercially available compounds using GlideXP and kept only compounds that hit the high-energy hydration sites in both pockets. Soraphen A and these compounds were then used to build two pharmacophore models, which, in turn, were used for a ligand-based virtual screen with hits further refined with GlideXP. "A combined virtual hit-list of a few thousand compounds was clustered to maximize diversity, and 300 representatives were chosen after visualization of the poses. This process led to the identification of ND-022 ... Subsequently, lead optimization proceeded rapidly, guided by WaterMap and Prime/MM-GBSA v. 2.2 estimates of binding free energy." This finally led to ND-630.

So not exactly Derek Lowe's unicorn dream come true, but I think it's fair to call this computer-aided drug design.

Thanks to Victor Guallar for bringing the article to my attention.

This work is licensed under a Creative Commons Attribution 4.0