## Tuesday, July 29, 2014

### Protein structure prediction from sequence variation

Debora S. Marks, Thomas A. Hopf and Chris Sander Nature Biotechnology 2012, 30, 1072
Contributed by Jan Jensen

This perspective paper gives a great overview of a very new and very promising sub-field of computational protein structure determination that started with this 2011 paper (see also this interesting blogpost). The method predicts distance restraints between pairs of amino acids by looking for correlated changes in the protein sequence across many homologous sequences. These distance restraints are then used to determine 3D protein structures with the same software package that is used to compute NMR structures from NOE constraints.
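To make the idea of "correlated changes" concrete, here is a minimal sketch that ranks pairs of alignment columns by mutual information. This is a deliberately simpler stand-in for the statistical model used in the paper, and the toy alignment and function names are made up for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def column(msa, i):
    """Return column i of an alignment given as a list of equal-length strings."""
    return [seq[i] for seq in msa]


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    pa = Counter(col_a)
    pb = Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), count in pab.items():
        p_joint = count / n
        mi += p_joint * math.log(p_joint / ((pa[a] / n) * (pb[b] / n)))
    return mi


def rank_column_pairs(msa):
    """Score every column pair and sort from most to least co-varying."""
    length = len(msa[0])
    scores = [
        (mutual_information(column(msa, i), column(msa, j)), i, j)
        for i, j in combinations(range(length), 2)
    ]
    return sorted(scores, reverse=True)


# Hypothetical toy alignment: columns 1 and 3 co-vary (K with E, R with D),
# column 0 is fully conserved and column 2 varies more or less independently.
toy_msa = ["AKGE", "ARGD", "AKSE", "ARSD", "AKGE", "ARSD"]
top_score, top_i, top_j = rank_column_pairs(toy_msa)[0]
```

In a real application the top-ranked pairs would be turned into distance restraints. Plain mutual information also picks up indirect (transitive) correlations, which is one reason more elaborate models are used in practice.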

The method has been tested on globular and membrane proteins of up to 258 and 483 amino acids, respectively. About 0.5 to 0.75 predicted constraints per residue and ca. 5$L$ diverse sequences (where $L$ is the number of amino acids) are needed to produce reasonable protein structures with $C_\alpha$ RMSDs < 5 Å relative to the corresponding x-ray structures.

A $C_\alpha$ RMSD of 5 Å may sound like a lot, but active site geometries may be significantly more accurate due to "strong evolutionary constraints". For example, while the structure of trypsin was predicted with a $C_\alpha$ RMSD of 4.3 Å, the relative orientation of the catalytic triad was predicted with a $C_\alpha$ RMSD of only 0.6 Å (1.3 Å all-atom RMSD).
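For readers who want to reproduce this kind of comparison, the sketch below computes an RMSD after optimal superposition using the standard Kabsch algorithm with NumPy. It assumes the two coordinate sets are already matched atom by atom; the function name is my own choice.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, S, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0.0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    # Rotate P onto Q (coordinates are stored as row vectors).
    P_rot = Pc @ R.T
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))
```

Passing only the rows that belong to the active-site residues gives a local RMSD of the kind quoted for the catalytic triad, while passing all $C_\alpha$ atoms gives the global value.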

Furthermore, x-ray structures are often refined using, for example, MD simulations before they are used in computational studies. It would be very interesting to compare computational predictions (e.g. activation energies, pKa values of active site residues, or docking scores) based on x-ray structures with those based on structures derived from evolutionary constraints, i.e. to compare their chemical accuracy.

The number of available sequences is growing very quickly, so I believe the main general issue that must be addressed with this method is the efficient prediction of structures of globular proteins larger than ca 400 amino acids using distance restraints. This is still quite a demanding task.

The authors provide a very nice web-service for the prediction of contacts and structures, which I have used in my own research.

In conclusion, this method provides a very nice complement to homology modeling for cases where no close structural homologs, but many sequence homologs, are available. Given the pace at which new sequences are determined, it won't be too many years before a reasonable protein structure can be predicted for the vast majority of cases.

This work is licensed under a Creative Commons Attribution 4.0 International License.

## Friday, July 25, 2014

### Local hyperdynamics

Soo Young Kim, Danny Perez and Arthur F. Voter J. Chem. Phys. 139, 144110 (2013)
Contributed by David Bowler
Reposted from Atomistic Computer Simulations with permission

One of the key desires in atomistic simulations of all kinds, whether chemistry, biology, physics, materials science or earth sciences, is to be able to model accurately the long-time evolution of large systems. There have been good advances in recent years in linear scaling approaches to DFT and quantum chemistry[1], and significant progress in time acceleration (e.g. metadynamics[2] and hyperdynamics[3]).

The paper I will discuss here[4] points to a way to combine these two efforts, so that significant time acceleration can be applied to large systems. In combination with linear scaling electronic structure methods, this will allow us to achieve linear scaling in both size and time. The method, local hyperdynamics (LHD), is a development of the original hyperdynamics (HD) method, which is designed to increase the rate at which a system escapes from energy valleys. The idea behind HD is fairly simple, though elegant: a boost potential is added to the energy surface of the system being modelled so as to raise the bottom of the valleys in the system. This boost potential must go to zero if the system approaches a transition state; this requirement ensures that the rates of escape are not changed relative to each other, but leads to poor scaling with system size. As a larger system is modelled, it is more likely that a transition state will be found in some part of the system, and the boost becomes zero more and more of the time. The boosted time or hypertime can be evaluated during the simulation and related to the true system dynamics.
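The relation between MD time and hypertime in standard hyperdynamics is simple enough to sketch: each MD step of length Δt advances the hypertime by Δt·exp(ΔV/k_BT), where ΔV is the boost potential at that step. The function names and example values below are illustrative only.

```python
import math

K_B_EV = 8.617333262e-5  # Boltzmann constant in eV/K


def hypertime(boost_energies_ev, dt_md, temperature_k):
    """Accumulated hypertime for a trajectory with the given per-step boosts."""
    beta = 1.0 / (K_B_EV * temperature_k)
    return sum(dt_md * math.exp(beta * dv) for dv in boost_energies_ev)


def average_boost(boost_energies_ev, temperature_k):
    """Ratio of hypertime to MD time, i.e. the average boost factor."""
    n_steps = len(boost_energies_ev)
    return hypertime(boost_energies_ev, 1.0, temperature_k) / n_steps


# Hypothetical trajectories at 300 K: in a small system the boost (0.1 eV)
# is on for every step; in a large system it is zero for 9 steps out of 10
# because some part of the system is always near a transition state.
small_system = [0.1] * 10
large_system = [0.1] + [0.0] * 9
boost_small = average_boost(small_system, 300.0)
boost_large = average_boost(large_system, 300.0)
```

The comparison of `boost_small` and `boost_large` illustrates the scaling problem described above: the more often the boost has to be switched off, the closer the average boost factor falls towards one.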

LHD defines local regions (centred normally on bonds) within which a local boost is applied. This allows the dynamics of the entire system to be boosted, and as the regions are independent, when one region goes through a transition, the other regions will not be affected. As the boost is only local, the dynamics are no longer conservative, though the authors make good arguments to suggest that, on average, using a Langevin thermostat, dynamics which are very close to conservative are followed.

The regions need to be large enough to encompass the characteristic length-scale of the interactions and transitions being modelled, otherwise there may be incorrect boosts from neighbouring domains. Once the system size is larger than the region size, the authors demonstrate that the boost factor achieved is constant with system size.

As each region has its own boost factor, some care has to be applied to ensure that boosting is uniform throughout the system. Given a target boost factor, the local boosting can be monitored and adjusted to ensure that it matches the target on average. The authors suggest that this time-evolving correction to the boost should be called a boostostat. The paper contains careful, detailed statistical arguments which are well worth reading.
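The feedback idea behind a boostostat can be illustrated with a generic proportional controller. This is emphatically not the authors' update rule: the exponential "measured boost" model, the gain and the sensitivities below are all hypothetical, chosen only to show how per-region bias strengths can be steered towards a common target.

```python
import math


def update_strengths(strengths, measured_boosts, target_boost, gain=0.2):
    """One feedback step: lower the bias where the boost is too high, raise it where too low."""
    new_strengths = []
    for s, b in zip(strengths, measured_boosts):
        relative_error = (b - target_boost) / target_boost
        new_strengths.append(max(0.0, s - gain * relative_error))
    return new_strengths


def toy_measured_boost(strength, sensitivity):
    """Hypothetical stand-in for a time-averaged local boost factor."""
    return math.exp(sensitivity * strength)


def run_boostostat(sensitivities, target_boost, n_steps=500):
    """Iterate the feedback loop and return the final boost of each region."""
    strengths = [0.0] * len(sensitivities)
    for _ in range(n_steps):
        measured = [toy_measured_boost(s, a) for s, a in zip(strengths, sensitivities)]
        strengths = update_strengths(strengths, measured, target_boost)
    return [toy_measured_boost(s, a) for s, a in zip(strengths, sensitivities)]


# Three regions that respond differently to the same bias strength.
final_boosts = run_boostostat([0.5, 1.0, 2.0], target_boost=100.0)
```

In a real simulation the measured boost would be a noisy running average taken from the trajectory, so the gain would have to be small enough not to chase fluctuations.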

The paper is also extremely careful in exploring the effects of the assumptions imposed. There will be force mismatches at the region edges, along with the lack of conservative dynamics. However, plausible arguments are made that these should average out for relatively uniform systems. A challenging system such as water, which contains both weak and strong bonds, might well show more significant deviation from these assumptions.

The authors tested the method with an embedded atom method (EAM) potential for Ag(100) with adatoms, vacancies and steps, and show excellent agreement with normal MD. For rare events, they show that boosts of $10^6$ are possible and give good agreement with transition state theory, though normally they work with boost factors of around 100.

The Conquest developers (of which I am one) have recently performed full DFT molecular dynamics on 32,768 atoms for 2 ps, and static calculations on over 1,000,000 atoms; in combination with LHD, we can see that DFT MD should be possible on 100,000+ atoms for nanoseconds and beyond.

[1] Rep. Prog. Phys. 75 036503 (2012) DOI:10.1088/0034-4885/75/3/036503
[2] Rep. Prog. Phys. 71, 126601 (2008) DOI:10.1088/0034-4885/71/12/126601
[3] Phys. Rev. Lett. 78, 3908 (1997) DOI:10.1103/PhysRevLett.78.3908
[4] J. Chem. Phys., 139, 144110 (2013) DOI:10.1063/1.4824389

## Wednesday, July 23, 2014

### Dearomatization-Induced Transannular Cyclization: Synthesis of Electron-Accepting Thiophene-S,S-Dioxide-Fused Biphenylene

Fukazawa, A.; Oshima, H.; Shimizu, S.; Kobayashi, N.; Yamaguchi, S.  J. Am. Chem. Soc. 2014, 136, 8738-8745
Contributed by Steven Bachrach.
Reposted from Computational Organic Chemistry with permission

Aromaticity and orbital symmetry rules, though seemingly of ancient origin, remain areas of active interest. This paper by Fukazawa et al. combines both issues (1). The multiple-step electrocyclization of 1 gives 2 in a reaction that takes 9 days at 80 °C. What would be the effect of diminishing the aromatic character of the fused rings of 1? Would the reaction be faster or slower?

Before discussing the experimental results, let’s examine the B3LYP/6-31G(d) results for the reactions of 1’, 3 and 5. (Note that a slightly smaller pendant substituent is used in the computations than in the experiment.) The optimized geometries of the critical points along the reaction pathway for the cyclization of 3 are shown in Figure 1.

3 (0.0), 3-TS1 (17.9), 3-INT (10.4), 3-TS2 (13.3), 4 (-60.7)
Figure 1. B3LYP/6-31G(d) optimized geometries and relative energies (kcal mol-1) for the critical points along the reaction 3 → 4.

For 1’, the first step (the 8π cyclization) has a barrier of about 23 kcal mol-1, but the second step (the 4π cyclization) has an even larger barrier of 28 kcal mol-1. However, reducing the aromaticity of one of the fused rings (compound 3) leads to lower barriers of 18 and 13 kcal mol-1. For the cyclization of 5, only a single transition state was found (no intermediate and no second TS), with a barrier of 12 kcal mol-1. Thus, removing the aromaticity of these external rings reduces the barrier of the reaction, and that is exactly what is found experimentally!
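The numbers in Figure 1 can be turned into per-step barriers and rough rate estimates with a few lines of arithmetic. Note that the 18 and 13 kcal mol-1 quoted for 3 match the two transition-state energies relative to 3 itself; measured from the intermediate, the second step is only about 3 kcal mol-1. The Eyring estimate below treats the computed barriers as if they were activation free energies, which is an assumption on my part and only good for a qualitative comparison.

```python
import math

R_KCAL = 1.987204e-3        # gas constant in kcal mol^-1 K^-1
K_B = 1.380649e-23          # Boltzmann constant in J/K
H_PLANCK = 6.62607015e-34   # Planck constant in J s

# Relative energies (kcal/mol) for 3 -> 4, as listed with Figure 1.
profile_3 = {"3": 0.0, "3-TS1": 17.9, "3-INT": 10.4, "3-TS2": 13.3, "4": -60.7}


def step_barriers(profile):
    """Barrier of each step measured from the minimum that precedes it."""
    first = profile["3-TS1"] - profile["3"]
    second = profile["3-TS2"] - profile["3-INT"]
    return first, second


def eyring_rate(barrier_kcal, temperature_k):
    """First-order rate constant (s^-1) from the Eyring equation."""
    prefactor = K_B * temperature_k / H_PLANCK
    return prefactor * math.exp(-barrier_kcal / (R_KCAL * temperature_k))


def half_life_hours(barrier_kcal, temperature_k):
    """Half-life in hours of a first-order process with the given barrier."""
    return math.log(2) / eyring_rate(barrier_kcal, temperature_k) / 3600.0


first_step, second_step = step_barriers(profile_3)
temperature = 353.15  # 80 °C, the experimental temperature quoted above
# Highest barriers quoted in the text for 1', 3 and 5 (kcal/mol).
half_lives = {name: half_life_hours(barrier, temperature)
              for name, barrier in {"1'": 28.0, "3": 18.0, "5": 12.0}.items()}
```

Because the rate depends exponentially on the barrier, the drop from 28 to 12 kcal mol-1 corresponds to many orders of magnitude in half-life, which is the qualitative point of the paragraph above.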

### References

(1) Fukazawa, A.; Oshima, H.; Shimizu, S.; Kobayashi, N.; Yamaguchi, S. "Dearomatization-Induced Transannular Cyclization: Synthesis of Electron-Accepting Thiophene-S,S-Dioxide-Fused Biphenylene," J. Am. Chem. Soc. 2014, 136, 8738-8745, DOI: 10.1021/ja503499n.

### InChIs:

1: InChI=1S/C44H64S4Si4/c1-41(2,3)49(13,14)37-25-29-30-26-38(50(15,16)42(4,5)6)46-34(30)23-24-36-32(28-40(48-36)52(19,20)44(10,11)12)31-27-39(51(17,18)43(7,8)9)47-35(31)22-21-33(29)45-37/h25-28H,1-20H3/b30-29-,32-31-
InChIKey=OCNQBMWQONUVNH-IOYDOZLVSA-N
1’:InChI=1S/C32H40S4Si4/c1-37(2,3)29-17-21-22-18-30(38(4,5)6)34-26(22)15-16-28-24(20-32(36-28)40(10,11)12)23-19-31(39(7,8)9)35-27(23)14-13-25(21)33-29/h17-20H,1-12H3/b22-21-,24-23-
InChIKey=GTFPBRMBCLREPG-ICHHBZPXSA-N
2: InChI=1S/C44H64S4Si4/c1-41(2,3)49(13,14)29-21-25-26-22-30(50(15,16)42(4,5)6)46-38(26)34-33(37(25)45-29)35-36(34)40-28(24-32(48-40)52(19,20)44(10,11)12)27-23-31(47-39(27)35)51(17,18)43(7,8)9/h21-24H,1-20H3
InChIKey=OTDXAOVIIQYYNV-UHFFFAOYSA-N
2’: InChI=1S/C32H40S4Si4/c1-37(2,3)21-13-17-18-14-22(38(4,5)6)34-30(18)26-25(29(17)33-21)27-28(26)32-20(16-24(36-32)40(10,11)12)19-15-23(35-31(19)27)39(7,8)9/h13-16H,1-12H3
InChIKey=IYZNCPPDTHWWCO-UHFFFAOYSA-N
3: InChI=1S/C32H40O2S4Si4/c1-39(2,3)29-17-21-22-18-30(40(4,5)6)37-27(22)15-16-28-24(20-32(38(28,33)34)42(10,11)12)23-19-31(41(7,8)9)36-26(23)14-13-25(21)35-29/h17-20H,1-12H3/b22-21-,24-23-
InChIKey=ZJBDGDJVLGNVOD-ICHHBZPXSA-N
4: InChI=1S/C32H40O2S4Si4/c1-39(2,3)21-13-17-18-14-22(40(4,5)6)36-30(18)26-25(29(17)35-21)27-28(26)32-20(16-24(38(32,33)34)42(10,11)12)19-15-23(37-31(19)27)41(7,8)9/h13-16H,1-12H3
InChIKey=QUSJUOMZBJUGON-UHFFFAOYSA-N
5: InChI=1S/C32H40O8S4Si4/c1-45(2,3)29-17-21-22-18-30(46(4,5)6)42(35,36)26(22)15-16-28-24(20-32(44(28,39)40)48(10,11)12)23-19-31(47(7,8)9)43(37,38)27(23)14-13-25(21)41(29,33)34/h17-20H,1-12H3/b22-21-,24-23-
InChIKey=NNZTUSIYEPMHMP-ICHHBZPXSA-N
6: InChI=1S/C32H40O8S4Si4/c1-45(2,3)21-13-17-18-14-22(46(4,5)6)42(35,36)30(18)26-25(29(17)41(21,33)34)27-28(26)32-20(16-24(44(32,39)40)48(10,11)12)19-15-23(47(7,8)9)43(37,38)31(19)27/h13-16H,1-12H3
InChIKey=JZHQQYXUIQXWLQ-UHFFFAOYSA-N

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

## Friday, July 11, 2014

### Error Estimates for Solid-State Density-Functional Theory Predictions: An Overview by Means of the Ground-State Elemental Crystals

K. Lejaeghere, V. Van Speybroeck, G. Van Oost & S. Cottenier Critical Reviews in Solid State and Materials Sciences 2014, 39, 1-24
Contributed by David Bowler
Reposted from Atomistic Computer Simulations with permission

The question of how to characterise the accuracy of a computer code is a difficult one, and I have touched on these issues before (here, here and here for instance). However, given the large number of codes available, it should be possible to compare them to each other and to experiment (or higher level calculations) to test them. This is a well-established process in the quantum chemistry community, where there are various test sets for different properties, including enthalpies of formation (G97/2)[1] and weak bonding (S22)[2], as well as a vast database of different properties[3].

A recent paper[4] and associated website[5] offer a first approach for solid state codes, with the comparison based on the differences between all-electron and pseudopotential calculations. The presumption here is that all-electron calculations are the touchstone (though the website notes that the all-electron results have been refined to use extremely accurate tolerances and small muffin-tin radii, so there is clearly room for improvement in any method).

The idea of the comparison is to calculate binding energy curves for most elements in the periodic table, and from these curves to derive a single number which characterises the deviation from all-electron results. The deviation per element can be viewed, as can the deviation from experiment for the all-electron calculations. The deviation, delta, is defined by an integral over all calculated values of volume, and thus includes implicitly both the lattice constant and the bulk modulus, though not the cohesive energy. Most results shown on the website are for plane-wave codes, which generally perform rather well (the old norm-conserving FHI pseudopotentials are less accurate, though, and should be treated with care).
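A minimal sketch of such a delta value is given below. To my understanding the paper fits each energy-volume curve with a Birch-Murnaghan equation of state and integrates the squared energy difference over a window of roughly ±6% around the equilibrium volume; treat the window, the choice to centre it on the mean of the two equilibrium volumes, and all parameter values as assumptions rather than the authors' exact recipe.

```python
import math


def birch_murnaghan(v, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan energy relative to the minimum at v0.

    v and v0 in Å^3/atom, b0 in eV/Å^3, result in eV/atom.
    """
    x = (v0 / v) ** (2.0 / 3.0)
    return 9.0 * v0 * b0 / 16.0 * (
        (x - 1.0) ** 3 * b0_prime + (x - 1.0) ** 2 * (6.0 - 4.0 * x)
    )


def delta(eos_a, eos_b, half_width=0.06, n_points=2001):
    """Root-mean-square energy difference between two E(V) curves.

    eos_a and eos_b are (v0, b0, b0_prime) tuples. Both curves have their
    minimum energy set to zero, so the cohesive energy drops out.
    """
    v_mid = 0.5 * (eos_a[0] + eos_b[0])
    v_lo = (1.0 - half_width) * v_mid
    v_hi = (1.0 + half_width) * v_mid
    h = (v_hi - v_lo) / (n_points - 1)
    total = 0.0
    for k in range(n_points):
        v = v_lo + k * h
        diff = birch_murnaghan(v, *eos_a) - birch_murnaghan(v, *eos_b)
        weight = 0.5 if k in (0, n_points - 1) else 1.0  # trapezoidal rule
        total += weight * diff * diff
    return math.sqrt(total * h / (v_hi - v_lo))


# Hypothetical fits for one element from two different codes.
all_electron = (20.0, 0.55, 4.5)
pseudopotential = (20.2, 0.53, 4.7)
delta_ev_per_atom = delta(all_electron, pseudopotential)
```

Since only the shape and position of the curve enter, a shift in equilibrium volume or a change in bulk modulus both increase the result, while a constant energy offset does not.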

The approach is a good one, though a little heavy on the tester, and scripts to perform the necessary calculations are made freely available. However, the choice of the elemental form makes the tests rather restricted: there is no way to examine different types of bonding or different oxidation states, for instance. It is quite easy to imagine developing a set of these test suites for different purposes in solid state codes, just as there are different test sets in quantum chemistry.

The accuracy tests seem to be most illuminating for the pseudopotentials, rather than the codes themselves, and I think that it would be of immense value to the community if the pseudopotential generation details were made available. This should not just include the core radii and the reference configurations, but also a clear description of the pseudopotential algorithm (or an appropriate reference along with details).

There is something of a danger of using codes and supplied pseudopotential libraries as black boxes: there is a need to test parameters, though it’s rare (and I am happy to acknowledge that I don’t do it as much as I should). This paper and associated developments should go some way to standardising plane wave codes, and giving quantitative information on their reliability.

Note: As I was writing this, a new paper was published in Science which takes a different view and compares results from different functionals; it’s worth a read[6].

[1] J. Chem. Phys. 106, 1063 (1997) DOI:10.1063/1.473182
[2] Phys. Chem. Chem. Phys. 8, 1985 (2006) DOI: 10.1039/B600027D
[3] http://t1.chem.umn.edu/db/ and see arXiv http://arxiv.org/abs/1212.0944
[4] Crit. Rev. Sol. Stat. Mat. Sci. 39, 1–24 (2014) DOI:10.1080/10408436.2013.772503
[5] http://molmod.ugent.be/deltacodesdft
[6] Science 345, 197 (2014) DOI:10.1126/science.1253486

### Synthesis, Characterization, and Properties of [4]Cyclo-2,7-pyrenylene: Effects of Cyclic Structure on the Electronic Properties of Pyrene Oligomers

Iwamoto, T.; Kayahara, E.; Yasuda, N.; Suzuki, T.; Yamago, S. Angew. Chem. Int. Ed. 2014, 53, 6430-6434
Contributed by Steven Bachrach.
Reposted from Computational Organic Chemistry with permission

Macrocycles composed of aromatic subunits, like polycycloparaphenylenes, are of interest as components of nanotubes and for possible interesting optical properties. Tremendous advances have occurred over the past decade in preparing these rings; see, for example, these posts. Yamago now reports on the synthesis, optical properties and structure of [4]cyclo-2,7-pyrenylene 1, made by joining four pyrene units together (1).

B3LYP/6-31G(d) optimization of the structure of 1 reveals a D2 geometry (Figure 1). This structure shows a very distorted pyrene unit. The strain energy of 1 is estimated as 392 kJ mol-1 (though how this was arrived at is not mentioned!), which is much larger than the strain energy of [8]-cycloparaphenylene.
Figure 1. B3LYP/6-31G(d) optimized structure of 1

The nature of the HOMO and LUMO of 1 is very different from that of linear tetra-2,7-pyrene. The degenerate HOMOs and degenerate LUMOs of the linear compound have a node at the 2 and 7 positions and are localized to the terminal and central pyrene units, respectively. The HOMO and LUMO of 1 are fully delocalized. The implications of this are seen in the spectroscopy and electrochemistry of 1.

### References

(1) Iwamoto, T.; Kayahara, E.; Yasuda, N.; Suzuki, T.; Yamago, S. "Synthesis, Characterization, and Properties of [4]Cyclo-2,7-pyrenylene: Effects of Cyclic Structure on the Electronic Properties of Pyrene Oligomers," Angew. Chem. Int. Ed. 2014, 53, 6430-6434, DOI: 10.1002/anie.201403624.

### InChIs

1: InChI=1S/C64H32/c1-2-34-18-50-20-36-4-3-35-19-49(17-33(1)57(35)58(34)36)51-21-37-5-7-41-25-53(26-42-8-6-38(22-51)59(37)61(41)42)55-29-45-13-15-47-31-56(32-48-16-14-46(30-55)63(45)64(47)48)54-27-43-11-9-39-23-52(50)24-40-10-12-44(28-54)62(43)60(39)40/h1-32H/b51-49-,52-50-,55-53-,56-54-
InChIKey=AWPUJIHMRRFHHU-GYSOBDCPSA-N

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.