Computational Chemistry Highlights: sampling

Tuesday, July 29, 2014

Protein structure prediction from sequence variation

Debora S. Marks, Thomas A. Hopf and Chris Sander Nature Biotechnology 2012, 30, 1072
Contributed by +Jan Jensen

This perspective paper gives a great overview of a very new and very promising sub-field of computational protein structure determination that started with this 2011 paper (see also this interesting blogpost). The method predicts distance restraints between two amino acids by looking for correlated changes in the protein sequence. These distance restraints are then used to determine 3D protein structures using the same software package used to compute NMR structures using NOE constraints.
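To make "correlated changes in the protein sequence" concrete, here is a minimal sketch that scores how strongly two alignment columns co-vary using plain mutual information. This is a deliberately simplified stand-in: the published approach uses a global statistical model to separate direct from indirect couplings, which this sketch does not attempt. The function name and the toy alignment are made up for illustration.

```python
from collections import Counter
from math import log


def column_mutual_information(alignment, i, j):
    """Mutual information (in nats) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequence strings (hypothetical input).
    """
    n = len(alignment)
    counts_i = Counter(seq[i] for seq in alignment)
    counts_j = Counter(seq[j] for seq in alignment)
    counts_ij = Counter((seq[i], seq[j]) for seq in alignment)
    mi = 0.0
    for (a, b), count in counts_ij.items():
        p_ab = count / n
        # p_ab / (p_a * p_b), written in terms of raw counts
        mi += p_ab * log(p_ab * n * n / (counts_i[a] * counts_j[b]))
    return mi


# Toy alignment (invented): columns 0 and 2 change together, column 1 is conserved.
toy_alignment = ["AKE", "AKE", "GKD", "GKD", "AKE", "GKD"]
mi_covarying = column_mutual_information(toy_alignment, 0, 2)
mi_conserved = column_mutual_information(toy_alignment, 0, 1)
```

In this toy case the co-varying pair scores ln 2 and the pair involving the conserved column scores zero. High-scoring pairs are the ones that would be turned into distance restraints.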

The method has been tested on globular and membrane proteins of up to 258 and 483 amino acids, respectively. About 0.5 to 0.75 predicted constraints per residue and ca. 5$L$ diverse sequences (where $L$ is the number of amino acids) are needed to produce reasonable protein structures with $C_\alpha$ RMSDs < 5 Å relative to the corresponding x-ray structures.
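As a back-of-the-envelope illustration of these rules of thumb, the sketch below turns a chain length into the approximate number of predicted constraints and diverse sequences required. The function name is hypothetical; the factors 0.5, 0.75 and 5 are the ones quoted above.

```python
def evolutionary_restraint_budget(n_residues):
    """Rough data requirements for a protein of n_residues amino acids."""
    return {
        "min_constraints": 0.5 * n_residues,   # lower bound, 0.5 per residue
        "max_constraints": 0.75 * n_residues,  # upper bound, 0.75 per residue
        "sequences": 5 * n_residues,           # ca. 5L diverse sequences
    }


# The largest globular protein mentioned above has 258 residues.
budget = evolutionary_restraint_budget(258)
```

For 258 residues this gives roughly 129 to 194 predicted constraints and about 1290 diverse sequences.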

A $C_\alpha$ RMSD of 5 Å may sound like a lot, but active site geometries may be significantly more accurate due to "strong evolutionary constraints". For example, while the structure of trypsin was predicted with a $C_\alpha$ RMSD of 4.3 Å, the relative orientation of the catalytic triad was predicted with a $C_\alpha$ RMSD of only 0.6 Å (1.3 Å all-atom RMSD).

Furthermore, x-ray structures are often refined using, for example, MD simulations before they are used in computational studies. It would be very interesting to compare computational predictions (e.g. activation energies, pKa values of active site residues, or docking scores) based on x-ray structures with those based on evolutionary constraints, i.e. to compare their chemical accuracy.

The number of available sequences is growing very quickly, so I believe the main general issue that must be addressed with this method is the efficient prediction of structures of globular proteins larger than ca 400 amino acids using distance restraints. This is still quite a demanding task.

The authors provide a very nice web-service for the prediction of contacts and structures, which I have used in my own research.

In conclusion, this method provides a very nice complement to homology modeling for cases where no close structural homologs, but many sequence homologs, are available. Given the pace at which new sequences are determined, it won't be too many years before a reasonable protein structure can be predicted for the vast majority of cases.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, May 21, 2014

Monte Carlo Free Ligand Diffusion with Markov State Model Analysis and Absolute Binding Free Energy Calculations

Takahashi, Ryoji, Víctor A. Gil, and Victor Guallar Journal of Chemical Theory and Computation 2014, 10, 282−288.

Contributed by +Jan Jensen

This study uses Monte Carlo (MC) sampling, and a Markov state model analysis of the resulting trajectories, to compute absolute binding free energies for four benzamidine ligands binding to trypsin. The results are in good agreement with experiment. Because the measured binding free energies for a given ligand vary somewhat, the mean absolute deviation ranges from 0.9 to 1.4 kcal/mol depending on which experimental values are used.
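The general idea behind a Markov state model analysis can be sketched as follows: count transitions between discretized states, normalize to a transition matrix, find its stationary distribution, and convert population ratios into free energy differences. This is only a minimal sketch and not the authors' protocol. The two-state trajectory, the function names, and the value kT = 0.596 kcal/mol (roughly 300 K) are assumptions, and the standard-state volume correction needed for an absolute binding free energy is omitted.

```python
import math


def transition_matrix(trajectories, n_states):
    """Row-normalized transition counts at a lag of one step."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in trajectories:
        for a, b in zip(traj, traj[1:]):
            counts[a][b] += 1.0
    # Assumes every state has at least one outgoing transition.
    return [[c / sum(row) for c in row] for row in counts]


def stationary_distribution(matrix, n_iter=1000):
    """Stationary distribution by repeated application of the matrix."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
    return pi


def free_energy_difference(pi, state_a, state_b, kT=0.596):
    """Free energy of state_a relative to state_b (kcal/mol if kT is)."""
    return -kT * math.log(pi[state_a] / pi[state_b])


# Invented toy trajectory: state 0 = unbound, state 1 = bound.
toy_trajectories = [[0, 0, 1, 1, 1, 1, 0, 1, 1, 1]]
T = transition_matrix(toy_trajectories, 2)
pi = stationary_distribution(T)
dg_bound = free_energy_difference(pi, 1, 0)
```

Here the bound state ends up four times more populated than the unbound one, so its relative free energy is -kT ln 4. A real analysis would use many states, a carefully chosen lag time, and many trajectories.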

The binding free energy for each ligand is derived from a Markov state model analysis of 840 MC trajectories constructed using six different random initial ligand positions, all well away from the protein surface. Each MC trajectory is constructed using the protein energy landscape exploration (PELE) method. There are three kinds of PELE MC moves: (1) the ligand can be translated or rotated rigidly, (2) the internal ligand geometry can be changed using a ligand-specific rotamer library, and (3) all protein atoms are displaced along a randomly picked mode derived from an anisotropic network model, followed by minimization of all atoms except the $\alpha$-carbons.

After each move is made, the side-chain orientations close to the ligand are sampled from a rotamer library, followed by an OPLS-AA/SGB energy minimization of all atoms affected by the move. The resulting "super move" is accepted or rejected based on a Metropolis criterion.
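For readers unfamiliar with it, the Metropolis criterion can be sketched in a few lines: downhill moves are always accepted, and uphill moves are accepted with probability exp(-ΔE/kT). This is a generic textbook sketch, not PELE code. The function name and the temperature of 300 K are assumptions for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def metropolis_accept(delta_e, temperature, rng=random):
    """Return True if a move with energy change delta_e (kcal/mol) is accepted."""
    if delta_e <= 0.0:
        return True  # downhill or neutral moves are always accepted
    return rng.random() < math.exp(-delta_e / (K_B * temperature))


# An uphill move of exactly kT should be accepted about exp(-1) of the time.
rng = random.Random(0)  # fixed seed so the illustration is repeatable
kT = K_B * 300.0        # assumed temperature of 300 K
n_trials = 20000
acceptance_rate = sum(
    metropolis_accept(kT, 300.0, rng) for _ in range(n_trials)
) / n_trials
```

In the scheme described above, `delta_e` would be the energy difference between the minimized structures before and after the whole "super move".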

The total simulation time for a ligand is about 1 week using 64 cores. However, the binding site of each ligand could be identified using only 20-30 trajectories in 5-10 CPU hours. In fact, such a binding site search can be performed using the PELE web server developed by the authors.

With its use of "super moves" with extensive energy minimization this method strikes me as an excellent way to generate snapshots for QM/MM calculations and it seems to me it could be easily adapted to look at enzyme catalysis.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Sunday, March 9, 2014

Protein NMR Structures Refined with Rosetta Have Higher Accuracy Relative to Corresponding X-ray Crystal Structures

Binchen Mao, Roberto Tejero, David Baker, and Gaetano T. Montelione Journal of the American Chemical Society 2014, 136, 1893

Highlighted by +Jan Jensen

In a previous study Baker, Montelione and co-workers showed that refining an NMR structure using Rosetta moved it closer to the x-ray structure, but increased the number of restraint violations (i.e. decreased the apparent agreement with the NMR data). A subsequent study found the same for two additional proteins and raised intriguing questions:

Do those violated restraints reflect true structural differences between NMR structures and X-ray crystal structures? If that is the case, then would incorporating those NMR experimental restraints into Rosetta refinement drive the NMR structure away from its X-ray counterpart?

This study sets out to answer these questions by refining 40 NMR structures using Rosetta or Rosetta + distance and dihedral angle restraints and comparing the resulting ensembles to the corresponding x-ray structures.

It is found that

unrestrained Rosetta refinement generally decreases the precision of NMR structures, while restrained Rosetta refinement can increase the precision of the side chain heavy atoms of otherwise well-defined residues. Additionally, restrained Rosetta refined structures fit the unassigned NOESY peak list data significantly better than unrestrained Rosetta refined structures. Rosetta refinement can generally improve the stereochemical quality and geometry of NMR structures. More specifically, the experimental backbone dihedral angle restraints can guide Rosetta to generate models with even better backbone structures than is achieved without restraints.

Thus, it is possible to find structural ensembles that agree better with both the x-ray structures and the measured NMR data. The refinement protocol allows for only relatively limited movement of the protein compared to that used in protein structure determination, but converges much faster (10-20 minutes for a 100-residue protein). So this extra step adds negligible computational cost to conventional NMR protein structure determination, but is probably only applicable to relatively high quality NMR structural ensembles.

Intriguingly, the relaxed X-ray structures have lower energies than the restrained-Rosetta structures for most proteins. This means that the sampling is still incomplete. But does it also point to the difference between solution phase and crystal structures? That may be so if the number of restraint violations in the relaxed X-ray structures is larger than for the restraint-refined structures.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, February 19, 2014

Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples

Oliver F. Lange, Paolo Rossi, Nikolaos G. Sgourakis, Yifan Song, Hsiau-Wei Lee, James M. Aramini, Asli Ertekin, Rong Xiao, Thomas B. Acton, Gaetano T. Montelione, and David Baker Proceedings of the National Academy of Sciences USA 2012, 109, 10873
Contributed by +Jan Jensen

Most NMR structures deposited in the PDB are relatively small: < ~20 kDa (< ~220 amino acids) compared to an average protein size of 400-500 amino acids in eukaryotes. This is because the NMR spectra of larger proteins become increasingly messy and hard to assign because the peaks tend to broaden and overlap. Deuterating the protein leads to sharper peaks for C, N, and exchangeable amide protons. However, the methyl proton peaks disappear and this is a problem because conventional NMR structure determination relies in large part on the availability of a large number of methyl H-H (NOE) distance constraints that help guide the conformational search and identify the correct structure.

In 2010 Lange, Montelione, Baker and co-workers showed that the backbone chemical shifts, amide proton NOEs, and NH RDCs (a measure of NH bond dipole-dipole interactions) can be used to guide the sampling and, when combined with the ROSETTA force field, yield accurate protein structures. While this approach appears to work consistently well for smaller (< ~12 kDa) proteins the performance is very target dependent for larger proteins. The problem appears not to be the ROSETTA force field but rather the lack of constraints to help guide the sampling.

The current paper shows that the addition of NOE distance constraints involving protons on Ile, Leu, and Val residues is sufficient to get good structures of larger proteins (it is experimentally possible to selectively label these amino acids in an otherwise deuterated protein). Using this approach, structures of proteins up to 20 kDa can be determined quite reliably (only one case out of ten failed to converge) and even a good structure of the 370-residue maltose binding protein (MBP) could be obtained.

In broad strokes, the main difference between the current and conventional approaches to protein structure determination by NMR is that in the current approach the NMR data serves mainly to guide the sampling while the ROSETTA force field serves to identify the correct (lowest energy) structure. Conventional NMR structure determination tends to use rather simplistic force fields that cannot reliably score a structure, so the NMR data must also serve that role.

The use of the ROSETTA force field comes at a cost. For example, the MBP structure determination required ~40 hours using 512 cores. However, as the authors note, "although these computer requirements generally exceed the in-lab resources of the average NMR lab, it is not problematic nowadays to allocate such resources e.g., through adjunct computer centers, cloud computing, or a grid project such as the European Grid Infrastructure (http://www.egi.eu)."

It should be noted that a "conventional" NMR structure of the MBP has already been obtained, but using significantly more experimental constraints than in this study and less automation. The question is not whether it is possible to obtain structures of large proteins but whether it is practically feasible to do on a semi-routine basis. Up until now the answer has been "no" and a small percentage of NMR labs focus on structure determination of large proteins. In light of the current study this ought to change.

This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, July 3, 2013

Atomic-level simulations of current-voltage relationships in single-file ion channels

M.Ø. Jensen, V. Jogini, M.P. Eastwood and D.E. Shaw, J. Gen. Physiol. 2013, 141, 619-632

Contributed by Ben Corry

There has been much discussion in these pages about papers assessing the accuracy of classical molecular dynamics in reproducing structural aspects of proteins, but less about whether such simulations can be used to understand physiological processes. Many such processes take place over time scales of ms to s and simulating them directly has long been beyond the scope of atomistic methods. But the situation is rapidly changing with advances in computer hardware and software. One example of an important physiological process close to my heart is the conduction of ions through trans-membrane channels, which underlies electrical signalling in nerve cells and takes place on a ms time scale. I am sure that many of us working in the field have thought that direct atomistic simulations have the potential to elucidate the physical mechanisms underlying this process, and have wondered how accurately such simulations could reproduce this phenomenon. Yet again, D.E. Shaw Research provides us with an answer.

Making use of extensive computational resources and more than 1 ms of simulation time, Jensen et al.¹ measure the ionic current passing through a voltage-gated potassium channel and the simpler gramicidin A channel under a range of voltages, allowing for a direct comparison to one of the most fundamental experimentally measurable properties. Sadly, the results are disappointing. The currents are about 40 and 300 times smaller than the equivalent (highly accurate) experimental measurements in the potassium channel and gramicidin A, respectively.
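To give a feel for the numbers involved, here is a minimal sketch of how an ionic current can be estimated from a simulation by counting net ion crossings per unit of simulated time. It is an illustration only, not necessarily how the authors computed their currents. The function name and the example of 10 crossings in 1 µs are invented.

```python
ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs


def ionic_current_pa(net_crossings, simulated_seconds, valence=1):
    """Current in picoamperes from net ion crossings over a simulated time."""
    amperes = net_crossings * valence * ELEMENTARY_CHARGE / simulated_seconds
    return amperes * 1e12


# Hypothetical example: 10 net K+ crossings observed in 1 microsecond.
example_current = ionic_current_pa(10, 1e-6)
```

Ten crossings per microsecond corresponds to about 1.6 pA, which shows why long trajectories are needed: at picoampere currents only a handful of permeation events occur per microsecond of simulation.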

What are the reasons for this poor performance? Jensen et al. point the finger at the most likely culprit, the accuracy of the non-polarisable biomolecular force field. Altering parameters such as the interaction strength between permeating ions and the protein has only a very small influence on the calculated current, highlighting that simple modifications are unlikely to resolve the discrepancy between simulation and experiment. Deficiencies in the lipid model, such as the overestimation of the membrane dipolar potential, are also discussed; while these are suggested to be a significant factor in the poor performance of the simulation, they do not appear to be the major reason for the underestimation of ion currents. The final suggestion is that polarisable force fields may be required to accurately reproduce permeation rates under experimental conditions. Indeed, in the simple case of gA, in which ions permeate one at a time, it would appear plausible that the inclusion of polarisability would improve the results by stabilising ions in the pore and reducing the barriers to permeation. What is not discussed is how much the structures of the proteins change during the simulations, especially under the influence of the electric field. If the structures of the proteins deviate from reality (again due to force field limitations), it is possible that this could also contribute to the poor performance of the simulations, in addition to problems with the ion-protein interactions and lack of polarisability. Given that it has only just become feasible to directly simulate ion currents with non-polarisable force fields, it may be some time yet before we know if the use of polarisable ones will make the brute force simulation of physiological processes reliable.

References

1. M.Ø. Jensen, V. Jogini, M.P. Eastwood and D.E. Shaw, J. Gen. Physiol. 2013, 141, 619-632

Thursday, August 23, 2012

Refinement of protein structure homology models via long, all-atom molecular dynamics simulations

Alpan Raval, Stefano Piana, Michael P. Eastwood, Ron O. Dror, and David E. Shaw, Proteins 2012, 80, 2071-2079 (Paywall)

Contributed by Victor Guallar

Many theoretical chemists work routinely on biological systems and, in particular, on proteins. While it might not be their main interest, predicting the conformational sampling associated with these systems is certainly a concern. Those who have been around for a while have seen how the necessary conformational sampling has moved from a few picoseconds to hundreds of nanoseconds and even microseconds (although I do not agree with it, molecular dynamics has near-exclusivity as a sampling technique). Clearly the latest development of special-purpose computers, such as the remarkable effort from the D. E. Shaw Research group, together with the development of molecular dynamics for graphics processing units, has contributed to this time expansion. Along with these advances we have surely asked ourselves the following questions: are the force fields up to it, and how meaningful are these long molecular dynamics simulations?

The Shaw group has probably already answered these questions for us. In a comprehensive study¹ they produce simulations of at least a hundred microseconds for 24 proteins used in recent CASP competitions. They frame their study in terms of the capabilities of molecular dynamics (and force fields) in refining homology models. Thus, for each system they produce a trajectory from both an initial homology model and the native X-ray (or NMR) structure. This study followed a previous one in which the simulations were capable of accurately reproducing the native state of several fast folders. The results this time, however, are quite surprising and even worrisome. For most of the systems the structures drift away from the native state. Furthermore, this drift occurs even when starting from the native state. Overall the results indicate that for most systems the force field minimum is not consistent with the X-ray or NMR experimental structures. While the authors only used two force fields (considered to be the best ones), they conclude that most likely this is a limitation of all available force fields.

The authors obtain better results when imposing constraints on the simulation (limiting the drift away from the native structure). Thus, one can conclude from this work that brute force molecular dynamics simulations are still far from being accurate. Obviously a similar conclusion could be applied to other sampling techniques using the same force fields (for example Monte Carlo techniques). While we wait for better force fields (maybe polarizable ones such as AMOEBA?), we should probably use molecular dynamics for local exploration rather than to predict novel conformations or to score significantly different ones. Of course these limitations might not apply to systems with a strong preference for one state, such as fast-folding proteins.

References

1. Alpan Raval, Stefano Piana, Michael P. Eastwood, Ron O. Dror, and David E. Shaw, Proteins 2012, 80, 2071-2079

Saturday, February 11, 2012

NMR Structure Determination for Larger Proteins Using Backbone-Only Data

S. Raman, O. F. Lange, P. Rossi, M. Tyka, X. Wang, J. Aramini, G. Liu, T. A. Ramelot, A. Eletsky, T. Szyperski, M. A. Kennedy, J. Prestegard, G. T. Montelione, D. Baker Science 2010, 327, 1014 (Free access with registration)

Baker and co-workers show that including experimental NMR data for the backbone atoms (chemical shifts, RDCs, and H^N-H^N NOEs), which can be measured relatively easily, can help significantly in protein structure determination using ROSETTA. This CS-RDC-NOE-ROSETTA protocol not only helps guide the conformational search, but also helps alleviate errors in the energy function.

The protocol was tested on 12 proteins with up to 266 residues, which is quite large by NMR standards. For the larger proteins "the computed structures are not completely converged and have large disordered regions." Furthermore, the method was validated by a blind test on five proteins (with up to 122 residues) in which the CS-RDC-NOE-ROSETTA protocol found satisfactory structures in all cases.

Another major step forward in protein structure determination from the Baker lab. I could not find any mention of the required CPU time, but it is no doubt substantial.

This work is licensed under a Creative Commons Attribution 3.0 Unported License.
