Wednesday, February 19, 2014

Determination of solution structures of proteins up to 40 kDa using CS-Rosetta with sparse NMR data from deuterated samples

Oliver F. Lange, Paolo Rossi, Nikolaos G. Sgourakis, Yifan Song, Hsiau-Wei Lee, James M. Aramini, Asli Ertekin, Rong Xiao, Thomas B. Acton, Gaetano T. Montelione, and David Baker
Proceedings of the National Academy of Sciences USA 2012, 109, 10873
Contributed by Jan Jensen

Most NMR structures deposited in the PDB are of relatively small proteins: < ~20 kDa (< ~220 amino acids), compared to an average protein size of 400-500 amino acids in eukaryotes. This is because the peaks in the NMR spectra of larger proteins tend to broaden and overlap, which makes the spectra increasingly hard to assign. Deuterating the protein leads to sharper peaks for C, N, and exchangeable amide protons. However, the methyl proton peaks disappear, which is a problem because conventional NMR structure determination relies in large part on a large number of methyl H-H (NOE) distance constraints that help guide the conformational search and identify the correct structure.

In 2010, Lange, Montelione, Baker and co-workers showed that the backbone chemical shifts, amide proton NOEs, and NH RDCs (a measure of NH bond dipole-dipole interactions) can be used to guide the sampling and, when combined with the ROSETTA force field, yield accurate protein structures. While this approach appears to work consistently well for smaller (< ~12 kDa) proteins, the performance is very target dependent for larger proteins. The problem appears to be not the ROSETTA force field but rather the lack of constraints to help guide the sampling.

The current paper shows that the addition of NOE distance constraints involving protons on Ile, Leu, and Val residues is sufficient to get good structures of larger proteins (it is experimentally possible to selectively label these amino acids in an otherwise deuterated protein). Using this approach, the structures of proteins up to 20 kDa can be determined quite reliably (only one case out of ten failed to converge), and even a good structure of the 370-residue maltose binding protein (MBP) could be obtained.

In broad strokes, the main difference between the current and conventional approaches to protein structure determination by NMR is that in the current approach the NMR data serves mainly to guide the sampling, while the ROSETTA force field serves to identify the correct (lowest energy) structure. Conventional NMR structure determination tends to use rather simplistic force fields that cannot reliably score a structure, so the NMR data must also serve that role.
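
The division of labour described above can be illustrated with a deliberately simple toy model. This is only a sketch and not the ROSETTA protocol or its API: the one-dimensional "conformation" x, the function `toy_energy`, and the restraint window are all hypothetical stand-ins. The restraints only decide where samples are drawn, and the energy function alone picks the winner.

```python
import math
import random


def toy_energy(x):
    """Hypothetical rugged 1-D 'force field' with its lowest minimum near x = 2.2."""
    return (x - 2.0) ** 2 + 1.5 * math.sin(5.0 * x)


def sample_and_select(lower, upper, n_samples=200, seed=0):
    """Draw candidates only inside the window allowed by the (toy) restraints,
    then let the energy function alone select the best candidate."""
    rng = random.Random(seed)
    samples = [rng.uniform(lower, upper) for _ in range(n_samples)]
    return min(samples, key=toy_energy)


# Same sampling budget, with and without restraints narrowing the search.
restrained = sample_and_select(1.0, 3.0)
unrestrained = sample_and_select(-50.0, 50.0)
print("restrained:  ", restrained, toy_energy(restrained))
print("unrestrained:", unrestrained, toy_energy(unrestrained))
```

With the same number of samples, the restrained search covers the relevant region far more densely, so the scoring function has better candidates to choose from. In this picture the restraints never score a structure; they only narrow the search.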

The use of the ROSETTA force field comes at a cost. For example, the MBP structure determination required ~40 hours on 512 cores. However, as the authors note, "although these computer requirements generally exceed the in-lab resources of the average NMR lab, it is not problematic nowadays to allocate such resources e.g., through adjunct computer centers, cloud computing, or a grid project such as the European Grid Infrastructure (http://www.egi.eu)."
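
To put the quoted cost in perspective, the short calculation below converts it to core-hours and estimates the wall-clock time on a smaller machine. The 16-core workstation is a hypothetical example, and the estimate assumes perfect scaling with core count, which real calculations rarely achieve.

```python
def core_hours(wall_hours, cores):
    """Total compute consumed by a run of wall_hours on the given number of cores."""
    return wall_hours * cores


def wall_hours_on(total_core_hours, cores):
    """Wall-clock hours for the same work on a different core count (ideal scaling assumed)."""
    return total_core_hours / cores


mbp_cost = core_hours(40, 512)               # ~40 hours on 512 cores, as stated above
days_on_16 = wall_hours_on(mbp_cost, 16) / 24  # hypothetical 16-core workstation

print(f"MBP run: {mbp_cost} core-hours")
print(f"On 16 cores: about {days_on_16:.0f} days")
```

Under these assumptions the MBP calculation amounts to roughly 20,000 core-hours, or on the order of two months on a single 16-core machine.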

It should be noted that a "conventional" NMR structure of MBP has already been obtained, but using significantly more experimental constraints than in this study and less automation. The question is not whether it is possible to obtain structures of large proteins but whether it is practically feasible to do so on a semi-routine basis. Up until now the answer has been "no", and only a small percentage of NMR labs focus on structure determination of large proteins. In light of the current study this ought to change.