Wednesday, March 14, 2012

Numerical Errors and Chaotic Behavior in Docking Simulations

Miklos Feher and Christopher I. Williams, Journal of Chemical Information and Modeling, DOI: 10.1021/ci200598m (paywall)

The authors present a very interesting study of how numerical problems affect molecular docking. Specifically, they examine two docking programs, GOLD and GLIDE, each at three different accuracy settings.

As the authors point out in the conclusion:  
"This study clearly demonstrates that seemingly insignificant differences in ligand input, such as small coordinate perturbations or permuting the atom order in an input file, can have a dramatic effect on the final top-scoring docked pose."

The authors investigated how robust the docking algorithms are, in terms of both poses and scores, to two kinds of input variation. The first is very small changes to the input structure (torsional angle changes of at most 0.1 degree, with a total RMSD of at most 0.1 Å). The second is permutation of the atom order in the ligand input files. A normal user would expect neither variation to have any effect on the final output of a docking run.
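To make the two kinds of input variation concrete, here is a minimal Python sketch. It is not the authors' protocol: the toy ligand, the function names, and the seeds are all hypothetical, and for simplicity it applies small random Cartesian shifts instead of the torsional perturbations used in the paper. The shift size is chosen so that the total RMSD cannot exceed 0.1 Å.

```python
import math
import random

# Hypothetical toy ligand as (element, x, y, z) in Å; not taken from the paper.
LIGAND = [
    ("C", 0.0, 0.0, 0.0),
    ("O", 1.2, 0.0, 0.0),
    ("N", -0.7, 1.1, 0.3),
    ("C", -0.8, -1.2, -0.4),
]


def permute_atoms(atoms, seed=0):
    """Return the same atoms in a shuffled order; the geometry is unchanged."""
    rng = random.Random(seed)
    shuffled = list(atoms)
    rng.shuffle(shuffled)
    return shuffled


def jitter_coordinates(atoms, max_rmsd=0.1, seed=0):
    """Shift every coordinate by a small random amount.

    Assumption: Cartesian noise stands in for the paper's torsional changes.
    Each component moves by at most max_rmsd / sqrt(3), so no atom moves by
    more than max_rmsd and the overall RMSD stays within max_rmsd.
    """
    rng = random.Random(seed)
    d = max_rmsd / math.sqrt(3.0)
    return [
        (el, x + rng.uniform(-d, d), y + rng.uniform(-d, d), z + rng.uniform(-d, d))
        for el, x, y, z in atoms
    ]


def rmsd(a, b):
    """RMSD between two structures with the same atom order (no alignment)."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (_, xa, ya, za), (_, xb, yb, zb) in zip(a, b)
    )
    return math.sqrt(total / len(a))


if __name__ == "__main__":
    permuted = permute_atoms(LIGAND, seed=1)
    perturbed = jitter_coordinates(LIGAND, seed=1)
    print("permuted order:", [atom[0] for atom in permuted])
    print("RMSD after jitter (Å):", round(rmsd(LIGAND, perturbed), 4))
```

Both variants describe the same molecule to within a tolerance far below what matters chemically, which is why one would expect a docking program to return essentially the same top pose for each.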

The results are highly interesting and show that the low-accuracy virtual screening settings seem to correlate with low robustness. Hence, using the standard (or higher) accuracy settings improves robustness for both programs.
One might expect GLIDE, which is a deterministic approach (empirical scoring function), to be less prone to robustness problems than GOLD (genetic algorithm), but this does not seem to be the case. The only major difference between the two programs is that GOLD seems to produce score and RMSD distributions that are closer to normal than those from GLIDE.

Both small changes in the torsions of the input structures and permutation of the atom order in the input files lead to many cases where the docking protocols are simply not robust and can produce vastly different top-scoring poses, especially at the low accuracy settings. Thus, docking with virtual screening settings is more prone to reproducibility problems, but even at the highest accuracy setting robustness cannot be guaranteed.

For the future, it would be highly interesting to see if these problems are the same for other docking software, and how they can be alleviated.

[edit: changed the comment on CPU time relative to robustness, since there is no statistically significant difference in robustness between the two more accurate settings in the two programs]