Sunday, October 30, 2016

Automatic chemical design using a data-driven continuous representation of molecules


Rafael Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik (2016)
Contributed by Jan Jensen



Chemical space is discrete, which makes it hard to search with standard techniques such as gradient-based minimisation. This paper uses a standard machine learning tool called an autoencoder to help solve that problem. One way to think of an autoencoder is as a data compressor: one neural network (the encoder) is trained to describe a data item, such as an image, in some compressed representation, and another network (the decoder) is trained to recover the image from the compressed format.

The interesting thing in the context of chemical space is that the compressed format can be a continuous representation, such as a real-valued vector (a point in the latent space). (Another use of autoencoders is dimensionality reduction for data visualization, e.g. as an alternative to principal component analysis.) This latent space is therefore a continuous representation of the chemical space (a set of SMILES strings) that the autoencoder was trained on. Another neural net can then be trained to map points in this latent space to some chemical property, such as logP, and the space can be searched for regions with desired logP values with techniques as simple as interpolation.
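The search-by-interpolation idea can be sketched in a few lines. This is only a toy illustration, not the paper's model: the latent vectors, the function names and `toy_logp_predictor` (a fixed linear map standing in for a trained property network) are all hypothetical.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def interpolate_latent(z_start: Sequence[float], z_end: Sequence[float],
                       n_points: int) -> List[Vector]:
    """Return n_points latent vectors evenly spaced on the line z_start -> z_end."""
    if n_points < 2:
        raise ValueError("n_points must be at least 2")
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    path = []
    for i in range(n_points):
        t = i / (n_points - 1)  # t runs from 0.0 to 1.0
        path.append([(1 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


def closest_to_target(path: Sequence[Vector],
                      predictor: Callable[[Sequence[float]], float],
                      target: float) -> Vector:
    """Pick the point on the path whose predicted property is nearest the target."""
    return min(path, key=lambda z: abs(predictor(z) - target))


def toy_logp_predictor(z: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained property network: a fixed linear map.
    weights = [0.5, -1.0, 2.0]
    return sum(w * x for w, x in zip(weights, z))


# Two made-up latent points, e.g. the encodings of two known molecules.
z_a = [0.0, 0.0, 0.0]
z_b = [2.0, 0.0, 1.0]

path = interpolate_latent(z_a, z_b, 5)
best = closest_to_target(path, toy_logp_predictor, target=1.5)
print(best)
```

In a real workflow the selected latent vector would then be passed to the decoder to obtain a SMILES string, which may or may not be valid.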

One problem with autoencoders is that they are "lossy", which in this case means that not all points in latent space can be decoded to a valid molecule (SMILES string). However, the failure rate is relatively low for the two proof-of-concept applications in the paper.

This is a very interesting new tool in the hunt for molecules with new properties.


This work is licensed under a Creative Commons Attribution 4.0

Wednesday, October 26, 2016

More examples of structure determination with computed NMR chemical shifts

Nguyen, Q. N. N.; Tantillo, D. J. “Using quantum chemical computations of NMR chemical shifts to assign relative configurations of terpenes from an engineered Streptomyces host,” J. Antibiotics 2016, 69, 534–540
Khokhar, S.; Pierens, G. K.; Hooper, J. N. A.; Ekins, M. G.; Feng, Y.; Davis, R. A. “Rhodocomatulin-Type Anthraquinones from the Australian Marine Invertebrates Clathria hirsuta and Comatula rotalaria,” J. Nat. Prod., 2016, 79, 946–953
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Use of computed NMR chemical shifts in structure determination is really growing fast. Presented here are a couple of recent examples.

Nguyen and Tantillo used computed chemical shifts with the DP4 analysis to identify the structures of three terpenes, 1-3 (ref. 1). They optimized the geometries of all of the diastereomers of each compound, along with multiple conformations of each diastereomer, at B3LYP/6-31+G(d,p) and then computed the chemical shifts at SMD(CHCl3)–mPW1PW91/6-311+G(2d,p). The chemical shifts were Boltzmann weighted, including all conformations within 3 kcal/mol of the lowest energy structure.
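As a rough illustration of the Boltzmann-weighting step, here is a minimal sketch. The energies and shifts are made-up numbers, the function names are hypothetical, and the temperature of 298.15 K is an assumption (the post does not state it).

```python
import math
from typing import List, Sequence

R_KCAL = 0.0019872041  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(energies: Sequence[float],
                      temperature: float = 298.15,  # assumed, not from the post
                      window: float = 3.0) -> List[float]:
    """Normalized Boltzmann weights for conformer energies in kcal/mol.

    Conformers more than `window` kcal/mol above the lowest one get weight 0.
    """
    e_min = min(energies)
    factors = []
    for e in energies:
        rel = e - e_min
        if rel <= window:
            factors.append(math.exp(-rel / (R_KCAL * temperature)))
        else:
            factors.append(0.0)
    total = sum(factors)
    return [f / total for f in factors]


def weighted_shifts(shifts_per_conformer: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Weighted average of each nucleus's shift over all conformers."""
    n_nuclei = len(shifts_per_conformer[0])
    return [
        sum(w * conf[k] for w, conf in zip(weights, shifts_per_conformer))
        for k in range(n_nuclei)
    ]


# Illustrative data: three conformers, two nuclei (one 13C, one 1H shift, ppm).
energies = [0.0, 0.5, 4.2]  # relative energies, kcal/mol
shifts = [[20.0, 1.10], [22.0, 1.30], [30.0, 2.00]]

weights = boltzmann_weights(energies)
averaged = weighted_shifts(shifts, weights)
print(weights)   # third conformer lies outside the 3 kcal/mol window
print(averaged)
```

With these numbers the first two conformers contribute roughly 70:30 and the third is excluded by the energy window.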


For 1, the DP4 analysis using just the proton shifts predicted a different isomer than the analysis using just the carbon shifts, but when the two were combined, DP4 predicted, with 98.8% confidence, the structure shown in the scheme above and in Figure 1. For 2, the combined proton and carbon shift analysis with DP4 indicated 100% confidence in the structure shown in the scheme and Figure 1. Lastly, for 3, which is more complicated due to the conformations of the 9-membered ring, DP4 predicts with 100% confidence the structure shown in the scheme and Figure 1.
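How proton-only and carbon-only analyses can disagree and still give a clear combined answer can be seen in a small sketch. It assumes equal prior probabilities for all candidates and independent proton and carbon errors, in which case the combined probability is the renormalized product of the two. The numbers are invented for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def combine_probabilities(prob_h: Sequence[float],
                          prob_c: Sequence[float]) -> List[float]:
    """Combine per-candidate 1H and 13C probabilities into one distribution."""
    if len(prob_h) != len(prob_c):
        raise ValueError("both inputs must cover the same candidates")
    products = [h * c for h, c in zip(prob_h, prob_c)]
    total = sum(products)
    if total == 0.0:
        raise ValueError("all combined probabilities are zero")
    return [p / total for p in products]


# Illustrative values for three candidate diastereomers (not from the paper).
prob_h = [0.30, 0.60, 0.10]  # proton data alone favour candidate 1
prob_c = [0.90, 0.05, 0.05]  # carbon data alone favour candidate 0

combined = combine_probabilities(prob_h, prob_c)
print(combined)  # candidate 0 dominates once both data sets are used
```

The strongly peaked carbon distribution outweighs the weaker proton preference, which is the kind of behaviour described for compound 1.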

Figure 1. Optimized geometries of 1-3.

Feng, Davis and coworkers have examined a series of anthraquinones from Australian marine sponges (ref. 2). The structure of one compound was a choice between two options: 4 or 5. Initial geometries were obtained by molecular mechanics, and the low-energy isomers were then reoptimized at B3LYP/6-31+G(d,p). The chemical shifts were computed using PCM/MPW1PW91/6-311+G(2d,p). Application of the DP4 method indicates the structure to be 4 with a 100% confidence level. The lowest energy conformer of 4 is shown in Figure 2.


Figure 2. Optimized geometry of 4.

References

1) Nguyen, Q. N. N.; Tantillo, D. J. “Using quantum chemical computations of NMR chemical shifts to assign relative configurations of terpenes from an engineered Streptomyces host,” J. Antibiotics 2016, 69, 534–540, DOI: 10.1038/ja.2016.51.
2) Khokhar, S.; Pierens, G. K.; Hooper, J. N. A.; Ekins, M. G.; Feng, Y.; Davis, R. A. “Rhodocomatulin-Type Anthraquinones from the Australian Marine Invertebrates Clathria hirsuta and Comatula rotalaria,” J. Nat. Prod., 2016, 79, 946–953, DOI: 10.1021/acs.jnatprod.5b01029.

InChIs

1: InChI=1S/C15H24/c1-10-5-6-15(4)8-11-7-14(2,3)9-12(11)13(10)15/h9-11,13H,5-8H2,1-4H3/t10-,11+,13-,15+/m1/s1
InChIKey=KVSCZIPUFBVHBM-OICBVUGWSA-N
2: InChI=1S/C15H24/c1-10-5-6-15(4)8-11-7-14(2,3)9-12(11)13(10)15/h5,11-13H,6-9H2,1-4H3/t11-,12-,13+,15-/m0/s1
InChIKey=ZLYGJLHCPYVGDA-XPCVCDNBSA-N
3: InChI=1S/C20H32/c1-14-6-9-18-19(3,4)10-11-20(18,5)13-17-15(2)7-8-16(17)12-14/h6,13,15-16,18H,7-12H2,1-5H3/b14-6-,17-13-/t15-,16-,18-,20+/m0/s1
InChIKey=JZGOFJIAHJJJDK-ICZJPRMTSA-N
4: InChI=1S/C18H14O7/c1-7(19)13-10(20)6-11(21)15-16(13)17(22)9-4-8(24-2)5-12(25-3)14(9)18(15)23/h4-6,20-21H,1-3H3
InChIKey=MPQMZEXRJVMYBT-UHFFFAOYSA-N
5: InChI=1S/C18H14O7/c1-7(19)13-10(20)6-11(21)15-16(13)14-9(17(22)18(15)23)4-8(24-2)5-12(14)25-3/h4-6,20-21H,1-3H3
InChIKey=WIKIUXNPFURKNF-UHFFFAOYSA-N

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Wednesday, October 12, 2016

Expanding DP4: application to drug compounds and automation

Ermanis, K.; Parkes, K. E. B.; Agback, T.; Goodman, J. M. Org. Biomol. Chem., 2016, 14, 3943-3949
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Computational chemistry has had a remarkable impact on the field of structure determination by NMR spectroscopy. The ability to efficiently compute 13C and 1H chemical shifts allows for comparison of the computed chemical shifts of potential structures against the experimental values, a tremendous aid in structure determination (see some examples in previous posts). Goodman and Smith developed the DP4 method (ref. 1; see this post) to assist in identifying proper structures by means of the statistical distribution of errors and Bayes' theorem.
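The general shape of a DP4-style calculation can be sketched as follows. This is a simplified reading of the idea, not the authors' code: computed shifts are linearly scaled against experiment, each error is scored with the upper tail of a Student's t distribution, and Bayes' theorem with equal priors turns the products into probabilities. The function names and shift values are hypothetical, and the 13C parameters `sigma = 2.306` ppm and `nu = 11.38` are quoted from memory of ref. 1, so treat them as assumptions to verify.

```python
import math
from typing import List, Sequence


def t_pdf(x: float, nu: float) -> float:
    """Probability density of Student's t with nu degrees of freedom."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def t_upper_tail(x: float, nu: float, steps: int = 2000) -> float:
    """P(T > |x|), from Simpson's rule on [0, |x|] (steps must be even)."""
    x = abs(x)
    if x == 0.0:
        return 0.5
    h = x / steps
    total = t_pdf(0.0, nu) + t_pdf(x, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, nu)
    return max(0.0, 0.5 - total * h / 3)


def scale_shifts(calc: Sequence[float], exp: Sequence[float]) -> List[float]:
    """Fit calc = slope * exp + intercept, then return (calc - intercept) / slope."""
    n = len(calc)
    mean_c, mean_e = sum(calc) / n, sum(exp) / n
    sxx = sum((e - mean_e) ** 2 for e in exp)
    sxy = sum((e - mean_e) * (c - mean_c) for e, c in zip(exp, calc))
    slope = sxy / sxx
    intercept = mean_c - slope * mean_e
    return [(c - intercept) / slope for c in calc]


def dp4_probabilities(candidates: Sequence[Sequence[float]],
                      experimental: Sequence[float],
                      sigma: float, nu: float) -> List[float]:
    """Relative probability of each candidate, assuming equal priors."""
    likelihoods = []
    for calc in candidates:
        scaled = scale_shifts(calc, experimental)
        like = 1.0
        for s, e in zip(scaled, experimental):
            like *= t_upper_tail((s - e) / sigma, nu)
        likelihoods.append(like)
    total = sum(likelihoods)
    return [like / total for like in likelihoods]


# Made-up 13C shifts (ppm) for one experimental spectrum and two candidates.
experimental = [20.1, 35.4, 48.0, 72.3, 128.5]
candidate_a = [20.5, 35.0, 48.6, 71.9, 128.9]
candidate_b = [24.0, 31.0, 52.5, 68.0, 130.0]

# sigma and nu: assumed 13C values, to be checked against ref. 1.
probs = dp4_probabilities([candidate_a, candidate_b], experimental,
                          sigma=2.306, nu=11.38)
print(probs)
```

The numerical integration keeps the sketch free of third-party dependencies; with SciPy available, `scipy.stats.t.sf` would be the more usual choice for the tail probability.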

The Goodman group now reports on workflow solutions to structure prediction using DP4 (ref. 2). They explore the use of open source computational tools both for predicting conformations and for computing the chemical shifts. They use a set of 10 drugs to test the performance. In general, the original DP4 method works very well in predicting drug structures, despite the fact that the DP4 parameters were developed for natural products. The only failure is for simvastatin, where the large number of diastereomers and the conformational flexibility prove to be too complex. The open source tools perform just slightly less effectively than the commercial packages, but are certainly a viable route for those with limited resources. The authors also provide a series of Python scripts that allow users to create a seamless workflow; these should prove most helpful to the structure determination community.


Simvastatin

References

1) Smith, S. G.; Goodman, J. M. “Assigning Stereochemistry to Single Diastereoisomers by GIAO NMR Calculation: The DP4 Probability,” J. Am. Chem. Soc. 2010, 132, 12946-12959, DOI: 10.1021/ja105035r.
2) Ermanis, K.; Parkes, K. E. B.; Agback, T.; Goodman, J. M. “Expanding DP4: application to drug compounds and automation,” Org. Biomol. Chem. 2016, 14, 3943-3949, DOI: 10.1039/c6ob00015k.

InChIs

Simvastatin: InChI=1S/C25H38O5/c1-6-25(4,5)24(28)30-21-12-15(2)11-17-8-7-16(3)20(23(17)21)10-9-19-13-18(26)14-22(27)29-19/h7-8,11,15-16,18-21,23,26H,6,9-10,12-14H2,1-5H3/t15-,16-,18+,19+,20-,21-,23-/m0/s1
InChIKey=RYMZZMVNJRMUDD-HGQWONQESA-N


This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.