Thursday, July 4, 2019

Combining the Power of J Coupling and DP4 Analysis on Stereochemical Assignments: The J-DP4 Methods

Grimblat, N.; Gavín, J. A.; Hernández Daranas, A.; Sarotti, A. M., Org. Letters 2019, 21, 4003-4007
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

I have written quite a number of posts on using quantum mechanical computations to predict NMR spectra that can aid in identifying chemical structure. Perhaps the most robust technique is Goodman’s DP4 method (post), which has seen some recent revisions (updated DP4, DP4+). I have also posted on the use of computed coupling constants (posts).

Grimblat, Gavín, Daranas and Sarotti have now combined these two approaches, using computed 1H and 13C chemical shifts and 3JHH coupling constants with the DP4 framework to predict chemical structure.1

They describe two different approaches to incorporate coupling constants:
  • dJ-DP4 (direct method) incorporates the coupling constants into a new probability function, treating them in a way analogous to the chemical shifts. This requires explicit computation of all chemical shifts and 3JHH coupling constants for all low-energy conformations.
  • iJ-DP4 (indirect method) uses the experimental coupling constants to set conformational constraints, thereby reducing the total number of conformations that need to be sampled. Thus, large values of the coupling constant (3JHH > 8 Hz) select conformations with coplanar hydrogens, while small values (3JHH < 4 Hz) select conformations with perpendicular hydrogens. Other values are ignored. Typically, only one or two coupling constants are used to select the viable conformations.
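
The indirect selection step can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the 8 Hz and 4 Hz cutoffs come from the description above, but the 30 degree window used to decide what counts as "coplanar" or "perpendicular", the data layout, and all function names are assumptions made for this sketch.

```python
# Hypothetical sketch of an iJ-DP4-style conformer pre-filter.
# Assumption: "coplanar" = H-C-C-H dihedral within `window` degrees of 0 or 180,
# "perpendicular" = within `window` degrees of 90. The window is illustrative.

def dihedral_class(angle_deg, window=30.0):
    """Classify a dihedral angle as 'coplanar', 'perpendicular', or 'other'."""
    a = abs(angle_deg) % 360.0
    if a > 180.0:
        a = 360.0 - a  # fold the angle into the range [0, 180]
    if a <= window or a >= 180.0 - window:
        return "coplanar"
    if abs(a - 90.0) <= window:
        return "perpendicular"
    return "other"

def required_class(j_exp_hz):
    """Map an experimental 3JHH value to the geometry it selects (None = ignored)."""
    if j_exp_hz > 8.0:
        return "coplanar"
    if j_exp_hz < 4.0:
        return "perpendicular"
    return None

def filter_conformers(conformers, constraints):
    """Keep conformers whose dihedrals agree with every usable coupling constant.

    conformers:  list of dicts with keys 'name' and 'dihedrals' ({pair: angle})
    constraints: dict {pair: experimental 3JHH in Hz}
    """
    kept = []
    for conf in conformers:
        ok = True
        for pair, j_exp in constraints.items():
            needed = required_class(j_exp)
            if needed is None:
                continue  # intermediate couplings carry no constraint
            if dihedral_class(conf["dihedrals"][pair]) != needed:
                ok = False
                break
        if ok:
            kept.append(conf)
    return kept

# Made-up example: one large coupling between two hypothetical protons.
conformers = [
    {"name": "A", "dihedrals": {("Ha", "Hb"): 175.0}},
    {"name": "B", "dihedrals": {("Ha", "Hb"): 62.0}},
    {"name": "C", "dihedrals": {("Ha", "Hb"): -12.0}},
]
kept = filter_conformers(conformers, {("Ha", "Hb"): 9.5})
kept_names = [c["name"] for c in kept]
```

Only the conformers that survive this filter would then need the expensive chemical shift calculations, which is where the savings of the indirect method come from.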

The authors test these two variants on 69 molecules. The original DP4 method predicted the correct stereoisomer for 75% of the examples, while dJ-DP4 correctly identifies 96% of the cases. As a test of the indirect method, they examined marilzabicycloallenes A and B (1 and 2). DP4 predicts the correct stereoisomer with only 3.1% (1) or <0.1% (2) probability. dJ-DP4 predicts the correct isomer for 1 with 99.9% probability and for 2 with 97.6% probability. The advantage of iJ-DP4 is that using one coupling constant reduces the number of conformations that must be computed by 84%, yet maintains a probability of getting the correct assignment at 99.2% or better. Using two coupling constants to constrain conformations means that only 7% of all of the conformations need to be sampled, and the predictive power is maintained.

1

2
Both of these new methods clearly deserve further application.


References

1. Grimblat, N.; Gavín, J. A.; Hernández Daranas, A.; Sarotti, A. M., “Combining the Power of J Coupling and DP4 Analysis on Stereochemical Assignments: The J-DP4 Methods.” Org. Letters 2019, 21, 4003-4007, DOI: 10.1021/acs.orglett.9b01193.


InChIs

1: InChI=1S/C15H21Br2ClO4/c1-8-15(20)14-6-10(17)12(19)7-11(18)13(22-14)5-9(21-8)3-2-4-16/h3-4,8-15,19-20H,5-7H2,1H3/t2-,8-,9+,10-,11+,12+,13+,14+,15-/m0/s1
InChIKey=APNVVMOUATXTFG-NTSAAJDMSA-N
2: InChI=1S/C15H21Br2ClO4/c1-8-15(20)14-6-10(17)12(19)7-11(18)13(22-14)5-9(21-8)3-2-4-16/h3-4,8-15,19-20H,5-7H2,1H3/t2-,8-,9-,10-,11+,12+,13+,14+,15-/m0/s1
InChIKey=APNVVMOUATXTFG-SSBNIETDSA-N



This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Wednesday, June 26, 2019

The logic of translating chemical knowledge into machine-processable forms: A modern playground for physical-organic chemistry

Karol Molga, Ewa P. Gajewska, Sara Szymkuć, and Bartosz A. Grzybowski (2019)
Highlighted by Jan Jensen
Figure 11 from the paper (c) RSC

This paper offers what is, to me, a fascinating "look behind the scenes" of Chematica. At its core the program has 75,000 handcrafted reaction rules (SMARTS and Reaction SMARTS strings, as shown in the above figure) extracted from the literature, an effort that took over a decade. The authors estimate that ca. 3000-5000 new reaction classes/types appear in the literature each year and "that there are on the order of 100,000 distinct reaction classes constituting the body of modern organic chemistry." So their work is almost done :).

The paper does a really excellent job of outlining the challenges involved in constructing these rules and presents several cases where the rules must be augmented by ML, MM, and Hückel calculations in order to take non-local structural effects (e.g. strain and steric hindrance) and electronic effects (e.g. on regioselectivity) into account. Such calculations must be done on the millisecond time scale, as many thousands of intermediates must be inspected during a retrosynthetic search. At the same time they must be very accurate, as inaccuracies accumulate with each step on the retrosynthetic path.

It will be very interesting to see if purely ML-based alternatives can beat this approach!


This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, June 12, 2019

Vibrational Signatures of Chirality Recognition Between α-Pinene and Alcohols for Theory Benchmarking

Medel, R.; Stelbrink, C.; Suhm, M. A., Angew. Chem. Int. Ed. 2019, 58, 8177
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Can vibrational spectroscopy be used to identify stereoisomers? Medel, Stelbrink, and Suhm have examined the vibrational spectra of (+)- and (-)-α-pinene, (±)-1, in the presence of four different chiral terpenes 2-5.1 They recorded gas phase spectra by thermal expansion of a chiral α-pinene with each chiral terpene.


For the complex of 4 with (+)-1 or (-)-1 and 5 with (+)-1 or (-)-1, the OH vibrational frequency is identical for the two different stereoisomers. However, the OH vibrational frequencies differ by 2 cm-1 with 3, and the complex of 3/(+)-1 displays two different OH stretches that differ by 11 cm-1. And in the case of the complex of α-pinene with 2, the OH vibrational frequencies of the two different stereoisomers differ by 11 cm-1!

The B3LYP-D3(BJ)/def2-TZVP optimized geometries of the 2/(+)-1 and 2/(-)-1 complexes are shown in Figure 2, and some subtle differences in sterics and dispersion give rise to the different vibrational frequencies.

2/(+)-1

2/(-)-1
Figure 2. B3LYP-D3(BJ)/def2-TZVP optimized geometries of the 2/(+)-1 and 2/(-)-1 complexes.

Of interest to readers of this blog will be the DFT study of these complexes. The authors used three different well-known methods – B3LYP-D3(BJ)/def2-TZVP, M06-2x/def2-TZVP, and ωB97X-D/def2-TZVP – to compute structures and (most importantly) predict the vibrational frequencies. Interestingly, M06-2x/def2-TZVP and ωB97X-D/def2-TZVP both failed to predict the vibrational frequency difference between the complexes with the two stereoisomers of α-pinene. However, B3LYP-D3(BJ)/def2-TZVP performed extremely well, with a mean absolute error (MAE) of only 1.9 cm-1 for the four different terpenes. Using this functional with the larger may-cc-pvtz basis set reduced the MAE to 1.5 cm-1, with the largest error being only 2.5 cm-1.

As the authors note, these complexes provide some fertile ground for further experimental and computational study and benchmarking.


Reference

1. Medel, R.; Stelbrink, C.; Suhm, M. A., “Vibrational Signatures of Chirality Recognition Between α-Pinene and Alcohols for Theory Benchmarking.” Angew. Chem. Int. Ed. 2019, 58, 8177-8181, DOI: 10.1002/anie.201901687.


InChIs

(-)-1, (-)-α-pinene: InChI=1S/C10H16/c1-7-4-5-8-6-9(7)10(8,2)3/h4,8-9H,5-6H2,1-3H3/t8-,9-/m0/s1
InChIKey=GRWFGVWFFZKLTI-IUCAKERBSA-N
(+)-1, (+)-α-pinene: InChI=1S/C10H16/c1-7-4-5-8-6-9(7)10(8,2)3/h4,8-9H,5-6H2,1-3H3/t8-,9-/m1/s1
InChIKey=GRWFGVWFFZKLTI-RKDXNWHRSA-N
2, (-)-borneol: InChI=1S/C10H18O/c1-9(2)7-4-5-10(9,3)8(11)6-7/h7-8,11H,4-6H2,1-3H3/t7-,8+,10+/m0/s1
InChIKey=DTGKSKDOIYIVQL-QXFUBDJGSA-N
3, (+)-fenchol: InChI=1S/C10H18O/c1-9(2)7-4-5-10(3,6-7)8(9)11/h7-8,11H,4-6H2,1-3H3/t7-,8-,10+/m0/s1
InChIKey=IAIHUHQCLTYTSF-OYNCUSHFSA-N
4, (-)-isopinocampheol: InChI=1S/C10H18O/c1-6-8-4-7(5-9(6)11)10(8,2)3/h6-9,11H,4-5H2,1-3H3/t6-,7+,8-,9-/m1/s1
InChIKey=REPVLJRCJUVQFA-BZNPZCIMSA-N
5, (1S)-1-phenylethanol: InChI=1S/C8H10O/c1-7(9)8-5-3-2-4-6-8/h2-7,9H,1H3/t7-/m0/s1
InChIKey=WAPNOHKVXSQRPX-ZETCQYMHSA-N



This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Wednesday, May 29, 2019

Activity-Based Screening of Homogeneous Catalysts through the Rapid Assessment of Theoretically Derived Turnover Frequencies

Matthew D. Wodrich, Boodsarin Sawatlon, Ephrath Solel, Sebastian Kozuch, and Clémence Corminboeuf (2019)
Highlighted by Jan Jensen

Figure 1. Adapted from images in the preprint posted under the CC-BY-NC-ND 4.0 license

Linear free energy scaling relationships (LFESRs) linearly relate the reaction energies and barrier heights of a catalytic cycle to a single reaction energy. In this work all the barriers and reaction energies in Figure 1a are computed from the free energy difference between 1 and 4 [ΔG(4)].

The volcano plot is then obtained by plotting the largest free energy difference in the cycle as a function of ΔG(4). In this particular case that is the barrier between 1 and 4 when ΔG(4) is small and the energy difference between 2 and 3 when ΔG(4) is large. The optimum catalyst is the one with a ΔG(4) for which these two lines meet, and one can screen for such catalysts by computing a single free energy difference.
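
To make the construction concrete, here is a minimal Python sketch of how the top of such a volcano could be located from two linear relations. The slopes and intercepts are invented for illustration (they are not the values from the paper), and the function names are hypothetical.

```python
# Hypothetical LFESR coefficients in kcal/mol; NOT taken from the paper.
# Each relation is (slope, intercept) for an energy written as slope * dG4 + intercept.
BARRIER_1_TO_4 = (-1.0, 10.0)  # dominates when dG(4) is small (more negative)
GAP_2_TO_3 = (1.0, 20.0)       # dominates when dG(4) is large

def evaluate(relation, dg4):
    """Evaluate one linear scaling relation at a given dG(4)."""
    slope, intercept = relation
    return slope * dg4 + intercept

def limiting_energy(dg4):
    """Largest free energy difference in the cycle at this dG(4)."""
    return max(evaluate(BARRIER_1_TO_4, dg4), evaluate(GAP_2_TO_3, dg4))

def optimum_dg4(rel_a, rel_b):
    """dG(4) at which the two lines cross (requires different slopes)."""
    (slope_a, icpt_a), (slope_b, icpt_b) = rel_a, rel_b
    return (icpt_b - icpt_a) / (slope_a - slope_b)

best = optimum_dg4(BARRIER_1_TO_4, GAP_2_TO_3)
```

With these made-up numbers the lines cross at ΔG(4) = -5 kcal/mol, where the limiting energy difference is smallest; moving ΔG(4) in either direction makes one of the two terms larger.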

One problem with this approach is that the largest free energy difference in the cycle is not always directly related to the turnover frequency (TOF), which is what is measured experimentally. In principle, the TOF should be determined by microkinetic modeling for each value of ΔG(4) to find the maximum TOF. But in this work TOFs are efficiently estimated by the energy span model, which basically considers all energy differences in the cycle (e.g. also between 1 and 3).
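
For readers unfamiliar with the energy span model, the following Python sketch shows the commonly quoted approximation: the span is the largest difference between any transition state and any intermediate, with the reaction free energy added when the transition state comes before the intermediate in the cycle, and the TOF is then estimated with an Eyring-type expression. The energy profile is invented, the indexing convention (intermediate k is followed by transition state k) is my own, and this is not the code used in the paper.

```python
import math

KB_OVER_H = 2.083661912e10  # Boltzmann constant / Planck constant, in 1/(s K)
R_KCAL = 1.987204e-3        # gas constant in kcal/(mol K)

def energy_span(intermediates, transition_states, dg_rxn):
    """Energy span of a catalytic cycle (all energies in kcal/mol).

    Assumed convention: intermediates[k] is followed by transition_states[k].
    If the transition state lies before the intermediate in the cycle,
    the reaction free energy dg_rxn is added to the difference.
    """
    span = -math.inf
    for i, ts in enumerate(transition_states):
        for j, inter in enumerate(intermediates):
            correction = 0.0 if i >= j else dg_rxn
            span = max(span, ts - inter + correction)
    return span

def tof_estimate(span, temperature=298.15):
    """Eyring-type TOF estimate in 1/s from the energy span."""
    return KB_OVER_H * temperature * math.exp(-span / (R_KCAL * temperature))

# Made-up three-step profile: I1, TS1, I2, TS2, I3, TS3, then I1 again at dg_rxn.
intermediates = [0.0, -8.0, -6.0]
transition_states = [5.0, 2.0, 4.0]
dg_rxn = -10.0

span = energy_span(intermediates, transition_states, dg_rxn)
largest_step_barrier = max(t - i for t, i in zip(transition_states, intermediates))
```

In this invented profile the largest single-step barrier is 10 kcal/mol, but the span is 12 kcal/mol because the highest-lying relevant transition state (TS3) and the lowest intermediate (I2) are not adjacent, which is exactly the kind of non-neighbouring energy difference mentioned above.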

Using the TOF plot, different energy differences become important and the optimum ΔG(4) value decreases (Figure 1b). The points in Figure 1b show the corresponding TOFs computed without the LFESRs and demonstrate the accuracy of this approach.

Monday, April 29, 2019

Exploration of Chemical Compound, Conformer, and Reaction Space with Meta-Dynamics Simulations Based on Tight-Binding Quantum Chemical Calculations

Highlighted by Jan Jensen


The paper describes a new way to search for conformers and chemical reactions, and to estimate barriers, using meta-dynamics with the semiempirical GFNn-xTB methods. A force term is included that scales exponentially with the Cartesian RMSD from previously found structures, thereby forcing the MD to explore new areas of phase space. For simulations with more than one molecule it is necessary to add a constraining potential so that the RMSD cannot be increased simply by increasing the distance between molecules. Each individual MD can be relatively short and most of the CPU time is actually spent on energy minimising the snapshots that are saved.
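
As a rough illustration of the biasing idea, the sketch below adds one Gaussian-shaped penalty per previously visited structure, so the bias is largest when the current geometry sits on top of an old one and decays as the RMSD grows. The functional form, the parameters k and alpha, and the absence of any structural alignment before computing the RMSD are all simplifying assumptions of this sketch rather than details confirmed by the text above.

```python
import math

def rmsd(coords_a, coords_b):
    """Plain Cartesian RMSD between two equally ordered lists of (x, y, z).

    Simplification: no rotational/translational alignment is performed.
    """
    squared = sum(
        (p - q) ** 2
        for atom_a, atom_b in zip(coords_a, coords_b)
        for p, q in zip(atom_a, atom_b)
    )
    return math.sqrt(squared / len(coords_a))

def bias_energy(coords, references, k=0.05, alpha=1.0):
    """Sum of Gaussian penalties centred on previously found structures.

    k (energy units) and alpha (1/length^2) are hypothetical hyperparameters.
    """
    return sum(k * math.exp(-alpha * rmsd(coords, ref) ** 2) for ref in references)

# Toy two-atom example: the bias is largest at the stored structure itself.
visited = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
at_reference = bias_energy(visited[0], visited)
displaced = bias_energy([(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)], visited)
```

In an actual simulation the forces derived from such a bias would push the trajectory away from the stored structures, and new structures would be appended to the reference list as the run proceeds.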

The results depend on a few hyperparameters, so several MD simulations with different values are run in parallel. Because of the extra force, the temperature is also a hyperparameter, so the method doesn't necessarily tell you which reactions are most likely to occur at, say, 300 K.

The conformational search is tested on 22 (mostly) organic molecules and includes the GFN2-xTB energies of the lowest energy conformer for each molecule. This is a valuable benchmark set for other conformational search algorithms designed to find the global minimum.


This work is licensed under a Creative Commons Attribution 4.0 International License.

Wednesday, April 10, 2019

Ambimodal Trispericyclic Transition State and Dynamic Control of Periselectivity

Xue, X.-S.; Jamieson, C. S.; Garcia-Borràs, M.; Dong, X.; Yang, Z.; Houk, K. N., J. Am. Chem. Soc. 2019, 141, 1217
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

A major topic of this blog has been the growing body of studies that demonstrate that dynamic effects can control reaction products (see these posts). Often these examples crop up with valley ridge inflection points. Another cause can be bispericyclic transition states, first discovered by Caramella et al. for the dimerization of cyclopentadiene.1 The Houk group now reports on the first trispericyclic transition state.2

Using ωB97X-D/6-31G(d), they examined the reaction of the tropone derivative 1 with dimethylfulvene 2. Three possible products can arise from different pericyclic reactions: 3, the [4+6] product; 4, the [6+4] product; and 5, the [8+2] product. The thermodynamic product is predicted to be 5, but it is only 1.2 kcal mol-1 lower in energy than 4 and 6.2 kcal mol-1 lower than 3.


They identified one transition state originating from the reactants, TS1. Hypothesizing that it would be trispericyclic, they performed a molecular dynamics study with trajectories starting from TS1. They ran a total of 142 trajectories, and 87% led to 3, 3% led to 4, and 3% led to 5. This demonstrates the unusual nature of TS1 and the dynamic effects on this reaction surface.


TS1

TS2

TS3
Figure 1. ωB97X-D/6-31G(d) optimized geometries of TS1-TS3.

Additionally, there are two different Cope rearrangements (through TS2 and TS3) that convert 3 into 4 and 5. Some trajectories can pass from TS1 and then directly through either TS2 or TS3 and these give rise to products 4 and 5. In other words, some trajectories will pass from a trispericyclic transition state and then through a bispericyclic transition state before ending in product.


References

1. Caramella, P.; Quadrelli, P.; Toma, L., “An Unexpected Bispericyclic Transition Structure Leading to 4+2 and 2+4 Cycloadducts in the Endo Dimerization of Cyclopentadiene.” J. Am. Chem. Soc. 2002, 124, 1130-1131, DOI: 10.1021/ja016622h.
2. Xue, X.-S.; Jamieson, C. S.; Garcia-Borràs, M.; Dong, X.; Yang, Z.; Houk, K. N., “Ambimodal Trispericyclic Transition State and Dynamic Control of Periselectivity.” J. Am. Chem. Soc. 2019, 141, 1217-1221, DOI: 10.1021/jacs.8b12674.


InChIs

1: InChI=1S/C10H6N2/c11-7-10(8-12)9-5-3-1-2-4-6-9/h1-6H
InChIKey=KAWLLELUFONBGI-UHFFFAOYSA-N
2: InChI=1S/C8H10/c1-7(2)8-5-3-4-6-8/h3-6H,1-2H3
InChIKey=WXACXMWYHXOSIX-UHFFFAOYSA-N
3: InChI=1S/C18H16N2/c1-11(2)17-15-7-8-16(17)14-6-4-3-5-13(15)18(14)12(9-19)10-20/h3-8,13-16H,1-2H3
InChIKey=DRPXVBLNTKGMTB-UHFFFAOYSA-N
4: InChI=1S/C18H16N2/c1-18(2)13-6-8-14(12(10-19)11-20)15(9-7-13)16-4-3-5-17(16)18/h3-9,13,15-16H,1-2H3
InChIKey=FSIPGNLAWKVXDD-UHFFFAOYSA-N
5: InChI=1S/C18H16N2/c1-12(2)13-8-9-16-17(13)14-6-4-3-5-7-15(14)18(16,10-19)11-20/h3-9,14,16-17H,1-2H3/t14?,16-,17-/m1/s1
InChIKey=SYLWEGLODFLARZ-VNCLPFQGSA-N



This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Wednesday, March 27, 2019

A Universal Density Matrix Functional from Molecular Orbital-Based Machine Learning: Transferability across Organic Molecules

Highlighted by Jan Jensen


Figure 3c from the paper, showing results for MP2 correlation energies

Some years ago I wrote about the ∆-ML approach where ML is used to estimate the energy difference between expensive and cheap methods based on the molecular structure. I remember wondering at the time whether additional information could be extracted from the cheap method and used as descriptors. 

This has now been tested for correlation energies and it does indeed lead to a significant improvement in accuracy. The method uses Fock, Coulomb, and exchange matrix elements in an LMO basis (which makes me wonder why it's called a density matrix functional) and Gaussian process regression (GPR) to machine learn the LMO contributions to MP2, CCSD, and CCSD(T) correlation energies.

Using just 140 molecules with 7 heavy atoms, the MOB-ML method can be trained to give reasonably accurate results for molecules with 13 heavy atoms (see figure above), and it offers a significant improvement over the ∆-ML approach. An MAE of 0.25 mH/heavy atom translates into an MAE of roughly 2 kcal/mol for a molecule with 13 heavy atoms, which can translate into 4 kcal/mol ∆E-errors depending on the sign, so the method may not be quite accurate enough for many purposes yet. Unfortunately, it doesn't look like training on more molecules leads to additional improvements for transferability to larger molecules, but this is definitely a promising step in the right direction.
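
The unit conversion behind that estimate is spelled out below as a small Python check. The only inputs are the 0.25 mH/heavy atom and 13 heavy atoms quoted above plus the standard hartree to kcal/mol conversion factor; the factor of two for a relative energy is the worst case where the errors of the two species have opposite signs.

```python
HARTREE_TO_KCAL = 627.509  # kcal/mol per hartree

def mae_kcal_per_mol(mae_mh_per_heavy_atom, n_heavy_atoms):
    """Convert an MAE in millihartree per heavy atom to kcal/mol per molecule."""
    return mae_mh_per_heavy_atom * 1e-3 * n_heavy_atoms * HARTREE_TO_KCAL

absolute_error = mae_kcal_per_mol(0.25, 13)  # about 2 kcal/mol
worst_case_relative_error = 2 * absolute_error  # about 4 kcal/mol if signs differ
```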