Friday, May 11, 2018

MD studies of simple pericyclic reactions

Mackey, J. L.; Yang, Z.; Houk, K. N., "Dynamically concerted and stepwise trajectories of the Cope rearrangement of 1,5-hexadiene." Chem. Phys. Lett. 2017, 683, 253-257
Yang, Z.; Zou, L.; Yu, Y.; Liu, F.; Dong, X.; Houk, K. N., "Molecular dynamics of the two-stage mechanism of cyclopentadiene dimerization: concerted or stepwise?" Chem. Phys. 2018, in press
Yang, Z.; Dong, X.; Yu, Y.; Yu, P.; Li, Y.; Jamieson, C.; Houk, K. N., "Relationships between Product Ratios in Ambimodal Pericyclic Reactions and Bond Lengths in Transition Structures." J. Am. Chem. Soc. 2018, 140, 3061-3067
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

At the recent ACS meeting in New Orleans, Ken Houk spoke at the Dreyfus award session in honor of Michele Parrinello. Ken’s talk included discussion of some recent molecular dynamics studies of pericyclic reactions. Because of their similarities in approaches and observations, I will discuss three recent papers from his group (which Ken discussed in New Orleans) in this post.

The Cope rearrangement, a fundamental organic reaction, has been studied extensively by computational means (see Chapter 4.2 of my book). Mackey, Yang, and Houk examine the degenerate Cope rearrangement of 1,5-hexadiene with molecular dynamics at the (U)B3LYP/6-31G(d) level.1 They ran 230 trajectories and find that 95% of them are reactive; of these, 94% cross directly through the transition zone. By this, Houk means that the time gap between the breaking and forming of the C-C bonds is less than 60 fs, the time for one C-C bond vibration. The average time in the transition zone is 35 fs. Such trajectories can be thought of as “dynamically concerted”. In the few remaining trajectories, a transient diradical with a lifetime of about 100 fs is found.
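The 60 fs criterion can be illustrated with a minimal sketch. The `Trajectory` record, its field names, and the demo times below are all hypothetical; only the 60 fs threshold comes from the discussion above, and the papers' actual bond-length cutoffs for "broken" and "formed" are not reproduced here.

```python
from dataclasses import dataclass

# Threshold quoted in the post: roughly one C-C vibrational period.
CONCERTED_GAP_FS = 60.0


@dataclass
class Trajectory:
    """Hypothetical record of one reactive trajectory (times in fs)."""
    t_bond_break: float  # time at which the breaking C-C bond passes its cutoff
    t_bond_form: float   # time at which the forming C-C bond passes its cutoff


def time_gap(traj: Trajectory) -> float:
    """Absolute time separation between the two bonding events."""
    return abs(traj.t_bond_form - traj.t_bond_break)


def classify(traj: Trajectory, threshold_fs: float = CONCERTED_GAP_FS) -> str:
    """Label a trajectory as dynamically concerted or dynamically stepwise."""
    return "concerted" if time_gap(traj) < threshold_fs else "stepwise"


def summarize(trajs):
    """Count how many trajectories fall into each class."""
    counts = {"concerted": 0, "stepwise": 0}
    for t in trajs:
        counts[classify(t)] += 1
    return counts


# Made-up event times, for illustration only (gaps of 30, 35 and 105 fs).
demo = [Trajectory(100.0, 130.0), Trajectory(90.0, 125.0), Trajectory(80.0, 185.0)]
print(summarize(demo))
```

With real trajectory data the event times would come from monitoring the two C-C distances along each trajectory; the classification step itself would stay this simple.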

The dimerization of cyclopentadiene finds the two [4+2] pathways merging into a single bispericyclic transition state.2 Only a small minority (13%) of the trajectories sample the region around the Cope rearrangement that interconverts the two mirror-image dimers. These trajectories spend about 60 fs on average in this region, which corresponds to the time separation between the formation of the two new C-C bonds. The majority of the trajectories pass quickly through the dimerization transition zone in about 18 fs and avoid the Cope TS region entirely. These paths can be thought of as “dynamically concerted”, while the other set of trajectories is “dynamically stepwise”. It should be noted, however, that the values of ⟨S²⟩ in the Cope transition zone are zero, so no radicals are being formed.

Finally, Yang, Dong, Yu, Yu, Li, Jamieson, and Houk examined 15 different reactions that involve ambimodal (i.e. bispericyclic) transition states.3 They find a strong correlation between the difference in the lengths of the two possible new bonds and the product distribution. For example, in the reaction shown in Scheme 1, bond a is the one farthest along in forming. Bond b is slightly shorter than bond c. Which of these two forms next depends on the dynamics, and it turns out that Pab is formed in 73% of the trajectories while Pac is formed only 23% of the time. This trend is seen across the 15 reactions: the shorter of bonds b and c in the transition state leads to the major product. When the competing reactions involve bonds between different elements, a correlation can be found with bond order instead of bond length.

Scheme 1


References

1) Mackey, J. L.; Yang, Z.; Houk, K. N., "Dynamically concerted and stepwise trajectories of the Cope rearrangement of 1,5-hexadiene." Chem. Phys. Lett. 2017, 683, 253-257, DOI: 10.1016/j.cplett.2017.03.011.
2) Yang, Z.; Zou, L.; Yu, Y.; Liu, F.; Dong, X.; Houk, K. N., "Molecular dynamics of the two-stage mechanism of cyclopentadiene dimerization: concerted or stepwise?" Chem. Phys. 2018, in press, DOI: 10.1016/j.chemphys.2018.02.020.
3) Yang, Z.; Dong, X.; Yu, Y.; Yu, P.; Li, Y.; Jamieson, C.; Houk, K. N., "Relationships between Product Ratios in Ambimodal Pericyclic Reactions and Bond Lengths in Transition Structures." J. Am. Chem. Soc. 2018, 140, 3061-3067, DOI: 10.1021/jacs.7b13562.

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Friday, April 27, 2018

Hunting for organic molecules with artificial intelligence: Molecules optimized for desired excitation energies

Highlighted by Jan Jensen

Figure 1 from the paper. Reproduced under the CC-BY-NC-ND license

Sumita and co-workers combine Monte Carlo tree search (MCTS) and a recurrent neural network (RNN) to discover molecules with specific excitation levels. The general approach is very similar to the one used by Segler, Waller, and co-workers to predict retrosynthetic pathways, which I highlighted last month.

At the core of the method (called ChemTS) is an RNN trained to generate SMILES string representations of molecules, another approach pioneered by Segler and Waller. Trained on thousands of valid SMILES strings, the RNN predicts that, for example, a likely next character in the SMILES string "c1ccccc" is "1" (to form benzene), just as an RNN trained on thousands of English words would predict that a likely next character in "chemistr" is "y".

Since there is more than one probable choice for each new character, the number of possible SMILES strings quickly becomes unmanageable: even five possible characters for each position in a 20-character SMILES string result in $5^{20} \approx 10^{14}$ possibilities. This is where MCTS is helpful (paraphrased from my previous highlight):
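The size of that search space is easy to check. The helper name `count_strings` below is just an illustrative choice.

```python
import math


def count_strings(choices_per_position: int, length: int) -> int:
    """Number of distinct strings when every position has the same number of choices."""
    return choices_per_position ** length


n = count_strings(5, 20)
print(n)                        # 95367431640625
print(round(math.log10(n), 2))  # 13.98, i.e. roughly 10^14
```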

An MCTS starts by evaluating a number of possible SMILES strings randomly and then assigning likelihood scores to the early parts of the string depending on whether the encoded molecule has a desired property or not. The process is then repeated, except that the early parts of the SMILES string are chosen based on likelihood scores, which are continuously updated and added to unscored characters. The changing likelihood scores mean that the search for new SMILES strings is directed towards the more promising areas of the tree. I have given a short illustration of the process here. The process is repeated for a given number of steps and the SMILES strings with properties closest to the target are selected.
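The loop described above can be sketched with a generic UCT-style tree search over short strings. This is not the ChemTS code: the three-letter alphabet, the fixed string length, the `score` function (a stand-in for the property calculation), and the use of uniform random rollouts instead of an RNN are all assumptions made to keep the example small.

```python
import math
import random

ALPHABET = "CNO"  # toy alphabet, not real SMILES grammar (assumption)
LENGTH = 6        # fixed string length for the toy problem (assumption)


def score(s: str) -> float:
    """Toy stand-in for a property calculation: fraction of 'C' characters."""
    return s.count("C") / len(s)


class Node:
    def __init__(self, prefix, parent=None):
        self.prefix = prefix
        self.parent = parent
        self.children = {}  # next character -> Node
        self.visits = 0
        self.total = 0.0    # sum of rollout scores that passed through this node

    def fully_expanded(self):
        return len(self.children) == len(ALPHABET)

    def best_child(self, c=1.4):
        # UCB1: mean score plus an exploration bonus for rarely visited children.
        return max(
            self.children.values(),
            key=lambda ch: ch.total / ch.visits
            + c * math.sqrt(math.log(self.visits) / ch.visits),
        )


def search(iterations, rng):
    """Return the best (string, score) pair seen during the search."""
    root = Node("")
    best = ("", -1.0)
    for _ in range(iterations):
        node = root
        # 1. Selection: follow the scores while every next character has been tried.
        while len(node.prefix) < LENGTH and node.fully_expanded():
            node = node.best_child()
        # 2. Expansion: add one untried next character.
        if len(node.prefix) < LENGTH:
            ch = rng.choice([a for a in ALPHABET if a not in node.children])
            node.children[ch] = Node(node.prefix + ch, parent=node)
            node = node.children[ch]
        # 3. Rollout: complete the string at random and evaluate it.
        missing = LENGTH - len(node.prefix)
        s = node.prefix + "".join(rng.choice(ALPHABET) for _ in range(missing))
        value = score(s)
        if value > best[1]:
            best = (s, value)
        # 4. Backpropagation: update the scores of all prefixes on the path.
        while node is not None:
            node.visits += 1
            node.total += value
            node = node.parent
    return best


print(search(500, random.Random(0)))
```

In the method described above, the random rollout would be replaced by sampling from the trained RNN and the toy score by the computed excitation wavelength relative to the target.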

The desired property is a certain value of the molecule's lowest excitation wavelength (200, 300, 400, 500, or 600 nm), which is predicted using TD-DFT at the B3LYP/3-21G* level of theory. For example, given two days of CPU time on 12 cores, ChemTS generated 646 possible molecules, of which 34 have a predicted excitation wavelength within 20 nm of 200 nm. Two of these molecules were tested experimentally, and one did indeed have an excitation wavelength in the desired range.

Thursday, April 26, 2018

The Molecular Structure of gauche-1,3-Butadiene: Experimental Establishment of Non-planarity

Baraban, J. H.; Martin-Drumel, M.-A.; Changala, P. B.; Eibenberger, S.; Nava, M.; Patterson, D.; Stanton, J. F.; Ellison, G. B.; McCarthy, M. C., Angew. Chem. Int. Ed. 2018, 57, 1821-1825
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Sometimes you run across a paper that is surprising for a strange reason: hasn’t this work been done years before? That was my response to seeing this paper on the structure of gauche-1,3-butadiene.1 Surely, a molecule as simple as this has been examined to death. But in fact, there has been some controversy over whether the cis or gauche form is the second lowest energy conformation. Computations have indicated that the cis form is a transition state for interconverting the two gauche isomers, but experimental confirmation was probably so late in coming because of the small amount of the gauche form present and its small dipole moment.

This paper describes Fourier-transform microwave (FTMW) spectroscopy using two variants: cavity-enhanced FTMW combined with a supersonic expansion and chirped-pulse FTMW in a cryogenic buffer gas cell. In addition, computations were done at CCSD(T) using cc-pCVTZ through cc-pCV5Z basis sets and corrections for perturbative quadruples. The computed structure is shown in Figure 1. In addition to confirming this non-planar structure, with a C-C-C-C dihedral angle of 33.8°, they demonstrate the tunneling between the two mirror image gauche conformations, through the cis transition state.

Figure 1. Computed geometry of gauche-1,3-butadiene.


References

1. Baraban, J. H.; Martin-Drumel, M.-A.; Changala, P. B.; Eibenberger, S.; Nava, M.; Patterson, D.; Stanton, J. F.; Ellison, G. B.; McCarthy, M. C., "The Molecular Structure of gauche-1,3-Butadiene: Experimental Establishment of Non-planarity." Angew. Chem. Int. Ed. 2018, 57, 1821-1825, DOI: 10.1002/anie.201709966.


InChIs

1,3-butadiene: InChI=1S/C4H6/c1-3-4-2/h3-4H,1-2H2
InChIKey=KAKZBPTYRLMSJV-UHFFFAOYSA-N


Wednesday, April 11, 2018

A Quintuple [6]Helicene with a Corannulene Core as a C5-Symmetric Propeller-Shaped π-System

Kato, K.; Segawa, Y.; Scott, L. T.; Itami, K., Angew. Chem. Int. Ed. 2018, 57, 1337-1341
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Corannulene 1 is an interesting aromatic compound because it is nonplanar, having a bowl shape. [6]Helicene 2 is an interesting aromatic compound because it is nonplanar, having the shape of a helix. Kato, Segawa, Scott and Itami have joined these together to synthesize the interesting quintuple helicene compound 3.1

The optimized structure of 3 is shown in Figure 1. They utilized computations to corroborate two experimental findings. First, the NMR spectrum of 3 shows a small number of signals, indicating that the bowl inversion should be rapid. The molecule has C5 symmetry due to the bowl shape of the corannulene core. Rapid inversion makes the molecule effectively D5. (The inversion transition state is of D5 symmetry, and would be a nice quiz question for those looking for molecules of unusual point groups.) The B3LYP/6-31G(d) computed bowl inversion barrier is only 1.9 kcal mol⁻¹, significantly less than the bowl inversion barrier of 1: 10.4 kcal mol⁻¹. This reduction is partly due to the shallower bowl depth of 3 (0.572 Å in the x-ray structure, 0.325 Å in the computed structure) compared with 1 (0.87 Å).

Figure 1. Optimized structure of 3.

Second, they took the enhanced MMMMM-isomer and heated it to obtain the thermodynamic properties for the inversion to the PPPPP-isomer. (The PPPPP-isomer is shown in the top scheme.) The experimental values are ΔH = 36.8 kcal mol⁻¹, ΔS = 8.7 cal mol⁻¹ K⁻¹, and ΔG = 34.2 kcal mol⁻¹ at 298 K. They computed all of the stereoisomers of 3 along with the transition states connecting them. The largest barrier is found in going from MMMMM-3 to MMMMP-3, with a computed barrier of 34.5 kcal mol⁻¹, in nice agreement with experiment.
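As a quick consistency check, the three experimental values quoted above are related by ΔG = ΔH − TΔS. The function name `gibbs_kcal` is just an illustrative choice; the only subtlety is converting the entropy from cal to kcal.

```python
def gibbs_kcal(dH_kcal: float, dS_cal_per_K: float, T: float) -> float:
    """Return dG in kcal/mol from dH (kcal/mol), dS (cal/(mol K)) and T (K)."""
    return dH_kcal - T * dS_cal_per_K / 1000.0  # divide by 1000: cal -> kcal


dG = gibbs_kcal(36.8, 8.7, 298.0)
print(round(dG, 1))  # 34.2
```

The result reproduces the reported 34.2 kcal/mol, so the quoted enthalpy, entropy and free energy are mutually consistent.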


References

1. Kato, K.; Segawa, Y.; Scott, L. T.; Itami, K., "A Quintuple [6]Helicene with a Corannulene Core as a C5-Symmetric Propeller-Shaped π-System." Angew. Chem. Int. Ed. 2018, 57, 1337-1341, DOI: 10.1002/anie.201711985.


InChIs

1: InChI=1S/C20H10/c1-2-12-5-6-14-9-10-15-8-7-13-4-3-11(1)16-17(12)19(14)20(15)18(13)16/h1-10H
InChIKey=VXRUJZQPKRBJKH-UHFFFAOYSA-N
2: InChI=1S/C26H16/c1-3-7-22-17(5-1)9-11-19-13-15-21-16-14-20-12-10-18-6-2-4-8-23(18)25(20)26(21)24(19)22/h1-16H
InChIKey=UOYPNWSDSPYOSN-UHFFFAOYSA-N
3: InChI=1S/C80H40/c1-11-31-51-41(21-1)42-22-2-12-32-52(42)62-61(51)71-63-53-33-13-3-23-43(53)44-24-4-14-34-54(44)64(63)73-67-57-37-17-7-27-47(57)48-28-8-18-38-58(48)68(67)75-70-60-40-20-10-30-50(60)49-29-9-19-39-59(49)69(70)74-66-56-36-16-6-26-46(56)45-25-5-15-35-55(45)65(66)72(62)77-76(71)78(73)80(75)79(74)77/h1-40H
InChIKey=XYUIBQJVZTYREY-UHFFFAOYSA-N


Friday, March 30, 2018

Planning chemical syntheses with deep neural networks and symbolic AI

Marwin H. S. Segler, Mike Preuss, Mark P. Waller (2018)
Highlighted by Jan Jensen

Figure 1 from the paper. Copyright 2018 Springer Nature

The paper uses a Monte Carlo tree search (MCTS) algorithm (also used in AlphaGo Zero) to suggest retrosynthetic routes that were just as good as those proposed by expert organic chemists. Remarkably, the underlying "expert knowledge" is automatically extracted from reaction databases into three neural networks. Thus, the method is referred to as 3N-MCTS.

At the core of this approach are two neural networks that can predict the probability of a molecule undergoing one of either 301,671 or 17,134 chemical transformations, the latter being more computationally efficient than the former. The networks were trained on transformation rules from 12.4 million single-step reactions from the Reaxys chemistry database, i.e. rules determined automatically without human intervention.
  
The retrosynthetic "game" is won if the target molecule can be completely decomposed into predefined precursor molecules within 25 retrosynthetic steps, where the 50 most probable chemical transformations are considered for each step. It is not practically possible to test all $50^{25} \approx 3 \times 10^{42}$ possible retrosynthetic paths, so an MCTS is used to search for the best path.

An MCTS starts by evaluating a number of paths randomly and then assigning likelihood scores to the early parts of the paths depending on whether the paths lead to winners or not. The process is then repeated, except that the early steps in the path are chosen based on likelihood scores, which are continuously updated and added to unscored steps. The changing likelihood scores mean that the search for new paths is directed towards the more promising areas of the path tree. I have given a short illustration of the process here. The process is repeated for a given number of steps and the path with the best set of likelihood scores is selected.

One of the tests of the method was a double blind study where experienced synthetic chemists were asked to choose between retrosynthetic routes developed by experts and by 3N-MCTS. The study found no clear preference!

I couldn't find any information about code availability.

Tuesday, March 27, 2018

Beyond optical rotation: what’s left is not always right in total synthesis

Joyce, L. A.; Nawrat, C. C.; Sherer, E. C.; Biba, M.; Brunskill, A.; Martin, G. E.; Cohen, R. D.; Davies, I. W., Chem. Sci. 2018, 9, 415
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

The structure of (+)-frondosin B 1 has been the subject of some concern. The compound has been synthesized by a number of research groups with the expected R isomer as the target. However, the Danishefsky1 and MacMillan2 syntheses led to a molecule with [α]D of about +16°, while Trauner3 reports a value of -16.8° and Ovaska4 prepared the S isomer with [α]D = -17.3°. Something is amiss here.

Joyce and coworkers have looked into this structure problem through a combination of advanced analytical techniques and computational chemistry.5 They utilize optical activity, electronic circular dichroism (ECD) and vibrational circular dichroism (VCD) and compare the experiments with computational results. IR and VCD were computed at B3LYP/6-31G** using a Boltzmann-weighted set of low-energy conformations. ECD computations were done at CAM-B3LYP/6-31++G**//B3LYP/6-31G**.
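Boltzmann weighting of conformer spectra, as mentioned above, can be sketched as follows. The relative energies and the two-point "spectra" in the demo are made up for illustration, and the function names are hypothetical; in practice each conformer spectrum would come from the B3LYP calculation on that conformer.

```python
import math

R_KCAL = 0.0019872041  # gas constant in kcal mol^-1 K^-1


def boltzmann_weights(rel_energies_kcal, T=298.15):
    """Population of each conformer from its relative energy in kcal/mol."""
    e_min = min(rel_energies_kcal)
    factors = [math.exp(-(e - e_min) / (R_KCAL * T)) for e in rel_energies_kcal]
    z = sum(factors)  # partition sum over the conformer set
    return [f / z for f in factors]


def weighted_spectrum(spectra, weights):
    """Population-weighted average of equal-length intensity lists."""
    n = len(spectra[0])
    return [sum(w * s[i] for w, s in zip(weights, spectra)) for i in range(n)]


# Made-up example: three conformers at 0.0, 0.5 and 1.5 kcal/mol.
weights = boltzmann_weights([0.0, 0.5, 1.5])
spectra = [[1.0, -0.5], [0.2, 0.4], [-0.3, 0.9]]  # hypothetical intensities
print([round(w, 3) for w in weights])
print(weighted_spectrum(spectra, weights))
```

Because VCD and ECD bands can change sign between conformers, the weighted sum can look quite different from the spectrum of the lowest-energy conformer alone, which is why the conformer set matters.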

Basically, they found that (+)-frondosin B does have the R stereocenter. The different synthetic schemes did actually all lead to the same isomer, tested by looking at key intermediates along the way. The discrepancy in the optical activity is due to a small impurity, 2, that has the opposite rotation and a magnitude 10 times greater than that of authentic 1.


This paper is another nice example demonstrating the power of modern computational approaches to spectra that can be extremely valuable in structure determination. Organic chemists of all stripes should certainly be aware of how this tool can complement experiments.

My thanks to Derek Lowe who posted on this paper in his In The Pipeline blog.


References

1) Inoue, M.; Carson, M. W.; Frontier, A. J.; Danishefsky, S. J., "Total Synthesis and Determination of the Absolute Configuration of Frondosin B." J. Am. Chem. Soc. 2001, 123, 1878-1889, DOI: 10.1021/ja0021060.
2) Reiter, M.; Torssell, S.; Lee, S.; MacMillan, D. W. C., "The organocatalytic three-step total synthesis of (+)-frondosin B." Chem. Sci. 2010, 1, 37-42, DOI: 10.1039/C0SC00204F.
3) Hughes, C. C.; Trauner, D., "Palladium-catalyzed couplings to nucleophilic heteroarenes: the total synthesis of (−)-frondosin B." Tetrahedron 2004, 60, 9675-9686, DOI: 10.1016/j.tet.2004.07.041.
4) Ovaska, T. V.; Sullivan, J. A.; Ovaska, S. I.; Winegrad, J. B.; Fair, J. D., "Asymmetric Synthesis of Seven-Membered Carbocyclic Rings via a Sequential Oxyanionic 5-Exo-Dig Cyclization/Claisen Rearrangement Process. Total Synthesis of (−)-Frondosin B." Org. Lett. 2009, 11, 2715-2718, DOI: 10.1021/ol900967j.
5) Joyce, L. A.; Nawrat, C. C.; Sherer, E. C.; Biba, M.; Brunskill, A.; Martin, G. E.; Cohen, R. D.; Davies, I. W., "Beyond optical rotation: what’s left is not always right in total synthesis." Chem. Sci. 2018, 9, 415-424, DOI: 10.1039/C7SC04249C.


InChIs

1: InChI=1S/C20H24O2/c1-12-6-8-16-14(5-4-10-20(16,2)3)18-15-11-13(21)7-9-17(15)22-19(12)18/h7,9,11-12,21H,4-6,8,10H2,1-3H3/t12-/m1/s1
InChIKey=LSPMJSWSYGOLFD-GFCCVEGCSA-N
2: InChI=1S/C20H24O2/c1-12-5-4-10-20(3)16(12)8-6-13(2)19-18(20)15-11-14(21)7-9-17(15)22-19/h7,9,11,13,21H,4-6,8,10H2,1-3H3/t13-,20-/m1/s1
InChIKey=ZBXZDKMLFIJFHG-ZUOKHONESA-N


Wednesday, March 14, 2018

DeePCG: A Deep Neural Network Molecular Force Field


DeePCG: constructing coarse-grained models via deep neural networks. L Zhang, J Han, H Wang, R Car, Weinan E. arXiv:1802.08549v2 [physics.chem-ph]
Contributed by Jesper Madsen

The idea of “learning” a molecular force field (FF) using neural networks can be traced back to Blank et al. in 1995.[1] Modern variations (reviewed recently by Behler[2]), such as the DeePCG scheme[3] that I highlight here, seem to have two key innovations to set them apart from earlier work: network depth and atomic environment descriptors. The latter was the topic of my recent highlight and Zhang et al.[3] take advantage of similar ideas.
Figure 1: “Schematic plot of the neural network input for the environment of CG particle i, using water as an example. Red and white balls represent the oxygen and the hydrogen atoms of the microscopic system, respectively. Purple balls denote CG particles, which, in our example, are centered at the positions of the oxygens.” From ref. [3].
Zhang et al. simulate liquid water using ab initio molecular dynamics (AIMD) at the DFT/PBE0 level of theory in order to train a coarse-grained (CG) molecular water model. The training is done by a standard protocol used in CGing, where mean forces are fitted by minimizing a loss function (the natural choice is the residual sum of squares) over the sampled configurations. CGing liquid water is difficult because many-body contributions to the interactions are necessary, especially so upon integrating out degrees of freedom. One would therefore expect an FF capable of capturing such many-body effects to perform well, just as DeePCG does, and I think this is a very nice example of exactly how much can be gained by using faithful representations of atomic neighborhoods instead of radially symmetric pair potentials. Recall that traditional force-matching, while provably exact in the limit of the complete many-body expansion,[4] still shows non-negligible deviations from the target distributions for most simple liquids when standard approximations are used.
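The residual-sum-of-squares loss mentioned above can be written down in a few lines. The sketch below is not DeePCG: instead of a deep network it fits a single spring constant in a toy model F = −kx, and the positions and reference forces are made-up numbers. It is only meant to show what "fitting mean forces by minimizing a loss over sampled configurations" looks like in the simplest possible case.

```python
def force_matching_loss(predicted, reference):
    """Residual sum of squares over all configurations and force components."""
    return sum(
        (p - r) ** 2
        for f_pred, f_ref in zip(predicted, reference)
        for p, r in zip(f_pred, f_ref)
    )


def toy_forces(positions, k):
    """Toy CG force model (assumption): independent harmonic forces F = -k * x."""
    return [[-k * x for x in xs] for xs in positions]


def fit_spring_constant(positions, reference):
    """Closed-form least-squares k: setting dL/dk = 0 gives k = -sum(x*f) / sum(x*x)."""
    num = sum(x * f for xs, fs in zip(positions, reference) for x, f in zip(xs, fs))
    den = sum(x * x for xs in positions for x in xs)
    return -num / den


# Made-up sampled configurations and noisy reference forces.
positions = [[0.1, -0.2], [0.3, 0.05], [-0.15, 0.25]]
reference = [[-0.21, 0.39], [-0.58, -0.12], [0.31, -0.52]]

k = fit_spring_constant(positions, reference)
loss = force_matching_loss(toy_forces(positions, k), reference)
print(k, loss)
```

A neural-network CG potential replaces the one-parameter model with a many-body function of each particle's environment and minimizes the same kind of loss by gradient descent rather than in closed form.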

FF transferability, however, is likely where the current grand challenge is to be found. Zhang et al. remark that it would be convenient to have an accurate yet cheap (e.g., CG) model for describing phase transitions in water. They do not attempt this in the current preprint paper, but I suspect that it is not *that* easy to make a decent CG model that can correctly get subtle long-range correlations right at various densities, let alone different phases of water and ice, coexistences, interfaces, impurities (non-water moieties), etc. Machine-learnt potentials continuously demonstrate excellent accuracy over the parameterization space of states or configurations, but for transferability and extrapolations, we are still waiting to see how far they can get.

References

[1] Neural network models of potential energy surfaces. TB Blank, SD Brown, AW Calhoun, DJ Doren. J Chem Phys 103, 4129 (1995)
[2] Perspective: Machine learning potentials for atomistic simulations. J Behler. J Chem Phys 145, 170901 (2016)
[3] DeePCG: constructing coarse-grained models via deep neural networks. L Zhang, J Han, H Wang, R Car, Weinan E. arXiv:1802.08549v2 [physics.chem-ph]
[4] The multiscale coarse-graining method. I. A rigorous bridge between atomistic and coarse-grained models. WG Noid, J-W Chu, GS Ayton, V Krishna, S Izvekov, GA Voth, A Das, HC Andersen. J Chem Phys 128, 244114 (2008)