Friday, April 27, 2018

Hunting for organic molecules with artificial intelligence: Molecules optimized for desired excitation energies

Highlighted by Jan Jensen

Figure 1 from the paper. Reproduced under the CC-BY-NC-ND license

Sumita and co-workers combine Monte Carlo tree search (MCTS) and a recurrent neural network (RNN) to discover molecules with specific excitation energies. The general approach is very similar to the one used by Segler, Waller, and co-workers to predict retrosynthetic pathways, which I highlighted last month.

At the core of the method (called ChemTS) is an RNN trained to generate SMILES string representations of molecules, another approach pioneered by Segler and Waller. Trained on thousands of valid SMILES strings, the RNN predicts that, for example, a likely next character in the SMILES string "c1ccccc" is "1" (to form benzene), just like an RNN trained on thousands of English words would predict that a likely next character in "chemistr" is "y".
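To make the idea of next-character prediction concrete, here is a minimal sketch. It is not an RNN and not the ChemTS model: it is a fixed-context frequency table (an n-gram model) used purely as a stand-in, and the four training strings, the context length of 5, and the function names are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical, tiny training set; the real model is trained on thousands of SMILES.
TRAINING = ["c1ccccc1", "Cc1ccccc1", "Oc1ccccc1", "c1ccncc1"]
ORDER = 5  # number of preceding characters used as context (an assumption)


def train(smiles_list, order):
    """Count which character follows each context of `order` characters."""
    counts = defaultdict(Counter)
    for s in smiles_list:
        padded = "^" * order + s + "$"  # "^" pads the start, "$" marks the end
        for i in range(order, len(padded)):
            counts[padded[i - order:i]][padded[i]] += 1
    return counts


def predict_next(counts, prefix, order):
    """Return the most frequent next character for this prefix, or None."""
    context = ("^" * order + prefix)[-order:]
    followers = counts.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = train(TRAINING, ORDER)
print(predict_next(model, "c1ccccc", ORDER))
```

A fixed context like this forgets everything further back than five characters, which is one reason a recurrent network is a better fit for ring closures and branches that can be many characters apart.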

Since there is more than one probable choice for each new character, the number of possible SMILES strings quickly becomes unmanageable: even five possible characters for each position in a 20-character SMILES string results in $5^{20} \approx 10^{14}$ possibilities. This is where MCTS is helpful (paraphrased from my previous highlight):

An MCTS starts by evaluating a number of possible SMILES strings randomly and then assigning likelihood scores to the early parts of each string depending on whether the encoded molecule has the desired property or not. The process is then repeated, except that the early parts of the SMILES string are chosen based on the likelihood scores, which are continuously updated and added to unscored characters. The changing likelihood scores mean that the search for new SMILES strings is directed towards the more promising areas of the tree. I have given a short illustration of the process here. The process is repeated for a given number of steps and the SMILES strings with properties closest to the target are selected.
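The loop described above can be sketched as a toy tree search over short strings. This is only an illustration under loud assumptions: the five-symbol alphabet is not a real SMILES grammar, the `reward` function (similarity to a made-up target string) stands in for the quantum chemical property evaluation, the rollout picks characters uniformly at random where ChemTS would use the RNN, and the UCB1 selection rule is a common choice rather than something confirmed from the paper.

```python
import math
import random

ALPHABET = "CNOcn"  # hypothetical 5-symbol alphabet
LENGTH = 6          # fixed string length for the toy search
TARGET = "CCOCCN"   # made-up "ideal" string standing in for the target property


def reward(s):
    """Toy score in [0, 1]: fraction of positions that match TARGET."""
    return sum(a == b for a, b in zip(s, TARGET)) / LENGTH


class Node:
    def __init__(self, prefix):
        self.prefix = prefix
        self.children = {}  # next character -> Node
        self.visits = 0
        self.total = 0.0    # sum of rewards from rollouts through this node


def select_child(node, c=1.0):
    """UCB1: prefer high mean reward, with a bonus for rarely visited children."""
    def ucb(child):
        mean = child.total / child.visits
        return mean + c * math.sqrt(math.log(node.visits) / child.visits)
    return max(node.children.values(), key=ucb)


def mcts(iterations, rng):
    root = Node("")
    best = ("", -1.0)
    for _ in range(iterations):
        node, path = root, [root]
        # Selection: descend while the node is fully expanded and not terminal.
        while len(node.prefix) < LENGTH and len(node.children) == len(ALPHABET):
            node = select_child(node)
            path.append(node)
        # Expansion: add one character that has not been tried at this node.
        if len(node.prefix) < LENGTH:
            ch = rng.choice([a for a in ALPHABET if a not in node.children])
            node.children[ch] = Node(node.prefix + ch)
            node = node.children[ch]
            path.append(node)
        # Rollout: complete the string at random and score it.
        missing = LENGTH - len(node.prefix)
        s = node.prefix + "".join(rng.choice(ALPHABET) for _ in range(missing))
        r = reward(s)
        if r > best[1]:
            best = (s, r)
        # Backpropagation: update the scores of every prefix on the path.
        for n in path:
            n.visits += 1
            n.total += r
    return best


best_string, best_score = mcts(iterations=2000, rng=random.Random(0))
print(best_string, best_score)
```

The scores stored on each node are the "likelihood scores" of the text: prefixes that have led to good strings accumulate a higher mean reward and are therefore selected more often in later iterations.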

The desired property is a certain value of the molecule's lowest excitation energy, expressed as a wavelength (200, 300, 400, 500, or 600 nm), which is predicted using TD-DFT at the B3LYP/3-21G* level of theory. For example, given two days of CPU time on 12 cores, ChemTS generated 646 possible molecules, of which 34 have a predicted excitation wavelength within 20 nm of 200 nm. Two of these molecules were tested experimentally and one molecule did indeed have an excitation energy in the desired range.
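Since the targets are quoted as wavelengths but excitation energies are usually computed in eV, a short conversion and filtering sketch may help. The function names and the example wavelengths are hypothetical; only the targets and the 20 nm window come from the text, and the conversion uses the standard relation E = hc/λ.

```python
HC_EV_NM = 1239.841984  # h*c in eV nm


def nm_to_ev(wavelength_nm):
    """Convert an excitation wavelength in nm to an energy in eV."""
    return HC_EV_NM / wavelength_nm


def within_window(predicted_nm, target_nm, tol_nm=20.0):
    """True if the predicted wavelength lies within tol_nm of the target."""
    return abs(predicted_nm - target_nm) <= tol_nm


# Hypothetical predicted wavelengths (nm), not data from the paper.
candidates = [187.0, 203.5, 221.0, 240.2]
hits = [w for w in candidates if within_window(w, 200.0)]

for target in (200, 300, 400, 500, 600):
    print(target, "nm =", round(nm_to_ev(target), 2), "eV")
print("hits near 200 nm:", hits)
```

Note that a fixed 20 nm window corresponds to a much larger energy range at 200 nm than at 600 nm, because energy is inversely proportional to wavelength.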

Thursday, April 26, 2018

The Molecular Structure of gauche-1,3-Butadiene: Experimental Establishment of Non-planarity

Baraban, J. H.; Martin-Drumel, M.-A.; Changala, P. B.; Eibenberger, S.; Nava, M.; Patterson, D.; Stanton, J. F.; Ellison, G. B.; McCarthy, M. C., Angew. Chem. Int. Ed. 2018, 57, 1821-1825
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Sometimes you run across a paper that is surprising for a strange reason: hasn’t this work been done years before? That was my response to seeing this paper on the structure of gauche-1,3-butadiene.1 Surely, a molecule as simple as this has been examined to death. But in fact, there has been some controversy over whether the cis or gauche form is the second lowest energy conformation. Computations have indicated that the cis form is a transition state for interconverting the two gauche isomers, but experimental confirmation was probably so late in coming because of the small amount of the gauche form present and its small dipole moment.

This paper describes Fourier-transform microwave (FTMW) spectroscopy using two variants: cavity-enhanced FTMW combined with a supersonic expansion, and chirped-pulse FTMW in a cryogenic buffer gas cell. In addition, computations were done at the CCSD(T) level using the cc-pCVTZ through cc-pCV5Z basis sets, with corrections for perturbative quadruples. The computed structure is shown in Figure 1. In addition to confirming this non-planar structure, with a C-C-C-C dihedral angle of 33.8°, they demonstrate tunneling between the two mirror-image gauche conformations through the cis transition state.
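For readers who want to check a C-C-C-C dihedral angle from a set of Cartesian coordinates, here is a minimal sketch of the standard four-point calculation. The coordinates used below are hypothetical points constructed to give ±33.8°, not the computed geometry from the paper, and the function name is my own.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond; 0 is cis, 180 is trans."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


theta = math.radians(33.8)
c1, c2, c3 = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
c4_plus = (1.0, math.cos(theta), math.sin(theta))
c4_minus = (1.0, math.cos(theta), -math.sin(theta))
print(dihedral_degrees(c1, c2, c3, c4_plus), dihedral_degrees(c1, c2, c3, c4_minus))
```

The two signs correspond to the two mirror-image gauche conformations, and the planar cis arrangement that connects them has a dihedral angle of zero.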

Figure 1. Computed geometry of gauche-1,3-butadiene.


References

1. Baraban, J. H.; Martin-Drumel, M.-A.; Changala, P. B.; Eibenberger, S.; Nava, M.; Patterson, D.; Stanton, J. F.; Ellison, G. B.; McCarthy, M. C., "The Molecular Structure of gauche-1,3-Butadiene: Experimental Establishment of Non-planarity." Angew. Chem. Int. Ed. 2018, 57, 1821-1825, DOI: 10.1002/anie.201709966.


InChIs

1,3-butadiene: InChI=1S/C4H6/c1-3-4-2/h3-4H,1-2H2
InChIKey=KAKZBPTYRLMSJV-UHFFFAOYSA-N

This work is licensed under a Creative Commons Attribution-NoDerivs 3.0 Unported License.

Wednesday, April 11, 2018

A Quintuple [6]Helicene with a Corannulene Core as a C5-Symmetric Propeller-Shaped π-System

Kato, K.; Segawa, Y.; Scott, L. T.; Itami, K., Angew. Chem. Int. Ed. 2018, 57, 1337-1341
Contributed by Steven Bachrach
Reposted from Computational Organic Chemistry with permission

Corannulene 1 is an interesting aromatic compound because it is nonplanar, having a bowl shape. [6]Helicene 2 is an interesting aromatic compound because it is nonplanar, having the shape of a helix. Kato, Segawa, Scott and Itami have joined these together to synthesize the interesting quintuple helicene compound 3.1

The optimized structure of 3 is shown in Figure 1. They utilized computations to corroborate two experimental findings. First, the NMR spectrum of 3 shows a small number of signals, indicating that the bowl inversion should be rapid. The molecule has C5 symmetry due to the bowl shape of the corannulene core. Rapid inversion makes the molecule effectively D5. (The inversion transition state is of D5 symmetry, and would be a nice quiz question for those looking for molecules of unusual point groups.) The B3LYP/6-31G(d) computed bowl inversion barrier is only 1.9 kcal mol⁻¹, significantly less than the bowl inversion barrier of 1: 10.4 kcal mol⁻¹. This reduction is partly due to the shallower bowl depth of 3 (0.572 Å in the x-ray structure, 0.325 Å in the computed structure) than in 1 (0.87 Å).
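To get a feel for what the difference between a 1.9 and a 10.4 kcal mol⁻¹ barrier means, here is a rough estimate of the relative inversion rates. It assumes that both processes share the same prefactor, that the quoted barriers can be treated as activation free energies, and a temperature of 298.15 K; none of these assumptions come from the paper, so treat the number as an order-of-magnitude guide only.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def relative_rate(barrier_low, barrier_high, temperature=298.15):
    """Ratio k_low / k_high for two barriers (kcal/mol), assuming equal prefactors."""
    return math.exp((barrier_high - barrier_low) / (R_KCAL * temperature))


ratio = relative_rate(1.9, 10.4)
print(f"bowl inversion of 3 is roughly {ratio:.1e} times faster than that of 1")
```

Under these assumptions the 8.5 kcal mol⁻¹ difference corresponds to a rate ratio of the order of a million at room temperature.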

Figure 1. Optimized structure of 3.

Second, they took the enhanced MMMMM-isomer and heated it to obtain the thermodynamic properties for the inversion to the PPPPP-isomer. (The PPPPP-isomer is shown in the top scheme.) The experimental values are ΔH = 36.8 kcal mol⁻¹, ΔS = 8.7 cal mol⁻¹ K⁻¹, and ΔG = 34.2 kcal mol⁻¹ at 298 K. They computed all of the stereoisomers of 3 along with the transition states connecting them. The largest barrier is found in going from MMMMM-3 to MMMMP-3, with a computed barrier of 34.5 kcal mol⁻¹, in nice agreement with experiment.
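The three experimental values are tied together by ΔG = ΔH − TΔS, and the only trap is that ΔS is quoted in cal rather than kcal. A short check, with a function name of my own choosing:

```python
def gibbs_energy(delta_h_kcal, delta_s_cal, temperature_k):
    """Return delta G in kcal/mol from delta H (kcal/mol) and delta S (cal/mol/K)."""
    return delta_h_kcal - temperature_k * delta_s_cal / 1000.0


delta_g = gibbs_energy(36.8, 8.7, 298.0)
print(f"delta G = {delta_g:.1f} kcal/mol")
print(f"difference from the computed barrier: {34.5 - delta_g:.1f} kcal/mol")
```

The entropy term contributes about 2.6 kcal mol⁻¹ at 298 K, which brings the experimental value to within a few tenths of a kcal mol⁻¹ of the computed barrier.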


References

1. Kato, K.; Segawa, Y.; Scott, L. T.; Itami, K., "A Quintuple [6]Helicene with a Corannulene Core as a C5-Symmetric Propeller-Shaped π-System." Angew. Chem. Int. Ed. 2018, 57, 1337-1341, DOI: 10.1002/anie.201711985.


InChIs

1: InChI=1S/C20H10/c1-2-12-5-6-14-9-10-15-8-7-13-4-3-11(1)16-17(12)19(14)20(15)18(13)16/h1-10H
InChIKey=VXRUJZQPKRBJKH-UHFFFAOYSA-N
2: InChI=1S/C26H16/c1-3-7-22-17(5-1)9-11-19-13-15-21-16-14-20-12-10-18-6-2-4-8-23(18)25(20)26(21)24(19)22/h1-16H
InChIKey=UOYPNWSDSPYOSN-UHFFFAOYSA-N
3: InChI=1S/C80H40/c1-11-31-51-41(21-1)42-22-2-12-32-52(42)62-61(51)71-63-53-33-13-3-23-43(53)44-24-4-14-34-54(44)64(63)73-67-57-37-17-7-27-47(57)48-28-8-18-38-58(48)68(67)75-70-60-40-20-10-30-50(60)49-29-9-19-39-59(49)69(70)74-66-56-36-16-6-26-46(56)45-25-5-15-35-55(45)65(66)72(62)77-76(71)78(73)80(75)79(74)77/h1-40H
InChIKey=XYUIBQJVZTYREY-UHFFFAOYSA-N
