Thursday, April 30, 2020

Reactants, products, and transition states of elementary chemical reactions based on quantum chemistry

Colin A. Grambow, Lagnajit Pattanaik, William H. Green (2020)
Highlighted by Jan Jensen

Figure 1 from the paper. Reproduced under the CC BY-NC-ND 4.0 licence

This paper describes a new data set of DFT barrier heights for 12,000 diverse chemical reactions and should stimulate a lot of new ML studies on chemical reactivity.

The molecules are sampled from GDB-7, so they are relatively small and contain only H, C, N, and O. Each reaction is generated from a single molecule using the single-ended growing string method (GSM), so reactions with two reactants and two products are not represented in the data set. Other than these limitations, the data set is very diverse:

The reactions span a wide range of both barriers and reaction energies (as seen in the figure above). Reactions with anywhere from 1 to 6 bond changes are represented (though there are only a handful with 6), as are changes to pretty much all bond types (C-H, C-C, C-N, etc.). There are only 8 reaction templates with more than 100 examples, and many have only a single reaction example. So, very diverse.
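To make the idea of "bond changes" concrete, here is a minimal sketch of one way they could be counted once the atoms are mapped between reactant and product. It assumes each side is given as a dictionary from a pair of atom-map numbers to a bond order, and it counts any pair whose bond order differs (formation, breaking, or a change in order). The function name `bond_changes` and the toy keto-enol example are illustrative choices, and the paper's own counting convention may differ.

```python
def bond_changes(reactant_bonds, product_bonds):
    """Return the atom-map pairs whose bond order differs between the two sides.

    Each argument maps a sorted (i, j) pair of atom-map numbers to a bond order.
    A pair missing from one side is treated as bond order 0 (no bond).
    """
    pairs = set(reactant_bonds) | set(product_bonds)
    return sorted(
        pair
        for pair in pairs
        if reactant_bonds.get(pair, 0) != product_bonds.get(pair, 0)
    )


# Toy example (hand-written, not taken from the data set):
# acetaldehyde -> vinyl alcohol, tracking C1, C2, O3 and the migrating H4.
reactant = {(1, 2): 1, (2, 3): 2, (1, 4): 1}  # C1-C2, C2=O3, C1-H4
product = {(1, 2): 2, (2, 3): 1, (3, 4): 1}   # C1=C2, C2-O3, O3-H4

changed = bond_changes(reactant, product)
print(len(changed), changed)
```

Under this convention the toy reaction has four bond changes: two changes in bond order, one bond broken, and one bond formed.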

Best of all, the authors provide atom-mapped reaction SMILES along with the barriers and reaction energies, which makes further benchmarking, analysis, and ML studies very easy. It will be very exciting to see this data being put to good use!
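As a rough sketch of how easy such data is to work with, the snippet below reads a small CSV of reaction SMILES with barriers and reaction energies and splits each reaction into its reactant and product sides. The column names (`rxn_smiles`, `barrier`, `reaction_energy`) and the two rows are placeholders made up for illustration; they are not the actual file layout or values from the data set.

```python
import csv
import io
import statistics

# Placeholder rows for illustration only: NOT values from the data set,
# and the column names are assumed rather than taken from the real files.
toy_csv = """rxn_smiles,barrier,reaction_energy
[CH3:1][CH:2]=[O:3]>>[CH2:1]=[CH:2][OH:3],60.0,10.0
[CH2:1]=[CH:2][OH:3]>>[CH3:1][CH:2]=[O:3],50.0,-10.0
"""


def load_reactions(handle):
    """Parse rows into dicts with reactant/product SMILES and float energies."""
    reactions = []
    for row in csv.DictReader(handle):
        # A reaction SMILES without agents has the form "reactant>>product".
        reactant, product = row["rxn_smiles"].split(">>")
        reactions.append(
            {
                "reactant": reactant,
                "product": product,
                "barrier": float(row["barrier"]),
                "reaction_energy": float(row["reaction_energy"]),
            }
        )
    return reactions


reactions = load_reactions(io.StringIO(toy_csv))
mean_barrier = statistics.mean(r["barrier"] for r in reactions)
print(len(reactions), mean_barrier)
```

For the real files, `io.StringIO(toy_csv)` would be replaced by an open file handle, and the column names adjusted to whatever the authors actually use.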

This work is licensed under a Creative Commons Attribution 4.0 International License.