## Sunday, April 26, 2015

### Big Data Meets Quantum Chemistry Approximations: The ∆-Machine Learning Approach

Contributed by Jan Jensen

Figure 1. Two hypothetical property profiles connecting two constitutional isomers of C$_7$H$_{10}$O$_2$. The Δ-model estimates the difference between baseline and target line properties (arrow), which differ in level of theory (b → t), geometry ($R_b$ → $R_t$), and property ($E_b$ → $H_t$). Reprinted with permission from J. Chem. Theory Comput. 2015, ASAP. Copyright (2015) American Chemical Society.

The idea behind this method is best explained by a specific example. The G4MP2 enthalpies [$H_t(R_t)$] of C$_7$H$_{10}$O$_2$ isomers are estimated using PM7 electronic energies [$E_b(R_b)$] by
$$H_t(R_t) \approx \Delta_b^t(R_b) = E_b(R_b)+ \sum_{i=1}^N\alpha_i e^{-|R_i-R_b|/\sigma}$$
Here $\{\alpha_i\}$ and $\sigma$ are parameters found by regression using a training set of $N$ molecules, and $|R_i-R_b|$ measures how similar the target molecule is to training molecule $i$. The latter is described in more detail here, but I found it interesting enough to summarize below.
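As a rough illustration of the regression step, here is a minimal sketch that fits the weights $\alpha_i$ to the difference between target and baseline values and then applies the correction. The function names, the small regularizer `lam`, and the use of a precomputed matrix of molecular distances are assumptions made for this example; the kernel is taken to be the decaying exponential $e^{-d/\sigma}$.

```python
import numpy as np

def laplacian_kernel(D, sigma):
    """Apply exp(-d / sigma) elementwise to an array of distances."""
    return np.exp(-np.asarray(D, dtype=float) / sigma)

def fit_delta_model(D_train, baseline_train, target_train, sigma, lam=1e-10):
    """Return weights alpha that reproduce (target - baseline) on the training set.

    D_train[i, j] is the distance between training molecules i and j.
    lam is a small ridge term (an assumption here) that keeps the solve stable.
    """
    K = laplacian_kernel(D_train, sigma)
    residual = np.asarray(target_train, dtype=float) - np.asarray(baseline_train, dtype=float)
    return np.linalg.solve(K + lam * np.eye(K.shape[0]), residual)

def predict_delta_model(D_query_train, baseline_query, alpha, sigma):
    """Baseline value plus the kernel-weighted correction for each query molecule.

    D_query_train[q, i] is the distance between query q and training molecule i.
    """
    correction = laplacian_kernel(D_query_train, sigma) @ alpha
    return np.asarray(baseline_query, dtype=float) + correction
```

In this sketch $\sigma$ is passed in by hand; in practice it would be chosen by some form of validation on held-out molecules.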

A Coulomb matrix ($\mathbf{C}$) is constructed for each molecule
$$C_{kl}= \begin{cases} 0.5 Z_k^{2.4} & \text{if } k=l\\ Z_kZ_l/r_{kl}& \text{if } k \ne l \end{cases}$$
where $r_{kl}$ is the distance between atoms $k$ and $l$ and $Z_k$ is the nuclear charge of atom $k$. The rows and columns are then reordered so that the diagonal elements are in descending order, and the similarity measure is computed as
$$|R_i-R_b| = \sum_{k,l} |C_{kl}^i - C_{kl}^b |$$
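The descriptor can be sketched in a few lines of NumPy. This is only an illustration of the three formulas above, not the authors' code: the function names are hypothetical, the coordinate units are left to the caller (they just need to be consistent across molecules), and the two matrices are assumed to have the same size, as is the case for constitutional isomers.

```python
import numpy as np

def coulomb_matrix(Z, coords):
    """Build C with 0.5 * Z_k**2.4 on the diagonal and Z_k * Z_l / r_kl elsewhere."""
    Z = np.asarray(Z, dtype=float)
    coords = np.asarray(coords, dtype=float)
    # Pairwise interatomic distances r_kl
    r = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(r, 1.0)  # placeholder to avoid 0/0; diagonal is overwritten below
    C = np.outer(Z, Z) / r
    np.fill_diagonal(C, 0.5 * Z**2.4)
    return C

def sort_by_diagonal(C):
    """Permute rows and columns together so the diagonal is in descending order."""
    order = np.argsort(-np.diag(C), kind="stable")
    return C[np.ix_(order, order)]

def l1_distance(C_a, C_b):
    """Sum of absolute elementwise differences between two sorted matrices."""
    if C_a.shape != C_b.shape:
        raise ValueError("this sketch assumes both molecules have the same number of atoms")
    return float(np.abs(C_a - C_b).sum())
```

Sorting makes the descriptor independent of the order in which atoms are listed, at least when the diagonal entries are distinct; atoms of the same element tie on the diagonal, and this sketch simply keeps their input order.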
Using this approach and a training set of $N$ = 1000 molecules, the G4MP2 atomization enthalpies of 6095 constitutional isomers of C$_7$H$_{10}$O$_2$ can be reproduced with an MAE of 3.9 kcal/mol using PM7, compared to an MAE of 6.4 kcal/mol for uncorrected PM7. Using PBE or B3LYP/6-31G(2df,p), the MAE can be brought below 1 kcal/mol with a 1K training set.

In another interesting application, the MAE of RHF/6-31G(d) relative to CCSD(T)/6-31G(d) atomization energies for the same set of molecules can be reduced from 3 to less than 1 kcal/mol using a 1K training set.

This is thus a very interesting approach for obtaining chemical accuracy using methods that are sufficiently fast to study thousands of molecules. The caveat is that about 1000 high-level calculations appear to be needed to train the method, but perhaps more generally applicable parameter sets can be found using, for example, functional group identification.
