Raghunathan Ramakrishnan, Pavlo O. Dral, Matthias Rupp, and O. Anatole von Lilienfeld

*Journal of Chemical Theory and Computation* 2015, ASAP (arXiv)
Contributed by Jan Jensen

Figure 1. Two hypothetical property profiles connecting two constitutional isomers of C$_7$H$_{10}$O$_2$. The Δ-model estimates the difference between baseline and target line properties (arrow), which differ in level of theory (b → t), geometry ($R_b$ → $R_t$), and property ($E_b$ → $H_t$). Reprinted with permission from J. Chem. Theory Comput. 2015, ASAP. Copyright (2015) American Chemical Society.

The idea behind this method is best explained by a specific example. The G4MP2 enthalpies [$H_t(R_t)$] of C$_7$H$_{10}$O$_2$ isomers are estimated using PM7 electronic energies [$E_b(R_b)$] by

$$H_t(R_t) \approx E_b(R_b) + \sum_{i=1}^{N} \alpha_i \exp\left(-\frac{|R_i-R_b|}{\sigma}\right)$$

Here $\{\alpha_i\}$ and $\sigma$ are parameters found by regression using a training set of $N$ molecules, and $|R_i-R_b|$ is a measure of the similarity between the target molecule and training molecule $i$. The latter is described in more detail elsewhere, but I found it interesting enough to summarize here.
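As a minimal sketch of how such a correction could be evaluated, the Python snippet below adds a kernel-weighted sum to a baseline value. It assumes the correction has the Laplacian-kernel form $\sum_i \alpha_i \exp(-|R_i-R_b|/\sigma)$; the function name `delta_ml_predict` and all numbers are made up for illustration and are not taken from the paper.

```python
import math


def delta_ml_predict(baseline_value, distances, alphas, sigma):
    """Return the baseline value plus a kernel-weighted correction.

    distances[i] plays the role of |R_i - R_b| for training molecule i,
    alphas[i] is the matching regression coefficient (assumed form).
    """
    if len(distances) != len(alphas):
        raise ValueError("need one distance per training molecule")
    correction = sum(
        alpha * math.exp(-dist / sigma)
        for alpha, dist in zip(alphas, distances)
    )
    return baseline_value + correction


# Toy numbers only, not values from the paper.
baseline = -100.0              # e.g. a PM7 energy of the query molecule
distances = [0.0, 2.0, 50.0]   # |R_i - R_b| to three training molecules
alphas = [1.5, -0.5, 3.0]      # hypothetical regression coefficients
sigma = 2.0                    # hypothetical kernel width

prediction = delta_ml_predict(baseline, distances, alphas, sigma)
print(f"corrected estimate: {prediction:.4f}")
```

Training molecules that are far from the query (large $|R_i-R_b|$ relative to $\sigma$) contribute almost nothing, so the correction is dominated by the closest molecules in the training set.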

A Coulomb matrix ($\mathbf{C}$) is constructed for each molecule

$$
C_{kl}= \begin{cases}
0.5 Z_k^{2.4} & \text{if } k=l\\
Z_kZ_l/r_{kl}& \text{if } k \ne l
\end{cases}
$$

where $r_{kl}$ is the distance between atoms $k$ and $l$ and $Z_k$ is the nuclear charge of atom $k$. The rows and columns are then reordered so that the diagonal elements are in descending order, and the similarity is computed by

$$|R_i-R_b| = \sum_{k,l} |C_{kl}^i - C_{kl}^b | $$
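The two steps above (build and sort the Coulomb matrix, then sum the absolute element-wise differences) can be sketched in plain Python as follows. The helper names and the two toy three-atom geometries are hypothetical, the coordinates are in arbitrary units, and ties between equal diagonal elements are simply left in input order, which is my assumption rather than something the paper specifies.

```python
import math


def coulomb_matrix(charges, coords):
    """Build C_kl from nuclear charges Z_k and Cartesian coordinates."""
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for l in range(n):
            if k == l:
                matrix[k][l] = 0.5 * charges[k] ** 2.4
            else:
                r_kl = math.dist(coords[k], coords[l])
                matrix[k][l] = charges[k] * charges[l] / r_kl
    return matrix


def sort_by_diagonal(matrix):
    """Permute rows and columns together so the diagonal is descending."""
    order = sorted(range(len(matrix)),
                   key=lambda k: matrix[k][k], reverse=True)
    return [[matrix[k][l] for l in order] for k in order]


def l1_distance(c_i, c_b):
    """Sum of |C_kl^i - C_kl^b| over all elements (same-size matrices)."""
    if len(c_i) != len(c_b):
        raise ValueError("matrices must have the same size")
    return sum(abs(x - y)
               for row_i, row_b in zip(c_i, c_b)
               for x, y in zip(row_i, row_b))


# Two toy "molecules" with the same atoms listed in a different order.
charges_i = [1, 8, 1]
coords_i = [(0.0, 0.0, 1.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
charges_b = [8, 1, 1]
coords_b = [(0.0, 0.0, 0.0), (0.0, 0.0, 2.0), (0.0, 1.0, 0.0)]

sorted_i = sort_by_diagonal(coulomb_matrix(charges_i, coords_i))
sorted_b = sort_by_diagonal(coulomb_matrix(charges_b, coords_b))
distance = l1_distance(sorted_i, sorted_b)
print(f"|R_i - R_b| = {distance:.4f}")
```

Sorting puts the heaviest atom first in both matrices, so the result does not depend on the order in which the atoms were listed in the input, at least as long as the diagonal elements are distinct.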

Using this approach and a training set of $N = 1000$ molecules, the G4MP2 atomization enthalpies of 6095 constitutional isomers of C$_7$H$_{10}$O$_2$ can be reproduced with a mean absolute error (MAE) of 3.9 kcal/mol using PM7, compared to an MAE of 6.4 kcal/mol for uncorrected PM7. Using PBE or B3LYP/6-31G(2df,p) as the baseline, the MAE can be brought below 1 kcal/mol using a 1K training set.
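For a feel of how the $\{\alpha_i\}$ might be obtained, here is a toy sketch that assumes kernel ridge regression with a Laplacian kernel: the coefficients solve $(K + \lambda I)\alpha = y$, where $K_{ij} = \exp(-|R_i-R_j|/\sigma)$ and $y$ holds the target-minus-baseline differences for the training molecules. The regularizer `lam`, the helper names, and all numbers are illustrative assumptions, not values from the paper.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_alphas(distances, residuals, sigma, lam=1e-8):
    """Solve (K + lam*I) alpha = residuals for a Laplacian kernel K."""
    n = len(residuals)
    kernel = [[math.exp(-distances[i][j] / sigma) + (lam if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    return solve_linear(kernel, residuals)


def mean_absolute_error(predicted, reference):
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(reference)


# Toy training set of three molecules (made-up numbers).
distances = [[0.0, 1.0, 2.0],
             [1.0, 0.0, 1.5],
             [2.0, 1.5, 0.0]]   # pairwise |R_i - R_j|
baseline = [10.0, 12.0, 11.0]   # cheap-method values
target = [12.0, 11.0, 11.5]     # high-level values
sigma = 1.0

residuals = [t - b for t, b in zip(target, baseline)]
alphas = fit_alphas(distances, residuals, sigma)

corrected = [
    b + sum(a * math.exp(-d / sigma) for a, d in zip(alphas, row))
    for b, row in zip(baseline, distances)
]
mae_before = mean_absolute_error(baseline, target)
mae_after = mean_absolute_error(corrected, target)
print(f"MAE before: {mae_before:.3f}, after: {mae_after:.2e}")
```

On the training set itself the corrected values match the targets almost exactly; the MAEs quoted above are of course measured on molecules outside the training set, which is the meaningful test.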

In another interesting application, the MAE of RHF/6-31G(d) relative to CCSD(T)/6-31G(d) atomization energies for the same set of molecules can be reduced from 3 to less than 1 kcal/mol using a 1K training set.

This is thus a very interesting approach for obtaining chemical accuracy using methods that are sufficiently fast to study thousands of molecules. The caveat is that about 1000 high-level calculations appear to be needed to train the method, but perhaps more generally applicable parameter sets can be found using, for example, functional group identification.

This work is licensed under a Creative Commons Attribution 4.0
