Sunday, April 26, 2015

Big Data Meets Quantum Chemistry Approximations: The ∆-Machine Learning Approach

Contributed by Jan Jensen
Figure 1. Two hypothetical property profiles connecting two constitutional isomers of C$_7$H$_{10}$O$_2$. The Δ-model estimates the difference between baseline and target line properties (arrow), which differ in level of theory (b → t), geometry ($R_b$ → $R_t$), and property ($E_b$ → $H_t$). Reprinted with permission from J. Chem. Theory Comput. 2015, ASAP. Copyright (2015) American Chemical Society.

The idea behind this method is best explained by a specific example. The G4MP2 enthalpies [$H_t(R_t)$] of C$_7$H$_{10}$O$_2$ isomers are estimated from PM7 electronic energies [$E_b(R_b)$] by
$$H_t(R_t) \approx \Delta_b^t(R_b) = E_b(R_b)+ \sum_{i=1}^N\alpha_i e^{-|R_i-R_b|/\sigma}$$
Here {$\alpha_i$} and $\sigma$ are parameters found by regression using a training set of $N$ molecules, and $|R_i-R_b|$ is a measure of similarity between the target molecule and training molecule $i$. The latter is described in more detail here, but I found it interesting enough to summarize in this post.

A Coulomb matrix ($\mathbf{C}$) is constructed for each molecule
$$
C_{kl}= \begin{cases}
 0.5 Z_k^{2.4} & \text{if }  k=l\\
 Z_kZ_l/r_{kl}& \text{if } k \ne l
\end{cases}
$$
where $r_{kl}$ is the distance between atoms $k$ and $l$ and $Z_k$ is the nuclear charge of atom $k$. The rows and columns are then sorted so that the diagonal elements are in descending order, and the similarity is computed by
$$|R_i-R_b| = \sum_{k,l} |C_{kl}^i - C_{kl}^b | $$
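To make the descriptor concrete, here is a minimal NumPy sketch of the Coulomb matrix, the diagonal sort, and the summed absolute difference. The function names, the zero-padding for molecules of different size, and the water geometry are my own illustrative assumptions and are not taken from the paper; the coordinates just need to be in one consistent length unit (the original Coulomb-matrix work uses atomic units).

```python
import numpy as np


def coulomb_matrix(Z, R):
    """Coulomb matrix from nuclear charges Z (n,) and coordinates R (n, 3)."""
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    # Pairwise interatomic distances r_kl
    r = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    np.fill_diagonal(r, 1.0)  # placeholder, avoids dividing by zero on the diagonal
    C = np.outer(Z, Z) / r    # off-diagonal: Z_k Z_l / r_kl
    np.fill_diagonal(C, 0.5 * Z ** 2.4)  # diagonal: 0.5 Z_k^2.4
    return C


def sort_by_diagonal(C):
    """Permute rows and columns together so the diagonal is descending."""
    order = np.argsort(-np.diag(C), kind="stable")
    return C[np.ix_(order, order)]


def pad(C, n):
    """Zero-pad to n x n (assumption: lets differently sized molecules be compared)."""
    out = np.zeros((n, n))
    out[: C.shape[0], : C.shape[1]] = C
    return out


def l1_distance(Ca, Cb):
    """Sum over k, l of |Ca_kl - Cb_kl|."""
    n = max(len(Ca), len(Cb))
    return np.abs(pad(Ca, n) - pad(Cb, n)).sum()


# Hypothetical example: water with two atom orderings, plus a distorted copy.
Z_water = [8, 1, 1]
R_water = [[0.0, 0.0, 0.0], [0.757, 0.586, 0.0], [-0.757, 0.586, 0.0]]
Z_perm = [1, 8, 1]
R_perm = [R_water[1], R_water[0], R_water[2]]
R_dist = [[0.0, 0.0, 0.0], [1.0, 0.8, 0.0], [-0.757, 0.586, 0.0]]

C_a = sort_by_diagonal(coulomb_matrix(Z_water, R_water))
C_b = sort_by_diagonal(coulomb_matrix(Z_perm, R_perm))
C_c = sort_by_diagonal(coulomb_matrix(Z_water, R_dist))

print("same molecule, reordered atoms:", l1_distance(C_a, C_b))
print("distorted geometry:", l1_distance(C_a, C_c))
```

Note that atoms of the same element have identical diagonal entries, so the diagonal sort alone does not fix their relative order; the symmetric water example sidesteps this, but a general implementation would need a tie-breaking rule.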
Using this approach and a training set of $N$ = 1000 molecules, the G4MP2 atomization enthalpies of 6095 constitutional isomers of C$_7$H$_{10}$O$_2$ can be reproduced with a mean absolute error (MAE) of 3.9 kcal/mol using PM7, compared to an MAE of 6.4 kcal/mol for uncorrected PM7. Using PBE or B3LYP/6-31G(2df,p), the MAE can be brought below 1 kcal/mol using a 1K training set.
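The regression step can be sketched in a few lines. This is only an illustration under several assumptions of mine: the $\alpha_i$ are obtained by kernel ridge regression on the differences between target and baseline values, the kernel decays with distance as $e^{-d/\sigma}$, a small regularizer `lam` is added for numerical stability, and $\sigma$ is fixed by hand rather than optimized. The data are synthetic: random 2D vectors stand in for the sorted Coulomb matrices, and `E_base` and `H_target` are made-up functions, not PM7 or G4MP2 values.

```python
import numpy as np


def l1_distances(A, B):
    """Pairwise L1 distances between rows of A (n, d) and rows of B (m, d)."""
    return np.abs(A[:, None, :] - B[None, :, :]).sum(axis=-1)


def fit_delta_model(X_train, E_base, H_target, sigma, lam=1e-8):
    """Solve (K + lam*I) alpha = H_target - E_base for the weights alpha."""
    K = np.exp(-l1_distances(X_train, X_train) / sigma)
    return np.linalg.solve(K + lam * np.eye(len(K)), H_target - E_base)


def predict(X_query, E_base_query, X_train, alpha, sigma):
    """Baseline value plus the learned correction sum_i alpha_i exp(-d_i/sigma)."""
    K = np.exp(-l1_distances(X_query, X_train) / sigma)
    return E_base_query + K @ alpha


# Synthetic stand-in data (hypothetical, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(300, 2))     # stand-in descriptors
E_base = 10.0 * X[:, 0] - 5.0 * X[:, 1]      # cheap "baseline" property
H_target = E_base + 2.0 + np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2  # "target" property

train, test = slice(0, 200), slice(200, 300)
sigma = 1.0
alpha = fit_delta_model(X[train], E_base[train], H_target[train], sigma)
H_pred = predict(X[test], E_base[test], X[train], alpha, sigma)

mae_base = np.mean(np.abs(E_base[test] - H_target[test]))
mae_delta = np.mean(np.abs(H_pred - H_target[test]))
print(f"MAE of uncorrected baseline: {mae_base:.3f}")
print(f"MAE after delta correction:  {mae_delta:.3f}")
```

The point of learning the difference rather than the target itself is that the baseline already carries most of the physics, so the model only has to capture a smaller, smoother correction.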

In another interesting application the MAE of RHF/6-31G(d) relative to CCSD(T)/6-31G(d) atomization energies for the same set of molecules can be reduced from 3 to less than 1 kcal/mol using a 1K training set.

This is thus a very interesting approach for obtaining chemical accuracy using methods that are fast enough to study thousands of molecules. The caveat is that about 1000 high-level calculations appear to be needed to train the method, but perhaps more generally applicable parameter sets can be found using, for example, functional group identification.


This work is licensed under a Creative Commons Attribution 4.0  
