Sunday, October 30, 2016

Automatic chemical design using a data-driven continuous representation of molecules

Rafael Gómez-Bombarelli, David Duvenaud, José Miguel Hernández-Lobato, Jorge Aguilera-Iparraguirre, Timothy D. Hirzel, Ryan P. Adams, and Alán Aspuru-Guzik (2016)
Contributed by Jan Jensen

Chemical space is discrete, which makes it hard to search with standard techniques such as gradient-based minimisation.  This paper uses a standard machine learning tool called an autoencoder to help solve that problem.  One way to think of an autoencoder is as a data compressor: one neural network (the encoder) is trained to map each item in a data set, such as an image, to a compressed representation, and another network (the decoder) is trained to recover the image from the compressed format.
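To make the compress-and-recover idea concrete, here is a minimal sketch of an autoencoder in plain NumPy. It is not the model from the paper: the data are synthetic, both networks are single linear layers, and all names (`encode`, `decode`, `W_enc`, `W_dec`) and the learning rate are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption, not from the paper): 200 samples with
# 8 features that actually lie on a 2-dimensional subspace.
latent_true = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 8))
X = latent_true @ mixing

n_in, n_latent = X.shape[1], 2
W_enc = rng.normal(scale=0.1, size=(n_in, n_latent))  # encoder weights
W_dec = rng.normal(scale=0.1, size=(n_latent, n_in))  # decoder weights

def encode(x):
    """Compress: map 8 input features to a 2-dimensional code."""
    return x @ W_enc

def decode(z):
    """Recover: map a 2-dimensional code back to 8 features."""
    return z @ W_dec

def reconstruction_loss(x):
    """Mean squared difference between the input and its reconstruction."""
    return float(np.mean((decode(encode(x)) - x) ** 2))

initial_loss = reconstruction_loss(X)

learning_rate = 0.01  # hypothetical value, picked for this toy problem
for _ in range(2000):
    Z = encode(X)
    X_hat = decode(Z)
    # Gradient of the squared error (halved, averaged over samples)
    # with respect to the reconstruction.
    err = (X_hat - X) / len(X)
    grad_dec = Z.T @ err
    grad_enc = X.T @ (err @ W_dec.T)
    W_dec -= learning_rate * grad_dec
    W_enc -= learning_rate * grad_enc

final_loss = reconstruction_loss(X)
print(f"reconstruction loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Both networks are trained together against a single objective, the reconstruction error. A real autoencoder would add nonlinear layers, and for molecules the input would be an encoded SMILES string rather than a vector of random numbers.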

The interesting thing in the context of chemical space is that the compressed format can be continuous, such as a real-valued vector (a point in latent space). (Another use of autoencoders is dimensionality reduction for data visualization, e.g. as an alternative to principal component analysis.)  This latent space is therefore a continuous representation of the chemical space (a set of SMILES strings) that the autoencoder was trained on.  Another neural net can then be trained to map points in this latent space to some chemical property, such as logP, and the space can be searched for regions with desired logP values with techniques as simple as interpolation.
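The search step can be sketched in the same spirit. The example below assumes the latent vectors are already available, and replaces both them and the property values with synthetic numbers. A linear least-squares fit stands in for the property-predicting neural net, and `predict_property` and `interpolate` are hypothetical helper names, not functions from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins (assumptions): 100 latent vectors of dimension 4
# and a made-up property that depends linearly on them.
Z = rng.normal(size=(100, 4))
true_w = np.array([1.5, -0.5, 0.0, 2.0])
prop = Z @ true_w + 0.3

# Fit a linear property predictor by least squares; the column of ones
# supplies the bias term.
A = np.hstack([Z, np.ones((len(Z), 1))])
coef, *_ = np.linalg.lstsq(A, prop, rcond=None)

def predict_property(z):
    """Predict the property for one latent vector or a batch of them."""
    z = np.atleast_2d(z)
    return z @ coef[:-1] + coef[-1]

def interpolate(z_start, z_end, n_points=11):
    """Return evenly spaced points on the straight line between two codes."""
    t = np.linspace(0.0, 1.0, n_points)[:, None]
    return (1.0 - t) * z_start + t * z_end

# Walk from one latent point to another and pick the point on the path
# whose predicted property is closest to a chosen target value.
path = interpolate(Z[0], Z[1])
predicted = predict_property(path)
target = 0.5 * (predicted[0] + predicted[-1])
best = path[np.argmin(np.abs(predicted - target))]
print("latent point closest to target:", best)
```

Turning `best` into an actual molecule would require passing it through the trained decoder to obtain a SMILES string, which this sketch does not attempt.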

One problem with autoencoders is that they are "lossy", which in this case means that not all points in latent space can be decoded to a valid molecule (SMILES string).  However, the failure rate is relatively low for the two proof-of-concept applications in the paper.

This is a very interesting new tool in the hunt for molecules with new properties.

This work is licensed under a Creative Commons Attribution 4.0

