Wednesday, December 30, 2020

Deep Molecular Dreaming: Inverse machine learning for de-novo molecular design and interpretability with surjective representations

Cynthia Shen, Mario Krenn, Sagi Eppel, Alan Aspuru-Guzik (2020)
Highlighted by Jan Jensen

Figure 2 from the paper. (c) the authors 2020. Reproduced under the CC-BY license

This paper presents an interesting approach for obtaining molecules with particular properties. A 4-layer NN is trained to predict logP values based on a one-hot encoded string representation (SELFIES) of molecules. The NN is trained in the usual way: a molecule is input, the predicted logP value is compared to the true value, and the NN weights are then adjusted to minimise the difference - a process that is repeated for a certain number of epochs.
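To make the input representation concrete, here is a minimal sketch of one-hot encoding a SELFIES string. The four-symbol `ALPHABET` and the helper name `one_hot_encode` are assumptions made for illustration; the alphabet used in the paper will be larger.

```python
import re

# Toy alphabet of SELFIES symbols (an assumption for this example only).
ALPHABET = ["[C]", "[N]", "[O]", "[=C]"]


def one_hot_encode(selfies_string, alphabet=ALPHABET):
    """Return one row per SELFIES symbol, with a single 1 marking that symbol."""
    # SELFIES strings are sequences of bracketed symbols such as "[C]".
    symbols = re.findall(r"\[[^\]]*\]", selfies_string)
    index = {symbol: i for i, symbol in enumerate(alphabet)}
    return [
        [1 if index[symbol] == column else 0 for column in range(len(alphabet))]
        for symbol in symbols
    ]


# "[C][C][O]" is the SELFIES string for ethanol.
ethanol = one_hot_encode("[C][C][O]")
```

Each row corresponds to one position in the string and each column to one symbol of the alphabet, so the NN sees the molecule as a binary matrix.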

Once trained, the process is then reversed. A target logP value is chosen together with an arbitrary molecule. The difference between the predicted and target logP values is then minimised by adjusting the one-hot encoded representation of the molecule, while the NN weights are kept fixed - a process that is repeated for a certain number of epochs.
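The reversed step can be sketched with a toy model. The example below is not the 4-layer NN from the paper: it assumes a simple linear predictor with made-up, frozen weights so that the gradient with respect to the input can be written by hand. The names `predict` and `dream`, the learning rate, and the step count are all illustrative assumptions.

```python
def predict(weights, x):
    """Toy linear 'property predictor': sum of weight * input over all entries."""
    return sum(w * v for w_row, x_row in zip(weights, x) for w, v in zip(w_row, x_row))


def dream(weights, x, target, lr=0.05, steps=200):
    """Gradient descent on the input x; the weights are never modified."""
    x = [list(row) for row in x]  # work on a copy of the starting encoding
    losses = []
    for _ in range(steps):
        error = predict(weights, x) - target
        losses.append(error ** 2)
        # For a linear model, d(error^2)/dx[i][j] = 2 * error * weights[i][j].
        for i, row in enumerate(x):
            for j in range(len(row)):
                row[j] -= lr * 2.0 * error * weights[i][j]
    return x, losses


# Made-up frozen weights and a one-hot start "molecule" (2 positions, 3 symbols).
weights = [[0.5, -0.2, 0.1], [0.3, 0.4, -0.6]]
start = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
target = 2.0
dreamed, losses = dream(weights, start, target)
```

With a real NN the hand-written gradient would be replaced by automatic differentiation, but the structure is the same: the loss is unchanged and only the quantity being updated switches from the weights to the input.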

In both cases the adjustments are based on the gradient of the error, taken with respect to the weights in the first case and the one-hot encoded vectors in the second. Since the start vector is binary but becomes a real-valued vector once the optimisation starts, there are some convergence problems. The authors show that this can be addressed by randomly changing the 0's in the one-hot encoding to some number between 0 and a maximum value.
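A minimal sketch of this noise step, assuming the random values are drawn uniformly; the function name `add_noise` and the parameter name `upper_bound` are illustrative and not taken from the paper.

```python
import random


def add_noise(one_hot, upper_bound, rng=None):
    """Replace every 0 with a random value in [0, upper_bound]; keep the 1's."""
    rng = rng or random.Random()
    return [
        [1.0 if value == 1 else rng.uniform(0.0, upper_bound) for value in row]
        for row in one_hot
    ]


clean = [[1, 0, 0, 0], [0, 0, 1, 0]]
noisy = add_noise(clean, upper_bound=0.5, rng=random.Random(0))
```

As long as `upper_bound` is below 1, the largest entry in each row stays where it was, so the noisy vector still corresponds to the same molecule.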

Since SELFIES are being used, every vector representation can be resolved to a molecule, which means that one can also analyse the optimisation path to gain insight into how the NN translates molecules into a property prediction.
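One plausible way to resolve a real-valued vector to a molecule is to take the largest entry at each position and read off the corresponding SELFIES symbol. The sketch below assumes this argmax rule, a toy four-symbol alphabet, and a made-up two-step optimisation path; none of these values come from the paper.

```python
# Toy alphabet of SELFIES symbols (an assumption for this example only).
ALPHABET = ["[C]", "[N]", "[O]", "[=C]"]


def decode(vectors, alphabet=ALPHABET):
    """Map each row to the symbol with the largest value and join the symbols."""
    symbols = []
    for row in vectors:
        best = max(range(len(row)), key=row.__getitem__)
        symbols.append(alphabet[best])
    return "".join(symbols)


# Made-up intermediate encodings from two steps of an optimisation.
path = [
    [[1.0, 0.2, 0.1, 0.0], [0.9, 0.1, 0.3, 0.2]],
    [[0.7, 0.3, 0.4, 0.1], [0.5, 0.2, 0.6, 0.2]],
]
molecules = [decode(step) for step in path]
```

Here the second position flips from `[C]` to `[O]` between the two steps, which is the kind of discrete change one would track along the optimisation path. The resulting SELFIES strings could then be converted to SMILES, for example with the `decoder` function of the `selfies` package.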

This work is licensed under a Creative Commons Attribution 4.0 International License.