Sunday, September 30, 2018

DeepSMILES: An adaptation of SMILES for use in machine-learning of chemical structures

Highlighted by Jan Jensen

There's been a lot of work in the last few years on machine learning methods for suggesting molecules (see here and here for examples). Most of these "generative models" are trained using SMILES representations of the molecules. But SMILES was never designed with machine learning in mind and contains features that can cause problems in that setting. The end result is that generative models suggest a lot of SMILES strings with invalid syntax, for example CC(C(C instead of CC(C)C.
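To make the kind of error concrete, here is a minimal Python sketch that checks only whether the branch parentheses in a string are balanced. The function name `has_balanced_branches` is my own, not something from the paper, and this is nowhere near a full SMILES validator: it ignores ring-closure digits, bracket atoms and valence.

```python
def has_balanced_branches(smiles: str) -> bool:
    """Return True if every '(' is closed by a later ')' and none closes early."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' appeared before its '('
                return False
    return depth == 0  # any branch still open makes the string invalid


# The two strings from the text above
for candidate in ["CC(C)C", "CC(C(C"]:
    print(candidate, has_balanced_branches(candidate))
```

The check accepts CC(C)C and rejects CC(C(C, where two branches are opened and never closed. A generative model has to learn this pairing from examples alone, which is exactly where it tends to slip.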

Noel and Andrew suggest a different SMILES syntax (DeepSMILES) that addresses many of these problems. Have a look at the figure above to see if you can deduce the conversion rules, and read the paper to see how close you got. It will be very interesting to see whether DeepSMILES leads to significant improvements in machine learning applications.

This work is licensed under a Creative Commons Attribution 4.0 International License.