Thursday, May 30, 2024

FragGT: Fragment-based evolutionary molecule generation with gene types

Joshua Meyers and Nathan Brown (2024)
Highlighted by Jan Jensen


Figure 1 from the paper. (c) The authors. Reproduced under the CC-BY license

Genetic algorithms (GAs) that make changes at the atom level (as opposed to swapping molecular fragments) allow for a very fine-grained search of chemical space. However, some of the resulting molecules are not chemically sensible, and one usually has to include a synthetic accessibility constraint in the scoring function.

Another approach is to use fragments and build synthetic accessibility into the fragmentation scheme, which is what this study did. Specifically, the authors use the BRICS fragmentation scheme implemented in RDKit, and the corresponding combination rules, to turn the genes into molecules.
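To give a feel for what BRICS fragmentation and recombination look like in RDKit, here is a minimal sketch. It is not the FragGT code: the helper names `brics_fragments` and `recombine`, the example molecule, and the `max_molecules` cap are all made up for illustration. The only RDKit calls used are `BRICS.BRICSDecompose` and `BRICS.BRICSBuild`.

```python
from rdkit import Chem
from rdkit.Chem import BRICS


def brics_fragments(smiles):
    """Return the sorted BRICS fragment SMILES for one molecule.

    Hypothetical helper for illustration, not part of FragGT.
    """
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"could not parse SMILES: {smiles!r}")
    # BRICSDecompose returns a set of fragment SMILES; the numbered dummy
    # atoms (e.g. [16*]) record which BRICS rule cut each bond.
    return sorted(BRICS.BRICSDecompose(mol))


def recombine(fragment_smiles, max_molecules=5):
    """Rebuild up to max_molecules molecules from BRICS fragments.

    Hypothetical helper for illustration, not part of FragGT.
    """
    fragments = [Chem.MolFromSmiles(s) for s in fragment_smiles]
    products = []
    # BRICSBuild is a generator that joins fragments only where the
    # dummy-atom labels are compatible under the BRICS rules.
    for product in BRICS.BRICSBuild(fragments):
        if len(products) >= max_molecules:
            break
        product.UpdatePropertyCache(strict=False)
        products.append(Chem.MolToSmiles(product))
    return products


if __name__ == "__main__":
    frags = brics_fragments("CCCOCCC(=O)c1ccccc1")  # arbitrary example molecule
    print(frags)
    print(recombine(frags))
```

In GA terms, each fragment plays the role of a gene, and the dummy-atom labels determine which genes can sit next to each other.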

The authors find that the resulting molecules do indeed look more reasonable (though this is not quantified). However, they note that the method is a "relatively inefficient explorer of chemical space", requiring a large number of scoring function evaluations.

The problem is probably the short-chromosome/many-genes problem. GAs do best at optimizing long chromosomes made of only a few different genes, while the opposite is the case here: there are 211,388 unique BRICS fragments and each molecule contains only around 10 fragments. So the GA has to run for a long time to make sure that all (reasonably) possible genes have been sampled at each position.
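A back-of-the-envelope calculation shows the scale of the problem. The sketch below is not from the paper. It takes the two numbers quoted above and assumes, unrealistically, that every fragment is allowed at every position and that genes are drawn uniformly at random. Under those assumptions, the coupon-collector formula n * H_n gives the expected number of molecules needed before every fragment has appeared at least once at a given position.

```python
import math

N_FRAGMENTS = 211_388      # unique BRICS fragments, as quoted above
GENES_PER_MOLECULE = 10    # "around 10" fragments per molecule, as quoted above


def log10_search_space(n_fragments, genes_per_molecule):
    """log10 of the number of gene strings, assuming any fragment fits anywhere."""
    return genes_per_molecule * math.log10(n_fragments)


def expected_draws_to_see_all(n):
    """Coupon-collector expectation n * H_n for uniform random draws."""
    harmonic = math.fsum(1.0 / k for k in range(1, n + 1))
    return n * harmonic


if __name__ == "__main__":
    exponent = log10_search_space(N_FRAGMENTS, GENES_PER_MOLECULE)
    print(f"naive search space: about 10^{exponent:.1f} gene strings")
    # Each molecule supplies one draw per position, so this is the expected
    # number of molecules needed to cover a single position.
    draws = expected_draws_to_see_all(N_FRAGMENTS)
    print(f"expected molecules to see every gene at one position: {draws:,.0f}")
```

Under these assumptions the answer is in the millions of molecules per position. A real GA samples far from uniformly and the BRICS rules forbid many combinations, so this is only an illustration of why so many scoring function evaluations might be needed.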

It presents a very interesting open challenge to the community.


This work is licensed under a Creative Commons Attribution 4.0 International License.