Sunday, July 31, 2022

Sample Efficiency Matters: A Benchmark for Practical Molecular Optimization

Wenhao Gao, Tianfan Fu, Jimeng Sun, Connor W. Coley (2022)
Highlighted by Jan Jensen

Figure 1 from the paper. (c) The authors 2022. Reproduced under the CC-BY license.

The development of generative models that can find molecules with certain properties has become very popular, but there are very few studies that compare them, so it's hard to know what works best. This study compares the performance of 25 different generative models in 23 different optimisation tasks and draws some very interesting conclusions.

None of these methods finds the optimum value given a "budget" of 10,000 oracle evaluations, and for some tasks the best performance is not exactly impressive. This doesn't bode well for some real-life applications where even a few hundred property evaluations are challenging.
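To make the idea of an evaluation budget concrete, here is a minimal Python sketch of an oracle wrapper that refuses to score more than a fixed number of distinct molecules. It is not the benchmark's own code: the names `BudgetedOracle`, `BudgetExceeded` and `toy_score` are made up for illustration, the scoring function is a toy stand-in for a real property calculation, and the choice to let repeated molecules be answered from a cache without counting against the budget is an assumption.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the optimizer asks for more evaluations than allowed."""


class BudgetedOracle:
    """Wrap a scoring function and count distinct evaluations against a budget."""

    def __init__(self, score_fn, budget):
        self.score_fn = score_fn
        self.budget = budget
        self.cache = {}    # molecule -> score; repeats are assumed to be free
        self.history = []  # best score seen after each new evaluation

    def __call__(self, molecule):
        if molecule in self.cache:
            return self.cache[molecule]
        if len(self.cache) >= self.budget:
            raise BudgetExceeded(f"budget of {self.budget} evaluations used up")
        score = self.score_fn(molecule)
        self.cache[molecule] = score
        best = max(score, self.history[-1]) if self.history else score
        self.history.append(best)
        return score

    @property
    def calls(self):
        return len(self.cache)


def toy_score(smiles):
    # Hypothetical stand-in for a real property oracle:
    # the fraction of characters in the string that are "C".
    return smiles.count("C") / max(len(smiles), 1)


oracle = BudgetedOracle(toy_score, budget=3)
for s in ["CCO", "CCC", "CCO", "O"]:  # the second "CCO" is served from the cache
    oracle(s)
print(oracle.calls, oracle.history)
```

The `history` list gives the best-score-so-far curve as a function of oracle calls, which is the kind of curve you would need in order to compare methods at a few hundred evaluations rather than only at the end of the budget.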

Some methods are slower to converge than others, so you might draw completely different conclusions regarding efficiency if you allowed 100,000 oracle evaluations. Similarly, some methods have high variability in performance, so you might draw very different conclusions from 1 run compared to 10 runs. This is especially a consideration for problems where you can only afford one run. It might be better to choose a method that performs slightly worse on average but is less variable, rather than risk a bad run from a highly variable method that performs better on average.
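The trade-off between average performance and variability can be illustrated with a small Python sketch. The numbers below are invented for illustration and are not results from the paper; `method_A`, `method_B` and the 0.6 "bad run" threshold are all hypothetical.

```python
import statistics

# Hypothetical final scores from 10 independent runs of two made-up methods.
runs = {
    "method_A": [0.90, 0.35, 0.92, 0.88, 0.30, 0.91, 0.89, 0.93, 0.40, 0.90],
    "method_B": [0.70, 0.72, 0.69, 0.71, 0.73, 0.70, 0.68, 0.72, 0.71, 0.70],
}


def summarize(scores, threshold):
    """Return mean, sample standard deviation, worst run, and fraction of bad runs."""
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
        "worst": min(scores),
        "p_bad": sum(s < threshold for s in scores) / len(scores),
    }


# A run scoring below the (assumed) threshold counts as a "bad" run.
summary = {name: summarize(s, threshold=0.6) for name, s in runs.items()}
for name, stats in summary.items():
    print(name, {k: round(v, 3) for k, v in stats.items()})
```

In this made-up example method_A has the higher mean, but 3 of its 10 runs fall below the threshold, whereas method_B never does. If you can only afford a single run, method_B may be the safer choice.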

The method that performed best overall is one of the oldest methods, published in 2017!

Food for thought

This work is licensed under a Creative Commons Attribution 4.0 International License.