Sunday, July 29, 2018

Error-Controlled Exploration of Chemical Reaction Networks with Gaussian Processes

Gregor N. Simm and Markus Reiher (2018)
Highlighted by Jan Jensen



See if you recognise this scenario: you benchmark a cheap method against an accurate, but expensive, method for a set of molecules and get a mean error that you can use to correct the results obtained with the cheap method. But sooner or later you start using the cheap method on molecules that look increasingly different from your benchmark set. At what point should you do another benchmark calculation against your expensive method? Simm and Reiher use Gaussian processes (GPs) to provide a quantitative answer.

A GP is a way to fit a function to a set of data points that also gives an uncertainty for the fit at every point. Simm and Reiher's basic idea is to arrange the benchmark points on the x-axis by computing a distance between pairs of molecular structures and then perform a GP fit to the errors of the cheap method. Now compute the x-coordinate of the molecule you're uncertain about: if the uncertainty in the GP fit at that point is larger than the standard deviation of the mean error of your cheap method computed for the benchmark set, then you need to benchmark your cheap method against the more expensive method for that molecule.
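To make the decision rule concrete, here is a minimal sketch in plain Python. It is not Simm and Reiher's implementation: it assumes each molecule has already been reduced to a single scalar coordinate, uses a squared-exponential kernel with hand-picked hyperparameters (`signal_var`, `length`, `noise_var`), and takes the sample standard deviation of the benchmark errors as the threshold. All function names and the numbers in the toy data are made up for illustration.

```python
import math
import statistics


def sq_exp_kernel(a, b, signal_var, length):
    """Squared-exponential covariance between two scalar coordinates."""
    return signal_var * math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(x_train, y_train, x_new, signal_var=1.0, length=1.0,
               noise_var=1e-4):
    """Return the GP posterior mean and standard deviation at x_new."""
    n = len(x_train)
    y_mean = statistics.mean(y_train)
    y_centered = [y - y_mean for y in y_train]  # zero-mean GP on residuals
    K = [[sq_exp_kernel(x_train[i], x_train[j], signal_var, length)
          + (noise_var if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k_star = [sq_exp_kernel(x, x_new, signal_var, length) for x in x_train]
    alpha = solve(K, y_centered)   # K^-1 y
    v = solve(K, k_star)           # K^-1 k*
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = signal_var - sum(k * vi for k, vi in zip(k_star, v))
    return mean, math.sqrt(max(var, 0.0))


def needs_benchmark(x_train, y_train, x_new, **gp_args):
    """True if the GP uncertainty at x_new exceeds the benchmark spread."""
    threshold = statistics.stdev(y_train)  # assumed threshold, see lead-in
    _, sigma = gp_predict(x_train, y_train, x_new, **gp_args)
    return sigma > threshold


# Toy data (made up): distance coordinate and cheap-minus-expensive error.
x_bench = [0.0, 0.5, 1.0, 1.5, 2.0]
err_bench = [1.2, 0.8, 1.5, 0.9, 1.1]

for x_new in (1.2, 5.0):
    flag = needs_benchmark(x_bench, err_bench, x_new,
                           signal_var=1.0, length=0.5)
    print(x_new, "-> new benchmark needed:", flag)
```

A molecule whose coordinate lies among the benchmark points gets a small predictive uncertainty, while one far from all of them falls back to the prior variance and is flagged. In practice the hyperparameters would be fitted to the data rather than chosen by hand.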