Expected Improvement for Bayesian Optimization: A Derivation
In this post, we derive the closed-form expression of the Expected Improvement (EI) criterion commonly used in Bayesian Optimization.
When the objective function is modelled with a Gaussian Process, its value at a given point $x$ can be treated as a normal random variable with mean $\mu$ and variance $\sigma^2$. Let $f^*$ denote the best (minimum, in a minimization setup) function value obtained so far.
We are interested in quantifying the improvement over $f^*$ that we would obtain by sampling a point $x$. Mathematically, the improvement at $x$ can be expressed as follows:
$I(x) = \max(f^* - Y,0)$
where $Y \sim \mathcal{N}(\mu, \sigma^2)$ is the random variable that corresponds to the function value at $x$. Since $I$ is a random variable, one can consider the average (expected) improvement (EI) to assess $x$:
$EI(x) = E_{Y\sim \mathcal{N}(\mu, \sigma^2)}[I(x)]$
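Before deriving the closed form, note that this expectation can be estimated directly by sampling. Below is a minimal Monte Carlo sketch; the values of `mu`, `sigma`, and `f_star` are arbitrary illustrative choices, not taken from the derivation itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative GP posterior at a candidate point x (assumed values)
mu, sigma = 0.5, 1.0   # posterior mean and standard deviation at x
f_star = 0.0           # best (minimum) function value observed so far

# Monte Carlo estimate of EI(x) = E[max(f* - Y, 0)], Y ~ N(mu, sigma^2)
y_samples = rng.normal(mu, sigma, size=1_000_000)
improvement = np.maximum(f_star - y_samples, 0.0)
print(f"Monte Carlo EI estimate: {improvement.mean():.4f}")
```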
With the reparameterization trick, $Y=\mu + \sigma \epsilon$ where $\epsilon\sim\mathcal{N}(0,1)$, we have:
$EI(x) = E_{\epsilon\sim \mathcal{N}(0,1)}[\max(f^* - \mu - \sigma\epsilon,\, 0)]$
Writing this expectation as an integral against the standard normal density, the improvement is zero unless $\epsilon \le \frac{f^* - \mu}{\sigma}$, so

$EI(x) = \int_{-\infty}^{\frac{f^* - \mu}{\sigma}} (f^* - \mu - \sigma\epsilon)\,\phi(\epsilon)\, d\epsilon$

which can be evaluated (from linearity of the integral, and the fact that $\frac{d}{d\epsilon}e^{-\epsilon^2/2} = -\epsilon e^{-\epsilon^2/2}$, i.e. $\phi'(\epsilon) = -\epsilon\,\phi(\epsilon)$) as

$EI(x) = (f^* - \mu)\,\Phi\left(\frac{f^* - \mu}{\sigma}\right) + \sigma\,\phi\left(\frac{f^* - \mu}{\sigma}\right)$

where $\phi, \Phi$ are the PDF and CDF of the standard normal distribution, respectively.
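As a sanity check, the closed form takes only a few lines to implement and can be compared against the Monte Carlo estimate above. This is again a sketch using the same illustrative values.

```python
from scipy.stats import norm

def expected_improvement(mu: float, sigma: float, f_star: float) -> float:
    """Closed-form EI for minimization: (f* - mu) * Phi(z) + sigma * phi(z)."""
    z = (f_star - mu) / sigma
    return (f_star - mu) * norm.cdf(z) + sigma * norm.pdf(z)

# With the assumed values mu=0.5, sigma=1.0, f_star=0.0, this should agree
# with the Monte Carlo estimate from the earlier snippet up to sampling noise.
print(f"Closed-form EI: {expected_improvement(0.5, 1.0, 0.0):.4f}")
```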