Expected Improvement for Bayesian Optimization: A Derivation

In this post, we derive the closed-form expression of the Expected Improvement (EI) criterion commonly used in Bayesian Optimization.

Modelled with a Gaussian Process, the function value at a given point $x$ can be treated as a normal random variable with mean $\mu$ and variance $\sigma^2$ (both depending on $x$). Given the best (minimum, in a minimization setup) function value obtained so far, denoted by $f^*$, we are interested in quantifying the improvement over $f^*$ that we would obtain by sampling at $x$. Mathematically, the improvement at $x$ can be expressed as follows:

$I(x) = \max(f^* - Y,0)$

where $Y \sim \mathcal{N}(\mu, \sigma^2)$ is the random variable that corresponds to the function value at $x$. Since $I(x)$ is itself a random variable, one can use its expected value, the expected improvement (EI), to assess $x$:

$EI(x) = E_{Y\sim \mathcal{N}(\mu, \sigma^2)}[I(x)]$
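Before deriving the closed form, this expectation can be approximated by plain sampling, which gives a useful sanity check. The following is a minimal Python sketch under stated assumptions, not part of the derivation: the function name `expected_improvement_mc` and the parameters `f_best` (standing for $f^*$), `n_samples` and `seed` are illustrative choices.

```python
import random


def expected_improvement_mc(mu, sigma, f_best, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[max(f_best - Y, 0)] with Y ~ N(mu, sigma^2).

    Illustrative sketch: names and defaults are assumptions, not from the post.
    """
    rng = random.Random(seed)  # local generator so results are reproducible
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(mu, sigma)       # one draw of the function value at x
        total += max(f_best - y, 0.0)  # improvement I(x) for this draw
    return total / n_samples           # sample mean approximates EI(x)


if __name__ == "__main__":
    print(expected_improvement_mc(mu=0.0, sigma=1.0, f_best=0.0))
```

The estimate is noisy and costs one sample per term, which is the motivation for the closed-form expression derived next.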

With the reparameterization trick, $Y=\mu + \sigma \epsilon$ where $\epsilon\sim\mathcal{N}(0,1)$, we have:

$EI(x) = E_{\epsilon\sim \mathcal{N}(0,1)}[I(x)] = E_{\epsilon\sim \mathcal{N}(0,1)}[\max(f^* - \mu - \sigma \epsilon, 0)]$

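which can be expanded as follows. The improvement is nonzero only when $\epsilon < (f^*-\mu)/\sigma$, which truncates the integral; the remaining steps use the linearity of the integral and the identity $\frac{d}{d\epsilon}e^{-\epsilon^2 / 2} = -\epsilon e^{-\epsilon^2/2}$: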

$EI(x) = \int_{-\infty}^{\infty} I(x) \phi(\epsilon) d\epsilon$

$EI(x) = \int_{-\infty}^{(f^*-\mu)/\sigma} (f^* - \mu - \sigma \epsilon) \phi(\epsilon) d\epsilon$

$EI(x) = (f^* - \mu)\Phi\left(\frac{f^*-\mu}{\sigma}\right) - \sigma \int_{-\infty}^{(f^*-\mu)/\sigma} \epsilon \phi(\epsilon) d\epsilon$

$EI(x) = (f^* - \mu)\Phi\left(\frac{f^*-\mu}{\sigma}\right) + \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{(f^*-\mu)/\sigma} (-\epsilon) e^{-\epsilon^2/2} d\epsilon$

$EI(x) = (f^* - \mu)\Phi\left(\frac{f^*-\mu}{\sigma}\right) + \frac{\sigma}{\sqrt{2\pi}} e^{-\epsilon^2/2}\Big|_{-\infty}^{(f^*-\mu)/\sigma}$

$EI(x) = (f^* - \mu)\Phi\left(\frac{f^*-\mu}{\sigma}\right) + \sigma \left(\phi\left(\frac{f^*-\mu}{\sigma}\right) - 0\right)$

$EI(x) = (f^* - \mu)\Phi\left(\frac{f^*-\mu}{\sigma}\right) + \sigma \phi\left(\frac{f^*-\mu}{\sigma}\right)$

where $\phi$ and $\Phi$ are the PDF and CDF of the standard normal distribution, respectively.
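The final expression translates directly into code. The sketch below uses only the Python standard library; the names `norm_pdf`, `norm_cdf`, `expected_improvement` and `f_best` (standing for $f^*$) are illustrative assumptions. The handling of $\sigma = 0$ is also an assumption added here, since the derivation above divides by $\sigma$: in that degenerate case $Y = \mu$ with certainty, so the improvement is simply $\max(f^* - \mu, 0)$.

```python
import math


def norm_pdf(z):
    """PDF of the standard normal distribution, phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    """CDF of the standard normal distribution, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mu, sigma, f_best):
    """Closed-form EI for minimization: (f* - mu) Phi(z) + sigma phi(z)."""
    if sigma <= 0.0:
        # Assumption: no uncertainty, so the improvement is deterministic.
        return max(f_best - mu, 0.0)
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm_cdf(z) + sigma * norm_pdf(z)


if __name__ == "__main__":
    print(expected_improvement(mu=0.0, sigma=1.0, f_best=0.0))
    print(expected_improvement(mu=1.0, sigma=0.5, f_best=0.8))
```

As a quick check of the formula, when $\mu = f^*$ the first term vanishes and EI reduces to $\sigma\phi(0) = \sigma/\sqrt{2\pi} \approx 0.3989\,\sigma$, so the criterion still rewards uncertain points whose predicted mean merely ties the current best.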