Derivative-free methods

The line search and trust region methods introduced in the previous lesson all required that the user be able to calculate the gradient of the function, $\nabla f$. However, in many cases the gradient is either not available or too error-prone to be of use. For example, the function $f$ might only be available as a compiled executable or as the result of a physical experiment. The model might be stochastic, the gradient evaluation might be noisy due to numerical inaccuracies, or the function might be sufficiently complex that the gradient is either unknown or too expensive to compute.

Here we describe two of the most common approaches to derivative-free optimisation: using a finite difference approximation to the derivative, and the Nelder-Mead algorithm, which is a simplex search method (a brief sketch of both appears below). However, there are a large number of derivative-free methods. These range from classical Direct Search methods like Pattern search, Simplex search, and Rosenbrock's or Powell's methods, to emulator or model-based methods that build up a model of the function $f$ and minimise that model using a gradient-based method; a powerful example of this class of methods is Bayesian Optimisation. Many global optimisation algorithms are also derivative-free, including randomised algorithms such as Simulated Annealing, evolution-based strategies such as the popular Covariance Matrix Adaptation Evolution Strategy (CMA-ES), and swarm algorithms inspired by bees or ants, like Particle Swarm Optimisation.
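
As a minimal sketch of the two approaches described here (assuming NumPy and SciPy are available; the helper function and test objective below are illustrative, not part of the lesson's code), the following compares a hand-rolled forward-difference gradient fed to a gradient-based method with SciPy's built-in Nelder-Mead simplex search, treating the Rosenbrock function as a black box that we can only evaluate:

```python
import numpy as np
from scipy.optimize import minimize

# Example "black box" objective: the Rosenbrock function.
# We only ever evaluate f, never its analytic gradient.
def f(x):
    return (1.0 - x[0])**2 + 100.0 * (x[1] - x[0]**2)**2

def finite_difference_grad(f, x, h=1e-6):
    """Forward-difference approximation to the gradient of f at x."""
    grad = np.empty_like(x, dtype=float)
    fx = f(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += h
        grad[i] = (f(x_step) - fx) / h
    return grad

x0 = np.array([-1.2, 1.0])

# Gradient-based method (BFGS) driven by the finite-difference gradient.
res_fd = minimize(f, x0, method="BFGS",
                  jac=lambda x: finite_difference_grad(f, x))

# Simplex search: Nelder-Mead needs only function evaluations.
res_nm = minimize(f, x0, method="Nelder-Mead")

print(res_fd.x, res_nm.x)  # both should approach the minimiser (1, 1)
```

Note the trade-off this sketch illustrates: the finite difference approach needs an extra function evaluation per dimension for every gradient estimate, while Nelder-Mead avoids gradients entirely at the cost of typically slower convergence on smooth problems.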