
Basic Concepts

Problem and Conditions

The problem of interest can be stated as that of finding a local solution x* to the task:

minx f ( x ) , where f is a scalar function: Rn → R.

The function f is called the objective (or cost) function and the solution x* is the minimizer. A maximizer of the function can be found as a minimizer of the function with opposite sign.

Iterative methods produce a sequence of vectors x0, x1, x2 ... which, under certain conditions, converges to a local minimizer x*:

xkx* for k → ∞.

All the methods incorporate measures that enforce the descending property: f (xk+1) < f (xk).

Two conditions play a fundamental part in unconstrained optimization:

  • The necessary condition for a local minimum is that x* is a stationary point, that is, g ( x* ) = 0,

    where g ( x ) = [ ∂f(x) / ∂xi ] is the gradient vector.

  • The sufficient condition for x* to solve the problem is that the necessary condition holds and the Hessian matrix is positive definite at x*,

    where H ( x ) = [ ∂2f(x) / ∂xi ∂xj ] is the Hessian matrix.

Some optimization algorithms use only the necessary condition and require the objective function to be differentiable. Others use both the conditions and require the objective function to be twice-differentiable.
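For a concrete illustration, both conditions can be checked numerically on a simple quadratic; the function and the NumPy-based check below are illustrative examples, not part of the library:

```python
import numpy as np

# Illustrative function f(x) = x1^2 + 2*x2^2 with minimizer x* = (0, 0).
def f(x):
    return x[0]**2 + 2.0 * x[1]**2

def gradient(x):
    return np.array([2.0 * x[0], 4.0 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 4.0]])

x_star = np.array([0.0, 0.0])

# Necessary condition: x* is a stationary point, g(x*) = 0.
assert np.allclose(gradient(x_star), 0.0)

# Sufficient condition: H(x*) is positive definite
# (all eigenvalues are strictly positive).
assert np.all(np.linalg.eigvalsh(hessian(x_star)) > 0)
```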

General characteristics of numerical optimization

Convergence

Convergence tells us how quickly we can obtain a result that agrees with the minimizer to a desired accuracy. Formally, convergence is measured via the algorithm errors:

ek = | xk - x* |, where k is the iteration number and xk is assumed to be close enough to the minimizer (i.e. k is close to the final number of iterations).

Most of the algorithms expose the following kinds of convergence:

  • Linear convergence, the most frequently occurring:

    ek+1 ≤ a ek with 0 < a < 1;

  • Quadratic convergence, achieved by only a few of the practical algorithms:

    ek+1 ≤ b ek² with b > 0;

  • Many of the methods used in practice have superlinear convergence:

    ek+1 / ek → 0 for k → ∞.

In practice, the errors are approximated by the differences between xk on successive iterations.
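As a sketch of how the linear convergence factor can be observed in practice, consider fixed-step gradient descent on the one-dimensional function f(x) = x²; the step length and starting point below are arbitrary choices for the example:

```python
# Illustrative sketch: estimate the linear convergence factor 'a' for
# fixed-step gradient descent on f(x) = x^2 (minimizer x* = 0).
# The iteration x_{k+1} = x_k - alpha * f'(x_k) = (1 - 2*alpha) * x_k
# contracts the error by a = |1 - 2*alpha| on every step.
alpha = 0.4                               # step length (assumed value)
x = 1.0                                   # starting point x_0
errors = []
for k in range(10):
    errors.append(abs(x - 0.0))           # e_k = | x_k - x* |
    x = x - alpha * 2.0 * x               # gradient step

ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
# Every ratio is close to a = |1 - 2*0.4| = 0.2: linear convergence.
assert all(abs(r - 0.2) < 1e-9 for r in ratios)
```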

Derivative Order

Optimization algorithms differ according to how much information about f(x), g(x) and H(x) the user must provide:

  • If the algorithm requires evaluation of all three components on each iteration, then it is called a second-derivative method.

  • Gradient methods need evaluations of the objective function and its gradient. These methods are usually less resource-consuming but provide weaker convergence.

  • If the algorithm does not need evaluation of derivatives at all, then it is a no-derivative (derivative-free) method.

    Such methods have considerably worse convergence and robustness and are rarely used for optimization.

Basic Approaches

Any optimization algorithm solves two main subproblems on each iteration k, either explicitly or implicitly:

  1. It computes a descent direction dk to decrease the objective function;

  2. It computes a step length αk, usually chosen to be optimal in some sense.

These two values give the increment of argument: sk = αkdk so that xk+1 = xk + sk.

Formulation of these subproblems usually involves approximation of the objective function by a Taylor expansion, either linear or quadratic:

  • f(x + s) ≈ q(s) = f(x) + sTg(x)

  • f(x + s) ≈ q(s) = f(x) + sTg(x) + ½ sTH(x) s

where g ( x ) is the gradient vector and H ( x ) is the Hessian matrix.
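The difference between the two models can be illustrated numerically; the function exp(x1) + x2² and the step below are arbitrary choices for the example:

```python
import numpy as np

# Illustrative comparison of the linear and quadratic models of
# f(x) = exp(x1) + x2^2 around x = (0, 0).
def f(x):
    return np.exp(x[0]) + x[1]**2

x0 = np.array([0.0, 0.0])
g = np.array([np.exp(x0[0]), 2.0 * x0[1]])            # gradient g(x0)
H = np.array([[np.exp(x0[0]), 0.0],
              [0.0,           2.0]])                  # Hessian H(x0)

s = np.array([0.1, 0.1])                              # a small step
linear_model = f(x0) + s @ g
quadratic_model = f(x0) + s @ g + 0.5 * s @ H @ s

# The quadratic model tracks f(x0 + s) more closely than the linear one.
true_value = f(x0 + s)
assert abs(quadratic_model - true_value) < abs(linear_model - true_value)
```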

Globally convergent methods are usually designed on the basis of some prototype strategy which specifies an overall approach within which some variations are possible. The two main strategies are Line Search and Trust Region (sometimes called Restricted Step).

Note

The name Line Search is used both in a wide sense and in a narrow sense:

  • In the wide sense it denotes an optimization strategy with explicitly separate computation of the descent direction and the step length;

  • In the narrow sense it is just a method of computing the step length, and this meaning of the term prevails.

To distinguish these two meanings, here and further the wide sense will be indicated with the word 'strategy' or 'family'.

The following sections provide a general overview of the two basic optimization strategies, Line Search algorithms and Trust Region algorithms, comparing their rationale, basic algorithms and implementations.

Rationale

With the Line Search strategy, the algorithm chooses a descent direction dk and searches along this direction from the current point xk for a lower (or the lowest) function value, attained at some distance αk (the line search as such).

The Trust Region algorithm first defines a maximum distance, the trust-region radius Δk, within which the objective function can be 'well' approximated by the quadratic model q(s), and then seeks the direction and step that attain the best improvement of the model within that radius (i.e. it computes the direction and the step length 'at once').

Basic Algorithm

Line Search: do until stopping criteria are met:

  1. Determine a direction of search dk;

  2. Find a step length αk either by solving the minimization problem (exact line search):

    minα>0  f ( xk + α dk),

    or by an appropriate approximation of the minimum (soft or inexact line search).

  3. xk+1 = xk + αkdk.
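The loop above can be sketched as follows, with steepest descent for the direction and backtracking under the Armijo sufficient-decrease condition for the step length; the test function and all constants are illustrative:

```python
import numpy as np

# Minimal sketch of a line-search method: steepest descent direction
# plus a soft (backtracking) line search. All values are illustrative.
def f(x):
    return x[0]**2 + 5.0 * x[1]**2

def grad(x):
    return np.array([2.0 * x[0], 10.0 * x[1]])

x = np.array([2.0, 1.0])
for k in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:          # stopping criterion
        break
    d = -g                                # 1. direction of search d_k
    alpha, c = 1.0, 1e-4
    # 2. soft line search: halve alpha until sufficient decrease holds
    while f(x + alpha * d) > f(x) + c * alpha * (g @ d):
        alpha *= 0.5
    x = x + alpha * d                     # 3. x_{k+1} = x_k + alpha_k d_k

assert np.linalg.norm(x) < 1e-6           # converged to x* = (0, 0)
```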

Trust Region: do until stopping criteria are met:

  1. Solve the model subproblem to find a candidate step s:

    mins qk ( s ), where the model qk is a Taylor approximation and xk + s lies inside the trust region of radius Δk.

  2. Evaluate the reduction of the objective function attained with s and either accept or reject the candidate:

    • if the decrease is unsatisfactory, repeat step 1 with a reduced trust region Δk;

    • otherwise accept the candidate step and increase the trust region or keep it unchanged.

  3. xk+1 = xk + s.
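The loop above can be sketched as follows, using the Cauchy point (the minimizer of the quadratic model along the steepest-descent direction within the radius) as the candidate step; the test function, acceptance thresholds and radius-update rules are illustrative choices:

```python
import numpy as np

# Minimal sketch of a trust-region method with a Cauchy-point step.
# The function, thresholds and update rules are illustrative.
def f(x):
    return x[0]**2 + 5.0 * x[1]**2

def grad(x):
    return np.array([2.0 * x[0], 10.0 * x[1]])

H = np.array([[2.0, 0.0], [0.0, 10.0]])   # constant Hessian of f

x = np.array([2.0, 1.0])
delta = 1.0                               # trust-region radius
for k in range(100):
    g = grad(x)
    if np.linalg.norm(g) < 1e-8:          # stopping criterion
        break
    # 1. Candidate step: minimize q(s) = f + g.s + 0.5*s.H.s along
    #    s = -t*g subject to ||s|| <= delta (Cauchy point).
    t = min((g @ g) / (g @ H @ g), delta / np.linalg.norm(g))
    s = -t * g
    # 2. Actual vs predicted reduction decides acceptance and resizing.
    predicted = -(g @ s + 0.5 * s @ H @ s)
    rho = (f(x) - f(x + s)) / predicted
    if rho < 0.25:
        delta *= 0.5                      # poor model: shrink region
    elif rho > 0.75:
        delta = min(2.0 * delta, 10.0)    # good model: grow region
    if rho > 0:
        x = x + s                         # 3. accept: x_{k+1} = x_k + s

assert np.linalg.norm(x) < 1e-6           # converged to x* = (0, 0)
```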

Implementations

Different algorithms correspond to different ways of choosing the direction dk and different techniques applied to compute the step length.

The best-known search-direction methods in practice are modifications of the Conjugate Gradient method and of Newton-like (Newton-type and Quasi-Newton) methods.

Line search methods are classified as either exact or soft; the latter are more resource-effective and use special conditions (Wolfe, Goldstein, etc.) to ensure approximation accuracy.
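As an illustration of such conditions, a soft line search can test a candidate step length against the weak Wolfe conditions; the helper below and its constants c1, c2 are a conventional but hypothetical sketch, not part of the library:

```python
import numpy as np

# Hypothetical helper: check the weak Wolfe conditions for a candidate
# step length alpha along a descent direction d; c1 and c2 are the
# conventional sufficient-decrease and curvature constants.
def wolfe_conditions(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
    g0_d = grad(x) @ d
    sufficient_decrease = f(x + alpha * d) <= f(x) + c1 * alpha * g0_d
    curvature = grad(x + alpha * d) @ d >= c2 * g0_d
    return sufficient_decrease and curvature

# Example on f(x) = x^2 from x = 2 along the steepest descent direction:
f = lambda x: float(x[0]**2)
grad = lambda x: np.array([2.0 * x[0]])
x, d = np.array([2.0]), np.array([-4.0])

assert wolfe_conditions(f, grad, x, d, alpha=0.25)      # acceptable step
assert not wolfe_conditions(f, grad, x, d, alpha=0.01)  # too short: fails curvature
```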

Trust Region methods differ mainly in:

  • Trust region shape: usually it is a ball defined by a radius; elliptical and box-shaped trust regions are also in use;

  • Criterion to accept the regular candidate step;

  • Rules to regulate trust region size.

Note

Line Search methods as such (i.e. outside the strategy context) have no practical value for real-world problems and are not presented in this Guide, although they are implemented and used within the FinMath library.