Basic Concepts
The problem of interest can be stated as finding a local solution x* to the task:
minₓ f(x), where f is a scalar function: Rⁿ → R.
The function f is called the objective (or cost) function, and the solution x* is the minimizer. A maximizer of a function can be found as a minimizer of the same function taken with the opposite sign.
Iterative methods produce a sequence of vectors x0, x1, x2, ..., which under certain conditions converges to a local minimizer x*:
xk → x* as k → ∞.
All of these methods incorporate measures that enforce the descent property: f(xk+1) < f(xk).
Two conditions play a fundamental role in unconstrained optimization:
The necessary condition for a local minimum is that x* is a stationary point, that is, g(x*) = 0, where g(x) = [∂f(x)/∂xi] is the gradient vector.
The sufficient condition for x* to solve the problem is that the necessary condition holds and the Hessian matrix H(x*) is positive definite, where H(x) = [∂²f(x)/∂xi∂xj].
Some optimization algorithms use only the necessary condition and require the objective function to be differentiable. Others use both conditions and require the objective function to be twice differentiable.
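Where the objective and its derivatives are available, both conditions can be checked numerically. The sketch below is illustrative only (the sample function, the candidate point, and the tolerance are assumptions, not part of any library API):

```python
# A minimal sketch (not a library API): numerically verifying the necessary
# and sufficient conditions at a candidate point for a sample function.
import numpy as np

def f(x):
    # Sample objective: f(x) = (x0 - 1)^2 + 10*(x1 + 2)^2
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def g(x):
    # Gradient vector g(x) = [df/dx_i]
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

def H(x):
    # Hessian matrix H(x) = [d2f/dx_i dx_j] (constant for a quadratic)
    return np.array([[2.0, 0.0], [0.0, 20.0]])

x_star = np.array([1.0, -2.0])

# Necessary condition: x* is a stationary point, g(x*) = 0.
assert np.linalg.norm(g(x_star)) < 1e-12

# Sufficient condition: H(x*) is positive definite (all eigenvalues > 0).
assert np.all(np.linalg.eigvalsh(H(x_star)) > 0.0)
print("x* =", x_star, "satisfies both optimality conditions")
```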
The rate of convergence tells us how quickly we can obtain a result that agrees with the minimizer to a desired accuracy. Formally, convergence is measured via the algorithm errors:
ek = |xk - x*|, where k is the iteration number and xk is assumed to be close enough to the minimizer (i.e., k is close to the final number of iterations).
Most algorithms exhibit one of the following kinds of convergence (illustrated numerically after the list):
Linear convergence, the most common case:
ek+1 ≤ a·ek with 0 < a < 1;
Quadratic convergence, achieved by only a few practical algorithms:
ek+1 ≤ b·ek² with b > 0;
Superlinear convergence, exhibited by many of the methods used in practice:
ek+1 / ek → 0 as k → ∞.
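The difference between these rates is easy to observe numerically. The sketch below (an illustration under assumed choices of test function, step size, and starting point, not any library's algorithm) contrasts the linear rate of fixed-step gradient descent with the quadratic rate of Newton's method on a one-dimensional problem:

```python
# A minimal sketch (not a library algorithm): contrasting linear and
# quadratic convergence on f(x) = x^2 + exp(x) in one dimension.
import numpy as np

fp  = lambda x: 2.0 * x + np.exp(x)   # f'(x)
fpp = lambda x: 2.0 + np.exp(x)       # f''(x)

# Reference minimizer x*, solving f'(x) = 0 to machine precision.
x_star = 0.0
for _ in range(50):
    x_star -= fp(x_star) / fpp(x_star)

def errors(update, x0, n):
    """Run n iterations of 'update' and record ek = |xk - x*|."""
    x, errs = x0, []
    for _ in range(n):
        x = update(x)
        errs.append(abs(x - x_star))
    return errs

# Fixed-step gradient descent: linear convergence, ek+1 <= a*ek.
lin = errors(lambda x: x - 0.2 * fp(x), 1.0, 8)
# Newton's method: quadratic convergence, ek+1 <= b*ek^2.
quad = errors(lambda x: x - fp(x) / fpp(x), 1.0, 4)

print("e(k+1)/e(k)   :", [lin[k + 1] / lin[k] for k in range(6)])        # -> a ~ 0.46
print("e(k+1)/e(k)^2 :", [quad[k + 1] / quad[k] ** 2 for k in range(3)]) # -> bounded b
```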
Optimization algorithms differ according to how much information about f(x), g(x) and H(x) the user must provide:
If an algorithm requires evaluation of all three quantities on each iteration, it is called a Second Derivative method.
Gradient methods need evaluations of only the objective function and its gradient. These methods are usually less resource-consuming but provide weaker convergence.
If an algorithm does not need to evaluate derivatives at all, it is a no-derivative method.
Such methods show considerably worse convergence and robustness and are rarely used for optimization.
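The sketch below illustrates these three levels of information for the classical Rosenbrock test function; the function and its derivatives are standard, but the code itself is only an illustration, not a library interface:

```python
# A minimal sketch (not a library interface): the three levels of
# information an optimizer may require, for the Rosenbrock function.
import numpy as np

def f(x):
    # Objective value only: all a no-derivative method may use.
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def g(x):
    # Gradient vector: what a gradient method additionally requires.
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def H(x):
    # Hessian matrix: what a Second Derivative method additionally requires.
    return np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])

x = np.array([1.0, 1.0])  # known minimizer of the Rosenbrock function
print(f(x))                      # -> 0.0
print(g(x))                      # -> [0. 0.]  (stationary point)
print(np.linalg.eigvalsh(H(x)))  # -> all positive (positive definite)
```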
Any optimization algorithm solves, either explicitly or implicitly, two main subproblems on each iteration k:
It computes a descent direction dk to decrease the objective function;
It computes a step length αk, usually one that is optimal in some sense.
Formulation of these subproblems usually involves approximating the objective function by a Taylor expansion, either linear or quadratic:
f(x + s) ≈ q(s) = f(x) + sᵀg(x)
f(x + s) ≈ q(s) = f(x) + sᵀg(x) + ½ sᵀH(x)s
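The sketch below (with an assumed sample objective and step direction) shows how the two models behave as the step shrinks: the linear model's error decreases as O(‖s‖²), the quadratic model's as O(‖s‖³):

```python
# A minimal sketch (assumed sample objective and direction): accuracy of
# the linear and quadratic models q(s) as the step s shrinks.
import numpy as np

f = lambda x: np.exp(x[0]) + x[1] ** 2                      # objective
g = lambda x: np.array([np.exp(x[0]), 2.0 * x[1]])          # gradient
H = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])   # Hessian

x = np.array([0.5, -1.0])
d = np.array([1.0, 1.0]) / np.sqrt(2.0)   # fixed unit direction

for t in (1.0, 0.1, 0.01):
    s = t * d
    q_lin = f(x) + s @ g(x)              # linear model
    q_quad = q_lin + 0.5 * s @ H(x) @ s  # quadratic model
    # Linear model error shrinks like O(t^2), quadratic like O(t^3).
    print(t, abs(f(x + s) - q_lin), abs(f(x + s) - q_quad))
```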
Globally convergent methods are usually designed on the basis of some prototype strategy that specifies an overall approach within which variations are possible. The two main strategies are Line Search and Trust Region (sometimes called Restricted Step).
Note
The name Line Search is used both in a wide sense and in a narrow sense: in the wide sense it denotes the whole strategy of choosing a direction and searching along it; in the narrow sense it denotes the one-dimensional search for the step length along a fixed direction (the line search as such).
The following table provides a general overview of the two basic optimization strategies:
| | Line Search algorithms | Trust Region algorithms |
|---|---|---|
| Rationale | With the Line Search strategy, the algorithm chooses a descent direction dk and searches along this direction from the current point xk for a lower (or the lowest) function value, attained at some distance αk (the line search as such). | A Trust Region algorithm first defines a maximum distance, the trust-region radius Δk, within which the objective function can be 'well' approximated by the quadratic model q(s), and then seeks the direction and step that attain the best improvement of the model within this radius (i.e., it computes the direction and the step length 'at once'). |
| Basic Algorithm | Do until stopping criteria are met: compute a descent direction dk; compute a step length αk along dk; set xk+1 = xk + αk dk. | Do until stopping criteria are met: build the quadratic model q(s) around xk; find the step sk that minimizes the model within the radius Δk; compare the actual reduction of f with the reduction predicted by the model, then accept or reject the step and enlarge or shrink Δk accordingly. |
| Implementations | Different algorithms correspond to different ways of choosing the direction dk and different techniques for computing the step length. The best-known search direction methods in practice are modifications of the Conjugate Gradient method and Newton-like (Newton-type and quasi-Newton) methods. Line search techniques are classified as either exact or soft; the latter are more resource-effective and use special conditions (Wolfe, Goldstein, etc.) to ensure approximation accuracy. | Trust Region methods differ mainly in the way they minimize the model within the radius (exactly or approximately) and in the rules used to update the trust-region radius Δk. |
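As an illustration of the Line Search strategy in the table above, the following sketch combines the steepest-descent direction with a simple backtracking ('soft') step-length rule based on the Armijo sufficient-decrease condition; all names and parameter values are assumptions for the example, not the FinMath API:

```python
# A minimal sketch (not the FinMath API) of the Line Search strategy:
# steepest-descent direction plus backtracking to an Armijo step length.
import numpy as np

def line_search_minimize(f, g, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad = g(x)
        if np.linalg.norm(grad) < tol:  # stopping criterion: g(x) ~ 0
            break
        d = -grad                       # 1. choose a descent direction dk
        alpha, c = 1.0, 1e-4
        # 2. backtrack until f decreases sufficiently along dk (Armijo)
        while f(x + alpha * d) > f(x) + c * alpha * (grad @ d):
            alpha *= 0.5
        x = x + alpha * d               # 3. take the step xk+1 = xk + ak*dk
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 5.0 * (x[1] + 2.0) ** 2
g = lambda x: np.array([2.0 * (x[0] - 1.0), 10.0 * (x[1] + 2.0)])
print(line_search_minimize(f, g, [0.0, 0.0]))  # -> approx. [1, -2]
```

A Trust Region implementation would instead minimize the quadratic model q(s) subject to ‖s‖ ≤ Δk and adjust Δk from the ratio of the actual to the predicted reduction.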
Note
Line Search methods as such (i.e., outside the strategy context) have no practical value for real-world problems and are not presented in this Guide, although they are implemented and used within the FinMath library.