Basic Concepts
The problem of interest can be stated as finding a local solution x* to the task:
minₓ f(x), where f is a scalar function: Rⁿ → R.
The function f is called the objective (or cost) function, and the solution x* is the minimizer. A maximizer of a function can be found as a minimizer of the same function taken with the opposite sign.
Iterative methods produce a sequence of vectors x0, x1, x2, ..., which under certain conditions converges to a local minimizer x*:
xk → x* as k → ∞.
All of these methods incorporate measures that enforce the descent property: f(xk+1) < f(xk).
Two conditions play a fundamental role in unconstrained optimization:
The necessary condition for a local minimum is that x* is a stationary point, that is, g(x*) = 0, where g(x) = [∂f(x)/∂xi] is the gradient vector.
The sufficient condition for x* to solve the problem is that the necessary condition holds and the Hessian matrix H(x*) is positive definite, where H(x) = [∂²f(x)/∂xi∂xj].
Some optimization algorithms use only the necessary condition and require the objective function to be differentiable. Others use both conditions and require the objective function to be twice differentiable.
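Where the objective and its derivatives are available, both conditions can be checked numerically. The sketch below is illustrative only (the sample function, the candidate point, and the tolerance are assumptions, not part of any library API):

```python
# A minimal sketch (not a library API): numerically verifying the necessary
# and sufficient conditions at a candidate point for a sample function.
import numpy as np

def f(x):
    # Sample objective: f(x) = (x0 - 1)^2 + 10*(x1 + 2)^2
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2

def g(x):
    # Gradient vector g(x) = [df/dx_i]
    return np.array([2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)])

def H(x):
    # Hessian matrix H(x) = [d2f/dx_i dx_j] (constant for a quadratic)
    return np.array([[2.0, 0.0], [0.0, 20.0]])

x_star = np.array([1.0, -2.0])

# Necessary condition: x* is a stationary point, g(x*) = 0.
assert np.linalg.norm(g(x_star)) < 1e-12

# Sufficient condition: H(x*) is positive definite (all eigenvalues > 0).
assert np.all(np.linalg.eigvalsh(H(x_star)) > 0.0)
print("x* =", x_star, "satisfies both optimality conditions")
```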
The rate of convergence tells us how quickly we can obtain a result that agrees with the minimizer to a desired accuracy. Formally, convergence is measured via the algorithm errors:
ek = |xk - x*|, where k is the iteration number and xk is assumed to be close enough to the minimizer (i.e., k is close to the final number of iterations).
Most algorithms exhibit one of the following kinds of convergence (illustrated numerically after the list):
Linear convergence, the most common case:
ek+1 ≤ a·ek with 0 < a < 1;
Quadratic convergence, achieved by only a few practical algorithms:
ek+1 ≤ b·ek² with b > 0;
Superlinear convergence, exhibited by many of the methods used in practice:
ek+1 / ek → 0 as k → ∞.
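The difference between these rates is easy to observe numerically. The sketch below (an illustration under assumed choices of test function, step size, and starting point, not any library's algorithm) contrasts the linear rate of fixed-step gradient descent with the quadratic rate of Newton's method on a one-dimensional problem:

```python
# A minimal sketch (not a library algorithm): contrasting linear and
# quadratic convergence on f(x) = x^2 + exp(x) in one dimension.
import numpy as np

fp  = lambda x: 2.0 * x + np.exp(x)   # f'(x)
fpp = lambda x: 2.0 + np.exp(x)       # f''(x)

# Reference minimizer x*, solving f'(x) = 0 to machine precision.
x_star = 0.0
for _ in range(50):
    x_star -= fp(x_star) / fpp(x_star)

def errors(update, x0, n):
    """Run n iterations of 'update' and record ek = |xk - x*|."""
    x, errs = x0, []
    for _ in range(n):
        x = update(x)
        errs.append(abs(x - x_star))
    return errs

# Fixed-step gradient descent: linear convergence, ek+1 <= a*ek.
lin = errors(lambda x: x - 0.2 * fp(x), 1.0, 8)
# Newton's method: quadratic convergence, ek+1 <= b*ek^2.
quad = errors(lambda x: x - fp(x) / fpp(x), 1.0, 4)

print("e(k+1)/e(k)   :", [lin[k + 1] / lin[k] for k in range(6)])        # -> a ~ 0.46
print("e(k+1)/e(k)^2 :", [quad[k + 1] / quad[k] ** 2 for k in range(3)]) # -> bounded b
```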
Optimization algorithms differ according to how much information about f(x), g(x) and H(x) the user must provide:
If an algorithm requires evaluation of all three quantities on each iteration, it is called a Second Derivative method.
Gradient methods need evaluations of only the objective function and its gradient. These methods are usually less resource-consuming but provide weaker convergence.
If an algorithm does not need to evaluate derivatives at all, it is a no-derivative method.
Such methods show considerably worse convergence and robustness and are rarely used for optimization.
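The sketch below illustrates these three levels of information for the classical Rosenbrock test function; the function and its derivatives are standard, but the code itself is only an illustration, not a library interface:

```python
# A minimal sketch (not a library interface): the three levels of
# information an optimizer may require, for the Rosenbrock function.
import numpy as np

def f(x):
    # Objective value only: all a no-derivative method may use.
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

def g(x):
    # Gradient vector: what a gradient method additionally requires.
    return np.array([
        -400.0 * x[0] * (x[1] - x[0] ** 2) - 2.0 * (1.0 - x[0]),
        200.0 * (x[1] - x[0] ** 2),
    ])

def H(x):
    # Hessian matrix: what a Second Derivative method additionally requires.
    return np.array([
        [1200.0 * x[0] ** 2 - 400.0 * x[1] + 2.0, -400.0 * x[0]],
        [-400.0 * x[0], 200.0],
    ])

x = np.array([1.0, 1.0])  # known minimizer of the Rosenbrock function
print(f(x))                      # -> 0.0
print(g(x))                      # -> [0. 0.]  (stationary point)
print(np.linalg.eigvalsh(H(x)))  # -> all positive (positive definite)
```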
Any optimization algorithm solves, either explicitly or implicitly, two main subproblems on each iteration k:
It computes a descent direction dk to decrease the objective function;
It computes a step length αk, usually one that is optimal in some sense.
Formulation of these subproblems usually involves approximating the objective function by a Taylor expansion, either linear or quadratic:
f(x + s) ≈ q(s) = f(x) + sᵀg(x)
f(x + s) ≈ q(s) = f(x) + sᵀg(x) + ½ sᵀH(x)s
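The sketch below (with an assumed sample objective and step direction) shows how the two models behave as the step shrinks: the linear model's error decreases as O(‖s‖²), the quadratic model's as O(‖s‖³):

```python
# A minimal sketch (assumed sample objective and direction): accuracy of
# the linear and quadratic models q(s) as the step s shrinks.
import numpy as np

f = lambda x: np.exp(x[0]) + x[1] ** 2                      # objective
g = lambda x: np.array([np.exp(x[0]), 2.0 * x[1]])          # gradient
H = lambda x: np.array([[np.exp(x[0]), 0.0], [0.0, 2.0]])   # Hessian

x = np.array([0.5, -1.0])
d = np.array([1.0, 1.0]) / np.sqrt(2.0)   # fixed unit direction

for t in (1.0, 0.1, 0.01):
    s = t * d
    q_lin = f(x) + s @ g(x)              # linear model
    q_quad = q_lin + 0.5 * s @ H(x) @ s  # quadratic model
    # Linear model error shrinks like O(t^2), quadratic like O(t^3).
    print(t, abs(f(x + s) - q_lin), abs(f(x + s) - q_quad))
```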
Globally convergent methods are usually designed on the basis of some prototype strategy that specifies an overall approach within which variations are possible. The two main strategies are Line Search and Trust Region (sometimes called Restricted Step).
Note
The name Line Search is used both in a wide sense and in a narrow sense: in the wide sense it denotes the whole strategy of choosing a direction and searching along it; in the narrow sense it denotes the one-dimensional search for the step length along a fixed direction (the line search as such).
The following table provides a general overview of the two basic optimization strategies:
| | Line Search algorithms | Trust Region algorithms |
|---|---|---|
| Rationale | With the Line Search strategy, the algorithm chooses a descent direction dk and searches along this direction from the current point xk for a lower (or the lowest) function value, attained at some distance αk (the line search as such). | A Trust Region algorithm first defines a maximum distance, the trust-region radius Δk, within which the objective function can be 'well' approximated by the quadratic model q(s), and then seeks the direction and step that attain the best improvement of the model within this radius (i.e., it computes the direction and the step length 'at once'). |
| Basic Algorithm | Do until stopping criteria are met: compute a descent direction dk; compute a step length αk along dk; set xk+1 = xk + αk dk. | Do until stopping criteria are met: build the quadratic model q(s) around xk; find the step sk that minimizes the model within the radius Δk; compare the actual reduction of f with the reduction predicted by the model, then accept or reject the step and enlarge or shrink Δk accordingly. |
| Implementations | Different algorithms correspond to different ways of choosing the direction dk and different techniques for computing the step length. The best-known search direction methods in practice are modifications of the Conjugate Gradient method and Newton-like (Newton-type and quasi-Newton) methods. Line search techniques are classified as either exact or soft; the latter are more resource-effective and use special conditions (Wolfe, Goldstein, etc.) to ensure approximation accuracy. | Trust Region methods differ mainly in the way they minimize the model within the radius (exactly or approximately) and in the rules used to update the trust-region radius Δk. |
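As an illustration of the Line Search strategy in the table above, the following sketch combines the steepest-descent direction with a simple backtracking ('soft') step-length rule based on the Armijo sufficient-decrease condition; all names and parameter values are assumptions for the example, not the FinMath API:

```python
# A minimal sketch (not the FinMath API) of the Line Search strategy:
# steepest-descent direction plus backtracking to an Armijo step length.
import numpy as np

def line_search_minimize(f, g, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        grad = g(x)
        if np.linalg.norm(grad) < tol:  # stopping criterion: g(x) ~ 0
            break
        d = -grad                       # 1. choose a descent direction dk
        alpha, c = 1.0, 1e-4
        # 2. backtrack until f decreases sufficiently along dk (Armijo)
        while f(x + alpha * d) > f(x) + c * alpha * (grad @ d):
            alpha *= 0.5
        x = x + alpha * d               # 3. take the step xk+1 = xk + ak*dk
    return x

f = lambda x: (x[0] - 1.0) ** 2 + 5.0 * (x[1] + 2.0) ** 2
g = lambda x: np.array([2.0 * (x[0] - 1.0), 10.0 * (x[1] + 2.0)])
print(line_search_minimize(f, g, [0.0, 0.0]))  # -> approx. [1, -2]
```

A Trust Region implementation would instead minimize the quadratic model q(s) subject to ‖s‖ ≤ Δk and adjust Δk from the ratio of the actual to the predicted reduction.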
Note
Line Search methods as such (i.e., outside the strategy context) have no practical value for real-world problems and are not presented in this Guide, although they are implemented and used within the FinMath library.