This topic contains the following sections:
The problem of interest can be stated as that of finding a local solution x^{*} to the task:
minimum _{ x} f ( x ) , where f is a scalar function: R^{n} → R.
The function f is called the objective (or cost) function and the solution x^{*} is the minimizer. A maximizer of the function can be found as a minimizer of the function with opposite sign.
Iterative methods produce a series of vectors x_{0}, x_{1}, x_{2} ... , which in most cases converges under certain conditions towards a local minimizer x^{*}:
x_{k} → x^{*} for k → ∞.
In all the methods there are measures which enforce the descending property: f (x_{k+1}) < f (x_{k}).
The two conditions play a fundamental part for unconstrained optimization:
The necessary condition for a local minimum is that x^{*} is a stationary point, that is, g ( x^{*} ) = 0,
where g ( x ) = [ ∂f(x) / ∂x_{i} ] is the gradient vector.
The sufficient condition for x to solve the problem is both the necessary condition and positivedefinite Hessian matrix
H ( x^{*}), where H ( x ) = [∂^{2}f(x) / ∂x_{i }∂x_{j } ].
Some optimization algorithms use only the necessary condition and require the objective function to be differentiable. Others use both the conditions and require the objective function to be twicedifferentiable.
Convergence tells us how quickly we can get a result which agrees with the minimizer to a desired accuracy. Formally convergence is measured via algorithm errors:
e_{k} =  x_{k}  x^{*} , where k is the iteration number and x_{k} is assumed close enough to the minimizer (i.e. k is close to the final number of iterations).
Most of the algorithms expose the following kinds of convergence:
Linear convergence most often occurring:
e_{k+1} ≤ a e_{k} with 0 < a < 1;
Quadratic convergence achieved by only a few of the practical algorithms:
e_{k+1} ≤ b e_{k}^{2} with b > 0;
Many of the methods used in practice have superlinear convergence:
e_{k+1} / e_{k} → 0 for k → ∞.
Optimization algorithms differ according to how much information about f(x), g(x) and H(x) the user must provide:
If the algorithm requires evaluation of all the components on each iteration then it is called the Second Derivative method.
Gradient methods need estimations of the objective function and its gradient. These methods are usually less resource consuming but provide weaker convergence.
If the algorithm does not need evaluating of derivatives then it is noderivative method
Such methods have proven considerably worse convergence and robustness and are almost not in use for optimization.
Any optimization algorithm solves the two main subproplems on each iteration k, either explicitly or implicitly:
It computes a descent direction d_{k} to decrease the objective function;
It computes a step length α_{k}, usually as the optimal value in some sense.
Formulation of these subproblems usually involves approximation of the objective function by Taylor expansion, either linear of quadratic:
f(x + s) ≈ q(s) = f(x) + s^{T}g(x)
f(x + s) ≈ q(s) = f(x) + s^{T}g(x) + ½ s^{T}H(x) s
Globally convergent methods are usually designed on the basis of some prototype strategy which specifies an overall approach within which some variances might be possible. The two main strategies are Line Search and Trust Region (sometimes called Resticted Step).
Note 

The name Line Search is used in the wide sense and in the narrow sense:

The following table provides a general overview of the two basic optimization strategies:
Line Search algorithms  Trust Region algorithms  

Rationale  With the Line Search strategy, the algorithm chooses a descent direction d_{k} and searches along this direction from the current point x_{k} for an lower or lowest function value gained at some distance α_{k} (line search as such).  Trust Region algorithm first defines a maximum distance, the trustregion radius Δ_{k}, where the objective function can be 'well' approximated by the quadratic model q(s), and then seeks a direction and step that attain the best improvement of the model within the radius (i.e. it computes the direction and the step length 'at once'). 
Basic Algorithm  Do until stopping criteria:
 Do until stopping criteria:

Implementations  Different algorithms correspond to different ways of choosing direction d_{k} and techniques applied to compute step length. Most known in practice search direction methods are modifications of Conjugate Gradient method and Newtonlike (Newtontype and QuasiNewton) methods. Line search methods are systematized either as exact or soft, the latter are more resource effective and use special conditions to ensure approximation accuracy (Wolfe, Goldstein, etc.).  Trust Region methods differ mainly in:

Note 

Line Search methods as such (i.e. out of strategy context) do not have practical value for realworld problems and are not presented in this Guide being implemented and used in the FinMath library. 