Name: aqSeabiscuit
Date: 2020/4/14
Second-order sufficient optimality conditions
Problem:
Suppose that f : R^n → R is twice differentiable at x. Show that x is a strict local
minimizer if ∇f(x) = 0 and the Hessian matrix H(x) is positive definite.
Solution:
First consider the case n = 2. Then the function f : R² → R can be written as
f(x1, x2), with x1, x2 ∈ R. Suppose that f is twice differentiable at x̂ = (x̂1, x̂2). Then
we can expand f near x̂ with the help of a Taylor series up to second order [1].
It can be given as
\[
f(x) = f(\hat{x})
+ \frac{\partial f}{\partial x_1}\bigg|_{\hat{x}} \Delta x_1
+ \frac{\partial f}{\partial x_2}\bigg|_{\hat{x}} \Delta x_2
+ \frac{1}{2}\left(
\frac{\partial^2 f}{\partial x_1^2}\bigg|_{\hat{x}} \Delta x_1^2
+ 2\,\frac{\partial^2 f}{\partial x_1 \partial x_2}\bigg|_{\hat{x}} \Delta x_1 \Delta x_2
+ \frac{\partial^2 f}{\partial x_2^2}\bigg|_{\hat{x}} \Delta x_2^2
\right)
+ o\!\left(\|\Delta x\|^2\right) \tag{1}
\]
where ∆x1 = x1 − x̂1 and ∆x2 = x2 − x̂2. Writing the Taylor expansion above in matrix form, we obtain
\[
f(x) = f(\hat{x})
+ \left( \frac{\partial f}{\partial x_1}\bigg|_{\hat{x}} \;\; \frac{\partial f}{\partial x_2}\bigg|_{\hat{x}} \right)
\begin{pmatrix} \Delta x_1 \\ \Delta x_2 \end{pmatrix}
+ \frac{1}{2} \left( \Delta x_1 \;\; \Delta x_2 \right)
\begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[4pt]
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}_{\hat{x}}
\begin{pmatrix} \Delta x_1 \\ \Delta x_2 \end{pmatrix}
+ o\!\left(\|\Delta x\|^2\right) \tag{2}
\]
Equivalently, its vector form can be given as
\[
f(x) = f(\hat{x}) + \nabla f(\hat{x})\, \Delta x + \frac{1}{2}\, \Delta x^{T} H(\hat{x})\, \Delta x + o\!\left(\|\Delta x\|^2\right) \tag{3}
\]
where
\[
\Delta x = \begin{bmatrix} \Delta x_1 & \Delta x_2 \end{bmatrix}^{T} \tag{4}
\]
\[
\nabla f(\hat{x}) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} \end{bmatrix}_{\hat{x}} \tag{5}
\]
\[
H(\hat{x}) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\[4pt]
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2}
\end{bmatrix}_{\hat{x}} \tag{6}
\]
H(x̂) is the Hessian matrix of f at x̂, i.e., the matrix of its second-order partial derivatives.
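As a quick numerical sanity check of expansion (3) (not part of the original derivation), the sketch below compares f(x) with its second-order Taylor approximation; the quadratic function f, the point x̂, and the displacement ∆x are all hypothetical choices made only for illustration.

```python
import numpy as np

# Hypothetical test function, chosen only to illustrate formula (3):
# f(x1, x2) = x1^2 + x1*x2 + 2*x2^2
def f(x):
    return x[0]**2 + x[0]*x[1] + 2*x[1]**2

def grad_f(x):
    return np.array([2*x[0] + x[1], x[0] + 4*x[1]])

def hess_f(x):
    return np.array([[2.0, 1.0], [1.0, 4.0]])

x_hat = np.array([1.0, -0.5])
dx = np.array([1e-3, 2e-3])          # small displacement Delta x

taylor = f(x_hat) + grad_f(x_hat) @ dx + 0.5 * dx @ hess_f(x_hat) @ dx
exact = f(x_hat + dx)
print(abs(exact - taylor))           # ~0: f is quadratic, so the remainder vanishes
```

For a quadratic f the remainder in (3) is exactly zero; for a general twice-differentiable f the gap shrinks like o(‖∆x‖²).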
Similarly, we can generalize this to arbitrary n. For f : R^n → R, the second-order Taylor
expansion at x̂ in vector form is given by the same formula (3) above, where now
\[
\Delta x = \begin{bmatrix} \Delta x_1 & \Delta x_2 & \cdots & \Delta x_n \end{bmatrix}^{T} \tag{7}
\]
\[
\nabla f(\hat{x}) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix}_{\hat{x}} \tag{8}
\]
\[
H(\hat{x}) = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[4pt]
\dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\[4pt]
\vdots & \vdots & \ddots & \vdots \\[4pt]
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}_{\hat{x}} \tag{9}
\]
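When no closed-form derivatives are at hand, the objects (8) and (9) can be approximated by central finite differences. The sketch below does this for a hypothetical f : R³ → R and an assumed point x̂, purely to illustrate the structure of the gradient row vector and the n × n Hessian.

```python
import numpy as np

# Hypothetical f : R^3 -> R used only to illustrate (8) and (9):
def f(x):
    return x[0]**2 + np.sin(x[1]) * x[2] + x[1] * x[2]**2

def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of the gradient (8)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def numerical_hessian(f, x, h=1e-4):
    """Central-difference estimate of the n x n Hessian (9)."""
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h**2)
    return H

x_hat = np.array([0.5, -1.0, 2.0])
print(numerical_gradient(f, x_hat))
print(numerical_hessian(f, x_hat))   # symmetric up to finite-difference error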
Since f is twice differentiable at x̂, the mixed second-order partial derivatives at x̂ do not
depend on the order of differentiation (Young's theorem), that is,
\[
\frac{\partial}{\partial x_i}\left( \frac{\partial f}{\partial x_j} \right)
= \frac{\partial}{\partial x_j}\left( \frac{\partial f}{\partial x_i} \right) \tag{10}
\]
For the Hessian matrix this gives Hij = Hji, so H(x̂) is a symmetric matrix.
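Equation (10) can also be verified symbolically. In the sketch below, the smooth function is a hypothetical example chosen only for this check; both orders of differentiation are computed and their difference simplifies to zero.

```python
import sympy as sp

# Symbolic check of (10) for a hypothetical smooth function:
x1, x2 = sp.symbols('x1 x2')
f = sp.sin(x1) * sp.exp(x2) + x1**2 * x2

mixed_12 = sp.diff(f, x1, x2)   # differentiate by x1, then by x2
mixed_21 = sp.diff(f, x2, x1)   # differentiate by x2, then by x1
print(sp.simplify(mixed_12 - mixed_21))  # 0: the order of differentiation is irrelevant
```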
Now let q(X) = X^T AX, where A is a symmetric matrix and X = (x1, x2, ···, xn)^T. By
the definition of a positive definite quadratic form, if the matrix A is positive definite, then
for all X ∈ R^n with X ≠ 0, we have [2]
\[
q(X) = X^{T} A X > 0 \tag{11}
\]
Then, let g(x) = ∆x^T H(x̂) ∆x, where ∆x and H(x̂) are as in (7) and (9). If
H(x̂) is positive definite, we conclude from (11) that for all x ∈ R^n with x ≠ x̂,
\[
g(x) = \Delta x^{T} H(\hat{x})\, \Delta x > 0 \tag{12}
\]
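Positive definiteness, as used in (11) and (12), can be spot-checked numerically. In the sketch below, the symmetric matrix A standing in for H(x̂) is an assumed example; all eigenvalues of a symmetric matrix being positive is the standard criterion.

```python
import numpy as np

# Hypothetical symmetric matrix playing the role of H(x_hat):
A = np.array([[2.0, 1.0],
              [1.0, 4.0]])

# Positive definiteness check: all eigenvalues of a symmetric matrix > 0.
eigvals = np.linalg.eigvalsh(A)
print(eigvals, (eigvals > 0).all())   # True -> A is positive definite

# Spot-check inequality (11) on random nonzero vectors.
rng = np.random.default_rng(0)
for _ in range(5):
    X = rng.standard_normal(2)
    print(X @ A @ X > 0)              # True for every X != 0
```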
According to (3), for any v ∈ R^n with v ≠ 0 and a scalar step size η (called the learning
rate in [3]), we can deduce the formula below:
\[
f(\hat{x} + \eta v) = f(\hat{x}) + \eta\, \nabla f(\hat{x})\, v + \frac{\eta^2}{2}\, v^{T} H(\hat{x})\, v + o(\eta^2) \tag{13}
\]
Suppose that
\[
\nabla f(\hat{x}) = 0 \tag{14}
\]
From (12), (13) and (14), v^T H(x̂) v ≥ λ_min > 0 for every unit vector v, where λ_min is
the smallest eigenvalue of H(x̂). The quadratic term therefore dominates the o(η²)
remainder for small η, so there exists δ > 0 such that for all unit vectors v and all
0 < η < δ,
\[
f(\hat{x} + \eta v) = f(\hat{x}) + \frac{\eta^2}{2}\, v^{T} H(\hat{x})\, v + o(\eta^2) > f(\hat{x}) \tag{15}
\]
Hence x̂ is a strict local minimizer.
So, we can draw a conclusion:
if f : R^n → R is twice differentiable at x, then ∇f(x) = 0 together with positive
definiteness of the Hessian matrix H(x) is a sufficient condition for x to be a strict local
minimizer.
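To illustrate the conclusion on a concrete (hypothetical) example, the sketch below takes f(x) = x1² + 2x2², whose gradient vanishes at the origin and whose Hessian diag(2, 4) is positive definite there, and checks inequality (15) on sampled directions and step sizes.

```python
import numpy as np

# Hypothetical example of the conclusion: f(x) = x1^2 + 2*x2^2 has
# grad f(0) = 0 and positive definite Hessian H = diag(2, 4) at x = 0,
# so 0 should be a strict local minimizer.
def f(x):
    return x[0]**2 + 2*x[1]**2

H = np.diag([2.0, 4.0])
print((np.linalg.eigvalsh(H) > 0).all())   # True: H is positive definite

# Sample unit directions v and small step sizes eta; inequality (15)
# predicts f(eta*v) > f(0) in every direction.
rng = np.random.default_rng(1)
x_hat = np.zeros(2)
for _ in range(5):
    v = rng.standard_normal(2)
    v /= np.linalg.norm(v)
    for eta in (1e-1, 1e-2, 1e-3):
        assert f(x_hat + eta*v) > f(x_hat)
print("strict local minimum confirmed on sampled directions")
```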
References
[1] T. U. Department of Mathematics, Advanced Mathematics (7th Edition).
[2] T. U. Department of Mathematics, Linear Algebra (6th Edition).
[3] J. Wu, Pattern Recognition.