Introductory Lectures on Convex Programming
Volume I: Basic course
Yu. Nesterov
July 2, 1998
Contents

Introduction  5

1 Nonlinear Programming  9
  1.1 The World of Nonlinear Optimization  9
    1.1.1 General formulation of the problem  9
    1.1.2 Performance of numerical methods  12
    1.1.3 Complexity bounds for global optimization  14
    1.1.4 Identity cards of the fields  18
  1.2 Local methods in unconstrained minimization  20
    1.2.1 Relaxation and approximation  20
    1.2.2 Classes of differentiable functions  24
    1.2.3 Gradient method  28
    1.2.4 Newton method  34
  1.3 First-order methods in nonlinear optimization  38
    1.3.1 Gradient method and Newton method: What is different?  38
    1.3.2 Conjugate gradients  42
    1.3.3 Constrained minimization  46

2 Smooth Convex Programming  51
  2.1 Minimization of Smooth Functions  51
    2.1.1 Smooth convex functions  51
    2.1.2 Lower complexity bounds for $F^{\infty,1}_{L}(\mathbb{R}^n)$  56
    2.1.3 Strongly convex functions  60
    2.1.4 Lower complexity bounds for $S^{1,1}_{\mu,L}(\mathbb{R}^n)$  63
    2.1.5 Gradient Method  65
  2.2 Optimal Methods  67
    2.2.1 Optimal Methods  67
    2.2.2 Convex sets  76
    2.2.3 Gradient Mapping  80
    2.2.4 Minimization methods for simple sets  82
  2.3 Minimization Problem with Smooth Components  84
    2.3.1 MiniMax Problem  84
    2.3.2 Gradient Mapping  86
    2.3.3 Minimization methods for minimax problem  89
    2.3.4 Optimization with Functional Constraints  92
    2.3.5 Constrained minimization process  96

3 Nonsmooth Convex Programming  103
  3.1 General Convex Functions  103
    3.1.1 Motivation and Definitions  103
    3.1.2 Operations with convex functions  108
    3.1.3 Continuity and Differentiability of Convex Functions  112
    3.1.4 Separation Theorems  114
    3.1.5 Subgradients  116
    3.1.6 Computing the subgradients  119
  3.2 Nonsmooth Minimization Methods  124
    3.2.1 General Lower Complexity Bounds  124
    3.2.2 Main lemma  127
    3.2.3 Subgradient Method  129
    3.2.4 Minimization with functional constraints  131
    3.2.5 Complexity Bounds in Finite Dimension  133
    3.2.6 Cutting Plane Schemes  137
  3.3 Methods with Complete Data  144
    3.3.1 Model of nonsmooth function  144
    3.3.2 Kelley method  145
    3.3.3 Level Method  147
    3.3.4 Constrained Minimization  151

4 Structural Programming  157
  4.1 Self-Concordant Functions  157
    4.1.1 Black box concept in Convex Programming  157
    4.1.2 What the Newton method actually does?  159
    4.1.3 Definition of self-concordant function  160
    4.1.4 Main inequalities  165
    4.1.5 Minimizing the self-concordant function  169
  4.2 Self-Concordant Barriers  173
    4.2.1 Motivation  174
    4.2.2 Definition of self-concordant barriers  175
    4.2.3 Main Inequalities  177
    4.2.4 Path-following Scheme  180
    4.2.5 Finding the analytic center  183
    4.2.6 Problems with functional constraints  186
  4.3 Applications of Structural Programming  189
    4.3.1 Bounds on the parameter of a self-concordant barrier  189
    4.3.2 Linear and Quadratic Programming  192
    4.3.3 Semidefinite Programming  194
    4.3.4 Extremal ellipsoids  197
    4.3.5 Separable Programming  199
    4.3.6 Choice of the minimization scheme  202

Bibliographical comments  207
References  209
Index  211
Introduction
Optimization problems arise naturally in many application fields. Whatever people do, at some point they feel the urge to organize things in the best possible way. This intention, converted into mathematical form, turns out to be an optimization problem of a certain type. Depending on the field of interest, it could be an optimal design problem, an optimal control problem, an optimal location problem or even an optimal diet problem. However, the next step, finding a solution to the mathematical model, is not so trivial. At first glance, everything looks very simple: many commercial optimization packages are easily available, and any user can get a "solution" to his model just by clicking on an icon on the screen of a personal computer. The question is, how much can we trust it?

One of the goals of this course is to show that, despite their attraction, the proposed "solutions" of general optimization problems very often fail to meet the expectations of a naive user. The main fact, which should be known to any person dealing with optimization models, is that, in general, optimization problems are unsolvable. In our opinion, this statement, which is usually missing from standard optimization courses, is very important for understanding optimization theory as a whole, its past and its future.[1]

[1] Therefore we start our course with a simple proof of this statement.
In many practical applications the process of creating a model takes a lot of time and effort. Therefore, the researchers should have a clear understanding of the properties of the model they are creating. At the modeling stage, many different tools can be used to approximate reality, and it is absolutely necessary to understand the consequences of each possible choice. Very often we need to choose between a good model, which we cannot solve,[2] and a "bad" model, which can be solved for sure. Which is better?

[2] More precisely, one which we can try to solve.

In fact, computational practice provides us with a kind of answer to the above question. The most widespread optimization models now are the linear programming models. It is very unlikely that such models can describe our nonlinear world very well. Thus, the only reason for their popularity can be that modelers prefer to deal with solvable models. Of course, very often the linear approximation is poor, but usually it is possible to interpret the consequences of such a choice and to correct the solution once it becomes available. Maybe this is better than trying to solve a model without any guarantee of getting an answer.
Another goal of this course is to discuss numerical methods for solvable nonlinear models, by which we mean convex programs. The development of convex optimization theory in the last decade has been very rapid and exciting. Now it consists of several competing branches, each of which has some strong and some weak points. We will discuss their features in detail, taking into account the historical aspect. More precisely, we will try to understand the internal logic of the development of each field. At this moment the main results of this development can be found only in specialized journals and monographs. However, in our opinion, this theory is now completely ready to be explained to end users: industrial engineers, economists and students of different specializations. We hope that this book will be interesting even for experts in optimization theory, since it contains many results which have never been published in English.
In this book we will try to convince the reader that, in order to apply optimization formulations successfully, it is necessary to be aware of some theory which tells us what we can and what we cannot do with optimization problems. The elements of this theory, clean and simple, can be found in almost every lecture of the course. We will try to show that optimization is an excellent example of a complete applied theory, which is simple, easy to learn, and which can be very useful in practical applications.

In this course we discuss the most efficient modern optimization schemes and prove their efficiency estimates. The course is self-contained; we prove all necessary results without dragging in exotic facts from other fields. Nevertheless, the proofs and the reasoning should not be a problem even for second-year undergraduate students.[3]

[3] This course was presented by the author to students in the form of transparencies, and the rule was to place each proof on a single sheet. Thus, all the proofs are necessarily very short.
The structure of the course is as follows. It consists of four relatively independent chapters. Each chapter includes three sections, each of which corresponds approximately to a two-hour lecture.

Chapter 1 is devoted to general optimization problems. In Section 1.1 we introduce the terminology, the notions of the oracle and the black box, and the complexity of general iterative schemes. We prove that global optimization problems are unsolvable and discuss the main features of different fields of optimization theory. In Section 1.2 we discuss the two main local unconstrained minimization schemes: the gradient method and the Newton method. We prove their local rates of convergence and discuss the possible difficulties (divergence, convergence to a saddle point). In Section 1.3 we compare the formal structure of the gradient method and the Newton method. This analysis leads to the idea of a variable metric. We describe quasi-Newton methods and conjugate gradient schemes. We conclude this section with an analysis of sequential unconstrained minimization schemes.
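For orientation, the two local schemes discussed in Section 1.2 can each be written in a single line; the formulas below are the generic textbook iterations (with some step size $h_k > 0$), given here only as a reminder and not as the precise variants analyzed in the chapter:
\[
\mbox{gradient method:}\quad x_{k+1} = x_k - h_k \nabla f(x_k),
\qquad
\mbox{Newton method:}\quad x_{k+1} = x_k - \bigl[\nabla^2 f(x_k)\bigr]^{-1} \nabla f(x_k).
\]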
In Chapter 2 we consider smooth convex optimization methods. In Section 2.1 we analyze the reason for our failures in the previous chapter and derive from it two good functional classes: the smooth convex functions and the smooth strongly convex functions. We prove lower complexity bounds for the corresponding unconstrained minimization problems. We conclude this section with an analysis of the gradient scheme, which demonstrates that this method is not optimal. The optimal schemes for smooth convex minimization problems are discussed in Section 2.2. We start with the unconstrained minimization problem. After that we introduce convex sets and the notion of the gradient mapping for a minimization problem over a simple convex set. We show that the gradient mapping can simply replace the gradient in the optimal schemes. In Section 2.3 we discuss more complicated problems formed by several smooth convex functions, namely the minimax problem and the constrained minimization problem. For both problems we introduce the notion of the gradient mapping and present the optimal schemes.
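To give a rough flavor of the statements proved there (the exact constants and assumptions are given in Chapter 2; the orders below are the standard ones): for a convex function with Lipschitz continuous gradient (constant $L$) and a starting point at distance at most $R$ from a minimizer, the gradient method guarantees only
\[
f(x_k) - f^* \le O\!\left(\frac{L R^2}{k}\right),
\]
while the lower complexity bound for this class is of the order $L R^2 / k^2$ (as long as the number of steps does not exceed the dimension), and this faster rate is attained by the optimal schemes of Section 2.2.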
In Chapter 3 we describe nonsmooth convex optimization theory. Since we do not assume that the reader has a background in convex analysis, we devote the whole of Section 3.1 to a short exposition of this theory. The final goal of this section is to justify the rules for computing the subgradients of a convex function, and we try to reach this goal in the fastest possible way, starting from the definition of a nonsmooth convex function. The only deviation from the shortest path is the Kuhn-Tucker theorem, which concludes the section. We start Section 3.2 with lower complexity bounds for nonsmooth optimization problems. After that we present a general scheme for the complexity analysis of the corresponding methods. We use this scheme to prove the rates of convergence of the subgradient method, the center-of-gravity method and the ellipsoid method. We also discuss some other cutting plane schemes. Section 3.3 is devoted to minimization schemes which use a piecewise-linear model of a convex function. We describe the Kelley method and show that its rate of convergence can be very slow. After that we introduce the level method. We prove the efficiency estimates of this method for unconstrained and constrained minimization problems.
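As a preview of the simplest scheme of that chapter, the subgradient method for minimizing a convex function $f$ over a closed convex set $Q$ takes the generic form
\[
x_{k+1} = \pi_Q\bigl(x_k - h_k\, g(x_k)\bigr), \qquad g(x_k) \in \partial f(x_k),
\]
where $\pi_Q$ is the Euclidean projection onto $Q$ and $h_k > 0$ are step sizes; the particular step-size rules and constants are specified in Section 3.2. With an appropriate choice of $h_k$, the best value found after $k$ iterations is within $O(1/\sqrt{k})$ of the optimum, and the lower complexity bounds show that, as long as $k$ does not exceed the dimension, no black-box method can do better on this class.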
Chapter 4 is devoted to convex minimization problems with explicit structure. In Section 4.1 we discuss a certain contradiction in the black box concept as applied to convex optimization. We introduce the notion of a mediator, a special reformulation of the initial problem for which we can point out a non-local oracle. We introduce a special class of convex functions, the self-concordant functions, for which the second-order oracle is not local and which can easily be minimized by the Newton method. We study the properties of these functions and prove the rate of convergence of the Newton method. In Section 4.2 we introduce self-concordant barriers, the subclass of self-concordant functions which is suitable for sequential unconstrained minimization schemes. We study the properties of such barriers and prove the efficiency estimate of the path-following scheme. In Section 4.3 we consider several examples of optimization problems for which we can construct a self-concordant barrier and which can therefore be solved by a path-following scheme. We consider linear and quadratic programming problems, semidefinite programming, problems with extremal ellipsoids, separable programming, geometric programming and approximation in Lp-norms. We conclude this chapter, and the whole course, with a comparative analysis of an interior-point scheme and a nonsmooth optimization method as applied to a concrete problem instance.
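To indicate the kind of objects Chapter 4 works with, we recall the standard definitions (stated here informally; the precise smoothness assumptions and constants are those of Sections 4.1 and 4.2): a convex function $f$ is self-concordant if for all $x$ in its domain and all directions $h$
\[
\bigl| D^3 f(x)[h,h,h] \bigr| \le 2 \bigl( D^2 f(x)[h,h] \bigr)^{3/2},
\]
the basic example being $f(x) = -\ln x$. A self-concordant barrier $F$ with parameter $\nu$ additionally satisfies $\langle \nabla F(x), h \rangle^2 \le \nu\, \langle \nabla^2 F(x) h, h \rangle$, and the path-following scheme then reaches accuracy $\epsilon$ in a number of Newton steps of the order $\sqrt{\nu}\, \ln(\nu/\epsilon)$.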