Introductory Lectures on Convex Programming
Volume I: Basic course

Yu. Nesterov

July 2, 1998
Contents

Introduction   5

1  Nonlinear Programming   9
   1.1  The World of Nonlinear Optimization   9
        1.1.1  General formulation of the problem   9
        1.1.2  Performance of numerical methods   12
        1.1.3  Complexity bounds for global optimization   14
        1.1.4  Identity cards of the fields   18
   1.2  Local methods in unconstrained minimization   20
        1.2.1  Relaxation and approximation   20
        1.2.2  Classes of differentiable functions   24
        1.2.3  Gradient method   28
        1.2.4  Newton method   34
   1.3  First-order methods in nonlinear optimization   38
        1.3.1  Gradient method and Newton method: What is different?   38
        1.3.2  Conjugate gradients   42
        1.3.3  Constrained minimization   46

2  Smooth Convex Programming   51
   2.1  Minimization of Smooth Functions   51
        2.1.1  Smooth convex functions   51
        2.1.2  Lower complexity bounds for F^{∞,1}_L(R^n)   56
        2.1.3  Strongly convex functions   60
        2.1.4  Lower complexity bounds for S^{1,1}_{µ,L}(R^n)   63
        2.1.5  Gradient Method   65
   2.2  Optimal Methods   67
        2.2.1  Optimal Methods   67
        2.2.2  Convex sets   76
        2.2.3  Gradient Mapping   80
        2.2.4  Minimization methods for simple sets   82
   2.3  Minimization Problem with Smooth Components   84
        2.3.1  MiniMax Problem   84
        2.3.2  Gradient Mapping   86
        2.3.3  Minimization methods for minimax problem   89
        2.3.4  Optimization with Functional Constraints   92
        2.3.5  Constrained minimization process   96

3  Nonsmooth Convex Programming   103
   3.1  General Convex Functions   103
        3.1.1  Motivation and Definitions   103
        3.1.2  Operations with convex functions   108
        3.1.3  Continuity and Differentiability of Convex Functions   112
        3.1.4  Separation Theorems   114
        3.1.5  Subgradients   116
        3.1.6  Computing the subgradients   119
   3.2  Nonsmooth Minimization Methods   124
        3.2.1  General Lower Complexity Bounds   124
        3.2.2  Main lemma   127
        3.2.3  Subgradient Method   129
        3.2.4  Minimization with functional constraints   131
        3.2.5  Complexity Bounds in Finite Dimension   133
        3.2.6  Cutting Plane Schemes   137
   3.3  Methods with Complete Data   144
        3.3.1  Model of nonsmooth function   144
        3.3.2  Kelly method   145
        3.3.3  Level Method   147
        3.3.4  Constrained Minimization   151

4  Structural Programming   157
   4.1  Self-Concordant Functions   157
        4.1.1  Black box concept in Convex Programming   157
        4.1.2  What the Newton method actually does?   159
        4.1.3  Definition of self-concordant function   160
        4.1.4  Main inequalities   165
        4.1.5  Minimizing the self-concordant function   169
   4.2  Self-Concordant Barriers   173
        4.2.1  Motivation   174
        4.2.2  Definition of self-concordant barriers   175
        4.2.3  Main Inequalities   177
        4.2.4  Path-following Scheme   180
        4.2.5  Finding the analytic center   183
        4.2.6  Problems with functional constraints   186
   4.3  Applications of Structural Programming   189
        4.3.1  Bounds on the parameter of a self-concordant barrier   189
        4.3.2  Linear and Quadratic Programming   192
        4.3.3  Semidefinite Programming   194
        4.3.4  Extremal ellipsoids   197
        4.3.5  Separable Programming   199
        4.3.6  Choice of the minimization scheme   202

Bibliographical comments   207

References   209

Index   211
Introduction

Optimization problems arise naturally in many application fields. Whatever people do, at some point they get a craving to organize things in the best possible way. This intention, converted into mathematical form, turns out to be an optimization problem of a certain type. Depending on the field of interest, it could be an optimal design problem, an optimal control problem, an optimal location problem or even an optimal diet problem. However, the next step, finding a solution to the mathematical model, is not so trivial. At first glance everything looks very simple: many commercial optimization packages are readily available, and any user can get a “solution” to his model just by clicking on an icon on the screen of a personal computer. The question is, how much can we trust it?

One of the goals of this course is to show that, despite their attraction, the proposed “solutions” to general optimization problems very often fall short of the expectations of a naive user. The main fact, which should be known to any person dealing with optimization models, is that, in general, optimization problems are unsolvable. In our opinion, this statement, which is usually missing from standard optimization courses, is very important for understanding optimization theory as a whole, its past and its future.[1]

[1] Therefore we start our course with a simple proof of this statement.

In many practical applications the process of creating a model takes a lot of time and effort. Therefore the researchers should have a clear understanding of the properties of the model they are creating. At the modeling stage, many different tools can be used to approximate reality, and it is absolutely necessary to understand the consequences of each possible choice. Very often we have to choose between a good model, which we cannot solve,[2] and a “bad” model, which can be solved for sure. Which is better?

[2] More precisely, a model which we can only try to solve.

In fact, computational practice provides a kind of answer to the above question. The most widespread optimization models today are the linear programming models. It is very unlikely that such models can describe our nonlinear world very well. Thus, the only reason for their popularity can be that modelers prefer to deal with solvable models. Of course, the linear approximation is often poor, but usually it is possible to interpret the consequences of such a choice and to correct the solution once it becomes available. Maybe that is better than trying to solve a model without any guarantee of getting an answer.
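To give a first quantitative feeling for the word “unsolvable”, here is a back-of-the-envelope illustration; the rigorous statement and its proof are the subject of Section 1.1.3, and the particular numbers below are chosen purely for illustration. For functions that are merely Lipschitz continuous with constant L (in the ∞-norm) on the n-dimensional unit box, a uniform grid search needs roughly (L / 2ε)^n calls of the oracle to guarantee accuracy ε in the function value, and no method can do substantially better on this class:

\[
  N(\epsilon) \;\approx\; \Bigl(\frac{L}{2\epsilon}\Bigr)^{n},
  \qquad
  n = 10,\; L = 2,\; \epsilon = 10^{-2}
  \;\Longrightarrow\;
  N \;\approx\; 100^{10} \;=\; 10^{20}.
\]

Even at 10^7 oracle calls per second this amounts to about 10^{13} seconds, that is, on the order of three hundred thousand years.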
Another goal of this course is to discuss numerical methods for solvable nonlinear models, by which we mean convex programs. The development of convex optimization theory in the last decade has been very rapid and exciting. It now consists of several competing branches, each of which has some strong and some weak points. We will discuss their features in detail, taking the historical aspect into account; more precisely, we will try to understand the internal logic of the development of each branch. At the moment the main results of this development can be found only in specialized journals and monographs. However, in our opinion, the theory is now completely ready to be explained to its end users: industrial engineers, economists and students of different specializations. We hope that this book will be interesting even for experts in optimization theory, since it contains many results which have never been published in English.

In this book we will try to convince the reader that, in order to apply optimization formulations successfully, it is necessary to be aware of some theory which tells us what we can and what we cannot do with optimization problems. The elements of this theory, clean and simple, can be found in almost every lecture of the course. We will try to show that optimization is an excellent example of a complete applied theory, which is simple, easy to learn and very useful in practical applications.

In this course we discuss the most efficient modern optimization schemes and prove their efficiency estimates. The course is self-contained: we prove all necessary results without dragging in exotic facts from other fields. Nevertheless, the proofs and the reasoning should not be a problem even for second-year undergraduate students.[3]

[3] This course was presented by the author to students in the form of transparencies, and the rule was to place every proof on a single sheet. Thus all of them are necessarily very short.

The structure of the course is as follows. It consists of four relatively independent chapters. Each chapter includes three sections, each of which corresponds approximately to a two-hour lecture.

Chapter 1 is devoted to general optimization problems. In Section 1.1 we introduce the terminology, the notions of the oracle and the black box, and the complexity of general iterative schemes. We prove that global optimization problems are unsolvable and discuss the main features of the different fields of optimization theory. In Section 1.2 we discuss the two main local unconstrained minimization schemes: the gradient method and the Newton method. We prove their local rates of convergence and discuss the possible troubles (divergence, convergence to a saddle point). In Section 1.3 we compare the formal structure of the gradient and Newton methods. This analysis leads to the idea of a variable metric. We describe quasi-Newton methods and the conjugate gradient schemes, and we conclude the section with an analysis of sequential unconstrained minimization schemes.
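For concreteness, the two schemes of Section 1.2 can already be written down here in their simplest form; this is only a preview, with x_0 a given starting point, h_k > 0 a step size whose choice is discussed in the text, and f', f'' denoting the gradient and the Hessian:

\[
  \text{Gradient method:}\qquad x_{k+1} \;=\; x_k - h_k\, f'(x_k),
\]
\[
  \text{Newton method:}\qquad x_{k+1} \;=\; x_k - \bigl[f''(x_k)\bigr]^{-1} f'(x_k).
\]

The step-size rules for h_k and the local convergence analysis of both schemes are the subject of that section.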
In Chapter 2 we consider smooth convex optimization methods. In Section 2.1 we analyze the reasons for our failures in the previous chapter and derive from them two good functional classes, the smooth convex functions and the smooth strongly convex functions. We prove lower complexity bounds for the corresponding unconstrained minimization problems. We conclude the section with an analysis of the gradient scheme, which demonstrates that this method is not optimal. The optimal schemes for smooth convex minimization problems are discussed in Section 2.2. We start with the unconstrained minimization problem. After that we introduce convex sets and the notion of the gradient mapping for a minimization problem over a simple convex set, and we show that the gradient mapping can simply replace the gradient in the optimal schemes.
In Section 2.3 we discuss more complicated problems, which are formed by several smooth convex functions, namely the minimax problem and the constrained minimization problem. For both problems we introduce the notion of the gradient mapping and present the optimal schemes.

In Chapter 3 we describe the nonsmooth convex optimization theory. Since we do not assume that the reader has a background in convex analysis, we devote the whole of Section 3.1 to a short exposition of this theory. The final goal of the section is to justify the rules for computing the subgradients of a convex function, and we try to reach this goal, starting from the definition of a nonsmooth convex function, in the fastest possible way. The only deviation from the shortest path is the Kuhn-Tucker theorem, which concludes the section. We start Section 3.2 with lower complexity bounds for nonsmooth optimization problems. After that we present a general scheme for the complexity analysis of the corresponding methods. We use this scheme to prove the rate of convergence of the subgradient method, the center-of-gravity method and the ellipsoid method, and we also discuss some other cutting plane schemes. Section 3.3 is devoted to minimization schemes which use a piecewise linear model of the convex function. We describe the Kelley method and show that its rate of convergence can be very low. After that we introduce the level method and prove its efficiency estimates for unconstrained and constrained minimization problems.

Chapter 4 is devoted to convex minimization problems with explicit structure. In Section 4.1 we discuss a certain contradiction in the black box concept as applied to convex optimization. We introduce the notion of a mediator, a special reformulation of the initial problem for which we can point out a non-local oracle. We introduce the special class of convex functions, the self-concordant functions, for which the second-order oracle is not local and which can be easily minimized by the Newton method. We study the properties of these functions and prove the rate of convergence of the Newton method. In Section 4.2 we introduce the self-concordant barriers, the subclass of self-concordant functions suitable for sequential unconstrained minimization schemes. We study the properties of such barriers and prove the efficiency estimate of the path-following scheme. In Section 4.3 we consider several examples of optimization problems for which a self-concordant barrier can be constructed, and which therefore can be solved by a path-following scheme: linear and quadratic programming, semidefinite programming, problems with extremal ellipsoids, separable programming, geometric programming and approximation in Lp-norms. We conclude the chapter, and the whole course, with a comparative analysis of an interior-point scheme and a nonsmooth optimization method as applied to a concrete problem instance.
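For the reader who wants to see in advance what “self-concordance” means, the defining inequalities of Chapter 4 have, up to the normalization adopted there, the following form (f is a convex, three times differentiable function, F a barrier for the feasible set, and ν the parameter of the barrier):

\[
  \bigl|D^3 f(x)[u,u,u]\bigr| \;\le\; 2\,\bigl(D^2 f(x)[u,u]\bigr)^{3/2}
  \qquad \text{(self-concordant function)},
\]
\[
  \bigl|\langle F'(x), u\rangle\bigr| \;\le\; \nu^{1/2}\,\bigl\langle F''(x)u, u\bigr\rangle^{1/2}
  \qquad \text{(}\nu\text{-self-concordant barrier)}.
\]

It is the parameter ν that governs the efficiency estimate of the path-following scheme: the number of Newton steps grows proportionally to √ν, times a factor logarithmic in the required accuracy.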