Lectures on Convex Optimization, by Yurii Nesterov

Preface
Acknowledgements
Introduction
Contents
Part I Black-Box Optimization
1 Nonlinear Optimization
1.1 The World of Nonlinear Optimization
1.1.1 General Formulation of the Problem
1.1.2 Performance of Numerical Methods
1.1.3 Complexity Bounds for Global Optimization
1.1.4 Identity Cards of the Fields
1.2 Local Methods in Unconstrained Minimization
1.2.1 Relaxation and Approximation
1.2.2 Classes of Differentiable Functions
1.2.3 The Gradient Method
1.2.4 Newton's Method
1.3 First-Order Methods in Nonlinear Optimization
1.3.1 The Gradient Method and Newton's Method: What Is Different?
1.3.2 Conjugate Gradients
1.3.3 Constrained Minimization
1.3.3.1 Lagrangian Relaxation
1.3.3.2 Penalty Functions
1.3.3.3 Barrier Functions
2 Smooth Convex Optimization
2.1 Minimization of Smooth Functions
2.1.1 Smooth Convex Functions
2.1.2 Lower Complexity Bounds for F_L^{∞,1}(R^n)
2.1.3 Strongly Convex Functions
2.1.4 Lower Complexity Bounds for S_{μ,L}^{∞,1}(R^n)
2.1.5 The Gradient Method
2.2 Optimal Methods
2.2.1 Estimating Sequences
2.2.2 Decreasing the Norm of the Gradient
2.2.3 Convex Sets
2.2.4 The Gradient Mapping
2.2.5 Minimization over Simple Sets
2.3 The Minimization Problem with Smooth Components
2.3.1 The Minimax Problem
2.3.2 Gradient Mapping
2.3.3 Minimization Methods for the Minimax Problem
2.3.4 Optimization with Functional Constraints
2.3.5 The Method for Constrained Minimization
3 Nonsmooth Convex Optimization
3.1 General Convex Functions
3.1.1 Motivation and Definitions
3.1.2 Operations with Convex Functions
3.1.3 Continuity and Differentiability
3.1.4 Separation Theorems
3.1.5 Subgradients
3.1.6 Computing Subgradients
3.1.7 Optimality Conditions
3.1.8 Minimax Theorems
3.1.9 Basic Elements of Primal-Dual Methods
3.2 Methods of Nonsmooth Minimization
3.2.1 General Lower Complexity Bounds
3.2.2 Estimating Quality of Approximate Solutions
3.2.3 The Subgradient Method
3.2.4 Minimization with Functional Constraints
3.2.5 Approximating the Optimal Lagrange Multipliers
3.2.6 Strongly Convex Functions
3.2.7 Complexity Bounds in Finite Dimension
3.2.8 Cutting Plane Schemes
3.3 Methods with Complete Data
3.3.1 Nonsmooth Models of the Objective Function
3.3.2 Kelley's Method
3.3.3 The Level Method
3.3.4 Constrained Minimization
4 Second-Order Methods
4.1 Cubic Regularization of Newton's Method
4.1.1 Cubic Regularization of Quadratic Approximation
4.1.2 General Convergence Results
4.1.3 Global Efficiency Bounds on Specific Problem Classes
4.1.3.1 Star-Convex Functions
4.1.3.2 Gradient-Dominated Functions
4.1.3.3 Nonlinear Transformations of Convex Functions
4.1.4 Implementation Issues
4.1.4.1 Minimizing the Cubic Regularization
4.1.4.2 Line Search Strategies
4.1.5 Global Complexity Bounds
4.2 Accelerated Cubic Newton
4.2.1 Real Vector Spaces
4.2.2 Uniformly Convex Functions
4.2.3 Cubic Regularization of Newton Iteration
4.2.4 An Accelerated Scheme
4.2.5 Global Non-degeneracy for Second-Order Schemes
4.2.6 Minimizing Strongly Convex Functions
4.2.7 False Acceleration
4.2.8 Decreasing the Norm of the Gradient
4.2.9 Complexity of Non-degenerate Problems
4.3 Optimal Second-Order Methods
4.3.1 Lower Complexity Bounds
4.3.2 A Conceptual Optimal Scheme
4.3.3 Complexity of the Search Procedure
4.4 The Modified Gauss–Newton Method
4.4.1 Quadratic Regularization of the Gauss–Newton Iterate
4.4.2 The Modified Gauss–Newton Process
4.4.3 Global Rate of Convergence
4.4.4 Discussion
4.4.4.1 A Comparative Analysis of Scheme (4.4.17)
4.4.4.2 Implementation Issues
Part II Structural Optimization
5 Polynomial-Time Interior-Point Methods
5.1 Self-concordant Functions
5.1.1 The Black Box Concept in Convex Optimization
5.1.2 What Does the Newton's Method Actually Do?
5.1.3 Definition of Self-concordant Functions
5.1.4 Main Inequalities
5.1.5 Self-Concordance and Fenchel Duality
5.2 Minimizing Self-concordant Functions
5.2.1 Local Convergence of Newton's Methods
5.2.2 Path-Following Scheme
5.2.3 Minimizing Strongly Convex Functions
5.3 Self-concordant Barriers
5.3.1 Motivation
5.3.2 Definition of a Self-concordant Barrier
5.3.3 Main Inequalities
5.3.4 The Path-Following Scheme
5.3.5 Finding the Analytic Center
5.3.6 Problems with Functional Constraints
5.4 Applications to Problems with Explicit Structure
5.4.1 Lower Bounds for the Parameter of a Self-concordant Barrier
5.4.2 Upper Bound: Universal Barrier and Polar Set
5.4.3 Linear and Quadratic Optimization
5.4.4 Semidefinite Optimization
5.4.5 Extremal Ellipsoids
5.4.5.1 Circumscribed Ellipsoid
5.4.5.2 Inscribed Ellipsoid with Fixed Center
5.4.5.3 Inscribed Ellipsoid with Free Center
5.4.6 Constructing Self-concordant Barriers for Convex Sets
5.4.7 Examples of Self-concordant Barriers
5.4.8 Separable Optimization
5.4.8.1 Logarithm and Exponent
5.4.8.2 Entropy Function
5.4.8.3 Increasing Power Functions
5.4.8.4 Decreasing Power Functions
5.4.8.5 Geometric Optimization
5.4.8.6 Approximation in an ℓ_p-Norm
5.4.9 Choice of Minimization Scheme
6 The Primal-Dual Model of an Objective Function
6.1 Smoothing for an Explicit Model of an Objective Function
6.1.1 Smooth Approximations of Non-differentiable Functions
6.1.2 The Minimax Model of an Objective Function
6.1.3 The Fast Gradient Method for Composite Minimization
6.1.4 Application Examples
6.1.4.1 Minimax Strategies for Matrix Games
6.1.4.2 The Continuous Location Problem
6.1.4.3 Variational Inequalities with a Linear Operator
6.1.4.4 Piece-Wise Linear Optimization
6.1.5 Implementation Issues
6.1.5.1 Computational Complexity
6.1.5.2 Computational Stability
6.2 An Excessive Gap Technique for Non-smooth Convex Minimization
6.2.1 Primal-Dual Problem Structure
6.2.2 An Excessive Gap Condition
6.2.3 Convergence Analysis
6.2.4 Minimizing Strongly Convex Functions
6.3 The Smoothing Technique in Semidefinite Optimization
6.3.1 Smooth Symmetric Functions of Eigenvalues
6.3.2 Minimizing the Maximal Eigenvalue of the Symmetric Matrix
6.4 Minimizing the Local Model of an Objective Function
6.4.1 A Linear Optimization Oracle
6.4.2 The Method of Conditional Gradients with Composite Objective
6.4.3 Conditional Gradients with Contraction
6.4.4 Computing the Primal-Dual Solution
6.4.5 Strong Convexity of the Composite Term
6.4.6 Minimizing the Second-Order Model
7 Optimization in Relative Scale
7.1 Homogeneous Models of an Objective Function
7.1.1 The Conic Unconstrained Minimization Problem
7.1.2 The Subgradient Approximation Scheme
7.1.3 Direct Use of the Problem Structure
7.1.4 Application Examples
7.1.4.1 Linear Programming
7.1.4.2 Minimization of the Spectral Radius
7.1.4.3 The Truss Topology Design Problem
7.2 Rounding of Convex Sets
7.2.1 Computing Rounding Ellipsoids
7.2.1.1 Convex Sets with Central Symmetry
7.2.1.2 General Convex Sets
7.2.1.3 Sign-Invariant Convex Sets
7.2.2 Minimizing the Maximal Absolute Value of Linear Functions
7.2.3 Bilinear Matrix Games with Non-negative Coefficients
7.2.4 Minimizing the Spectral Radius of Symmetric Matrices
7.3 Barrier Subgradient Method
7.3.1 Smoothing by a Self-Concordant Barrier
7.3.2 The Barrier Subgradient Scheme
7.3.3 Maximizing Positive Concave Functions
7.3.4 Applications
7.3.4.1 The Fractional Covering Problem
7.3.4.2 The Maximal Concurrent Flow Problem
7.3.4.3 The Minimax Problem with Nonnegative Components
7.3.4.4 Semidefinite Relaxation of the Boolean Quadratic Problem
7.3.5 Online Optimization as an Alternative to Stochastic Programming
7.3.5.1 A Decision-Making Process in an Uncertain Environment
7.3.5.2 Portfolio Management
7.3.5.3 Processes with Full Production Cycles
7.3.5.4 Discussion
7.4 Optimization with Mixed Accuracy
7.4.1 Strictly Positive Functions
7.4.2 The Quasi-Newton Method
7.4.3 Interpretation of Approximate Solutions
7.4.3.1 Relative Accuracy
7.4.3.2 Absolute Accuracy
A Solving Some Auxiliary Optimization Problems
A.1 Newton's Method for Univariate Minimization
A.2 Barrier Projection onto a Simplex
Bibliographical Comments
Chapter 1: Nonlinear Optimization
Chapter 2: Smooth Convex Optimization
Chapter 3: Nonsmooth Convex Optimization
Chapter 4: Second-Order Methods
Chapter 5: Polynomial-Time Interior-Point Methods
Chapter 6: The Primal-Dual Model of an Objective Function
Chapter 7: Optimization in Relative Scale
References
Index
Springer Optimization and Its Applications 137
Yurii Nesterov
Lectures on Convex Optimization
Second Edition
Springer Optimization and Its Applications
Volume 137

Managing Editor
Panos M. Pardalos (University of Florida)

Editor-Combinatorial Optimization
Ding-Zhu Du (University of Texas at Dallas)

Advisory Board
J. Birge (University of Chicago)
S. Butenko (Texas A & M University)
F. Giannessi (University of Pisa)
S. Rebennack (Karlsruhe Institute of Technology)
T. Terlaky (Lehigh University)
Y. Ye (Stanford University)

Aims and Scope
Optimization has been expanding in all directions at an astonishing rate during the last few decades. New algorithmic and theoretical techniques have been developed, the diffusion into other disciplines has proceeded at a rapid pace, and our knowledge of all aspects of the field has grown even more profound. At the same time, one of the most striking trends in optimization is the constantly increasing emphasis on the interdisciplinary nature of the field. Optimization has been a basic tool in all areas of applied mathematics, engineering, medicine, economics and other sciences.

The series Springer Optimization and Its Applications publishes undergraduate and graduate textbooks, monographs and state-of-the-art expository works that focus on algorithms for solving optimization problems and also study applications involving such problems. Some of the topics covered include nonlinear optimization (convex and nonconvex), network flow problems, stochastic optimization, optimal control, discrete optimization, multi-objective programming, description of software packages, approximation techniques and heuristic approaches.
More information about this series at http://www.springer.com/series/7393
Yurii Nesterov
Lectures on Convex Optimization
Second Edition
Yurii Nesterov
CORE/INMA
Catholic University of Louvain
Louvain-la-Neuve, Belgium

ISSN 1931-6828    ISSN 1931-6836 (electronic)
Springer Optimization and Its Applications
ISBN 978-3-319-91577-7    ISBN 978-3-319-91578-4 (eBook)
https://doi.org/10.1007/978-3-319-91578-4
Library of Congress Control Number: 2018949149

Mathematics Subject Classification (2010): 49M15, 49M29, 49N15, 65K05, 65K10, 90C25, 90C30, 90C46, 90C51, 90C52, 90C60

© Springer Nature Switzerland AG 2004, 2018
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This Springer imprint is published by the registered company Springer Nature Switzerland AG.
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
To my wife Svetlana
Preface

The idea of writing this book came from the editors of Springer, who suggested that the author should think about a renewal of the book Introductory Lectures on Convex Optimization: Basic Course, which was published by Kluwer in 2003 [39]. In fact, the main part of that book was written in the period 1997–1998, so its material is at least twenty years old. For such a lively field as Convex Optimization, this is indeed a long time. However, having started to work with the text, the author realized very quickly that this modest goal was simply unreachable. The main idea of [39] was to present a short one-semester course (12 lectures) on Convex Optimization, which reflected the main algorithmic achievements in the field at the time. Therefore, some important notions and ideas, especially those related to all kinds of Duality Theory, were eliminated from the contents without any remorse. In some sense, [39] still remains the minimal course representing the basic concepts of algorithmic Convex Optimization. Any enlargement of this text would require difficult explanations as to why the selected material is more important than the many other interesting candidates which have been left on the shelf.

Thus, the author came to the hard decision to write a new book, which includes all of the material of [39], along with the most important advances in the field during the last two decades. From the chronological point of view, this book covers the period up to the year 2012.¹ Therefore, the newer results on random coordinate descent methods and universal methods, complexity results on zero-order algorithms, and methods for solving huge-scale problems are still missing. However, in our opinion, these very interesting topics have not yet matured enough for a monographic presentation, especially in the form of lectures.

¹ Well, just for consistency, we added the results from several last-minute publications which are important for the topics discussed in the book.

From the methodological point of view, the main novelty of this book consists in the wide presence of duality. Now the reader can see the story from both sides, primal and dual. As compared to [39], the size of the book is doubled, which looks to be a reasonable price to pay for a comprehensive presentation. Clearly, this book is now too big to be taught during one semester. However, it fits well into a two-semester term. Alternatively, different parts of it can be used in diverse educational programs on modern optimization. We discuss possible variants at the end of the Introduction.

In this book we include three topics which are new to the monographic literature.

• The smoothing technique. This approach has completely changed our understanding of the complexity of nonsmooth optimization problems, which arise in the vast majority of applications. It is based on the algorithmic possibility of approximating a non-differentiable convex function by a smooth one, and minimizing the new objective by Fast Gradient Methods. As compared with standard subgradient methods, the complexity of each iteration of the new schemes does not change. However, the estimate for the number of iterations of the new schemes is proportional to the square root of the corresponding estimate for the standard methods. Since in practice these numbers are usually of the order of many thousands, or even millions, the gain in computational time becomes spectacular.

• Global complexity bounds for second-order methods. Second-order methods, and their most famous representative, Newton's Method, are among the oldest schemes in Numerical Analysis. However, their global complexity analysis has only recently been carried out, after the discovery of the Cubic Regularization of Newton's Method. For this new variant of the classical scheme, we can write down global complexity bounds for different problem classes. Consequently, we can now compare the global efficiency of different second-order methods and develop accelerated schemes. A completely new feature of these methods is the accumulation of some model of the objective function during the minimization process. At the same time, we can derive lower complexity bounds for them and develop optimal second-order methods. Similar modifications can be made for methods solving systems of nonlinear equations.

• Optimization in relative scale. The standard way of defining an approximate solution of an optimization problem consists in introducing absolute accuracy. However, in many engineering applications it is natural to measure the quality of a solution on a relative scale (in percent). To adjust minimization methods toward this goal, we introduce a special model of the objective function and apply efficient preprocessing algorithms for computing an appropriate metric, compatible with the topology of the objective. As a result, we get very efficient optimization methods whose complexity bounds depend only weakly on the size of the input data.

We hope that this book will be useful for a wide audience, including students with mathematical, economic, and engineering specializations, practitioners in different fields, and researchers in Optimization Theory, Operations Research, and Computer Science. The main lesson of the development of our field in the last few decades is that efficient optimization methods can be developed only by intelligently