Notes for ECNU 2007 Short Course
October 2007

0. Introduction

• Purpose of the course: to offer the students a concise introduction to modern unconstrained and constrained optimization methods, algorithms, and computer software; functional optimization and applications will also be briefly addressed.
• Background mathematics includes first-year calculus and linear algebra; some knowledge of calculus of variations and PDEs is helpful but not critical.
• Familiarity with MATLAB is important.
• Most of the distributed course material (Chapters 1, 2, 4, 5, 7, 9, 10, 12, 13, 14, 16, and Appendix A) is from the book Andreas Antoniou and Wu-Sheng Lu, Practical Optimization: Algorithms and Engineering Applications, Springer, New York, March 2007. The book has a web site: http://www.ece.uvic.ca/~optimization/

I. Numerical Optimization and Applications

1.1 Unconstrained Optimization (Chaps. 1, 2, 4, 5, 7, 9)

• Basic optimization problem:
      minimize f(x) for x ∈ R^n
• Assumption: f(x) is twice continuously differentiable.
• An example: solve a system of algebraic equations p_i(x) = 0 for i = 1, 2, …, m.
• Basic structure of a numerical algorithm for minimizing f(x):
  Step 1: choose an initial point x_0, set a convergence tolerance ε, and set a counter k = 0.
  Step 2: determine a search direction d_k for reducing f(x) from point x_k.
  Step 3: determine a step size α_k such that f(x_k + α d_k) is minimized with respect to α, and set x_{k+1} = x_k + α_k d_k.
  Step 4: if ||α_k d_k|| < ε, stop and output the solution x_{k+1}; else set k := k + 1 and repeat from Step 2.
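A minimal MATLAB sketch of the four-step loop above (not from the course notes): it uses the steepest-descent direction d_k = −∇f(x_k), introduced below, and a crude halving rule in place of a proper line search; the objective and gradient handles are illustrative.

    % Generic minimization loop (Steps 1-4) on an illustrative quadratic objective
    f   = @(x) (x(1)-1)^2 + 4*(x(2)+2)^2;      % example objective f(x)
    g   = @(x) [2*(x(1)-1); 8*(x(2)+2)];       % its gradient
    xk  = [0; 0];                              % Step 1: initial point x0
    tol = 1e-6;                                % convergence tolerance epsilon
    for k = 0:1000
        dk = -g(xk);                           % Step 2: search direction (steepest descent)
        ak = 1;                                % Step 3: step size by simple halving (Armijo test)
        while f(xk + ak*dk) > f(xk) - 1e-4*ak*(dk'*dk)
            ak = ak/2;
        end
        xk1 = xk + ak*dk;                      %         x_{k+1} = x_k + alpha_k d_k
        if norm(ak*dk) < tol                   % Step 4: convergence test
            break
        end
        xk = xk1;
    end
    disp(xk1')                                 % approaches the minimizer [1 -2]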
• Determination of the step size αk (1-D optimization, i.e., line searches):
  ♦ 1-D optimization for unimodal functions: we illustrate the basic ideas behind the Dichotomous search, Golden-section search, and quadratic interpolation methods.
  ♦ 1-D optimization for general smooth functions: the inexact line search method by Roger Fletcher.
  MATLAB codes implementing these search methods are available for download at http://www.ece.uvic.ca/~optimization/
• Determination of a search direction dk:
  ♦ Basis of the methods: Taylor series of f(x) in a small vicinity surrounding point x_k:
        f(x_k + δ) = f(x_k) + ∇f(x_k)^T δ + (1/2) δ^T ∇²f(x_k) δ + O(||δ||³)
                   = f(x_k) + g(x_k)^T δ + (1/2) δ^T H(x_k) δ + O(||δ||³)
    where g(x) = ∇f(x) is the gradient and H(x) = ∇²f(x) is the Hessian of f(x).
  ♦ Steepest descent method: d_k = −∇f(x_k), i.e., d_k = −g(x_k).
    □ Remarks:
      (i) The concept of a descent algorithm: an algorithm is a descent algorithm if its iterates satisfy f(x_{k+1}) ≤ f(x_k) for all k.
      (ii) The steepest descent algorithm is a descent algorithm.
      (iii) The steepest descent algorithm is usually quite slow.
  ♦ Newton method: d_k = −H(x_k)^{-1} g(x_k), i.e., d_k = −H_k^{-1} g_k.
    □ Remarks:
      (i) If H(x_k) = H_k is positive definite for all x_k, then the Newton algorithm is a descent algorithm.
      (ii) If H_k is not positive definite, then the search direction in the Newton method is modified to
              d_k = −Ĥ_k^{-1} g_k  for k = 0, 1, …
          where Ĥ_k = H_k + βI with β chosen so that Ĥ_k is positive definite. The Newton method with such modifications is a descent algorithm.
      (iii) Typically the Newton algorithm converges considerably faster than the steepest descent algorithm, at the cost of increased computational intensity.
  ♦ Quasi-Newton methods: one can take a unifying look at the above two methods as
              d_k = −S_k g_k   with   S_k = I for steepest descent,  S_k = H_k^{-1} for Newton.
    □ Quasi-Newton methods use gradient information to generate successive low-rank estimates of the inverse Hessian matrix. In this way, quasi-Newton algorithms provide convergence rates comparable with that of the Newton algorithm at reduced complexity.
    □ Notation: δ_k = x_{k+1} − x_k and γ_k = g_{k+1} − g_k.
    □ Requirements for S_k: (i) symmetric and positive definite; (ii) S_{k+1} γ_k = δ_k.
    □ Davidon-Fletcher-Powell (DFP) updating formula: S_0 = I, and
              S_{k+1} = S_k + (δ_k δ_k^T)/(δ_k^T γ_k) − (S_k γ_k γ_k^T S_k)/(γ_k^T S_k γ_k)
    □ Broyden-Fletcher-Goldfarb-Shanno (BFGS) updating formula: S_0 = I, and
              S_{k+1} = S_k + (1 + (γ_k^T S_k γ_k)/(γ_k^T δ_k)) (δ_k δ_k^T)/(γ_k^T δ_k) − (δ_k γ_k^T S_k + S_k γ_k δ_k^T)/(γ_k^T δ_k)
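The following MATLAB sketch (not from the course notes) runs the quasi-Newton iteration d_k = −S_k g_k with the BFGS update above on a small strictly convex quadratic; the objective, gradient, and crude step-size rule are illustrative stand-ins for a proper inexact line search.

    % Quasi-Newton (BFGS) iteration on an illustrative convex quadratic
    f  = @(x) x(1)^2 + 2*x(2)^2 + x(1)*x(2) - 7*x(1) - 7*x(2);
    g  = @(x) [2*x(1) + x(2) - 7; x(1) + 4*x(2) - 7];
    x  = [0; 0];  S = eye(2);  gk = g(x);           % S0 = I
    for k = 0:100
        d = -S*gk;                                  % d_k = -S_k g_k
        a = 1;                                      % crude backtracking step size
        while f(x + a*d) > f(x) + 1e-4*a*(gk'*d), a = a/2; end
        xn  = x + a*d;   gn  = g(xn);
        dlt = xn - x;    gma = gn - gk;             % delta_k and gamma_k
        S = S + (1 + (gma'*S*gma)/(dlt'*gma))*(dlt*dlt')/(dlt'*gma) ...
              - (dlt*(gma'*S) + (S*gma)*dlt')/(dlt'*gma);    % BFGS update giving S_{k+1}
        if norm(dlt) < 1e-8, break, end
        x = xn;  gk = gn;
    end
    disp(xn')                                       % converges to [3 1]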
1.2 Constrained Optimization: The Karush-Kuhn-Tucker (KKT) Conditions (Chap. 10)

• The constrained optimization problem:
      minimize f(x)
      subject to: a_i(x) = 0 for i = 1, 2, …, p
                  c_j(x) ≥ 0 for j = 1, 2, …, q
• Feasible region: {x : a_i(x) = 0 for i = 1, 2, …, p, and c_j(x) ≥ 0 for j = 1, 2, …, q}.
• A point x is said to be feasible if it is in the feasible region (inside or on the boundary). We say an inequality constraint c_j(x) ≥ 0 is active at a feasible point x if c_j(x) = 0, and an inequality constraint c_j(x) ≥ 0 is said to be inactive at a feasible point x if c_j(x) > 0.
• KKT conditions (first-order necessary conditions for point x* to be a minimum point): if x* is a local minimizer of the constrained problem, then
  (a) a_i(x*) = 0 for i = 1, 2, …, p
  (b) c_j(x*) ≥ 0 for j = 1, 2, …, q
  (c) there exist Lagrange multipliers λ_i* for 1 ≤ i ≤ p and μ_j* for 1 ≤ j ≤ q such that
        ∇f(x*) = Σ_{i=1}^p λ_i* ∇a_i(x*) + Σ_{j=1}^q μ_j* ∇c_j(x*)
  (d) μ_j* c_j(x*) = 0 for j = 1, 2, …, q
  (e) μ_j* ≥ 0 for j = 1, 2, …, q

Remarks:
♦ Conditions (a) and (b) are merely the constraints imposed in the problem.
♦ Condition (c) says that at a local minimizer the gradient of the objective function is a linear combination of the gradients of the constraint functions, and the coefficients of the combination are the Lagrange multipliers. See Example 10.10 and Fig. 10.8 for illustration.
♦ Define the Lagrangian
      L(x, λ, μ) = f(x) − Σ_{i=1}^p λ_i a_i(x) − Σ_{j=1}^q μ_j c_j(x);
  the equation in (c) can then be expressed as
      ∇_x L(x*, λ*, μ*) = 0.
  Also note that the equality and inequality constraints can be expressed in terms of the Lagrangian as
      ∇_λ L(x*, λ, μ) = 0  and  ∇_μ L(x*, λ, μ) ≤ 0.
♦ The conditions in (d) are called complementarity conditions. It follows that if the kth inequality constraint is inactive at x*, then μ_k* = 0. In other words, the equation in (c) only contains those terms ∇c_j(x*) whose constraints are active at x*.
♦ The total number of equations, p (from (a)) + n (from (c)) + q (from (d)), equals the total number of unknowns {x*, λ*, μ*}.
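As a small illustration (not from the notes), consider minimize f(x) = x1² + x2² subject to the single equality constraint a(x) = x1 + x2 − 1 = 0. Conditions (a) and (c) reduce to a 3×3 linear system in (x1, x2, λ), which the MATLAB lines below solve and verify.

    % Stationarity (c): grad f(x*) = lambda* grad a(x*), i.e. [2x1; 2x2] = lambda*[1; 1]
    % Feasibility (a):  x1 + x2 = 1
    K   = [2 0 -1;
           0 2 -1;
           1 1  0];
    rhs = [0; 0; 1];
    sol = K \ rhs;                              % sol = [x1; x2; lambda] = [0.5; 0.5; 1]
    x_star = sol(1:2);  lambda_star = sol(3);
    residual = 2*x_star - lambda_star*[1; 1];   % grad f(x*) - lambda* grad a(x*): numerically zero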
1.3 Convex Sets, Convex Functions, and Convex Programming (CP) (Chaps. 2, 10, 12, 13, 14)

♦ A set R is said to be convex if for every pair of points x1, x2 ∈ R and every real α with 0 ≤ α ≤ 1, the point x = αx1 + (1 − α)x2 belongs to R.
♦ A function f(x) defined over a convex set R is said to be convex if for every pair of points x1, x2 ∈ R and every real α with 0 ≤ α ≤ 1,
      f(αx1 + (1 − α)x2) ≤ αf(x1) + (1 − α)f(x2).
♦ Suppose f(x) ∈ C¹. Then f(x) is convex over a convex set R if and only if
      f(x1) ≥ f(x) + g(x)^T (x1 − x)
  for all x and x1 ∈ R.
♦ Using the above property, it can be shown that a function f(x) ∈ C² is convex over a convex set R if and only if the Hessian H(x) of f(x) is positive semidefinite for all x ∈ R.
♦ The problem of minimizing a convex function f(x) (with no constraints) is not very complicated, because the first-order necessary condition becomes a sufficient condition, and any local minimizer of a convex function is a global minimizer.
♦ A constrained minimization problem
      minimize f(x)
      subject to: a_i(x) = 0 for i = 1, 2, …, p
                  c_j(x) ≥ 0 for j = 1, 2, …, q
  is said to be a convex programming (CP) problem if the objective function f(x) is convex and the feasible region defined by the constraints, i.e., {x : a_i(x) = 0 for i = 1, …, p, and c_j(x) ≥ 0 for j = 1, …, q}, is a convex set. In words, a CP problem is a problem of minimizing a convex function over a convex region. It can be shown that the feasible region {x : a_i(x) = 0 for i = 1, …, p, and c_j(x) ≥ 0 for j = 1, …, q} is convex if (i) all equality constraint functions a_i(x) are linear, and (ii) all functions −c_j(x) are convex.
♦ An important property of the class of CP problems is that the KKT conditions become both necessary and sufficient. This implies that any {x*, λ*, μ*} satisfying the KKT conditions gives a solution (minimizer) x* of the CP problem, and any local minimizer of a CP problem is a global minimizer.
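A quick MATLAB check of the second-order convexity test above on an illustrative function (not from the notes): for f(x) = x1² + 2x2² + x1x2 the Hessian is constant, so a single eigenvalue test settles convexity over all of R².

    H = [2 1; 1 4];                      % Hessian of f(x) = x1^2 + 2*x2^2 + x1*x2
    is_convex = all(eig(H) >= -1e-12)    % positive semidefinite up to round-off, returns 1 (true)
    % For a non-quadratic f, H(x) depends on x and the test must hold for every x in R.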
1.4 Wolfe Duality (Chap. 10)

Wolfe's theorem (1961) on duality is concerned with the general CP problem

(P)   minimize f(x)
      subject to: a_i(x) = a_i^T x − b_i = 0 for i = 1, 2, …, p
                  c_j(x) ≥ 0 for j = 1, 2, …, q

which is referred to as the primal problem. Wolfe's theorem says that if {x*, λ*, μ*} solve the KKT system for the primal problem, then {x*, λ*, μ*} also solves the dual problem, which is defined by

(D)   maximize L(x, λ, μ) over {x, λ, μ}
      subject to: ∇_x L(x, λ, μ) = 0
                  μ ≥ 0

In addition, L(x*, λ*, μ*) = f(x*).

♦ Proof: From the KKT conditions of the primal problem, we obtain ∇_x L(x*, λ*, μ*) = 0, μ* ≥ 0, and
      L(x*, λ*, μ*) = f(x*) − Σ_{i=1}^p λ_i* a_i(x*) − Σ_{j=1}^q μ_j* c_j(x*) = f(x*).
  For a feasible set {x, λ, μ} of the dual, we have μ ≥ 0 and ∇_x L(x, λ, μ) = 0. Since a_i(x*) = 0, μ_j ≥ 0, and c_j(x*) ≥ 0, it follows that
      L(x*, λ*, μ*) = f(x*) ≥ f(x*) − Σ_{i=1}^p λ_i a_i(x*) − Σ_{j=1}^q μ_j c_j(x*) = L(x*, λ, μ).
  With μ ≥ 0 the Lagrangian L(x, λ, μ) is convex in x, hence
      L(x*, λ, μ) ≥ L(x, λ, μ) + ∇_x L(x, λ, μ)^T (x* − x) = L(x, λ, μ).
  Therefore L(x*, λ*, μ*) ≥ L(x, λ, μ) for any dual-feasible {x, λ, μ}, i.e., the set {x*, λ*, μ*} solves the dual problem. ■

♦ If we define the duality gap δ(x, λ, μ) = f(x) − L(x, λ, μ), then the KKT conditions imply that δ(x*, λ*, μ*) = 0 and δ(x, λ, μ) ≥ 0 for feasible {x, λ, μ}. This is because for a feasible set {x, λ, μ} we have a_i(x) = 0, c_j(x) ≥ 0, and μ ≥ 0, so that
      δ(x, λ, μ) = f(x) − L(x, λ, μ) = Σ_{i=1}^p λ_i a_i(x) + Σ_{j=1}^q μ_j c_j(x) = Σ_{j=1}^q μ_j c_j(x) ≥ 0
  and
      δ(x*, λ*, μ*) = Σ_{j=1}^q μ_j* c_j(x*) = 0.

1.5 Linear Programming and Quadratic Programming (Chap. 12)

♦ Linear programming (LP) problems are of great importance from both theoretical and practical perspectives. The modern theory and algorithms of interior-point methods were initially developed for LP problems in the 1980s, and the basic concepts and solution techniques for LP have since been extended to larger classes of CP and non-convex problems. On the other hand, there are a great many practical problems that can be accurately or approximately modeled as LP problems.
♦ The standard-form linear programming (LP) problem is given by

(P)   minimize f(x) = c^T x
      subject to: Ax = b
                  x ≥ 0

where matrix A is of size p × n with p < n and A is assumed to have full row rank. It is important to realize that LP problems are convex programming problems. Hence the KKT conditions are necessary and sufficient conditions, and any local solution is a global solution.
♦ Moreover, the Wolfe theorem applies: the dual of the above problem is given by

(D)   maximize h(λ) = b^T λ
      subject to: A^T λ + μ = c
                  μ ≥ 0

♦ The KKT conditions in this case are given by
      A^T λ + μ = c
      Ax = b
      μ_i x_i = 0 for 1 ≤ i ≤ n
      x ≥ 0, μ ≥ 0
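The primal-dual relations above can be verified numerically on a tiny standard-form LP with an obvious solution (not from the notes; the data are illustrative):

    % min c'x s.t. Ax = b, x >= 0, with a single equality constraint
    c = [1; 2; 3];  A = [1 1 1];  b = 4;
    x   = [4; 0; 0];              % primal optimum: all weight on the cheapest variable
    lam = 1;                      % dual optimum: largest lambda with c - A'*lambda >= 0
    mu  = c - A'*lam;             % mu = [0; 1; 2] >= 0, so (lam, mu) is dual feasible
    primal_value    = c'*x;       % 4
    dual_value      = b'*lam;     % 4, i.e., zero duality gap
    complementarity = mu.*x;      % [0; 0; 0], i.e., mu_i x_i = 0 for all i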
♦ A useful concept in the development of interior-point algorithms for LP is the central path. To introduce it, we first write the KKT conditions as
      Ax = b with x ≥ 0
      A^T λ + μ = c with μ ≥ 0
      Xμ = 0
  where X = diag{x1, x2, …, xn}. The central path for a standard-form LP is defined as the set of vectors {x(τ), λ(τ), μ(τ)} that satisfy

(C)   Ax = b with x > 0
      A^T λ + μ = c with μ > 0
      Xμ = τe,  τ > 0,  e = [1 1 … 1]^T

  The duality gap on the central path is found to be
      δ[x(τ), λ(τ), μ(τ)] = c^T x(τ) − b^T λ(τ) = [A^T λ(τ) + μ(τ)]^T x(τ) − b^T λ(τ) = μ(τ)^T x(τ) = nτ.
  We see that the duality gap approaches zero as the parameter τ approaches zero. In other words, the central path defines a parameterized trajectory that approaches the solution of both the primal and dual problems as τ approaches zero.
♦ Below we describe a primal-dual path-following algorithm for LP problems; a MATLAB sketch of the per-iteration linear system appears at the end of this subsection. The idea behind the algorithm is to start with a strictly feasible initial set of vectors w0 = {x0, λ0, μ0} and use first-order perturbations of the equations (C) to obtain a set of increment vectors δw = {δx, δλ, δμ} with which to update point wk. So in the kth iteration we seek δw = {δx, δλ, δμ} such that the next iterate
      w_{k+1} = {x_{k+1}, λ_{k+1}, μ_{k+1}} = {x_k + δx, λ_k + δλ, μ_k + δμ}
  remains strictly feasible and approaches the central path defined by (C) with τ = τ_{k+1}. This leads (C) to a system of linear equations as follows:
      A δx = 0
      A^T δλ + δμ = 0
      M δx + X δμ = τ_{k+1} e − X μ_k
  where M = diag{(μ_k)_1, (μ_k)_2, …, (μ_k)_n} and X = diag{(x_k)_1, (x_k)_2, …, (x_k)_n}.
♦ Additional details:
  □ The next iterate is taken as w_{k+1} = w_k + α_k δw, where α_k is obtained by a line-search step to ensure the strict feasibility of point w_{k+1}.
  □ The value of parameter τ_{k+1} is determined by
        τ_{k+1} = (μ_k^T x_k)/(n + ρ)  with ρ > √n
  □ An interior-point algorithm with nonfeasible initialization can be developed in a similar way; see Sec. 12.5.2.
♦ LP problems may also be formulated and solved in an alternative form, which is equivalent to the standard form:
      minimize f(x) = c^T x
      subject to: Ax ≥ b
  Counterpart algorithms for LP problems in the alternative form can also be developed.
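As mentioned above, here is a minimal MATLAB sketch (not from the notes) of one iteration of the primal-dual update: it assembles and solves the three linearized equations for (δx, δλ, δμ) and caps the step length to keep x and μ strictly positive. The LP data and the strictly feasible starting point are illustrative.

    % One primal-dual path-following step for: min c'x s.t. Ax = b, x >= 0
    c = [1; 2; 3];  A = [1 1 1];  b = 4;          % tiny illustrative LP (p = 1, n = 3)
    x = [2; 1; 1];  lam = 0;  mu = c - A'*lam;    % strictly feasible: Ax = b, x > 0, mu = c > 0
    n = 3;  p = 1;
    X   = diag(x);  M = diag(mu);  e = ones(n,1);
    rho = sqrt(n) + 1;                            % any rho > sqrt(n)
    tau = (mu'*x)/(n + rho);                      % tau_{k+1} = mu_k' x_k / (n + rho)
    K   = [A,          zeros(p,p), zeros(p,n);    % A*dx          = 0
           zeros(n,n), A',         eye(n);        % A'*dlam + dmu = 0
           M,          zeros(n,p), X];            % M*dx + X*dmu  = tau*e - X*mu
    rhs = [zeros(p,1); zeros(n,1); tau*e - X*mu];
    d    = K \ rhs;
    dx   = d(1:n);  dlam = d(n+1:n+p);  dmu = d(n+p+1:end);
    a = 1;                                        % step length keeping x and mu strictly positive
    i1 = dx  < 0;  if any(i1), a = min(a, 0.99*min(-x(i1)./dx(i1)));    end
    i2 = dmu < 0;  if any(i2), a = min(a, 0.99*min(-mu(i2)./dmu(i2)));  end
    x = x + a*dx;  lam = lam + a*dlam;  mu = mu + a*dmu;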
1.6 Semidefinite Programming (SDP) (Chap. 14)

♦ SDP is a class of CP problems that includes LP and convex quadratic programming problems as subclasses, together with many other CP problems of practical usefulness. The 1990s witnessed intensive research interest in SDP, which has resulted in many efficient solution methods and software tools for SDP problems.
♦ Notation and definitions
  We denote the set of n × n symmetric matrices by S^n. For A, B ∈ S^n, the inner product of A with B is defined by
      A · B = trace(AB) = Σ_{i=1}^n Σ_{j=1}^n a_ij b_ij
  Matrix A being positive semidefinite is signified as A ⪰ 0, and A being positive definite is denoted by A ≻ 0.
♦ The primal SDP problem is defined as

(P)   minimize C · X
      subject to: A_i · X = b_i for i = 1, 2, …, p
                  X ⪰ 0

where C, X, and the A_i are members of S^n. Note that if C = diag{c}, A_i = diag{a_i}, and x = diag{X} for some vectors c and a_i, then the SDP problem becomes a standard LP problem:
      minimize f(x) = c^T x
      subject to: Ax = b
                  x ≥ 0
where A is a matrix with a_i^T as its i-th row and b = [b1 b2 … bp]^T.
♦ The dual SDP problem assumes the form

(D)   maximize b^T y
      subject to: Σ_{i=1}^p y_i A_i + S = C
                  S ⪰ 0

By eliminating S and making simple variable changes, we can write the dual problem as

(D')  minimize c^T x
      subject to: F(x) ⪰ 0

where F(x) = F_0 + Σ_{i=1}^p x_i F_i.
♦ The KKT conditions for SDP:
      Σ_{i=1}^p y_i A_i + S = C
      A_i · X = b_i for i = 1, 2, …, p
      SX = 0
      X ⪰ 0, S ⪰ 0
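A small MATLAB sketch of the notation above (not from the notes): it evaluates the inner product A · B = trace(AB) for two symmetric matrices, checks positive semidefiniteness via eigenvalues, and confirms the diagonal reduction to an LP inner product; all matrices and vectors are illustrative.

    A = [2 1; 1 3];  B = [1 0; 0 4];            % two members of S^2
    inner1 = trace(A*B);                        % A . B computed as trace(AB)
    inner2 = sum(sum(A.*B));                    % same value, computed entrywise
    A_psd  = all(eig(A) >= -1e-12);             % A is positive (semi)definite
    % Diagonal case: with C = diag(c), Ai = diag(ai), and X = diag(x),
    % C.X = c'*x and Ai.X = ai'*x, so the SDP reduces to the standard-form LP above.
    c = [1; 2];  x = [3; 4];
    same = abs(trace(diag(c)*diag(x)) - c'*x) < 1e-12;   % returns 1 (true)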
♦ An example: convex quadratic programming (QP) problems
  The general convex QP problem can be expressed as
      minimize x^T H x + p^T x  with H ⪰ 0
      subject to: Ax ≥ b
  which is equivalent to
      minimize δ
      subject to: x^T H x + p^T x ≤ δ
                  Ax ≥ b
  where, by using a decomposition H = Ĥ^T Ĥ of H, the first constraint can be expressed as
      δ − p^T x − (Ĥx)^T (Ĥx) ≥ 0
  which is equivalent to
      G(δ, x) = [ I          Ĥx
                  (Ĥx)^T     δ − p^T x ]  ⪰ 0
  In addition, the second constraint can be written as
      F(x) = F_0 + Σ_{i=1}^n x_i F_i ⪰ 0
  where F_0 = −diag{b}, and F_i = diag{a_i} with a_i being the i-th column of A. By defining an augmented vector x̃ = [δ, x^T]^T, the convex QP problem is now formulated as the SDP problem
      minimize c̃^T x̃
      subject to: E(x̃) ⪰ 0
  where c̃ = [1 0 … 0]^T and E(x̃) = diag{G(δ, x), F(x)}.

1.7 Second-Order Cone Programming (SOCP) (Chap. 14)

♦ The class of SOCP problems is a subset of the class of SDP problems, but it is still large enough to include LP, QP, and many other CP problems of practical importance. Moreover, interior-point algorithms that are specifically designed for SOCP are considerably more efficient than the best that an SDP algorithm can offer.
♦ Notation and definitions
  □ A second-order cone (also known as a quadratic cone or Lorentz cone) of dimension n is defined as
        K = { [t; u] : t ∈ R, u ∈ R^{n−1}, ||u|| ≤ t }
  □ Examples: for n = 1, K is a ray on the t-axis starting at t = 0; for n = 2 and 3 see Fig. 14.3.
  □ Note that the second-order cone is convex in R^n.
♦ The primal SOCP problem is given by

(P)   minimize Σ_{i=1}^q c_i^T x_i
      subject to: Σ_{i=1}^q A_i x_i = b
                  x_i ∈ K_i for i = 1, 2, …, q
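A short MATLAB sketch of the cone-membership test implied by the definition of K above (not from the notes; the test vectors are illustrative):

    % [t; u] belongs to the second-order cone of dimension n iff norm(u) <= t
    in_cone = @(v) norm(v(2:end)) <= v(1);
    in_cone([5; 3; 4])     % returns 1 (true):  norm([3;4]) = 5 <= 5
    in_cone([1; 1; 1])     % returns 0 (false): norm([1;1]) = sqrt(2) > 1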