CHAPTER 6

NONLINEAR OPTIMIZATION

Previous chapters dealt with linear optimization, specifically linear programming and linear integer programming. This chapter considers optimization problems that include a nonlinear objective and/or nonlinear constraints, where the decision variables are real continuous variables and both the objective and constraint functions are continuous and differentiable. The linear programming problem is a special case of this problem. We first discuss the unconstrained optimization problem, where we optimize an objective function without any constraints and without any sign restrictions on the decision variables. Then we will consider constraints restricting the variable values. These constraints may be either strict equality constraints or inequality constraints. We will also consider sign restrictions for variables.

These problems are fundamentally different from the linear programming problem. First, unlike linear programming, there is no systematic procedure that can solve all nonlinear programming problems. Second, there may be local extremum points, which are not necessarily global solutions to the problem. Third, local optima may occur on the boundary of the decision space or they may be interior points. Therefore, the convenience of searching for an optimum solution among the corner solutions of a feasible polytope, as in linear programming, is lost immediately. This makes nonlinear programming problems much harder to solve than linear programming problems. Here we will develop only the optimality conditions for these problems and use those conditions to solve small-scale example problems. In the following chapters we will develop nonlinear market equilibrium models and risk programming models. There the nonlinear programming optimality conditions will be utilized for understanding the properties of optimum solutions and explaining why those models are appropriate tools for the purpose.
Nonlinear programming solution algorithms will not be covered.

6.1. Unconstrained optimization

This is the simplest nonlinear optimization problem. Suppose we have a function f: ℝⁿ → ℝ which assigns a real number f(x) to each element x ∈ ℝⁿ. Here x = (x₁, x₂, ..., xₙ) is a vector of n decision variables. When n = 1, we have the univariate optimization problem, which can be stated as: Maximize f(x). Suppose x* is a solution of this problem. This means that f(x*) ≥ f(x) for all x, or f(x) - f(x*) ≤ 0 for any x ≠ x*. For x > x* we have

[f(x) - f(x*)] / (x - x*) ≤ 0, and therefore f′(x*) = lim_{x→x*} [f(x) - f(x*)] / (x - x*) ≤ 0.

For x < x* the difference quotient is nonnegative, so the limit gives f′(x*) ≥ 0. These two conditions imply that f′(x*) = 0. The same arguments hold for a local minimum as well. Therefore, we obtain the first order necessary condition for a local extremum: f′(x*) = 0.
Theorem: If a differentiable function f attains its local minimum or local maximum at a point x*, then f′(x*) = 0.

Note that this is only a necessary condition, since we obtain the same condition both for a minimum and for a maximum. In order to determine the nature of a critical point, i.e., a point x* where f′(x*) = 0, we need to look at the second order derivative. The sufficiency condition is obtained from Taylor's theorem stated below:

Theorem: Suppose f is a twice differentiable function around a point x*. Then we can express the function as a polynomial as follows:

f(x) = f(x*) + f′(x*)(x - x*) + (1/2!) f″(x*)(x - x*)² + ε(x),

where ε(x) is an error term which goes to zero faster than the rate at which x approaches x*.

If x* is a critical point, the term including the first derivative vanishes, in which case the sign of the difference f(x) - f(x*) is determined by the sign of the second derivative (note that both 1/2! and (x - x*)² are positive, and the second derivative term dominates the error term, since ε(x) goes to zero faster than x approaches x*). Therefore, if f″(x*) < 0 we have a local maximum, and if f″(x*) > 0 we have a local minimum. This gives us the second order sufficiency condition:

Theorem: If f is a twice differentiable function and f″(x*) < 0 at a critical point x*, then a local maximum occurs at x*. If f″(x*) > 0 at a critical point x*, then a local minimum occurs at x*.

Note that we have strictly positive or strictly negative second derivatives in the above theorem. If the second order derivative at x* is zero, we need to look at higher order derivatives. The above theorem is extended as follows:

Theorem: Suppose f is differentiable of order n or higher and f⁽ⁿ⁾ is the first higher order derivative with f⁽ⁿ⁾(x*) ≠ 0 at a critical point x*. If n is even, then a local maximum or local minimum occurs at x*: if f⁽ⁿ⁾(x*) < 0, then a local maximum occurs at x*, and if f⁽ⁿ⁾(x*) > 0, then a local minimum occurs at x*. If n is odd, then x* is a saddle (or inflection) point, that is, neither a maximum nor a minimum occurs at x*.

Example: Suppose f(x) = x² - 6x + 1. Then f′(x) = 2x - 6 = 0 implies x* = 3. This is a critical point. f″(x) = 2 > 0 at any x, in particular at x = 3; therefore x = 3 is a local minimum.

Example: f(x) = x² - 2 ln(x). f′(x) = 2x - 2/x = 0, or x² = 1. Therefore there are two critical points, namely x = 1 and x = -1. Ignore x = -1 because the logarithm function is not defined at that point. f″(x) = 2 + 2/x² = 4 at x = 1. Therefore, this point is a local minimum.
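The derivative tests above are easy to check numerically. The sketch below (a minimal illustration, not part of the text; the finite-difference helper `deriv` and the step size `h` are arbitrary choices) verifies that x = 1 is a critical point of f(x) = x² - 2 ln(x) and that the second derivative there is 4:

```python
import math

def f(x):
    return x**2 - 2*math.log(x)

def deriv(g, x, h=1e-5):
    # central-difference approximation of g'(x)
    return (g(x + h) - g(x - h)) / (2*h)

x_star = 1.0                                # critical point found analytically
fp = deriv(f, x_star)                       # f'(1): should be ~0
fpp = deriv(lambda t: deriv(f, t), x_star)  # f''(1): should be ~4 > 0
print(fp, fpp)  # f'(1) is ~0 and f''(1) is ~4 > 0, so x = 1 is a local minimum
```

The same two-line check classifies any critical point of a smooth univariate function, as long as the second derivative there is not too close to zero for the finite-difference noise.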
Example: f(x) = x⁴ - x³ + 2. f′(x) = 4x³ - 3x² = 0 implies x = 0 or x = 3/4. The second derivative is 12x² - 6x, which is positive at x = 3/4. Therefore, there is a local minimum at x = 3/4. However, at x = 0 the second derivative is also zero. Therefore, we cannot determine the nature of the critical point x = 0 by using the second derivative test. The third derivative is 24x - 6 = -6 at x = 0. Since the first higher order derivative which is not zero is an odd order derivative, we have a point of inflection at x = 0.

The above theorem and examples illustrate how a local extremum of a univariate function can be determined. The same arguments can be extended to functions of several variables. The only difference is that here we deal with partial derivatives instead of the ordinary univariate derivative. This problem can be approached as follows. Suppose f: ℝⁿ → ℝ is a multivariate function, and x* = (x*₁, ..., x*ᵢ, ..., x*ₙ) is an optimum solution. Then h(xᵢ) = f(x*₁, ..., xᵢ, ..., x*ₙ) is a function of one variable, namely xᵢ, where all variables but xᵢ are fixed at their optimum values, i.e., xⱼ = x*ⱼ for all j ≠ i. The optimum value of h must be obtained at x*ᵢ, therefore h′(x*ᵢ) = 0. However, h′ is nothing but the partial derivative of f with respect to xᵢ, i.e., h′(x*ᵢ) = ∂f(x*)/∂xᵢ = 0. This is true for all i. Therefore, the first order necessary optimality condition in the case of multivariate functions is as follows:

Theorem: If f: ℝⁿ → ℝ is a differentiable function (i.e., first order partial derivatives exist and are continuous) and there is a local extremum at x*, then ∂f(x*)/∂xᵢ = 0 for all i.

The above theorem can also be stated in a more compact form as ∇f(x*) = 0, where ∇f(x*) is the vector of first order partial derivatives (called the gradient) of f computed at x*. In a similar way to the univariate case, the second order sufficiency conditions can be derived from Taylor's theorem. This requires second order partial derivatives and the Hessian of f at x*. The latter is the symmetric matrix formed by the second order partial derivatives of f, i.e.,

H(x) = [ ∂²f/∂x₁²     ∂²f/∂x₁∂x₂   ...   ∂²f/∂x₁∂xₙ ]
       [ ∂²f/∂x₂∂x₁   ∂²f/∂x₂²     ...   ∂²f/∂x₂∂xₙ ]
       [ ...                                         ]
       [ ∂²f/∂xₙ∂x₁   ∂²f/∂xₙ∂x₂   ...   ∂²f/∂xₙ²   ]

Theorem: A twice differentiable function f: ℝⁿ → ℝ can be represented as a polynomial around
a point x* as follows:

f(x) ≈ f(x*) + ∇f(x*)′(x - x*) + (1/2)(x - x*)′ H(x*) (x - x*),

where the symbol ≈ means 'approximately equal to'. We can state the above theorem with a strict equality by adding an error term ε(x), which can be made arbitrarily small by selecting x sufficiently close to x*.

Suppose x* is a critical point. Then, by the previous theorem, the gradient term vanishes at x*, and the sign of f(x) - f(x*) is determined by the quadratic form defined by the Hessian H. If H is a negative definite matrix, then the quadratic form is negative, which means that f(x) - f(x*) < 0, i.e., x* is a local maximum. If H is positive definite, then x* is a local minimum. If H(x*) is indefinite, then x* is a saddle point (in this case we can find different values of y for which the sign of y′Hy is negative, zero, or positive, where y = x - x*). This leads to the second order sufficiency test:

Theorem: Suppose x* is a critical point of a twice differentiable function f: ℝⁿ → ℝ. If H(x*) is negative (positive) definite, then a local maximum (minimum) occurs at x*. If H(x) is negative (positive) semi-definite in a neighborhood of x*, again a local maximum (minimum) occurs at x*. If H(x*) is indefinite, then x* is a saddle point.

Note the difference between the two statements in the above theorem. In the first case, we test the strict definiteness of the Hessian at the critical point x* only. In the second case we test the semi-definiteness not only at the critical point but also at every point in a small neighborhood of it. Although the second condition is weaker, it has to be satisfied in an open set around x*, therefore it may not be as practical as it seems.

Example: Suppose f(x,y,z) = x² - xy + y² - 3y + 2z² - z. The first order optimality conditions are:

2x - y = 0
-x + 2y - 3 = 0
4z - 1 = 0.

The solution of the above linear system is x = 1, y = 2, z = 1/4. To determine the nature of this critical point, form the Hessian and determine its definiteness. The Hessian at (1, 2, 1/4) (in fact at any point, since f is quadratic) is:

H = [  2  -1   0 ]
    [ -1   2   0 ]
    [  0   0   4 ]

which is positive definite because all the leading principal minors of order 1, 2 and 3 are positive (they are 2, 3, and 12, respectively). Therefore, a minimum occurs at this point.
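The definiteness test in this example is mechanical enough to automate. The sketch below (a minimal illustration assuming NumPy is available; H and the right-hand side are taken from the example above) solves the first order conditions, computes the leading principal minors, and cross-checks with the eigenvalues:

```python
import numpy as np

# Hessian of f(x,y,z) = x^2 - xy + y^2 - 3y + 2z^2 - z (constant, since f is quadratic)
H = np.array([[ 2., -1., 0.],
              [-1.,  2., 0.],
              [ 0.,  0., 4.]])

# The first order conditions 2x - y = 0, -x + 2y = 3, 4z = 1 form the linear system H v = b
b = np.array([0., 3., 1.])
v = np.linalg.solve(H, b)
print(v)  # the critical point (1, 2, 1/4)

# Leading principal minors: determinants of the upper-left 1x1, 2x2 and 3x3 blocks
minors = [round(np.linalg.det(H[:k, :k])) for k in range(1, 4)]
print(minors)  # [2, 3, 12] -> all positive, so H is positive definite

# Equivalent test: all eigenvalues of the symmetric matrix H are positive
print((np.linalg.eigvalsh(H) > 0).all())  # True
```

The eigenvalue test and the leading-principal-minor test are interchangeable for strict positive definiteness; the eigenvalue route is usually preferred numerically for larger matrices.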
Example: f(x,y) = 4x² - 4xy + 2y². The first order optimality conditions are:

8x - 4y = 0
-4x + 4y = 0.

The two conditions solved simultaneously give x = 0 and y = 0 as the only critical point. To determine the nature of this critical point, form the Hessian and determine its definiteness. The Hessian at (0,0) (in fact at any point) is:

H = [  8  -4 ]
    [ -4   4 ]

which is positive definite because the leading principal minors of order 1 and 2 are 8 and 16, which are both positive. Therefore, a minimum occurs at this point.

In these two simple problems, the first order conditions are given by linear equations because the objective functions are quadratic. When a non-quadratic objective function is involved, in general a nonlinear set of equations arises, and solving that set of equations may be a serious problem in itself. Nonlinear programming algorithms find the optimal solution (if it exists and is unique) using iterative procedures, such as the steepest ascent and steepest descent methods used to find a relative maximum or a relative minimum. These methods are based on the concept of the directional derivative.

Definition: The directional derivative of f at a given point a in the direction of a unit vector u is defined by Duf(a) = ∇f(a)·u.

Example: Suppose f(x,y,z) = e^(x+y+z). To determine the directional derivative at (0,0,0) in the direction of v = (1,-1,1), we first find the unit vector which has the same direction as the given vector. Since ‖v‖ = √(1² + (-1)² + 1²) = √3, the unit vector we need is u = (1/√3, -1/√3, 1/√3). The gradient of f(x,y,z) = e^(x+y+z) at (0,0,0) is (1,1,1). Therefore the directional derivative is:

Duf(a) = ∇f(a)·u = (1,1,1)·(1/√3, -1/√3, 1/√3) = 1/√3 ≈ 0.577.

This means that a slight movement (displacement) by tu from the point a in the direction of u creates a change in the function value of about 0.577t. For instance, if t = 0.1, i.e., for the displacement tu = (0.1/√3, -0.1/√3, 0.1/√3), the change in the function value is approximately 0.0577.

Theorem: The maximum change in the value of a function f as we move away from a point a occurs in the direction of the gradient vector; i.e., the directional derivative is maximum when u = ∇f(a)/‖∇f(a)‖.
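The directional derivative example can be confirmed in a few lines (a sketch only; the function, point, and direction are those of the example above, and the finite-difference comparison at the end is an extra illustration):

```python
import math

def f(x, y, z):
    return math.exp(x + y + z)

# Unit vector in the direction of v = (1, -1, 1)
n = math.sqrt(3)
u = (1/n, -1/n, 1/n)

# Gradient of e^(x+y+z) at the origin is (1, 1, 1)
grad = (1.0, 1.0, 1.0)
D_u = sum(g*c for g, c in zip(grad, u))
print(D_u)  # 1/sqrt(3), about 0.577

# First-order check: f(a + t*u) - f(a) should be close to D_u * t for small t
t = 0.1
change = f(t*u[0], t*u[1], t*u[2]) - f(0.0, 0.0, 0.0)
print(change, D_u * t)  # the two values agree to first order in t
```

Shrinking t makes the agreement tighter, since the error of the first-order approximation is of order t².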
The above result is a direct consequence of an analytic geometry theorem which states that u·v = ‖u‖‖v‖ cos θ, where θ is the angle between vectors u and v. Since cos θ is maximum when θ = 0, the scalar product is largest when the two vectors are aligned. Therefore, the directional derivative will be maximum if the gradient vector and the direction vector have the same direction (i.e., the angle between them is zero). This result is particularly important in optimization because it tells us that we should move in the direction of the gradient vector at any point in order to achieve the highest gain in the value of the function that we want to maximize (we choose the negative gradient direction if we want to minimize the function value). This result is the basis of the steepest descent (and ascent) algorithms.

In light of the above theorem, the steepest descent algorithm (for minimization problems; if the problem is a maximization problem we use the steepest ascent algorithm, which uses the gradient direction instead of the negative gradient direction used in the discussion below) is described as follows. Starting with a given initial point x0, determine the negative gradient direction and move in that direction by an appropriate step size t so that the function value is decreased by the maximum amount. That yields a new point, say x1. Repeat the process at x1, i.e., determine the negative gradient direction and move in that direction by an appropriate step size t so that the function value is decreased by the maximum amount. This yields a new point, say x2. Iterate this way until the function value cannot be decreased further or the change that can be obtained by another iteration is negligibly small. The points generated in two successive steps of the algorithm can be described by the iterative equation:

xk+1 = xk - t ∇f(xk),

where t is the step size at iteration k.
The step size t at any step is determined by solving the following univariate optimization problem:

Minimize φ(t) = f[xk - t ∇f(xk)].

The last point generated by this procedure is a relative minimum. For a relative maximum, proceed in the gradient direction rather than the negative gradient direction, and use maximize instead of minimize in the above expression. Here we will illustrate the workings of this algorithmic procedure using the second example given above.

Example: f(x,y) = 4x² - 4xy + 2y². The gradient vector is (8x - 4y, -4x + 4y). Let us use x0 = (2,3) as the initial point (selected arbitrarily). At this point the gradient is (4,4). Therefore we move in the direction of (-4,-4) to obtain x1 = (2,3) + t(-4,-4) = (2-4t, 3-4t). The optimum value of t is obtained from the following minimization:
Minimize φ(t) = f(2-4t, 3-4t) = 4(2-4t)² - 4(2-4t)(3-4t) + 2(3-4t)².

Equating the first derivative with respect to t to zero and solving for t, we get -16(2-4t) = 0. This gives t = 1/2 (note that this t value is a global minimizer since the second order derivative is 64 > 0). Therefore, the new point is

x1 = x0 - t∇f(x0) = (2,3) - (1/2)(4,4) = (0,1).

At x1 = (0,1), the gradient is (-4,4). Therefore, the new point will be x2 = x1 - t∇f(x1) = (0,1) - t(-4,4) = (4t, 1-4t). The optimum step size t is solved from:

Minimize φ(t) = f(4t, 1-4t) = 4(4t)² - 4(4t)(1-4t) + 2(1-4t)²,

which gives t = 1/10 = 0.1 (again note that the second order condition is met). Therefore, the new point is x2 = (0,1) - 0.1(-4,4) = (0.4, 0.6). At this point the gradient is (0.8, 0.8). Therefore, we have x3 = x2 - t3∇f(x2) = (0.4, 0.6) - t3(0.8, 0.8) = (0.4 - 0.8t3, 0.6 - 0.8t3). The optimum value of t3 is solved from:

Minimize φ(t) = f(0.4 - 0.8t, 0.6 - 0.8t),

which gives t = 1/2 and therefore x3 = (0, 0.2). By these iterative steps we get closer and closer to (0,0), which is the global optimum solution of the problem. For instance, if we had done one more iteration, the gradient at x3 would be (-0.8, 0.8), the optimum step size would be t = 0.1, and we would get x4 = (0.08, 0.12). The first steps of the steepest descent algorithm are shown in the figure below.

[Figure: the first steepest descent iterations, connecting the points x0 = (2,3), x1 = (0,1), x2 = (0.4, 0.6), and x3 = (0, 0.2).]
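The iterations above can be reproduced in code. The sketch below (an illustration of the procedure, not the text's own program, and assuming NumPy) exploits the fact that f is quadratic with constant Hessian H, so the exact line-search step has the closed form t = (g·g)/(g·Hg) for gradient g; this is a standard identity for quadratics, used here in place of the explicit calculus done in the text:

```python
import numpy as np

# f(x, y) = 4x^2 - 4xy + 2y^2 has gradient (8x - 4y, -4x + 4y) = Hx, with constant Hessian H
H = np.array([[ 8., -4.],
              [-4.,  4.]])

def grad(x):
    return H @ x

x = np.array([2., 3.])  # initial point x0 (selected arbitrarily)
points = [x.copy()]
for _ in range(3):
    g = grad(x)
    t = (g @ g) / (g @ H @ g)  # exact line search: minimize f(x - t*g) over t
    x = x - t * g
    points.append(x.copy())

for p in points:
    print(p)  # the iterates x0=(2,3), x1=(0,1), x2=(0.4,0.6), x3=(0,0.2)

# Successive search directions are perpendicular, e.g. the gradients at x0 and x1:
print(grad(points[0]) @ grad(points[1]))  # 0.0
```

The last print illustrates the perpendicularity of successive directions noted in the text: an exact line search stops where the new gradient is orthogonal to the direction just used.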
As may have been noticed, the direction vectors are perpendicular to each other. This is not a coincidence. The steepest descent (or ascent) method always progresses in this way, i.e., each new direction vector is perpendicular to the previous one.

Exercise: Solve the problem Maximize f(x,y) = -x² + xy - y² + y using first the first order and second order optimality conditions and then the steepest ascent algorithm (note that the problem is now a maximization, therefore use the gradient direction for determining a new point when approaching the optimal solution). The first approach should result in x* = (1/3, 2/3). When using the steepest ascent algorithm, start with x0 = (1, 1) and generate the next five points. [Answer: x1 = (1/2, 1), x2 = (1/2, 3/4), x3 = (3/8, 3/4), x4 = (3/8, 11/16), x5 = (11/32, 11/16)]

6.2 Optimization with strict equality constraints

Nonlinear constrained optimization has fundamentally different properties than linear programming in that the optimum solution can be an interior point or a boundary point. Even if the constraints are linear, an optimum solution can occur at a boundary point which is not necessarily a corner solution (vertex). Therefore, as mentioned at the outset, the convenience of linear programming, namely searching only the corner solutions and finding the best one, is lost immediately. This is explained in the figure below. Suppose the solution space is the set of points satisfying g(x) ≤ b, which are represented by all points below the graph of g in the figure. Consider a given value of the objective function c and the set of all points x such that f(x) = c. This set is called a level curve or an isoquant (such as an isoutility curve or a production or cost isoquant). Three such level curves are shown in the figure. Suppose the nested closed curves indicate higher isoquants and suppose we want to maximize f(x). Then the optimum solution of the problem would be the interior point shown by x1*. At a point like this all directions are feasible (if the step size is small enough, i.e.
we stay within the feasible region by taking a small displacement vector in any given direction), and therefore the directional derivatives must all be zero; in particular, the partial derivatives must all be zero. However, the isoquants may be such that the optimum point is on the boundary of the feasible region, as indicated by the dashed isoquants and the point x2*. The problem considered here is:

Max f(x)
s.t. g(x) ≤ b