Matrix Calculus:
Derivation and Simple Application
HU, Pili∗
March 30, 2012†
Abstract
Matrix calculus[3] is a very useful tool in many engineering problems. The basic rules of matrix calculus are nothing more than the ordinary calculus rules covered in undergraduate courses; however, with matrix calculus the derivation process is more compact. This document is adapted from the notes of a course the author recently attended. It builds matrix calculus from scratch, and the only prerequisites are basic calculus and linear algebra operations.
For a quick executive guide, please refer to the cheat sheet in section(4).
To see how matrix calculus simplifies the process of derivation, please refer to the application in section(3.4).
∗hupili [at] ie [dot] cuhk [dot] edu [dot] hk
†Last compiled: April 24, 2012
Contents

1 Introductory Example
2 Derivation
  2.1 Organization of Elements
  2.2 Deal with Inner Product
  2.3 Properties of Trace
  2.4 Deal with Generalized Inner Product
  2.5 Define Matrix Differential
  2.6 Matrix Differential Properties
  2.7 Schema of Handling Scalar Function
  2.8 Determinant
  2.9 Vector Function and Vector Variable
  2.10 Vector Function Differential
  2.11 Chain Rule
3 Application
  3.1 The 2nd Induced Norm of Matrix
  3.2 General Multivariate Gaussian Distribution
  3.3 Maximum Likelihood Estimation of Gaussian
  3.4 Least Square Error Inference: a Comparison
4 Cheat Sheet
  4.1 Definition
  4.2 Schema for Scalar Function
  4.3 Schema for Vector Function
  4.4 Properties
  4.5 Frequently Used Formula
  4.6 Chain Rule
Acknowledgements
References
Appendix
1 Introductory Example
We start with a one-variable linear function:

f(x) = ax    (1)

To be coherent, we abuse the partial derivative notation:

∂f/∂x = a    (2)

Extending this function to be multivariate, we have:

f(x) = Σ_i a_i x_i = a^T x    (3)

where a = [a_1, a_2, ..., a_n]^T and x = [x_1, x_2, ..., x_n]^T. We first compute the partial derivatives directly:

∂f/∂x_k = ∂(Σ_i a_i x_i)/∂x_k = a_k    (4)

for all k = 1, 2, ..., n. Then we organize the n partial derivatives in the following way:

∂f/∂x = [∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n]^T = [a_1, a_2, ..., a_n]^T = a    (5)

The first equality is by proper definition and the rest follows from ordinary calculus rules.
Eqn(5) is analogous to eqn(2), except the variable changes from a scalar
to a vector. Thus we want to directly claim the result of eqn(5) without
those intermediate steps solving for partial derivatives separately. Actually,
we’ll see soon that eqn(5) plays a core role in matrix calculus.
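As a quick sanity check, the following numpy sketch (our illustration, not part of the original notes; all variable names are ours) computes the n partial derivatives of f(x) = a^T x by central differences and confirms that they assemble into the vector a, exactly as eqn(5) claims.

import numpy as np

# Numerical sanity check of eqn(5): for f(x) = a^T x, each partial
# derivative ∂f/∂x_k equals a_k, so the assembled gradient is a itself.
rng = np.random.default_rng(0)
n = 5
a = rng.standard_normal(n)
x = rng.standard_normal(n)
f = lambda v: a @ v  # f(x) = a^T x

eps = 1e-6
I = np.eye(n)
grad = np.array([(f(x + eps * I[k]) - f(x - eps * I[k])) / (2 * eps)
                 for k in range(n)])
print(np.allclose(grad, a))  # True: ∂f/∂x = a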
The following sections are organized as follows:
• Section(2) builds commonly used matrix calculus rules from ordinary calculus and linear algebra. Necessary and important properties of linear algebra are also proved along the way. This section is not organized beforehand: all results are proved when we need them.
• Section(3) shows some applications using matrix calculus. Table(1) shows the correspondence between Section(2) and Section(3).
• Section(4) gives a cheat sheet of matrix calculus. Note that this cheat sheet may be different from others; users need to figure out some basic definitions before applying the rules.
Table 1: Derivation and Application Correspondence

Derivation   Application
2.1-2.7      3.1
2.9, 2.10    3.2
2.8, 2.11    3.3
2 Derivation
2.1 Organization of Elements
From the introductory example, we have already seen that matrix calculus does not differ from ordinary calculus in its fundamental rules. However, with a better organization of elements and some useful properties proved in advance, we can simplify the derivation process in real problems.
The author would like to adopt the following definition:

Definition 1. For a scalar-valued function f(x), the result ∂f/∂x has the same size as x. That is:

∂f/∂x = [ ∂f/∂x_11  ∂f/∂x_12  ...  ∂f/∂x_1n
          ∂f/∂x_21  ∂f/∂x_22  ...  ∂f/∂x_2n
             ...       ...    ...     ...
          ∂f/∂x_m1  ∂f/∂x_m2  ...  ∂f/∂x_mn ]    (6)

In eqn(2), x is a 1-by-1 matrix and the result ∂f/∂x = a is also a 1-by-1 matrix. In eqn(5), x is a column vector (known as an n-by-1 matrix) and the result ∂f/∂x = a has the same size.
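To make the size convention concrete, here is a minimal numerical gradient in numpy that follows definition(1); the helper name num_grad and the test function are our own illustrative choices, not from the notes.

import numpy as np

# num_grad implements definition(1) numerically: the returned array
# always has the same shape as the variable x (scalar, vector, or matrix).
def num_grad(f, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    it = np.nditer(x, flags=['multi_index'])
    for _ in it:
        idx = it.multi_index
        xp, xm = x.copy(), x.copy()
        xp[idx] += eps
        xm[idx] -= eps
        g[idx] = (f(xp) - f(xm)) / (2 * eps)  # central difference
    return g

# For f(X) = Σ_ij X_ij^2 the derivative is 2X, organized like X itself.
X = np.arange(6.0).reshape(2, 3)
print(num_grad(lambda M: np.sum(M**2), X).shape)               # (2, 3)
print(np.allclose(num_grad(lambda M: np.sum(M**2), X), 2 * X)) # True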
Example 1. By this definition, we have:

∂f/∂x^T = (∂f/∂x)^T = a^T    (7)

Note that we only use the organization definition in this example. Later we'll show that, with some matrix properties, this formula can be derived without using ∂f/∂x as a bridge.
2.2 Deal with Inner Product
Theorem 1. If there's a multivariate scalar function f(x) = a^T x, we have ∂f/∂x = a.
Proof. See the introductory example.

Since a^T x is a scalar, we can equivalently write it as its own trace. Thus,

Proposition 2. If there's a multivariate scalar function f(x) = Tr[a^T x], we have ∂f/∂x = a.

Tr[•] is the operator that sums up the diagonal elements of a matrix. In the next section, we'll explore more properties of the trace. As long as we can transform our target function into the form of theorem(1) or proposition(2), the result can be written out directly. Notice that in proposition(2), a and x are both vectors; we'll show later that, as long as their sizes agree, it also holds for matrix a and x.
2.3 Properties of Trace
Definition 2. The trace of a square matrix is defined as: Tr[A] = Σ_i A_ii.

Example 2. Using definitions(1,2), it is very easy to show:

∂Tr[A]/∂A = I    (8)

since only the diagonal elements are kept by the trace operator.
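Eqn(8) can also be confirmed numerically; the snippet below is again our own numpy sketch, not the author's: perturbing a diagonal entry of A changes Tr[A] at rate 1 and an off-diagonal entry at rate 0, so the derivative organized per definition(1) is the identity matrix.

import numpy as np

# Finite-difference check of eqn(8): ∂Tr[A]/∂A = I.
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
eps = 1e-6
grad = np.zeros_like(A)
for i in range(4):
    for j in range(4):
        E = np.zeros((4, 4))
        E[i, j] = eps
        grad[i, j] = (np.trace(A + E) - np.trace(A - E)) / (2 * eps)
print(np.allclose(grad, np.eye(4)))  # True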
Theorem 3. The matrix trace has the following properties:

• (1) Tr[A + B] = Tr[A] + Tr[B]
• (2) Tr[cA] = c Tr[A]
• (3) Tr[AB] = Tr[BA]
• (4) Tr[A_1 A_2 ... A_n] = Tr[A_n A_1 ... A_{n-1}]
• (5) Tr[A^T B] = Σ_i Σ_j A_ij B_ij
• (6) Tr[A] = Tr[A^T]

where A, B are matrices with proper sizes, and c is a scalar value.
Proof. See Wikipedia [5] for the proof.

Here we explain the intuition behind each property to make it easier to remember. Properties (1) and (2) show the linearity of the trace. Property (3) means that the multiplication of two matrices inside the trace operator is commutative. Note that matrix multiplication without the trace is not commutative, and the commutative property inside the trace does not hold
for more than two matrices. Property (4) is a consequence of property (3), obtained by considering A_1 A_2 ... A_{n-1} as a whole. It is known as the cyclic property, so you can rotate the matrices inside a trace operator. Property (5) shows a way to express the sum of element-by-element products using the matrix product and trace. Note that the inner product of two vectors is also a sum of element-by-element products, and property (5) resembles the vector inner product in form (A^T B). The author regards property (5) as the extension of the inner product to matrices (a generalized inner product).
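These properties are easy to spot-check numerically. The following numpy snippet (our illustration; the sizes are arbitrary compatible choices) verifies properties (3), (4), and (5) on random matrices.

import numpy as np

# Numerical spot-check of trace properties (3), (4), and (5).
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 3))
C = rng.standard_normal((3, 3))
D = rng.standard_normal((3, 4))  # same size as A, for property (5)

print(np.isclose(np.trace(A @ B), np.trace(B @ A)))          # (3): True
print(np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B)))  # (4): True
print(np.isclose(np.trace(A.T @ D), np.sum(A * D)))          # (5): True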
2.4 Deal with Generalized Inner Product
Theorem 4. If there's a multivariate scalar function f(x) = Tr[A^T x], we have ∂f/∂x = A (here A and x can be matrices).

Proof. Using property (5) of the trace, we can write f as:

f(x) = Tr[A^T x] = Σ_ij A_ij x_ij

It's easy to show:

∂f/∂x_ij = ∂(Σ_kl A_kl x_kl)/∂x_ij = A_ij

Organizing the elements using definition(1) proves the theorem.
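Theorem(4) can also be verified by finite differences; the numpy sketch below (ours, with arbitrary 3-by-4 sizes) checks it for a matrix variable.

import numpy as np

# Finite-difference check of theorem(4): for f(X) = Tr[A^T X],
# the derivative ∂f/∂X organized per definition(1) is A itself.
rng = np.random.default_rng(3)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((3, 4))
f = lambda M: np.trace(A.T @ M)

eps = 1e-6
grad = np.zeros_like(X)
for i in range(3):
    for j in range(4):
        E = np.zeros((3, 4))
        E[i, j] = eps
        grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)
print(np.allclose(grad, A))  # True: ∂Tr[A^T X]/∂X = A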
With this theorem and the properties of the trace, we revisit example(1).

Example 3. For vectors a, x and the function f(x) = a^T x:

∂f/∂x^T = ∂(a^T x)/∂x^T                               (9)
(f is scalar) = ∂(Tr[a^T x])/∂x^T                     (10)
(property(3)) = ∂(Tr[x a^T])/∂x^T                     (11)
(property(6)) = ∂(Tr[a x^T])/∂x^T                     (12)
(property of transpose) = ∂(Tr[(a^T)^T x^T])/∂x^T     (13)
(theorem(4)) = a^T                                    (14)
The result is the same as in example(1), where we used the basic definition. The above example demonstrates the usual way of handling a matrix derivative problem.
2.5 Define Matrix Differential
Although we want the matrix derivative most of the time, it turns out that the matrix differential is easier to operate with, due to the form invariance property of differentials. The matrix differential inherits this property as a natural consequence of the following definition.
Definition 3. Define the matrix differential:

dA = [ dA_11  dA_12  ...  dA_1n
       dA_21  dA_22  ...  dA_2n
        ...    ...   ...   ...
       dA_m1  dA_m2  ...  dA_mn ]    (18)
Theorem 5. The differential operator is distributive through the trace operator:

d Tr[A] = Tr[dA]

Proof.

LHS = d(Σ_i A_ii) = Σ_i dA_ii    (19)

RHS = Tr[dA] (with dA expanded per definition(3))    (20)
    = Σ_i dA_ii = LHS    (21)
Now that the matrix differential is well defined, we want to relate it back to the matrix derivative. The scalar version of the differential and the derivative can be related as follows:

df = (∂f/∂x) dx    (22)

So far, we're dealing with a scalar function f and a matrix variable x. ∂f/∂x and dx are both matrices according to the definitions. In order to make the quantities in eqn(22) equal, we must figure out a way to make the RHS a scalar. It's not surprising that the trace is what we want.
Theorem 6.

df = Tr[(∂f/∂x)^T dx]    (23)

for scalar function f and arbitrarily sized x.
Proof.

LHS = df = Σ_ij (∂f/∂x_ij) dx_ij    (definition of scalar differential)    (24)

RHS = Tr[(∂f/∂x)^T dx]    (25)
(trace property (5)) = Σ_ij (∂f/∂x)_ij (dx)_ij    (26)
(definition(3)) = Σ_ij (∂f/∂x)_ij dx_ij    (27)
(definition(1)) = Σ_ij (∂f/∂x_ij) dx_ij = LHS    (28)
Theorem(6) is the bridge between the matrix derivative and the matrix differential. We'll see in later applications that the matrix differential is more convenient to manipulate: after certain manipulations we arrive at the form of theorem(6), and then we can directly write out the matrix derivative using this theorem.
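As an illustration (our numpy sketch, not from the notes), the bridge can be checked on a function whose derivative we already know from theorem(4):

import numpy as np

# Check of theorem(6): for a small perturbation dX, the change in f
# should equal Tr[(∂f/∂X)^T dX]. Here f(X) = Tr[A^T X], so ∂f/∂X = A.
rng = np.random.default_rng(4)
A = rng.standard_normal((3, 3))
X = rng.standard_normal((3, 3))
dX = 1e-6 * rng.standard_normal((3, 3))
f = lambda M: np.trace(A.T @ M)

df = f(X + dX) - f(X)
print(np.isclose(df, np.trace(A.T @ dX)))  # True: df = Tr[(∂f/∂x)^T dx]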
2.6 Matrix Differential Properties
Theorem 7. We claim the following properties of the matrix differential:

• d(cA) = c dA
• d(A + B) = dA + dB
• d(AB) = (dA)B + A(dB)

Proof. They're all natural consequences of definition(3). We only show the third one in this document. Note that the equality holds if the LHS and RHS agree element by element, so we consider the (i,j)-th element:

LHS_ij = d(Σ_k A_ik B_kj) = Σ_k (dA_ik B_kj + A_ik dB_kj)

RHS_ij = (dA B)_ij + (A dB)_ij = Σ_k dA_ik B_kj + Σ_k A_ik dB_kj = LHS_ij
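To close this subsection, the product rule can also be confirmed numerically to first order; the snippet below is our own numpy sketch.

import numpy as np

# First-order check of d(AB) = (dA)B + A(dB): the exact increment
# (A + dA)(B + dB) - AB differs from the differential only by the
# second-order term dA @ dB, which is negligible for small perturbations.
rng = np.random.default_rng(5)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
dA = 1e-6 * rng.standard_normal((3, 3))
dB = 1e-6 * rng.standard_normal((3, 3))

exact = (A + dA) @ (B + dB) - A @ B   # includes the tiny dA @ dB term
differential = dA @ B + A @ dB        # theorem(7), third property
print(np.allclose(exact, differential, atol=1e-11))  # True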