Pattern Recognition and Machine Learning
Solutions to the Exercises: Tutors' Edition

Markus Svensén and Christopher M. Bishop

Copyright © 2002–2008

This is the solutions manual (Tutors' Edition) for the book Pattern Recognition and Machine Learning (PRML; published by Springer in 2006). This release was created March 20, 2008. Any future releases (e.g. with corrections to errors) will be announced on the PRML web-site (see below) and published via Springer.

PLEASE DO NOT DISTRIBUTE

Most of the solutions in this manual are intended as a resource for tutors teaching courses based on PRML, and the value of this resource would be greatly diminished if it were to become generally available. All tutors who want a copy should contact Springer directly.

The authors would like to express their gratitude to the various people who have provided feedback on pre-releases of this document. The authors welcome all comments, questions and suggestions about the solutions, as well as reports on (potential) errors in text or formulae in this document; please send any such feedback to prml-fb@microsoft.com

Further information about PRML is available from http://research.microsoft.com/~cmbishop/PRML
Contents

Chapter 1: Pattern Recognition
Chapter 2: Density Estimation
Chapter 3: Linear Models for Regression
Chapter 4: Linear Models for Classification
Chapter 5: Neural Networks
Chapter 6: Kernel Methods
Chapter 7: Sparse Kernel Machines
Chapter 8: Probabilistic Graphical Models
Chapter 9: Mixture Models
Chapter 10: Variational Inference and EM
Chapter 11: Sampling Methods
Chapter 12: Latent Variables
Chapter 13: Sequential Data
Chapter 14: Combining Models
Chapter 1 Pattern Recognition

1.1 Substituting (1.1) into (1.2) and then differentiating with respect to $w_i$ we obtain

$$\sum_{n=1}^{N} \left\{ \sum_{j=0}^{M} w_j x_n^j - t_n \right\} x_n^i = 0. \tag{1}$$

Re-arranging terms then gives the required result.

1.2 For the regularized sum-of-squares error function given by (1.4), the corresponding linear equations are again obtained by differentiation, and take the same form as (1.122), but with $A_{ij}$ replaced by $\widetilde{A}_{ij}$, given by

$$\widetilde{A}_{ij} = A_{ij} + \lambda I_{ij}. \tag{2}$$

1.3 Let us denote apples, oranges and limes by $a$, $o$ and $l$ respectively. The marginal probability of selecting an apple is given by

$$p(a) = p(a|r)\,p(r) + p(a|b)\,p(b) + p(a|g)\,p(g) = \frac{3}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.34 \tag{3}$$

where the conditional probabilities are obtained from the proportions of apples in each box. To find the probability that the box was green, given that the fruit we selected was an orange, we can use Bayes' theorem,

$$p(g|o) = \frac{p(o|g)\,p(g)}{p(o)}. \tag{4}$$

The denominator in (4) is given by

$$p(o) = p(o|r)\,p(r) + p(o|b)\,p(b) + p(o|g)\,p(g) = \frac{4}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.36 \tag{5}$$

from which we obtain

$$p(g|o) = \frac{\frac{3}{10} \times 0.6}{0.36} = \frac{1}{2}. \tag{6}$$
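The arithmetic in Solution 1.3 is easy to verify numerically. The following is a minimal Python sketch, not part of the original solution; the box priors and fruit proportions are those stated in Exercise 1.3 of PRML, and the variable names are our own.

```python
# Numerical check of Solution 1.3. Box priors p(r), p(b), p(g) and the
# fruit proportions in each box are taken from Exercise 1.3 of PRML.
priors = {"r": 0.2, "b": 0.2, "g": 0.6}

# p(fruit | box) for apples (a), oranges (o) and limes (l).
fruit_given_box = {
    "r": {"a": 3/10, "o": 4/10, "l": 3/10},
    "b": {"a": 1/2,  "o": 1/2,  "l": 0.0},
    "g": {"a": 3/10, "o": 3/10, "l": 4/10},
}

def marginal(f):
    """p(f) = sum over boxes of p(f | box) p(box)."""
    return sum(fruit_given_box[box][f] * priors[box] for box in priors)

p_a = marginal("a")                                          # equation (3)
p_o = marginal("o")                                          # equation (5)
p_g_given_o = fruit_given_box["g"]["o"] * priors["g"] / p_o  # equations (4), (6)

print(f"p(a)   = {p_a:.2f}")          # 0.34
print(f"p(o)   = {p_o:.2f}")          # 0.36
print(f"p(g|o) = {p_g_given_o:.2f}")  # 0.50
```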
1.4 We are often interested in finding the most probable value for some quantity. In the case of probability distributions over discrete variables this poses little problem. However, for continuous variables there is a subtlety arising from the nature of probability densities and the way they transform under non-linear changes of variable.

Consider first the way a function $f(x)$ behaves when we change to a new variable $y$, where the two variables are related by $x = g(y)$. This defines a new function of $y$ given by

$$\widetilde{f}(y) = f(g(y)). \tag{7}$$

Suppose $f(x)$ has a mode (i.e. a maximum) at $\widehat{x}$, so that $f'(\widehat{x}) = 0$. The corresponding mode of $\widetilde{f}(y)$ will occur for a value $\widehat{y}$ obtained by differentiating both sides of (7) with respect to $y$:

$$\widetilde{f}'(\widehat{y}) = f'(g(\widehat{y}))\,g'(\widehat{y}) = 0. \tag{8}$$

Assuming $g'(\widehat{y}) \neq 0$ at the mode, then $f'(g(\widehat{y})) = 0$. However, we know that $f'(\widehat{x}) = 0$, and so we see that the locations of the mode expressed in terms of each of the variables $x$ and $y$ are related by $\widehat{x} = g(\widehat{y})$, as one would expect. Thus, finding a mode with respect to the variable $x$ is completely equivalent to first transforming to the variable $y$, then finding a mode with respect to $y$, and then transforming back to $x$.

Now consider the behaviour of a probability density $p_x(x)$ under the change of variables $x = g(y)$, where the density with respect to the new variable is $p_y(y)$ and is given by (1.27). Let us write $g'(y) = s|g'(y)|$ where $s \in \{-1, +1\}$. Then (1.27) can be written

$$p_y(y) = p_x(g(y))\,s\,g'(y).$$

Differentiating both sides with respect to $y$ then gives

$$p_y'(y) = s\,p_x'(g(y))\,\{g'(y)\}^2 + s\,p_x(g(y))\,g''(y). \tag{9}$$

Due to the presence of the second term on the right hand side of (9), the relationship $\widehat{x} = g(\widehat{y})$ no longer holds. Thus the value of $x$ obtained by maximizing $p_x(x)$ will not be the value obtained by transforming to $p_y(y)$, then maximizing with respect to $y$ and then transforming back to $x$. This causes modes of densities to be dependent on the choice of variables. In the case of a linear transformation, the second term on the right hand side of (9) vanishes, and so the location of the maximum transforms according to $\widehat{x} = g(\widehat{y})$.

This effect can be illustrated with a simple example, as shown in Figure 1. We begin by considering a Gaussian distribution $p_x(x)$ over $x$ with mean $\mu = 6$ and standard deviation $\sigma = 1$, shown by the red curve in Figure 1. Next we draw a sample of $N = 50{,}000$ points from this distribution and plot a histogram of their values, which as expected agrees with the distribution $p_x(x)$. Now consider a non-linear change of variables from $x$ to $y$ given by

$$x = g(y) = \ln(y) - \ln(1 - y) + 5. \tag{10}$$

The inverse of this function is given by

$$y = g^{-1}(x) = \frac{1}{1 + \exp(-x + 5)}. \tag{11}$$
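The numerical experiment behind Figure 1 can be reproduced in a few lines. Below is a minimal sketch assuming NumPy and Matplotlib (neither is referenced in the original text): it draws $N = 50{,}000$ samples from $p_x(x)$, maps them through the inverse (11), and overlays the Jacobian-corrected density $p_y(y)$ from (1.27) against the naive curve $p_x(g(y))$.

```python
import numpy as np
import matplotlib.pyplot as plt

# Gaussian p_x(x) with mean 6 and standard deviation 1, as in the example.
mu, sigma, N = 6.0, 1.0, 50_000

def p_x(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def g(y):
    # Change of variables x = g(y), equation (10).
    return np.log(y) - np.log(1 - y) + 5

def g_prime(y):
    # g'(y); its absolute value is the Jacobian factor in (1.27).
    return 1 / y + 1 / (1 - y)

# Draw samples of x and map them through the inverse (11): y = g^{-1}(x).
rng = np.random.default_rng(0)
x_samples = rng.normal(mu, sigma, N)
y_samples = 1 / (1 + np.exp(-x_samples + 5))

y = np.linspace(1e-3, 1 - 1e-3, 1000)
p_y = p_x(g(y)) * np.abs(g_prime(y))  # correctly transformed density, (1.27)
naive = p_x(g(y))                     # naive curve, mode at g^{-1}(xhat)

plt.hist(y_samples, bins=100, density=True, alpha=0.3, label="histogram of y")
plt.plot(y, p_y, label="p_y(y) = p_x(g(y)) |g'(y)|")
plt.plot(y, naive, label="p_x(g(y)), no Jacobian")
plt.legend()
plt.show()
```

The histogram lines up with $p_y(y)$, whose mode is visibly displaced from that of the naive curve, confirming that maximizing and then transforming does not commute with transforming and then maximizing under a non-linear $g$.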