Pattern Recognition and Machine Learning
Solutions to the Exercises: Tutors' Edition

Markus Svensén and Christopher M. Bishop

Copyright © 2002–2009

This is the solutions manual (Tutors' Edition) for the book Pattern Recognition and Machine Learning (PRML; published by Springer in 2006). This release was created September 8, 2009. Any future releases (e.g. with corrections to errors) will be announced on the PRML web-site (see below) and published via Springer.

PLEASE DO NOT DISTRIBUTE

Most of the solutions in this manual are intended as a resource for tutors teaching courses based on PRML, and the value of this resource would be greatly diminished if it were to become generally available. All tutors who want a copy should contact Springer directly.

The authors would like to express their gratitude to the various people who have provided feedback on earlier releases of this document. The authors welcome all comments, questions and suggestions about the solutions, as well as reports on (potential) errors in text or formulae in this document; please send any such feedback to prml-fb@microsoft.com.

Further information about PRML is available from http://research.microsoft.com/~cmbishop/PRML
Contents

Contents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
Chapter 1: Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 7
Chapter 2: Probability Distributions . . . . . . . . . . . . . . . . . . . 28
Chapter 3: Linear Models for Regression . . . . . . . . . . . . . . . . . 62
Chapter 4: Linear Models for Classification . . . . . . . . . . . . . . . 78
Chapter 5: Neural Networks . . . . . . . . . . . . . . . . . . . . . . . . 93
Chapter 6: Kernel Methods . . . . . . . . . . . . . . . . . . . . . . . . 114
Chapter 7: Sparse Kernel Machines . . . . . . . . . . . . . . . . . . . . 128
Chapter 8: Graphical Models . . . . . . . . . . . . . . . . . . . . . . . 136
Chapter 9: Mixture Models and EM . . . . . . . . . . . . . . . . . . . . 150
Chapter 10: Approximate Inference . . . . . . . . . . . . . . . . . . . . 163
Chapter 11: Sampling Methods . . . . . . . . . . . . . . . . . . . . . . 198
Chapter 12: Continuous Latent Variables . . . . . . . . . . . . . . . . . 207
Chapter 13: Sequential Data . . . . . . . . . . . . . . . . . . . . . . . 223
Chapter 14: Combining Models . . . . . . . . . . . . . . . . . . . . . . 246
Chapter 1 Introduction

1.1 Substituting (1.1) into (1.2) and then differentiating with respect to $w_i$ we obtain

$$\sum_{n=1}^{N} \left( \sum_{j=0}^{M} w_j x_n^j - t_n \right) x_n^i = 0. \tag{1}$$

Re-arranging terms then gives the required result.

1.2 For the regularized sum-of-squares error function given by (1.4) the corresponding linear equations are again obtained by differentiation, and take the same form as (1.122), but with $A_{ij}$ replaced by $\widetilde{A}_{ij}$, given by

$$\widetilde{A}_{ij} = A_{ij} + \lambda I_{ij}. \tag{2}$$

1.3 Let us denote apples, oranges and limes by $a$, $o$ and $l$ respectively. The marginal probability of selecting an apple is given by

$$p(a) = p(a|r)p(r) + p(a|b)p(b) + p(a|g)p(g) = \frac{3}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.34 \tag{3}$$

where the conditional probabilities are obtained from the proportions of apples in each box. To find the probability that the box was green, given that the fruit we selected was an orange, we can use Bayes' theorem

$$p(g|o) = \frac{p(o|g)p(g)}{p(o)}. \tag{4}$$

The denominator in (4) is given by

$$p(o) = p(o|r)p(r) + p(o|b)p(b) + p(o|g)p(g) = \frac{4}{10} \times 0.2 + \frac{1}{2} \times 0.2 + \frac{3}{10} \times 0.6 = 0.36 \tag{5}$$

from which we obtain

$$p(g|o) = \frac{\frac{3}{10} \times 0.6}{0.36} = \frac{1}{2}. \tag{6}$$

1.4 We are often interested in finding the most probable value for some quantity. In the case of probability distributions over discrete variables this poses little problem. However, for continuous variables there is a subtlety arising from the nature of probability densities and the way they transform under non-linear changes of variable.
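As a numerical check of Solutions 1.1 and 1.2 above, the sketch below (not part of the original manual; the synthetic data and NumPy usage are illustrative assumptions) builds $A_{ij} = \sum_n (x_n)^{i+j}$ and $T_i = \sum_n (x_n)^i t_n$ as defined in PRML (1.123), solves both the plain and the regularized linear systems, and cross-checks against the equivalent design-matrix formulation.

```python
import numpy as np

# Synthetic data (illustrative): noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
N, M, lam = 10, 3, 1e-3                      # sample size, order, lambda
x = np.linspace(0.0, 1.0, N)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(N)

# A_ij = sum_n x_n^(i+j) and T_i = sum_n x_n^i t_n, as in PRML (1.123).
powers = np.arange(M + 1)
A = np.array([[np.sum(x ** (i + j)) for j in powers] for i in powers])
T = np.array([np.sum((x ** i) * t) for i in powers])

w_unreg = np.linalg.solve(A, T)                      # Solution 1.1
w_reg = np.linalg.solve(A + lam * np.eye(M + 1), T)  # Solution 1.2

# Cross-check: with design matrix Phi_nj = x_n^j, the same weights solve
# (Phi^T Phi + lam*I) w = Phi^T t, since A = Phi^T Phi and T = Phi^T t.
Phi = x[:, None] ** powers[None, :]
assert np.allclose(A, Phi.T @ Phi) and np.allclose(T, Phi.T @ t)
assert np.allclose(w_reg,
                   np.linalg.solve(Phi.T @ Phi + lam * np.eye(M + 1),
                                   Phi.T @ t))
print(w_unreg, w_reg, sep="\n")
```

The cross-check makes the identity behind (1.122) explicit: the matrix of summed powers is exactly $\Phi^{\mathsf T}\Phi$ for the polynomial design matrix, so regularization simply adds $\lambda$ to its diagonal.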
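The arithmetic in Solution 1.3 can likewise be verified mechanically. This sketch is not from the manual; it encodes the box contents given in PRML Exercise 1.3 (red: 3 apples, 4 oranges, 3 limes; blue: 1 apple, 1 orange; green: 3 apples, 3 oranges, 4 limes) as exact fractions and recomputes the marginals and the posterior.

```python
from fractions import Fraction as F

# Box priors p(r), p(b), p(g) and fruit proportions within each box.
p_box = {'r': F(1, 5), 'b': F(1, 5), 'g': F(3, 5)}
p_fruit_given_box = {
    'r': {'a': F(3, 10), 'o': F(4, 10), 'l': F(3, 10)},
    'b': {'a': F(1, 2),  'o': F(1, 2),  'l': F(0, 1)},
    'g': {'a': F(3, 10), 'o': F(3, 10), 'l': F(4, 10)},
}

def marginal(fruit):
    """Sum rule: p(fruit) = sum over boxes of p(fruit|box) p(box)."""
    return sum(p_fruit_given_box[box][fruit] * p_box[box] for box in p_box)

def posterior(box, fruit):
    """Bayes' theorem: p(box|fruit) = p(fruit|box) p(box) / p(fruit)."""
    return p_fruit_given_box[box][fruit] * p_box[box] / marginal(fruit)

print(marginal('a'))        # 17/50 = 0.34, matching (3)
print(marginal('o'))        # 9/25  = 0.36, matching (5)
print(posterior('g', 'o'))  # 1/2, matching (6)
```

Using exact Fraction arithmetic makes the $1/2$ in (6) come out exactly rather than as a floating-point approximation.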