Pattern Recognition and Machine Learning
Solutions to the Exercises: Tutors’ Edition
Markus Svens´en and Christopher M. Bishop
Copyright c 2002–2009
This is the solutions manual (Tutors’ Edition) for the book Pattern Recognition and Machine Learning
(PRML; published by Springer in 2006). This release was created September 8, 2009. Any future releases
(e.g. with corrections to errors) will be announced on the PRML web-site (see below) and published via
Springer.
PLEASE DO NOT DISTRIBUTE
Most of the solutions in this manual are intended as a resource for
tutors teaching courses based on PRML and the value of this resource
would be greatly diminished if was to become generally available. All
tutors who want a copy should contact Springer directly.
The authors would like to express their gratitude to the various people who have provided feedback on
earlier releases of this document.
The authors welcome all comments, questions and suggestions about the solutions as well as reports on
(potential) errors in text or formulae in this document; please send any such feedback to
Further information about PRML is available from
prml-fb@microsoft.com
http://research.microsoft.com/∼cmbishop/PRML
Contents
Contents
.
. .
.
. .
.
. .
Chapter 1: Introduction .
.
.
. .
Chapter 2: Probability Distributions
.
Chapter 3: Linear Models for Regression .
. .
Chapter 4: Linear Models for Classification .
.
. .
Chapter 5: Neural Networks
.
. .
.
Chapter 6: Kernel Methods . .
. .
. .
.
. .
Chapter 7: Sparse Kernel Machines . .
.
. .
Chapter 8: Graphical Models .
. .
.
. .
Chapter 9: Mixture Models and EM .
.
Chapter 10: Approximate Inference . .
. .
. .
Chapter 11: Sampling Methods .
.
. .
. .
Chapter 12: Continuous Latent Variables .
. .
.
Chapter 13: Sequential Data
.
Chapter 14: Combining Models .
.
. .
. .
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
. .
5
7
.
28
.
62
.
78
.
.
93
. 114
. 128
. 136
. 150
. 163
. 198
. 207
. 223
. 246
5
6
CONTENTS
Solutions 1.1–1.4
7
Chapter 1 Introduction
1.1 Substituting (1.1) into (1.2) and then differentiating with respect to wi we obtain
NXn=1 MXj=0
n − tn! xi
wjxj
n = 0.
(1)
Re-arranging terms then gives the required result.
1.2 For the regularized sum-of-squares error function given by (1.4) the corresponding
linear equations are again obtained by differentiation, and take the same form as
(1.122), but with Aij replaced byeAij, given by
eAij = Aij + λIij.
probability of selecting an apple is given by
1.3 Let us denote apples, oranges and limes by a, o and l respectively. The marginal
(2)
(4)
(5)
(6)
p(a) = p(a|r)p(r) + p(a|b)p(b) + p(a|g)p(g)
3
10 × 0.6 = 0.34
3
10 × 0.2 +
1
2 × 0.2 +
=
(3)
where the conditional probabilities are obtained from the proportions of apples in
each box.
To find the probability that the box was green, given that the fruit we selected was
an orange, we can use Bayes’ theorem
p(g|o) =
p(o|g)p(g)
p(o)
.
The denominator in (4) is given by
p(o) = p(o|r)p(r) + p(o|b)p(b) + p(o|g)p(g)
4
10 × 0.2 +
1
2 × 0.2 +
3
10 × 0.6 = 0.36
=
from which we obtain
p(g|o) =
3
10 ×
0.6
0.36
=
1
2
.
1.4 We are often interested in finding the most probable value for some quantity.
In
the case of probability distributions over discrete variables this poses little problem.
However, for continuous variables there is a subtlety arising from the nature of prob-
ability densities and the way they transform under non-linear changes of variable.
8
Solution 1.4
Consider first the way a function f (x) behaves when we change to a new variable y
where the two variables are related by x = g(y). This defines a new function of y
given by
ef (y) = f (g(y)).
(7) with respect to y
(7)
(8)
ef0(by) = f0(g(by))g0(by) = 0.
Suppose f (x) has a mode (i.e. a maximum) atbx so that f0(bx) = 0. The correspond-
ing mode ofef (y) will occur for a valueby obtained by differentiating both sides of
Assuming g0(by) 6= 0 at the mode, then f0(g(by)) = 0. However, we know that
f0(bx) = 0, and so we see that the locations of the mode expressed in terms of each
of the variables x and y are related bybx = g(by), as one would expect. Thus, finding
a mode with respect to the variable x is completely equivalent to first transforming
to the variable y, then finding a mode with respect to y, and then transforming back
to x.
Now consider the behaviour of a probability density px(x) under the change of vari-
ables x = g(y), where the density with respect to the new variable is py(y) and is
given by ((1.27)). Let us write g0(y) = s|g0(y)| where s ∈ {−1, +1}. Then ((1.27))
can be written
py(y) = px(g(y))sg0(y).
Differentiating both sides with respect to y then gives
p0y(y) = sp0x(g(y)){g0(y)}2 + spx(g(y))g00(y).
(9)
Due to the presence of the second term on the right hand side of (9) the relationship
not be the value obtained by transforming to py(y) then maximizing with respect to
y and then transforming back to x. This causes modes of densities to be dependent
on the choice of variables. In the case of linear transformation, the second term on
the right hand side of (9) vanishes, and so the location of the maximum transforms
bx = g(by) no longer holds. Thus the value of x obtained by maximizing px(x) will
according tobx = g(by).
This effect can be illustrated with a simple example, as shown in Figure 1. We
begin by considering a Gaussian distribution px(x) over x with mean µ = 6 and
standard deviation σ = 1, shown by the red curve in Figure 1. Next we draw a
sample of N = 50, 000 points from this distribution and plot a histogram of their
values, which as expected agrees with the distribution px(x).
Now consider a non-linear change of variables from x to y given by
x = g(y) = ln(y) − ln(1 − y) + 5.
The inverse of this function is given by
y = g−1(x) =
1
1 + exp(−x + 5)
(10)
(11)