SELECTED SOLUTIONS TO
PRINCIPLES OF DIGITAL
COMMUNICATION
Cambridge University Press, 2008
by ROBERT G. GALLAGER
A complete set of solutions is available from Cambridge University Press for instructors teaching a class
using this text. This is a subset of solutions that I feel would be valuable for those studying the
subject on their own.
Chapter 2
Exercise 2.2:
(a) V + W is a random variable, so its expectation, by definition, is
E[V + W] = \sum_{v \in V} \sum_{w \in W} (v + w) p_{VW}(v, w)
         = \sum_{v \in V} \sum_{w \in W} v p_{VW}(v, w) + \sum_{v \in V} \sum_{w \in W} w p_{VW}(v, w)
         = \sum_{v \in V} v \sum_{w \in W} p_{VW}(v, w) + \sum_{w \in W} w \sum_{v \in V} p_{VW}(v, w)
         = \sum_{v \in V} v p_V(v) + \sum_{w \in W} w p_W(w)
         = E[V] + E[W].
(b) Once again, working from first principles,
E[V · W] = \sum_{v \in V} \sum_{w \in W} (v · w) p_{VW}(v, w)
         = \sum_{v \in V} \sum_{w \in W} (v · w) p_V(v) p_W(w)        (using independence)
         = \sum_{v \in V} v p_V(v) \sum_{w \in W} w p_W(w)
         = E[V] · E[W].
(c) To discover a case where E[V · W] ≠ E[V] · E[W], first try the simplest kind of example where
V and W are binary with the joint pmf
pV W (0, 1) = pV W (1, 0) = 1/2;
pV W (0, 0) = pV W (1, 1) = 0.
Clearly, V and W are not independent. Also, E[V · W ] = 0 whereas E[V ] = E[W ] = 1/2 and
hence E[V ] · E[W ] = 1/4.
The second case requires some experimentation. One approach is to choose a joint distribution
such that E[V · W ] = 0 and E[V ] = 0. A simple solution is then given by the pmf,
pV W (−1, 0) = pV W (0, 1) = pV W (1, 0) = 1/3.
Again, V and W are not independent. Clearly, E[V · W ] = 0. Also, E[V ] = 0 (what is E[W ]?).
Hence, E[V · W ] = E[V ] · E[W ].
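These claims are easy to check numerically. The following short Python sketch (my own illustration, not part of the original solution; the dictionary representation and the helper name expectations are mine) evaluates E[V · W] and E[V] · E[W] directly from each of the two joint pmfs above.

# Exercise 2.2(c) check: evaluate E[V*W] and E[V]*E[W] from a joint pmf
# represented as a dictionary {(v, w): probability}.

def expectations(pmf):
    EV = sum(v * p for (v, w), p in pmf.items())
    EW = sum(w * p for (v, w), p in pmf.items())
    EVW = sum(v * w * p for (v, w), p in pmf.items())
    return EV, EW, EVW

pmf1 = {(0, 1): 0.5, (1, 0): 0.5}                  # first example above
pmf2 = {(-1, 0): 1/3, (0, 1): 1/3, (1, 0): 1/3}    # second example above

for name, pmf in [("pmf1", pmf1), ("pmf2", pmf2)]:
    EV, EW, EVW = expectations(pmf)
    print(name, "E[VW] =", EVW, " E[V]E[W] =", EV * EW)
# pmf1: E[VW] = 0.0 while E[V]E[W] = 0.25 (not equal)
# pmf2: E[VW] = 0.0 and E[V]E[W] = 0.0 (equal, although V and W are dependent)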
(d)
σ^2_{V+W} = E[(V + W)^2] − (E[V + W])^2
          = E[V^2] + E[W^2] + 2E[V · W] − (E[V] + E[W])^2
          = E[V^2] + E[W^2] + 2E[V] · E[W] − E[V]^2 − E[W]^2 − 2E[V] · E[W]        (using independence, as in (b))
          = E[V^2] − E[V]^2 + E[W^2] − E[W]^2
          = σ^2_V + σ^2_W.
Exercise 2.4:
(a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2) = Pr(X2 > X1). These two
events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2) and
Pr(X1 < X2) sum to 1, so each must be 1/2. Thus Pr(X1 ≥ X2) = Pr(X2 ≥ X1) = 1/2.
(b) Invoking the symmetry among X1, X2 and X3, we see that each has the same probability of
being the smallest of the three (the probability of a tie is 0). These three events are mutually
exclusive and their probabilities must add up to 1. Therefore each event occurs with probability
1/3.
(c) The event {N > n} is the same as the event {X1 is the minimum among the n iid
random variables X1, X2, ··· , Xn}. By extending the argument in part (b), we see that
Pr(X1 is the smallest of X1, . . . , Xn) = 1/n. Finally, Pr{N ≥ n} = Pr{N > n − 1} = 1/(n − 1) for
n ≥ 2.
(d) Since N is a non-negative integer random variable (taking on values from 2 to ∞), we can
use Exercise 2.3(a) as follows:
E[N] = \sum_{n=1}^{\infty} Pr{N ≥ n}
     = Pr{N ≥ 1} + \sum_{n=2}^{\infty} Pr{N ≥ n}
     = 1 + \sum_{n=2}^{\infty} \frac{1}{n − 1}
     = 1 + \sum_{n=1}^{\infty} \frac{1}{n}.

Since the series \sum_{n=1}^{\infty} 1/n diverges, we conclude that E[N] = ∞.
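A small simulation (my own illustration, not part of the original solution) supports parts (c) and (d). Taking N, as above, to be the index of the first sample that falls below X1, the empirical value of Pr{N ≥ n} tracks 1/(n − 1), while the truncated sample mean of N keeps growing with the truncation point, consistent with E[N] = ∞.

# Monte Carlo illustration for Exercise 2.4(c)-(d): with N the index of the
# first X_n satisfying X_n < X_1, we expect Pr{N >= n} = 1/(n-1) for n >= 2.
import random

def sample_N(cap=10**6):
    """Return N = first index n >= 2 with X_n < X_1 (capped so every run terminates)."""
    x1 = random.random()
    for n in range(2, cap + 1):
        if random.random() < x1:   # this draw is X_n, and X_n < X_1
            return n
    return cap                     # truncated; the true N would be even larger

trials = 200_000
samples = [sample_N() for _ in range(trials)]

for n in (2, 3, 5, 10):
    empirical = sum(s >= n for s in samples) / trials
    print(f"Pr{{N >= {n}}} ~ {empirical:.4f}   (theory: {1/(n-1):.4f})")

# The truncated mean grows roughly like ln(cap), consistent with E[N] = infinity.
print("truncated sample mean of N:", sum(samples) / trials)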
(e) Since the alphabet has a finite number of letters (the same results can be obtained, with some
extra work, for a countably infinite discrete alphabet), Pr(X1 = X2) is no longer 0 and depends
on the particular probability distribution. Thus, although Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by
symmetry, neither can be found without knowing the distribution.
Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric
value. If X1 = amin, then no subsequent rv X2, X3, . . . can have a smaller value, so N = ∞ in
this case. Since the event X1 = amin occurs with positive probability, E[N] = ∞.
Exercise 2.6:
(a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that
code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x′1, x′2, . . . , x′m,
such that
C(x1)C(x2) . . . C(xn) = C(x′1)C(x′2) . . . C(x′m).
Then one of the following must hold:
• C(xn) = C(x′m)
• C(xn) is a suffix of C(x′m)
• C(x′m) is a suffix of C(xn).
In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free.
In the first case, xn must equal x′m because of suffix-freedom. Simply delete that final letter
from each sequence and repeat the argument. Since the sequences are distinct, the final letter
must differ after some number of repetitions of the above argument, and at that point one of
the latter two cases holds and a contradiction is reached.
Hence, suffix-free codes are uniquely decodable.
(b) Any prefix-free code becomes a suffix-free code if the ordering of the symbols in each codeword
is reversed. Perhaps the simplest such example is {0, 01, 11}, which can be seen to be a suffix-free
code (with codeword lengths {1, 2, 2}) but not a prefix-free code.
A codeword in this code cannot always be decoded as soon as its last bit arrives at the decoder.
To illustrate a rather extreme case, consider the following output produced by the encoder:

0111111111 . . .

Assuming that the source letters {a, b, c} map to {0, 01, 11}, we cannot distinguish between the
two possible source sequences
acccccccc . . .
and
bcccccccc . . .
till the end of the string is reached. Hence, in this case the decoder might have to wait for an
arbitrarily long time before decoding.
(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix-free and suffix-free.
Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either of the
codewords 00 or 01 for C2 or C3, and thus must use 10 and 11; but 10 ends in the codeword 0,
so the resulting code is not suffix-free.
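Both claims are easy to verify mechanically. The sketch below (my own illustration; the helper names prefix_free and suffix_free are mine) confirms the properties of {0, 01, 11} and searches every binary code with lengths (1, 2, 2) to show that none is both prefix-free and suffix-free.

# Exercise 2.6 illustration: prefix/suffix-freedom of small binary codes.
from itertools import product

def prefix_free(code):
    return not any(a != b and b.startswith(a) for a in code for b in code)

def suffix_free(code):
    return not any(a != b and b.endswith(a) for a in code for b in code)

c = ["0", "01", "11"]
print(prefix_free(c), suffix_free(c))   # False True: suffix-free but not prefix-free

# Part (c): no binary code with lengths (1, 2, 2) is both prefix-free and suffix-free.
two_bit = ["".join(bits) for bits in product("01", repeat=2)]
found = [[c1, c2, c3]
         for c1 in "01" for c2, c3 in product(two_bit, repeat=2)
         if c2 != c3 and prefix_free([c1, c2, c3]) and suffix_free([c1, c2, c3])]
print("codes with lengths (1, 2, 2) that are both prefix- and suffix-free:", found)  # []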
Exercise 2.7: Consider the set of codeword lengths (1, 2, 2) and arrange them as (2, 1, 2). Then
u1 = 0 is represented as 0.00. Next, u2 = 2^{-2} = 1/4 = 0.01 must be represented using only 1 bit
after the binary point, which is not possible. Hence, the algorithm fails.
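The construction can be sketched in a few lines (my own reading of the algorithm in the exercise: codeword j is the first lj bits of the binary expansion of uj = Σ_{i<j} 2^{-li}; the helper name assign_codewords is mine). It succeeds for the sorted lengths (1, 2, 2) and fails for the arrangement (2, 1, 2), exactly as described above.

# Exercise 2.7 illustration: assign codeword j as the first l_j bits of the binary
# expansion of u_j = sum_{i<j} 2^{-l_i}; this works when the lengths are sorted in
# nondecreasing order but can fail otherwise.
from fractions import Fraction

def assign_codewords(lengths):
    codewords, u = [], Fraction(0)
    for l in lengths:
        bits, frac = "", u              # expand u to l binary places
        for _ in range(l):
            frac *= 2
            bit, frac = divmod(frac, 1)
            bits += str(bit)
        if frac != 0:
            raise ValueError(f"u = {u} needs more than {l} bits after the binary point")
        codewords.append(bits)
        u += Fraction(1, 2 ** l)
    return codewords

print(assign_codewords([1, 2, 2]))      # ['0', '10', '11']
try:
    assign_codewords([2, 1, 2])
except ValueError as e:
    print("arrangement (2, 1, 2) fails:", e)   # u2 = 1/4 = 0.01 does not fit in 1 bit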
Exercise 2.9:
(a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9)
H[X] − L = \sum_{j=1}^{M} p_j \log \frac{2^{-l_j}}{p_j} ≤ \sum_{j=1}^{M} p_j \left[ \frac{2^{-l_j}}{p_j} − 1 \right] \log e = 0.
As is evident from Figure 2.7, the inequality is strict unless 2^{-lj} = pj for each j. Thus if
H[X] = L, it follows that 2^{-lj} = pj for each j.
(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) =
1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order
node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.
[Figure 2.4, repeated: the binary code tree in which the first-order node 0 is the leaf for letter a and node 1 is the root of the subtree for letters b and c; the second-order nodes are 00 (aa), 01 (ab or ac), 10 (letter c), and 11 (letter b); the leaves are the nine two-letter sequences, with the codeword assignment listed below.]
aa → 00
ab → 011
ac → 010
ba → 110
bb → 1111
bc → 1110
ca → 100
cb → 1011
cc → 1010
Similarly, the second order node 00 corresponds to aa, which has probability 1/4, and the second
order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the
same way, the nodes 11 and 10 correspond to b and c respectively, with probabilities 1/4 each. One can proceed with
higher order nodes in the same way, but what is the principle behind this?
In general, when an infinite binary tree is used to represent an unending sequence of letters
from an iid source where each letter j has probability pj = 2^{-ℓj} and codeword length ℓj, we see that each
node corresponding to an initial sequence of letters x1, . . . , xn has a probability \prod_i 2^{-ℓ_{x_i}} equal
to the product of the individual letter probabilities, and an order equal to \sum_i ℓ_{x_i}. Thus each
node labelled by a subsequence of letters has a probability 2^{-ℓ}, where ℓ is the order of that node.
The other nodes (those unlabelled in the example above) have a probability equal to the sum of
the probabilities of the immediately following labelled nodes. This probability is again 2^{-ℓ} for an ℓth order node,
which can be established by induction if one wishes to be formal.
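Reading the single-letter codewords off the table above (a → 0, b → 11, c → 10), this principle can be checked mechanically; the short sketch below is my own illustration.

# Exercise 2.9(b) illustration: when p_j = 2^{-l_j}, the tree node reached by any
# letter sequence has probability 2^{-order}, where order = total codeword length.
from itertools import product

code = {"a": "0", "b": "11", "c": "10"}
prob = {"a": 0.5, "b": 0.25, "c": 0.25}

for n in (1, 2, 3):
    for seq in product("abc", repeat=n):
        label = "".join(code[x] for x in seq)   # node reached in the binary tree
        p_seq = 1.0
        for x in seq:
            p_seq *= prob[x]
        assert abs(p_seq - 2.0 ** -len(label)) < 1e-12
print("every labelled node of order l has probability 2^-l (checked for n = 1, 2, 3)")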
Exercise 2.11: (a) For n = 2,
\left( \sum_{j=1}^{M} 2^{-l_j} \right)^2 = \left( \sum_{j_1=1}^{M} 2^{-l_{j_1}} \right) \left( \sum_{j_2=1}^{M} 2^{-l_{j_2}} \right) = \sum_{j_1=1}^{M} \sum_{j_2=1}^{M} 2^{-(l_{j_1} + l_{j_2})}.
The same approach works for arbitrary n.
(b) Each source n-tuple x^n = (a_{j_1}, a_{j_2}, . . . , a_{j_n}) is encoded into a concatenation
C(a_{j_1})C(a_{j_2}) . . . C(a_{j_n}) of binary digits of aggregate length l(x^n) = l_{j_1} + l_{j_2} + · · · + l_{j_n}. Since
there is one n-tuple x^n for each choice of a_{j_1}, a_{j_2}, . . . , a_{j_n}, the result of part (a) can be rewritten
as

\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n = \sum_{x^n} 2^{-l(x^n)}.        (1)
(c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length
i,
\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n = \sum_{i=1}^{n l_{max}} A_i 2^{-i}.
This uses the fact that since each codeword has length at most lmax, each concatenation has
length at most nlmax.
(d) From unique decodability, each of these concatenations must be different, so there are at
most 2^i concatenations of aggregate length i, i.e., Ai ≤ 2^i. Thus, since the above sum contains
at most nlmax terms,
\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n ≤ n l_{max}.        (2)

(e) Note that

[n l_{max}]^{1/n} = \exp\left( \frac{\ln(n l_{max})}{n} \right) −→ \exp(0) = 1

as n → ∞. Since (2) must be satisfied for all n, the Kraft inequality \sum_{j=1}^{M} 2^{-l_j} ≤ 1 must be satisfied.
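The counting identity of parts (b)-(d) can also be checked by brute force for a small uniquely decodable code (my own illustration; the prefix-free code {0, 10, 110, 111} is an arbitrary choice).

# Exercise 2.11 illustration: tally concatenations of n codewords by aggregate
# length and check (sum_j 2^-lj)^n = sum_i A_i 2^-i together with A_i <= 2^i.
from collections import Counter
from itertools import product

code = ["0", "10", "110", "111"]           # prefix-free, hence uniquely decodable
kraft = sum(2.0 ** -len(c) for c in code)  # equals 1.0 for this code

for n in (1, 2, 3):
    A = Counter(len("".join(combo)) for combo in product(code, repeat=n))
    assert abs(sum(count * 2.0 ** -i for i, count in A.items()) - kraft ** n) < 1e-12
    assert all(count <= 2 ** i for i, count in A.items())
print("identity of part (c) and the bound A_i <= 2^i verified for n = 1, 2, 3")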
Exercise 2.13:
(a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3 +p4 ≥ p2,
we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also
combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding
codewords of lengths 1,2,3 and 3 respectively.
(b) Note that p3 ≤ p2 and p4 ≤ p2, so p1 = p3 + p4 ≤ 2p2, which implies p1 + p3 + p4 ≤ 4p2.
Since p2 = 1 − p1 − p3 − p4, the latter inequality implies that 1 − p2 ≤ 4p2, or p2 ≥ 0.2. Also,
since p1 = p3 + p4, we have 2p1 + p2 = 1, so p1 = (1 − p2)/2 ≤ (1 − 0.2)/2 = 0.4. This bound is
met with equality by choosing p1 = 0.4 and p2 = p3 = p4 = 0.2. Thus pmax = 0.4.
(c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus
1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3
and p3 = p4 = 1/6.
(d) The argument in part (b) remains the same if we assume p1 ≤ p3+p4 rather than p1 = p3+p4,
i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax. Thus assuming p1 > pmax implies that p1 > p3 + p4.
Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol
2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the
codeword for symbol 2) will have length 1.
(e) The lengths of any optimal prefix-free code for four symbols must be either (1, 2, 3, 3) or
(2, 2, 2, 2). The expected length for (2, 2, 2, 2) exceeds that for (1, 2, 3, 3) by exactly p1 − (p3 + p4).
If p1 > pmax, then, from (d), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length
than (2, 2, 2, 2).
(f) The argument in part (c) remains almost the same if we start with the assumption that
p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p4 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again
have p1 ≥ pmin. Thus if p1 < pmin, we must have p3 + p4 > p1 ≥ p2. We then must combine p1
and p2 in the second step of the Huffman algorithm, so each codeword will have length 2.
(g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5
and all other symbols have an aggregate probability of 1/5, then the Huffman code construction
combines the least likely symbols until they are tied together into a supersymbol of probability
1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length
1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of
the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol
1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then
guarantees that the code will have one codeword of length 1.
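These thresholds can be confirmed with a short heapq-based Huffman sketch (my own illustration, not part of the original solution; the helper name huffman_lengths is mine) that returns the codeword lengths for a given set of probabilities.

# Exercise 2.13 illustration: optimal (Huffman) codeword lengths for M = 4 symbols.
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return binary Huffman codeword lengths, in the same order as probs."""
    tie = count()                        # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        pa, _, sa = heapq.heappop(heap)
        pb, _, sb = heapq.heappop(heap)
        for i in sa + sb:                # every symbol under the merged node gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (pa + pb, next(tie), sa + sb))
    return lengths

print(huffman_lengths([0.41, 0.295, 0.15, 0.145]))  # p1 > 0.4: lengths (1, 2, 3, 3)
print(huffman_lengths([0.40, 0.20, 0.20, 0.20]))    # boundary: (2,2,2,2) and (1,2,3,3) both give 2 bits
print(huffman_lengths([0.28, 0.27, 0.24, 0.21]))    # p1 < 1/3: lengths (2, 2, 2, 2)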
Exercise 2.15:
(a) This is the same as Lemma 2.5.1.
(b) Since p1 < pM−1 + pM, we see that p1 < p′M−1, where p′M−1 is the probability of the node
in the reduced code tree corresponding to letters M − 1 and M in the original alphabet. Thus,
by part (a), l1 ≥ l′M−1 = lM − 1.
(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by
Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1,
then l1 = lM. Otherwise k ≤ M − 1, so pk ≥ pM−1 and hence p1 < pM−1 + pM ≤ pk + pM; thus l1
must be at least as large as the length of the immediate parent of M, showing that l1 ≥ lM − 1.
(d) and (e) We have shown that the shortest and longest lengths differ by at most 1, with some
number m ≥ 1 of lengths equal to l1 and the remaining M − m lengths equal to l1 + 1. Since the
optimal code tree is full, the Kraft sum is exactly 1: m 2^{-l1} + (M − m) 2^{-(l1+1)} = 1, i.e.,
2^{l1+1} = 2m + (M − m) = M + m. From this it follows that l1 = ⌊log2(M)⌋ and m = 2^{l1+1} − M.
Exercise 2.16:
(a) Grow full ternary trees step by step. The smallest full ternary tree has 3 leaves. For
the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves
from that node: we lose 1 leaf but gain 3, for a net gain of 2 leaves at each growth extension.
Thus M = 3 + 2n for n a non-negative integer.
(b) It is clear that for optimality, all the unused leaves in the tree must have the same length
as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the
first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root
node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at
each step.
(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the
ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}.
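The procedure of part (b) can be sketched in a few lines (my own illustration; ternary_huffman_lengths is a name of my choosing). Applied to the probabilities of part (c), it produces an optimal set of lengths whose expected length, 1.7 ternary digits, matches the code given above; equal-probability ties may be grouped differently.

# Exercise 2.16 illustration: ternary Huffman as in part (b): when M is even,
# combine the 2 least likely nodes first, then 3 at a time until one node remains.
import heapq
from itertools import count

def ternary_huffman_lengths(probs):
    tie = count()
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    group = 2 if len(probs) % 2 == 0 else 3    # M even: only 2 nodes in the first merge
    while len(heap) > 1:
        merged_p, merged_syms = 0.0, []
        for _ in range(min(group, len(heap))):
            p, _, syms = heapq.heappop(heap)
            merged_p += p
            merged_syms += syms
        for i in merged_syms:                  # each symbol under the merge gains one ternary digit
            lengths[i] += 1
        heapq.heappush(heap, (merged_p, next(tie), merged_syms))
        group = 3                              # all later merges are 3-way
    return lengths

probs = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
lengths = ternary_huffman_lengths(probs)
print(lengths, sum(p * l for p, l in zip(probs, lengths)))   # expected length 1.7 ternary digits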
Exercise 2.18:
(a) Applying the Huffman coding algorithm to the code with M + 1 symbols with pM+1 = 0, we
combine symbol M + 1 with symbol M and the reduced code has M symbols with probabilities
p1, . . . , pM. The Huffman code for this reduced set of symbols is simply the code for the original
set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the
reduced code modified by a unit length increase in the codeword for symbol M. Thus L = L′ + pM,
where L′ is the expected length for the code with M symbols.
(b) All n of the zero probability symbols are combined together in the Huffman algorithm, and
the reduced code from this combination is then the same as the code with M + 1 symbols in
part (a). Thus L = L′ + pM again.
Exercise 2.19:
(a) The entropies H(X), H(Y ), and H(XY ) can be expressed as
H(XY) = − \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_{XY}(x, y)
H(X) = − \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_X(x)
H(Y) = − \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_Y(y).
It is assumed that all symbol pairs x, y of zero probability have been removed from this sum,
and thus all x (y) for which pX(x) = 0 ( pY (y) = 0) are consequently removed. Combining these
equations,
H(XY) − H(X) − H(Y) = \sum_{x \in X, y \in Y} p_{XY}(x, y) \log \frac{p_X(x) p_Y(y)}{p_{XY}(x, y)}.

(b) Using the standard inequality \log x ≤ (x − 1) \log e,

H(XY) − H(X) − H(Y) ≤ \sum_{x \in X, y \in Y} p_{XY}(x, y) \left[ \frac{p_X(x) p_Y(y)}{p_{XY}(x, y)} − 1 \right] \log e = 0.
Thus H(X, Y ) ≤ H(X) + H(Y ). Note that this inequality is satisfied with equality if and only if
X and Y are independent.
(c) For n symbols, X1, . . . , Xn, let Y be the ‘super-symbol’ X2, . . . , Xn. Then using (b),
H(X1, . . . , Xn) = H(X1, Y ) ≤ H(X1) + H(Y ) = H(X1) + H(X2, . . . , Xn).
Iterating this gives the desired result.
An alternate approach generalizes part (b) in the following way:
H(X1, . . . , Xn) − \sum_i H(Xi) = \sum_{x_1, . . . , x_n} p(x_1, . . . , x_n) \log \frac{p(x_1) · · · p(x_n)}{p(x_1, . . . , x_n)} ≤ 0,

where we have used log x ≤ (x − 1) log e again.
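A quick numerical illustration of part (b) (my own, not from the text): for a dependent joint pmf the inequality is strict, while for a product pmf it holds with equality.

# Exercise 2.19(b) illustration: H(XY) <= H(X) + H(Y), with equality iff independent.
from math import log2

def H(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

def compare(joint):                        # joint pmf as a dict {(x, y): probability}
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in xs}
    py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in ys}
    return H(joint.values()), H(px.values()) + H(py.values())

dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): qx * qy for x, qx in [(0, 0.5), (1, 0.5)]
                               for y, qy in [(0, 0.7), (1, 0.3)]}
print(compare(dependent))     # H(XY) is strictly less than H(X) + H(Y)
print(compare(independent))   # the two values agree (up to floating point)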
Exercise 2.20:
(a) Y is 1 if X = 1, which occurs with probability p1. Y is 0 otherwise. Thus
H(Y ) = −p1 log(p1) − (1 − p1) log(1 − p1) = Hb(p1).
(b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0.
(c) Given Y =0, X=1 has probability 0, so X has M−1 possible choices with non-zero probability.
The maximum entropy for an alphabet of size M−1 is log(M−1), so H(X | Y =0) ≤ log(M−1).
This upper bound is met with equality if Pr(X=j | X≠1) = 1/(M−1) for all j ≠ 1. Since
Pr(X=j | X≠1) = pj/(1 − p1), this upper bound on H(X | Y =0) is achieved when p2 = p3 =
··· = pM. Combining this with part (b),

H(X | Y) = p1 H(X | Y =1) + (1−p1) H(X | Y =0) ≤ (1−p1) log(M − 1).
(d) Note that
H(XY ) = H(Y ) + H(X|Y ) ≤ Hb(p1) + (1−p1) log(M−1)
and this is met with equality for p2 = ··· = pM. There are now two reasonable approaches. One
is to note that H(XY) can also be expressed as H(X) + H(Y|X). Since Y is uniquely specified
by X, H(Y|X) = 0, so

H(X) = H(XY) ≤ Hb(p1) + (1 − p1) log(M − 1),        (3)

with equality when p2 = p3 = ··· = pM. The other approach is to observe that H(X) ≤ H(XY),
which again leads to (3), but this does not immediately imply that equality is met for p2 = ··· = pM.
Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and
plays a key role in the noisy channel coding theorem.
(e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M. Thus
it also applies to pmax.
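A short numerical check of the bound (3) (again my own illustration): H(X) ≤ Hb(p1) + (1 − p1) log(M − 1), with equality when the remaining M − 1 probabilities are equal.

# Exercise 2.20 illustration: the Fano bound H(X) <= Hb(p1) + (1 - p1) log(M - 1),
# met with equality when the remaining M - 1 symbols are equiprobable.
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def fano_bound(p1, M):
    Hb = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
    return Hb + (1 - p1) * log2(M - 1)

p_equal = [0.7, 0.1, 0.1, 0.1]           # p2 = p3 = p4: bound met with equality
p_skew = [0.7, 0.2, 0.06, 0.04]          # otherwise the bound is strict
print(H(p_equal), fano_bound(0.7, 4))    # both approximately 1.3568
print(H(p_skew), fano_bound(0.7, 4))     # H(X) is strictly smaller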
Exercise 2.22: One way to generate a source code for (X1, X2, X3) is to concatenate a Huffman
code for (X1, X2) with a Huffman code for X3. The expected length of the resulting code for
(X1, X2, X3) is 2Lmin,2 + Lmin, so the expected length per source letter of this code is
(2/3)Lmin,2 + (1/3)Lmin.