SELECTED SOLUTIONS TO
PRINCIPLES OF DIGITAL COMMUNICATION
Cambridge Press, 2008
by ROBERT G. GALLAGER

A complete set of solutions is available from Cambridge Press for instructors teaching a class using this text. This is a subset of solutions that I feel would be valuable for those studying the subject on their own.

Chapter 2

Exercise 2.2:

(a) V + W is a random variable, so its expectation, by definition, is

   E[V + W] = \sum_{v \in V} \sum_{w \in W} (v + w) p_{VW}(v, w)
            = \sum_{v \in V} v \sum_{w \in W} p_{VW}(v, w) + \sum_{w \in W} w \sum_{v \in V} p_{VW}(v, w)
            = \sum_{v \in V} v p_V(v) + \sum_{w \in W} w p_W(w)
            = E[V] + E[W].

(b) Once again, working from first principles,

   E[V \cdot W] = \sum_{v \in V} \sum_{w \in W} (v \cdot w) p_{VW}(v, w)
               = \sum_{v \in V} \sum_{w \in W} (v \cdot w) p_V(v) p_W(w)      (using independence)
               = \sum_{v \in V} v p_V(v) \sum_{w \in W} w p_W(w)
               = E[V] \cdot E[W].

(c) To discover a case where E[V \cdot W] \neq E[V] \cdot E[W], first try the simplest kind of example, where V and W are binary with the joint pmf

   p_{VW}(0, 1) = p_{VW}(1, 0) = 1/2;     p_{VW}(0, 0) = p_{VW}(1, 1) = 0.

Clearly, V and W are not independent. Also, E[V \cdot W] = 0 whereas E[V] = E[W] = 1/2 and hence E[V] \cdot E[W] = 1/4.
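As a quick numerical sanity check of parts (a)-(c) (an illustration, not part of the original solutions), the following Python sketch computes the relevant expectations directly from a joint pmf stored as a dictionary; the helper name expect is arbitrary.

# Numerical check of Exercise 2.2 (a)-(c): expectations computed directly
# from a joint pmf, here the dependent binary example from part (c).
def expect(pmf, f):
    """E[f(V, W)] for a joint pmf given as {(v, w): probability}."""
    return sum(p * f(v, w) for (v, w), p in pmf.items())

pmf_c = {(0, 1): 0.5, (1, 0): 0.5, (0, 0): 0.0, (1, 1): 0.0}

EV  = expect(pmf_c, lambda v, w: v)        # 0.5
EW  = expect(pmf_c, lambda v, w: w)        # 0.5
EVW = expect(pmf_c, lambda v, w: v * w)    # 0.0, not 0.25 = E[V]E[W]
EVplusW = expect(pmf_c, lambda v, w: v + w)

print(EVplusW, EV + EW)   # part (a): always equal
print(EVW, EV * EW)       # part (c): 0.0 vs 0.25, since V and W are dependent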
The second case requires some experimentation. One approach is to choose a joint distribution such that E[V \cdot W] = 0 and E[V] = 0. A simple solution is then given by the pmf

   p_{VW}(-1, 0) = p_{VW}(0, 1) = p_{VW}(1, 0) = 1/3.

Again, V and W are not independent. Clearly, E[V \cdot W] = 0. Also, E[V] = 0 (what is E[W]?). Hence, E[V \cdot W] = E[V] \cdot E[W].

(d)

   \sigma^2_{V+W} = E[(V + W)^2] - (E[V + W])^2
                 = E[V^2] + E[W^2] + 2E[V \cdot W] - (E[V] + E[W])^2
                 = E[V^2] + E[W^2] + 2E[V] \cdot E[W] - E[V]^2 - E[W]^2 - 2E[V] \cdot E[W]
                 = E[V^2] - E[V]^2 + E[W^2] - E[W]^2
                 = \sigma^2_V + \sigma^2_W,

where the third line uses E[V \cdot W] = E[V] \cdot E[W], i.e., the independence of V and W as in part (b).

Exercise 2.4:

(a) Since X_1 and X_2 are iid, symmetry implies that Pr(X_1 > X_2) = Pr(X_2 > X_1). These two events are mutually exclusive and the event X_1 = X_2 has 0 probability. Thus Pr(X_1 > X_2) and Pr(X_1 < X_2) sum to 1, so each must be 1/2. Thus Pr(X_1 \geq X_2) = Pr(X_2 \geq X_1) = 1/2.

(b) Invoking the symmetry among X_1, X_2 and X_3, we see that each has the same probability of being the smallest of the three (the probability of a tie is 0). These three events are mutually exclusive and their probabilities must add up to 1. Therefore each event occurs with probability 1/3.

(c) The event {N > n} is the same as the event {X_1 is the minimum among the n iid random variables X_1, X_2, ..., X_n}. By extending the argument in part (b), we see that Pr(X_1 is the smallest of X_1, ..., X_n) = 1/n. Finally, Pr{N \geq n} = Pr{N > n-1} = 1/(n-1) for n \geq 2.

(d) Since N is a non-negative integer random variable (taking on values from 2 to \infty), we can use Exercise 2.3(a) as follows:

   E[N] = \sum_{n=1}^{\infty} Pr{N \geq n}
        = Pr{N \geq 1} + \sum_{n=2}^{\infty} Pr{N \geq n}
        = 1 + \sum_{n=2}^{\infty} \frac{1}{n-1}
        = 1 + \sum_{n=1}^{\infty} \frac{1}{n}.

Since the series \sum_{n=1}^{\infty} 1/n diverges, we conclude that E[N] = \infty.
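To see this divergence concretely, here is a small simulation sketch (illustrative only, not part of the original solutions). It takes N to be the first index n \geq 2 with X_n < X_1, matching the characterization of {N > n} used in part (c), draws the X_i as iid Uniform(0,1), and truncates the search at a large cap so the program always terminates.

# Simulation sketch for Exercise 2.4: N is the first index n >= 2 with X_n < X_1,
# so Pr{N > n} = Pr{X_1 is the minimum of X_1,...,X_n} = 1/n and E[N] diverges.
import random

def sample_N(cap=10_000):
    """Draw X_1, X_2, ... iid Uniform(0,1); return min(N, cap)."""
    x1 = random.random()
    for n in range(2, cap + 1):
        if random.random() < x1:   # Pr(X_n < X_1 | X_1 = x1) = x1
            return n
    return cap   # truncated; the true N can be arbitrarily large

trials = 100_000
samples = [sample_N() for _ in range(trials)]

for n in (2, 5, 10, 100):
    empirical = sum(s > n for s in samples) / trials
    print(f"Pr{{N > {n}}}: empirical {empirical:.4f}, theory {1/n:.4f}")

# The (truncated) sample mean keeps growing as the cap grows -- consistent with E[N] = infinity.
print("truncated sample mean of N:", sum(samples) / trials)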
(e) Since the alphabet has a finite number of letters, Pr(X_1 = X_2) is no longer 0 and depends on the particular probability distribution. Thus, although Pr(X_1 \geq X_2) = Pr(X_2 \geq X_1) by symmetry, neither can be found without knowing the distribution. Out of the alphabet letters with nonzero probability, let a_min be a letter of minimum numeric value. If X_1 = a_min, then no subsequent rv X_2, X_3, ... can have a smaller value, so N = \infty in this case. Since the event X_1 = a_min occurs with positive probability, E[N] = \infty. (With some extra work, the same results can be obtained for a countably infinite discrete alphabet.)

Exercise 2.6:

(a) Assume the contrary, i.e., that there is a suffix-free code that is not uniquely decodable. Then that code must contain two distinct sequences of source letters, say x_1, x_2, ..., x_n and x'_1, x'_2, ..., x'_m, such that

   C(x_1)C(x_2)...C(x_n) = C(x'_1)C(x'_2)...C(x'_m).

Then one of the following must hold:

   • C(x_n) = C(x'_m);
   • C(x_n) is a suffix of C(x'_m);
   • C(x'_m) is a suffix of C(x_n).

In the last two cases we arrive at a contradiction, since the code is hypothesized to be suffix-free. In the first case, x_n must equal x'_m because of the suffix freedom. Simply delete that final letter from each sequence and repeat the argument. Since the sequences are distinct, the final letters must differ after some number of repetitions of the above argument, and at that point one of the latter two cases holds and a contradiction is reached. Hence, suffix-free codes are uniquely decodable.

(b) Any prefix-free code becomes a suffix-free code if the ordering of symbols in each codeword is reversed. About the simplest such example is {0, 01, 11}, which can be seen to be a suffix-free code (with codeword lengths {1, 2, 2}) but not a prefix-free code. A codeword in the above code cannot, in general, be decoded as soon as its last bit arrives at the decoder. To illustrate a rather extreme case, consider the following output produced by the encoder:

   0111111111 ...

Assuming that source letters {a, b, c} map to {0, 01, 11} respectively, we cannot distinguish between the two possible source sequences

   acccccccc ...     and     bcccccccc ...

until the end of the string is reached. Hence, in this case the decoder might have to wait for an arbitrarily long time before decoding.
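A short Python sketch (an illustration, not part of the original solutions) makes both points concrete for this code: the encodings of the two source sequences above agree on an arbitrarily long prefix, and once the complete string is available it can be decoded by reversing it and applying the reversed, prefix-free code, which is exactly the reversal argument used above.

# Exercise 2.6(b) illustration: the suffix-free code {a:0, b:01, c:11}.
code = {'a': '0', 'b': '01', 'c': '11'}

def encode(s):
    return ''.join(code[ch] for ch in s)

k = 20
e1, e2 = encode('a' + 'c' * k), encode('b' + 'c' * k)
# e1 is a proper prefix of e2, so after 2k+1 received bits the decoder
# still cannot tell the two source sequences apart.
print(e2.startswith(e1), len(e1), len(e2))

# Once the complete string is available, reverse it and decode with the
# reversed code, which is prefix-free -- the reversal argument above.
rev_code = {v[::-1]: ch for ch, v in code.items()}

def decode(bits):
    out, buf = [], ''
    for b in reversed(bits):
        buf += b
        if buf in rev_code:        # safe: the reversed code is prefix-free
            out.append(rev_code[buf])
            buf = ''
    return ''.join(reversed(out))

print(decode(e2) == 'b' + 'c' * k)   # True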
(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix-free and suffix-free. Without loss of generality, set C_1 = 0. Then a prefix-free code cannot use either of the codewords 00 or 01 for C_2 or C_3, and thus must use 10 and 11, which is not suffix-free.

Exercise 2.7: Consider the set of codeword lengths (1, 2, 2) and arrange them as (2, 1, 2). Then u_1 = 0 is represented as 0.00. Next, u_2 = 1/4 = 0.01 must be represented using 1 bit after the binary point, which is not possible. Hence, the algorithm fails.

Exercise 2.9:

(a) Assume, as usual, that p_j > 0 for each j. From Eqs. (2.8) and (2.9),

   H[X] - \bar{L} = \sum_{j=1}^{M} p_j \log \frac{2^{-l_j}}{p_j}
                 \leq \sum_{j=1}^{M} p_j \Big[ \frac{2^{-l_j}}{p_j} - 1 \Big] \log e
                 = \Big[ \sum_{j=1}^{M} 2^{-l_j} - 1 \Big] \log e \leq 0,

where the final inequality is the Kraft inequality. As is evident from Figure 2.7, the inequality \log x \leq (x-1)\log e is strict unless x = 1, i.e., unless 2^{-l_j} = p_j for each j. Thus if H[X] = \bar{L}, it follows that 2^{-l_j} = p_j for each j.

(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) = 1/4. The first-order node 0 corresponds to the letter a and has probability 1/2. The first-order node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.

[Figure 2.4, repeated: the binary code tree whose second-order leaves are labelled aa → 00, ab → 011, ac → 010, ba → 110, bb → 1111, bc → 1110, ca → 100, cb → 1011, cc → 1010.]

Similarly, the second-order node 00 corresponds to aa, which has probability 1/4, and the second-order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the same way, the nodes 11 and 10 correspond to b and c, with probabilities 1/4 each. One can proceed with higher-order nodes in the same way, but what is the principle behind this? In general, when an infinite binary tree is used to represent an unending sequence of letters from an iid source where each letter j has probability p_j = 2^{-\ell_j} and codeword length \ell_j, we see that each node corresponding to an initial sequence of letters x_1, ..., x_n has a probability \prod_i 2^{-\ell_{x_i}}, equal to the product of the individual letter probabilities, and an order equal to \sum_i \ell_{x_i}. Thus each node labelled by a subsequence of letters has a probability 2^{-\ell}, where \ell is the order of that node. The other nodes (those unlabelled in the example above) have a probability equal to the sum of the immediately following labelled nodes. This probability is again 2^{-\ell} for an \ell-th order node, which can be established by induction if one wishes to be formal.
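The dyadic source of part (b) also makes part (a) concrete. The sketch below (illustrative only; the probabilities in the second example are an arbitrary choice) computes H[X] and the expected length \bar{L} for the codeword lengths (1, 2, 2): they coincide when p_j = 2^{-l_j} and differ otherwise.

# Exercise 2.9 check: H[X] = Lbar exactly when p_j = 2^{-l_j} for every j.
from math import log2

def entropy(p):
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def avg_len(p, lengths):
    return sum(pi * li for pi, li in zip(p, lengths))

# Dyadic source from Figure 2.4: a, b, c with codeword lengths 1, 2, 2.
p_dyadic = [0.5, 0.25, 0.25]
lengths = [1, 2, 2]
print(entropy(p_dyadic), avg_len(p_dyadic, lengths))   # both 1.5

# Non-dyadic probabilities with the same code: now H[X] < Lbar strictly.
p_other = [0.6, 0.3, 0.1]
print(entropy(p_other), avg_len(p_other, lengths))     # ~1.295 < 1.4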
Exercise 2.11:

(a) For n = 2,

   \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^2 = \Big( \sum_{j_1=1}^{M} 2^{-l_{j_1}} \Big) \Big( \sum_{j_2=1}^{M} 2^{-l_{j_2}} \Big) = \sum_{j_1=1}^{M} \sum_{j_2=1}^{M} 2^{-(l_{j_1} + l_{j_2})}.

The same approach works for arbitrary n.

(b) Each source n-tuple x^n = (a_{j_1}, a_{j_2}, ..., a_{j_n}) is encoded into a concatenation C(a_{j_1})C(a_{j_2})...C(a_{j_n}) of binary digits of aggregate length l(x^n) = l_{j_1} + l_{j_2} + ... + l_{j_n}. Since there is one n-tuple x^n for each choice of a_{j_1}, a_{j_2}, ..., a_{j_n}, the result of part (a) can be rewritten as

   \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^n = \sum_{x^n} 2^{-l(x^n)}.     (1)

(c) Rewriting (1) in terms of the number A_i of concatenations of n codewords of aggregate length i,

   \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^n = \sum_{i=1}^{n l_max} A_i 2^{-i}.

This uses the fact that, since each codeword has length at most l_max, each concatenation has length at most n l_max.

(d) From unique decodability, each of these concatenations must be different, so there are at most 2^i concatenations of aggregate length i, i.e., A_i \leq 2^i. Thus, since the above sum contains at most n l_max terms,

   \Big( \sum_{j=1}^{M} 2^{-l_j} \Big)^n \leq n l_max.

(e) Note that

   [n l_max]^{1/n} = \exp\Big[ \frac{\ln(n l_max)}{n} \Big] \longrightarrow \exp(0) = 1     (2)

as n \to \infty. Since \sum_{j=1}^{M} 2^{-l_j} \leq [n l_max]^{1/n} must be satisfied for all n, the Kraft inequality \sum_{j=1}^{M} 2^{-l_j} \leq 1 must be satisfied.
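The contrapositive is easy to demonstrate by brute force: if the codeword lengths violate the Kraft inequality, two distinct source strings must eventually share an encoding, so the code cannot be uniquely decodable. A small Python sketch (the particular code below is an arbitrary choice for illustration, not part of the original solutions):

# Contrapositive of Exercise 2.11: a code whose lengths violate the Kraft
# inequality cannot be uniquely decodable. Brute-force search for a collision.
from itertools import product

code = {'a': '0', 'b': '00', 'c': '01', 'd': '10'}   # lengths (1,2,2,2), Kraft sum 1.25 > 1
print("Kraft sum:", sum(2 ** -len(w) for w in code.values()))

def find_collision(code, max_len=4):
    """Return two distinct source strings with identical encodings, if any."""
    seen = {}
    for n in range(1, max_len + 1):
        for letters in product(code, repeat=n):
            enc = ''.join(code[ch] for ch in letters)
            src = ''.join(letters)
            if enc in seen and seen[enc] != src:
                return seen[enc], src
            seen.setdefault(enc, src)
    return None

print(find_collision(code))   # e.g. ('b', 'aa'): both encode to '00'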
Exercise 2.13:

(a) In the Huffman algorithm, we start by combining p_3 and p_4. Since we have p_1 = p_3 + p_4 \geq p_2, we can combine p_1 and p_2 in the next step, leading to all codewords of length 2. We can also combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding codewords of lengths 1, 2, 3 and 3 respectively.

(b) Note that p_3 \leq p_2 and p_4 \leq p_2, so p_3 + p_4 \leq 2p_2. Thus p_1 = p_3 + p_4 \leq 2p_2, which implies p_1 + p_3 + p_4 \leq 4p_2. Since p_2 = 1 - p_1 - p_3 - p_4, the latter inequality implies that 1 - p_2 \leq 4p_2, or p_2 \geq 0.2. Since p_1 = p_3 + p_4 = (1 - p_2)/2, it follows that p_1 \leq 0.4. These bounds can be met by choosing p_2 = p_3 = p_4 = 0.2. Thus p_max = 0.4.

(c) Reasoning similarly to part (b), p_2 \leq p_1 and p_2 = 1 - p_1 - p_3 - p_4 = 1 - 2p_1. Thus 1 - 2p_1 \leq p_1, so p_1 \geq 1/3, i.e., p_min = 1/3. This bound is achievable by choosing p_1 = p_2 = 1/3 and p_3 = p_4 = 1/6.

(d) The argument in part (b) remains the same if we assume p_1 \leq p_3 + p_4 rather than p_1 = p_3 + p_4; i.e., p_1 \leq p_3 + p_4 implies that p_1 \leq p_max. Thus assuming p_1 > p_max implies that p_1 > p_3 + p_4. Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol 2 (or perhaps with symbol 1 if p_2 = p_1). Thus the codeword for symbol 1 (or perhaps the codeword for symbol 2) will have length 1.

(e) The lengths of any optimal prefix-free code must be either (1, 2, 3, 3) or (2, 2, 2, 2). If p_1 > p_max, then, from (d), p_1 > p_3 + p_4, so the lengths (1, 2, 3, 3) yield a lower average length than (2, 2, 2, 2).

(f) The argument in part (c) remains almost the same if we start with the assumption that p_1 \geq p_3 + p_4. In this case p_2 = 1 - p_1 - p_3 - p_4 \geq 1 - 2p_1. Combined with p_1 \geq p_2, we again have p_1 \geq p_min. Thus if p_1 < p_min, we must have p_3 + p_4 > p_1 \geq p_2. We then must combine p_1 and p_2 in the second step of the Huffman algorithm, so each codeword will have length 2.

(g) It turns out that p_max is still 2/5. To see this, first note that if p_1 = 2/5, p_2 = p_3 = 1/5, and all other symbols have an aggregate probability of 1/5, then the Huffman code construction combines the least likely symbols until they are tied together into a supersymbol of probability 1/5. The completion of the algorithm, as in part (b), can lead either to one codeword of length 1, or to 3 codewords of length 2 with the others longer. If p_1 > 2/5, then at each stage of the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol 1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then guarantees that the code will have one codeword of length 1.

Exercise 2.15:

(a) This is the same as Lemma 2.5.1.

(b) Since p_1 < p_{M-1} + p_M, we see that p_1 < p'_{M-1}, where p'_{M-1} is the probability of the node in the reduced code tree corresponding to letters M-1 and M in the original alphabet. Thus, by part (a), l_1 \geq l'_{M-1} = l_M - 1.

(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1, then l_1 = l_M; otherwise p_1 < p_M + p_k, so l_1 must be at least as large as the length of the immediate parent of M, showing that l_1 \geq l_M - 1.

(d) and (e) We have shown that the shortest and longest lengths differ by at most 1, with some number m \geq 1 of the lengths equal to l_1 and the remaining M - m lengths equal to l_1 + 1. Since the code tree is full, the Kraft inequality holds with equality, so m 2^{-l_1} + (M - m) 2^{-(l_1+1)} = 1, i.e., 2^{l_1+1} = 2m + (M - m) = M + m. From this it follows that l_1 = \lfloor \log_2 M \rfloor and m = 2^{l_1+1} - M.
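The length patterns discussed in Exercise 2.13 are easy to check numerically. The sketch below (illustrative only; the two probability vectors are arbitrary choices on either side of the thresholds) computes Huffman codeword lengths for M = 4 symbols.

# Huffman codeword lengths, to check the threshold behavior in Exercise 2.13:
# p_1 above p_max = 0.4 forces a length-1 codeword; p_1 below p_min = 1/3
# forces all lengths equal to 2 (for M = 4 symbols).
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return optimal (Huffman) codeword lengths for the given probabilities."""
    tiebreak = count()
    # Each heap entry: (probability, tiebreaker, list of leaf indices in the subtree)
    heap = [(p, next(tiebreak), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        p1, _, leaves1 = heapq.heappop(heap)
        p2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:
            lengths[leaf] += 1          # every merge adds one bit to each leaf below it
        heapq.heappush(heap, (p1 + p2, next(tiebreak), leaves1 + leaves2))
    return lengths

print(huffman_lengths([0.45, 0.25, 0.15, 0.15]))  # p_1 > 0.4  -> [1, 2, 3, 3]
print(huffman_lengths([0.30, 0.30, 0.20, 0.20]))  # p_1 < 1/3  -> [2, 2, 2, 2]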
Exercise 2.16:

(a) Grow a full ternary tree into a larger full ternary tree one step at a time. The smallest full tree has 3 leaves. For the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves from that node. We lose 1 leaf but gain 3 new ones, a net gain of 2 leaves at each growth extension. Thus, M = 3 + 2n for n a non-negative integer.

(b) It is clear that for optimality, all the unused leaves in the tree must have the same length as the longest codeword. For M even, combine the 2 lowest-probability symbols into a node at the first step, then combine the 3 lowest-probability nodes at each of the remaining steps until the root node is reached. If M is odd, a full ternary tree is possible, so combine the 3 lowest-probability nodes at each step.

(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}.

Exercise 2.18:

(a) Applying the Huffman coding algorithm to the code with M + 1 symbols with p_{M+1} = 0, we combine symbol M + 1 with symbol M, and the reduced code has M symbols with probabilities p_1, ..., p_M. The Huffman code for this reduced set of symbols is simply the code for the original set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the reduced code modified by a unit length increase in the codeword for symbol M. Thus \bar{L} = \bar{L}' + p_M, where \bar{L}' is the expected length for the code with M symbols.

(b) All n of the zero-probability symbols are combined together in the Huffman algorithm, and the reduced code resulting from this combination is then the same as the code with M + 1 symbols in part (a). Thus \bar{L} = \bar{L}' + p_M again.

Exercise 2.19:

(a) The entropies H(X), H(Y), and H(XY) can be expressed as

   H(XY) = - \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_{XY}(x, y)
   H(X)  = - \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_X(x)
   H(Y)  = - \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_Y(y).

It is assumed that all symbol pairs x, y of zero probability have been removed from these sums, and thus all x (y) for which p_X(x) = 0 (p_Y(y) = 0) are consequently removed. Combining these equations,

   H(XY) - H(X) - H(Y) = \sum_{x \in X, y \in Y} p_{XY}(x, y) \log \frac{p_X(x) p_Y(y)}{p_{XY}(x, y)}.

(b) Using the standard inequality \log x \leq (x - 1) \log e,

   H(XY) - H(X) - H(Y) \leq \sum_{x \in X, y \in Y} p_{XY}(x, y) \Big[ \frac{p_X(x) p_Y(y)}{p_{XY}(x, y)} - 1 \Big] \log e = 0.

Thus H(XY) \leq H(X) + H(Y). Note that this inequality is satisfied with equality if and only if X and Y are independent.
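A quick numerical check of part (b) (not part of the original solutions): the sketch below computes H(XY), H(X) and H(Y) from a joint pmf, showing strict inequality for the dependent example of Exercise 2.2(c) and equality for an independent pair; the particular pmfs are arbitrary illustrations.

# Exercise 2.19 check: H(XY) <= H(X) + H(Y), with equality iff X, Y independent.
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def joint_entropies(pxy):
    """pxy: {(x, y): probability}. Returns H(XY), H(X), H(Y)."""
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0) + p
        py[y] = py.get(y, 0) + p
    return H(pxy.values()), H(px.values()), H(py.values())

# Dependent pair (the binary example from Exercise 2.2(c)): strict inequality.
dep = {(0, 1): 0.5, (1, 0): 0.5}
Hxy, Hx, Hy = joint_entropies(dep)
print(Hxy, Hx + Hy)       # 1.0 < 2.0

# Independent pair: equality (up to floating point).
ind = {(x, y): qx * qy for x, qx in [(0, 0.3), (1, 0.7)] for y, qy in [(0, 0.6), (1, 0.4)]}
Hxy, Hx, Hy = joint_entropies(ind)
print(Hxy, Hx + Hy)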
(c) For n symbols X_1, ..., X_n, let Y be the 'super-symbol' (X_2, ..., X_n). Then, using (b),

   H(X_1, ..., X_n) = H(X_1, Y) \leq H(X_1) + H(Y) = H(X_1) + H(X_2, ..., X_n).

Iterating this gives the desired result. An alternate approach generalizes part (b) in the following way:

   H(X_1, ..., X_n) - \sum_i H(X_i) = \sum_{x_1, ..., x_n} p(x_1, ..., x_n) \log \frac{p(x_1) \cdots p(x_n)}{p(x_1, ..., x_n)} \leq 0,

where we have used \log x \leq (x - 1) \log e again.

Exercise 2.20:

(a) Y is 1 if X = 1, which occurs with probability p_1. Y is 0 otherwise. Thus

   H(Y) = -p_1 \log(p_1) - (1 - p_1) \log(1 - p_1) = H_b(p_1).

(b) Given Y = 1, X = 1 with probability 1, so H(X | Y=1) = 0.

(c) Given Y = 0, X = 1 has probability 0, so X has M - 1 possible choices with non-zero probability. The maximum entropy for an alphabet of size M - 1 is \log(M - 1), so H(X | Y=0) \leq \log(M - 1). This upper bound is met with equality if Pr(X=j | X \neq 1) = 1/(M-1) for all j \neq 1. Since Pr(X=j | X \neq 1) = p_j/(1 - p_1), this upper bound on H(X | Y=0) is achieved when p_2 = p_3 = ... = p_M. Combining this with part (b),

   H(X | Y) = p_1 H(X | Y=1) + (1 - p_1) H(X | Y=0) \leq (1 - p_1) \log(M - 1).

(d) Note that

   H(XY) = H(Y) + H(X | Y) \leq H_b(p_1) + (1 - p_1) \log(M - 1),

and this is met with equality for p_2 = ... = p_M. There are now two reasonable approaches. One is to note that H(XY) can also be expressed as H(X) + H(Y | X). Since Y is uniquely specified by X, H(Y | X) = 0, so

   H(X) = H(XY) \leq H_b(p_1) + (1 - p_1) \log(M - 1),     (3)

with equality when p_2 = p_3 = ... = p_M. The other approach is to observe that H(X) \leq H(XY), which again leads to (3), but this does not immediately imply that equality is met for p_2 = ... = p_M. Equation (3) is the Fano bound of information theory; it is useful when p_1 is very close to 1 and plays a key role in the noisy-channel coding theorem.

(e) The same bound applies to each symbol by replacing p_1 by p_j for any j, 1 \leq j \leq M. Thus it also applies to p_max.

Exercise 2.22: One way to generate a source code for (X_1, X_2, X_3) is to concatenate a Huffman code for (X_1, X_2) with a Huffman code for X_3. The expected length of the resulting code for (X_1, X_2, X_3) is 2\bar{L}_{min,2} + \bar{L}_{min}. The expected length per source letter of this code is (2\bar{L}_{min,2} + \bar{L}_{min})/3.