SELECTED SOLUTIONS TO
PRINCIPLES OF DIGITAL
COMMUNICATION
Cambridge University Press, 2008
by ROBERT G. GALLAGER
A complete set of solutions is available from Cambridge University Press for instructors teaching a class
using this text. This is a subset of solutions that I feel would be valuable for those studying the
subject on their own.
Chapter 2
Exercise 2.2:
(a) V + W is a random variable, so its expectation, by definition, is
E[V + W] = \sum_{v \in V} \sum_{w \in W} (v + w) p_{VW}(v, w)
         = \sum_{v \in V} \sum_{w \in W} v p_{VW}(v, w) + \sum_{v \in V} \sum_{w \in W} w p_{VW}(v, w)
         = \sum_{v \in V} v \sum_{w \in W} p_{VW}(v, w) + \sum_{w \in W} w \sum_{v \in V} p_{VW}(v, w)
         = \sum_{v \in V} v p_V(v) + \sum_{w \in W} w p_W(w)
         = E[V] + E[W].
(b) Once again, working from first principles,
E[V · W] = \sum_{v \in V} \sum_{w \in W} (v · w) p_{VW}(v, w)
         = \sum_{v \in V} \sum_{w \in W} (v · w) p_V(v) p_W(w)        (using independence)
         = \sum_{v \in V} v p_V(v) \sum_{w \in W} w p_W(w)
         = E[V] · E[W].
(c) To discover a case where E[V · W] ≠ E[V] · E[W], first try the simplest kind of example where
V and W are binary with the joint pmf
pV W (0, 1) = pV W (1, 0) = 1/2;
pV W (0, 0) = pV W (1, 1) = 0.
Clearly, V and W are not independent. Also, E[V · W ] = 0 whereas E[V ] = E[W ] = 1/2 and
hence E[V ] · E[W ] = 1/4.
The second case requires some experimentation. One approach is to choose a joint distribution
such that E[V · W ] = 0 and E[V ] = 0. A simple solution is then given by the pmf,
pV W (−1, 0) = pV W (0, 1) = pV W (1, 0) = 1/3.
Again, V and W are not independent. Clearly, E[V · W ] = 0. Also, E[V ] = 0 (what is E[W ]?).
Hence, E[V · W ] = E[V ] · E[W ].
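These claims are easy to check numerically. The following short Python sketch (my own illustration, not part of the original solution; the dictionary representation and the helper name expectations are mine) evaluates E[V · W] and E[V] · E[W] directly from each of the two joint pmfs above.

# Exercise 2.2(c) check: evaluate E[V*W] and E[V]*E[W] from a joint pmf
# represented as a dictionary {(v, w): probability}.

def expectations(pmf):
    EV = sum(v * p for (v, w), p in pmf.items())
    EW = sum(w * p for (v, w), p in pmf.items())
    EVW = sum(v * w * p for (v, w), p in pmf.items())
    return EV, EW, EVW

pmf1 = {(0, 1): 0.5, (1, 0): 0.5}                  # first example above
pmf2 = {(-1, 0): 1/3, (0, 1): 1/3, (1, 0): 1/3}    # second example above

for name, pmf in [("pmf1", pmf1), ("pmf2", pmf2)]:
    EV, EW, EVW = expectations(pmf)
    print(name, "E[VW] =", EVW, " E[V]E[W] =", EV * EW)
# pmf1: E[VW] = 0.0 while E[V]E[W] = 0.25 (not equal)
# pmf2: E[VW] = 0.0 and E[V]E[W] = 0.0 (equal, although V and W are dependent)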
(d)
σ^2_{V+W} = E[(V + W)^2] − (E[V + W])^2
          = E[V^2] + E[W^2] + 2E[V · W] − (E[V] + E[W])^2
          = E[V^2] + E[W^2] + 2E[V] · E[W] − E[V]^2 − E[W]^2 − 2E[V] · E[W]        (using independence, as in (b))
          = E[V^2] − E[V]^2 + E[W^2] − E[W]^2
          = σ^2_V + σ^2_W.
Exercise 2.4:
(a) Since X1 and X2 are iid, symmetry implies that Pr(X1 > X2) = Pr(X2 > X1). These two
events are mutually exclusive and the event X1 = X2 has 0 probability. Thus Pr(X1 > X2) and
Pr(X1 < X2) sum to 1, so each must be 1/2. Thus Pr(X1 ≥ X2) = Pr(X2 ≥ X1) = 1/2.
(b) Invoking the symmetry among X1, X2 and X3, we see that each has the same probability of
being the smallest of the three (the probability of a tie is 0). These three events are mutually
exclusive and their probabilities must add up to 1. Therefore each event occurs with probability
1/3.
(c) The event {N > n} is the same as the event {X1 is the minimum among the n iid
random variables X1, X2, ··· , Xn}. By extending the argument in part (b), we see that
Pr(X1 is the smallest of X1, . . . , Xn) = 1/n. Finally, Pr{N ≥ n} = Pr{N > n − 1} = 1/(n − 1) for
n ≥ 2.
(d) Since N is a non-negative integer random variable (taking on values from 2 to ∞), we can
use Exercise 2.3(a) as follows:
E[N] = \sum_{n=1}^{\infty} Pr{N ≥ n}
     = Pr{N ≥ 1} + \sum_{n=2}^{\infty} Pr{N ≥ n}
     = 1 + \sum_{n=2}^{\infty} \frac{1}{n − 1}
     = 1 + \sum_{n=1}^{\infty} \frac{1}{n}.

Since the series \sum_{n=1}^{\infty} 1/n diverges, we conclude that E[N] = ∞.
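A small simulation (my own illustration, not part of the original solution) supports parts (c) and (d). Taking N, as above, to be the index of the first sample that falls below X1, the empirical value of Pr{N ≥ n} tracks 1/(n − 1), while the truncated sample mean of N keeps growing with the truncation point, consistent with E[N] = ∞.

# Monte Carlo illustration for Exercise 2.4(c)-(d): with N the index of the
# first X_n satisfying X_n < X_1, we expect Pr{N >= n} = 1/(n-1) for n >= 2.
import random

def sample_N(cap=10**6):
    """Return N = first index n >= 2 with X_n < X_1 (capped so every run terminates)."""
    x1 = random.random()
    for n in range(2, cap + 1):
        if random.random() < x1:   # this draw is X_n, and X_n < X_1
            return n
    return cap                     # truncated; the true N would be even larger

trials = 200_000
samples = [sample_N() for _ in range(trials)]

for n in (2, 3, 5, 10):
    empirical = sum(s >= n for s in samples) / trials
    print(f"Pr{{N >= {n}}} ~ {empirical:.4f}   (theory: {1/(n-1):.4f})")

# The truncated mean grows roughly like ln(cap), consistent with E[N] = infinity.
print("truncated sample mean of N:", sum(samples) / trials)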
(e) Since the alphabet has a finite number of letters (the same results can be obtained, with some
extra work, for a countably infinite discrete alphabet), Pr(X1 = X2) is no longer 0 and depends
on the particular probability distribution. Thus, although Pr(X1 ≥ X2) = Pr(X2 ≥ X1) by
symmetry, neither can be found without knowing the distribution.
Out of the alphabet letters with nonzero probability, let amin be a letter of minimum numeric
value. If X1 = amin, then no subsequent rv X2, X3, . . . can have a smaller value, so N = ∞ in
this case. Since the event X1 = amin occurs with positive probability, E[N] = ∞.
Exercise 2.6:
(a) Assume the contrary; i.e., there is a suffix-free code that is not uniquely decodable. Then that
code must contain two distinct sequences of source letters, say, x1, x2, . . . , xn and x′1, x′2, . . . , x′m,
such that
C(x1)C(x2) . . . C(xn) = C(x′1)C(x′2) . . . C(x′m).
Then one of the following must hold:
• C(xn) = C(x′m)
• C(xn) is a suffix of C(x′m)
• C(x′m) is a suffix of C(xn).
In the last two cases we arrive at a contradiction since the code is hypothesized to be suffix-free.
In the first case, xn must equal x′m because of suffix-freedom. Simply delete that final letter
from each sequence and repeat the argument. Since the sequences are distinct, the final letter
must differ after some number of repetitions of the above argument, and at that point one of
the latter two cases holds and a contradiction is reached.
Hence, suffix-free codes are uniquely decodable.
(b) Any prefix-free code becomes a suffix-free code if the ordering of the symbols in each codeword
is reversed. Perhaps the simplest such example is {0, 01, 11}, which can be seen to be a suffix-free
code (with codeword lengths {1, 2, 2}) but not a prefix-free code.
A codeword in this code cannot always be decoded as soon as its last bit arrives at the decoder.
To illustrate a rather extreme case, consider the following output produced by the encoder:

0111111111 . . .

Assuming that the source letters {a, b, c} map to {0, 01, 11}, we cannot distinguish between the
two possible source sequences
acccccccc . . .
and
bcccccccc . . .
till the end of the string is reached. Hence, in this case the decoder might have to wait for an
arbitrarily long time before decoding.
(c) There cannot be any code with codeword lengths (1, 2, 2) that is both prefix-free and suffix-free.
Without loss of generality, set C1 = 0. Then a prefix-free code cannot use either of the
codewords 00 or 01 for C2 or C3, and thus must use 10 and 11; but 10 ends in the codeword 0,
so the resulting code is not suffix-free.
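Both claims are easy to verify mechanically. The sketch below (my own illustration; the helper names prefix_free and suffix_free are mine) confirms the properties of {0, 01, 11} and searches every binary code with lengths (1, 2, 2) to show that none is both prefix-free and suffix-free.

# Exercise 2.6 illustration: prefix/suffix-freedom of small binary codes.
from itertools import product

def prefix_free(code):
    return not any(a != b and b.startswith(a) for a in code for b in code)

def suffix_free(code):
    return not any(a != b and b.endswith(a) for a in code for b in code)

c = ["0", "01", "11"]
print(prefix_free(c), suffix_free(c))   # False True: suffix-free but not prefix-free

# Part (c): no binary code with lengths (1, 2, 2) is both prefix-free and suffix-free.
two_bit = ["".join(bits) for bits in product("01", repeat=2)]
found = [[c1, c2, c3]
         for c1 in "01" for c2, c3 in product(two_bit, repeat=2)
         if c2 != c3 and prefix_free([c1, c2, c3]) and suffix_free([c1, c2, c3])]
print("codes with lengths (1, 2, 2) that are both prefix- and suffix-free:", found)  # []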
Exercise 2.7: Consider the set of codeword lengths (1, 2, 2) and arrange them as (2, 1, 2). Then
u1 = 0 is represented as 0.00. Next, u2 = 2^{-2} = 1/4 = 0.01 must be represented using only 1 bit
after the binary point, which is not possible. Hence, the algorithm fails.
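The construction can be sketched in a few lines (my own reading of the algorithm in the exercise: codeword j is the first lj bits of the binary expansion of uj = Σ_{i<j} 2^{-li}; the helper name assign_codewords is mine). It succeeds for the sorted lengths (1, 2, 2) and fails for the arrangement (2, 1, 2), exactly as described above.

# Exercise 2.7 illustration: assign codeword j as the first l_j bits of the binary
# expansion of u_j = sum_{i<j} 2^{-l_i}; this works when the lengths are sorted in
# nondecreasing order but can fail otherwise.
from fractions import Fraction

def assign_codewords(lengths):
    codewords, u = [], Fraction(0)
    for l in lengths:
        bits, frac = "", u              # expand u to l binary places
        for _ in range(l):
            frac *= 2
            bit, frac = divmod(frac, 1)
            bits += str(bit)
        if frac != 0:
            raise ValueError(f"u = {u} needs more than {l} bits after the binary point")
        codewords.append(bits)
        u += Fraction(1, 2 ** l)
    return codewords

print(assign_codewords([1, 2, 2]))      # ['0', '10', '11']
try:
    assign_codewords([2, 1, 2])
except ValueError as e:
    print("arrangement (2, 1, 2) fails:", e)   # u2 = 1/4 = 0.01 does not fit in 1 bit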
Exercise 2.9:
(a) Assume, as usual, that pj > 0 for each j. From Eqs. (2.8) and (2.9)
H[X] − L = \sum_{j=1}^{M} p_j \log \frac{2^{-l_j}}{p_j} ≤ \sum_{j=1}^{M} p_j \left[ \frac{2^{-l_j}}{p_j} − 1 \right] \log e = 0.
As is evident from Figure 2.7, the inequality is strict unless 2^{-lj} = pj for each j. Thus if
H[X] = L, it follows that 2^{-lj} = pj for each j.
(b) First consider Figure 2.4, repeated below, assuming that Pr(a) = 1/2 and Pr(b) = Pr(c) =
1/4. The first order node 0 corresponds to the letter a and has probability 1/2. The first order
node 1 corresponds to the occurrence of either letter b or c, and thus has probability 1/2.
[Figure 2.4, repeated: the binary code tree in which the first-order node 0 is the leaf for letter a and node 1 is the root of the subtree for letters b and c; the second-order nodes are 00 (aa), 01 (ab or ac), 10 (letter c), and 11 (letter b); the leaves are the nine two-letter sequences, with the codeword assignment listed below.]
aa → 00
ab → 011
ac → 010
ba → 110
bb → 1111
bc → 1110
ca → 100
cb → 1011
cc → 1010
Similarly, the second order node 00 corresponds to aa, which has probability 1/4, and the second
order node 01 corresponds to either ab or ac, which have cumulative probability 1/4. In the
same way, the nodes 11 and 10 correspond to b and c respectively, with probabilities 1/4 each. One can proceed with
higher order nodes in the same way, but what is the principle behind this?
In general, when an infinite binary tree is used to represent an unending sequence of letters
from an iid source where each letter j has probability pj = 2^{-ℓj} and codeword length ℓj, we see that each
node corresponding to an initial sequence of letters x1, . . . , xn has a probability \prod_i 2^{-ℓ_{x_i}} equal
to the product of the individual letter probabilities, and an order equal to \sum_i ℓ_{x_i}. Thus each
node labelled by a subsequence of letters has a probability 2^{-ℓ}, where ℓ is the order of that node.
The other nodes (those unlabelled in the example above) have a probability equal to the sum of
the probabilities of the immediately following labelled nodes. This probability is again 2^{-ℓ} for an ℓth order node,
which can be established by induction if one wishes to be formal.
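Reading the single-letter codewords off the table above (a → 0, b → 11, c → 10), this principle can be checked mechanically; the short sketch below is my own illustration.

# Exercise 2.9(b) illustration: when p_j = 2^{-l_j}, the tree node reached by any
# letter sequence has probability 2^{-order}, where order = total codeword length.
from itertools import product

code = {"a": "0", "b": "11", "c": "10"}
prob = {"a": 0.5, "b": 0.25, "c": 0.25}

for n in (1, 2, 3):
    for seq in product("abc", repeat=n):
        label = "".join(code[x] for x in seq)   # node reached in the binary tree
        p_seq = 1.0
        for x in seq:
            p_seq *= prob[x]
        assert abs(p_seq - 2.0 ** -len(label)) < 1e-12
print("every labelled node of order l has probability 2^-l (checked for n = 1, 2, 3)")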
Exercise 2.11: (a) For n = 2,
\left( \sum_{j=1}^{M} 2^{-l_j} \right)^2 = \left( \sum_{j_1=1}^{M} 2^{-l_{j_1}} \right) \left( \sum_{j_2=1}^{M} 2^{-l_{j_2}} \right) = \sum_{j_1=1}^{M} \sum_{j_2=1}^{M} 2^{-(l_{j_1} + l_{j_2})}.
The same approach works for arbitrary n.
(b) Each source n-tuple x^n = (a_{j_1}, a_{j_2}, . . . , a_{j_n}) is encoded into a concatenation
C(a_{j_1})C(a_{j_2}) . . . C(a_{j_n}) of binary digits of aggregate length l(x^n) = l_{j_1} + l_{j_2} + · · · + l_{j_n}. Since
there is one n-tuple x^n for each choice of a_{j_1}, a_{j_2}, . . . , a_{j_n}, the result of part (a) can be rewritten
as

\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n = \sum_{x^n} 2^{-l(x^n)}.        (1)
(c) Rewriting (1) in terms of the number Ai of concatenations of n codewords of aggregate length
i,
\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n = \sum_{i=1}^{n l_{max}} A_i 2^{-i}.
This uses the fact that since each codeword has length at most lmax, each concatenation has
length at most nlmax.
(d) From unique decodability, each of these concatenations must be different, so there are at
most 2^i concatenations of aggregate length i, i.e., Ai ≤ 2^i. Thus, since the above sum contains
at most nlmax terms,
\left( \sum_{j=1}^{M} 2^{-l_j} \right)^n ≤ n l_{max}.        (2)

(e) Note that

[n l_{max}]^{1/n} = \exp\left( \frac{\ln(n l_{max})}{n} \right) −→ \exp(0) = 1

as n → ∞. Since (2) must be satisfied for all n, the Kraft inequality \sum_{j=1}^{M} 2^{-l_j} ≤ 1 must be satisfied.
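The counting identity of parts (b)-(d) can also be checked by brute force for a small uniquely decodable code (my own illustration; the prefix-free code {0, 10, 110, 111} is an arbitrary choice).

# Exercise 2.11 illustration: tally concatenations of n codewords by aggregate
# length and check (sum_j 2^-lj)^n = sum_i A_i 2^-i together with A_i <= 2^i.
from collections import Counter
from itertools import product

code = ["0", "10", "110", "111"]           # prefix-free, hence uniquely decodable
kraft = sum(2.0 ** -len(c) for c in code)  # equals 1.0 for this code

for n in (1, 2, 3):
    A = Counter(len("".join(combo)) for combo in product(code, repeat=n))
    assert abs(sum(count * 2.0 ** -i for i, count in A.items()) - kraft ** n) < 1e-12
    assert all(count <= 2 ** i for i, count in A.items())
print("identity of part (c) and the bound A_i <= 2^i verified for n = 1, 2, 3")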
Exercise 2.13:
(a) In the Huffman algorithm, we start by combining p3 and p4. Since we have p1 = p3 +p4 ≥ p2,
we can combine p1 and p2 in the next step, leading to all codewords of length 2. We can also
combine the supersymbol obtained by combining symbols 3 and 4 with symbol 2, yielding
codewords of lengths 1,2,3 and 3 respectively.
(b) Note that p3 ≤ p2 and p4 ≤ p2, so p1 = p3 + p4 ≤ 2p2, which implies p1 + p3 + p4 ≤ 4p2.
Since p2 = 1 − p1 − p3 − p4, the latter inequality implies that 1 − p2 ≤ 4p2, or p2 ≥ 0.2. Also,
since p1 = p3 + p4, we have 2p1 + p2 = 1, so p1 = (1 − p2)/2 ≤ (1 − 0.2)/2 = 0.4. This bound is
met with equality by choosing p1 = 0.4 and p2 = p3 = p4 = 0.2. Thus pmax = 0.4.
(c) Reasoning similarly to part (b), p2 ≤ p1 and p2 = 1 − p1 − p3 − p4 = 1 − 2p1. Thus
1 − 2p1 ≤ p1 so p1 ≥ 1/3, i.e., pmin = 1/3. This bound is achievable by choosing p1 = p2 = 1/3
and p3 = p4 = 1/6.
(d) The argument in part (b) remains the same if we assume p1 ≤ p3+p4 rather than p1 = p3+p4,
i.e., p1 ≤ p3 + p4 implies that p1 ≤ pmax. Thus assuming p1 > pmax implies that p1 > p3 + p4.
Thus the supersymbol obtained by combining symbols 3 and 4 will be combined with symbol
2 (or perhaps with symbol 1 if p2 = p1). Thus the codeword for symbol 1 (or perhaps the
codeword for symbol 2) will have length 1.
(e) The lengths of any optimal prefix-free code for four symbols must be either (1, 2, 3, 3) or
(2, 2, 2, 2). The expected length for (2, 2, 2, 2) exceeds that for (1, 2, 3, 3) by exactly p1 − (p3 + p4).
If p1 > pmax, then, from (d), p1 > p3 + p4, so the lengths (1, 2, 3, 3) yield a lower average length
than (2, 2, 2, 2).
(f) The argument in part (c) remains almost the same if we start with the assumption that
p1 ≥ p3 + p4. In this case p2 = 1 − p1 − p3 − p4 ≥ 1 − 2p1. Combined with p1 ≥ p2, we again
have p1 ≥ pmin. Thus if p1 < pmin, we must have p3 + p4 > p1 ≥ p2. We then must combine p1
and p2 in the second step of the Huffman algorithm, so each codeword will have length 2.
(g) It turns out that pmax is still 2/5. To see this, first note that if p1 = 2/5, p2 = p3 = 1/5
and all other symbols have an aggregate probability of 1/5, then the Huffman code construction
combines the least likely symbols until they are tied together into a supersymbol of probability
1/5. The completion of the algorithm, as in part (b), can lead to either one codeword of length
1 or 3 codewords of length 2 and the others of longer length. If p1 > 2/5, then at each stage of
the algorithm, two nodes of aggregate probability less than 2/5 are combined, leaving symbol
1 unattached until only 4 nodes remain in the reduced symbol set. The argument in (d) then
guarantees that the code will have one codeword of length 1.
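These thresholds can be confirmed with a short heapq-based Huffman sketch (my own illustration, not part of the original solution; the helper name huffman_lengths is mine) that returns the codeword lengths for a given set of probabilities.

# Exercise 2.13 illustration: optimal (Huffman) codeword lengths for M = 4 symbols.
import heapq
from itertools import count

def huffman_lengths(probs):
    """Return binary Huffman codeword lengths, in the same order as probs."""
    tie = count()                        # tie-breaker so the heap never compares lists
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    while len(heap) > 1:
        pa, _, sa = heapq.heappop(heap)
        pb, _, sb = heapq.heappop(heap)
        for i in sa + sb:                # every symbol under the merged node gains one bit
            lengths[i] += 1
        heapq.heappush(heap, (pa + pb, next(tie), sa + sb))
    return lengths

print(huffman_lengths([0.41, 0.295, 0.15, 0.145]))  # p1 > 0.4: lengths (1, 2, 3, 3)
print(huffman_lengths([0.40, 0.20, 0.20, 0.20]))    # boundary: (2,2,2,2) and (1,2,3,3) both give 2 bits
print(huffman_lengths([0.28, 0.27, 0.24, 0.21]))    # p1 < 1/3: lengths (2, 2, 2, 2)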
Exercise 2.15:
(a) This is the same as Lemma 2.5.1.
(b) Since p1 < pM−1 + pM, we see that p1 < p′M−1, where p′M−1 is the probability of the node
in the reduced code tree corresponding to letters M − 1 and M in the original alphabet. Thus,
by part (a), l1 ≥ l′M−1 = lM − 1.
(c) Consider an arbitrary minimum-expected-length code tree. This code tree must be full (by
Lemma 2.5.2), so suppose that symbol k is the sibling of symbol M in this tree. If k = 1,
then l1 = lM. Otherwise k ≤ M − 1, so pk ≥ pM−1 and hence p1 < pM−1 + pM ≤ pk + pM; thus l1
must be at least as large as the length of the immediate parent of M, showing that l1 ≥ lM − 1.
(d) and (e) We have shown that the shortest and longest lengths differ by at most 1, with some
number m ≥ 1 of lengths equal to l1 and the remaining M − m lengths equal to l1 + 1. Since the
optimal code tree is full, the Kraft sum is exactly 1: m 2^{-l1} + (M − m) 2^{-(l1+1)} = 1, i.e.,
2^{l1+1} = 2m + (M − m) = M + m. From this it follows that l1 = ⌊log2(M)⌋ and m = 2^{l1+1} − M.
Exercise 2.16:
(a) Grow full ternary trees step by step. The smallest full ternary tree has 3 leaves. For
the next largest full tree, convert one of the leaves into an intermediate node and grow 3 leaves
from that node: we lose 1 leaf but gain 3, for a net gain of 2 leaves at each growth extension.
Thus M = 3 + 2n for n a non-negative integer.
(b) It is clear that for optimality, all the unused leaves in the tree must have the same length
as the longest codeword. For M even, combine the 2 lowest probabilities into a node at the
first step, then combine the 3 lowest probability nodes for all the rest of the steps until the root
node. If M is odd, a full ternary tree is possible, so combine the 3 lowest probability nodes at
each step.
(c) If {a, b, c, d, e, f} have symbol probabilities {0.3, 0.2, 0.2, 0.1, 0.1, 0.1} respectively, then the
ternary Huffman code will be {a → 0, b → 1, c → 20, d → 21, e → 220, f → 221}.
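The procedure of part (b) can be sketched in a few lines (my own illustration; ternary_huffman_lengths is a name of my choosing). Applied to the probabilities of part (c), it produces an optimal set of lengths whose expected length, 1.7 ternary digits, matches the code given above; equal-probability ties may be grouped differently.

# Exercise 2.16 illustration: ternary Huffman as in part (b): when M is even,
# combine the 2 least likely nodes first, then 3 at a time until one node remains.
import heapq
from itertools import count

def ternary_huffman_lengths(probs):
    tie = count()
    heap = [(p, next(tie), [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    group = 2 if len(probs) % 2 == 0 else 3    # M even: only 2 nodes in the first merge
    while len(heap) > 1:
        merged_p, merged_syms = 0.0, []
        for _ in range(min(group, len(heap))):
            p, _, syms = heapq.heappop(heap)
            merged_p += p
            merged_syms += syms
        for i in merged_syms:                  # each symbol under the merge gains one ternary digit
            lengths[i] += 1
        heapq.heappush(heap, (merged_p, next(tie), merged_syms))
        group = 3                              # all later merges are 3-way
    return lengths

probs = [0.3, 0.2, 0.2, 0.1, 0.1, 0.1]
lengths = ternary_huffman_lengths(probs)
print(lengths, sum(p * l for p, l in zip(probs, lengths)))   # expected length 1.7 ternary digits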
Exercise 2.18:
(a) Applying the Huffman coding algorithm to the code with M + 1 symbols with pM+1 = 0, we
combine symbol M + 1 with symbol M and the reduced code has M symbols with probabilities
p1, . . . , pM. The Huffman code for this reduced set of symbols is simply the code for the original
set of symbols with symbol M + 1 eliminated. Thus the code including symbol M + 1 is the
reduced code modified by a unit length increase in the codeword for symbol M. Thus L = L′ + pM,
where L′ is the expected length for the code with M symbols.
(b) All n of the zero probability symbols are combined together in the Huffman algorithm, and
the reduced code from this combination is then the same as the code with M + 1 symbols in
part (a). Thus L = L′ + pM again.
Exercise 2.19:
(a) The entropies H(X), H(Y ), and H(XY ) can be expressed as
H(XY) = − \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_{XY}(x, y)
H(X) = − \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_X(x)
H(Y) = − \sum_{x \in X, y \in Y} p_{XY}(x, y) \log p_Y(y).
It is assumed that all symbol pairs x, y of zero probability have been removed from this sum,
and thus all x (y) for which pX(x) = 0 ( pY (y) = 0) are consequently removed. Combining these
equations,
H(XY) − H(X) − H(Y) = \sum_{x \in X, y \in Y} p_{XY}(x, y) \log \frac{p_X(x) p_Y(y)}{p_{XY}(x, y)}.

(b) Using the standard inequality \log x ≤ (x − 1) \log e,

H(XY) − H(X) − H(Y) ≤ \sum_{x \in X, y \in Y} p_{XY}(x, y) \left[ \frac{p_X(x) p_Y(y)}{p_{XY}(x, y)} − 1 \right] \log e = 0.
Thus H(X, Y ) ≤ H(X) + H(Y ). Note that this inequality is satisfied with equality if and only if
X and Y are independent.
(c) For n symbols, X1, . . . , Xn, let Y be the ‘super-symbol’ X2, . . . , Xn. Then using (b),
H(X1, . . . , Xn) = H(X1, Y ) ≤ H(X1) + H(Y ) = H(X1) + H(X2, . . . , Xn).
Iterating this gives the desired result.
An alternate approach generalizes part (b) in the following way:
H(X1, . . . , Xn) − \sum_i H(Xi) = \sum_{x_1, . . . , x_n} p(x_1, . . . , x_n) \log \frac{p(x_1) · · · p(x_n)}{p(x_1, . . . , x_n)} ≤ 0,

where we have used log x ≤ (x − 1) log e again.
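A quick numerical illustration of part (b) (my own, not from the text): for a dependent joint pmf the inequality is strict, while for a product pmf it holds with equality.

# Exercise 2.19(b) illustration: H(XY) <= H(X) + H(Y), with equality iff independent.
from math import log2

def H(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

def compare(joint):                        # joint pmf as a dict {(x, y): probability}
    xs = {x for x, _ in joint}
    ys = {y for _, y in joint}
    px = {x: sum(p for (a, _), p in joint.items() if a == x) for x in xs}
    py = {y: sum(p for (_, b), p in joint.items() if b == y) for y in ys}
    return H(joint.values()), H(px.values()) + H(py.values())

dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
independent = {(x, y): qx * qy for x, qx in [(0, 0.5), (1, 0.5)]
                               for y, qy in [(0, 0.7), (1, 0.3)]}
print(compare(dependent))     # H(XY) is strictly less than H(X) + H(Y)
print(compare(independent))   # the two values agree (up to floating point)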
Exercise 2.20:
(a) Y is 1 if X = 1, which occurs with probability p1. Y is 0 otherwise. Thus
H(Y ) = −p1 log(p1) − (1 − p1) log(1 − p1) = Hb(p1).
(b) Given Y =1, X = 1 with probability 1, so H(X | Y =1) = 0.
(c) Given Y =0, X=1 has probability 0, so X has M−1 possible choices with non-zero probability.
The maximum entropy for an alphabet of size M−1 is log(M−1), so H(X | Y =0) ≤ log(M−1).
This upper bound is met with equality if Pr(X=j | X≠1) = 1/(M−1) for all j ≠ 1. Since
Pr(X=j | X≠1) = pj/(1 − p1), this upper bound on H(X | Y =0) is achieved when p2 = p3 =
··· = pM. Combining this with part (b),

H(X | Y) = p1 H(X | Y =1) + (1−p1) H(X | Y =0) ≤ (1−p1) log(M − 1).
(d) Note that
H(XY ) = H(Y ) + H(X|Y ) ≤ Hb(p1) + (1−p1) log(M−1)
and this is met with equality for p2 = ··· = pM. There are now two reasonable approaches. One
is to note that H(XY) can also be expressed as H(X) + H(Y|X). Since Y is uniquely specified
by X, H(Y|X) = 0, so

H(X) = H(XY) ≤ Hb(p1) + (1 − p1) log(M − 1),        (3)

with equality when p2 = p3 = ··· = pM. The other approach is to observe that H(X) ≤ H(XY),
which again leads to (3), but this does not immediately imply that equality is met for p2 = ··· = pM.
Equation (3) is the Fano bound of information theory; it is useful when p1 is very close to 1 and
plays a key role in the noisy channel coding theorem.
(e) The same bound applies to each symbol by replacing p1 by pj for any j, 1 ≤ j ≤ M. Thus
it also applies to pmax.
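A short numerical check of the bound (3) (again my own illustration): H(X) ≤ Hb(p1) + (1 − p1) log(M − 1), with equality when the remaining M − 1 probabilities are equal.

# Exercise 2.20 illustration: the Fano bound H(X) <= Hb(p1) + (1 - p1) log(M - 1),
# met with equality when the remaining M - 1 symbols are equiprobable.
from math import log2

def H(probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def fano_bound(p1, M):
    Hb = -p1 * log2(p1) - (1 - p1) * log2(1 - p1)
    return Hb + (1 - p1) * log2(M - 1)

p_equal = [0.7, 0.1, 0.1, 0.1]           # p2 = p3 = p4: bound met with equality
p_skew = [0.7, 0.2, 0.06, 0.04]          # otherwise the bound is strict
print(H(p_equal), fano_bound(0.7, 4))    # both approximately 1.3568
print(H(p_skew), fano_bound(0.7, 4))     # H(X) is strictly smaller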
Exercise 2.22: One way to generate a source code for (X1, X2, X3) is to concatenate a Huffman
code for (X1, X2) with a Huffman code for X3. The expected length of the resulting code for
(X1, X2, X3) is 2Lmin,2 + Lmin, so the expected length per source letter of this code is
(2/3)Lmin,2 + (1/3)Lmin.