IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009

Channel Polarization: A Method for Constructing Capacity-Achieving Codes for Symmetric Binary-Input Memoryless Channels

Erdal Arıkan, Senior Member, IEEE

Abstract—A method is proposed, called channel polarization, to construct code sequences that achieve the symmetric capacity $I(W)$ of any given binary-input discrete memoryless channel (B-DMC) $W$. The symmetric capacity is the highest rate achievable subject to using the input letters of the channel with equal probability. Channel polarization refers to the fact that it is possible to synthesize, out of $N$ independent copies of a given B-DMC $W$, a second set of $N$ binary-input channels $\{W_N^{(i)} : 1 \le i \le N\}$ such that, as $N$ becomes large, the fraction of indices $i$ for which $I(W_N^{(i)})$ is near 1 approaches $I(W)$ and the fraction for which $I(W_N^{(i)})$ is near 0 approaches $1 - I(W)$. The polarized channels $\{W_N^{(i)}\}$ are well-conditioned for channel coding: one need only send data at rate 1 through those with capacity near 1 and at rate 0 through the remaining. Codes constructed on the basis of this idea are called polar codes. The paper proves that, given any B-DMC $W$ with $I(W) > 0$ and any target rate $R < I(W)$, there exists a sequence of polar codes $\{\mathcal{C}_n; n \ge 1\}$ such that $\mathcal{C}_n$ has block length $N = 2^n$, rate $\ge R$, and probability of block error under successive cancellation decoding bounded as $P_e(N, R) \le O(N^{-1/4})$ independently of the code rate. This performance is achievable by encoders and decoders with complexity $O(N \log N)$ for each.

Index Terms—Capacity-achieving codes, channel capacity, channel polarization, Plotkin construction, polar codes, Reed–Muller (RM) codes, successive cancellation decoding.

I. INTRODUCTION AND OVERVIEW

A FASCINATING aspect of Shannon's proof of the noisy channel coding theorem is the random-coding method that he used to show the existence of capacity-achieving code sequences without exhibiting any specific such sequence [1].
Explicit construction of provably capacity-achieving code sequences with low encoding and decoding complexities has since then been an elusive goal. This paper is an attempt to meet this goal for the class of binary-input discrete memoryless channels (B-DMCs). We will give a description of the main ideas and results of the paper in this section. First, we give some definitions and state some basic facts that are used throughout the paper.

Manuscript received October 14, 2007; revised August 13, 2008. Current version published June 24, 2009. This work was supported in part by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under Project 107E216 and in part by the European Commission FP7 Network of Excellence NEWCOM++ under Contract 216715. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Toronto, ON, Canada, July 2008. The author is with the Department of Electrical-Electronics Engineering, Bilkent University, Ankara, 06800, Turkey (e-mail: arikan@ee.bilkent.edu.tr). Communicated by Y. Steinberg, Associate Editor for Shannon Theory. Color versions of Figures 4 and 7 in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIT.2009.2021379

A. Preliminaries

We write $W : \mathcal{X} \to \mathcal{Y}$ to denote a generic B-DMC with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and transition probabilities $W(y \mid x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The input alphabet $\mathcal{X}$ will always be $\{0, 1\}$; the output alphabet and the transition probabilities may be arbitrary. We write $W^N$ to denote the channel corresponding to $N$ uses of $W$; thus, $W^N : \mathcal{X}^N \to \mathcal{Y}^N$ with $W^N(y_1^N \mid x_1^N) = \prod_{i=1}^{N} W(y_i \mid x_i)$.

Given a B-DMC $W$, there are two channel parameters of primary interest in this paper: the symmetric capacity

$$I(W) \triangleq \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y \mid x) \log \frac{W(y \mid x)}{\frac{1}{2} W(y \mid 0) + \frac{1}{2} W(y \mid 1)}$$

and the Bhattacharyya parameter

$$Z(W) \triangleq \sum_{y \in \mathcal{Y}} \sqrt{W(y \mid 0)\, W(y \mid 1)}.$$

These parameters are used as measures of rate and reliability, respectively. $I(W)$ is the highest rate at which reliable communication is possible across $W$ using the inputs of $W$ with equal frequency. $Z(W)$ is an upper bound on the probability of maximum-likelihood (ML) decision error when $W$ is used only once to transmit a 0 or a 1.

It is easy to see that $Z(W)$ takes values in $[0, 1]$. Throughout, we will use base-2 logarithms; hence, $I(W)$ will also take values in $[0, 1]$. The unit for code rates and channel capacities will be bits. Intuitively, one would expect that $I(W) \approx 1$ iff $Z(W) \approx 0$, and $I(W) \approx 0$ iff $Z(W) \approx 1$. The following bounds, proved in the Appendix, make this precise.

Proposition 1: For any B-DMC $W$, we have

$$I(W) \ge \log \frac{2}{1 + Z(W)} \qquad (1)$$

$$I(W) \le \sqrt{1 - Z(W)^2}. \qquad (2)$$

The symmetric capacity $I(W)$ equals the Shannon capacity when $W$ is a symmetric channel, i.e., a channel for which there exists a permutation $\pi$ of the output alphabet $\mathcal{Y}$ such that i) $\pi^{-1} = \pi$ and ii) $W(y \mid 1) = W(\pi(y) \mid 0)$ for all $y \in \mathcal{Y}$. The binary symmetric channel (BSC) and the binary erasure channel (BEC) are examples of symmetric channels. A BSC is a B-DMC $W$ with $\mathcal{Y} = \{0, 1\}$, $W(0 \mid 0) = W(1 \mid 1)$, and $W(1 \mid 0) = W(0 \mid 1)$. A B-DMC $W$ is called a BEC if for each $y \in \mathcal{Y}$, either $W(y \mid 0)\, W(y \mid 1) = 0$ or $W(y \mid 0) = W(y \mid 1)$; in the latter case, $y$ is said to be an erasure symbol.
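These two parameters are easy to evaluate numerically for small channels. The sketch below (our illustration, not part of the paper) computes $I(W)$ and $Z(W)$ for a BSC from their definitions and checks the bounds (1) and (2) of Proposition 1:

```python
import math

def channel_parameters(trans):
    """I(W) and Z(W) for a B-DMC given as trans[x] = {y: W(y|x)}, x in {0, 1}."""
    ys = set(trans[0]) | set(trans[1])
    I = 0.0
    Z = 0.0
    for y in ys:
        w0 = trans[0].get(y, 0.0)
        w1 = trans[1].get(y, 0.0)
        q = 0.5 * w0 + 0.5 * w1          # output probability under uniform input
        for w in (w0, w1):
            if w > 0.0:
                I += 0.5 * w * math.log2(w / q)
        Z += math.sqrt(w0 * w1)
    return I, Z

# BSC with crossover probability 0.11
p = 0.11
I, Z = channel_parameters({0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}})

# Proposition 1: log(2/(1+Z)) <= I(W) <= sqrt(1 - Z^2)
assert math.log2(2 / (1 + Z)) <= I <= math.sqrt(1 - Z * Z)
```

For the BSC these sums reduce to the closed forms $I(W) = 1 - h(p)$ (with $h$ the binary entropy function) and $Z(W) = 2\sqrt{p(1 - p)}$.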
The sum of $W(y \mid 0)$ over all erasure symbols $y$ is called the erasure probability of the BEC.

We denote random variables (RVs) by upper-case letters, such as $X$, $Y$, and their realizations (sample values) by the corresponding lower-case letters, such as $x$, $y$. For $X$ an RV, $P_X$ denotes the probability assignment on $X$. For a joint ensemble of RVs $(X, Y)$, $P_{X,Y}$ denotes the joint probability assignment. We use the standard notation $I(X; Y)$, $I(X; Y \mid Z)$ to denote the mutual information and its conditional form, respectively.

We use the notation $a_1^N$ as shorthand for denoting a row vector $(a_1, \ldots, a_N)$. Given such a vector $a_1^N$, we write $a_i^j$, $1 \le i, j \le N$, to denote the subvector $(a_i, \ldots, a_j)$; if $j < i$, $a_i^j$ is regarded as void. Given $a_1^N$ and $\mathcal{A} \subset \{1, \ldots, N\}$, we write $a_{\mathcal{A}}$ to denote the subvector $(a_i : i \in \mathcal{A})$. We write $a_{1,o}^j$ to denote the subvector with odd indices $(a_k : 1 \le k \le j;\ k \text{ odd})$ and $a_{1,e}^j$ to denote the subvector with even indices $(a_k : 1 \le k \le j;\ k \text{ even})$. For example, for $a_1^5 = (5, 4, 6, 2, 1)$, we have $a_2^4 = (4, 6, 2)$, $a_{1,e}^5 = (4, 2)$, and $a_{1,o}^4 = (5, 6)$. The notation $0_1^N$ is used to denote the all-zero vector.

Code constructions in this paper will be carried out in vector spaces over the binary field GF(2). Unless specified otherwise, all vectors, matrices, and operations on them will be over GF(2). In particular, for $a_1^N$, $b_1^N$ vectors over GF(2), we write $a_1^N \oplus b_1^N$ to denote their componentwise mod-2 sum. The Kronecker product of an $m$-by-$n$ matrix $A = [A_{ij}]$ and an $r$-by-$s$ matrix $B = [B_{ij}]$ is defined as

$$A \otimes B = \begin{bmatrix} A_{11} B & \cdots & A_{1n} B \\ \vdots & \ddots & \vdots \\ A_{m1} B & \cdots & A_{mn} B \end{bmatrix}$$

which is an $mr$-by-$ns$ matrix. The Kronecker power $A^{\otimes n}$ is defined as $A \otimes A^{\otimes (n-1)}$ for all $n \ge 1$; we will follow the convention that $A^{\otimes 0} = [1]$. We write $|\mathcal{A}|$ to denote the number of elements in a set $\mathcal{A}$, and $1_{\mathcal{A}}$ to denote the indicator function of a set $\mathcal{A}$; thus, $1_{\mathcal{A}}(x)$ equals 1 if $x \in \mathcal{A}$ and 0 otherwise. We use the standard Landau notation to denote the asymptotic behavior of functions.

B. Channel Polarization

Channel polarization is an operation by which one manufactures, out of $N$ independent copies of a given B-DMC $W$, a second set of $N$ channels $\{W_N^{(i)} : 1 \le i \le N\}$ that show a polarization effect in the sense that, as $N$ becomes large, the symmetric capacity terms $\{I(W_N^{(i)})\}$ tend towards 0 or 1 for all but a vanishing fraction of indices $i$. This operation consists of a channel combining phase and a channel splitting phase.

1) Channel Combining: This phase combines copies of a given B-DMC $W$ in a recursive manner to produce a vector channel $W_N : \mathcal{X}^N \to \mathcal{Y}^N$, where $N$ can be any power of two, $N = 2^n$, $n \ge 0$. The recursion begins at the $0$th level ($n = 0$) with only one copy of $W$, and we set $W_1 \triangleq W$. The first level ($n = 1$) of the recursion combines two independent copies of $W$ as shown in Fig. 1 and obtains the channel $W_2 : \mathcal{X}^2 \to \mathcal{Y}^2$ with the transition probabilities

$$W_2(y_1, y_2 \mid u_1, u_2) = W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2).$$

Fig. 1. The channel $W_2$.

The next level of the recursion is shown in Fig. 2, where two independent copies of $W_2$ are combined to create the channel $W_4 : \mathcal{X}^4 \to \mathcal{Y}^4$ with transition probabilities

$$W_4(y_1^4 \mid u_1^4) = W_2(y_1^2 \mid u_1 \oplus u_2, u_3 \oplus u_4)\, W_2(y_3^4 \mid u_2, u_4). \qquad (3)$$

Fig. 2. The channel $W_4$ and its relation to $W_2$ and $W$.

In Fig. 2, $R_4$ is the permutation operation that maps an input $(s_1, s_2, s_3, s_4)$ to $(s_1, s_3, s_2, s_4)$. The mapping $u_1^4 \mapsto x_1^4$ from the input of $W_4$ to the input of $W^4$ can be written as $x_1^4 = u_1^4 G_4$ with

$$G_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

The general form of the recursion is shown in Fig. 3, where two independent copies of $W_{N/2}$ are combined to produce the channel $W_N$. The input vector $u_1^N$ to $W_N$ is first transformed into $s_1^N$ so that $s_{2i-1} = u_{2i-1} \oplus u_{2i}$ and $s_{2i} = u_{2i}$ for $1 \le i \le N/2$. The operator $R_N$ in the figure is a permutation, known as the reverse shuffle operation, and acts on its input $s_1^N$ to produce $v_1^N = (s_1, s_3, \ldots, s_{N-1}, s_2, s_4, \ldots, s_N)$, which becomes the input to the two copies of $W_{N/2}$ as shown in the figure.

We observe that the mapping $u_1^N \mapsto v_1^N$ is linear over GF(2). It follows by induction that the overall mapping $u_1^N \mapsto x_1^N$, from the input of the synthesized channel $W_N$ to the input of the underlying raw channels $W^N$, is also linear and may be represented by a matrix $G_N$ so that $x_1^N = u_1^N G_N$. We call $G_N$ the generator matrix of size $N$. The transition probabilities of the two channels $W_N$ and $W^N$ are related by

$$W_N(y_1^N \mid u_1^N) = W^N(y_1^N \mid u_1^N G_N) \qquad (4)$$

for all $y_1^N \in \mathcal{Y}^N$, $u_1^N \in \mathcal{X}^N$. We will show in Section VII that $G_N$ equals $B_N F^{\otimes n}$ for any $N = 2^n$, where $B_N$ is a permutation matrix known as bit-reversal and $F \triangleq \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$.
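The matrices $F^{\otimes n}$, $B_N$, and $G_N$ are simple to build. The sketch below (ours, not the paper's code) constructs $G_N = B_N F^{\otimes n}$ over GF(2) and reproduces the $G_4$ given above; since $F$ is its own inverse over GF(2) and $B_N$ commutes with $F^{\otimes n}$, $G_N$ turns out to be its own inverse as well:

```python
def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def kron_power(f, n):
    """F^{(x)n}, with F^{(x)0} = [1]."""
    g = [[1]]
    for _ in range(n):
        g = kron(f, g)
    return g

def bit_reverse_rows(g):
    """Apply the bit-reversal permutation B_N to the rows of g."""
    n_bits = (len(g) - 1).bit_length()
    rev = lambda i: int(format(i, "0{}b".format(n_bits))[::-1], 2)
    return [g[rev(i)] for i in range(len(g))]

F = [[1, 0], [1, 1]]
G4 = bit_reverse_rows(kron_power(F, 2))
assert G4 == [[1, 0, 0, 0], [1, 0, 1, 0], [1, 1, 0, 0], [1, 1, 1, 1]]
```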
Fig. 3. Recursive construction of $W_N$ from two copies of $W_{N/2}$.

Note that the channel combining operation is fully specified by the matrix $G_N$. Also note that $G_N$ and $F^{\otimes n}$ have the same set of rows, but in a different (bit-reversed) order; we will discuss this topic more fully in Section VII.

2) Channel Splitting: Having synthesized the vector channel $W_N$ out of $W^N$, the next step of channel polarization is to split $W_N$ back into a set of $N$ binary-input coordinate channels $W_N^{(i)} : \mathcal{X} \to \mathcal{Y}^N \times \mathcal{X}^{i-1}$, $1 \le i \le N$, defined by the transition probabilities

$$W_N^{(i)}(y_1^N, u_1^{i-1} \mid u_i) \triangleq \sum_{u_{i+1}^N \in \mathcal{X}^{N-i}} \frac{1}{2^{N-1}}\, W_N(y_1^N \mid u_1^N) \qquad (5)$$

where $(y_1^N, u_1^{i-1})$ denotes the output of $W_N^{(i)}$ and $u_i$ its input.

To gain an intuitive understanding of the channels $\{W_N^{(i)}\}$, consider a genie-aided successive cancellation decoder in which the $i$th decision element estimates $u_i$ after observing $y_1^N$ and the past channel inputs $u_1^{i-1}$ (supplied correctly by the genie regardless of any decision errors at earlier stages). If $u_1^N$ is a priori uniform on $\mathcal{X}^N$, then $W_N^{(i)}$ is the effective channel seen by the $i$th decision element in this scenario.

3) Channel Polarization:

Theorem 1: For any B-DMC $W$, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as $N$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \ldots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.

This theorem is proved in Section IV.

The polarization effect is illustrated in Fig. 4 for the case that $W$ is a BEC with erasure probability $\epsilon = 0.5$. The numbers $\{I(W_N^{(i)})\}$ have been computed using the recursive relations

$$I(W_N^{(2i-1)}) = I(W_{N/2}^{(i)})^2$$
$$I(W_N^{(2i)}) = 2\, I(W_{N/2}^{(i)}) - I(W_{N/2}^{(i)})^2 \qquad (6)$$

with $I(W_1^{(1)}) = 1 - \epsilon$. This recursion is valid only for BECs and it is proved in Section III. No efficient algorithm is known for calculation of $\{I(W_N^{(i)})\}$ for a general B-DMC $W$. Fig. 4 shows that $I(W_N^{(i)})$ tends to be near 0 for small $i$ and near 1 for large $i$. However, $I(W_N^{(i)})$ shows an erratic behavior for an intermediate range of $i$. For general B-DMCs, determining the subset of indices $i$ for which $I(W_N^{(i)})$ is above a given threshold is an important computational problem that will be addressed in Section IX.

4) Rate of Polarization: For proving coding theorems, the speed with which the polarization effect takes hold as a function of $N$ is important. Our main result in this regard is given in terms of the parameters

$$Z(W_N^{(i)}) = \sum_{y_1^N \in \mathcal{Y}^N} \sum_{u_1^{i-1} \in \mathcal{X}^{i-1}} \sqrt{W_N^{(i)}(y_1^N, u_1^{i-1} \mid 0)\, W_N^{(i)}(y_1^N, u_1^{i-1} \mid 1)}. \qquad (7)$$

Theorem 2: For any B-DMC $W$ with $I(W) > 0$, and any fixed $R < I(W)$, there exists a sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$, $N \in \{1, 2, \ldots, 2^n, \ldots\}$, such that $|\mathcal{A}_N| \ge N R$ and $Z(W_N^{(i)}) \le O(N^{-5/4})$ for all $i \in \mathcal{A}_N$.

This theorem is proved in Section IV-B. We stated the polarization result in Theorem 2 in terms of $Z(W_N^{(i)})$ rather than $I(W_N^{(i)})$ because this form is better suited to the coding results that we will develop. A rate of polarization result in terms of $I(W_N^{(i)})$ can be obtained from Theorem 2 with the help of Proposition 1.

C. Polar Coding

We take advantage of the polarization effect to construct codes that achieve the symmetric channel capacity $I(W)$ by a method we call polar coding. The basic idea of polar coding is to create a coding system where one can access each coordinate channel $W_N^{(i)}$ individually and send data only through those for which $I(W_N^{(i)})$ is near 1.

1) $G_N$-Coset Codes: We first describe a class of block codes that contain polar codes—the codes of main interest—as a special case. The block lengths $N$ for this class are restricted to powers of two, $N = 2^n$, $n \ge 0$. For a given $N$, each code in the class is encoded in the same manner, namely

$$x_1^N = u_1^N G_N \qquad (8)$$

where $G_N$ is the generator matrix of order $N$, defined above. For $\mathcal{A}$ an arbitrary subset of $\{1, \ldots, N\}$, we may write (8) as

$$x_1^N = u_{\mathcal{A}} G_N(\mathcal{A}) \oplus u_{\mathcal{A}^c} G_N(\mathcal{A}^c) \qquad (9)$$

where $G_N(\mathcal{A})$ denotes the submatrix of $G_N$ formed by the rows with indices in $\mathcal{A}$. If we now fix $\mathcal{A}$ and $u_{\mathcal{A}^c}$, but leave $u_{\mathcal{A}}$ as a free variable, we obtain a mapping from source blocks $u_{\mathcal{A}}$ to codeword blocks $x_1^N$. This mapping is a coset code: it is a coset of the linear block code with generator matrix $G_N(\mathcal{A})$, with the coset determined by the fixed vector $u_{\mathcal{A}^c} G_N(\mathcal{A}^c)$.
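Returning to the recursion (6): it is a one-liner to iterate, and doing so reproduces the polarization picture of Fig. 4. The sketch below (our illustration) lists the children in the paper's order, $I(W_N^{(2i-1)})$ then $I(W_N^{(2i)})$:

```python
def bec_capacities(n, eps=0.5):
    """I(W_N^{(i)}), i = 1..N = 2^n, for a BEC(eps), via the recursion (6)."""
    caps = [1.0 - eps]
    for _ in range(n):
        nxt = []
        for c in caps:
            nxt.extend((c * c, 2 * c - c * c))   # children W^(2i-1), W^(2i)
        caps = nxt
    return caps

caps = bec_capacities(10)                         # N = 1024
frac_near_1 = sum(c > 0.99 for c in caps) / len(caps)
frac_near_0 = sum(c < 0.01 for c in caps) / len(caps)
# The single-step transform preserves capacity, so the average stays at I(W) = 0.5
assert abs(sum(caps) / len(caps) - 0.5) < 1e-9
# Polarization: a sizable fraction of channels is already near 0 or near 1
assert frac_near_1 > 0.1 and frac_near_0 > 0.1
```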
Fig. 4. Plot of $I(W_N^{(i)})$ versus $i = 1, \ldots, N$ for a BEC with $\epsilon = 0.5$.

We will refer to this class of codes collectively as $G_N$-coset codes. Individual $G_N$-coset codes will be identified by a parameter vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$, where $K$ is the code dimension and specifies the size of $\mathcal{A}$.¹ The ratio $K/N$ is called the code rate. We will refer to $\mathcal{A}$ as the information set and to $u_{\mathcal{A}^c} \in \mathcal{X}^{N-K}$ as frozen bits or the frozen vector.

For example, a $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ code has the encoder mapping

$$x_1^N = u_{\mathcal{A}} G_N(\mathcal{A}) \oplus u_{\mathcal{A}^c} G_N(\mathcal{A}^c). \qquad (10)$$

For a source block $u_{\mathcal{A}}$, the coded block is $x_1^N$. Polar codes will be specified shortly by giving a particular rule for the selection of the information set $\mathcal{A}$.

2) A Successive Cancellation Decoder: Consider a $G_N$-coset code with parameter $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$. Let $u_1^N$ be encoded into a codeword $x_1^N$, let $x_1^N$ be sent over the channel $W^N$, and let a channel output $y_1^N$ be received. The decoder's task is to generate an estimate $\hat{u}_1^N$ of $u_1^N$, given knowledge of $\mathcal{A}$, $u_{\mathcal{A}^c}$, and $y_1^N$. Since the decoder can avoid errors in the frozen part by setting $\hat{u}_{\mathcal{A}^c} = u_{\mathcal{A}^c}$, the real decoding task is to generate an estimate $\hat{u}_{\mathcal{A}}$ of $u_{\mathcal{A}}$.

The coding results in this paper will be given with respect to a specific successive cancellation (SC) decoder, unless some other decoder is mentioned. Given any $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ $G_N$-coset code, we will use an SC decoder that generates its decision $\hat{u}_1^N$ by computing

$$\hat{u}_i \triangleq \begin{cases} u_i, & \text{if } i \in \mathcal{A}^c \\ h_i(y_1^N, \hat{u}_1^{i-1}), & \text{if } i \in \mathcal{A} \end{cases} \qquad (11)$$

in the order $i$ from 1 to $N$, where $h_i : \mathcal{Y}^N \times \mathcal{X}^{i-1} \to \mathcal{X}$, $i \in \mathcal{A}$, are decision functions defined as

$$h_i(y_1^N, \hat{u}_1^{i-1}) \triangleq \begin{cases} 0, & \text{if } \dfrac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} \mid 1)} \ge 1 \\ 1, & \text{otherwise} \end{cases} \qquad (12)$$

for all $y_1^N \in \mathcal{Y}^N$, $\hat{u}_1^{i-1} \in \mathcal{X}^{i-1}$. We will say that a decoder block error occurred if $\hat{u}_1^N \ne u_1^N$ or, equivalently, if $\hat{u}_{\mathcal{A}} \ne u_{\mathcal{A}}$.

The decision functions $\{h_i\}$ defined above resemble ML decision functions but are not exactly so, because they treat the future frozen bits $(u_j : j > i,\ j \in \mathcal{A}^c)$ as RVs, rather than as known bits. In exchange for this suboptimality, $\{h_i\}$ can be computed efficiently using recursive formulas, as we will show in Section II. Apart from algorithmic efficiency, the recursive structure of the decision functions is important because it renders the performance analysis of the decoder tractable.
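The decision rule (11)-(12) has a compact recursive realization. The sketch below is our illustration, not the paper's algorithm: it works over a BEC (received symbols are bits, or None for erasures), and it uses the $F^{\otimes n}$ ordering without the bit-reversal $B_N$, so its bit indexing differs from the paper's by a permutation:

```python
def polar_encode(u):
    """x = u F^{(x)n} over GF(2); len(u) must be a power of two."""
    if len(u) == 1:
        return list(u)
    h = len(u) // 2
    left = polar_encode([u[i] ^ u[h + i] for i in range(h)])
    right = polar_encode(u[h:])
    return left + right

def chk(a, b):
    """Belief about a XOR b on a BEC: known only if both inputs are known."""
    return None if a is None or b is None else a ^ b

def sc_decode(y, frozen):
    """y: received symbols (0/1/None). frozen: per-position None (info) or a bit.
    Returns (u_estimate, re-encoded codeword estimate)."""
    if len(y) == 1:
        if frozen[0] is not None:
            u = frozen[0]
        elif y[0] is not None:
            u = y[0]
        else:
            u = 0                      # erased information bit: arbitrary tie-break
        return [u], [u]
    h = len(y) // 2
    # "minus" phase: decode the first half through the degraded channel pairs
    y_minus = [chk(y[i], y[h + i]) for i in range(h)]
    u1, x1 = sc_decode(y_minus, frozen[:h])
    # "plus" phase: re-use the partial codeword x1 as known side information
    y_plus = []
    for i in range(h):
        if y[h + i] is not None:
            y_plus.append(y[h + i])
        elif y[i] is not None:
            y_plus.append(y[i] ^ x1[i])
        else:
            y_plus.append(None)
    u2, x2 = sc_decode(y_plus, frozen[h:])
    return u1 + u2, [a ^ b for a, b in zip(x1, x2)] + x2

# N = 2 toy run: u1 frozen to 0, u2 carries data; the codeword is (u2, u2).
# Even with the first output erased, u2 is recovered via the frozen bit.
u_est, _ = sc_decode([None, 1], [0, None])
assert u_est == [0, 1]
```

With no erasures the decoder returns the transmitted $u_1^N$ exactly, which makes the encoder/decoder pair easy to sanity-check.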
Fortunately, the loss in performance due to not using true ML decision functions happens to be negligible: $I(W)$ is still achievable.

3) Code Performance: The notation $P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ will denote the probability of block error for a $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ code, assuming that each data vector $u_{\mathcal{A}} \in \mathcal{X}^K$ is sent with probability $2^{-K}$ and decoding is done by the above SC decoder. More precisely

$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \triangleq \sum_{u_{\mathcal{A}} \in \mathcal{X}^K} 2^{-K}\, P\big(\hat{u}_{\mathcal{A}} \ne u_{\mathcal{A}} \mid u_1^N \text{ sent}\big).$$

The average of $P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ over all choices for $u_{\mathcal{A}^c}$ will be denoted by $P_e(N, K, \mathcal{A})$, i.e.,

$$P_e(N, K, \mathcal{A}) \triangleq \sum_{u_{\mathcal{A}^c} \in \mathcal{X}^{N-K}} 2^{-(N-K)}\, P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}).$$

¹We include the redundant parameter $K$ in the parameter set because often we consider an ensemble of codes with $K$ fixed and $\mathcal{A}$ free.

A key bound on block error probability under SC decoding is the following.
Proposition 2: For any B-DMC $W$ and any choice of the parameters $(N, K, \mathcal{A})$

$$P_e(N, K, \mathcal{A}) \le \sum_{i \in \mathcal{A}} Z(W_N^{(i)}). \qquad (13)$$

Hence, for each $(N, K, \mathcal{A})$, there exists a frozen vector $u_{\mathcal{A}^c}$ such that

$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \le \sum_{i \in \mathcal{A}} Z(W_N^{(i)}). \qquad (14)$$

This is proved in Section V-B. This result suggests choosing $\mathcal{A}$ from among all $K$-subsets of $\{1, \ldots, N\}$ so as to minimize the right-hand side (RHS) of (13). This idea leads to the definition of polar codes.

4) Polar Codes: Given a B-DMC $W$, a $G_N$-coset code with parameter $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ will be called a polar code for $W$ if the information set $\mathcal{A}$ is chosen as a $K$-element subset of $\{1, \ldots, N\}$ such that $Z(W_N^{(i)}) \le Z(W_N^{(j)})$ for all $i \in \mathcal{A}$, $j \in \mathcal{A}^c$.

Polar codes are channel-specific designs: a polar code for one channel may not be a polar code for another. The main result of this paper will be to show that polar coding achieves the symmetric capacity $I(W)$ of any given B-DMC $W$.

An alternative rule for polar code definition would be to specify $\mathcal{A}$ as a $K$-element subset of $\{1, \ldots, N\}$ such that $I(W_N^{(i)}) \ge I(W_N^{(j)})$ for all $i \in \mathcal{A}$, $j \in \mathcal{A}^c$. This alternative rule would also achieve $I(W)$. However, the rule based on the Bhattacharyya parameters has the advantage of being connected with an explicit bound on block error probability.

The polar code definition does not specify how the frozen vector $u_{\mathcal{A}^c}$ is to be chosen; it may be chosen at will. This degree of freedom in the choice of $u_{\mathcal{A}^c}$ simplifies the performance analysis of polar codes by allowing averaging over an ensemble. However, it is not for analytical convenience alone that we do not specify a precise rule for selecting $u_{\mathcal{A}^c}$, but also because it appears that the code performance is relatively insensitive to that choice. In fact, we prove in Section VI-B that, for symmetric channels, any choice for $u_{\mathcal{A}^c}$ is as good as any other.

5) Coding Theorems: Fix a B-DMC $W$ and a number $R \ge 0$. Let $P_e(N, R)$ be defined as $P_e(N, K, \mathcal{A})$ with $K = \lfloor NR \rfloor$ and $\mathcal{A}$ selected in accordance with the polar coding rule for $W$. Thus, $P_e(N, R)$ is the probability of block error under SC decoding for polar coding over $W$ with block length $N$ and rate $R$, averaged over all choices for the frozen bits $u_{\mathcal{A}^c}$. The main coding result of this paper is the following.
Theorem 3: For any given B-DMC $W$ and fixed $R < I(W)$, the block error probability for polar coding under successive cancellation decoding satisfies

$$P_e(N, R) = O(N^{-1/4}). \qquad (15)$$

This theorem follows as an easy corollary to Theorem 2 and the bound (13), as we show in Section V-B. For symmetric channels, we have the following stronger version of Theorem 3.

Theorem 4: For any symmetric B-DMC $W$ and any fixed $R < I(W)$, consider any sequence of $G_N$-coset codes $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ with $N$ increasing to infinity, $K = \lfloor NR \rfloor$, $\mathcal{A}$ chosen in accordance with the polar coding rule for $W$, and $u_{\mathcal{A}^c}$ fixed arbitrarily. The block error probability under successive cancellation decoding satisfies

$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) = O(N^{-1/4}). \qquad (16)$$

This is proved in Section VI-B. Note that for symmetric channels $I(W)$ equals the Shannon capacity of $W$.

6) Complexity: An important issue about polar coding is the complexity of encoding, decoding, and code construction. The recursive structure of the channel polarization construction leads to low-complexity encoding and decoding algorithms for the class of $G_N$-coset codes, and in particular, for polar codes.

Theorem 5: For the class of $G_N$-coset codes, the complexity of encoding and the complexity of successive cancellation decoding are both $O(N \log N)$ as functions of code block length $N$.

This theorem is proved in Sections VII and VIII. Notice that the complexity bounds in Theorem 5 are independent of the code rate and the way the frozen vector is chosen. The bounds hold even at rates above $I(W)$, but clearly this has no practical significance.

As for code construction, we have found no low-complexity algorithms for constructing polar codes. One exception is the case of a BEC for which we have a polar code construction algorithm with complexity $O(N \log N)$. We discuss the code construction problem further in Section IX and suggest a low-complexity statistical algorithm for approximating the exact polar code construction.
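For the BEC case mentioned above, the exact construction is a direct computation: the synthesized channels are again BECs, their Bhattacharyya parameters (equal to their erasure probabilities) follow the recursion $Z \mapsto (2Z - Z^2,\, Z^2)$ established in Section III, and the information set is the set of indices with the smallest $Z$. A sketch (ours, with 0-based indices):

```python
def bec_bhattacharyya(n, eps):
    """Z(W_N^{(i)}), N = 2^n, for a BEC(eps); Z equals the erasure
    probability of each synthesized channel (itself a BEC)."""
    z = [eps]
    for _ in range(n):
        z = [v for w in z for v in (2 * w - w * w, w * w)]  # children (2i-1, 2i)
    return z

def polar_construct(n, eps, K):
    """Information set (0-based indices) minimizing the RHS of (13),
    together with the resulting union bound on block error probability."""
    z = bec_bhattacharyya(n, eps)
    order = sorted(range(len(z)), key=z.__getitem__)
    info = sorted(order[:K])
    return info, sum(z[i] for i in info)

_, b16 = polar_construct(4, 0.5, 4)        # N = 16, rate 1/4
_, b1024 = polar_construct(10, 0.5, 256)   # N = 1024, rate 1/4
assert b1024 < b16 < 1.0                   # the bound improves with block length
```

The returned sum is exactly the RHS of the bound (13), so it doubles as a computable guarantee on the block error probability of the constructed code.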
D. Relations to Previous Work

This paper is an extension of work begun in [2], where channel combining and splitting were used to show that improvements can be obtained in the sum cutoff rate for some specific DMCs. However, no recursive method was suggested there to reach the ultimate limit of such improvements.

As the present work progressed, it became clear that polar coding had much in common with Reed–Muller (RM) coding [3], [4]. Indeed, recursive code construction and SC decoding, which are two essential ingredients of polar coding, appear to have been introduced into coding theory by RM codes. According to one construction of RM codes, for any $N = 2^n$, $n \ge 0$, and dimension $K$, $0 \le K \le N$, an RM code with block length $N$ and dimension $K$, denoted RM$(N, K)$, is defined as a linear code whose generator matrix $G_{\mathrm{RM}}(N, K)$ is obtained by deleting $N - K$ of the rows of $F^{\otimes n}$ so that none of the deleted rows has a larger Hamming weight (number of 1's in that row) than any of the remaining $K$ rows.
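The RM selection rule is easy to state computationally. The sketch below (ours; rows indexed from 0) keeps the $K$ largest-weight rows of $F^{\otimes n}$; the internal check exploits the fact that row $i$ of $F^{\otimes n}$ has Hamming weight $2^{w(i)}$, where $w(i)$ is the number of 1's in the binary expansion of $i$:

```python
def kron(a, b):
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def f_power(n):
    """F^{(x)n} for F = [[1,0],[1,1]]."""
    g = [[1]]
    for _ in range(n):
        g = kron([[1, 0], [1, 1]], g)
    return g

def rm_information_set(n, K):
    """0-based row indices kept by the RM rule: the K largest-weight rows."""
    weights = [sum(row) for row in f_power(n)]
    # row i of F^{(x)n} has Hamming weight 2^{popcount(i)}
    assert all(w == 2 ** bin(i).count("1") for i, w in enumerate(weights))
    return sorted(sorted(range(2 ** n), key=lambda i: weights[i])[-K:])

assert rm_information_set(3, 4) == [3, 5, 6, 7]   # the rows of weight >= 4
```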
This construction brings out the similarities between RM codes and polar codes. Since $G_N$ and $F^{\otimes n}$ have the same set of rows (only in a different order) for any $N = 2^n$, it is clear that RM codes belong to the class of $G_N$-coset codes. For example, RM$(N, K)$ is the $G_N$-coset code with a particular choice of the parameter vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$. So, RM coding and polar coding may be regarded as two alternative rules for selecting the information set $\mathcal{A}$ of a $G_N$-coset code of a given size $(N, K)$.

Unlike polar coding, RM coding selects the information set in a channel-independent manner; it is not as fine-tuned to the channel polarization phenomenon as polar coding is. We will show in Section X that, at least for the class of BECs, the RM rule for information set selection leads to asymptotically unreliable codes under SC decoding. So, polar coding goes beyond RM coding in a nontrivial manner by paying closer attention to channel polarization.

Another connection to existing work can be established by noting that polar codes are multilevel $|u|u+v|$ codes, which are a class of codes originating from Plotkin's method for code combining [5]. This connection is not surprising in view of the fact that RM codes are also multilevel $|u|u+v|$ codes [6, pp. 114–125]. However, unlike typical multilevel code constructions, where one begins with specific small codes to build larger ones, in polar coding the multilevel code is obtained by expurgating rows of a full-order generator matrix, $G_N$, with respect to a channel-specific criterion. The special structure of $G_N$ ensures that, no matter how expurgation is done, the resulting code is a multilevel $|u|u+v|$ code. In essence, polar coding enjoys the freedom to pick a multilevel code from an ensemble of such codes so as to suit the channel at hand, while conventional approaches to multilevel coding do not have this degree of flexibility.

Finally, we wish to mention a "spectral" interpretation of polar codes which is similar to Blahut's treatment of Bose–Chaudhuri–Hocquenghem (BCH) codes [7, Ch. 9]; this type of similarity has already been pointed out by Forney [8, Ch. 11] in connection with RM codes.
From the spectral viewpoint, the encoding operation (8) is regarded as a transform of a "frequency" domain information vector $u_1^N$ to a "time" domain codeword vector $x_1^N$. The transform is invertible, with $G_N^{-1} = G_N$. The decoding operation is regarded as a spectral estimation problem in which one is given a time domain observation $y_1^N$, which is a noisy version of $x_1^N$, and asked to estimate $u_1^N$. To aid the estimation task, one is allowed to freeze a certain number of spectral components of $u_1^N$. This spectral interpretation of polar coding suggests that it may be possible to treat polar codes and BCH codes in a unified framework. The spectral interpretation also opens the door to the use of various signal processing techniques in polar coding; indeed, in Section VII, we exploit some fast transform techniques in designing encoders for polar codes.

E. Paper Outline

The rest of the paper is organized as follows. Section II explores the recursive properties of the channel splitting operation. In Section III, we focus on how $I(W)$ and $Z(W)$ get transformed through a single step of channel combining and splitting. We extend this to an asymptotic analysis in Section IV and complete the proofs of Theorems 1 and 2. This completes the part of the paper on channel polarization; the rest of the paper is mainly about polar coding. Section V develops an upper bound on the block error probability of polar coding under SC decoding and proves Theorem 3. Section VI considers polar coding for symmetric B-DMCs and proves Theorem 4. Section VII gives an analysis of the encoder mapping $u_1^N \mapsto x_1^N$, which results in efficient encoder implementations. In Section VIII, we give an implementation of SC decoding with complexity $O(N \log N)$. In Section IX, we discuss the code construction complexity and propose an approximate statistical algorithm for code construction. In Section X, we explain why RM codes have a poor asymptotic performance under SC decoding.
In Section XI, we point out some generalizations of the present work, give some complementary remarks, and state some open problems.

II. RECURSIVE CHANNEL TRANSFORMATIONS

We have defined a blockwise channel combining and splitting operation by (4) and (5) which transformed $N$ independent copies of $W$ into the channels $W_N^{(1)}, \ldots, W_N^{(N)}$. The goal in this section is to show that this blockwise channel transformation can be broken recursively into single-step channel transformations.

We say that a pair of binary-input channels $W'$ and $W''$ are obtained by a single-step transformation of two independent copies of a binary-input channel $W : \mathcal{X} \to \mathcal{Y}$, and write $(W, W) \mapsto (W', W'')$, iff there exists a one-to-one mapping $f$ defined on $\mathcal{Y}^2$ such that

$$W'(f(y_1, y_2) \mid u_1) = \sum_{u_2} \frac{1}{2}\, W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2) \qquad (17)$$

$$W''(f(y_1, y_2), u_1 \mid u_2) = \frac{1}{2}\, W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2) \qquad (18)$$

for all $u_1, u_2 \in \mathcal{X}$, $y_1, y_2 \in \mathcal{Y}$. According to this, we can write $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ for any given B-DMC $W$ because

$$W_2^{(1)}(y_1^2 \mid u_1) = \sum_{u_2} \frac{1}{2}\, W_2(y_1^2 \mid u_1^2) = \sum_{u_2} \frac{1}{2}\, W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2) \qquad (19)$$

$$W_2^{(2)}(y_1^2, u_1 \mid u_2) = \frac{1}{2}\, W_2(y_1^2 \mid u_1^2) = \frac{1}{2}\, W(y_1 \mid u_1 \oplus u_2)\, W(y_2 \mid u_2) \qquad (20)$$

which are in the form of (17) and (18) by taking $f$ as the identity mapping.

It turns out we can write, more generally

$$(W_N^{(i)}, W_N^{(i)}) \mapsto (W_{2N}^{(2i-1)}, W_{2N}^{(2i)}). \qquad (21)$$

This follows as a corollary to the following.

Proposition 3: For any $n \ge 0$, $N = 2^n$, $1 \le i \le N$

$$W_{2N}^{(2i-1)}(y_1^{2N}, u_1^{2i-2} \mid u_{2i-1}) = \sum_{u_{2i}} \frac{1}{2}\, W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} \mid u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2} \mid u_{2i}) \qquad (22)$$
$$W_{2N}^{(2i)}(y_1^{2N}, u_1^{2i-1} \mid u_{2i}) = \frac{1}{2}\, W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} \mid u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2} \mid u_{2i}). \qquad (23)$$

This proposition is proved in the Appendix. The transform relationship (21) can now be justified by noting that (22) and (23) are identical in form to (17) and (18), respectively, after the substitutions $W \leftarrow W_N^{(i)}$, $W' \leftarrow W_{2N}^{(2i-1)}$, $W'' \leftarrow W_{2N}^{(2i)}$, $u_1 \leftarrow u_{2i-1}$, $u_2 \leftarrow u_{2i}$, $y_1 \leftarrow (y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2})$, and $y_2 \leftarrow (y_{N+1}^{2N}, u_{1,e}^{2i-2})$.

Thus, we have shown that the blockwise channel transformation from $W^N$ to $(W_N^{(1)}, \ldots, W_N^{(N)})$ breaks at a local level into single-step channel transformations of the form (21). The full set of such transformations forms a fabric as shown in Fig. 5 for $N = 8$.

Fig. 5. The channel transformation process with $N = 8$ channels.

Reading from right to left, the figure starts with four copies of the transformation $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ and continues in butterfly patterns, each representing a channel transformation of the form (21). The two channels at the right endpoints of the butterflies are always identical and independent. At the rightmost level there are eight independent copies of $W$; at the next level to the left, there are four independent copies of $W_2^{(1)}$ and $W_2^{(2)}$ each; and so on. Each step to the left doubles the number of channel types, but halves the number of independent copies.

III. TRANSFORMATION OF RATE AND RELIABILITY

We now investigate how the rate and reliability parameters, $I(W)$ and $Z(W)$, change through a local (single-step) transformation (21). By understanding the local behavior, we will be able to reach conclusions about the overall transformation from $W^N$ to $(W_N^{(1)}, \ldots, W_N^{(N)})$. Proofs of the results in this section are given in the Appendix.

A. Local Transformation of Rate and Reliability

Proposition 4: Suppose $(W, W) \mapsto (W', W'')$ for some set of binary-input channels. Then

$$I(W') + I(W'') = 2\, I(W) \qquad (24)$$

$$I(W') \le I(W'') \qquad (25)$$

with equality iff $I(W)$ equals 0 or 1.

The equality (24) indicates that the single-step channel transform preserves the symmetric capacity. The inequality (25) together with (24) implies that the symmetric capacity remains unchanged under a single-step transform, $I(W') = I(W'') = I(W)$, iff $W$ is either a perfect channel or a completely noisy one. If $W$ is neither perfect nor completely noisy, the single-step transform moves the symmetric capacity away from the center in the sense that $I(W') < I(W) < I(W'')$, thus helping polarization.

Proposition 5: Suppose $(W, W) \mapsto (W', W'')$ for some set of binary-input channels. Then

$$Z(W'') = Z(W)^2 \qquad (26)$$

$$Z(W') \le 2\, Z(W) - Z(W)^2 \qquad (27)$$

$$Z(W') \ge Z(W) \ge Z(W''). \qquad (28)$$

Equality holds in (27) iff $W$ is a BEC; equality holds in (28) iff $Z(W)$ equals 0 or 1, or equivalently, iff $I(W)$ equals 0 or 1.
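Propositions 4 and 5 can be checked numerically for a concrete channel by constructing $W'$ and $W''$ explicitly from (17) and (18) with $f$ the identity. The sketch below (our illustration) does this for a BSC:

```python
import math
from itertools import product

def capacity(trans):
    """I(W) for trans[x] = {y: W(y|x)}, x in {0, 1}."""
    ys = set(trans[0]) | set(trans[1])
    total = 0.0
    for y in ys:
        w0, w1 = trans[0].get(y, 0.0), trans[1].get(y, 0.0)
        q = 0.5 * w0 + 0.5 * w1
        total += sum(0.5 * w * math.log2(w / q) for w in (w0, w1) if w > 0.0)
    return total

def bhattacharyya(trans):
    ys = set(trans[0]) | set(trans[1])
    return sum(math.sqrt(trans[0].get(y, 0.0) * trans[1].get(y, 0.0)) for y in ys)

def single_step(W):
    """(W, W) -> (W', W'') per (17)-(18), taking f as the identity map.
    Outputs of W' are pairs (y1, y2); outputs of W'' are triples (y1, y2, u1)."""
    Wm = {0: {}, 1: {}}
    Wp = {0: {}, 1: {}}
    ys = set(W[0]) | set(W[1])
    for u1, u2, y1, y2 in product((0, 1), (0, 1), ys, ys):
        p = 0.5 * W[u1 ^ u2].get(y1, 0.0) * W[u2].get(y2, 0.0)
        Wm[u1][(y1, y2)] = Wm[u1].get((y1, y2), 0.0) + p   # sum over u2, as in (17)
        Wp[u2][(y1, y2, u1)] = p                            # as in (18)
    return Wm, Wp

p_c = 0.11
bsc = {0: {0: 1 - p_c, 1: p_c}, 1: {0: p_c, 1: 1 - p_c}}
Wm, Wp = single_step(bsc)
assert abs(capacity(Wm) + capacity(Wp) - 2 * capacity(bsc)) < 1e-12   # (24)
assert abs(bhattacharyya(Wp) - bhattacharyya(bsc) ** 2) < 1e-12       # (26)
assert capacity(Wm) <= capacity(bsc) <= capacity(Wp)                  # (24) + (25)
```

The first assertion is the conservation law (24), the second the exact relation (26), and the last the polarization of capacity implied jointly by (24) and (25).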
This result shows that reliability can only improve under a single-step channel transform in the sense that

$$Z(W') + Z(W'') \le 2\, Z(W) \qquad (29)$$

with equality iff $W$ is a BEC. Since the BEC plays a special role with respect to (w.r.t.) extremal behavior of reliability, it deserves special attention.

Proposition 6: Consider the channel transformation $(W, W) \mapsto (W', W'')$. If $W$ is a BEC with some erasure probability $\epsilon$, then the channels $W'$ and $W''$ are BECs with erasure probabilities $2\epsilon - \epsilon^2$ and $\epsilon^2$, respectively. Conversely, if $W'$ or $W''$ is a BEC, then $W$ is a BEC.

B. Rate and Reliability for $W_N^{(i)}$

We now return to the context at the end of Section II.

Proposition 7: For any B-DMC $W$, $N = 2^n$, $n \ge 0$, $1 \le i \le N$, the transformation $(W_N^{(i)}, W_N^{(i)}) \mapsto (W_{2N}^{(2i-1)}, W_{2N}^{(2i)})$ is rate-preserving and reliability-improving in the sense that

$$I(W_{2N}^{(2i-1)}) + I(W_{2N}^{(2i)}) = 2\, I(W_N^{(i)}) \qquad (30)$$

$$Z(W_{2N}^{(2i-1)}) + Z(W_{2N}^{(2i)}) \le 2\, Z(W_N^{(i)}) \qquad (31)$$
with equality in (31) iff $W$ is a BEC. Channel splitting moves the rate and reliability away from the center in the sense that

$$I(W_{2N}^{(2i-1)}) \le I(W_N^{(i)}) \le I(W_{2N}^{(2i)}) \qquad (32)$$

$$Z(W_{2N}^{(2i-1)}) \ge Z(W_N^{(i)}) \ge Z(W_{2N}^{(2i)}) \qquad (33)$$

with equality in (32) and (33) iff $I(W)$ equals 0 or 1. The reliability terms further satisfy

$$Z(W_{2N}^{(2i-1)}) \le 2\, Z(W_N^{(i)}) - Z(W_N^{(i)})^2 \qquad (34)$$

$$Z(W_{2N}^{(2i)}) = Z(W_N^{(i)})^2 \qquad (35)$$

with equality in (34) iff $W$ is a BEC. The cumulative rate and reliability satisfy

$$\sum_{i=1}^{N} I(W_N^{(i)}) = N\, I(W) \qquad (36)$$

$$\sum_{i=1}^{N} Z(W_N^{(i)}) \le N\, Z(W) \qquad (37)$$

with equality in (37) iff $W$ is a BEC. This result follows from Propositions 4 and 5 as a special case and no separate proof is needed. The cumulative relations (36) and (37) follow by repeated application of (30) and (31), respectively. The conditions for equality are stated in terms of $W$ rather than $W_N^{(i)}$; this is possible because i) by Proposition 4, $I(W_N^{(i)})$ equals 0 or 1 iff $I(W)$ equals 0 or 1; and ii) $W_N^{(i)}$ is a BEC iff $W$ is a BEC, which follows from Proposition 6 by induction.

For the special case that $W$ is a BEC with an erasure probability $\epsilon$, it follows from Propositions 4 and 6 that the parameters $\{Z(W_N^{(i)})\}$ can be computed through the recursion

$$Z(W_N^{(2i-1)}) = 2\, Z(W_{N/2}^{(i)}) - Z(W_{N/2}^{(i)})^2$$
$$Z(W_N^{(2i)}) = Z(W_{N/2}^{(i)})^2 \qquad (38)$$

with $Z(W_1^{(1)}) = \epsilon$. The parameter $Z(W_N^{(i)})$ equals the erasure probability of the channel $W_N^{(i)}$. The recursive relations (6) follow from (38) by the fact that $I(W_N^{(i)}) = 1 - Z(W_N^{(i)})$ for $W_N^{(i)}$ a BEC.

IV. CHANNEL POLARIZATION

We prove the main results on channel polarization in this section. The analysis is based on the recursive relationships depicted in Fig. 5; however, it will be more convenient to re-sketch Fig. 5 as a binary tree as shown in Fig. 6.

Fig. 6. The tree process for the recursive channel construction.

The root node of the tree is associated with the channel $W$. The root $W$ gives birth to an upper channel $W_2^{(1)}$ and a lower channel $W_2^{(2)}$, which are associated with the two nodes at level 1. The channel $W_2^{(1)}$ in turn gives birth to the channels $W_4^{(1)}$ and $W_4^{(2)}$, and so on. The channel $W_{2^n}^{(i)}$ is located at level $n$ of the tree at node number $i$ counting from the top.

There is a natural indexing of nodes of the tree in Fig. 6 by bit sequences. The root node is indexed with the null sequence. The upper node at level 1 is indexed with 0 and the lower node with 1.
Given a node at level $n$ with index $b_1 b_2 \cdots b_n$, the upper node emanating from it has the label $b_1 b_2 \cdots b_n 0$ and the lower node the label $b_1 b_2 \cdots b_n 1$. According to this labeling, the channel $W_{2^n}^{(i)}$ is situated at the node $b_1 b_2 \cdots b_n$ with $i = 1 + \sum_{j=1}^{n} b_j 2^{n-j}$. We alternatively denote the channel located at node $b_1 b_2 \cdots b_n$ as $W_{b_1 \cdots b_n}$.

We define a random tree process, denoted $\{K_n; n \ge 0\}$, in connection with Fig. 6. The process begins at the root of the tree with $K_0 = W$. For any $n \ge 0$, given that $K_n = W_{b_1 \cdots b_n}$, $K_{n+1}$ equals $W_{b_1 \cdots b_n 0}$ or $W_{b_1 \cdots b_n 1}$ with probability $1/2$ each. Thus, the path taken by $\{K_n\}$ through the channel tree may be thought of as being driven by a sequence of independent and identically distributed (i.i.d.) Bernoulli RVs $\{B_n; n = 1, 2, \ldots\}$ where $B_n$ equals 0 or 1 with equal probability. Given that $B_1, \ldots, B_n$ has taken on a sample value $b_1, \ldots, b_n$, the random channel process takes the value $K_n = W_{b_1 \cdots b_n}$. In order to keep track of the rate and reliability parameters of the random sequence of channels, we define the random processes $I_n = I(K_n)$ and $Z_n = Z(K_n)$.

For a more precise formulation of the problem, we consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the space of all binary sequences $(b_1, b_2, \ldots) \in \{0, 1\}^\infty$, $\mathcal{F}$ is the Borel field (BF) generated by the cylinder sets $S(b_1, \ldots, b_n)$, $n \ge 1$, $b_1, \ldots, b_n \in \{0, 1\}$, and $P$ is the probability measure defined on $\mathcal{F}$ such that $P(S(b_1, \ldots, b_n)) = 1/2^n$. For each $n \ge 1$, we define $\mathcal{F}_n$ as the BF generated by the cylinder sets $S(b_1, \ldots, b_i)$, $1 \le i \le n$; we define $\mathcal{F}_0$ as the trivial BF consisting of the null set and $\Omega$ only. Clearly, $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}$. The random processes described above can now be formally defined on this probability space.
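For a BEC the processes $I_n$ and $Z_n$ can be followed in closed form, which makes two facts easy to check: the recursions (6) and (38) are consistent through $I = 1 - Z$, and the two children of any node average back to their parent, which is the martingale property of $I_n$ that drives the convergence analysis. A small sketch (ours):

```python
import random

def z_step(z):
    """Children of a BEC erasure probability under (38)."""
    return 2 * z - z * z, z * z

def i_step(i):
    """Children of a BEC symmetric capacity under (6)."""
    return i * i, 2 * i - i * i

# Consistency of (6) and (38) via I = 1 - Z on a BEC:
for z in (0.0, 0.1, 0.25, 0.5, 0.9, 1.0):
    z1, z2 = z_step(z)
    i1, i2 = i_step(1.0 - z)
    assert abs(i1 - (1.0 - z1)) < 1e-12 and abs(i2 - (1.0 - z2)) < 1e-12

# Martingale property of I_n: the children average back to the parent.
for i in (0.0, 0.2, 0.5, 0.8, 1.0):
    lo, hi = i_step(i)
    assert abs(0.5 * lo + 0.5 * hi - i) < 1e-12

# One sample path of {I_n}, driven by i.i.d. fair Bernoulli draws B_n:
random.seed(1)
i_n = 0.5                     # root: BEC(0.5)
for _ in range(40):
    i_n = i_step(i_n)[random.getrandbits(1)]
assert 0.0 <= i_n <= 1.0      # the process lives in [0, 1] and polarizes toward {0, 1}
```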