IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 55, NO. 7, JULY 2009
Channel Polarization: A Method for Constructing
Capacity-Achieving Codes for Symmetric
Binary-Input Memoryless Channels
Erdal Arıkan, Senior Member, IEEE
Abstract—A method is proposed, called channel polarization, to construct code sequences that achieve the symmetric capacity $I(W)$ of any given binary-input discrete memoryless channel (B-DMC) $W$. The symmetric capacity is the highest rate achievable subject to using the input letters of the channel with equal probability. Channel polarization refers to the fact that it is possible to synthesize, out of $N$ independent copies of a given B-DMC $W$, a second set of $N$ binary-input channels $\{W_N^{(i)} : 1 \le i \le N\}$ such that, as $N$ becomes large, the fraction of indices $i$ for which $I(W_N^{(i)})$ is near 1 approaches $I(W)$ and the fraction for which $I(W_N^{(i)})$ is near 0 approaches $1 - I(W)$. The polarized channels $\{W_N^{(i)}\}$ are well-conditioned for channel coding: one need only send data at rate 1 through those with capacity near 1 and at rate 0 through the remaining. Codes constructed on the basis of this idea are called polar codes. The paper proves that, given any B-DMC $W$ with $I(W) > 0$ and any target rate $R < I(W)$, there exists a sequence of polar codes $\{\mathcal{C}_n;\ n \ge 1\}$ such that $\mathcal{C}_n$ has block length $N = 2^n$, rate $R$, and probability of block error under successive cancellation decoding bounded as $P_e(N, R) \le O(N^{-1/4})$ independently of the code rate. This performance is achievable by encoders and decoders with complexity $O(N \log N)$ for each.

Index Terms—Capacity-achieving codes, channel capacity, channel polarization, Plotkin construction, polar codes, Reed–Muller (RM) codes, successive cancellation decoding.
I. INTRODUCTION AND OVERVIEW
A FASCINATING aspect of Shannon’s proof of the noisy
channel coding theorem is the random-coding method
that he used to show the existence of capacity-achieving code
sequences without exhibiting any specific such sequence [1].
Explicit construction of provably capacity-achieving code
sequences with low encoding and decoding complexities has
since then been an elusive goal. This paper is an attempt to
meet this goal for the class of binary-input discrete memoryless
channels (B-DMCs).
We will give a description of the main ideas and results of the
paper in this section. First, we give some definitions and state
some basic facts that are used throughout the paper.
Manuscript received October 14, 2007; revised August 13, 2008. Current version published June 24, 2009. This work was supported in part by The Scientific and Technological Research Council of Turkey (TÜBİTAK) under Project 107E216 and in part by the European Commission FP7 Network of Excellence NEWCOM++ under Contract 216715. The material in this paper was presented in part at the IEEE International Symposium on Information Theory (ISIT), Toronto, ON, Canada, July 2008.
The author is with the Department of Electrical-Electronics Engineering,
Bilkent University, Ankara, 06800, Turkey (e-mail: arikan@ee.bilkent.edu.tr).
Communicated by Y. Steinberg, Associate Editor for Shannon Theory.
Color versions of Figures 4 and 7 in this paper are available online at http://
ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TIT.2009.2021379
A. Preliminaries
We write $W : \mathcal{X} \to \mathcal{Y}$ to denote a generic B-DMC with input alphabet $\mathcal{X}$, output alphabet $\mathcal{Y}$, and transition probabilities $W(y|x)$, $x \in \mathcal{X}$, $y \in \mathcal{Y}$. The input alphabet will always be $\mathcal{X} = \{0, 1\}$; the output alphabet and the transition probabilities may be arbitrary. We write $W^N$ to denote the channel corresponding to $N$ uses of $W$; thus, $W^N : \mathcal{X}^N \to \mathcal{Y}^N$ with $W^N(y_1^N | x_1^N) = \prod_{i=1}^{N} W(y_i | x_i)$.

Given a B-DMC $W$, there are two channel parameters of primary interest in this paper: the symmetric capacity

$$I(W) \triangleq \sum_{y \in \mathcal{Y}} \sum_{x \in \mathcal{X}} \frac{1}{2} W(y|x) \log \frac{W(y|x)}{\frac{1}{2} W(y|0) + \frac{1}{2} W(y|1)}$$

and the Bhattacharyya parameter

$$Z(W) \triangleq \sum_{y \in \mathcal{Y}} \sqrt{W(y|0)\, W(y|1)}.$$

These parameters are used as measures of rate and reliability, respectively. $I(W)$ is the highest rate at which reliable communication is possible across $W$ using the inputs of $W$ with equal frequency. $Z(W)$ is an upper bound on the probability of maximum-likelihood (ML) decision error when $W$ is used only once to transmit a 0 or 1.

It is easy to see that $Z(W)$ takes values in $[0, 1]$. Throughout, we will use base-2 logarithms; hence, $I(W)$ will also take values in $[0, 1]$. The unit for code rates and channel capacities will be bits.

Intuitively, one would expect that $I(W) \approx 1$ iff $Z(W) \approx 0$, and $I(W) \approx 0$ iff $Z(W) \approx 1$. The following bounds, proved in the Appendix, make this precise.

Proposition 1: For any B-DMC $W$, we have

$$I(W) \ge \log \frac{2}{1 + Z(W)} \qquad (1)$$

$$I(W) \le \sqrt{1 - Z(W)^2}. \qquad (2)$$

The symmetric capacity $I(W)$ equals the Shannon capacity when $W$ is a symmetric channel, i.e., a channel for which there exists a permutation $\pi$ of the output alphabet $\mathcal{Y}$ such that i) $\pi^{-1} = \pi$ and ii) $W(y|1) = W(\pi(y)|0)$ for all $y \in \mathcal{Y}$. The binary symmetric channel (BSC) and the binary erasure channel (BEC) are examples of symmetric channels. A BSC is a B-DMC $W$ with $\mathcal{Y} = \{0, 1\}$, $W(0|0) = W(1|1)$, and $W(1|0) = W(0|1)$. A B-DMC $W$ is called a BEC if for each $y \in \mathcal{Y}$, either $W(y|0)W(y|1) = 0$ or $W(y|0) = W(y|1)$. In the latter case,
$y$ is said to be an erasure symbol. The sum of $W(y|0)$ over all erasure symbols $y$ is called the erasure probability of the BEC.

0018-9448/$25.00 © 2009 IEEE

Fig. 1. The channel $W_2$.
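As a concrete illustration of these definitions, the short sketch below (our own code and naming, not part of the paper) computes $I(W)$ and $Z(W)$ for a BSC and checks the bounds of Proposition 1:

```python
import math

# A channel W is represented as {x: {y: W(y|x)}} with x in {0, 1}.
def symmetric_capacity(W):
    """I(W), computed with base-2 logarithms (in bits)."""
    ys = set(W[0]) | set(W[1])
    total = 0.0
    for y in ys:
        q = 0.5 * W[0].get(y, 0.0) + 0.5 * W[1].get(y, 0.0)
        for x in (0, 1):
            p = W[x].get(y, 0.0)
            if p > 0:
                total += 0.5 * p * math.log2(p / q)
    return total

def bhattacharyya(W):
    """Z(W) = sum_y sqrt(W(y|0) W(y|1))."""
    ys = set(W[0]) | set(W[1])
    return sum(math.sqrt(W[0].get(y, 0.0) * W[1].get(y, 0.0)) for y in ys)

def bsc(p):
    """Binary symmetric channel with crossover probability p."""
    return {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}

W = bsc(0.1)
I, Z = symmetric_capacity(W), bhattacharyya(W)
# Proposition 1: log(2/(1+Z)) <= I(W) <= sqrt(1 - Z^2)
assert math.log2(2 / (1 + Z)) <= I <= math.sqrt(1 - Z * Z)
```

For a BSC with crossover probability $p$, this evaluates to $I(W) = 1 - H(p)$ and $Z(W) = 2\sqrt{p(1-p)}$.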
We denote random variables (RVs) by upper-case letters, such as $X, Y$, and their realizations (sample values) by the corresponding lower-case letters, such as $x, y$. For $X$ an RV, $P_X$ denotes the probability assignment on $X$. For a joint ensemble of RVs $(X, Y)$, $P_{X,Y}$ denotes the joint probability assignment. We use the standard notation $I(X; Y)$, $I(X; Y|Z)$ to denote the mutual information and its conditional form, respectively.

We use the notation $a_1^N$ as shorthand for denoting a row vector $(a_1, \ldots, a_N)$. Given such a vector $a_1^N$, we write $a_i^j$, $1 \le i, j \le N$, to denote the subvector $(a_i, \ldots, a_j)$; if $j < i$, $a_i^j$ is regarded as void. Given $a_1^N$ and $\mathcal{A} \subset \{1, \ldots, N\}$, we write $a_{\mathcal{A}}$ to denote the subvector $(a_i : i \in \mathcal{A})$. We write $a_{1,o}^j$ to denote the subvector with odd indices $(a_k : 1 \le k \le j;\ k\ \text{odd})$ and $a_{1,e}^j$ to denote the subvector with even indices $(a_k : 1 \le k \le j;\ k\ \text{even})$. For example, for $a_1^5 = (5, 4, 6, 2, 1)$, we have $a_2^4 = (4, 6, 2)$, $a_{1,e}^5 = (4, 2)$, $a_{1,o}^4 = (5, 6)$. The notation $0_1^N$ is used to denote the all-zero vector.
Code constructions in this paper will be carried out in vector spaces over the binary field GF(2). Unless specified otherwise, all vectors, matrices, and operations on them will be over GF(2). In particular, for $a_1^N$, $b_1^N$ vectors over GF(2), we write $a_1^N \oplus b_1^N$ to denote their componentwise mod-2 sum. The Kronecker product of an $m$-by-$n$ matrix $A = [A_{ij}]$ and an $r$-by-$s$ matrix $B = [B_{ij}]$ is defined as

$$A \otimes B \triangleq \begin{bmatrix} A_{11}B & \cdots & A_{1n}B \\ \vdots & \ddots & \vdots \\ A_{m1}B & \cdots & A_{mn}B \end{bmatrix}$$

which is an $mr$-by-$ns$ matrix. The Kronecker power $A^{\otimes n}$ is defined as $A \otimes A^{\otimes (n-1)}$ for all $n \ge 1$. We will follow the convention that $A^{\otimes 0} \triangleq [1]$.

We write $|\mathcal{A}|$ to denote the number of elements in a set $\mathcal{A}$. We write $1_{\mathcal{A}}$ to denote the indicator function of a set $\mathcal{A}$; thus, $1_{\mathcal{A}}(x)$ equals 1 if $x \in \mathcal{A}$ and 0 otherwise.

We use the standard Landau notation $O(N)$, $o(N)$ to denote the asymptotic behavior of functions.

B. Channel Polarization

Channel polarization is an operation by which one manufactures out of $N$ independent copies of a given B-DMC $W$ a second set of $N$ channels $\{W_N^{(i)} : 1 \le i \le N\}$ that show a polarization effect in the sense that, as $N$ becomes large, the symmetric capacity terms $\{I(W_N^{(i)})\}$ tend towards 0 or 1 for all but a vanishing fraction of indices $i$. This operation consists of a channel combining phase and a channel splitting phase.

1) Channel Combining: This phase combines copies of a given B-DMC $W$ in a recursive manner to produce a vector channel $W_N : \mathcal{X}^N \to \mathcal{Y}^N$, where $N$ can be any power of two, $N = 2^n$, $n \ge 0$. The recursion begins at the 0th level ($n = 0$) with only one copy of $W$ and we set $W_1 \triangleq W$. The first level ($n = 1$) of the recursion combines two independent copies of $W_1$ as shown in Fig. 1 and obtains the channel $W_2 : \mathcal{X}^2 \to \mathcal{Y}^2$ with the transition probabilities

$$W_2(y_1, y_2 | u_1, u_2) = W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2).$$

The next level of the recursion is shown in Fig. 2 where two independent copies of $W_2$ are combined to create the channel $W_4 : \mathcal{X}^4 \to \mathcal{Y}^4$ with transition probabilities

$$W_4(y_1^4 | u_1^4) = W_2(y_1^2 | u_1 \oplus u_2, u_3 \oplus u_4)\, W_2(y_3^4 | u_2, u_4). \qquad (3)$$

Fig. 2. The channel $W_4$ and its relation to $W_2$ and $W$.

In Fig. 2, $R_4$ is the permutation operation that maps an input $(s_1, s_2, s_3, s_4)$ to $v_1^4 = (s_1, s_3, s_2, s_4)$. The mapping $u_1^4 \mapsto x_1^4$ from the input of $W_4$ to the input of $W^4$ can be written as $x_1^4 = u_1^4 G_4$ with

$$G_4 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

Thus, we have the relation $W_4(y_1^4 | u_1^4) = W^4(y_1^4 | u_1^4 G_4)$ between the transition probabilities of $W_4$ and those of $W^4$.

The general form of the recursion is shown in Fig. 3 where two independent copies of $W_{N/2}$ are combined to produce the channel $W_N$. The input vector $u_1^N$ to $W_N$ is first transformed into $s_1^N$ so that $s_{2i-1} = u_{2i-1} \oplus u_{2i}$ and $s_{2i} = u_{2i}$ for $1 \le i \le N/2$. The operator $R_N$ in the figure is a permutation, known as the reverse shuffle operation, and acts on its input $s_1^N$ to produce $v_1^N = (s_1, s_3, \ldots, s_{N-1}, s_2, s_4, \ldots, s_N)$, which becomes the input to the two copies of $W_{N/2}$ as shown in the figure.

Fig. 3. Recursive construction of $W_N$ from two copies of $W_{N/2}$.

We observe that the mapping $u_1^N \mapsto v_1^N$ is linear over GF(2). It follows by induction that the overall mapping $u_1^N \mapsto x_1^N$, from the input of the synthesized channel $W_N$ to the input of the underlying raw channels $W^N$, is also linear and may be represented by a matrix $G_N$ so that $x_1^N = u_1^N G_N$. We call $G_N$ the generator matrix of size $N$. The transition probabilities of the two channels $W_N$ and $W^N$ are related by

$$W_N(y_1^N | u_1^N) = W^N(y_1^N | u_1^N G_N) \qquad (4)$$

for all $y_1^N \in \mathcal{Y}^N$, $u_1^N \in \mathcal{X}^N$. We will show in Section VII that $G_N$ equals $B_N F^{\otimes n}$ for any $N = 2^n$, $n \ge 0$, where $B_N$ is a permutation matrix known as bit-reversal and $F \triangleq \left[\begin{smallmatrix} 1 & 0 \\ 1 & 1 \end{smallmatrix}\right]$. Note that the channel combining operation is fully specified by the matrix $F^{\otimes n}$. Also note that $G_N$ and $F^{\otimes n}$ have the same set of rows, but in a different (bit-reversed) order; we will discuss this topic more fully in Section VII.

2) Channel Splitting: Having synthesized the vector channel $W_N$ out of $W^N$, the next step of channel polarization is to split $W_N$ back into a set of $N$ binary-input coordinate channels $W_N^{(i)} : \mathcal{X} \to \mathcal{Y}^N \times \mathcal{X}^{i-1}$, $1 \le i \le N$, defined by the transition probabilities

$$W_N^{(i)}(y_1^N, u_1^{i-1} | u_i) \triangleq \sum_{u_{i+1}^N \in \mathcal{X}^{N-i}} \frac{1}{2^{N-1}}\, W_N(y_1^N | u_1^N) \qquad (5)$$

where $(y_1^N, u_1^{i-1})$ denotes the output of $W_N^{(i)}$ and $u_i$ its input.

To gain an intuitive understanding of the channels $\{W_N^{(i)}\}$, consider a genie-aided successive cancellation decoder in which the $i$th decision element estimates $u_i$ after observing $y_1^N$ and the past channel inputs $u_1^{i-1}$ (supplied correctly by the genie regardless of any decision errors at earlier stages). If $u_1^N$ is a priori uniform on $\mathcal{X}^N$, then $W_N^{(i)}$ is the effective channel seen by the $i$th decision element in this scenario.

3) Channel Polarization:

Theorem 1: For any B-DMC $W$, the channels $\{W_N^{(i)}\}$ polarize in the sense that, for any fixed $\delta \in (0, 1)$, as $N$ goes to infinity through powers of two, the fraction of indices $i \in \{1, \ldots, N\}$ for which $I(W_N^{(i)}) \in (1 - \delta, 1]$ goes to $I(W)$ and the fraction for which $I(W_N^{(i)}) \in [0, \delta)$ goes to $1 - I(W)$.

This theorem is proved in Section IV.

The polarization effect is illustrated in Fig. 4 for the case that $W$ is a BEC with erasure probability $\epsilon = 0.5$. The numbers $\{I(W_N^{(i)})\}$ have been computed using the recursive relations

$$I(W_N^{(2i-1)}) = I(W_{N/2}^{(i)})^2$$
$$I(W_N^{(2i)}) = 2\, I(W_{N/2}^{(i)}) - I(W_{N/2}^{(i)})^2 \qquad (6)$$

with $I(W_1^{(1)}) = 1 - \epsilon$. This recursion is valid only for BECs and it is proved in Section III. No efficient algorithm is known for calculation of $\{I(W_N^{(i)})\}$ for a general B-DMC $W$.

Fig. 4 shows that $I(W_N^{(i)})$ tends to be near 0 for small $i$ and near 1 for large $i$. However, $I(W_N^{(i)})$ shows an erratic behavior for an intermediate range of $i$. For general B-DMCs, determining the subset of indices $i$ for which $I(W_N^{(i)})$ is above a given threshold is an important computational problem that will be addressed in Section IX.

4) Rate of Polarization: For proving coding theorems, the speed with which the polarization effect takes hold as a function of $N$ is important. Our main result in this regard is given in terms of the parameters

$$Z(W_N^{(i)}) = \sum_{y_1^N \in \mathcal{Y}^N} \sum_{u_1^{i-1} \in \mathcal{X}^{i-1}} \sqrt{W_N^{(i)}(y_1^N, u_1^{i-1} | 0)\, W_N^{(i)}(y_1^N, u_1^{i-1} | 1)}. \qquad (7)$$

Theorem 2: For any B-DMC $W$ with $I(W) > 0$, and any fixed $R < I(W)$, there exists a sequence of sets $\mathcal{A}_N \subset \{1, \ldots, N\}$, $N \in \{1, 2, \ldots, 2^n, \ldots\}$, such that $|\mathcal{A}_N| \ge NR$ and $Z(W_N^{(i)}) \le O(N^{-5/4})$ for all $i \in \mathcal{A}_N$.

This theorem is proved in Section IV-B.

We stated the polarization result in Theorem 2 in terms of $\{Z(W_N^{(i)})\}$ rather than $\{I(W_N^{(i)})\}$ because this form is better suited to the coding results that we will develop. A rate of polarization result in terms of $\{I(W_N^{(i)})\}$ can be obtained from Theorem 2 with the help of Proposition 1.

C. Polar Coding

We take advantage of the polarization effect to construct codes that achieve the symmetric channel capacity $I(W)$ by a method we call polar coding. The basic idea of polar coding is to create a coding system where one can access each coordinate channel $W_N^{(i)}$ individually and send data only through those for which $Z(W_N^{(i)})$ is near 0.

1) $G_N$-Coset Codes: We first describe a class of block codes that contain polar codes—the codes of main interest—as a special case. The block lengths $N$ for this class are restricted to powers of two, $N = 2^n$, $n \ge 0$. For a given $N$, each code in the class is encoded in the same manner, namely

$$x_1^N = u_1^N G_N \qquad (8)$$

where $G_N$ is the generator matrix of order $N$, defined above. For $\mathcal{A}$ an arbitrary subset of $\{1, \ldots, N\}$, we may write (8) as

$$x_1^N = u_{\mathcal{A}} G_N(\mathcal{A}) \oplus u_{\mathcal{A}^c} G_N(\mathcal{A}^c) \qquad (9)$$

where $G_N(\mathcal{A})$ denotes the submatrix of $G_N$ formed by the rows with indices in $\mathcal{A}$.

If we now fix $\mathcal{A}$ and $u_{\mathcal{A}^c}$, but leave $u_{\mathcal{A}}$ as a free variable, we obtain a mapping from source blocks $u_{\mathcal{A}}$ to codeword blocks $x_1^N$. This mapping is a coset code: it is a coset of the linear block
Fig. 4. Plot of $I(W_N^{(i)})$ versus $i = 1, \ldots, N$ for a BEC with $\epsilon = 0.5$.
code with generator matrix $G_N(\mathcal{A})$, with the coset determined by the fixed vector $u_{\mathcal{A}^c} G_N(\mathcal{A}^c)$. We will refer to this class of codes collectively as $G_N$-coset codes. Individual $G_N$-coset codes will be identified by a parameter vector $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$, where $K$ is the code dimension and specifies the size of $\mathcal{A}$.¹ The ratio $K/N$ is called the code rate. We will refer to $\mathcal{A}$ as the information set and to $u_{\mathcal{A}^c} \in \mathcal{X}^{N-K}$ as frozen bits or vector.

For example, the $(4, 2, \{2, 4\}, (1, 0))$ code has the encoder mapping

$$x_1^4 = u_2 (1, 0, 1, 0) \oplus u_4 (1, 1, 1, 1) \oplus (1, 0, 0, 0). \qquad (10)$$

For a source block $(u_2, u_4)$, the coded block is $x_1^4$.

Polar codes will be specified shortly by giving a particular rule for the selection of the information set $\mathcal{A}$.
2) A Successive Cancellation Decoder: Consider a $G_N$-coset code with parameter $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$. Let $u_1^N$ be encoded into a codeword $x_1^N$, let $x_1^N$ be sent over the channel $W^N$, and let a channel output $y_1^N$ be received. The decoder's task is to generate an estimate $\hat{u}_1^N$ of $u_1^N$, given knowledge of $\mathcal{A}$, $u_{\mathcal{A}^c}$, and $y_1^N$. Since the decoder can avoid errors in the frozen part by setting $\hat{u}_{\mathcal{A}^c} = u_{\mathcal{A}^c}$, the real decoding task is to generate an estimate $\hat{u}_{\mathcal{A}}$ of $u_{\mathcal{A}}$.
The coding results in this paper will be given with respect to a specific successive cancellation (SC) decoder, unless some other decoder is mentioned. Given any $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ $G_N$-coset code, we will use an SC decoder that generates its decision $\hat{u}_1^N$ by computing

$$\hat{u}_i \triangleq \begin{cases} u_i, & \text{if } i \in \mathcal{A}^c \\ h_i(y_1^N, \hat{u}_1^{i-1}), & \text{if } i \in \mathcal{A} \end{cases} \qquad (11)$$

in the order $i$ from 1 to $N$, where $h_i : \mathcal{Y}^N \times \mathcal{X}^{i-1} \to \mathcal{X}$, $i \in \mathcal{A}$, are decision functions defined as

$$h_i(y_1^N, \hat{u}_1^{i-1}) \triangleq \begin{cases} 0, & \text{if } \dfrac{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} | 0)}{W_N^{(i)}(y_1^N, \hat{u}_1^{i-1} | 1)} \ge 1 \\ 1, & \text{otherwise} \end{cases} \qquad (12)$$

for all $y_1^N \in \mathcal{Y}^N$, $\hat{u}_1^{i-1} \in \mathcal{X}^{i-1}$. We will say that a decoder block error occurred if $\hat{u}_1^N \ne u_1^N$ or, equivalently, if $\hat{u}_{\mathcal{A}} \ne u_{\mathcal{A}}$.

The decision functions $\{h_i\}$ defined above resemble ML decision functions but are not exactly so, because they treat the future frozen bits $(u_j : j > i,\ j \in \mathcal{A}^c)$ as RVs, rather than as known bits. In exchange for this suboptimality, $\{h_i\}$ can be computed efficiently using recursive formulas, as we will show in Section II. Apart from algorithmic efficiency, the recursive structure of the decision functions is important because it renders the performance analysis of the decoder tractable. Fortunately, the loss in performance due to not using true ML decision functions happens to be negligible: $I(W)$ is still achievable.
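A direct, if exponentially slow, prototype of the SC rule (11)-(12) can evaluate $W_N^{(i)}$ straight from definition (5) by brute-force summation over the future bits $u_{i+1}^N$ (our code and naming, usable only for tiny $N$; the paper's efficient recursive decoder comes later, in Section VIII). Comparing the two unnormalized sums implements the ratio test of (12), since both share the constant $1/2^{N-1}$:

```python
from itertools import product

def generator_matrix(n):
    # G_N = B_N F^{(x)n}: Kronecker power of F = [[1,0],[1,1]], rows bit-reversed
    A = [[1]]
    for _ in range(n):
        A = [r + [0] * len(r) for r in A] + [r + r for r in A]
    rev = lambda i: int(format(i, f"0{n}b")[::-1], 2) if n else 0
    return [A[rev(i)] for i in range(1 << n)]

def encode(u, G):
    return [sum(ui * gij for ui, gij in zip(u, col)) % 2 for col in zip(*G)]

def w_split(W, G, y, past, ui):
    # W_N^{(i)}(y_1^N, u_1^{i-1} | u_i), up to the constant 1/2^{N-1}, via (5)
    N = len(y)
    total = 0.0
    for future in product((0, 1), repeat=N - len(past) - 1):
        x = encode(list(past) + [ui] + list(future), G)
        p = 1.0
        for yj, xj in zip(y, x):
            p *= W[xj][yj]
        total += p
    return total

def sc_decode(W, y, frozen):
    # frozen: {index (0-based): value}; information bits decided per (11)-(12)
    n = len(y).bit_length() - 1
    G = generator_matrix(n)
    u_hat = []
    for i in range(len(y)):
        if i in frozen:
            u_hat.append(frozen[i])
        else:
            p0 = w_split(W, G, y, u_hat, 0)
            p1 = w_split(W, G, y, u_hat, 1)
            u_hat.append(0 if p0 >= p1 else 1)  # ties broken toward 0, as in (12)
    return u_hat

W = {0: {0: 0.95, 1: 0.05}, 1: {0: 0.05, 1: 0.95}}  # BSC(0.05), W[x][y]
G4 = generator_matrix(2)
for u in ([0, 0, 0, 0], [1, 0, 1, 1], [1, 1, 1, 1]):
    y = encode(u, G4)               # noiseless observation of the codeword
    assert sc_decode(W, y, {}) == u
```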
3) Code Performance: The notation $P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ will denote the probability of block error for an $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ code, assuming that each data vector $u_{\mathcal{A}} \in \mathcal{X}^K$ is sent with probability $2^{-K}$ and decoding is done by the above SC decoder. More precisely,

$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \triangleq \sum_{u_{\mathcal{A}} \in \mathcal{X}^K} \frac{1}{2^K} \sum_{y_1^N :\, \hat{u}_1^N(y_1^N) \ne u_1^N} W_N(y_1^N | u_1^N).$$

The average of $P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ over all choices for $u_{\mathcal{A}^c}$ will be denoted by $P_e(N, K, \mathcal{A})$, i.e.,

$$P_e(N, K, \mathcal{A}) \triangleq \sum_{u_{\mathcal{A}^c} \in \mathcal{X}^{N-K}} \frac{1}{2^{N-K}}\, P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}).$$
¹We include the redundant parameter $K$ in the parameter set because often we consider an ensemble of codes with $K$ fixed and $\mathcal{A}$ free.
A key bound on block error probability under SC decoding is
the following.
Proposition 2: For any B-DMC $W$ and any choice of the parameters $(N, K, \mathcal{A})$,

$$P_e(N, K, \mathcal{A}) \le \sum_{i \in \mathcal{A}} Z(W_N^{(i)}). \qquad (13)$$

Hence, for each $(N, K, \mathcal{A})$, there exists a frozen vector $u_{\mathcal{A}^c}$ such that

$$P_e(N, K, \mathcal{A}, u_{\mathcal{A}^c}) \le \sum_{i \in \mathcal{A}} Z(W_N^{(i)}). \qquad (14)$$

This is proved in Section V-B. This result suggests choosing $\mathcal{A}$ from among all $K$-element subsets of $\{1, \ldots, N\}$ so as to minimize the right-hand side (RHS) of (13). This idea leads to the definition of polar codes.
4) Polar Codes: Given a B-DMC $W$, a $G_N$-coset code with parameter $(N, K, \mathcal{A}, u_{\mathcal{A}^c})$ will be called a polar code for $W$ if the information set $\mathcal{A}$ is chosen as a $K$-element subset of $\{1, \ldots, N\}$ such that $Z(W_N^{(i)}) \le Z(W_N^{(j)})$ for all $i \in \mathcal{A}$, $j \in \mathcal{A}^c$.

Polar codes are channel-specific designs: a polar code for one channel may not be a polar code for another. The main result of this paper will be to show that polar coding achieves the symmetric capacity $I(W)$ of any given B-DMC $W$.
An alternative rule for polar code definition would be to specify $\mathcal{A}$ as a $K$-element subset of $\{1, \ldots, N\}$ such that $I(W_N^{(i)}) \ge I(W_N^{(j)})$ for all $i \in \mathcal{A}$, $j \in \mathcal{A}^c$. This alternative rule would also achieve $I(W)$. However, the rule based on the Bhattacharyya parameters has the advantage of being connected with an explicit bound on block error probability.
The polar code definition does not specify how the frozen vector $u_{\mathcal{A}^c}$ is to be chosen; it may be chosen at will. This degree of freedom in the choice of $u_{\mathcal{A}^c}$ simplifies the performance analysis of polar codes by allowing averaging over an ensemble. However, it is not for analytical convenience alone that we do not specify a precise rule for selecting $u_{\mathcal{A}^c}$, but also because it appears that the code performance is relatively insensitive to that choice. In fact, we prove in Section VI-B that, for symmetric channels, any choice for $u_{\mathcal{A}^c}$ is as good as any other.
5) Coding Theorems: Fix a B-DMC $W$ and a number $R \ge 0$. Let $P_e(N, R)$ be defined as $P_e(N, \lfloor NR \rfloor, \mathcal{A})$ with $\mathcal{A}$ selected in accordance with the polar coding rule for $W$. Thus, $P_e(N, R)$ is the probability of block error under SC decoding for polar coding over $W$ with block length $N$ and rate $R$, averaged over all choices for the frozen bits $u_{\mathcal{A}^c}$. The main coding result of this paper is the following.
Theorem 3: For any given B-DMC $W$ and fixed $R < I(W)$, the block error probability for polar coding under successive cancellation decoding satisfies

$$P_e(N, R) = O(N^{-1/4}). \qquad (15)$$

This theorem follows as an easy corollary to Theorem 2 and the bound (13), as we show in Section V-B. For symmetric channels, we have the following stronger version of Theorem 3.
Theorem 4: For any symmetric B-DMC $W$ and any fixed $R < I(W)$, consider any sequence of $G_N$-coset codes $(N, \lfloor NR \rfloor, \mathcal{A}, u_{\mathcal{A}^c})$ with $N$ increasing to infinity, $\mathcal{A}$ chosen in accordance with the polar coding rule for $W$, and $u_{\mathcal{A}^c}$ fixed arbitrarily. The block error probability under successive cancellation decoding satisfies

$$P_e(N, \lfloor NR \rfloor, \mathcal{A}, u_{\mathcal{A}^c}) = O(N^{-1/4}). \qquad (16)$$

This is proved in Section VI-B. Note that for symmetric channels, $I(W)$ equals the Shannon capacity of $W$.
6) Complexity: An important issue about polar coding is the complexity of encoding, decoding, and code construction. The recursive structure of the channel polarization construction leads to low-complexity encoding and decoding algorithms for the class of $G_N$-coset codes, and in particular, for polar codes.

Theorem 5: For the class of $G_N$-coset codes, the complexity of encoding and the complexity of successive cancellation decoding are both $O(N \log N)$ as functions of code block length $N$.
This theorem is proved in Sections VII and VIII. Notice that the complexity bounds in Theorem 5 are independent of the code rate and the way the frozen vector is chosen. The bounds hold even at rates above $I(W)$, but clearly this has no practical significance.

As for code construction, we have found no low-complexity algorithms for constructing polar codes. One exception is the case of a BEC for which we have a polar code construction algorithm with complexity $O(N)$. We discuss the code construction problem further in Section IX and suggest a low-complexity statistical algorithm for approximating the exact polar code construction.
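For a BEC, the construction can be carried out exactly: by the BEC case of the recursion (6), one erasure rate $z$ at a given level produces the pair $(2z - z^2,\ z^2)$ at the next level, and the polar coding rule keeps the $K$ indices with smallest $Z$. A minimal sketch (our code and naming, not the paper's):

```python
def bec_bhattacharyya(eps, n):
    """Z(W_N^{(i)}) for i = 1..N (N = 2^n) when W is a BEC(eps).
    For the BEC, Z(W_N^{(i)}) is exactly the erasure probability of W_N^{(i)}."""
    zs = [eps]
    for _ in range(n):
        nxt = []
        for z in zs:
            nxt += [2 * z - z * z, z * z]  # (Z of the odd child, Z of the even child)
        zs = nxt
    return zs

def polar_information_set(eps, n, K):
    """Polar coding rule: the K indices (0-based here) with smallest Z."""
    zs = bec_bhattacharyya(eps, n)
    return sorted(sorted(range(len(zs)), key=lambda i: zs[i])[:K])

zs = bec_bhattacharyya(0.5, 2)
assert zs == [0.9375, 0.5625, 0.4375, 0.0625]
assert polar_information_set(0.5, 2, 2) == [2, 3]
# Rate conservation: sum_i I(W_N^{(i)}) = N * I(W), with I = 1 - Z for the BEC
assert abs(sum(1 - z for z in zs) - 4 * 0.5) < 1e-12
```

The total work is $N + N/2 + \cdots + 1 < 2N$ recursion values, consistent with the $O(N)$ claim above.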
D. Relations To Previous Work
This paper is an extension of work begun in [2], where
channel combining and splitting were used to show that im-
provements can be obtained in the sum cutoff rate for some
specific DMCs. However, no recursive method was suggested
there to reach the ultimate limit of such improvements.
As the present work progressed, it became clear that polar
coding had much in common with Reed–Muller (RM) coding
[3], [4]. Indeed, recursive code construction and SC decoding,
which are two essential ingredients of polar coding, appear to
have been introduced into coding theory by RM codes.
According to one construction of RM codes, for any $N = 2^n$, $n \ge 0$, and $0 \le K \le N$, an RM code with block length $N$ and dimension $K$, denoted RM$(N, K)$, is defined as a linear code whose generator matrix $G_{\mathrm{RM}}(N, K)$ is obtained by deleting $(N - K)$ of the rows of $F^{\otimes n}$ so that none of the deleted rows has a larger Hamming weight (number of 1's in that row) than any of the remaining $K$ rows. For instance,

$$G_{\mathrm{RM}}(4, 4) = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix} \quad \text{and} \quad G_{\mathrm{RM}}(4, 2) = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 1 & 1 \end{bmatrix}.$$

This construction brings out the similarities between RM codes and polar codes. Since $G_N$ and $F^{\otimes n}$ have the same set of rows (only in a different order) for any $N = 2^n$, it is clear that RM codes belong to the class of $G_N$-coset codes.
For example, with the instance of $G_{\mathrm{RM}}(4, 2)$ shown above, RM$(4, 2)$ is the $G_4$-coset code with parameter $(4, 2, \{3, 4\}, (0, 0))$. So, RM coding and polar coding may be regarded as two alternative rules for selecting the information set $\mathcal{A}$ of a $G_N$-coset code of a given size $(N, K)$.
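The RM selection rule can be sketched as follows (our code; note that for $N = 4$, $K = 2$ the maximum-weight rule leaves a tie between the two weight-2 rows, so the instance shown above is only one valid choice):

```python
def kron_rows(n):
    """Rows of F^{(x)n} for F = [[1, 0], [1, 1]], in natural order."""
    A = [[1]]
    for _ in range(n):
        A = [r + [0] * len(r) for r in A] + [r + r for r in A]
    return A

def rm_information_rows(n, K):
    """RM rule: keep K rows of F^{(x)n} of largest Hamming weight
    (ties broken by row index here; the rule itself leaves ties free)."""
    rows = kron_rows(n)
    by_weight = sorted(range(len(rows)), key=lambda i: (-sum(rows[i]), i))
    return sorted(by_weight[:K])

weights = [sum(r) for r in kron_rows(2)]
assert weights == [1, 2, 2, 4]              # rows (1000),(1100),(1010),(1111)
assert rm_information_rows(2, 2) == [1, 3]  # one valid RM(4,2) choice
```

Unlike the polar rule, this selection depends only on $(N, K)$, not on the channel, which is exactly the contrast drawn in the following paragraph.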
Unlike polar coding, RM coding selects the information set
in a channel-independent manner; it is not as fine-tuned to
the channel polarization phenomenon as polar coding is. We
will show in Section X that, at least for the class of BECs, the
RM rule for information set selection leads to asymptotically
unreliable codes under SC decoding. So, polar coding goes
beyond RM coding in a nontrivial manner by paying closer
attention to channel polarization.
Another connection to existing work can be established by noting that polar codes are multilevel $|u|u+v|$ codes, which are a class of codes originating from Plotkin's method for code combining [5]. This connection is not surprising in view of the fact that RM codes are also multilevel $|u|u+v|$ codes [6, pp. 114–125]. However, unlike typical multilevel code constructions, where one begins with specific small codes to build larger ones, in polar coding the multilevel code is obtained by expurgating rows of a full-order generator matrix, $G_N$, with respect to a channel-specific criterion. The special structure of $G_N$ ensures that, no matter how expurgation is done, the resulting code is a multilevel $|u|u+v|$ code. In essence, polar coding enjoys the freedom to pick a multilevel code from an ensemble of such codes so as to suit the channel at hand, while conventional approaches to multilevel coding do not have this degree of flexibility.
Finally, we wish to mention a "spectral" interpretation of polar codes which is similar to Blahut's treatment of Bose–Chaudhuri–Hocquenghem (BCH) codes [7, Ch. 9]; this type of similarity has already been pointed out by Forney [8, Ch. 11] in connection with RM codes. From the spectral viewpoint, the encoding operation (8) is regarded as a transform of a "frequency" domain information vector $u_1^N$ to a "time" domain codeword vector $x_1^N$. The transform is invertible: $u_1^N = x_1^N G_N^{-1}$. The decoding operation is regarded as a spectral estimation problem in which one is given a time domain observation $y_1^N$, which is a noisy version of $x_1^N$, and asked to estimate $u_1^N$. To aid the estimation task, one is allowed to freeze a certain number of spectral components of $u_1^N$. This spectral interpretation of polar coding suggests that it may be possible to treat polar codes and BCH codes in a unified framework. The spectral interpretation also opens the door to the use of various signal processing techniques in polar coding; indeed, in Section VII, we exploit some fast transform techniques in designing encoders for polar codes.
E. Paper Outline
The rest of the paper is organized as follows. Section II explores the recursive properties of the channel splitting operation. In Section III, we focus on how $I(W)$ and $Z(W)$ get transformed through a single step of channel combining and splitting. We extend this to an asymptotic analysis in Section IV and complete the proofs of Theorems 1 and 2. This completes the part of the paper on channel polarization; the rest of the paper is mainly about polar coding. Section V develops an upper bound on the block error probability of polar coding under SC decoding and proves Theorem 3. Section VI considers polar coding for symmetric B-DMCs and proves Theorem 4. Section VII gives an analysis of the encoder mapping $u_1^N \mapsto x_1^N$, which results in efficient encoder implementations. In Section VIII, we give an implementation of SC decoding with complexity $O(N \log N)$. In Section IX, we discuss the code construction complexity and propose a low-complexity statistical algorithm for approximate code construction. In Section X, we explain why RM codes have a poor asymptotic performance under SC decoding. In Section XI, we point out some generalizations of the present work, give some complementary remarks, and state some open problems.
II. RECURSIVE CHANNEL TRANSFORMATIONS
We have defined a blockwise channel combining and splitting operation by (4) and (5) which transformed $N$ independent copies of $W$ into $W_N^{(1)}, \ldots, W_N^{(N)}$. The goal in this section is to show that this blockwise channel transformation can be broken recursively into single-step channel transformations.

We say that a pair of binary-input channels $W' : \mathcal{X} \to \tilde{\mathcal{Y}}$ and $W'' : \mathcal{X} \to \tilde{\mathcal{Y}} \times \mathcal{X}$ are obtained by a single-step transformation of two independent copies of a binary-input channel $W : \mathcal{X} \to \mathcal{Y}$, and write $(W, W) \mapsto (W', W'')$, iff there exists a one-to-one mapping $f : \mathcal{Y}^2 \to \tilde{\mathcal{Y}}$ such that

$$W'(f(y_1, y_2) | u_1) = \sum_{u_2'} \frac{1}{2}\, W(y_1 | u_1 \oplus u_2')\, W(y_2 | u_2') \qquad (17)$$

$$W''(f(y_1, y_2), u_1 | u_2) = \frac{1}{2}\, W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2) \qquad (18)$$

for all $u_1, u_2 \in \mathcal{X}$, $y_1, y_2 \in \mathcal{Y}$.

According to this, we can write $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ for any given B-DMC $W$ because

$$W_2^{(1)}(y_1^2 | u_1) = \sum_{u_2} \frac{1}{2}\, W_2(y_1^2 | u_1^2) = \sum_{u_2} \frac{1}{2}\, W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2) \qquad (19)$$

$$W_2^{(2)}(y_1^2, u_1 | u_2) = \frac{1}{2}\, W_2(y_1^2 | u_1^2) = \frac{1}{2}\, W(y_1 | u_1 \oplus u_2)\, W(y_2 | u_2) \qquad (20)$$

which are in the form of (17) and (18) by taking $f$ as the identity mapping.

It turns out we can write, more generally,

$$(W_N^{(i)}, W_N^{(i)}) \mapsto (W_{2N}^{(2i-1)}, W_{2N}^{(2i)}). \qquad (21)$$

This follows as a corollary to the following.

Proposition 3: For any $n \ge 0$, $N = 2^n$, $1 \le i \le N$,

$$W_{2N}^{(2i-1)}(y_1^{2N}, u_1^{2i-2} | u_{2i-1}) = \sum_{u_{2i}} \frac{1}{2}\, W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} | u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2} | u_{2i}) \qquad (22)$$

$$W_{2N}^{(2i)}(y_1^{2N}, u_1^{2i-1} | u_{2i}) = \frac{1}{2}\, W_N^{(i)}(y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2} | u_{2i-1} \oplus u_{2i})\, W_N^{(i)}(y_{N+1}^{2N}, u_{1,e}^{2i-2} | u_{2i}). \qquad (23)$$

This proposition is proved in the Appendix. The transform relationship (21) can now be justified by noting that (22) and (23) are identical in form to (17) and (18), respectively, after the following substitutions: $W_N^{(i)}$ in place of $W$; $W_{2N}^{(2i-1)}$ in place of $W'$; $W_{2N}^{(2i)}$ in place of $W''$; $u_{2i-1}$ in place of $u_1$; $u_{2i}$ in place of $u_2$; and $f$ taken as the one-to-one mapping that carries the output pair $\big((y_1^N, u_{1,o}^{2i-2} \oplus u_{1,e}^{2i-2}),\ (y_{N+1}^{2N}, u_{1,e}^{2i-2})\big)$ to $(y_1^{2N}, u_1^{2i-2})$.

Thus, we have shown that the blockwise channel transformation from $W^N$ to $(W_N^{(1)}, \ldots, W_N^{(N)})$ breaks at a local level into single-step channel transformations of the form (21). The full set of such transformations forms a fabric as shown in Fig. 5 for $N = 8$. Reading from right to left, the figure starts with four copies of the transformation $(W, W) \mapsto (W_2^{(1)}, W_2^{(2)})$ and continues in butterfly patterns, each representing a channel transformation of the form (21). The two channels at the right endpoints of the butterflies are always identical and independent. At the rightmost level there are eight independent copies of $W$; at the next level to the left, there are four independent copies of $W_2^{(1)}$ and $W_2^{(2)}$ each; and so on. Each step to the left doubles the number of channel types, but halves the number of independent copies.

Fig. 5. The channel transformation process with N = 8 channels.

III. TRANSFORMATION OF RATE AND RELIABILITY

We now investigate how the rate and reliability parameters, $I(W_N^{(i)})$ and $Z(W_N^{(i)})$, change through a local (single-step) transformation (21). By understanding the local behavior, we will be able to reach conclusions about the overall transformation from $W^N$ to $(W_N^{(1)}, \ldots, W_N^{(N)})$. Proofs of the results in this section are given in the Appendix.

A. Local Transformation of Rate and Reliability

Proposition 4: Suppose $(W, W) \mapsto (W', W'')$ for some set of binary-input channels. Then

$$I(W') + I(W'') = 2\, I(W) \qquad (24)$$

$$I(W') \le I(W'') \qquad (25)$$

with equality iff $I(W)$ equals 0 or 1.

The equality (24) indicates that the single-step channel transform preserves the symmetric capacity. The inequality (25) together with (24) implies that the symmetric capacity remains unchanged under a single-step transform, $I(W') = I(W'') = I(W)$, iff $W$ is either a perfect channel or a completely noisy one. If $W$ is neither perfect nor completely noisy, the single-step transform moves the symmetric capacity away from the center in the sense that $I(W') < I(W) < I(W'')$, thus helping polarization.

Proposition 5: Suppose $(W, W) \mapsto (W', W'')$ for some set of binary-input channels. Then

$$Z(W'') = Z(W)^2 \qquad (26)$$

$$Z(W') \le 2\, Z(W) - Z(W)^2 \qquad (27)$$

$$Z(W') \ge Z(W) \ge Z(W''). \qquad (28)$$

Equality holds in (27) iff $W$ is a BEC. We have $Z(W') = Z(W'')$ iff $Z(W)$ equals 0 or 1, or, equivalently, iff $I(W)$ equals 1 or 0.

This result shows that reliability can only improve under a single-step channel transform in the sense that

$$Z(W') + Z(W'') \le 2\, Z(W) \qquad (29)$$

with equality iff $W$ is a BEC.

Since the BEC plays a special role with respect to (w.r.t.) extremal behavior of reliability, it deserves special attention.

Proposition 6: Consider the channel transformation $(W, W) \mapsto (W', W'')$. If $W$ is a BEC with some erasure probability $\epsilon$, then the channels $W'$ and $W''$ are BECs with erasure probabilities $2\epsilon - \epsilon^2$ and $\epsilon^2$, respectively. Conversely, if $W'$ or $W''$ is a BEC, then $W$ is a BEC.

B. Rate and Reliability for $W_N^{(i)}$

We now return to the context at the end of Section II.

Proposition 7: For any B-DMC $W$, $N = 2^n$, $n \ge 0$, $1 \le i \le N$, the transformation $(W_N^{(i)}, W_N^{(i)}) \mapsto (W_{2N}^{(2i-1)}, W_{2N}^{(2i)})$ is rate-preserving and reliability-improving in the sense that

$$I(W_{2N}^{(2i-1)}) + I(W_{2N}^{(2i)}) = 2\, I(W_N^{(i)}) \qquad (30)$$

$$Z(W_{2N}^{(2i-1)}) + Z(W_{2N}^{(2i)}) \le 2\, Z(W_N^{(i)}) \qquad (31)$$

with equality in (31) iff $W$ is a BEC. Channel splitting moves the rate and reliability away from the center in the sense that

$$I(W_{2N}^{(2i-1)}) \le I(W_N^{(i)}) \le I(W_{2N}^{(2i)}) \qquad (32)$$

$$Z(W_{2N}^{(2i-1)}) \ge Z(W_N^{(i)}) \ge Z(W_{2N}^{(2i)}) \qquad (33)$$

with equality in (32) and (33) iff $I(W)$ equals 0 or 1. The reliability terms further satisfy

$$Z(W_{2N}^{(2i-1)}) \le 2\, Z(W_N^{(i)}) - Z(W_N^{(i)})^2 \qquad (34)$$

$$Z(W_{2N}^{(2i)}) = Z(W_N^{(i)})^2 \qquad (35)$$

with equality in (34) iff $W$ is a BEC. The cumulative rate and reliability satisfy

$$\sum_{i=1}^{N} I(W_N^{(i)}) = N\, I(W) \qquad (36)$$

$$\sum_{i=1}^{N} Z(W_N^{(i)}) \le N\, Z(W) \qquad (37)$$

with equality in (37) iff $W$ is a BEC.

This result follows from Propositions 4 and 5 as a special case and no separate proof is needed. The cumulative relations (36) and (37) follow by repeated application of (30) and (31), respectively. The conditions for equality in Proposition 7 are stated in terms of $W$ rather than $W_N^{(i)}$; this is possible because i) by Proposition 4, $I(W_N^{(i)}) \in \{0, 1\}$ iff $I(W) \in \{0, 1\}$; and ii) $W_N^{(i)}$ is a BEC iff $W$ is a BEC, which follows from Proposition 6 by induction.

For the special case that $W$ is a BEC with an erasure probability $\epsilon$, it follows from Propositions 4 and 6 that the parameters $\{Z(W_N^{(i)})\}$ can be computed through the recursion

$$Z(W_N^{(2j-1)}) = 2\, Z(W_{N/2}^{(j)}) - Z(W_{N/2}^{(j)})^2$$
$$Z(W_N^{(2j)}) = Z(W_{N/2}^{(j)})^2 \qquad (38)$$

with $Z(W_1^{(1)}) = \epsilon$. The parameter $Z(W_N^{(i)})$ equals the erasure probability of the channel $W_N^{(i)}$. The recursive relations (6) follow from (38) by the fact that $I(W_N^{(i)}) = 1 - Z(W_N^{(i)})$ for $W$ a BEC.

Fig. 6. The tree process for the recursive channel construction.
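Propositions 4 and 5 can be spot-checked numerically for a small channel by constructing $W'$ and $W''$ directly from (17) and (18) with $f$ the identity (a sketch in our own naming, not the paper's):

```python
import math

def capacity(W):
    """Symmetric capacity I(W); a channel is stored as W[x][y] = W(y|x)."""
    ys = set(W[0]) | set(W[1])
    s = 0.0
    for y in ys:
        q = 0.5 * W[0].get(y, 0.0) + 0.5 * W[1].get(y, 0.0)
        for x in (0, 1):
            p = W[x].get(y, 0.0)
            if p > 0:
                s += 0.5 * p * math.log2(p / q)
    return s

def bhattacharyya(W):
    ys = set(W[0]) | set(W[1])
    return sum(math.sqrt(W[0].get(y, 0.0) * W[1].get(y, 0.0)) for y in ys)

def single_step(W):
    """(W, W) -> (W', W'') per (17)-(18), with f the identity on (y1, y2)."""
    ys = set(W[0]) | set(W[1])
    Wp = {0: {}, 1: {}}
    Wpp = {0: {}, 1: {}}
    for y1 in ys:
        for y2 in ys:
            for u1 in (0, 1):
                Wp[u1][(y1, y2)] = sum(
                    0.5 * W[u1 ^ u2][y1] * W[u2][y2] for u2 in (0, 1))
                for u2 in (0, 1):
                    Wpp[u2][(y1, y2, u1)] = 0.5 * W[u1 ^ u2][y1] * W[u2][y2]
    return Wp, Wpp

W = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # BSC(0.1)
Wp, Wpp = single_step(W)
Z = bhattacharyya(W)
assert abs(capacity(Wp) + capacity(Wpp) - 2 * capacity(W)) < 1e-9  # (24)
assert abs(bhattacharyya(Wpp) - Z * Z) < 1e-9                      # (26)
assert bhattacharyya(Wp) <= 2 * Z - Z * Z + 1e-9                   # (27)
assert bhattacharyya(Wp) >= Z >= bhattacharyya(Wpp)                # (28)
```

For the BSC, which is not a BEC, inequality (27) is strict, in line with the equality condition stated in Proposition 5.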
IV. CHANNEL POLARIZATION
We prove the main results on channel polarization in this section. The analysis is based on the recursive relationships depicted in Fig. 5; however, it will be more convenient to re-sketch Fig. 5 as a binary tree as shown in Fig. 6. The root node of the tree is associated with the channel $W$. The root $W$ gives birth to an upper channel $W_2^{(1)}$ and a lower channel $W_2^{(2)}$, which are associated with the two nodes at level 1. The channel $W_2^{(1)}$ in turn gives birth to the channels $W_4^{(1)}$ and $W_4^{(2)}$, and so on. The channel $W_N^{(i)}$ is located at level $n = \log_2 N$ of the tree at node number $i$ counting from the top.

There is a natural indexing of nodes of the tree in Fig. 6 by bit sequences. The root node is indexed with the null sequence. The upper node at level 1 is indexed with 0 and the lower node with 1. Given a node at level $n$ with index $b_1 b_2 \cdots b_n$, the upper node emanating from it has the label $b_1 b_2 \cdots b_n 0$ and the lower node $b_1 b_2 \cdots b_n 1$. According to this labeling, the channel $W_N^{(i)}$ is situated at the node $b_1 b_2 \cdots b_n$ with $i = 1 + \sum_{j=1}^{n} b_j 2^{n-j}$. We denote the channel $W_N^{(i)}$ located at node $b_1 b_2 \cdots b_n$ alternatively as $W_{b_1 \cdots b_n}$.

We define a random tree process, denoted $\{K_n;\ n \ge 0\}$, in connection with Fig. 6. The process begins at the root of the tree with $K_0 = W$. For any $n \ge 0$, given that $K_n = W_{b_1 \cdots b_n}$, $K_{n+1}$ equals $W_{b_1 \cdots b_n 0}$ or $W_{b_1 \cdots b_n 1}$ with probability $1/2$ each. Thus, the path taken by $\{K_n\}$ through the channel tree may be thought of as being driven by a sequence of independent and identically distributed (i.i.d.) Bernoulli RVs $\{B_n;\ n = 1, 2, \ldots\}$ where $B_n$ equals 0 or 1 with equal probability. Given that $B_1, \ldots, B_n$ has taken on a sample value $b_1, \ldots, b_n$, the random channel process takes the value $K_n = W_{b_1 \cdots b_n}$. In order to keep track of the rate and reliability parameters of the random sequence of channels $K_n$, we define the random processes $I_n = I(K_n)$ and $Z_n = Z(K_n)$.

For a more precise formulation of the problem, we consider the probability space $(\Omega, \mathcal{F}, P)$ where $\Omega$ is the space of all binary sequences $(b_1, b_2, \ldots) \in \{0, 1\}^\infty$, $\mathcal{F}$ is the Borel field (BF) generated by the cylinder sets $S(b_1, \ldots, b_n) \triangleq \{\omega \in \Omega : \omega_1 = b_1, \ldots, \omega_n = b_n\}$, $n \ge 1$, $b_1, \ldots, b_n \in \{0, 1\}$, and $P$ is the probability measure defined on $\mathcal{F}$ such that $P(S(b_1, \ldots, b_n)) = 1/2^n$. For each $n \ge 1$, we define $\mathcal{F}_n$ as the BF generated by the cylinder sets $S(b_1, \ldots, b_i)$, $1 \le i \le n$, $b_1, \ldots, b_i \in \{0, 1\}$. We define $\mathcal{F}_0$ as the trivial BF consisting of the null set and $\Omega$ only. Clearly, $\mathcal{F}_0 \subset \mathcal{F}_1 \subset \cdots \subset \mathcal{F}$.

The random processes described above can now be formally defined as follows. For $\omega = (\omega_1, \omega_2, \ldots) \in \Omega$ and $n \ge 1$, define $B_n(\omega) = \omega_n$, $K_n(\omega) = W_{\omega_1 \cdots \omega_n}$, $I_n(\omega) = I(K_n(\omega))$, and $Z_n(\omega) = Z(K_n(\omega))$. For $n = 0$, define $K_0 = W$, $I_0 = I(W)$, $Z_0 = Z(W)$.
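For a BEC, the level-$n$ sample values of $Z_n$ can be enumerated exactly, since all $2^n$ paths through the tree are equiprobable and each step applies the recursion from Section III. The sketch below (our code and naming) checks the martingale property $E[Z_n] = Z(W)$ and that the probability mass of $Z_n$ drifts toward $\{0, 1\}$ as $n$ grows:

```python
def bec_z_level(eps, n):
    """All 2^n equiprobable sample values of Z_n when W is a BEC(eps)."""
    zs = [eps]
    for _ in range(n):
        zs = [w for z in zs for w in (2 * z - z * z, z * z)]
    return zs

def spread(zs):
    # E[Z_n (1 - Z_n)]: zero iff all probability mass sits on {0, 1}
    return sum(z * (1 - z) for z in zs) / len(zs)

z4, z10 = bec_z_level(0.5, 4), bec_z_level(0.5, 10)
# E[Z_n] = Z(W) for every n ({Z_n} is a martingale; exact for the BEC)
assert abs(sum(z10) / len(z10) - 0.5) < 1e-9
# Polarization: the spread shrinks as n grows
assert spread(z10) < spread(z4)
```

The spread $E[Z_n(1 - Z_n)]$ decreases at every level because $E[Z_{n+1}^2] \ge E[Z_n^2]$ while $E[Z_{n+1}] = E[Z_n]$, which is the mechanism behind Theorem 1.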