Low-Density Parity-Check Codes
Robert G. Gallager
1963
Preface
The Noisy Channel Coding Theorem discovered by C. E. Shannon in 1948 offered communication engineers the possibility of reducing error rates on noisy channels to negligible levels without sacrificing data rates. The primary obstacle to the practical use of this theorem has been the equipment complexity and the computation time required to decode the noisy received data.
This monograph presents a technique for achieving high data rates and negligible error probabilities on noisy channels with a reasonable amount of equipment. The advantages and disadvantages of this technique over other techniques for the same purpose are neither simple nor clear-cut, and depend primarily upon the channel and the type of service required. More important than the particular technique, however, is the hope that the concepts here will lead to new and better coding procedures.
The chapters of the monograph are arranged in such a way that, with the exception of Chapter 5, each chapter can be read independently of the others. Chapter 1 sets the background of the study, summarizes the results, and briefly compares low-density coding with other coding schemes. Chapter 2 analyzes the distances between code words in low-density codes, and Chapter 3 applies these results to the problem of bounding the probability of decoding error that can be achieved for these codes on a broad class of binary-input channels. The results of Chapter 3 can be immediately applied to any code or class of codes for which the distance properties can be bounded. Chapter 4 presents a simple decoding algorithm for these codes and analyzes the resulting error probability. Chapter 5 briefly extends all the previous results to multi-input channels, and Chapter 6 presents the results of computer simulation of the low-density decoding algorithm.
The work reported here is an expanded and revised version of my doctoral dissertation, completed in 1960 in the Department of Electrical Engineering, M.I.T. I am grateful to my thesis supervisor, Professor Peter Elias, and to my thesis readers, Professors Robert M. Fano and John M. Wozencraft, for assistance and encouragement both during the course of the thesis and later.
This research was made possible in part by support extended by the Research Laboratory of Electronics of the Massachusetts Institute of Technology, which is supported in part by the U.S. Army, the Air Force Office of Scientific Research, and the Office of Naval Research; additional support was received through the National Science Foundation (Grant G-16526) and the National Institutes of Health (Grant MH-04737-03).
Much of Chapter 4 is reprinted with permission of the editors from an article by the author in the Transactions of the I.R.E., IT-9, pages 21 to 28.
The experimental results in Chapter 6 were obtained in part through the support of the Rome Air Development Center and in part through the support of the M.I.T. Computation Center.
Cambridge, Mass.
July, 1963
Robert G. Gallager
Contents

1 Introduction
    1.1 Coding for Digital Data Transmission
    1.2 Low-Density Parity-Check Codes
    1.3 Summary of Results
    1.4 Comparison with Other Schemes

2 Distance Functions
    2.1 Equiprobable Ensemble of Parity-Check Codes
    2.2 Distance Properties of Low-Density Codes

3 Probability of Decoding Error
    3.1 Symmetric Binary Input Channels
    3.2 Distance Properties
    3.3 Upper Bound on Probability of Decoding Error
    3.4 Chernov Bounds
    3.5 P_e for Codes and Ensembles of Codes
    3.6 Error Probability for Equiprobable Code Ensemble
    3.7 Binary Symmetric Channel
    3.8 Upper Bound on Rate for Low-Density Codes

4 Decoding
    4.1 Introduction
    4.2 Probabilistic Decoding
    4.3 Probability of Error Using Probabilistic Decoding

5 Low-Density Codes with Arbitrary Alphabet Sizes
    5.1 Distance Functions
    5.2 Probability of Decoding Error
    5.3 Probabilistic Decoding
    5.4 Probability of Error Using Probabilistic Decoding

6 Experimental Results
    6.1 Code Generation
    6.2 Binary Symmetric Channel
    6.3 White Gaussian Noise Channel
    6.4 Rayleigh Fading Channel

A Properties of the Function B(λ)
B Miscellaneous Mathematical Derivations for Chapter 3
C Analysis of Number of Independent Decoding Iterations

References
Index
1 Introduction
1.1 Coding for Digital Data Transmission
The need for efficient and reliable digital data communication systems has been rising rapidly in recent years. This need has been brought on by a variety of reasons, among them being the increase in automatic data-processing equipment and the increased need for long-range communication. Attempts to develop data systems through the use of conventional modulation and voice transmission techniques have generally resulted in systems with relatively low data rates and high error probabilities.
A more fundamental approach to the problems of efficiency and reliability in communication systems is contained in the Noisy Channel Coding theorem developed by C. E. Shannon [15, 4] in 1948. In order to understand the meaning of this theorem, consider Figure 1.1. The source produces binary digits, or binits, at some fixed time rate R_t.

Figure 1.1: Block diagram of a communication system.

The encoder is a device that performs data processing, modulation, and anything else that might be necessary to prepare the data for transmission over the channel. We shall assume, however, that the encoder separates the source sequence into blocks of ν binits and operates on only one block at a time. The encoder output is then transmitted over the channel and changed by some sort of random disturbance or noise. The decoder processes the channel output and produces a delayed replica of the source binits.
The coding theorem states that for a large variety of channel models, encoders and decoders exist such that the probability P_e of the decoder reproducing a source binit in error is bounded by

    e^{-ν[E_L(R_t) + o(ν)]} ≤ P_e ≤ e^{-ν E(R_t)}
The functions E(R_t) and E_L(R_t) depend upon the channel but not upon ν; they are positive when R_t = 0, and decrease with R_t until they become 0 at some time rate C_t known as the channel capacity. The exact nature of these functions and the particular class of channels for which this theorem has been proved need not concern us here. The important result is that the coding constraint length ν is a fundamental parameter of a communication system. If a channel is to be used efficiently, that is, with R_t close to C_t, then ν must be made correspondingly large to achieve a satisfactory error probability.
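To make the role of the constraint length concrete, the following minimal sketch inverts the upper bound P_e ≤ e^{-ν E(R_t)} to find the ν needed for a target error probability; the exponent value used is an arbitrary illustration, not a number from the text.

    import math

    # Upper bound from the coding theorem: Pe <= exp(-v * E(Rt)).
    # Solving for the constraint length v that guarantees a target Pe:
    #     v >= ln(1 / Pe_target) / E(Rt)
    # The exponent below is an assumed, purely illustrative value.
    E = 0.1
    for pe_target in (1e-3, 1e-6, 1e-9):
        v = math.log(1.0 / pe_target) / E
        print(f"Pe <= {pe_target:g} requires v >= {v:.0f}")

Whatever the actual exponent, the shape of the trade-off is the same: halving E(R_t) doubles the constraint length required for a given error probability.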
The obvious response of an engineer to such a theorem is: "Splendid, but how does one build encoders and decoders that behave in this way when ν is large?" It is rather sobering to observe that if an encoder stores a waveform or code word for each possible block of ν binits, then the storage requirement must be proportional to 2^ν, which is obviously impractical when ν is large. Fortunately, Elias [3] and Reiffen [14] have proved that for a wide variety of channel models, the results of the Noisy Channel Coding theorem can be achieved with little equipment complexity at the encoder by the use of parity-check coding. This will be described in more detail later.
Unfortunately, the problem of decoding simply but effectively when ν is large appears to be much more difficult than the problem of encoding. Enough approaches to this problem have been developed to assure one that the Coding theorem has engineering importance. On the other hand, these approaches have not been carried far enough for the design of an efficient, reliable data communication system to become a matter of routine engineering.
This monograph contains a detailed study of one of the three or four most promising approaches to simple decoding for long constraint length codes. The purpose of publishing this work is primarily to show how such a coding and decoding scheme would work and where it might be useful. Also, naturally, it is hoped that this will stimulate further research on the subject. Further mathematical analysis will probably be fruitless, but there are many interesting modifications of the scheme that might be made and much experimental work that should be done.
In order to prove mathematically some results about low-density parity-check codes, we shall assume that the codes are to be used on a somewhat restricted and idealized class of channels. It is obvious that results using such channel models can be applied only to channels that closely approximate the model. However, when studying the probability of decoding error, we are interested primarily in the extremely atypical events that cause errors. It is not easy to find models that approximate both these atypical events and typical events. Consequently the analysis of codes on idealized channels can provide only limited insight about real channels, and such insight should be used with caution.
The channel models to be considered here are called symmetric binary-input channels. By this we mean a time-discrete channel for which the input is a sequence of the binary digits 0 and 1 and the output is a corresponding sequence of letters from a discrete or continuous alphabet. The channel is memoryless in the sense that, given the input at a given time, the output at the corresponding time is statistically independent of all other inputs and outputs. The symmetry requirement will be defined precisely in Chapter 3, but roughly it means that the outputs can be paired in such a way that the probability of one output given an input is the same as that of the other output of the pair given the other input. The binary symmetric channel, abbreviated BSC, is a member of this class of channels in which there are only two output symbols, one corresponding to each input. The BSC can be entirely specified by the probability of a crossover from one input to the other output.
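As a concrete instance of this channel model, here is a minimal BSC simulation; the crossover probability of 0.05 is an arbitrary choice for illustration.

    import random

    def bsc(bits, crossover_p, rng=random.Random(0)):
        """Binary symmetric channel: each transmitted bit is flipped
        independently with probability crossover_p."""
        return [b ^ (rng.random() < crossover_p) for b in bits]

    sent = [0, 1, 1, 0, 1, 0, 0, 1]
    print(bsc(sent, crossover_p=0.05))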
If a symmetric binary-input channel were to be used without coding, a sequence of binary digits would be transmitted through the channel and the receiver would guess the transmitted symbols one at a time from the received symbols. If coding were to be used, however, the coder would first take sequences of binary digits carrying the information from the source and would map these sequences into longer redundant sequences, called code words, for transmission over the channel. We define the rate R of such codes to be the ratio of the length of the information sequence to the length of the code word sequence. If the code words are of length n, then there are 2^{nR} possible sequences from the source that are mapped into n-length code words. Thus only a fraction 2^{-n(1-R)} of the 2^n different n-length sequences can be used as code words.
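A worked instance of this counting argument, using the parameters of the example code of Figure 1.2 below (n = 7, R = 4/7):

    # Counting argument above with n = 7 and R = 4/7, the parameters of
    # the example code of Figure 1.2.
    n, nR = 7, 4                      # block length, information digits
    codewords = 2 ** nR               # 2^(nR) = 16 usable code words
    sequences = 2 ** n                # 2^n = 128 binary 7-tuples
    print(codewords / sequences)      # 2^(-n(1-R)) = 1/8 = 0.125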
At the receiver, the decoder, with its knowledge of which sequences are code words, can separate the transmitted n-length code word from the channel noise. Thus the code word is mapped back into the nR information digits. Many decoding schemes find the transmitted code word by first making a decision on each received digit and then using a knowledge of the code words to correct the errors. This intermediate decision, however, destroys a considerable amount of information about the transmitted message, as discussed in detail for several channels in Reference [1]. The decoding scheme to be described here avoids this intermediate decision and operates directly with the a posteriori probabilities of the input symbols conditional on the corresponding received symbols.
The codes to be discussed here are special examples of parity-check codes.¹ The code words of a parity-check code are formed by combining a block of binary information digits with a block of check digits. Each check digit is the modulo 2 sum² of a pre-specified set of information digits. These formation rules for the check digits can be represented conveniently by a parity-check matrix, as in Figure 1.2. This matrix represents a set of linear homogeneous modulo 2 equations called parity-check equations, and the set of code words is the set of solutions of these equations. We call the set of digits contained in a parity-check equation a parity-check set. For example, the first parity-check set in Figure 1.2 is the set of digits (1, 2, 3, 5).
    x1  x2  x3  x4  x5  x6  x7
     1   1   1   0   1   0   0
     1   1   0   1   0   1   0
     1   0   1   1   0   0   1

(The matrix has n(1 − R) rows, one per check digit.)

    x5 = x1 + x2 + x3
    x6 = x1 + x2 + x4
    x7 = x1 + x3 + x4

Figure 1.2: Example of a parity-check matrix.
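As a sketch of how such a matrix is used for encoding, the check equations of Figure 1.2 give the three check digits directly from the four information digits (the function name and layout are mine, for illustration only):

    def encode(info):
        """Form a 7-digit code word from 4 information digits using the
        parity-check equations of Figure 1.2 (all sums are modulo 2)."""
        x1, x2, x3, x4 = info
        x5 = (x1 + x2 + x3) % 2
        x6 = (x1 + x2 + x4) % 2
        x7 = (x1 + x3 + x4) % 2
        return [x1, x2, x3, x4, x5, x6, x7]

    print(encode([1, 0, 1, 1]))   # -> [1, 0, 1, 1, 0, 0, 1]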
The use of parity-check codes makes coding (as distinguished from decoding) relatively simple to implement. Also, as Elias [3] has shown, if a typical parity-check code of long block length is used on a BSC, and if the code rate is between critical rate and channel capacity, then the probability of decoding error will be almost as small as that for the best possible code of that rate and block length.
¹ For a more detailed discussion of parity-check codes, see Peterson [12].
² The modulo 2 sum is 1 if the ordinary sum is odd and 0 if the ordinary sum is even.
Unfortunately, the decoding of parity-check codes is not inherently simple to implement; thus we must look for special classes of parity-check codes, such as those described in Section 1.2, for which reasonable decoding procedures exist.
1.2 Low-Density Parity-Check Codes
Low-density parity-check codes are codes specified by a matrix containing mostly 0's and relatively few 1's. In particular, an (n, j, k) low-density code is a code of block length n with a matrix like that of Figure 2.1, where each column contains a small fixed number j of 1's and each row contains a small fixed number k of 1's. Note that this type of matrix does not have the check digits appearing in diagonal form as do those in Figure 1.2. However, for coding purposes, the equations represented by these matrices can always be solved to give the check digits as explicit sums of information digits.
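A minimal sketch of the defining property; the toy matrix below is constructed only to illustrate the column and row constraints, and is far too small to be a useful code.

    def is_low_density(H, j, k):
        """True if every column of H has exactly j ones and every row
        exactly k ones, the defining (n, j, k) property."""
        return (all(sum(row) == k for row in H)
                and all(sum(col) == j for col in zip(*H)))

    H = [[1, 1, 1, 0, 0, 0],          # toy 4 x 6 matrix with j = 2, k = 3
         [0, 0, 1, 1, 1, 0],
         [1, 0, 0, 0, 1, 1],
         [0, 1, 0, 1, 0, 1]]
    print(is_low_density(H, j=2, k=3))   # True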
Low-density codes are not optimum in the somewhat artificial sense of minimizing the probability of decoding error for a given block length, and it can be shown that the maximum rate at which they can be used is bounded below channel capacity. However, the existence of a simple decoding scheme more than compensates for these disadvantages.
1.3 Summary of Results
An ensemble of (n, j, k) codes will be formed in Chapter 2, and this ensemble will be used to analyze the distance properties of (n, j, k) codes. The distance between two words in a code is simply the number of digits in which they differ. Clearly an important parameter in a code is the set of distances separating one code word from all the other code words. In a parity-check code, it can be shown that all code words have the same set of distances to the other code words [12]. Thus the distance properties for the ensemble can be summarized by the typical number of code words at each distance from the all-zero code word. It is found that the typical (n, j, k) code for j ≥ 3 has a minimum distance that increases linearly with the block length for j and k constant. Figure 2.4 plots the ratio of minimum distance to block length for several values of j and k and compares the ratio with the same ratio for ordinary parity-check codes. The (n, j, k) codes with j = 2 exhibit markedly different behavior, and it is shown that the minimum distance of an (n, 2, k) code can increase at most logarithmically with the block length.
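The distance measure itself is simple enough to state in two lines (a sketch; the sample words are the code word from the Figure 1.2 example and the all-zero word):

    def distance(u, v):
        """Number of digit positions in which two words differ."""
        return sum(a != b for a, b in zip(u, v))

    # Distance from a code word of the Figure 1.2 code to the all-zero word.
    print(distance([1, 0, 1, 1, 0, 0, 1], [0] * 7))   # 4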
In Chapter 3, a general upper bound on the probability of decoding error for symmetric binary-input channels with maximum-likelihood decoding is derived for both individual codes and for arbitrary ensembles of codes. The bound is a function of the code only through its distance properties. The assumption of maximum-likelihood decoding is made partly for analytic convenience and partly so as to be able to evaluate codes independently of their decoding algorithms. Any practical decoding algorithm, such as that described in Chapter 4, involves a trade-off between error probability and simplicity; the maximum-likelihood decoder minimizes the error probability but is totally impractical if the block length is large.
It is shown in Chapter 3 that if the distance properties of the code are exponentially related to the block length, and if the code rate is sufficiently low, then the bound to P(e) is an exponentially decreasing function of the block length. For the appropriate ensemble of codes, these bounds reduce to the usual random coding bounds [3, 4].
For the special case of the binary symmetric channel, a particularly simple bound to P(e) is found; this is used to show that over a range of channel crossover probabilities, a typical low-density code has the same error behavior as the optimum code of a slightly higher rate. Figure 3.5 illustrates this loss of effective rate associated with low-density codes.
In Chapter 4, two decoding schemes are described. In the first, which is particularly simple, the decoder first makes a decision on each digit, then computes the parity checks and changes any digit that is contained in more than some fixed number of unsatisfied parity-check equations. The process is repeated, each time using the changed digits, until the sequence is decoded. The second decoding scheme is based on a procedure for computing the conditional probability that an input symbol is 1, conditioned on all the received symbols that are in any of the parity-check sets containing the digit in question. Once again, the procedure is iterated until the sequence is decoded. The computation per digit per iteration in each scheme is independent of the code length. The probabilistic, or second, scheme entails slightly more computation than the first scheme, but decodes with a lower error probability.
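A minimal sketch of the first (flipping) scheme, run on the small code of Figure 1.2; the flip threshold and iteration limit are illustrative assumptions, and a code this short can correct only a few error patterns.

    def flip_decode(received, H, threshold=2, max_iters=10):
        """First decoding scheme sketched above: compute the parity
        checks, flip every digit contained in more than `threshold`
        unsatisfied check equations, and repeat."""
        r = list(received)
        for _ in range(max_iters):
            # Parity of each check equation (1 means unsatisfied).
            s = [sum(h * x for h, x in zip(row, r)) % 2 for row in H]
            if not any(s):
                return r              # all checks satisfied: decoded
            for i in range(len(r)):
                unsat = sum(si for si, row in zip(s, H) if row[i])
                if unsat > threshold:
                    r[i] ^= 1         # flip the offending digit
        return r                      # give up after max_iters passes

    H = [[1, 1, 1, 0, 1, 0, 0],       # parity-check matrix of Figure 1.2
         [1, 1, 0, 1, 0, 1, 0],
         [1, 0, 1, 1, 0, 0, 1]]
    # Code word [1,0,1,1,0,0,1] with its first digit received in error:
    print(flip_decode([0, 0, 1, 1, 0, 0, 1], H))   # -> [1, 0, 1, 1, 0, 0, 1]

Note that each pass flips digits against a single snapshot of the parity checks, matching the description above; the work per digit per pass does not grow with the code length.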
A mathematical analysis of the probability of decoding error using probabilistic decoding is difficult because of statistical dependencies. However, for a BSC with sufficiently small crossover probabilities and for codes with j ≥ 4, a very weak upper bound on the probability of error is derived that decreases exponentially with a root of the code length. Figure 3.5 plots crossover probabilities for which the probability of decoding error is guaranteed to approach 0 with increasing code length. It is hypothesized that the probability of decoding error actually decreases exponentially with block length, while the number of iterations necessary to decode increases logarithmically.
Chapter 5 extends all the major results of Chapters 2, 3, and 4 to nonbinary low-density parity-check codes. Although the theory generalizes in a very natural way, the expressions for minimum distance, error probability, and probabilistic decoding performance are sufficiently complicated that little insight is gained into the advantages or disadvantages of a multilevel system over a binary system. Some experimental work would be helpful here in evaluating these codes.
Some experimental results for binary low-density codes are presented in Chapter 6. An IBM 7090 computer was used to simulate both probabilistic decoding and the noise generated by several different types of channels. Due to limitations on computer time, the only situations investigated were those in which the channel was sufficiently noisy to yield a probability of decoding error greater than 10^{-4}. The most spectacular data from these experiments are