Elements of Information Theory
Thomas M. Cover, Joy A. Thomas
Copyright 1991 John Wiley & Sons, Inc.
Print ISBN 0-471-06259-6 Online ISBN 0-471-20061-1
WILEY SERIES IN
TELECOMMUNICATIONS
Donald L. Schilling, Editor
City College of New York
Digital Telephony, 2nd Edition
John Bellamy
Elements of Information Theory
Thomas M. Cover and Joy A. Thomas
Telecommunication System Engineering, 2nd Edition
Roger L. Freeman
Telecommunication Transmission Handbook, 3rd Edition
Roger L. Freeman
Introduction to Communications Engineering, 2nd Edition
Robert M. Gagliardi
Expert System Applications to Telecommunications
Jay Liebowitz
Synchronization in Digital Communications, Volume 1
Heinrich Meyr and Gerd Ascheid
Synchronization in Digital Communications, Volume 2
Heinrich Meyr and Gerd Ascheid (in preparation)
Computational Methods of Signal Recovery and Recognition
Richard J. Mammone (in preparation)
Business Earth Stations for Telecommunications
Walter L. Morgan and Denis Rouffet
Satellite Communications: The First Quarter Century of Service
David W. E. Rees
Worldwide Telecommunications Guide for the Business Manager
Walter L. Vignault
Elements of Information Theory
THOMAS M. COVER
Stanford University
Stanford, California
JOY A. THOMAS
IBM T. J. Watson Research Center
Yorktown Heights, New York
A Wiley-Interscience Publication
JOHN WILEY & SONS, INC.
New York / Chichester / Brisbane / Toronto / Singapore
Copyright 1991 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system or transmitted in
any form or by any means, electronic or mechanical, including uploading, downloading,
printing, decompiling, recording or otherwise, except as permitted under Sections 107 or 108 of
the 1976 United States Copyright Act, without the prior written permission of the Publisher.
Requests to the Publisher for permission should be addressed to the Permissions Department,
John Wiley & Sons, Inc., 605 Third Avenue, New York, NY 10158-0012, (212) 850-6011, fax
(212) 850-6008, E-Mail: PERMREQ@WILEY.COM.
This publication is designed to provide accurate and authoritative information in regard to the
subject matter covered. It is sold with the understanding that the publisher is not engaged in
rendering professional services. If professional advice or other expert assistance is required, the
services of a competent professional person should be sought.
ISBN 0-471-20061-1.
This title is also available in print as ISBN 0-471-06259-6
For more information about Wiley products, visit our web site at www.Wiley.com.
Library of Congress Cataloging in Publication Data:
Cover, T. M., 1938 —
Elements of Information theory / Thomas M. Cover, Joy A. Thomas.
p. cm. — (Wiley series in telecommunications)
“A Wiley-Interscience publication.”
Includes bibliographical references and index.
ISBN 0-471-06259-6
1. Information theory.
I. Thomas, Joy A.
II. Title.
III. Series.
Q360.C68 1991
003'.54 — dc20    90-45484 CIP
Printed in the United States of America
20 19 18 17 16 15 14 13
To my father
Tom Cover
To my parents
Joy Thomas
Preface
This is intended to be a simple and accessible book on information
theory. As Einstein said, “Everything should be made as simple as
possible, but no simpler.” Although we have not verified the quote (first
found in a fortune cookie), this point of view drives our development
throughout
the book. There are a few key ideas and techniques that,
when mastered, make the subject appear simple and provide great
intuition on new questions.
This book has arisen from over ten years of lectures in a two-quarter sequence of a senior and first-year graduate level course in information theory, and is intended as an introduction to information theory for students of communication theory, computer science and statistics.
There are two points to be made about the simplicities inherent in information theory. First, certain quantities like entropy and mutual information arise as the answers to fundamental questions. For example, entropy is the minimum descriptive complexity of a random variable, and mutual information is the communication rate in the presence of noise. Also, as we shall point out, mutual information corresponds to the increase in the doubling rate of wealth given side information.
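Stated symbolically, as a preview of the notation developed in Chapters 2 and 6, the quantities referred to here are the entropy of a random variable X with probability mass function p(x) and the mutual information between X and Y,

\[
H(X) = -\sum_x p(x)\log p(x), \qquad
I(X;Y) = \sum_{x,y} p(x,y)\log\frac{p(x,y)}{p(x)\,p(y)},
\]

and the increase in doubling rate due to side information Y in a horse race X turns out to be exactly \(\Delta W = I(X;Y)\).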
Second, the answers to information theoretic questions have a natural algebraic structure. For example, there is a chain rule for entropies, and entropy and mutual information are related. Thus the answers to problems in data compression and communication admit extensive interpretation. We all know the feeling that follows when one investigates a problem, goes through a large amount of algebra and finally investigates the answer to find that the entire problem is illuminated, not by the analysis, but by the inspection of the answer. Perhaps the outstanding examples of this in physics are Newton’s laws and Schrödinger’s wave equation. Who could have foreseen the awesome philosophical interpretations of Schrödinger’s wave equation?
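The chain rule mentioned above, anticipating Chapter 2, reads

\[
H(X, Y) = H(X) + H(Y \mid X),
\]

and the relation between entropy and mutual information is \(I(X;Y) = H(X) - H(X \mid Y)\): the reduction in uncertainty about X due to the knowledge of Y.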
In the text we often investigate properties of the answer before we
look at the question. For example, in Chapter 2, we define entropy,
relative entropy and mutual
information and study the relationships
and a few interpretations of them, showing how the answers fit together
in various ways. Along the way we speculate on the meaning of the second law of thermodynamics. Does entropy always increase? The answer is yes and no. This is the sort of result that should please
experts in the area but might be overlooked as standard by the novice.
In fact, that brings up a point that often occurs in teaching. It is fun
to find new proofs or slightly new results that no one else knows. When
one presents these ideas along with the established material in class, the response is “sure, sure, sure.” But the excitement of teaching the material is greatly enhanced. Thus we have derived great pleasure from investigating a number of new ideas in this text book.
Examples of some of the new material in this text include the chapter on the relationship of information theory to gambling, the work on the universality of the second law of thermodynamics in the context of Markov chains, the joint typicality proofs of the channel capacity theorem, the competitive optimality of Huffman codes and the proof of Burg’s theorem on maximum entropy spectral density estimation. Also the chapter on Kolmogorov complexity has no counterpart in other information theory texts. We have also taken delight in relating Fisher information, mutual information, and the Brunn-Minkowski and entropy power inequalities. To our surprise, many of the classical results on determinant inequalities are most easily proved using information theory.
Even though the field of information theory has grown considerably since Shannon’s original paper, we have strived to emphasize its coherence. While it is clear that Shannon was motivated by problems in communication theory when he developed information theory, we treat information theory as a field of its own with applications to communication theory and statistics.
We were drawn to the field of information theory from backgrounds in communication theory, probability theory and statistics, because of the apparent impossibility of capturing the intangible concept of information.
Since most of the results
in the book are given as theorems and
proofs, we expect the elegance of the results to speak for themselves. In
many cases we actually describe the properties of the solutions before
introducing the problems. Again, the properties are interesting in themselves and provide a natural rhythm for the proofs that follow.
One innovation in the presentation is our use of long chains of inequalities, with no intervening text, followed immediately by the explanations. By the time the reader comes to many of these proofs, we
expect that he or she will be able to follow most of these steps without
any explanation and will be able to pick out the needed explanations.
These chains of inequalities serve as pop quizzes in which the reader
can be reassured of having the knowledge needed to prove some im-
portant theorems. The natural
flow of these proofs is so compelling that
it prompted us to flout one of the cardinal rules of technical writing. And
the absence of verbiage makes the logical necessity of the ideas evident
and the key ideas perspicuous. We hope that by the end of the book the
reader will share our appreciation of the elegance, simplicity and
naturalness of information
theory.
Throughout
the book we use the method of weakly typical sequences,
which has its origins in Shannon’s original 1948 work but was formally
developed in the early 1970s. The key idea here is the so-called asymp-
totic equipartition
property, which can be roughly paraphrased as “Almost everything is almost equally probable.”
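In symbols, the statement made precise in Chapter 3 is that for \(X_1, X_2, \ldots\) drawn i.i.d. according to p(x),

\[
-\frac{1}{n}\log p(X_1, X_2, \ldots, X_n) \to H(X) \quad \text{in probability},
\]

so that most of the sequences one actually observes have probability close to \(2^{-nH(X)}\).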
Chapter 2, which is the true first chapter of the subject, includes the
basic algebraic relationships of entropy, relative entropy and mutual
information as well as a discussion of the second law of thermodynamics
and sufficient statistics. The asymptotic equipartition property (AEP) is
given central prominence
in Chapter 3. This leads us to discuss the
entropy rates of stochastic processes and data compression in Chapters
4 and 5. A gambling sojourn is taken in Chapter 6, where the duality of
data compression and the growth rate of wealth is developed.
The fundamental idea of Kolmogorov complexity as an intellectual foundation for information theory is explored in Chapter 7. Here we
replace the goal of finding a description that is good on the average with
the goal of finding the universally shortest description. There is indeed a
universal notion of the descriptive complexity of an object. Here also the
wonderful number Ω is investigated. This number, which is the binary
expansion of the probability
that a Turing machine will halt, reveals
many of the secrets of mathematics.
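The number in question, defined in Chapter 7 for a universal computer \(\mathcal{U}\), is

\[
\Omega = \sum_{p\,:\,\mathcal{U}(p)\ \text{halts}} 2^{-l(p)},
\]

the probability that \(\mathcal{U}\) halts when its input program p is generated by fair coin flips, where \(l(p)\) denotes the length of p.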
Channel capacity, which is the fundamental
theorem in information
theory, is established in Chapter 8. The necessary material on differen-
tial entropy is developed in Chapter 9, laying the groundwork
for the
extension of previous capacity theorems to continuous noise channels.
The capacity of the fundamental Gaussian channel is investigated
in
Chapter 10.
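In the notation of those chapters, the two capacity formulas established there are

\[
C = \max_{p(x)} I(X;Y) \qquad \text{and} \qquad C = \frac{1}{2}\log\!\left(1 + \frac{P}{N}\right)\ \text{bits per transmission},
\]

the first for a discrete memoryless channel and the second for the Gaussian channel with power constraint P and noise power N.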
The relationship between information theory and statistics, first studied by Kullback in the early 1950s, and relatively neglected since, is developed in Chapter 12. Rate distortion theory requires a little more background than its noiseless data compression counterpart, which accounts for its placement as late as Chapter 13 in the text.
The huge subject of network information theory, which is the study of the simultaneously achievable flows of information in the presence of