Cornell University
Boston San Francisco NewYork
London Toronto Sydney Tokyo Singapore Madrid
Mexico City Munich Paris Cape Toxvn Hong Kong Montreal
Acquisitions Editor: Matt Goldstein
Project Editor: Maite Suarez-Rivus
Production Supervisor: MariIyn Lloyd
Marketing Manager: MichelIe Brown
Marketing Coordinator: Yake Zavracky
Project Management: Windfall Sofi-tvare
Composition: Windfall Software, using ZzTEX
Copyeditor: Carol Leyba
Technical Illustration: Dartmouth Publishing
Proofreader: Jennifer McClain
Indexer: Ted Laux
Cover Design: Yoyce Cosentino Wells
Cover Photo: © 2005 Tim Laman / National Geographic. A pair of weaverbirds work
together on their nest in Africa.
Prepress and Manufacturing: Caroline Fell
Printer: Courier West~ord
Access the latest information about Addison-Wesley rifles from our World Wide Web
site: http://www.aw-bc.com/computing
Many of the designations used by manufacturers and sellers to distinguish their
products are claimed as trademarks. Where those designations appear in this book,
and Addison-Wesley was aware of a trademark claim, the designations have been
printed in initial caps or all caps.
The programs and applications presented in this book have been included for their
instructional value. They have been tested with care, but are not guaranteed for any
particular purpose. The publisher does not offer any warranties or representations, nor
does it accept any liabilities with respect to the programs or applications.
Library of Congress Cataloging-in-Publication Data
Kleinberg, Jon.
Algorithm design / Jon Kleinberg, l~va Tardos.--lst ed.
p. cm.
Includes bibliographical references and index.
ISBN 0-321-29535-8 (alk. paper)
1. Computer algorithms. 2. Data structures (Computer science) I. Tardos, l~va.
II. Title.
QA76.9.A43K54 2005
005.1--dc22
2005000401
Copyright © 2006 by Pearson Education, Inc.
For information on obtaining permission for use of material in this work, please
submit a written request to Pearson Education, Inc., Rights and Contract Department,
75 Arlington Street, Suite 300, Boston, MA 02116 or fax your request to (617) 848-7047.
All rights reserved. No part of this publication may be reproduced, stored in a
retrieval system, or transmitted, in any form or by any means, electronic, mechanical,
photocopying, recording, or any toher media embodiments now known or hereafter to
become known, without the prior written permission of the publisher. Printed in the
United States of America.
ISBN 0-321-29535-8
2 3 4 5 6 7 8 9 10-CRW-08 07 06 05
About the Authors
3on Kleinberg is a professor of Computer Science at
Cornell University. He received his Ph.D. from M.I.T.
in 1996. He is the recipient of an NSF Career Award,
an ONR Young Investigator Award, an IBM Outstand-
ing Innovation Award, the National Academy of Sci-
ences Award for Initiatives in Research, research fel-
lowships from the Packard and Sloan Foundations,
and teaching awards from the Cornell Engineering
College and Computer Science Department.
Kleinberg’s research is centered around algorithms, particularly those con-
cerned with the structure of networks and information, and with applications
to information science, optimization, data mining, and computational biol-
ogy. His work on network analysis using hubs and authorities helped form the
foundation for the current generation of Internet search engines.
fiva Tardos is a professor of Computer Science at Cor-
nell University. She received her Ph.D. from E6tv6s
University in Budapest, Hungary in 1984. She is a
member of the American Academy of Arts and Sci-
ences, and an ACM Fellow; she is the recipient of an
NSF Presidential Young Investigator Award, the Fulk-
erson Prize, research fellowships from the Guggen-
helm, Packard, and Sloan Foundations, and teach-
ing awards from the Cornell Engineering College and
Computer Science Department.
Tardos’s research interests are focused on the design and analysis of
algorithms for problems on graphs or networks. She is most known for her
work on network-flow algorithms and approximation algorithms for network
problems. Her recent work focuses on algorithmic game theory, an emerging
area concerned with designing systems and algorithms for selfish users.
Contents
About the Authors
Preface
v
Introduction: Some Representative Problems
I. 1 A First Problem: Stable Matching,
1).
19
Exercises
Notes and Further Reading )‘8
2 Basics
2.1
2.2
2.3
2.4
2.5
of Algorithm Analysis
Computational Tractability 29
Asymptotic Order of Growth 35
Implementing the Stable Matching Algorithm Using Lists and
Arrays 42
A Survey of Common Running Times 47
29
57
65
Exercises 67
Notes and Fm-ther Reading 70
3 Graphs
Basic Definitions and Applications 73
Graph Connectivity and Graph Traversal 78
Implementing Graph Traversal Using Queues and Stacks 87
Testing Bipartiteness: An Application of Breadth-First Search
Connectivity in Directed Graphs 97
73
94
Contents
3.6
Directed Acyclic Graphs and Topological Ordering
99
104
Exercises 107
Notes and Further Reading 112
4
Greedy Algorithms
116
131
125
Interval Scheduling: The Greedy Algorithm Stays Ahead
Scheduling to Minimize Lateness: An Exchange Argument
Optimal Caching: A More Complex Exchange Argument
Shortest Paths in a Graph 137
The Minimum Spanning Tree ProbJem 142
Implementing Kruskal’s Algorithm: The Union-Find Data
Structure 151
Clustering 157
Huffman Codes and Data Compression 161
Minimum-Cost Arborescences: A Multi-Phase Greedy
Algorithm 177
183
4.7
4.8
*4.9
Exercises 188
Notes and Further Reading 205
5 Divide
5.1
5.2
5.3
5.4
5.5
5.6
and Conquer
A First Recurrence: The Mergesort Algorithm 210
Further Recurrence Relations 214
Counting Inversions 221
Finding the Closest Pair of Points 225
Integer Multiplication 231
234
242
Exercises 246
Notes and Further Reading 249
209
2S1
6
6.1 Weighted Interval Scheduling: A Recursive Procedure 252
6.2
Principles of Dynamic Programming: Memoization or Iteration
over Subproblems 258
6.3 Segmented Least Squares: Multi-way Choices 26~
* The star indicates an optional section. (See the Preface for more information about the relationships
among the chapters and sections.)
Contents
7
6.4
6.5
6.6
6.7
6.8
6.9
* 6.10
7.2
7.3
* 7.4
7.5
7.6
7.7
7.8
7.9
7.!0
7.11
7.12
"7.!3
Subset Sums and Knapsacks: Adding a.,~able 266
RNA Secondary Structure: Dynarmc~gramming over
Intervals 272
Sequence Alignment 278
Sequence Alignment in Linear Space via Divide and
Conquer 284
Shortest Paths in a Graph 290
297
Negative Cycles in a Graph 301
307
Exercises 312
Notes and Further Reading 335
337
The Maximum-Flow Problem and the Ford-FulkersOn
Algorithm 338
Maximum Flows and Minimum Cuts in a Network 346
Choosing Good Augmenting Paths 352
The Preflow-Push Maximum-Flow Algorithm:, 357
A First Application: The Bipartite Matching Problem 367
373
378
Survey Design 384
Airline Scheduling 387
Image Segmentation 391
\
Baseball Elimination 400
A Further Direction:
Solved Exercises 411
Exercises 415
Notes and Further Reading 448
Adding Costs to the Matching Problem,~) 404
Polynomial-Time Reductions 452
Reductions via "Gadgets": The Satisfiabflity Problem
Efficient Certification and the Definition of NP
463
8.1
8.2
8.3
8.4 NP-Complete Problems 466
8.5
Sequencing,,Problems 473
8.6
Partitioning Problems 481
8.7 Graph Coloring 485
451
459
X
Contents
8.8
8.9
8.10
Numerical Problems 490
Co-NP and the Asymmetry of NP 495
A Partial Taxonomy of Hard Problems
497
500
Exercises 505
Notes and Further Reading 529
9
PSPACE: A Class of Problems beyond NP
531
PSPACE 531
Some Hard Problems in PSPACE 533
Solving Quantified Problems and Games in Polynomia!
Space 536
Solving the Planning Problem in Polynomial Space
538
543
9.4
547
Exercises 550
Notes and Further Reading 551
10
Extending the Limits of Tractability
10.!
10.2
10.3
* 10.4
* 10.5
Finding Smal! Vertex Covers 554
Solving NP-Hard Problems on Trees 558
Coloring a Set of Circular Arcs 563
Tree Decompositions of Graphs 572
584
591
Exercises 594
Notes and Further Reading 598
11
Approximation Algorithms
11.1 Greedy Algorithms and Bounds on the Optimum: A Load
599
Balancing Problem 600
606
11.3
11.4
11.5
11.6
* 11.7
Set Cover: A General Greedy Heuristic 612
The Pricing Method: Vertex Cover 618
Maximization via the Pricing Method: The Disioint Paths
Problem 624
Linear Programming and Rounding: An Application to Vertex
Cover 630
Load Balancing Revisited: A More Advanced LP Application 637
Contents
11.8 Arbitrarily Good Approximations: The Knapsack Problem
644
649
Exercises 651
Notes and Further Reading 659
12 Local
12.1
12.2
12.3
12.4
12.5
12.6
12.7
13
Search
The Landscape of an Optimization Problem 662
The Metropolis Algorithm and Simulated Annealing 666
An Application of Local Se_arch to Hopfield Neural Networks
661
Choosing a Neighbor Relation 679
Classification via Local Search 681
700
Exercises 702
Notes and Further Reading 705
676
690
707
553
13.1 A First Application: Contention Resolution 708
13.2 Finding the Global Minimum Cut 714
13.3 Random Variables and Their Expectations 719
13.4 A Randomized Approximation Algorithm for MAX 3-SAT 724
13.5 Randomized Divide and Conquer: Median-Finding and
Quicksort 727
13.6 Hashing: A Randomized Implementation of Dictionaries 734
13.7 Finding the Closest Pair of Points: A Randomized Approach
13.8 Randomized Caching 750
13.9 Chernoff Bounds 758
13.10 Load Balancing 760
13.1! Packet Routing 762
13.12 Background: Some Basic ProbabiLity Definitions
769
741
776
Exercises 782
Notes and Further Reading 793
795
805
815
Algorithmic !deas are pervasive, and their reach is apparent in examples both
within computer science and beyond. Some of the major shifts in Internet
routing standards can be viewed as debates over the deficiencies of one
shortest-path algorithm and the relative advantages of another. The basic
notions used by biologists to express similarities among genes and genomes
have algorithmic definitions. The concerns voiced by economists over the
feasibility of combinatorial auctions in practice are rooted partly in the fact that
these auctions contain computationally intractable search problems as special
cases. And algorithmic notions aren’t just restricted to well-known and long-
standing problems; one sees the reflections of these ideas on a regular basis,
in novel issues arising across a wide range of areas. The scientist from Yahoo!
who told us over lunch one day about their system for serving ads to users was
describing a set of issues that, deep down, could be modeled as a network flow
problem. So was the former student, now a management consultant working
on staffing protocols for large hospitals, whom we happened to meet on a trip
to New York City.
The point is not simply that algorithms have many applications. The
deeper issue is that the subject of algorithms is a powerful lens through which
to view the field of computer science in general. Algorithmic problems form
the heart of computer science, but they rarely arrive as cleanly packaged,
mathematically precise questions. Rather, they tend to come bundled together
with lots of messy, application-specific detail, some of,it essential, some of it
extraneous. As a result, the algorithmic enterprise consists of two fundamental
components: the task of getting to the mathematically clean core of a problem,
and then the task of identifying the appropriate algorithm design techniques,
based on the structure of the problem. These two components interact: the
more comfortable one is with the full array of possible design techniques,
the more one starts to recognize the clean formulations that lie within messy
xiv
Preface
Preface
XV
problems out in the world. At their most effective, then, algorithmic ideas do
not just provide solutions to _well-posed problems; they form the language that
lets you cleanly express the underlying questions.
The goal of our book is to convey this approach to algorithms, as a design
process that begins with problems arising across the full range of computing
applications, builds on an understanding of algorithm design techniques, and
results in the development of efficient solutions to these problems. We seek
to explore the role of algorithmic ideas in computer science generally, and
relate these ideas to the range of precisely formulated problems for which we
can design and analyze algorithms. In other words, what are the underlying
issues that motivate these problems, and how did we choose these particular
ways of formulating them? How did we recognize which design principles were
appropriate in different situations?
In keeping with this, our goal is to offer advice on how to identify clean
algorithmic problem formulations in complex issues from different areas of
computing and, from this, how to design efficient algorithms for the resulting
problems. Sophisticated algorithms are often best understood by reconstruct-
ing the sequence of ideas--including false starts and dead ends--that led from
simpler initial approaches to the eventual solution. The result is a style of ex-
position that does not take the most direct route from problem statement to
algorithm, but we feel it better reflects the way that we and our colleagues
genuinely think about these questions.
Overview
The book is intended for students who have completed a programming-
based two-semester introductory computer science sequence (the standard
"CS1/CS2" courses) in which they have written programs that implement
basic algorithms, manipulate discrete structures such as trees and graphs, and
apply basic data structures such as arrays, lists, queues, and stacks. Since
the interface between CS1/CS2 and a first algorithms course is not entirely
standard, we begin the book with self-contained coverage of topics that at
some institutions a_re familiar to students from CS1/CS2, but which at other
institutions are included in the syllabi of the first algorithms course. This
material can thus be treated either as a review or as new material; by including
it, we hope the book can be used in a broader array of courses, and with more
flexibility in the prerequisite knowiedge that is assumed.
In keeping with the approach outlined above, we develop the basic algo-
rithm design techniques by drawing on problems from across many areas of
computer science and related fields. To mention a few representative examples
here, we include fairly detailed discussions of applications from systems and
networks (caching, switching, interdomain routing on the Internet), artificial
intelligence (planning, game playing, Hopfield networks), computer vision
(image segmentation), data mining (change-point detection, clustering), op-
erations research (airline scheduling), and computational biology (sequence
alignment, RNA secondary structure).
The notion of computational intractability, and NP-completeness in par-
ticular, plays a large role in the book. This is consistent with how we think
about the overall process of algorithm design. Some of the time, an interest-
ing problem arising in an application area will be amenable to an efficient
solution, and some of the time it will be provably NP-complete; in order to
fully address a new algorithmic problem, one should be able to explore both
of these ol)tions with equal familiarity. Since so many natural problems in
computer science are NP-complete, the development of methods to deal with
intractable problems has become a crucial issue in the study of algorithms,
and our book heavily reflects this theme. The discovery that a problem is NP-
complete should not be taken as the end of the story, but as an invitation to
begin looking for approximation algorithms, heuristic local search techniques,
or tractable special cases. We include extensive coverage of each of these three
approaches.
Problems and Solved Exercises
An important feature of the book is the collection of problems. Across all
chapters, the book includes over 200 problems, almost a!l of them developed
and class-tested in homework or exams as part of our teaching of the course
at Cornell. We view the problems as a crucial component of the book, and
they are structured in keeping with our overall approach to the material. Most
of them consist of extended verbal descriptions of a problem arising in an
application area in computer science or elsewhere out in the world, and part of
the problem is to practice what we discuss in the text: setting up the necessary
notation and formalization, designing an algorithm, and then analyzing it and
proving it correct. (We view a complete answer to one of these problems as
consisting of all these components: a fl~y explained algorithm, an analysis of
the nmning time, and a proof of correctness.) The ideas for these problems
come in large part from discussions we have had over the years with people
working in different areas, and in some cases they serve the dual purpose of
recording an interesting (though manageable) application of algorithms that
we haven’t seen written down anywhere else.
To help with the process of working on these problems, we include in
each chapter a section entitled "Solved Exercises," where we take one or more
problems and describe how to go about formulating a solution. The discussion
devoted to each solved exercise is therefore significantly longer than what
would be needed simply to write a complete, correct solution (in other words,