logo资料库

Database System The Complete Book 2nd Edition.pdf

第1页 / 共1241页
第2页 / 共1241页
第3页 / 共1241页
第4页 / 共1241页
第5页 / 共1241页
第6页 / 共1241页
第7页 / 共1241页
第8页 / 共1241页
资料共1241页,剩余部分请下载后查看
Table of Contents
Chapter 1 The Worlds of Database
1.2.3 Storage and Buffer Management
1.2.4 Transaction Processing
Chapter 2 The Relational Model of Data
2.1.3 The Relational Model in Brief
2.1.4 The Semistructured Model in Brief
2.1.6 Comparison of Modeling Approaches
2.2 Basics of the Relational Model
2.2.1 Attributes
2.2.2 Schemas
2.2.3 Tuples
2.2.4 Domains
2.2.7 Keys of Relations
2.4.2 What is an Algebra?
2.4.3 Overview of Relational Algebra
2.4.4 Set Operations on Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta-Joins
2.4.10 Combining Operations to Form Queries
2.4.12 Relationships Among Operations
2.4.13 A Linear Notation for Algebraic Expressions
2.6 Summary of Chapter 2
Chapter 3 Design Theory for Relational Databases
Chapter 4 High-Level Database Models
Chapter 5 Algebraic and Logical Query Languages
5.1 Relational Operations on Bags
5.2 Extended Operators of Relational Algebra
5.2.4 The Grouping Operator
Chapter 6 The Database Language SQL
6.4.2 Duplicates in Unions, Intersections, and Differences
Chapter 7 Constraints and Triggers
Chapter 8 Views and Indexes
Chapter 9 SQL in a Server Environment
Chapter 10 Advanced Topics in Relational Databases
10.1 Security and User Authorization in SQL
10.2 Recursion in SQL
10.3 The Object-Relational Model
10.6.4 Star Schemas
Chapter 11 The Semistructured-Data Model
Chapter 12 Programming Language for XML
Chapter 13 Secondary Storage Management
13.4.7 Parity Blocks
13.4.8 An Improvement: RAID 5
13.4.9 Coping With Multiple Disk Crashes
13.5 Arranging Data on Disk
13.5.1 Fixed-Length Records
Chapter 14 Index Structures
14.1 Index-Structure Basics
14.2 B-Trees
14.2.1 The Structure of B-trees
14.2.2 Applications of B-trees
14.2.3 Lookup in B-Trees
14.2.4 Range Queries
14.2.5 Insertion Into B-Trees
14.2.6 Deletion From B-Trees
14.2.7 Efficiency of B-Trees
14.3 Hash Tables
14.3.1 Secondary-Storage Hash Tables
14.3.2 Insertion Into a Hash Table
14.3.3 Hash-Table Deletion
14.3.4 Efficiency of Hash Table Indexes
14.3.5 Extensible Hash Tables
14.3.6 Insertion Into Extensible Hash Tables
14.3.7 Linear Hash Tables
14.3.8 Insertion Into Linear Hash Tables
Chapter 15 Query Execution
15.1 Introduction to Physical-Query-PlanOperators
15.2 One-Pass Algorithms
15.3 Nested-Loop Joins
15.4 Two-Pass Algorithms Based on Sorting
15.5 Two-Pass Algorithm s Based on Hashing
15.6 Index-Based Algorith
15.7 Buffer Management
15.8 Algorithms Using More Than Two Passes
Chapter 16 The Query Compiler
16.1 Parsing and Preprocessing
16.1.1 Syntax Analysis and Parse Trees
16.1.2 A Grammar for a Simple Subset of SQL
16.1.3 The Preprocessor
16.1.4 Preprocessing Queries Involving Views
16.1.5 Exercises for Section 16.1
16.2 Algebraic Laws for Improving Query Plans
16.2.1 Commutative and Associative Laws
16.2.2 Laws Involving Selection
16.2.3 Pushing Selections
16.2.4 Laws Involving Projection
16.2.5 Laws About Joins and Products
16.2.6 Laws Involving Duplicate Elimination
16.2.7 Laws Involving Grouping and Aggregation
16.2.8 Exercises for Section 16.2
16.3 From Parse Trees to Logical Query Plans
16.3.1 Conversion to Relational Algebra
16.3.2 Removing Subqueries From Conditions
16.3.3 Improving the Logical Query Plan
16.3.4 Grouping Associative/Comm utative Operators
16.3.5 Exercises for Section 16.3
16.4 Estimating the Cost of Operations
16.4.1 Estimating Sizes of Intermediate Relations
16.4.2 Estimating the Size of a Projection
16.4.4 Estimating the Size of a Join
16.4.5 Natural Joins With Multiple Join Attributes
16.4.6 Joins o f Many Relations
16.4.7 Estimating Sizes for Other Operations
16.5 Introduction to Cost-Based Plan Selection
16.5.1 Obtaining Estimates for Size Parameters
16.5.2 Computation of Statistics
16.5.3 Heuristics for Reducing the Cost of Logical QueryPlans
16.5.4 Approaches to Enumerating Physical Plans
16.5.5 Exercises for Section 16.5
16.6 Choosing an Order for Joins
16.7 Completing the Physical-Query-Plan
16.7.1 Choosing a Selection Method
16.7.2 Choosing a Join Method
16.7.4 Pipelining Unary Operations
16.7.5 Pipelining Binary Operations
16.7.6 Notation for Physical Query Plans
16.7.7 Ordering of Physical Operations
16.7.8 Exercises for Section 16.7
Chapter 17 Coping With System Failures
Chapter 18 Concurrency Control
Chapter 19 More About Transaction Management
Chapter 20 Parallel and Distributed Databases
20.1 Parallel Algorithms on Relations
20.1.1 Models of Parallelism
20.1.2 Tuple-at-a-Time Operations in Parallel
20.1.3 Parallel Algorithms for Full-Relation Operations
20.2 The Map-Reduce Parallelism Framework
20.3 Distributed Databases
20.8 Summary of Chapter 20
Chapter 21 Information Intergration
Chapter 22 Data Mining
Chapter 23 Database Systems and the Internet
DATABASE SYSTEMS The Complete Book
DATABASE SYSTEMS The Complete Book Second Edition Hector Garcia-Molina Jeffrey D. Ullman Jennifer Widom Department of Computer Science Stanford University Upper Saddle River, New Jersey 07458
CD• ' l N O T I C E n This work is protected by U.S. copyright laws and is provided solely for the use of college instructors in reviewing course materials for classroom use. Dissemination or sale of this work, or any part • (including on the World Wide Web), is not permitted. Editorial Director, Computer Science and Engineering: Marcia J. Horton Executive Editor Tracy Dunkelberger Editorial Assistant: Melinda Haggerty Director of Marketing: Margaret Waples Marketing Manager: Christopher Kelly Senior Managing Editor: Scott Disanno Production Editor: Irwin Zucker Art Director: Jayne Conte Cover Designer: Margaret Kenselaar Cover Art: Tamara L Newman Manufacturing Buyer: Lisa McDowell Manufacturing Manager: Alan Fischer P E A R S O N P re n tic o H a ll © 2009,2002 by Pearson Education Inc. Pearson Prentice Hall Pearson Education, Inc. Upper Saddle River, NJ 07458 All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher. Pearson Prentice Hall™ is a trademark of Pearson Education, Inc. The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs. Printed in the United States of America 10 9 8 7 6 5 4 3 2 1 ISBN D-13-bQb?Dl-fl 178-0-13-b0b?01-b Pearson Education Ltd., London Pearson Education Australia Pty. Ltd., Sydney Pearson Education Singapore, Pte. Ltd. Pearson Education North Asia Ltd., Hong Kong Pearson Education Canada, Inc., Toronto Pearson Educaci6n de Mexico, S.A. de C.V. Pearson Education—Japan, Tokyo Pearson Education Malaysia, Pte. Ltd. Pearson Education, Inc., Upper Saddle River, New Jersey
Preface This book covers the core of the material taught in the database sequence at Stanford. The introductory course, CS145, uses the first twelve chapters, and is designed for all students — those who want to use database systems as well as those who want to get involved in database implementation. The second course, CS245 on database implementation, covers most of the rest of the book. However, some material is covered in more detail in special topics courses. These include CS346 (implementation project), which concentrates on query optimization as in Chapters 15 and 16. Also, CS345A, on data mining and Web mining, covers the material in the last two chapters. W hat’s New in the Second Edition After a brief introduction in Chapter 1, we cover relational modeling in Chapters 2-4. Chapter 4 is devoted to high-level modeling. There, in addition to the E /R model, we now cover UML (Unified Modeling Language). We also have moved to Chapter 4 a shorter version of the material on ODL, treating it as a design language for relational database schemas. The material on functional and multivalued dependencies has been mod­ ified and remains in Chapter 3. We have changed our viewpoint, so that a functional dependency is assumed to have a set of attributes on the right. We have also given explicitly certain algorithms, including the “chase,” that allow us to manipulate dependencies. We have augmented our discussion of third normal form to include the 3NF synthesis algorithm and to make clear what the tradeoff between 3NF and BCNF is. Chapter 5 contains the coverage of relational algebra from the previous edition, and is joined by (part of) the treatment of Datalog from the old Chap­ ter 10. The discussion of recursion in Datalog is either moved to the book’s Web site or combined with the treatment of recursive SQL in Chapter 10 of this edition. Chapters 6-10 are devoted to aspects of SQL programming, and they repre­ sent a reorganization and augmentation of the earlier book’s Chapters 6, 7, 8, and parts of 10. The material on views and indexes has been moved to its own chapter, number 8, and this material has been augmented with a discussion of
vi PREFACE important new topics, including materialized views, and automatic selection of indexes. The new Chapter 9 is based on the old Chapter 8 (embedded SQL). It is introduced by a new section on 3-tier architecture. It also includes an expanded discussion of JDBC and new coverage of PHP. Chapter 10 collects a number of advanced SQL topics. The discussion of authorization from the old Chapter 8 has been moved here, as has the discussion of recursive SQL from the old Chapter 10. Data cubes, from the old Chapter 20, are now covered here. The rest of the chapter is devoted to the nested-relation model (from the old Chapter 4) and object-relational features of SQL (from the old Chapter 9). Then, Chapters 11 and 12 cover XML and systems based on XML. Ex­ cept for material at the end of the old Chapter 4, which has been moved to Chapter 11, this material is all new. Chapter 11 covers modeling; it includes expanded coverage of DTD’s, along with new material on XML Schema. Chap­ ter 12 is devoted to programming, and it includes sections on XPath, XQuery, and XSLT. Chapter 13 begins the study of database implementation. It covers disk storage and the file structures that are built on disks. This chapter is a con­ densation of material that, in the first edition, occupied Chapters 11 and 12. Chapter 14 covers index structures, including B-trees, hashing, and struc­ tures for multidimensional indexes. This material also condenses two chapters, 13 and 14, from the first edition. Chapters 15 and 16 cover query execution and query optimization, respec­ tively. They are similar to the old chapters of the same numbers. Chapter 17 covers logging, and Chapter 18 covers concurrency control; these chapters are also similar to the old chapters with the same numbers. Chapter 19 contains additional topics on concurrency: recovery, deadlocks, and long transactions. This material is a subset of the old Chapter 19. Chapter 20 is on parallel and distributed databases. In addition to material on parallel query execution from the old Chapter 15 and material on distributed locking and commitment from the old Chapter 19, there are several new sec­ tions on distributed query execution: the map-reduce framework for parallel computation, peer-to-peer databases and their implementation of distributed hash tables. Chapter 21 covers information integration. In addition to material on this subject from the old Chapter 20, we have added a section on local-as-view medi­ ators and a section on entity resolution (finding records from several databases that refer to the same entity, e.g., a person). Chapter 22 is on data mining. Although there was some material on the subject in the old Chapter 20, almost all of this chapter is new. It covers asso­ ciation rules and frequent itemset mining, including both the famous A-Priori Algorithm and certain efficiency improvements. Chapter 22 includes the key techniques of shingling, minhashing, and locality-sensitive hashing for finding similar items in massive databases, e.g., Web pages that quote substantially
PREFACE vii from other Web pages. The chapter concludes with a study of clustering, espe­ cially for massive datasets. Chapter 23, all new, addresses two important ways in which the Internet has impacted database technology. First is search engines, where we discuss algorithms for crawling the Web, the well-known PageRank algorithm for eval­ uating the importance of Web pages, and its extensions. This chapter also covers data-stream-management systems. We discuss the stream data model and SQL language extensions, and conclude with several interesting algorithms for executing queries on streams. Prerequisites We have used the book at the “mezzanine” level, in a sequence of courses taken both by undergraduates and by beginning graduate students. The formal prerequisites for the course are Sophomore-level treatments of: 1. Data structures, algorithms, and discrete math, and 2. Software systems, software engineering, and programming languages. Of this material, it is important that students have at least a rudimentary un­ derstanding of such topics as: algebraic expressions and laws, logic, basic data structures, object-oriented programming concepts, and programming environ­ ments. However, we believe that adequate background is acquired by the Junior year of a typical computer science program. Exercises The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point. Support on the World W ide Web The book’s home page is http://infolab.Stanford.edu/~ullman/dscb.html You will find errata as we learn of them, and backup materials, including home- works, projects, and exams. We shall also make available there the sections from the first edition that have been removed from the second. In addition, there is an accompanying set of on-line homeworks and pro­ gramming labs using a technology developed by Gradiance Corp. See the sec­ tion following the Preface for details about the GOAL system. GOAL service
分享到:
收藏