Database System The Complete Book 2nd Edition.pdf

发布时间：2022-06-13 发布人：admin 分类：说明书资料大小：28.49M 资料格式：pdf 举报版权申诉

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第1页.png

第1页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第2页.png

第2页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第3页.png

第3页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第4页.png

第4页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第5页.png

第5页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第6页.png

第6页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第7页.png

第7页 / 共1241页

8bbf8623-040b-4a1e-81fe-8e84e4b44158.pdf-第8页.png

第8页 / 共1241页

Table of Contents

Chapter 1 The Worlds of Database

1.2.3 Storage and Buffer Management

1.2.4 Transaction Processing

Chapter 2 The Relational Model of Data

2.1.3 The Relational Model in Brief

2.1.4 The Semistructured Model in Brief

2.1.6 Comparison of Modeling Approaches

2.2 Basics of the Relational Model

2.2.1 Attributes

2.2.2 Schemas

2.2.3 Tuples

2.2.4 Domains

2.2.7 Keys of Relations

2.4.2 What is an Algebra?

2.4.3 Overview of Relational Algebra

2.4.4 Set Operations on Relations

2.4.5 Projection

2.4.6 Selection

2.4.7 Cartesian Product

2.4.8 Natural Joins

2.4.9 Theta-Joins

2.4.10 Combining Operations to Form Queries

2.4.12 Relationships Among Operations

2.4.13 A Linear Notation for Algebraic Expressions

2.6 Summary of Chapter 2

Chapter 3 Design Theory for Relational Databases

Chapter 4 High-Level Database Models

Chapter 5 Algebraic and Logical Query Languages

5.1 Relational Operations on Bags

5.2 Extended Operators of Relational Algebra

5.2.4 The Grouping Operator

Chapter 6 The Database Language SQL

6.4.2 Duplicates in Unions, Intersections, and Differences

Chapter 7 Constraints and Triggers

Chapter 8 Views and Indexes

Chapter 9 SQL in a Server Environment

Chapter 10 Advanced Topics in Relational Databases

10.1 Security and User Authorization in SQL

10.2 Recursion in SQL

10.3 The Object-Relational Model

10.6.4 Star Schemas

Chapter 11 The Semistructured-Data Model

Chapter 12 Programming Language for XML

Chapter 13 Secondary Storage Management

13.4.7 Parity Blocks

13.4.8 An Improvement: RAID 5

13.4.9 Coping With Multiple Disk Crashes

13.5 Arranging Data on Disk

13.5.1 Fixed-Length Records

Chapter 14 Index Structures

14.1 Index-Structure Basics

14.2 B-Trees

14.2.1 The Structure of B-trees

14.2.2 Applications of B-trees

14.2.3 Lookup in B-Trees

14.2.4 Range Queries

14.2.5 Insertion Into B-Trees

14.2.6 Deletion From B-Trees

14.2.7 Efficiency of B-Trees

14.3 Hash Tables

14.3.1 Secondary-Storage Hash Tables

14.3.2 Insertion Into a Hash Table

14.3.3 Hash-Table Deletion

14.3.4 Efficiency of Hash Table Indexes

14.3.5 Extensible Hash Tables

14.3.6 Insertion Into Extensible Hash Tables

14.3.7 Linear Hash Tables

14.3.8 Insertion Into Linear Hash Tables

Chapter 15 Query Execution

15.1 Introduction to Physical-Query-PlanOperators

15.2 One-Pass Algorithms

15.3 Nested-Loop Joins

15.4 Two-Pass Algorithms Based on Sorting

15.5 Two-Pass Algorithm s Based on Hashing

15.6 Index-Based Algorith

15.7 Buffer Management

15.8 Algorithms Using More Than Two Passes

Chapter 16 The Query Compiler

16.1 Parsing and Preprocessing

16.1.1 Syntax Analysis and Parse Trees

16.1.2 A Grammar for a Simple Subset of SQL

16.1.3 The Preprocessor

16.1.4 Preprocessing Queries Involving Views

16.1.5 Exercises for Section 16.1

16.2 Algebraic Laws for Improving Query Plans

16.2.1 Commutative and Associative Laws

16.2.2 Laws Involving Selection

16.2.3 Pushing Selections

16.2.4 Laws Involving Projection

16.2.5 Laws About Joins and Products

16.2.6 Laws Involving Duplicate Elimination

16.2.7 Laws Involving Grouping and Aggregation

16.2.8 Exercises for Section 16.2

16.3 From Parse Trees to Logical Query Plans

16.3.1 Conversion to Relational Algebra

16.3.2 Removing Subqueries From Conditions

16.3.3 Improving the Logical Query Plan

16.3.4 Grouping Associative/Comm utative Operators

16.3.5 Exercises for Section 16.3

16.4 Estimating the Cost of Operations

16.4.1 Estimating Sizes of Intermediate Relations

16.4.2 Estimating the Size of a Projection

16.4.4 Estimating the Size of a Join

16.4.5 Natural Joins With Multiple Join Attributes

16.4.6 Joins o f Many Relations

16.4.7 Estimating Sizes for Other Operations

16.5 Introduction to Cost-Based Plan Selection

16.5.1 Obtaining Estimates for Size Parameters

16.5.2 Computation of Statistics

16.5.3 Heuristics for Reducing the Cost of Logical QueryPlans

16.5.4 Approaches to Enumerating Physical Plans

16.5.5 Exercises for Section 16.5

16.6 Choosing an Order for Joins

16.7 Completing the Physical-Query-Plan

16.7.1 Choosing a Selection Method

16.7.2 Choosing a Join Method

16.7.4 Pipelining Unary Operations

16.7.5 Pipelining Binary Operations

16.7.6 Notation for Physical Query Plans

16.7.7 Ordering of Physical Operations

16.7.8 Exercises for Section 16.7

Chapter 17 Coping With System Failures

Chapter 18 Concurrency Control

Chapter 19 More About Transaction Management

Chapter 20 Parallel and Distributed Databases

20.1 Parallel Algorithms on Relations

20.1.1 Models of Parallelism

20.1.2 Tuple-at-a-Time Operations in Parallel

20.1.3 Parallel Algorithms for Full-Relation Operations

20.2 The Map-Reduce Parallelism Framework

20.3 Distributed Databases

20.8 Summary of Chapter 20

Chapter 21 Information Intergration

Chapter 22 Data Mining

Chapter 23 Database Systems and the Internet

DATABASE SYSTEMS The Complete Book

DATABASE SYSTEMS The Complete Book Second Edition Hector Garcia-Molina Jeffrey D. Ullman Jennifer Widom Department of Computer Science Stanford University Upper Saddle River, New Jersey 07458

CD• ' l N O T I C E n This work is protected by U.S. copyright laws and is provided solely for the use of college instructors in reviewing course materials for classroom use. Dissemination or sale of this work, or any part • (including on the World Wide Web), is not permitted. Editorial Director, Computer Science and Engineering: Marcia J. Horton Executive Editor Tracy Dunkelberger Editorial Assistant: Melinda Haggerty Director of Marketing: Margaret Waples Marketing Manager: Christopher Kelly Senior Managing Editor: Scott Disanno Production Editor: Irwin Zucker Art Director: Jayne Conte Cover Designer: Margaret Kenselaar Cover Art: Tamara L Newman Manufacturing Buyer: Lisa McDowell Manufacturing Manager: Alan Fischer P E A R S O N P re n tic o H a ll © 2009,2002 by Pearson Education Inc. Pearson Prentice Hall Pearson Education, Inc. Upper Saddle River, NJ 07458 All rights reserved. No part of this book may be reproduced, in any form or by any means, without permission in writing from the publisher. Pearson Prentice Hall™ is a trademark of Pearson Education, Inc. The author and publisher of this book have used their best efforts in preparing this book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The author and publisher make no warranty of any kind, expressed or implied, with regard to these programs or the documentation contained in this book. The author and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs. Printed in the United States of America 10 9 8 7 6 5 4 3 2 1 ISBN D-13-bQb?Dl-fl 178-0-13-b0b?01-b Pearson Education Ltd., London Pearson Education Australia Pty. Ltd., Sydney Pearson Education Singapore, Pte. Ltd. Pearson Education North Asia Ltd., Hong Kong Pearson Education Canada, Inc., Toronto Pearson Educaci6n de Mexico, S.A. de C.V. Pearson Education—Japan, Tokyo Pearson Education Malaysia, Pte. Ltd. Pearson Education, Inc., Upper Saddle River, New Jersey

Preface This book covers the core of the material taught in the database sequence at Stanford. The introductory course, CS145, uses the first twelve chapters, and is designed for all students — those who want to use database systems as well as those who want to get involved in database implementation. The second course, CS245 on database implementation, covers most of the rest of the book. However, some material is covered in more detail in special topics courses. These include CS346 (implementation project), which concentrates on query optimization as in Chapters 15 and 16. Also, CS345A, on data mining and Web mining, covers the material in the last two chapters. W hat’s New in the Second Edition After a brief introduction in Chapter 1, we cover relational modeling in Chapters 2-4. Chapter 4 is devoted to high-level modeling. There, in addition to the E /R model, we now cover UML (Unified Modeling Language). We also have moved to Chapter 4 a shorter version of the material on ODL, treating it as a design language for relational database schemas. The material on functional and multivalued dependencies has been mod ified and remains in Chapter 3. We have changed our viewpoint, so that a functional dependency is assumed to have a set of attributes on the right. We have also given explicitly certain algorithms, including the “chase,” that allow us to manipulate dependencies. We have augmented our discussion of third normal form to include the 3NF synthesis algorithm and to make clear what the tradeoff between 3NF and BCNF is. Chapter 5 contains the coverage of relational algebra from the previous edition, and is joined by (part of) the treatment of Datalog from the old Chap ter 10. The discussion of recursion in Datalog is either moved to the book’s Web site or combined with the treatment of recursive SQL in Chapter 10 of this edition. Chapters 6-10 are devoted to aspects of SQL programming, and they repre sent a reorganization and augmentation of the earlier book’s Chapters 6, 7, 8, and parts of 10. The material on views and indexes has been moved to its own chapter, number 8, and this material has been augmented with a discussion of

vi PREFACE important new topics, including materialized views, and automatic selection of indexes. The new Chapter 9 is based on the old Chapter 8 (embedded SQL). It is introduced by a new section on 3-tier architecture. It also includes an expanded discussion of JDBC and new coverage of PHP. Chapter 10 collects a number of advanced SQL topics. The discussion of authorization from the old Chapter 8 has been moved here, as has the discussion of recursive SQL from the old Chapter 10. Data cubes, from the old Chapter 20, are now covered here. The rest of the chapter is devoted to the nested-relation model (from the old Chapter 4) and object-relational features of SQL (from the old Chapter 9). Then, Chapters 11 and 12 cover XML and systems based on XML. Ex cept for material at the end of the old Chapter 4, which has been moved to Chapter 11, this material is all new. Chapter 11 covers modeling; it includes expanded coverage of DTD’s, along with new material on XML Schema. Chap ter 12 is devoted to programming, and it includes sections on XPath, XQuery, and XSLT. Chapter 13 begins the study of database implementation. It covers disk storage and the file structures that are built on disks. This chapter is a con densation of material that, in the first edition, occupied Chapters 11 and 12. Chapter 14 covers index structures, including B-trees, hashing, and struc tures for multidimensional indexes. This material also condenses two chapters, 13 and 14, from the first edition. Chapters 15 and 16 cover query execution and query optimization, respec tively. They are similar to the old chapters of the same numbers. Chapter 17 covers logging, and Chapter 18 covers concurrency control; these chapters are also similar to the old chapters with the same numbers. Chapter 19 contains additional topics on concurrency: recovery, deadlocks, and long transactions. This material is a subset of the old Chapter 19. Chapter 20 is on parallel and distributed databases. In addition to material on parallel query execution from the old Chapter 15 and material on distributed locking and commitment from the old Chapter 19, there are several new sec tions on distributed query execution: the map-reduce framework for parallel computation, peer-to-peer databases and their implementation of distributed hash tables. Chapter 21 covers information integration. In addition to material on this subject from the old Chapter 20, we have added a section on local-as-view medi ators and a section on entity resolution (finding records from several databases that refer to the same entity, e.g., a person). Chapter 22 is on data mining. Although there was some material on the subject in the old Chapter 20, almost all of this chapter is new. It covers asso ciation rules and frequent itemset mining, including both the famous A-Priori Algorithm and certain efficiency improvements. Chapter 22 includes the key techniques of shingling, minhashing, and locality-sensitive hashing for finding similar items in massive databases, e.g., Web pages that quote substantially

PREFACE vii from other Web pages. The chapter concludes with a study of clustering, espe cially for massive datasets. Chapter 23, all new, addresses two important ways in which the Internet has impacted database technology. First is search engines, where we discuss algorithms for crawling the Web, the well-known PageRank algorithm for eval uating the importance of Web pages, and its extensions. This chapter also covers data-stream-management systems. We discuss the stream data model and SQL language extensions, and conclude with several interesting algorithms for executing queries on streams. Prerequisites We have used the book at the “mezzanine” level, in a sequence of courses taken both by undergraduates and by beginning graduate students. The formal prerequisites for the course are Sophomore-level treatments of: 1. Data structures, algorithms, and discrete math, and 2. Software systems, software engineering, and programming languages. Of this material, it is important that students have at least a rudimentary un derstanding of such topics as: algebraic expressions and laws, logic, basic data structures, object-oriented programming concepts, and programming environ ments. However, we believe that adequate background is acquired by the Junior year of a typical computer science program. Exercises The book contains extensive exercises, with some for almost every section. We indicate harder exercises or parts of exercises with an exclamation point. The hardest exercises have a double exclamation point. Support on the World W ide Web The book’s home page is http://infolab.Stanford.edu/~ullman/dscb.html You will find errata as we learn of them, and backup materials, including home- works, projects, and exams. We shall also make available there the sections from the first edition that have been removed from the second. In addition, there is an accompanying set of on-line homeworks and pro gramming labs using a technology developed by Gradiance Corp. See the sec tion following the Preface for details about the GOAL system. GOAL service

分享到：

赞收藏

资料库

Database System The Complete Book 2nd Edition.pdf

相关推荐

课程资源

热门标签

最新资料