Principles of Distributed Database Systems， 3rd Edition.pdf

Published: 2022-06-24 | Uploaded by: admin | Category: Manuals | File size: 11.09M | Format: pdf | Pages: 866

Cover

Principles of Distributed Database Systems, Third Edition

ISBN 9781441988331

Preface

Contents

Chapter 1: Introduction

1.1 Distributed Data Processing

1.2 What is a Distributed Database System?

1.3 Data Delivery Alternatives

1.4 Promises of DDBSs

1.4.1 Transparent Management of Distributed and Replicated Data

1.4.2 Reliability Through Distributed Transactions

1.4.3 Improved Performance

1.4.4 Easier System Expansion

1.5 Complications Introduced by Distribution

1.6 Design Issues

1.6.1 Distributed Database Design

1.6.2 Distributed Directory Management

1.6.3 Distributed Query Processing

1.6.4 Distributed Concurrency Control

1.6.5 Distributed Deadlock Management

1.6.6 Reliability of Distributed DBMS

1.6.7 Replication

1.6.8 Relationship among Problems

1.6.9 Additional Issues

1.7 Distributed DBMS Architecture

1.7.1 ANSI/SPARC Architecture

1.7.2 A Generic Centralized DBMS Architecture

1.7.3 Architectural Models for Distributed DBMSs

1.7.4 Autonomy

1.7.5 Distribution

1.7.6 Heterogeneity

1.7.7 Architectural Alternatives

1.7.8 Client/Server Systems

1.7.9 Peer-to-Peer Systems

1.7.10 Multidatabase System Architecture

1.8 Bibliographic Notes

Chapter 2: Background

2.1 Overview of Relational DBMS

2.1.1 Relational Database Concepts

2.1.2 Normalization

2.1.3 Relational Data Languages

2.1.3.1 Relational Algebra

2.1.3.2 Relational Calculus

2.2 Review of Computer Networks

2.2.1 Types of Networks

2.2.2 Communication Schemes

2.2.3 Data Communication Concepts

2.2.4 Communication Protocols

2.3 Bibliographic Notes

Chapter 3: Distributed Database Design

3.1 Top-Down Design Process

3.2 Distribution Design Issues

3.2.1 Reasons for Fragmentation

3.2.2 Fragmentation Alternatives

3.2.3 Degree of Fragmentation

3.2.4 Correctness Rules of Fragmentation

3.2.5 Allocation Alternatives

3.2.6 Information Requirements

3.3 Fragmentation

3.3.1 Horizontal Fragmentation

3.3.1.1 Information Requirements of Horizontal Fragmentation

3.3.1.2 Primary Horizontal Fragmentation

3.3.1.3 Derived Horizontal Fragmentation

3.3.1.4 Checking for Correctness

3.3.2 Vertical Fragmentation

3.3.3 Hybrid Fragmentation

3.4 Allocation

3.4.1 Allocation Problem

3.4.2 Information Requirements

3.4.3 Allocation Model

3.4.4 Solution Methods

3.5 Data Directory

3.6 Conclusion

3.7 Bibliographic Notes

Chapter 4: Database Integration

4.1 Bottom-Up Design Methodology

4.2 Schema Matching

4.2.1 Schema Heterogeneity

4.2.2 Linguistic Matching Approaches

4.2.3 Constraint-based Matching Approaches

4.2.4 Learning-based Matching

4.2.5 Combined Matching Approaches

4.3 Schema Integration

4.4 Schema Mapping

4.4.1 Mapping Creation

4.4.2 Mapping Maintenance

4.5 Data Cleaning

4.6 Conclusion

4.7 Bibliographic Notes

Chapter 5: Data and Access Control

5.1 View Management

5.1.1 Views in Centralized DBMSs

5.1.2 Views in Distributed DBMSs

5.1.3 Maintenance of Materialized Views

5.2 Data Security

5.2.1 Discretionary Access Control

5.2.2 Multilevel Access Control

5.2.3 Distributed Access Control

5.3 Semantic Integrity Control

5.3.1 Centralized Semantic Integrity Control

5.3.1.1 Specification of Integrity Constraints

5.3.1.2 Integrity Enforcement

5.3.2 Distributed Semantic Integrity Control

5.3.2.1 Definition of Distributed Integrity Constraints

5.3.2.2 Enforcement of Distributed Integrity Assertions

5.3.2.3 Summary of Distributed Integrity Control

5.4 Conclusion

5.5 Bibliographic Notes

Chapter 6: Overview of Query Processing

6.1 Query Processing Problem

6.2 Objectives of Query Processing

6.3 Complexity of Relational Algebra Operations

6.4 Characterization of Query Processors

6.4.1 Languages

6.4.2 Types of Optimization

6.4.3 Optimization Timing

6.4.4 Statistics

6.4.5 Decision Sites

6.4.6 Exploitation of the Network Topology

6.4.7 Exploitation of Replicated Fragments

6.4.8 Use of Semijoins

6.5 Layers of Query Processing

6.5.1 Query Decomposition

6.5.2 Data Localization

6.5.3 Global Query Optimization

6.5.4 Distributed Query Execution

6.6 Conclusion

6.7 Bibliographic Notes

Chapter 7: Query Decomposition and Data Localization

7.1 Query Decomposition

7.1.1 Normalization

7.1.2 Analysis

7.1.3 Elimination of Redundancy

7.1.4 Rewriting

7.2 Localization of Distributed Data

7.2.1 Reduction for Primary Horizontal Fragmentation

7.2.2 Reduction for Vertical Fragmentation

7.2.3 Reduction for Derived Fragmentation

7.2.4 Reduction for Hybrid Fragmentation

7.3 Conclusion

7.4 Bibliographic Notes

Chapter 8: Optimization of Distributed Queries

8.1 Query Optimization

8.1.1 Search Space

8.1.2 Search Strategy

8.1.3 Distributed Cost Model

8.2 Centralized Query Optimization

8.2.1 Dynamic Query Optimization

8.2.2 Static Query Optimization

8.2.3 Hybrid Query Optimization

8.3 Join Ordering in Distributed Queries

8.3.1 Join Ordering

8.3.2 Semijoin Based Algorithms

8.3.3 Join versus Semijoin

8.4 Distributed Query Optimization

8.4.1 Dynamic Approach

8.4.2 Static Approach

8.4.3 Semijoin-based Approach

8.4.4 Hybrid Approach

8.5 Conclusion

8.6 Bibliographic Notes

Chapter 9: Multidatabase Query Processing

9.1 Issues in Multidatabase Query Processing

9.2 Multidatabase Query Processing Architecture

9.3 Query Rewriting Using Views

9.3.1 Datalog Terminology

9.3.2 Rewriting in GAV

9.3.3 Rewriting in LAV

9.4 Query Optimization and Execution

9.4.1 Heterogeneous Cost Modeling

9.4.2 Heterogeneous Query Optimization

9.4.3 Adaptive Query Processing

9.5 Query Translation and Execution

9.6 Conclusion

9.7 Bibliographic Notes

Chapter 10: Introduction to Transaction Management

10.1 Definition of a Transaction

10.1.1 Termination Conditions of Transactions

10.1.2 Characterization of Transactions

10.1.3 Formalization of the Transaction Concept

10.2 Properties of Transactions

10.2.1 Atomicity

10.2.2 Consistency

10.2.3 Isolation

10.2.4 Durability

10.3 Types of Transactions

10.3.1 Flat Transactions

10.3.2 Nested Transactions

10.3.3 Workflows

10.4 Architecture Revisited

10.5 Conclusion

10.6 Bibliographic Notes

Chapter 11: Distributed Concurrency Control

11.1 Serializability Theory

11.2 Taxonomy of Concurrency Control Mechanisms

11.3 Locking-Based Concurrency Control Algorithms

11.3.1 Centralized 2PL

11.3.2 Distributed 2PL

11.4 Timestamp-Based Concurrency Control Algorithms

11.4.1 Basic TO Algorithm

11.4.2 Conservative TO Algorithm

11.4.3 Multiversion TO Algorithm

11.5 Optimistic Concurrency Control Algorithms

11.6 Deadlock Management

11.6.1 Deadlock Prevention

11.6.2 Deadlock Avoidance

11.6.3 Deadlock Detection and Resolution

11.7 “Relaxed” Concurrency Control

11.7.1 Non-Serializable Histories

11.7.2 Nested Distributed Transactions

11.8 Conclusion

11.9 Bibliographic Notes

Chapter 12: Distributed DBMS Reliability

12.1 Reliability Concepts and Measures

12.1.1 System, State, and Failure

12.1.2 Reliability and Availability

12.1.3 Mean Time between Failures/Mean Time to Repair

12.2 Failures in Distributed DBMS

12.2.1 Transaction Failures

12.2.2 Site (System) Failures

12.2.3 Media Failures

12.2.4 Communication Failures

12.3 Local Reliability Protocols

12.3.1 Architectural Considerations

12.3.2 Recovery Information

12.3.3 Execution of LRM Commands

12.3.4 Checkpointing

12.3.5 Handling Media Failures

12.4 Distributed Reliability Protocols

12.4.1 Components of Distributed Reliability Protocols

12.4.2 Two-Phase Commit Protocol

12.4.3 Variations of 2PC

12.5 Dealing with Site Failures

12.5.1 Termination and Recovery Protocols for 2PC

12.5.1.1 Termination Protocols

12.5.1.2 Recovery Protocols

12.5.2 Three-Phase Commit Protocol

12.6 Network Partitioning

12.6.1 Centralized Protocols

12.6.2 Voting-based Protocols

12.7 Architectural Considerations

12.8 Conclusion

12.9 Bibliographic Notes

Chapter 13: Data Replication

13.1 Consistency of Replicated Databases

13.1.1 Mutual Consistency

13.1.2 Mutual Consistency versus Transaction Consistency

13.2 Update Management Strategies

13.2.1 Eager Update Propagation

13.2.2 Lazy Update Propagation

13.2.3 Centralized Techniques

13.2.4 Distributed Techniques

13.3 Replication Protocols

13.3.1 Eager Centralized Protocols

13.3.1.1 Single Master with Limited Replication Transparency

13.3.1.2 Single Master with Full Replication Transparency

13.3.1.3 Primary Copy with Full Replication Transparency

13.3.2 Eager Distributed Protocols

13.3.3 Lazy Centralized Protocols

13.3.3.1 Single Master with Limited Transparency

13.3.3.2 Single Master or Primary Copy with Full Replication Transparency

13.3.4 Lazy Distributed Protocols

13.4 Group Communication

13.5 Replication and Failures

13.5.1 Failures and Lazy Replication

13.5.2 Failures and Eager Replication

13.6 Replication Mediator Service

13.7 Conclusion

13.8 Bibliographic Notes

Chapter 14: Parallel Database Systems

14.1 Parallel Database System Architectures

14.1.1 Objectives

14.1.2 Functional Architecture

14.1.3 Parallel DBMS Architectures

14.1.3.1 Shared-Memory

14.1.3.2 Shared-Disk

14.1.3.3 Shared-Nothing

14.1.3.4 Hybrid Architectures

14.1.3.5 Discussion

14.2 Parallel Data Placement

14.3 Parallel Query Processing

14.3.1 Query Parallelism

14.3.1.1 Intra-operator Parallelism

14.3.1.2 Inter-operator Parallelism

14.3.2 Parallel Algorithms for Data Processing

14.3.3 Parallel Query Optimization

14.4 Load Balancing

14.4.1 Parallel Execution Problems

14.4.2 Intra-Operator Load Balancing

14.4.3 Inter-Operator Load Balancing

14.4.4 Intra-Query Load Balancing

14.5 Database Clusters

14.5.1 Database Cluster Architecture

14.5.2 Replication

14.5.3 Load Balancing

14.5.4 Query Processing

14.5.5 Fault-tolerance

14.6 Conclusion

14.7 Bibliographic Notes

Chapter 15: Distributed Object Database Management

15.1 Fundamental Object Concepts and Object Models

15.1.1 Object

15.1.2 Types and Classes

15.1.3 Composition (Aggregation)

15.1.4 Subclassing and Inheritance

15.2 Object Distribution Design

15.2.1 Horizontal Class Partitioning

15.2.2 Vertical Class Partitioning

15.2.3 Path Partitioning

15.2.4 Class Partitioning Algorithms

15.2.4.1 Affinity-based Approach

15.2.4.2 Cost-Driven Approach

15.2.5 Allocation

15.2.6 Replication

15.3 Architectural Issues

15.3.1 Alternative Client/Server Architectures

15.3.2 Cache Consistency

15.4 Object Management

15.4.1 Object Identifier Management

15.4.2 Pointer Swizzling

15.4.3 Object Migration

15.5 Distributed Object Storage

15.5.0.1 Object Clustering

15.5.0.2 Distributed Garbage Collection

15.6 Object Query Processing

15.6.1 Object Query Processor Architectures

15.6.2 Query Processing Issues

15.6.3 Query Execution

15.6.3.1 Path Indexes

15.6.3.2 Set Matching

15.7 Transaction Management

15.7.1 Correctness Criteria

15.7.1.1 Commutativity

15.7.1.2 Invalidation

15.7.1.3 Recoverability

15.7.2 Transaction Models and Object Structures

15.7.3 Transaction Management in Object DBMSs

15.7.4 Transactions as Objects

15.8 Conclusion

15.9 Bibliographic Notes

Chapter 16: Peer-to-Peer Data Management

16.1 Infrastructure

16.1.1 Unstructured P2P Networks

16.1.2 Structured P2P Networks

16.1.3 Super-peer P2P Networks

16.1.4 Comparison of P2P Networks

16.2 Schema Mapping in P2P Systems

16.2.1 Pairwise Schema Mapping

16.2.2 Mapping based on Machine Learning Techniques

16.2.3 Common Agreement Mapping

16.2.4 Schema Mapping using IR Techniques

16.3 Querying Over P2P Systems

16.3.1 Top-k Queries

16.3.2 Join Queries

16.3.3 Range Queries

16.4 Replica Consistency

16.4.1 Basic Support in DHTs

16.4.2 Data Currency in DHTs

16.4.3 Replica Reconciliation

16.5 Conclusion

16.6 Bibliographic Notes

Chapter 17: Web Data Management

17.1 Web Graph Management

17.1.1 Compressing Web Graphs

17.1.2 Storing Web Graphs as S-Nodes

17.2 Web Search

17.2.1 Web Crawling

17.2.2 Indexing

17.2.2.1 Structure Index

17.2.2.2 Text Index

17.2.3 Ranking and Link Analysis

17.2.4 Evaluation of Keyword Search

17.3 Web Querying

17.3.1 Semistructured Data Approach

17.3.2 Web Query Language Approach

17.3.3 Question Answering

17.3.4 Searching and Querying the Hidden Web

17.4 Distributed XML Processing

17.4.1 Overview of XML

17.4.2 XML Query Processing Techniques

17.4.3 Fragmenting XML Data

17.4.4 Optimizing Distributed XML Processing

17.4.4.1 Data Shipping versus Query Shipping

17.4.4.2 Localization and Pruning

17.5 Conclusion

17.6 Bibliographic Notes

Chapter 18: Current Issues: Streaming Data and Cloud Computing

18.1 Data Stream Management

18.1.1 Stream Data Models

18.1.2 Stream Query Languages

18.1.3 Streaming Operators and their Implementation

18.1.4 Query Processing

18.1.4.1 Queuing and Scheduling

18.1.4.2 Determining When Tuples Expire

18.1.4.3 Continuous Query Processing over Sliding Windows

18.1.4.4 Periodic Query Processing over Sliding Windows

18.1.4.5 Query Processing over Windows Stored on Disk

18.1.5 DSMS Query Optimization

18.1.5.1 Cost Metrics and Statistics

18.1.5.2 Query Rewriting and Adaptive Query Optimization

18.1.6 Load Shedding and Approximation

18.1.7 Multi-Query Optimization

18.1.8 Stream Mining

18.2 Cloud Data Management

18.2.1 Taxonomy of Clouds

18.2.2 Grid Computing

18.2.3 Cloud Architectures

18.2.4 Data Management in the Cloud

18.2.4.1 Distributed File Management

18.2.4.2 Distributed Database Management

18.2.4.3 Parallel Data Processing

18.3 Conclusion

18.4 Bibliographic Notes

References

Index

Principles of Distributed Database Systems
Third Edition

M. Tamer Özsu • Patrick Valduriez

M. Tamer Özsu
David R. Cheriton School of Computer Science
University of Waterloo
Waterloo, Ontario
Canada N2L 3G1
Tamer.Ozsu@uwaterloo.ca

Patrick Valduriez
INRIA
LIRMM
161 rue Ada
34392 Montpellier Cedex
France
Patrick.Valduriez@inria.fr

This book was previously published by: Pearson Education, Inc.

e-ISBN 978-1-4419-8834-8
ISBN 978-1-4419-8833-1
DOI 10.1007/978-1-4419-8834-8

Springer New York Dordrecht Heidelberg London

Library of Congress Control Number: 2011922491

© Springer Science+Business Media, LLC 2011

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer, software, or by similar or dissimilar methodology now known or hereafter developed is forbidden.

The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.

Printed on acid-free paper

Springer is part of Springer Science+Business Media (www.springer.com)

To my family and my parents
M.T.Ö.

To Esther, my daughters Anna, Juliette and Sarah, and my parents
P.V.

Preface

It has been almost twenty years since the first edition of this book appeared, and ten years since we released the second edition. As one can imagine, in a fast changing area such as this, there have been significant changes in the intervening period. Distributed data management went from a potentially significant technology to one that is commonplace. The advent of the Internet and the World Wide Web has certainly changed the way we typically look at distribution. The emergence in recent years of different forms of distributed computing, exemplified by data streams and cloud computing, has regenerated interest in distributed data management. Thus, it was time for a major revision of the material.

We started to work on this edition five years ago, and it has taken quite a while to complete the work. The end result, however, is a book that has been heavily revised: while we maintained and updated the core chapters, we have also added new ones. The major changes are the following:

1. Database integration and querying is now treated in much more detail, reflecting the attention these topics have received in the community in the past decade. Chapter 4 focuses on the integration process, while Chapter 9 discusses querying over multidatabase systems.

2. The previous editions had only a brief discussion of data replication protocols. This topic is now covered in a separate chapter (Chapter 13) where we provide an in-depth discussion of the protocols and how they can be integrated with transaction management.

3. Peer-to-peer data management is discussed in depth in Chapter 16. These systems have become an important and interesting architectural alternative to classical distributed database systems. Although the early distributed database systems architectures followed the peer-to-peer paradigm, the modern incarnation of these systems has fundamentally different characteristics, so they deserve in-depth discussion in a chapter of their own.

4. Web data management is discussed in Chapter 17. This is a difficult topic to cover since there is no unifying framework. We discuss various aspects
