Cover
Principles of Distributed Database Systems, Third Edition
ISBN 9781441988331
Preface
Contents
Chapter 1: Introduction
1.1 Distributed Data Processing
1.2 What is a Distributed Database System?
1.3 Data Delivery Alternatives
1.4 Promises of DDBSs
1.4.1 Transparent Management of Distributed and Replicated Data
1.4.2 Reliability Through Distributed Transactions
1.4.3 Improved Performance
1.4.4 Easier System Expansion
1.5 Complications Introduced by Distribution
1.6 Design Issues
1.6.1 Distributed Database Design
1.6.2 Distributed Directory Management
1.6.3 Distributed Query Processing
1.6.4 Distributed Concurrency Control
1.6.5 Distributed Deadlock Management
1.6.6 Reliability of Distributed DBMS
1.6.7 Replication
1.6.8 Relationship among Problems
1.6.9 Additional Issues
1.7 Distributed DBMS Architecture
1.7.1 ANSI/SPARC Architecture
1.7.2 A Generic Centralized DBMS Architecture
1.7.3 Architectural Models for Distributed DBMSs
1.7.4 Autonomy
1.7.5 Distribution
1.7.6 Heterogeneity
1.7.7 Architectural Alternatives
1.7.8 Client/Server Systems
1.7.9 Peer-to-Peer Systems
1.7.10 Multidatabase System Architecture
1.8 Bibliographic Notes
Chapter 2: Background
2.1 Overview of Relational DBMS
2.1.1 Relational Database Concepts
2.1.2 Normalization
2.1.3 Relational Data Languages
2.1.3.1 Relational Algebra
2.1.3.2 Relational Calculus
2.2 Review of Computer Networks
2.2.1 Types of Networks
2.2.2 Communication Schemes
2.2.3 Data Communication Concepts
2.2.4 Communication Protocols
2.3 Bibliographic Notes
Chapter 3: Distributed Database Design
3.1 Top-Down Design Process
3.2 Distribution Design Issues
3.2.1 Reasons for Fragmentation
3.2.2 Fragmentation Alternatives
3.2.3 Degree of Fragmentation
3.2.4 Correctness Rules of Fragmentation
3.2.5 Allocation Alternatives
3.2.6 Information Requirements
3.3 Fragmentation
3.3.1 Horizontal Fragmentation
3.3.1.1 Information Requirements of Horizontal Fragmentation
3.3.1.2 Primary Horizontal Fragmentation
3.3.1.3 Derived Horizontal Fragmentation
3.3.1.4 Checking for Correctness
3.3.2 Vertical Fragmentation
3.3.3 Hybrid Fragmentation
3.4 Allocation
3.4.1 Allocation Problem
3.4.2 Information Requirements
3.4.3 Allocation Model
3.4.4 Solution Methods
3.5 Data Directory
3.6 Conclusion
3.7 Bibliographic Notes
Chapter 4: Database Integration
4.1 Bottom-Up Design Methodology
4.2 Schema Matching
4.2.1 Schema Heterogeneity
4.2.2 Linguistic Matching Approaches
4.2.3 Constraint-based Matching Approaches
4.2.4 Learning-based Matching
4.2.5 Combined Matching Approaches
4.3 Schema Integration
4.4 Schema Mapping
4.4.1 Mapping Creation
4.4.2 Mapping Maintenance
4.5 Data Cleaning
4.6 Conclusion
4.7 Bibliographic Notes
Chapter 5: Data and Access Control
5.1 View Management
5.1.1 Views in Centralized DBMSs
5.1.2 Views in Distributed DBMSs
5.1.3 Maintenance of Materialized Views
5.2 Data Security
5.2.1 Discretionary Access Control
5.2.2 Multilevel Access Control
5.2.3 Distributed Access Control
5.3 Semantic Integrity Control
5.3.1 Centralized Semantic Integrity Control
5.3.1.1 Specification of Integrity Constraints
5.3.1.2 Integrity Enforcement
5.3.2 Distributed Semantic Integrity Control
5.3.2.1 Definition of Distributed Integrity Constraints
5.3.2.2 Enforcement of Distributed Integrity Assertions
5.3.2.3 Summary of Distributed Integrity Control
5.4 Conclusion
5.5 Bibliographic Notes
Chapter 6: Overview of Query Processing
6.1 Query Processing Problem
6.2 Objectives of Query Processing
6.3 Complexity of Relational Algebra Operations
6.4 Characterization of Query Processors
6.4.1 Languages
6.4.2 Types of Optimization
6.4.3 Optimization Timing
6.4.4 Statistics
6.4.5 Decision Sites
6.4.6 Exploitation of the Network Topology
6.4.7 Exploitation of Replicated Fragments
6.4.8 Use of Semijoins
6.5 Layers of Query Processing
6.5.1 Query Decomposition
6.5.2 Data Localization
6.5.3 Global Query Optimization
6.5.4 Distributed Query Execution
6.6 Conclusion
6.7 Bibliographic Notes
Chapter 7: Query Decomposition and Data Localization
7.1 Query Decomposition
7.1.1 Normalization
7.1.2 Analysis
7.1.3 Elimination of Redundancy
7.1.4 Rewriting
7.2 Localization of Distributed Data
7.2.1 Reduction for Primary Horizontal Fragmentation
7.2.2 Reduction for Vertical Fragmentation
7.2.3 Reduction for Derived Fragmentation
7.2.4 Reduction for Hybrid Fragmentation
7.3 Conclusion
7.4 Bibliographic Notes
Chapter 8: Optimization of Distributed Queries
8.1 Query Optimization
8.1.1 Search Space
8.1.2 Search Strategy
8.1.3 Distributed Cost Model
8.2 Centralized Query Optimization
8.2.1 Dynamic Query Optimization
8.2.2 Static Query Optimization
8.2.3 Hybrid Query Optimization
8.3 Join Ordering in Distributed Queries
8.3.1 Join Ordering
8.3.2 Semijoin Based Algorithms
8.3.3 Join versus Semijoin
8.4 Distributed Query Optimization
8.4.1 Dynamic Approach
8.4.2 Static Approach
8.4.3 Semijoin-based Approach
8.4.4 Hybrid Approach
8.5 Conclusion
8.6 Bibliographic Notes
Chapter 9: Multidatabase Query Processing
9.1 Issues in Multidatabase Query Processing
9.2 Multidatabase Query Processing Architecture
9.3 Query Rewriting Using Views
9.3.1 Datalog Terminology
9.3.2 Rewriting in GAV
9.3.3 Rewriting in LAV
9.4 Query Optimization and Execution
9.4.1 Heterogeneous Cost Modeling
9.4.2 Heterogeneous Query Optimization
9.4.3 Adaptive Query Processing
9.5 Query Translation and Execution
9.6 Conclusion
9.7 Bibliographic Notes
Chapter 10: Introduction to Transaction Management
10.1 Definition of a Transaction
10.1.1 Termination Conditions of Transactions
10.1.2 Characterization of Transactions
10.1.3 Formalization of the Transaction Concept
10.2 Properties of Transactions
10.2.1 Atomicity
10.2.2 Consistency
10.2.3 Isolation
10.2.4 Durability
10.3 Types of Transactions
10.3.1 Flat Transactions
10.3.2 Nested Transactions
10.3.3 Workflows
10.4 Architecture Revisited
10.5 Conclusion
10.6 Bibliographic Notes
Chapter 11: Distributed Concurrency Control
11.1 Serializability Theory
11.2 Taxonomy of Concurrency Control Mechanisms
11.3 Locking-Based Concurrency Control Algorithms
11.3.1 Centralized 2PL
11.3.2 Distributed 2PL
11.4 Timestamp-Based Concurrency Control Algorithms
11.4.1 Basic TO Algorithm
11.4.2 Conservative TO Algorithm
11.4.3 Multiversion TO Algorithm
11.5 Optimistic Concurrency Control Algorithms
11.6 Deadlock Management
11.6.1 Deadlock Prevention
11.6.2 Deadlock Avoidance
11.6.3 Deadlock Detection and Resolution
11.7 “Relaxed” Concurrency Control
11.7.1 Non-Serializable Histories
11.7.2 Nested Distributed Transactions
11.8 Conclusion
11.9 Bibliographic Notes
Chapter 12: Distributed DBMS Reliability
12.1 Reliability Concepts and Measures
12.1.1 System, State, and Failure
12.1.2 Reliability and Availability
12.1.3 Mean Time between Failures/Mean Time to Repair
12.2 Failures in Distributed DBMS
12.2.1 Transaction Failures
12.2.2 Site (System) Failures
12.2.3 Media Failures
12.2.4 Communication Failures
12.3 Local Reliability Protocols
12.3.1 Architectural Considerations
12.3.2 Recovery Information
12.3.3 Execution of LRM Commands
12.3.4 Checkpointing
12.3.5 Handling Media Failures
12.4 Distributed Reliability Protocols
12.4.1 Components of Distributed Reliability Protocols
12.4.2 Two-Phase Commit Protocol
12.4.3 Variations of 2PC
12.5 Dealing with Site Failures
12.5.1 Termination and Recovery Protocols for 2PC
12.5.1.1 Termination Protocols
12.5.1.2 Recovery Protocols
12.5.2 Three-Phase Commit Protocol
12.6 Network Partitioning
12.6.1 Centralized Protocols
12.6.2 Voting-based Protocols
12.7 Architectural Considerations
12.8 Conclusion
12.9 Bibliographic Notes
Chapter 13: Data Replication
13.1 Consistency of Replicated Databases
13.1.1 Mutual Consistency
13.1.2 Mutual Consistency versus Transaction Consistency
13.2 Update Management Strategies
13.2.1 Eager Update Propagation
13.2.2 Lazy Update Propagation
13.2.3 Centralized Techniques
13.2.4 Distributed Techniques
13.3 Replication Protocols
13.3.1 Eager Centralized Protocols
13.3.1.1 Single Master with Limited Replication Transparency
13.3.1.2 Single Master with Full Replication Transparency
13.3.1.3 Primary Copy with Full Replication Transparency
13.3.2 Eager Distributed Protocols
13.3.3 Lazy Centralized Protocols
13.3.3.1 Single Master with Limited Transparency
13.3.3.2 Single Master or Primary Copy with Full Replication Transparency
13.3.4 Lazy Distributed Protocols
13.4 Group Communication
13.5 Replication and Failures
13.5.1 Failures and Lazy Replication
13.5.2 Failures and Eager Replication
13.6 Replication Mediator Service
13.7 Conclusion
13.8 Bibliographic Notes
Chapter 14: Parallel Database Systems
14.1 Parallel Database System Architectures
14.1.1 Objectives
14.1.2 Functional Architecture
14.1.3 Parallel DBMS Architectures
14.1.3.1 Shared-Memory
14.1.3.2 Shared-Disk
14.1.3.3 Shared-Nothing
14.1.3.4 Hybrid Architectures
14.1.3.5 Discussion
14.2 Parallel Data Placement
14.3 Parallel Query Processing
14.3.1 Query Parallelism
14.3.1.1 Intra-operator Parallelism
14.3.1.2 Inter-operator Parallelism
14.3.2 Parallel Algorithms for Data Processing
14.3.3 Parallel Query Optimization
14.4 Load Balancing
14.4.1 Parallel Execution Problems
14.4.2 Intra-Operator Load Balancing
14.4.3 Inter-Operator Load Balancing
14.4.4 Intra-Query Load Balancing
14.5 Database Clusters
14.5.1 Database Cluster Architecture
14.5.2 Replication
14.5.3 Load Balancing
14.5.4 Query Processing
14.5.5 Fault-tolerance
14.6 Conclusion
14.7 Bibliographic Notes
Chapter 15: Distributed Object Database Management
15.1 Fundamental Object Concepts and Object Models
15.1.1 Object
15.1.2 Types and Classes
15.1.3 Composition (Aggregation)
15.1.4 Subclassing and Inheritance
15.2 Object Distribution Design
15.2.1 Horizontal Class Partitioning
15.2.2 Vertical Class Partitioning
15.2.3 Path Partitioning
15.2.4 Class Partitioning Algorithms
15.2.4.1 Affinity-based Approach
15.2.4.2 Cost-Driven Approach
15.2.5 Allocation
15.2.6 Replication
15.3 Architectural Issues
15.3.1 Alternative Client/Server Architectures
15.3.2 Cache Consistency
15.4 Object Management
15.4.1 Object Identifier Management
15.4.2 Pointer Swizzling
15.4.3 Object Migration
15.5 Distributed Object Storage
15.5.0.1 Object Clustering
15.5.0.2 Distributed Garbage Collection
15.6 Object Query Processing
15.6.1 Object Query Processor Architectures
15.6.2 Query Processing Issues
15.6.3 Query Execution
15.6.3.1 Path Indexes
15.6.3.2 Set Matching
15.7 Transaction Management
15.7.1 Correctness Criteria
15.7.1.1 Commutativity
15.7.1.2 Invalidation
15.7.1.3 Recoverability
15.7.2 Transaction Models and Object Structures
15.7.3 Transaction Management in Object DBMSs
15.7.4 Transactions as Objects
15.8 Conclusion
15.9 Bibliographic Notes
Chapter 16: Peer-to-Peer Data Management
16.1 Infrastructure
16.1.1 Unstructured P2P Networks
16.1.2 Structured P2P Networks
16.1.3 Super-peer P2P Networks
16.1.4 Comparison of P2P Networks
16.2 Schema Mapping in P2P Systems
16.2.1 Pairwise Schema Mapping
16.2.2 Mapping based on Machine Learning Techniques
16.2.3 Common Agreement Mapping
16.2.4 Schema Mapping using IR Techniques
16.3 Querying Over P2P Systems
16.3.1 Top-k Queries
16.3.2 Join Queries
16.3.3 Range Queries
16.4 Replica Consistency
16.4.1 Basic Support in DHTs
16.4.2 Data Currency in DHTs
16.4.3 Replica Reconciliation
16.5 Conclusion
16.6 Bibliographic Notes
Chapter 17: Web Data Management
17.1 Web Graph Management
17.1.1 Compressing Web Graphs
17.1.2 Storing Web Graphs as S-Nodes
17.2 Web Search
17.2.1 Web Crawling
17.2.2 Indexing
17.2.2.1 Structure Index
17.2.2.2 Text Index
17.2.3 Ranking and Link Analysis
17.2.4 Evaluation of Keyword Search
17.3 Web Querying
17.3.1 Semistructured Data Approach
17.3.2 Web Query Language Approach
17.3.3 Question Answering
17.3.4 Searching and Querying the Hidden Web
17.4 Distributed XML Processing
17.4.1 Overview of XML
17.4.2 XML Query Processing Techniques
17.4.3 Fragmenting XML Data
17.4.4 Optimizing Distributed XML Processing
17.4.4.1 Data Shipping versus Query Shipping
17.4.4.2 Localization and Pruning
17.5 Conclusion
17.6 Bibliographic Notes
Chapter 18: Current Issues: Streaming Data and Cloud Computing
18.1 Data Stream Management
18.1.1 Stream Data Models
18.1.2 Stream Query Languages
18.1.3 Streaming Operators and their Implementation
18.1.4 Query Processing
18.1.4.1 Queuing and Scheduling
18.1.4.2 Determining When Tuples Expire
18.1.4.3 Continuous Query Processing over Sliding Windows
18.1.4.4 Periodic Query Processing over Sliding Windows
18.1.4.5 Query Processing over Windows Stored on Disk
18.1.5 DSMS Query Optimization
18.1.5.1 Cost Metrics and Statistics
18.1.5.2 Query Rewriting and Adaptive Query Optimization
18.1.6 Load Shedding and Approximation
18.1.7 Multi-Query Optimization
18.1.8 Stream Mining
18.2 Cloud Data Management
18.2.1 Taxonomy of Clouds
18.2.2 Grid Computing
18.2.3 Cloud Architectures
18.2.4 Data Management in the Cloud
18.2.4.1 Distributed File Management
18.2.4.2 Distributed Database Management
18.2.4.3 Parallel Data Processing
18.3 Conclusion
18.4 Bibliographic Notes
References
Index