logo资料库

Spinger Replication Theory and Practice.pdf

第1页 / 共298页
第2页 / 共298页
第3页 / 共298页
第4页 / 共298页
第5页 / 共298页
第6页 / 共298页
第7页 / 共298页
第8页 / 共298页
资料共298页,剩余部分请下载后查看
3642112935
Title Page
Replication // Theory and Practice
Preface
Contents
List of Authors
Consistency Models for Replicated Data
Introduction
Contributions
Defining the Sequential Data Type
Strong Consistency
Relaxing Inter-Client Operation Ordering
Weak Consistency
Transactions
Discussion
Conclusion
References
Replication Techniques for Availability
Introduction
Model
Environment
Specification
Fail-Stop Failure Model
Primary-Backup
Chain Replication
Queries
Crash Failure Model
Quorums
Stake Replication
Broker Replication
Recovery and Reconfiguration
Conclusion
References
Modular Approach to Replication for Availability
Introduction
Atomic Broadcast for State Machine Replication
The Consensus Problem, or How to Implement Atomic Broadcast in a Modular Way
Consensus
Implementation of Atomic Broadcast
Solving Consensus
About System Models
Partially Synchronous Systems
Asynchronous System Augmented with Failure Detectors
Discussion
Generic Broadcast
Dynamic Groups
Group Membership Service
Group Communication in Dynamic Groups
Conclusion
References
Stumbling over Consensus Research: Misunderstandings and Issues
Introduction
Misunderstandings
Asynchronous Systems
Eventually-Forever Assumptions
Eventual Guarantees
The Consensus Impossibility Result
Uses of Replication
Correlated Failures
Issues
The Application Interface
Violation of Abstraction Boundaries
Ambiguities and Errors
Unfriendly Formalisms
Lack of Feedback from Practitioners
Hidden Limitations in Algorithms
Conclusion
References
Replicating for Performance: Case Studies
Introduction
Replication Strategies
Replica Placement
Content Distribution
Strategy Evaluation
Replication Granularity
Example 1: Content Delivery Networks
Example 2: Edge-Server Computing
Example 3: Decentralized Wikipedia
Replicating for Performance versus Consistency
Replication Management
Conclusions
References
A History of the Virtual Synchrony Replication Model
Introduction
Distributed Consistency: Who Needs It?
Goals in This Chapter
Historical Context
Resilient Objects in Isis V1.0
Beyond Resilient Objects
The Isis Toolkit and the Virtual Synchrony Model
A Design Feature Motivated by Performance Considerations
Dynamic Membership
Local Reads and Fast Updates
Partitionable Views
Causally Ordered Multicast: cbcast
Time-Critical Applications
A Series of Commercial Successes, but Ultimately, a Market Failure
How Replication Was Used
Causal and Other Controversies
What Next? Live Objects and Quicksilver Scalable Multicast!
Closing Thoughts
References
From Viewstamped Replication to Byzantine Fault Tolerance
Introduction
Prehistory
Viewstamped Replication
Replica Groups
Architecture
Approach
The VR Protocol
Normal Operation
View Changes
Recovery
Discussion of VR
Differences from the Original
Two-Phase Commit
Optimizations
Performance in the Normal Case
Performance of View Changes
State Management
Non-deterministic Operations
Byzantine Fault Tolerance
Approach
The PBFT Protocol
View Changes
Discussion of PBFT
Cryptography
Optimizations
Selecting the Primary
Recovery
Non-determinism
Conclusions
References
Implementing Trustworthy Services Using Replicated State Machines
Introduction
The State-Machine Approach
Compromise and Proactive Recovery
Service Key Refresh and Scalability
Service Private Keys
Proactive Secret Sharing
Server Key Refresh
Trusted Hardware
Offline Keys
Attack Awareness
Processor Independence
Replica Coordination
Computing with Server Confidential Data
Discussion
References
State Machine Replication with Byzantine Faults
Introduction
Building Blocks
Broadcast Primitives
Distributed Cryptography
Byzantine Consensus
Atomic Broadcast Protocols
Consensus-Based Atomic Broadcast
Sequencer-Based Atomic Broadcast
Hybrid Atomic Broadcast
Service Replication
Replicating Cryptographic Services
Handling Responses Securely
Preserving Causality of Requests
Conclusion
References
Selected Results from the Latest Decade of Quorum Systems Research
Introduction
Quorum Systems for Byzantine Faults
Access Strategies and Load
Probabilistic Quorum Systems
Minimizing Delays of Quorum Accesses
Uses of Byzantine Quorums in Protocols
Read-Overwrite Protocols
State-Machine-Replication Protocols
Conclusion
References
From Object Replication to Database Replication
Introduction
Replication Model and Consistency
Generic Functional Model
Object and Database Consistency
From Object Replication to Database Replication: Multi-primary Passive Replication
Deferred Update Database Replication
Additional Definitions
Atomic Commit-Based Termination
Atomic Broadcast-Based Termination
Reordering-Based Termination
Generic Broadcast-Based Termination
Final Remarks
References
Database Replication: A Tutorial
Introduction
Why Replication
Organization of the Chapter
Basic Taxonomy for Replica Control Approaches
Eager Primary Copy
Eager Update Anywhere
Lazy Primary Copy
Lazy Update Anywhere
Eager vs. Lazy
Correctness Criteria
Atomicity and Consistency
Isolation
Session Consistency
Other Parameters
Message Management
Executing Writes
Concurrency Control Mechanisms
Architectural Alternatives
Cluster vs. WAN Replication
Degree of Replication
Recovery
Existing Systems
Early Work
Commercial Systems
Lazy Replication Made Serializable
Cluster Replication
Other Issues
Related Areas of Research
Conclusions
References
Practical Database Replication
Introduction
An Architecture for Practical Database Replication
Reflector: Replication-Friendly Database Support
Reflection for Replication
Processing Stages
Processing Contexts
Base-Level and Meta-level Calls
Exception Handling
Existing Reflector Bindings
GCS: Communication and Coordination Support
Architectural and Algorithmic Issues
Existing GCS Bindings
Replicator: Pluggable Replication Protocols
Consistent Database Replication
Replication with Conservative Execution
Replication with Optimistic Execution
Active Replication
Hybrid Replication
Evaluation
Conclusions
References
Index
Lecture Notes in Computer Science Commenced Publication in 1973 Founding and Former Series Editors: Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen 5959 Editorial Board David Hutchison Lancaster University, UK Takeo Kanade Carnegie Mellon University, Pittsburgh, PA, USA Josef Kittler University of Surrey, Guildford, UK Jon M. Kleinberg Cornell University, Ithaca, NY, USA Alfred Kobsa University of California, Irvine, CA, USA Friedemann Mattern ETH Zurich, Switzerland John C. Mitchell Stanford University, CA, USA Moni Naor Weizmann Institute of Science, Rehovot, Israel Oscar Nierstrasz University of Bern, Switzerland C. Pandu Rangan Indian Institute of Technology, Madras, India Bernhard Steffen TU Dortmund University, Germany Madhu Sudan Microsoft Research, Cambridge, MA, USA Demetri Terzopoulos University of California, Los Angeles, CA, USA Doug Tygar University of California, Berkeley, CA, USA Gerhard Weikum Max-Planck Institute of Computer Science, Saarbruecken, Germany
Bernadette Charron-Bost Fernando Pedone André Schiper (Eds.) Replication Theory and Practice 1 3
Volume Editors Bernadette Charron-Bost École Polytechnique, CNRS Laboratoire d’Informatique (LIX) 91128 Palaiseau CEDEX, France E-mail: charron@lix.polytechnique.fr Fernando Pedone Università della Svizzera italiana (USI) Facoltà di Scienze informatiche Via Giuseppe Buffi 6, 6900 Lugano, Switzerland E-mail: fernando.pedone@usi.ch André Schiper École Polytechnique Fédérale de Lausanne (EPFL) School of Computer and Communication Sciences Station 14, 1015 Lausanne, Switzerland E-mail: andre.schiper@epfl.ch Cover illustration: M.C. Escher’s “Symmetry Drawing E126” © 2010 The M.C. Escher Company–Holland. All rights reserved. www.mcescher.com Library of Congress Control Number: 2009941316 CR Subject Classification (1998): H.2-4, H.2.4, C.4, C.2.4, E.3, E.1 LNCS Sublibrary: SL 1 – Theoretical Computer Science and General Issues ISSN ISBN-10 ISBN-13 0302-9743 3-642-11293-5 Springer Berlin Heidelberg New York 978-3-642-11293-5 Springer Berlin Heidelberg New York This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable to prosecution under the German Copyright Law. springer.com © Springer-Verlag Berlin Heidelberg 2010 Printed in Germany Typesetting: Camera-ready by author, data conversion by Markus Richter, Heidelberg Printed on acid-free paper SPIN: 12823659 06/3180 5 4 3 2 1 0
Preface This book is the result of the seminar “A 30-Year Perspective on Replication,” which took place at Monte Verit`a, Ascona, Switzerland, in November 2007. As suggested by the title, the goal of the seminar was not to speculate about the future of repli- cation, but rather to understand the present, by analyzing past successes and past failures, and to make an assessment of about 30 years of research on replication. Replication is a topic addressed by several communities: the distributed computing community, the distributed system community, and the database community. Each of these communities has looked at replication from different points of view and with different goals, e.g., performance vs. fault tolerance. Recently, these different goals have started to converge, and there has been work showing that efficiency and strong consistency can sometimes be reconciled. During the seminar the observation was made that we had reached a point of understanding of the different issues of replication, and this knowledge should be materialized in a book covering the different aspects of replication. This book results from this observation. Its goal is to present a comprehensive view of the achieve- ments of 30 years of research on replication. The book was written by most of the people who have contributed to developing the state-of-the-art replication tech- niques. It brings a comprehensive view of existing solutions, from a theoretical as well as from a practical point of view. It covers replication of processes/objects and of databases; replication for fault tolerance and replication for performance; benign faults and malicious (Byzantine) faults. By covering these different issues in an in- tegrated way, we believe the book fills a gap, and as such it should find a place in the graduate teaching of distributed computing, distributed systems, and databases. The book is organized in thirteen chapters. Chapter 1 introduces consistency models for replicated data, both in the context of process/object and database repli- cation. Chapter 2 discusses replication techniques commonly used in process repli- cation, focusing on primary back-up and related techniques; it considers both the fail-stop and the crash failure models. Chapter 3 considers modular approaches to process replication; it starts with state-machine replication based on atomic broad- cast and shows how this can be built on top of consensus. Although the litera- ture on consensus is vast, there are many misunderstandings, often involving dif-
VI Preface ferent communities. Chapter 4 discusses these misunderstandings. Chapter 5 cov- ers replication for performance; it contains different strategies and examples, and discusses trade-offs. Chapters 6 and 7 provide a historical account of the Virtual Synchrony Replication Model and Viewstamped Replication, two early replication systems and how they have evolved over the years. Chapters 8 and 9 are dedicated to state-machine replication with Byzantine faults; the first considers distributed trust systems, and the second introduces protocols for state-machine replication. Chapter 10 surveys Byzantine quorum systems, suitable for use when parts of the system cannot be trusted. Chapters 11 through 13 consider database replication. Chapter 11 bridges the gap between process/object replication and database repli- cation, while Chap. 12 surveys database replication techniques; it discusses differ- ent replication approaches, consistency criteria for replicated databases and existing systems. Chapter 13 illustrates database replication with a case study: the details of an architecture for practical database replication. Each one of the chapters in the book is self-contained, and can be read individu- ally. Readers interested in certain specific aspects of replication, however, may pre- fer to focus on some of the chapters. Chapters 1 and 11 through 13 provide a detailed description of replication in the context of databases. Theoretical aspects of replica- tion under benign failures are discussed in Chapters 1, 3 and 4. Chapters 5, 12 and 13 cover many issues involving practical replication issues. Chapters 8 through 10 address replication under malign failures (i.e., Byzantine failures). Readers mostly interested in historical aspects of replication should read Chaps. 6 and 7. The Monte Verit`a seminar organizers are thankful to all the participants for ac- cepting to take part in this unique seminar, and to all authors for taking their time to produce this book. We would also like to thank a number of institutions for the fi- nancial support to the seminar: the Monte Verit`a Foundation, the Hasler Foundation, Microsoft, Eidgen¨ossische Technische Hochschule Z¨urich (ETHZ), ´Ecole Polytech- nique F´ed´erale de Lausanne (EPFL), Universit`a della Svizzera italiana (USI), and the ´Ecole polytechnique in Palaiseau. October 2009 Bernadette Charron-Bost Fernando Pedone Andr´e Schiper
List of Authors Marcos K. Aguilera Gustavo Alonso Ken Birman Christian Cachin Nuno Carvalho Alfrˆanio Correia Jr. Alan D. Fekete Rachid Guerraoui Ricardo Jim´enez-Peris Bettina Kemme Barbara Liskov Michael G. Merideth Rui Oliveira Marta Pati˜no-Mart´ınez Fernando Pedone Jos´e Pereira Guillaume Pierre Krithi Ramamritham Michael K. Reiter Lu´ıs Rodrigues Andr´e Schiper Fred B. Schneider Robbert van Renesse Maarten van Steen Lidong Zhou Microsoft Research Silicon Valley, USA ETHZ, Switzerland Cornell University, USA IBM Research - Zurich, Switzerland Instituto Superior T´ecnico/INESC-ID, Portugal Universidade do Minho, Portugal University of Sydney, Australia EPFL, Switzerland Universidad Polit´ecnica de Madrid, Spain McGill University, Canada MIT, USA Carnegie Mellon University, USA Universidade do Minho, Portugal Universidad Polit´ecnica de Madrid, Spain University of Lugano, Switzerland Universidade do Minho, Portugal VU University Amsterdam, The Netherlands I.I.T. Bombay, India University of North Carolina, USA Instituto Superior T´ecnico/INESC-ID, Portugal EPFL, Switzerland Cornell University, USA Cornell University, USA VU University Amsterdam, The Netherlands Microsoft Research Asia, China
Contents 1 Consistency Models for Replicated Data . . . . . . . . . . . . . . . . . . . . . . . . . 1 Alan D. Fekete and Krithi Ramamritham 1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.2 Defining the Sequential Data Type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3 Strong Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1.3.1 Relaxing Inter-Client Operation Ordering . . . . . . . . . . . . . . . . . . 1 2 2 4 7 1.4 Weak Consistency . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.5 Transactions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 1.6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 1.7 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17 2 Replication Techniques for Availability . . . . . . . . . . . . . . . . . . . . . . . . . . 19 Robbert van Renesse and Rachid Guerraoui 2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 2.2 Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20 2.2.1 Environment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.2.2 Specification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21 2.3 Fail-Stop Failure Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22 2.3.1 Primary-Backup . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 2.3.2 Chain Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26 2.3.3 Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 2.4 Crash Failure Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 2.4.1 Quorums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.4.2 Stake Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31 2.4.3 Broker Replication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 2.5 Recovery and Reconfiguration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 2.6 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
分享到:
收藏