Fault-Tolerant Message-Passing Distributed Systems. An Algorithm....pdf

发布时间：2022-05-29 发布人：admin 分类：说明书资料大小：3.97M 资料格式：pdf 举报版权申诉

ystweid-10821067-4744302543293029281.pdf-第1页.png

第1页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第2页.png

第2页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第3页.png

第3页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第4页.png

第4页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第5页.png

第5页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第6页.png

第6页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第7页.png

第7页 / 共465页

ystweid-10821067-4744302543293029281.pdf-第8页.png

第8页 / 共465页

Preface

Contents

Notation

Figures & Algorithms

Tables

1 Definitions & Introductory Examples

1.1 A Few Definitions Related to Distributed Computing

1.2 Example 1: Common Decision Despite Message Losses

1.2.1 The Problem

1.2.2 Trying to Solve the Problem: Attempt 1

1.2.3 Trying to Solve the Problem: Attempt 2

1.2.4 An Impossibility Result

1.2.5 A Coordination Problem

1.3 Example 2: Computing a Global Function Despite a Message Adversary

1.3.1 The Problem

1.3.2 The Notion of a Message Adversary

1.3.3 The TREE-AD Message Adversary

1.3.4 From Message Adversary to Process Mobility

1.4 Main Distributed Computing Models Used in This Book

1.5 Distributed Computing Versus Parallel Computing

1.6 Summary

1.7 Bibliographic Notes

1.8 Exercises and Problems

--- Reliable Broadcast Communication Abstraction

2 In Presence of Process Crash Failures

2.1 Uniform Reliable Broadcast

2.1.1 From Best Effort to Guaranteed Reliability

2.1.2 Uniform Reliable Broadcast (URB-broadcast)

2.1.3 Building the URB-broadcast Abstraction in CAMPn,t[∅]

2.2 Adding Quality of Service

2.2.1 “First In, First Out” (FIFO) Message Delivery

2.2.2 “Causal Order” (CO) Message Delivery

2.2.3 From FIFO-broadcast to CO-broadcast

2.2.4 From URB-broadcast to CO-broadcast: Capturing Causal Past in a Vector

2.2.5 The Total Order Broadcast Abstraction Requires More

2.3 Summary

2.4 Bibliographic Notes

2.5 Exercises and Problems

3 In Presence of Process Crashes & Unreliable Channels

3.1 A System Model with Unreliable Channels

3.1.1 Fairness Notions for Channels

3.1.2 Fair Channel (FC) and Fair Lossy Channel

3.1.3 Reliable Channel in the Presence of Process Crashes

3.1.4 System Model

3.2 URB-broadcast in CAMPn,t[- FC]

3.2.1 URB-broadcast in CAMPn,t[- FC, t < n/2]

3.2.2 An Impossibility Result

3.3 Failure Detectors: an Approach to Circumvent Impossibilities

3.3.1 The Concept of a Failure Detector

3.3.2 Formal Definitions

3.4 URB-broadcast in CAMPn,t[- FC] Enriched with a Failure Detector

3.4.1 Definition of the Failure Detector Class Θ

3.4.2 Solving URB-broadcast in CAMPn,t[- FC, Θ]

3.4.3 Building a Failure Detector Θ in CAMPn,t[- FC, t < n/2]

3.4.4 The Fundamental Added Value Supplied by a Failure Detector

3.5 Quiescent Uniform Reliable Broadcast

3.5.1 The Quiescence Property

3.5.2 Quiescent URB-broadcast Based on a Perfect Failure Detector

3.5.3 The Class HB of Heartbeat Failure Detectors

3.5.4 Quiescent URB-broadcast in CAMPn,t[- FC, Θ,HB]

3.6 Summary

3.7 Bibliographic Notes

3.8 Exercises and Problems

4 In Presence of Byzantine Processes

4.1 Byzantine Processes and Properties of the Model BAMPn,t[t < n/3]

4.2 The No-Duplicity Broadcast Abstraction

4.2.1 Definition

4.2.2 An Impossibility Result

4.2.3 A No-Duplicity Broadcast Algorithm

4.3 The Byzantine Reliable Broadcast Abstraction

4.4 An Optimal Byzantine Reliable Broadcast Algorithm

4.4.1 A Byzantine Reliable Broadcast Algorithm for BAMPn,t[t < n/3]

4.4.2 Correctness Proof

4.4.3 Benefiting from Message Asynchrony

4.5 Time and Message-Efficient Byzantine Reliable Broadcast

4.5.1 A Message-Efficient Byzantine Reliable Broadcast Algorithm

4.5.2 Correctness Proof

4.6 Summary

4.7 Bibliographic Notes

4.8 Exercises and Problems

--- R/W Register Communication Abstraction

5 R/W Register Abstraction

5.1 The Read/Write Register Abstraction

5.1.1 Concurrent Objects and Registers

5.1.2 The Notion of a Regular Register

5.1.3 Registers Defined from a Sequential Specification

5.2 A Formal Approach to Atomicity and Sequential Consistency

5.2.1 Processes, Operations, and Events

5.2.2 Histories

5.2.3 A Formal Definition of Atomicity

5.2.4 A Formal Definition of Sequential Consistency

5.3 Composability of Consistency Conditions

5.3.1 What Is Composability?

5.3.2 Atomicity Is Composable

5.3.3 Sequential Consistency Is Not Composable

5.4 Bounds on the Implementation of Strong Consistency Conditions

5.4.1 Upper Bound on t for Atomicity

5.4.2 Upper Bound on t for Sequential Consistency

5.4.3 Lower Bounds on the Durations of Read andWrite Operations

5.5 Summary

5.6 Bibliographic Notes

5.7 Exercises and Problems

6 R/W Registers despite Asynchrony & t < n/2

6.1 A Structural View

6.2 Building an SWMR Regular Read/Write Register in CAMPn,t[t < n/2]

6.2.1 Problem Specification

6.2.2 Implementing an SWMR Regular Register in CAMPn,t[t < n/2]

6.2.3 Proof of the SWMR Regular Register Construction

6.3 From an SWMR Regular Register to an SWMR Atomic Register

6.3.1 Why the Previous Algorithm Does Not Ensure Atomicity

6.3.2 From Regularity to Atomicity

6.4 From SWMR Atomic Register to MWMR Atomic Register

6.4.1 Replacing Sequence Numbers by Timestamps

6.4.2 Construction of an MWMR Atomic Register

6.4.3 Proof of the MWMR Atomic Register Construction

6.5 Implementing Sequentially Consistent Registers

6.5.1 How to Address the Non-composability of Sequential Consistency

6.5.2 Algorithms Based on a Total Order Broadcast Abstraction

6.5.3 A TO-broadcast-based Algorithm with Local (Fast) Read Operations

6.5.4 A TO-broadcast-based Algorithm with Local (Fast) Write Operations

6.5.5 An Algorithm Based on Logical Time

6.5.6 Proof of the Logical Time-based Algorithm

6.6 Summary

6.7 Bibliographic Notes

6.8 Exercises and Problems

7 Circumventing t < n/2 R/W Register Impossibility - Failure Detector Approach

7.1 The Class Σ of Quorum Failure Detectors

7.1.1 Definition of the Class of Quorum Failure Detectors

7.1.2 Implementing a Failure Detector Σ When t < n/2

7.1.3 A Σ-based Construction of an SWSR Atomic Register

7.2 Σ Is theWeakest Failure Detector to Build an Atomic Register

7.2.1 What Does “Weakest Failure Detector Class” Mean

7.2.2 The Extraction Algorithm

7.2.3 Correctness of the Extraction Algorithm

7.3 Comparing the Failure Detectors Classes Θ and Σ

7.4 Atomic Register Abstraction vs URB-broadcast Abstraction

7.4.1 From Atomic Registers to URB-broadcast

7.4.2 Atomic Registers Are Strictly Stronger than URB-broadcast

7.5 Summary

7.6 Bibliographic Notes

7.7 Exercise and Problem

8 Broadcast Abstraction suited to Family of R/W Implementable Objects

8.1 The SCD-broadcast Communication Abstraction

8.1.1 Definition

8.1.2 Implementing SCD-broadcast in CAMPn,t[t < n/2]

8.1.3 Cost and Proof of the Algorithm

8.1.4 An SCD-broadcast-based Communication Pattern

8.2 From SCD-broadcast to an MWMR Register

8.2.1 Building an MWMR Atomic Register in CAMPn,t[SCD-broadcast]

8.2.2 Cost and Proof of Correctness

8.2.3 From Atomicity to Sequential Consistency

8.2.4 From MWMR Registers to an Atomic Snapshot Object

8.3 From SCD-broadcast to an Atomic Counter

8.3.1 Counter Object

8.3.2 Implementation of an Atomic Counter Object

8.3.3 Implementation of a Sequentially Consistent Counter Object

8.4 From SCD-broadcast to Lattice Agreement

8.4.1 The Lattice Agreement Task

8.4.2 Lattice Agreement from SCD-broadcast

8.5 From SWMR Atomic Registers to SCD-broadcast

8.5.1 From Snapshot to SCD-broadcast

8.5.2 Proof of the Algorithm

8.6 Summary

8.7 Bibliographic Notes

8.8 Exercises and Problems

9 Atomic R/W Registers in Presence of Byzantine Processes

9.1 Atomic Read/Write Registers in the Presence of Byzantine Processes

9.1.1 Why SWMR (and Not MWMR) Atomic Registers?

9.1.2 Reminder on Possible Behaviors of a Byzantine Process

9.1.3 SWMR Atomic Registers Despite Byzantine Processes: Definition

9.2 An Impossibility Result

9.3 Reminder on Byzantine Reliable Broadcast

9.3.1 Specification of Multi-shot Reliable Broadcast

9.3.2 An Algorithm for Multi-shot Byzantine Reliable Broadcast

9.4 Construction of SWMR Atomic Registers in BAMPn,t[t < n/3]

9.4.1 Description of the Algorithm

9.4.2 Comparison with the Crash Failure Model

9.5 Proof of the Algorithm

9.5.1 Preliminary Lemmas

9.5.2 Proof of the Termination Properties

9.5.3 Proof of the Consistency (Atomicity) Properties

9.5.4 Piecing Together the Lemmas

9.6 Building Objects on Top of SWMR Byzantine Registers

9.6.1 One-shot Write-snapshot Object

9.6.2 Correct-only Agreement Object

9.7 Summary

9.8 Bibliographic Notes

9.9 Exercises and Problems

--- Agreement in Sync Systems

10 Consensus & Interactive Consistency in Sync Systems prone to Process Crash Failures

10.1 Consensus in the Crash Failure Model

10.1.1 Definition

10.1.2 A Simple (Unfair) Consensus Algorithm

10.1.3 A Simple (Fair) Consensus Algorithm

10.2 Interactive Consistency (Vector Consensus)

10.2.1 Definition

10.2.2 A Simple Example of Use: Build Atomic Rounds

10.2.3 An Interactive Consistency Algorithm

10.2.4 Proof of the Algorithm

10.2.5 A Convergence Point of View

10.3 Lower Bound on the Number of Rounds

10.3.1 Preliminary Assumptions and Definitions

10.3.2 The (t + 1) Lower Bound

10.3.3 Proof of the Lemmas

10.4 Summary

10.5 Bibliographic Notes

10.6 Exercises and Problems

11 Expediting Decision in Sync Systems prone to Process Crash Failures

11.1 Early Deciding and Stopping Interactive Consistency

11.1.1 Early Deciding vs Early Stopping

11.1.2 An Early Decision Predicate

11.1.3 An Early Deciding and Stopping Algorithm

11.1.4 Correctness Proof

11.1.5 On Early Decision Predicates

11.1.6 Early Deciding and Stopping Consensus

11.2 An Unbeatable Binary Consensus Algorithm

11.2.1 A Knowledge-Based Unbeatable Predicate

11.2.2 PREF0() with Respect to DIFF()

11.2.3 An Algorithm Based on the Predicate PREF0(): CGM

11.2.4 On the Unbeatability of the Predicate PREF0()

11.3 The Synchronous Condition-based Approach

11.3.1 The Condition-based Approach in Synchronous Systems

11.3.2 Legality and Maximality of a Condition

11.3.3 Hierarchy of Legal Conditions

11.3.4 Local View of an Input Vector

11.3.5 A Synchronous Condition-based Consensus Algorithm

11.3.6 Proof of the Algorithm

11.4 Using a Global Clock and a Fast Failure Detector

11.4.1 Fast Perfect Failure Detectors

11.4.2 Enriching the Synchronous Model to Benefit from a Fast Failure Detector

11.4.3 A Simple Consensus Algorithm Based on a Fast Failure Detector

11.4.4 An Early Deciding and Stopping Algorithm

11.5 Summary

11.6 Bibliographic Notes

11.7 Exercises and Problems

12 Consensus Variants - Simultaneous Consensus & k-Set Agreement

12.1 Simultaneous Consensus: Definition and Its Difficulty

12.1.1 Definition of Simultaneous Consensus

12.1.2 Difficulty Early Deciding Before (t + 1) Rounds

12.1.3 Failure Pattern, Failure Discovery, and Waste

12.1.4 A Clean Round and the Horizon of a Round

12.2 An Optimal Simultaneous Consensus Algorithm

12.2.1 An Optimal Algorithm

12.2.2 Proof of the Algorithm

12.3 The k-Set Agreement Abstraction

12.3.1 Definition

12.3.2 A Simple Algorithm

12.4 Early Deciding and Stopping k-Set Agreement

12.4.1 An Early Deciding and Stopping Algorithm

12.4.2 Proof of the Algorithm

12.5 Summary

12.6 Bibliographic Notes

12.7 Exercises and Problems

13 Non-Blocking Atomic Commitment in Presence of Process Crash Failures

13.1 The Non-blocking Atomic Commitment (NBAC) Abstraction

13.1.1 Definition of Non-blocking Atomic Commitment

13.1.2 A Simple Non-blocking Atomic Commitment Algorithm

13.2 Fast Commit and Fast Abort

13.2.1 Looking for Efficient Algorithms

13.2.2 An Impossibility Result

13.3 Weak Fast Commit and Weak Fast Abort

13.4 Fast Commit and Weak Fast Abort Are Compatible

13.4.1 A Fast Commit and Weak Fast Abort Algorithm

13.4.2 Proof of the Algorithm

13.5 Other Non-blocking Atomic Commitment Algorithms

13.5.1 Fast Abort andWeak Fast Commit

13.5.2 The Case t ≤ 2 (System Model CSMPn,t[1 ≤ t < 3 ≤ n])

13.6 Summary

13.7 Bibliographic Notes

13.8 Exercises and Problems

14 Consensus in Sync Systems prone to Byzantine Process Failures

14.1 Agreement Despite Byzantine Processes

14.1.1 On the Agreement and Validity Properties

14.1.2 A Consensus Definition for the Byzantine Failure Model

14.1.3 An Interactive Consistency Definition for the Byzantine Failure Model

14.1.4 The Byzantine General Agreement Abstraction

14.2 Interactive Consistency for Four Processes Despite One Byzantine Process

14.2.1 An Algorithm for n = 4 and t = 1

14.2.2 Proof of the Algorithm

14.3 An Upper Bound on the Number of Byzantine Processes

14.4 A Byzantine Consensus Algorithm for BSMPn,t[t < n/3]

14.4.1 Base Data Structure: a Tree

14.4.2 EIG Algorithm

14.4.3 Example of an Execution

14.4.4 Proof of the EIG Algorithm

14.5 A Simple Consensus Algorithm with Constant Message Size

14.5.1 Features of the Algorithm

14.5.2 Presentation of the Algorithm

14.5.3 Proof and Properties of the Algorithm

14.6 From Binary to Multivalued Byzantine Consensus

14.6.1 Motivation

14.6.2 A Reduction Algorithm

14.6.3 Proof of the Multivalued to Binary Reduction

14.6.4 An Interesting Property of the Construction

14.7 Enriching the Synchronous Model with Message Authentication

14.7.1 Synchronous Model with Signed Messages

14.7.2 The Gain Obtained from Signatures

14.7.3 A Synchronous Signature-Based Consensus Algorithm

14.7.4 Proof of the Algorithm

14.8 Summary

14.9 Bibliographic Notes

14.10 Exercises and Problems

--- Agreement in Async Systems

15 Implementable Agreement Abstractions despite Asynchrony & Minority of Process Crashes

15.1 The Renaming Agreement Abstraction

15.1.1 Definition

15.1.2 A Fundamental Result

15.1.3 The Stacking Approach

15.1.4 A Snapshot-based Implementation of Renaming

15.1.5 Proof of the Algorithm

15.2 The Approximate Agreement Abstraction

15.2.1 Definition

15.2.2 A Read/Write-based Implementation of Approximate Agreement

15.2.3 Proof of the Algorithm

15.3 The Safe Agreement Abstraction

15.3.1 Definition

15.3.2 A Direct Implementation of Safe Agreement in CAMPn,t[t < n/2]

15.3.3 Proof of the Algorithm

15.4 Summary

15.5 Bibliographic Notes

15.6 Exercises and Problems

16 Consensus - Power & Implementability Limit in Crash-prone Async Systems

16.1 The Total Order Broadcast Communication Abstraction

16.1.1 Total Order Broadcast: Definition

16.1.2 A Map of Communication Abstractions

16.2 From Consensus to TO-broadcast

16.2.1 Structure of the Construction

16.2.2 Description of the Algorithm

16.2.3 Proof of the Algorithm

16.3 Consensus and TO-broadcast Are Equivalent

16.4 The State Machine Approach

16.4.1 State Machine Replication

16.4.2 Sequentially-Defined Abstractions (Objects)

16.5 A Simple Consensus-based Universal Construction

16.6 Agreement vs Mutual Exclusion

16.7 Ledger Object

16.7.1 Definition

16.7.2 Implementation of a Ledger in CAMPn,t[TO-broadcast]

16.8 Consensus Impossibility in the Presence of Crashes and Asynchrony

16.8.1 The Intuition That Underlies the Impossibility

16.8.2 Refining the Definition of CAMPn,t[∅]

16.8.3 Notion of Valence of a Global State

16.8.4 Consensus Is Impossible in CAMPn,1[∅]

16.9 The Frontier Between Read/Write Registers and Consensus

16.9.1 The Main Question

16.9.2 The Notion of Consensus Number in Read/Write Systems

16.9.3 An Illustration of Herlihy’s Hierarchy

16.9.4 The Consensus Number of a Ledger

16.10 Summary

16.11 Bibliographic Notes

16.12 Exercises and Problems

17 Implementing Consensus in Enriched Crash-prone Async Systems

17.1 Enriching an Asynchronous System to Implement Consensus

17.2 A Message Scheduling Assumption

17.2.1 Message Scheduling (MS) Assumption

17.2.2 A Binary Consensus Algorithm

17.2.3 Proof of the Algorithm

17.2.4 Additional Properties

17.3 Enriching CAMPn,t[Ø] with a Perpetual Failure Detector

17.3.1 Enriching CAMPn,t[∅] with a Perfect Failure Detector

17.4 Enriching CAMPn,t[t < n/2] with an Eventual Leader

17.4.1 TheWeakest Failure Detector to Implement Consensus

17.4.2 Implementing Consensus in CAMPn,t[t < n/2, Ω]

17.4.3 Proof of the Algorithm

17.4.4 Consensus Versus Eventual Leader Failure Detector

17.4.5 Notions of Indulgence and Zero-degradation

17.4.6 Saving Broadcast Instances

17.5 Enriching CAMPn,t[t < n/2] with Randomization

17.5.1 Asynchronous Randomized Models

17.5.2 Randomized Consensus

17.5.3 Randomized Binary Consensus in CAMPn,t[t < n/2, LC]

17.5.4 Randomized Binary Consensus in CAMPn,t[t < n/2, CC]

17.6 Enriching CAMPn,t[t < n/2] with a Hybrid Approach

17.6.1 The Hybrid Approach: Failure Detector and Randomization

17.6.2 A Hybrid Binary Consensus Algorithm

17.7 A Paxos-inspired Consensus Algorithm

17.7.1 The Alpha Communication Abstraction

17.7.2 Consensus Algorithm

17.7.3 An Implementation of Alpha in CAMPn,t[t < n/2]

17.8 From Binary to Multivalued Consensus

17.8.1 A Reduction Algorithm

17.8.2 Proof of the Reduction Algorithm

17.9 Consensus in One Communication Step

17.9.1 Aim and Model Assumption on t

17.9.2 A One Communication Step Algorithm

17.9.3 Proof of the Early Deciding Algorithm

17.10 Summary

17.11 Bibliographic Notes

17.12 Exercises and Problems

18 Implementing Oracles in Async Systems prone to Process Crash Failures

18.1 The Two Facets of Failure Detectors

18.1.1 The Programming Point of View: Modular Building Block

18.1.2 The Computability Point of View: Abstraction Ranking

18.2 Ω in CAMPn,t[∅]: a Direct Impossibility Proof

18.3 Constructing a Perfect Failure Detector (Class P)

18.3.1 Reminder: Definition of the Class P of Perfect Failure Detectors

18.3.2 Use of an Underlying Synchronous System

18.3.3 Applications Generating a Fair Communication Pattern

18.3.4 The Theta Assumption

18.4 Constructing an Eventually Perfect Failure Detector (Class P)

18.4.1 Reminder: Definition of an Eventually Perfect Failure Detector

18.4.2 From Perpetual to Eventual Properties

18.4.3 Eventually Synchronous Systems

18.5 On the Efficient Monitoring of a Process by Another Process

18.5.1 Motivation and System Model

18.5.2 A Monitoring Algorithm

18.6 An Adaptive Monitoring-based Algorithm Building P

18.6.1 Motivation and Model

18.6.2 A Monitoring-Based Adaptive Algorithm for the Failure Detector Class P

18.6.3 Proof the Algorithm

18.7 From the t-Source Assumption to an Ω Eventual Leader

18.7.1 The t-Source Assumption and the Model CAMPn,t[t-SOURCE]

18.7.2 Electing an Eventual Leader in CAMPn,t[t-SOURCE]

18.7.3 Proof of the Algorithm

18.8 Electing an Eventual Leader in CAMPn,t[ t-MS PAT]

18.8.1 A Query/Response Pattern

18.8.2 Electing an Eventual Leader in CAMPn,t[t-MS PAT]

18.8.3 Proof of the Algorithm

18.9 Building Ω in a Hybrid Model

18.10 Construction of a Biased Common Coin from Local Coins

18.10.1 Definition of a Biased Common Coin

18.10.2 The CORE Communication Abstraction

18.10.3 Construction of a Common Coin with a Constant Bias

18.10.4 On the Use of a Biased Common Coin

18.11 Summary

18.12 Bibliographic notes

18.13 Exercises and Problems

19 Implementing Consensus in Enriched Byzantine Async Systems

19.1 Definition Reminder and Two Observations

19.1.1 Definition of Byzantine Consensus (Reminder)

19.1.2 Why Not to Use an Eventual Leader

19.1.3 On the Weakest Synchrony Assumption for Byzantine Consensus

19.2 Binary Byzantine Consensus from a Message Scheduling Assumption

19.2.1 A Message Scheduling Assumption

19.2.2 A Binary Byzantine Consensus Algorithm

19.2.3 Proof of the Algorithm

19.2.4 Additional Properties

19.3 An Optimal Randomized Binary Byzantine Consensus Algorithm

19.3.1 The Binary-Value Broadcast Abstraction

19.3.2 A Binary Randomized Consensus Algorithm

19.3.3 Proof of the BV-Based Binary Byzantine Consensus Algorithm

19.3.4 From Decision to Decision and Termination

19.4 From Binary to Multivalued Byzantine Consensus

19.4.1 A Reduction Algorithm

19.4.2 Proof of the Reduction Algorithm

19.5 From Binary to No-intrusion Multivalued Byzantine Consensus

19.5.1 The Validated Byzantine Broadcast Abstraction

19.5.2 An Algorithm Implementing VBB-broadcast

19.5.3 Proof of the VBB-broadcast Algorithm

19.5.4 A VBB-Based Multivalued to Binary Byzantine Consensus Reduction

19.5.5 Proof of the VBB-Based Reduction Algorithm

19.6 Summary

19.7 Appendix: Proof-of-Work (PoW) Seen as Eventual Byzantine Agreement

19.8 Bibliographic Notes

19.9 Exercises and Problems

20 Quorum, Signatures & Overlays

20.1 Quorum Systems

20.1.1 Definitions

20.1.2 Examples of Use of a Quorum System

20.1.3 A Few Classical Quorums

20.1.4 Quorum Composition

20.2 Digital Signatures

20.2.1 Cipher, Keys, and Signatures

20.2.2 How to Build a Secret Key: Diffie-Hellman’s Algorithm

20.2.3 How to Build a Public Key: Rivest-Shamir-Adleman’s (RSA) Algorithm

20.2.4 How to Share a Secret: Shamir’s Algorithm

20.3 Overlay Networks

20.3.1 On Regular Graphs

20.3.2 Hypercube

20.3.3 de Bruijn Graphs

20.3.4 Kautz Graphs

20.3.5 Undirected de Bruijn and Kautz Graphs

20.4 Bibliographic Notes

Afterword

Biblio

Index

Michel Raynal Fault-Tolerant Message-Passing Distributed Systems An Algorithmic Approach

Michel Raynal IRISA-ISTIC Université de Rennes 1 Institut Universitaire de France Rennes, France Parts of this work are based on the books “Fault-Tolerant Agreement in Synchronous Message- Passing Systems” and “Communication and Agreement Abstractions for Fault-Tolerant Asynchro- nous Distributed Systems”, author Michel Raynal, © 2010 Morgan & Claypool Publishers (www. morganclaypool.com). Used with permission. ISBN 978-3-319-94140-0 https://doi.org/10.1007/978-3-319-94141-7 Library of Congress Control Number: 2018953101 © Springer Nature Switzerland AG 2018 ISBN 978-3-319-94141-7 (eBook)

Preface La recherche du temps perdu passait par leWeb. [...] La m´emoire ´etait devenue in´epuisable, mais la profondeur du temps [...] avait disparu. On ´etait dans un pr´esent inﬁni. In Les ann´ees (2008), Annie Ernaux (1940) Sed nos immensum spatiis confecimus aequor, Et iam tempus equum fumentia solvere colla.1 In Georgica, Liber II, 541-542, Publius Virgilius (70 BC–19 BC) Je suis arriv´e au jour o`u je ne me souviens plus quand j’ai cess´e d’ˆetre immortel. In Livro de Cr´onicas, Ant´onio Lobo Antunes (1942) C’est une chose ´etrange `a la ﬁn que le monde Un jour je m’en irai sans en avoir tout dit. In Les yeux et la m´emoire (1954), chant II, Louis Aragon (1897–1982) Tout garder, c’est tout d´etruire. Jacques Derrida (1930–2004) 1French: Mais j’ai d´ej`a fourni une vaste carri`ere, il est temps de d´eteler les chevaux tout fumants. English: But now I have traveled a very long way, and the time has come to unyoke my steaming horses.

What is distributed computing? Distributed computing was born in the late 1970s when researchers and practitioners started taking into account the intrinsic characteristic of physically distributed sys- tems. The ﬁeld then emerged as a specialized research area distinct from networking, operating sys- tems, and parallel computing. Distributed computing arises when one has to solve a problem in terms of distributed entities (usually called processors, nodes, processes, actors, agents, sensors, peers, etc.) such that each entity has only a partial knowledge of the many parameters involved in the problem that has to be solved. While parallel computing and real-time computing can be characterized, respectively, by the terms efﬁciency and on-time computing, distributed computing can be characterized by the term uncertainty. This uncertainty is created by asynchrony, multiplicity of control ﬂows, absence of shared memory and global time, failure, dynamicity, mobility, etc. Mastering one form or another of uncertainty is pervasive in all distributed computing problems. A main difﬁculty in designing distributed algorithms comes from the fact that no entity cooperating in the achievement of a common goal can have an instantaneous knowledge of the current state of the other entities, it can only know their past local states. Although distributed algorithms are often made up of a few lines, their behavior can be difﬁcult to understand and their properties hard to state and prove. Hence, distributed computing is not only a fundamental topic but also a challenging topic where simplicity, elegance, and beauty are ﬁrst-class citizens. Why this book? In the book “Distributed algorithms for message-passing systems” (Springer, 2013), I addressed distributed computing in failure-free message-passing systems, where the computing enti- ties (processes) have to cooperate in the presence of asynchrony. Differently, in my book “Concurrent programming: algorithms, principles and foundations” (Springer, 2013), I addressed distributed com- puting where the computing entities (processes) communicate through a read/write shared memory (e.g., multicore), and the main adversary lies in the net effect of asynchrony and process crashes (unexpected deﬁnitive stops). The present book considers synchronous and asynchronous message-passing systems, where pro- cesses can commit crash failures, or Byzantine failures (arbitrary behavior). Its aim is to present in a comprehensive way basic notions, concepts and algorithms in the context of these systems. The main difﬁculty comes from the uncertainty created by the adversaries managing the environment (mainly asynchrony and failures), which, by its very nature, is not under the control of the system. A quick look at the content of the book The book is composed of four parts, the ﬁrst two are on communication abstractions, the other two on agreement abstractions. Those are the most important abstractions distributed applications rely on in asynchronous and synchronous message-passing sys- tems where processes may crash, or commit Byzantine failures. The book addresses what can be done and what cannot be done in the presence of such adversaries. It consequently presents both impossi- bility results and distributed algorithms. All impossibility results are proved, and all algorithms are described in a simple algorithmic notation and proved correct. • Parts on communication abstractions. – Part I is on the reliable broadcast abstraction.

– Part II is on the construction of read/write registers. • Parts on agreement. – Part III is on agreement in synchronous systems. – Part IV is on agreement in asynchronous systems. On the presentation style When known, the names of the authors of a theorem, or of an algorithm, are indicated together with the date of the associated publication. Moreover, each chapter has a bib- liographical section, where a short historical perspective and references related to that chapter are given. Each chapter terminates with a few exercises and problems, whose solutions can be found in the article cited at the end of the corresponding exercise/problem. From a vocabulary point of view, the following terms are used: an object implements an abstrac- tion, deﬁned by a set of properties, which allows a problem to be solved. Moreover, each algorithm is ﬁrst presented intuitively with words, and then proved correct. Understanding an algorithm is a two-step process: • First have a good intuition of its underlying principles, and its possible behaviors. This is nec- • Then prove the algorithm is correct in the model it was designed for. The proof consists in a essary, but remains informal. logical reasoning, based on the properties provided by (i) the underlying model, and (ii) the statements (code) of the algorithm. More precisely, each property deﬁning the abstraction the algorithm is assumed to implement must be satisﬁed in all its executions. Only when these two steps have been done, can we say that we understand the algorithm. Audience This book has been written primarily for people who are not familiar with the topic and the concepts that are presented. These include mainly: • Senior-level undergraduate students and graduate students in informatics or computing engineer- ing, who are interested in the principles and algorithmic foundations of fault-tolerant distributed computing. • Practitioners and engineers who want to be aware of the state-of-the-art concepts, basic princi- ples, mechanisms, and techniques encountered in fault-tolerant distributed computing. Prerequisites for this book include undergraduate courses on algorithms, basic knowledge on operat- ing systems, and notions on concurrency in failure-free distributed computing. One-semester courses, based on this book, are suggested in the section titled “How to Use This Book” in the Afterword. Origin of the book and acknowledgments This book has two complementary origins: • The ﬁrst is a set of lectures for undergraduate and graduate courses on distributed computing I gave at the University of Rennes (France), the Hong Kong Polytechnic University, and, as an invited professor, at several universities all over the world. Hence, I want to thank the numerous students for their questions that, in one way or another, contributed to this book. • The second is the two monographs I wrote in 2010, on fault-tolerant distributed computing, titled “Communication and agreement abstractions for fault-tolerant asynchronous distributed

systems”, and “Fault-tolerant agreement in synchronous distributed systems”. Parts of them appear in this book, after having been revised, corrected, and improved. Hence, I want to thank Morgan & Claypool, and more particularly Diane Cerra, for their per- mission to reuse parts of this work. I also want to thank my colleagues (in no particular order) A. Most´efaoui, D. Imbs, S. Rajsbaum, V. Gramoli, C. Delporte, H. Fauconnier, F. Ta¨ıani, M. Perrin, A. Casta˜neda, M. Larrea, and Z. Bouzid, with whom I collaborated in the recent past years. I also thank the Polytechnic University of Hong Kong (PolyU), and more particularly Professor Jiannong Cao, for hosting me while I was writing parts of this book. My thanks also to Ronan Nugent (Springer) for his support and his help in putting it all together. Last but not least (and maybe most importantly), I thank all the researchers whose results are pre- sented in this book. Without their work, this book would not exist. (Finally, since I typeset the entire text myself – LATEX2 for the text and xﬁg for ﬁgures – any typesetting or technical errors that remain are my responsibility.) Professor Michel Raynal Academia Europaea Institut Universitaire de France Professor IRISA-ISTIC, Universit´e de Rennes 1, France Chair Professor, Hong Kong Polytechnic University June–December 2017 Rennes, Saint-Gr´egoire, Douelle, Saint-Philibert, Hong Kong, Vienna (DISC’17), Washington D.C. (PODC’17), Mexico City (UNAM)

Contents I Introductory Chapter 1 . . . . . . . . . . . . . 1 A Few Deﬁnitions and Two Introductory Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 3 7 7 9 9 . . . . . . . . . . . . . . . . . . . . . 10 . . . . . . . . . . . . . . . . . . . . . 11 1.1 A Few Deﬁnitions Related to Distributed Computing . . . 1.2 Example 1: Common Decision Despite Message Losses . 1.2.1 The Problem . 1.2.2 Trying to Solve the Problem: Attempt 1 . . . . . . . . . . 1.2.3 Trying to Solve the Problem: Attempt 2 . . . . . . . . . . . . . . . . . 1.2.4 An Impossibility Result . 1.2.5 A Coordination Problem . Example 2: Computing a Global Function Despite a Message Adversary . . . . . . . . . . 1.3.1 The Problem . . 1.3.2 The Notion of a Message Adversary . 1.3.3 The TREE-AD Message Adversary . 1.3.4 . . . . . . . . . . . . . . . . . . . . . . From Message Adversary to Process Mobility . 1.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 . . . . . . . . . . . . . . . 11 . . . . . . . . . . . . . 12 . . . . . . . . . . . . . 13 . . . . . . . . . . . . . . . . . 15 . . . . . . . . . . . . . . . . 16 . . . . . . . . . . . . . . . . . . . 17 . . . . . . . . . . . . . . . . . . . 18 . . . . . . . . . . . . . . . . . . . 18 . . . . . . . . . . . . . . . . . . 19 1.4 Main Distributed Computing Models Used in This Book . 1.5 Distributed Computing Versus Parallel Computing . . 1.6 . 1.7 Bibliographic Notes . . 1.8 . . Exercises and Problems . . . . . . Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . II The Reliable Broadcast Communication Abstraction 21 2 Reliable Broadcast in the Presence of Process Crash Failures 2.1 Uniform Reliable Broadcast . . . . . . . . . . . . . . . . . . . . . 23 . . . . . . . . . . . 23 . . . . . . . . . . . . . . . . 23 . . . . . . . . . . . . . . . . 24 2.1.1 From Best Effort to Guaranteed Reliability . . . 2.1.2 Uniform Reliable Broadcast (URB-broadcast) . . 2.1.3 Building the URB-broadcast Abstraction in CAMP n,t[∅] . . . . . . . . . . . . 25 . . . . . . . . . . . . . . . . 2.2 Adding Quality of Service . 2.2.1 2.2.2 2.2.3 2.2.4 2.2.5 The Total Order Broadcast Abstraction Requires More . . . . . . . . . . Summary . . . . . . . . . . . . . . . . . . . . 27 “First In, First Out” (FIFO) Message Delivery . . . . . . . . . . . . . . . . . . 27 “Causal Order” (CO) Message Delivery . . . . . . . . . . . . . . . . . . . . . 29 From FIFO-broadcast to CO-broadcast . . . . . . . . . . . . . . . . . . . . . 31 From URB-broadcast to CO-broadcast: Capturing Causal Past in a Vector . . . 34 . . . 38 . . . . . . . . . 39 . . . . . . . . . . . . . . . . 39 . . . . . . . . . . . . . . . . . . . 39 . . Exercises and Problems . . . . . . . . . . . . . . . . . . . . 2.3 2.4 Bibliographic Notes . 2.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.2 URB-broadcast in CAMP n,t[- FC] System Model . 3.4 URB-broadcast in CAMP n,t[- FC] Enriched with a Failure Detector Formal Deﬁnitions . 3 Reliable Broadcast in the Presence of Process Crashes and Unreliable Channels . . 3.1 A System Model with Unreliable Channels . . . Fairness Notions for Channels . . Fair Channel (FC) and Fair Lossy Channel 3.1.1 3.1.2 3.1.3 Reliable Channel in the Presence of Process Crashes . . . 3.1.4 . . 3.2.1 URB-broadcast in CAMP n,t[- FC, t < n/2] . . 3.2.2 An Impossibility Result . Failure Detectors: an Approach to Circumvent Impossibilities . 3.3.1 The Concept of a Failure Detector . 3.3.2 . 41 . 41 . . . 41 . . . . . . . . . . . . . . . . . . . 42 . 43 . . . . 44 . . . 44 . . 45 . . . . . 46 . 47 . . . . . . . . . . . . . . . . . . . . . . . . 47 . . . 48 . . . . . . . . . 49 3.4.1 Deﬁnition of the Failure Detector Class Θ . . . . . . . . . . . . . . . . . . . . 49 . 50 . 3.4.2 3.4.3 Building a Failure Detector Θ in CAMP n,t[- FC, t < n/2] . 50 3.4.4 The Fundamental Added Value Supplied by a Failure Detector . . . . . . . . . 51 . 51 3.5.1 The Quiescence Property . . 51 3.5.2 Quiescent URB-broadcast Based on a Perfect Failure Detector . . . . . . . . . 52 3.5.3 The Class HB of Heartbeat Failure Detectors . . 54 . 3.5.4 Quiescent URB-broadcast in CAMP n,t[- FC, Θ, HB ] . 56 . 58 . . Summary . . 58 . . . . . 59 . . . Exercises and Problems . Solving URB-broadcast in CAMP n,t[- FC, Θ] 3.5 Quiescent Uniform Reliable Broadcast . 3.6 3.7 Bibliographic Notes . 3.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 Reliable Broadcast in the Presence of Byzantine Processes . . . . . . . . . . . . 4.1 Byzantine Processes and Properties of the Model BAMP n,t[t < n/3] . . 4.2 . . . . 61 . 61 . . The No-Duplicity Broadcast Abstraction . . 62 . . . . 4.2.1 Deﬁnition . . . . 62 . . 4.2.2 An Impossibility Result . . . . . 63 . 4.2.3 A No-Duplicity Broadcast Algorithm . . . . . . . . . . . . . . . . . . . . . . 63 4.3 The Byzantine Reliable Broadcast Abstraction . . . . . . . . . . . . . . . . . . . . . . 65 4.4 An Optimal Byzantine Reliable Broadcast Algorithm . . . . . . . . . . . . . . . . . . 66 . . . . . . 66 . 67 . . . . . . . . . . . . . . . . . . . . . . 68 . . 69 . . . . . . . . 70 . 70 . . . . . 72 . . . 73 . . . Exercises and Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 4.4.1 A Byzantine Reliable Broadcast Algorithm for BAMP n,t[t < n/3] . 4.4.2 Correctness Proof . . 4.4.3 Beneﬁting from Message Asynchrony . Time and Message-Efﬁcient Byzantine Reliable Broadcast . . 4.5.1 A Message-Efﬁcient Byzantine Reliable Broadcast Algorithm . . 4.5.2 Correctness Proof . Summary . . . . . 4.6 . 4.7 Bibliographic Notes . 4.8 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . III The Read/Write Register Communication Abstraction 75 5 The Read/Write Register Abstraction 77 The Read/Write Register Abstraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 5.1.1 Concurrent Objects and Registers . . . . . . . . . . . . . . . . . . . . . 77 5.1 . . .

分享到：

赞收藏

资料库

Fault-Tolerant Message-Passing Distributed Systems. An Algorithm....pdf

相关推荐

开发技术

热门标签

最新资料