Preface
Contents
Notation
Figures & Algorithms
Tables
1 Definitions & Introductory Examples
1.1 A Few Definitions Related to Distributed Computing
1.2 Example 1: Common Decision Despite Message Losses
1.2.1 The Problem
1.2.2 Trying to Solve the Problem: Attempt 1
1.2.3 Trying to Solve the Problem: Attempt 2
1.2.4 An Impossibility Result
1.2.5 A Coordination Problem
1.3 Example 2: Computing a Global Function Despite a Message Adversary
1.3.1 The Problem
1.3.2 The Notion of a Message Adversary
1.3.3 The TREE-AD Message Adversary
1.3.4 From Message Adversary to Process Mobility
1.4 Main Distributed Computing Models Used in This Book
1.5 Distributed Computing Versus Parallel Computing
1.6 Summary
1.7 Bibliographic Notes
1.8 Exercises and Problems
--- Reliable Broadcast Communication Abstraction
2 In Presence of Process Crash Failures
2.1 Uniform Reliable Broadcast
2.1.1 From Best Effort to Guaranteed Reliability
2.1.2 Uniform Reliable Broadcast (URB-broadcast)
2.1.3 Building the URB-broadcast Abstraction in CAMPn,t[∅]
2.2 Adding Quality of Service
2.2.1 “First In, First Out” (FIFO) Message Delivery
2.2.2 “Causal Order” (CO) Message Delivery
2.2.3 From FIFO-broadcast to CO-broadcast
2.2.4 From URB-broadcast to CO-broadcast: Capturing Causal Past in a Vector
2.2.5 The Total Order Broadcast Abstraction Requires More
2.3 Summary
2.4 Bibliographic Notes
2.5 Exercises and Problems
3 In Presence of Process Crashes & Unreliable Channels
3.1 A System Model with Unreliable Channels
3.1.1 Fairness Notions for Channels
3.1.2 Fair Channel (FC) and Fair Lossy Channel
3.1.3 Reliable Channel in the Presence of Process Crashes
3.1.4 System Model
3.2 URB-broadcast in CAMPn,t[- FC]
3.2.1 URB-broadcast in CAMPn,t[- FC, t < n/2]
3.2.2 An Impossibility Result
3.3 Failure Detectors: an Approach to Circumvent Impossibilities
3.3.1 The Concept of a Failure Detector
3.3.2 Formal Definitions
3.4 URB-broadcast in CAMPn,t[- FC] Enriched with a Failure Detector
3.4.1 Definition of the Failure Detector Class Θ
3.4.2 Solving URB-broadcast in CAMPn,t[- FC, Θ]
3.4.3 Building a Failure Detector Θ in CAMPn,t[- FC, t < n/2]
3.4.4 The Fundamental Added Value Supplied by a Failure Detector
3.5 Quiescent Uniform Reliable Broadcast
3.5.1 The Quiescence Property
3.5.2 Quiescent URB-broadcast Based on a Perfect Failure Detector
3.5.3 The Class HB of Heartbeat Failure Detectors
3.5.4 Quiescent URB-broadcast in CAMPn,t[- FC, Θ,HB]
3.6 Summary
3.7 Bibliographic Notes
3.8 Exercises and Problems
4 In Presence of Byzantine Processes
4.1 Byzantine Processes and Properties of the Model BAMPn,t[t < n/3]
4.2 The No-Duplicity Broadcast Abstraction
4.2.1 Definition
4.2.2 An Impossibility Result
4.2.3 A No-Duplicity Broadcast Algorithm
4.3 The Byzantine Reliable Broadcast Abstraction
4.4 An Optimal Byzantine Reliable Broadcast Algorithm
4.4.1 A Byzantine Reliable Broadcast Algorithm for BAMPn,t[t < n/3]
4.4.2 Correctness Proof
4.4.3 Benefiting from Message Asynchrony
4.5 Time and Message-Efficient Byzantine Reliable Broadcast
4.5.1 A Message-Efficient Byzantine Reliable Broadcast Algorithm
4.5.2 Correctness Proof
4.6 Summary
4.7 Bibliographic Notes
4.8 Exercises and Problems
--- R/W Register Communication Abstraction
5 R/W Register Abstraction
5.1 The Read/Write Register Abstraction
5.1.1 Concurrent Objects and Registers
5.1.2 The Notion of a Regular Register
5.1.3 Registers Defined from a Sequential Specification
5.2 A Formal Approach to Atomicity and Sequential Consistency
5.2.1 Processes, Operations, and Events
5.2.2 Histories
5.2.3 A Formal Definition of Atomicity
5.2.4 A Formal Definition of Sequential Consistency
5.3 Composability of Consistency Conditions
5.3.1 What Is Composability?
5.3.2 Atomicity Is Composable
5.3.3 Sequential Consistency Is Not Composable
5.4 Bounds on the Implementation of Strong Consistency Conditions
5.4.1 Upper Bound on t for Atomicity
5.4.2 Upper Bound on t for Sequential Consistency
5.4.3 Lower Bounds on the Durations of Read andWrite Operations
5.5 Summary
5.6 Bibliographic Notes
5.7 Exercises and Problems
6 R/W Registers despite Asynchrony & t < n/2
6.1 A Structural View
6.2 Building an SWMR Regular Read/Write Register in CAMPn,t[t < n/2]
6.2.1 Problem Specification
6.2.2 Implementing an SWMR Regular Register in CAMPn,t[t < n/2]
6.2.3 Proof of the SWMR Regular Register Construction
6.3 From an SWMR Regular Register to an SWMR Atomic Register
6.3.1 Why the Previous Algorithm Does Not Ensure Atomicity
6.3.2 From Regularity to Atomicity
6.4 From SWMR Atomic Register to MWMR Atomic Register
6.4.1 Replacing Sequence Numbers by Timestamps
6.4.2 Construction of an MWMR Atomic Register
6.4.3 Proof of the MWMR Atomic Register Construction
6.5 Implementing Sequentially Consistent Registers
6.5.1 How to Address the Non-composability of Sequential Consistency
6.5.2 Algorithms Based on a Total Order Broadcast Abstraction
6.5.3 A TO-broadcast-based Algorithm with Local (Fast) Read Operations
6.5.4 A TO-broadcast-based Algorithm with Local (Fast) Write Operations
6.5.5 An Algorithm Based on Logical Time
6.5.6 Proof of the Logical Time-based Algorithm
6.6 Summary
6.7 Bibliographic Notes
6.8 Exercises and Problems
7 Circumventing t < n/2 R/W Register Impossibility - Failure Detector Approach
7.1 The Class Σ of Quorum Failure Detectors
7.1.1 Definition of the Class of Quorum Failure Detectors
7.1.2 Implementing a Failure Detector Σ When t < n/2
7.1.3 A Σ-based Construction of an SWSR Atomic Register
7.2 Σ Is theWeakest Failure Detector to Build an Atomic Register
7.2.1 What Does “Weakest Failure Detector Class” Mean
7.2.2 The Extraction Algorithm
7.2.3 Correctness of the Extraction Algorithm
7.3 Comparing the Failure Detectors Classes Θ and Σ
7.4 Atomic Register Abstraction vs URB-broadcast Abstraction
7.4.1 From Atomic Registers to URB-broadcast
7.4.2 Atomic Registers Are Strictly Stronger than URB-broadcast
7.5 Summary
7.6 Bibliographic Notes
7.7 Exercise and Problem
8 Broadcast Abstraction suited to Family of R/W Implementable Objects
8.1 The SCD-broadcast Communication Abstraction
8.1.1 Definition
8.1.2 Implementing SCD-broadcast in CAMPn,t[t < n/2]
8.1.3 Cost and Proof of the Algorithm
8.1.4 An SCD-broadcast-based Communication Pattern
8.2 From SCD-broadcast to an MWMR Register
8.2.1 Building an MWMR Atomic Register in CAMPn,t[SCD-broadcast]
8.2.2 Cost and Proof of Correctness
8.2.3 From Atomicity to Sequential Consistency
8.2.4 From MWMR Registers to an Atomic Snapshot Object
8.3 From SCD-broadcast to an Atomic Counter
8.3.1 Counter Object
8.3.2 Implementation of an Atomic Counter Object
8.3.3 Implementation of a Sequentially Consistent Counter Object
8.4 From SCD-broadcast to Lattice Agreement
8.4.1 The Lattice Agreement Task
8.4.2 Lattice Agreement from SCD-broadcast
8.5 From SWMR Atomic Registers to SCD-broadcast
8.5.1 From Snapshot to SCD-broadcast
8.5.2 Proof of the Algorithm
8.6 Summary
8.7 Bibliographic Notes
8.8 Exercises and Problems
9 Atomic R/W Registers in Presence of Byzantine Processes
9.1 Atomic Read/Write Registers in the Presence of Byzantine Processes
9.1.1 Why SWMR (and Not MWMR) Atomic Registers?
9.1.2 Reminder on Possible Behaviors of a Byzantine Process
9.1.3 SWMR Atomic Registers Despite Byzantine Processes: Definition
9.2 An Impossibility Result
9.3 Reminder on Byzantine Reliable Broadcast
9.3.1 Specification of Multi-shot Reliable Broadcast
9.3.2 An Algorithm for Multi-shot Byzantine Reliable Broadcast
9.4 Construction of SWMR Atomic Registers in BAMPn,t[t < n/3]
9.4.1 Description of the Algorithm
9.4.2 Comparison with the Crash Failure Model
9.5 Proof of the Algorithm
9.5.1 Preliminary Lemmas
9.5.2 Proof of the Termination Properties
9.5.3 Proof of the Consistency (Atomicity) Properties
9.5.4 Piecing Together the Lemmas
9.6 Building Objects on Top of SWMR Byzantine Registers
9.6.1 One-shot Write-snapshot Object
9.6.2 Correct-only Agreement Object
9.7 Summary
9.8 Bibliographic Notes
9.9 Exercises and Problems
--- Agreement in Sync Systems
10 Consensus & Interactive Consistency in Sync Systems prone to Process Crash Failures
10.1 Consensus in the Crash Failure Model
10.1.1 Definition
10.1.2 A Simple (Unfair) Consensus Algorithm
10.1.3 A Simple (Fair) Consensus Algorithm
10.2 Interactive Consistency (Vector Consensus)
10.2.1 Definition
10.2.2 A Simple Example of Use: Build Atomic Rounds
10.2.3 An Interactive Consistency Algorithm
10.2.4 Proof of the Algorithm
10.2.5 A Convergence Point of View
10.3 Lower Bound on the Number of Rounds
10.3.1 Preliminary Assumptions and Definitions
10.3.2 The (t + 1) Lower Bound
10.3.3 Proof of the Lemmas
10.4 Summary
10.5 Bibliographic Notes
10.6 Exercises and Problems
11 Expediting Decision in Sync Systems prone to Process Crash Failures
11.1 Early Deciding and Stopping Interactive Consistency
11.1.1 Early Deciding vs Early Stopping
11.1.2 An Early Decision Predicate
11.1.3 An Early Deciding and Stopping Algorithm
11.1.4 Correctness Proof
11.1.5 On Early Decision Predicates
11.1.6 Early Deciding and Stopping Consensus
11.2 An Unbeatable Binary Consensus Algorithm
11.2.1 A Knowledge-Based Unbeatable Predicate
11.2.2 PREF0() with Respect to DIFF()
11.2.3 An Algorithm Based on the Predicate PREF0(): CGM
11.2.4 On the Unbeatability of the Predicate PREF0()
11.3 The Synchronous Condition-based Approach
11.3.1 The Condition-based Approach in Synchronous Systems
11.3.2 Legality and Maximality of a Condition
11.3.3 Hierarchy of Legal Conditions
11.3.4 Local View of an Input Vector
11.3.5 A Synchronous Condition-based Consensus Algorithm
11.3.6 Proof of the Algorithm
11.4 Using a Global Clock and a Fast Failure Detector
11.4.1 Fast Perfect Failure Detectors
11.4.2 Enriching the Synchronous Model to Benefit from a Fast Failure Detector
11.4.3 A Simple Consensus Algorithm Based on a Fast Failure Detector
11.4.4 An Early Deciding and Stopping Algorithm
11.5 Summary
11.6 Bibliographic Notes
11.7 Exercises and Problems
12 Consensus Variants - Simultaneous Consensus & k-Set Agreement
12.1 Simultaneous Consensus: Definition and Its Difficulty
12.1.1 Definition of Simultaneous Consensus
12.1.2 Difficulty Early Deciding Before (t + 1) Rounds
12.1.3 Failure Pattern, Failure Discovery, and Waste
12.1.4 A Clean Round and the Horizon of a Round
12.2 An Optimal Simultaneous Consensus Algorithm
12.2.1 An Optimal Algorithm
12.2.2 Proof of the Algorithm
12.3 The k-Set Agreement Abstraction
12.3.1 Definition
12.3.2 A Simple Algorithm
12.4 Early Deciding and Stopping k-Set Agreement
12.4.1 An Early Deciding and Stopping Algorithm
12.4.2 Proof of the Algorithm
12.5 Summary
12.6 Bibliographic Notes
12.7 Exercises and Problems
13 Non-Blocking Atomic Commitment in Presence of Process Crash Failures
13.1 The Non-blocking Atomic Commitment (NBAC) Abstraction
13.1.1 Definition of Non-blocking Atomic Commitment
13.1.2 A Simple Non-blocking Atomic Commitment Algorithm
13.2 Fast Commit and Fast Abort
13.2.1 Looking for Efficient Algorithms
13.2.2 An Impossibility Result
13.3 Weak Fast Commit and Weak Fast Abort
13.4 Fast Commit and Weak Fast Abort Are Compatible
13.4.1 A Fast Commit and Weak Fast Abort Algorithm
13.4.2 Proof of the Algorithm
13.5 Other Non-blocking Atomic Commitment Algorithms
13.5.1 Fast Abort andWeak Fast Commit
13.5.2 The Case t ≤ 2 (System Model CSMPn,t[1 ≤ t < 3 ≤ n])
13.6 Summary
13.7 Bibliographic Notes
13.8 Exercises and Problems
14 Consensus in Sync Systems prone to Byzantine Process Failures
14.1 Agreement Despite Byzantine Processes
14.1.1 On the Agreement and Validity Properties
14.1.2 A Consensus Definition for the Byzantine Failure Model
14.1.3 An Interactive Consistency Definition for the Byzantine Failure Model
14.1.4 The Byzantine General Agreement Abstraction
14.2 Interactive Consistency for Four Processes Despite One Byzantine Process
14.2.1 An Algorithm for n = 4 and t = 1
14.2.2 Proof of the Algorithm
14.3 An Upper Bound on the Number of Byzantine Processes
14.4 A Byzantine Consensus Algorithm for BSMPn,t[t < n/3]
14.4.1 Base Data Structure: a Tree
14.4.2 EIG Algorithm
14.4.3 Example of an Execution
14.4.4 Proof of the EIG Algorithm
14.5 A Simple Consensus Algorithm with Constant Message Size
14.5.1 Features of the Algorithm
14.5.2 Presentation of the Algorithm
14.5.3 Proof and Properties of the Algorithm
14.6 From Binary to Multivalued Byzantine Consensus
14.6.1 Motivation
14.6.2 A Reduction Algorithm
14.6.3 Proof of the Multivalued to Binary Reduction
14.6.4 An Interesting Property of the Construction
14.7 Enriching the Synchronous Model with Message Authentication
14.7.1 Synchronous Model with Signed Messages
14.7.2 The Gain Obtained from Signatures
14.7.3 A Synchronous Signature-Based Consensus Algorithm
14.7.4 Proof of the Algorithm
14.8 Summary
14.9 Bibliographic Notes
14.10 Exercises and Problems
--- Agreement in Async Systems
15 Implementable Agreement Abstractions despite Asynchrony & Minority of Process Crashes
15.1 The Renaming Agreement Abstraction
15.1.1 Definition
15.1.2 A Fundamental Result
15.1.3 The Stacking Approach
15.1.4 A Snapshot-based Implementation of Renaming
15.1.5 Proof of the Algorithm
15.2 The Approximate Agreement Abstraction
15.2.1 Definition
15.2.2 A Read/Write-based Implementation of Approximate Agreement
15.2.3 Proof of the Algorithm
15.3 The Safe Agreement Abstraction
15.3.1 Definition
15.3.2 A Direct Implementation of Safe Agreement in CAMPn,t[t < n/2]
15.3.3 Proof of the Algorithm
15.4 Summary
15.5 Bibliographic Notes
15.6 Exercises and Problems
16 Consensus - Power & Implementability Limit in Crash-prone Async Systems
16.1 The Total Order Broadcast Communication Abstraction
16.1.1 Total Order Broadcast: Definition
16.1.2 A Map of Communication Abstractions
16.2 From Consensus to TO-broadcast
16.2.1 Structure of the Construction
16.2.2 Description of the Algorithm
16.2.3 Proof of the Algorithm
16.3 Consensus and TO-broadcast Are Equivalent
16.4 The State Machine Approach
16.4.1 State Machine Replication
16.4.2 Sequentially-Defined Abstractions (Objects)
16.5 A Simple Consensus-based Universal Construction
16.6 Agreement vs Mutual Exclusion
16.7 Ledger Object
16.7.1 Definition
16.7.2 Implementation of a Ledger in CAMPn,t[TO-broadcast]
16.8 Consensus Impossibility in the Presence of Crashes and Asynchrony
16.8.1 The Intuition That Underlies the Impossibility
16.8.2 Refining the Definition of CAMPn,t[∅]
16.8.3 Notion of Valence of a Global State
16.8.4 Consensus Is Impossible in CAMPn,1[∅]
16.9 The Frontier Between Read/Write Registers and Consensus
16.9.1 The Main Question
16.9.2 The Notion of Consensus Number in Read/Write Systems
16.9.3 An Illustration of Herlihy’s Hierarchy
16.9.4 The Consensus Number of a Ledger
16.10 Summary
16.11 Bibliographic Notes
16.12 Exercises and Problems
17 Implementing Consensus in Enriched Crash-prone Async Systems
17.1 Enriching an Asynchronous System to Implement Consensus
17.2 A Message Scheduling Assumption
17.2.1 Message Scheduling (MS) Assumption
17.2.2 A Binary Consensus Algorithm
17.2.3 Proof of the Algorithm
17.2.4 Additional Properties
17.3 Enriching CAMPn,t[Ø] with a Perpetual Failure Detector
17.3.1 Enriching CAMPn,t[∅] with a Perfect Failure Detector
17.4 Enriching CAMPn,t[t < n/2] with an Eventual Leader
17.4.1 TheWeakest Failure Detector to Implement Consensus
17.4.2 Implementing Consensus in CAMPn,t[t < n/2, Ω]
17.4.3 Proof of the Algorithm
17.4.4 Consensus Versus Eventual Leader Failure Detector
17.4.5 Notions of Indulgence and Zero-degradation
17.4.6 Saving Broadcast Instances
17.5 Enriching CAMPn,t[t < n/2] with Randomization
17.5.1 Asynchronous Randomized Models
17.5.2 Randomized Consensus
17.5.3 Randomized Binary Consensus in CAMPn,t[t < n/2, LC]
17.5.4 Randomized Binary Consensus in CAMPn,t[t < n/2, CC]
17.6 Enriching CAMPn,t[t < n/2] with a Hybrid Approach
17.6.1 The Hybrid Approach: Failure Detector and Randomization
17.6.2 A Hybrid Binary Consensus Algorithm
17.7 A Paxos-inspired Consensus Algorithm
17.7.1 The Alpha Communication Abstraction
17.7.2 Consensus Algorithm
17.7.3 An Implementation of Alpha in CAMPn,t[t < n/2]
17.8 From Binary to Multivalued Consensus
17.8.1 A Reduction Algorithm
17.8.2 Proof of the Reduction Algorithm
17.9 Consensus in One Communication Step
17.9.1 Aim and Model Assumption on t
17.9.2 A One Communication Step Algorithm
17.9.3 Proof of the Early Deciding Algorithm
17.10 Summary
17.11 Bibliographic Notes
17.12 Exercises and Problems
18 Implementing Oracles in Async Systems prone to Process Crash Failures
18.1 The Two Facets of Failure Detectors
18.1.1 The Programming Point of View: Modular Building Block
18.1.2 The Computability Point of View: Abstraction Ranking
18.2 Ω in CAMPn,t[∅]: a Direct Impossibility Proof
18.3 Constructing a Perfect Failure Detector (Class P)
18.3.1 Reminder: Definition of the Class P of Perfect Failure Detectors
18.3.2 Use of an Underlying Synchronous System
18.3.3 Applications Generating a Fair Communication Pattern
18.3.4 The Theta Assumption
18.4 Constructing an Eventually Perfect Failure Detector (Class P)
18.4.1 Reminder: Definition of an Eventually Perfect Failure Detector
18.4.2 From Perpetual to Eventual Properties
18.4.3 Eventually Synchronous Systems
18.5 On the Efficient Monitoring of a Process by Another Process
18.5.1 Motivation and System Model
18.5.2 A Monitoring Algorithm
18.6 An Adaptive Monitoring-based Algorithm Building P
18.6.1 Motivation and Model
18.6.2 A Monitoring-Based Adaptive Algorithm for the Failure Detector Class P
18.6.3 Proof the Algorithm
18.7 From the t-Source Assumption to an Ω Eventual Leader
18.7.1 The t-Source Assumption and the Model CAMPn,t[t-SOURCE]
18.7.2 Electing an Eventual Leader in CAMPn,t[t-SOURCE]
18.7.3 Proof of the Algorithm
18.8 Electing an Eventual Leader in CAMPn,t[ t-MS PAT]
18.8.1 A Query/Response Pattern
18.8.2 Electing an Eventual Leader in CAMPn,t[t-MS PAT]
18.8.3 Proof of the Algorithm
18.9 Building Ω in a Hybrid Model
18.10 Construction of a Biased Common Coin from Local Coins
18.10.1 Definition of a Biased Common Coin
18.10.2 The CORE Communication Abstraction
18.10.3 Construction of a Common Coin with a Constant Bias
18.10.4 On the Use of a Biased Common Coin
18.11 Summary
18.12 Bibliographic notes
18.13 Exercises and Problems
19 Implementing Consensus in Enriched Byzantine Async Systems
19.1 Definition Reminder and Two Observations
19.1.1 Definition of Byzantine Consensus (Reminder)
19.1.2 Why Not to Use an Eventual Leader
19.1.3 On the Weakest Synchrony Assumption for Byzantine Consensus
19.2 Binary Byzantine Consensus from a Message Scheduling Assumption
19.2.1 A Message Scheduling Assumption
19.2.2 A Binary Byzantine Consensus Algorithm
19.2.3 Proof of the Algorithm
19.2.4 Additional Properties
19.3 An Optimal Randomized Binary Byzantine Consensus Algorithm
19.3.1 The Binary-Value Broadcast Abstraction
19.3.2 A Binary Randomized Consensus Algorithm
19.3.3 Proof of the BV-Based Binary Byzantine Consensus Algorithm
19.3.4 From Decision to Decision and Termination
19.4 From Binary to Multivalued Byzantine Consensus
19.4.1 A Reduction Algorithm
19.4.2 Proof of the Reduction Algorithm
19.5 From Binary to No-intrusion Multivalued Byzantine Consensus
19.5.1 The Validated Byzantine Broadcast Abstraction
19.5.2 An Algorithm Implementing VBB-broadcast
19.5.3 Proof of the VBB-broadcast Algorithm
19.5.4 A VBB-Based Multivalued to Binary Byzantine Consensus Reduction
19.5.5 Proof of the VBB-Based Reduction Algorithm
19.6 Summary
19.7 Appendix: Proof-of-Work (PoW) Seen as Eventual Byzantine Agreement
19.8 Bibliographic Notes
19.9 Exercises and Problems
20 Quorum, Signatures & Overlays
20.1 Quorum Systems
20.1.1 Definitions
20.1.2 Examples of Use of a Quorum System
20.1.3 A Few Classical Quorums
20.1.4 Quorum Composition
20.2 Digital Signatures
20.2.1 Cipher, Keys, and Signatures
20.2.2 How to Build a Secret Key: Diffie-Hellman’s Algorithm
20.2.3 How to Build a Public Key: Rivest-Shamir-Adleman’s (RSA) Algorithm
20.2.4 How to Share a Secret: Shamir’s Algorithm
20.3 Overlay Networks
20.3.1 On Regular Graphs
20.3.2 Hypercube
20.3.3 de Bruijn Graphs
20.3.4 Kautz Graphs
20.3.5 Undirected de Bruijn and Kautz Graphs
20.4 Bibliographic Notes
Afterword
Biblio
Index