Contents
1. Introduction to Consistency and Coherence
1.1 Consistency
1.2 Coherence
1.3 A Consistency and Coherence Quiz
1.4 What This Primer Does Not Do
2. Coherence Basics
2.1 Baseline System Model
2.2 The Problem: How Incoherence Could Possibly Occur
2.3 Defining Coherence
2.3.1 Maintaining the Coherence Invariants
2.3.2 The Granularity of Coherence
2.3.3 The Scope of Coherence
2.4 References
3. Memory Consistency Motivation and Sequential Consistency
3.1 Problems with Shared Memory Behavior
3.2 What Is a Memory Consistency Model?
3.3 Consistency vs. Coherence
3.4 Basic Idea of Sequential Consistency (SC)
3.5 A Little SC Formalism
3.6 Naive SC Implementations
3.7 A Basic SC Implementation with Cache Coherence
3.8 Optimized SC Implementations with Cache Coherence
3.9 Atomic Operations with SC
3.10 Putting It All Together: MIPS R10000
3.11 Further Reading Regarding SC
3.12 References
4. Total Store Order and the x86 Memory Model
4.1 Motivation for TSO/x86
4.2 Basic Idea of TSO/x86
4.3 A Little TSO Formalism and an x86 Conjecture
4.4 Implementing TSO/x86
4.5 Atomic Instructions and Fences with TSO
4.5.1 Atomic Instructions
4.5.2 Fences
4.6 Further Reading Regarding TSO
4.7 Comparing SC and TSO
4.8 References
5. Relaxed Memory Consistency
5.1 Motivation
5.1.1 Opportunities to Reorder Memory Operations
5.1.2 Opportunities to Exploit Reordering
5.2 An Example Relaxed Consistency Model (XC)
5.2.1 The Basic Idea of the XC Model
5.2.2 Examples Using Fences under XC
5.2.3 Formalizing XC
5.2.4 Examples Showing XC Operating Correctly
5.3 Implementing XC
5.3.1 Atomic Instructions with XC
5.3.2 Fences with XC
5.3.3 A Caveat
5.4 Sequential Consistency for Data-Race-Free Programs
5.5 Some Relaxed Model Concepts
5.5.1 Release Consistency
5.5.2 Causality and Write Atomicity
5.6 A Relaxed Memory Model Case Study: IBM Power
5.7 Further Reading and Commercial Relaxed Memory Models
5.7.1 Academic Literature
5.7.2 Commercial Models
5.8 Comparing Memory Models
5.8.1 How Do Relaxed Memory Models Relate to Each Other and TSO and SC?
5.8.2 How Good Are Relaxed Models?
5.9 High-Level Language Models
5.10 References
6. Coherence Protocols
6.1 The Big Picture
6.2 Specifying Coherence Protocols
6.3 Example of a Simple Coherence Protocol
6.4 Overview of Coherence Protocol Design Space
6.4.1 States
6.4.2 Transactions
6.4.3 Major Protocol Design Options
6.5 References
7. Snooping Coherence Protocols
7.1 Introduction to Snooping
7.2 Baseline Snooping Protocol
7.2.1 High-Level Protocol Specification
7.2.2 Simple Snooping System Model: Atomic Requests, Atomic Transactions
7.2.3 Baseline Snooping System Model: Non-Atomic Requests, Atomic Transactions
7.2.4 Running Example
7.2.5 Protocol Simplifications
7.3 Adding the Exclusive State
7.3.1 Motivation
7.3.2 Getting to the Exclusive State
7.3.3 High-Level Specification of Protocol
7.3.4 Detailed Specification
7.3.5 Running Example
7.4 Adding the Owned State
7.4.1 Motivation
7.4.2 High-Level Protocol Specification
7.4.3 Detailed Protocol Specification
7.4.4 Running Example
7.5 Non-Atomic Bus
7.5.1 Motivation
7.5.2 In-Order vs. Out-of-Order Responses
7.5.3 Non-Atomic System Model
7.5.4 An MSI Protocol with a Split-Transaction Bus
7.5.5 An Optimized, Non-Stalling MSI Protocol with a Split-Transaction Bus
7.6 Optimizations to the Bus Interconnection Network
7.6.1 Separate Non-Bus Network for Data Responses
7.6.2 Logical Bus for Coherence Requests
7.7 Case Studies
7.7.1 Sun Starfire E10000
7.7.2 IBM Power5
7.8 Discussion and the Future of Snooping
7.9 References
8. Directory Coherence Protocols
8.1 Introduction to Directory Protocols
8.2 Baseline Directory System
8.2.1 Directory System Model
8.2.2 High-Level Protocol Specification
8.2.3 Avoiding Deadlock
8.2.4 Detailed Protocol Specification
8.2.5 Protocol Operation
8.2.6 Protocol Simplifications
8.3 Adding the Exclusive State
8.3.1 High-Level Protocol Specification
8.3.2 Detailed Protocol Specification
8.4 Adding the Owned State
8.4.1 High-Level Protocol Specification
8.4.2 Detailed Protocol Specification
8.5 Representing Directory State
8.5.1 Coarse Directory
8.5.2 Limited Pointer Directory
8.6 Directory Organization
8.6.1 Directory Cache Backed by DRAM
8.6.2 Inclusive Directory Caches
8.6.3 Null Directory Cache (with no backing store)
8.7 Performance and Scalability Optimizations
8.7.1 Distributed Directories
8.7.2 Non-Stalling Directory Protocols
8.7.3 Interconnection Networks without Point-to-Point Ordering
8.7.4 Silent vs. Non-Silent Evictions of Blocks in State S
8.8 Case Studies
8.8.1 SGI Origin 2000
8.8.2 Coherent HyperTransport
8.8.3 HyperTransport Assist
8.8.4 Intel QPI
8.9 Discussion and the Future of Directory Protocols
8.10 References
9. Advanced Topics in Coherence
9.1 System Models
9.1.1 Instruction Caches
9.1.2 Translation Lookaside Buffers (TLBs)
9.1.3 Virtual Caches
9.1.4 Write-Through Caches
9.1.5 Coherent Direct Memory Access (DMA)
9.1.6 Multi-Level Caches and Hierarchical Coherence Protocols
9.2 Performance Optimizations
9.2.1 Migratory Sharing Optimization
9.2.2 False Sharing Optimizations
9.3 Maintaining Liveness
9.3.1 Deadlock
9.3.2 Livelock
9.3.3 Starvation
9.4 Token Coherence
9.5 The Future of Coherence
9.6 References
Author Biographies