A primer on memory consistency and cache coherence (tagged).pdf

Published: 2022-06-07 Posted by: admin Category: Manuals File size: 3.88M File format: pdf

Contents

1. Introduction to Consistency and Coherence

1.1 Consistency

1.2 Coherence

1.3 A Consistency and Coherence Quiz

1.4 What this Primer Does Not Do

2. Coherence Basics

2.1 Baseline System Model

2.2 The Problem: How Incoherence Could Possibly Occur

2.3 Defining Coherence

2.3.1 Maintaining the Coherence Invariants

2.3.2 The Granularity of Coherence

2.3.3 The Scope of Coherence

2.4 References

3. Memory Consistency Motivation and Sequential Consistency

3.1 Problems with Shared Memory Behavior

3.2 What Is a Memory Consistency Model?

3.3 Consistency vs. Coherence

3.4 Basic Idea of Sequential Consistency (SC)

3.5 A Little SC Formalism

3.6 Naive SC Implementations

3.7 A Basic SC Implementation with Cache Coherence

3.8 Optimized SC Implementations with Cache Coherence

3.9 Atomic Operations with SC

3.10 Putting it All Together: MIPS R10000

3.11 Further Reading Regarding SC

3.12 References

4. Total Store Order and the x86 Memory Model

4.1 Motivation for TSO/x86

4.2 Basic Idea of TSO/x86

4.3 A Little TSO Formalism and an x86 Conjecture

4.4 Implementing TSO/x86

4.5 Atomic Instructions and Fences with TSO

4.5.1 Atomic Instructions

4.5.2 Fences

4.6 Further Reading Regarding TSO

4.7 Comparing SC and TSO

4.8 References

5. Relaxed Memory Consistency

5.1 Motivation

5.1.1 Opportunities to Reorder Memory Operations

5.1.2 Opportunities to Exploit Reordering

5.2 An Example Relaxed Consistency Model (XC)

5.2.1 The Basic Idea of the XC Model

5.2.2 Examples Using Fences under XC

5.2.3 Formalizing XC

5.2.4 Examples Showing XC Operating Correctly

5.3 Implementing XC

5.3.1 Atomic Instructions with XC

5.3.2 Fences with XC

5.3.3 A Caveat

5.4 Sequential Consistency for Data-Race-Free Programs

5.5 Some Relaxed Model Concepts

5.5.1 Release Consistency

5.5.2 Causality and Write Atomicity

5.6 A Relaxed Memory Model Case Study: IBM Power

5.7 Further Reading and Commercial Relaxed Memory Models

5.7.1 Academic Literature

5.7.2 Commercial Models

5.8 Comparing Memory Models

5.8.1 How Do Relaxed Memory Models Relate to Each Other and TSO and SC?

5.8.2 How Good Are Relaxed Models?

5.9 High-Level Language Models

5.10 References

6. Coherence Protocols

6.1 The Big Picture

6.2 Specifying Coherence Protocols

6.3 Example of a Simple Coherence Protocol

6.4 Overview of Coherence Protocol Design Space

6.4.1 States

6.4.2 Transactions

6.4.3 Major Protocol Design Options

6.5 References

7. Snooping Coherence Protocols

7.1 Introduction to Snooping

7.2 Baseline Snooping Protocol

7.2.1 High-Level Protocol Specification

7.2.2 Simple Snooping System Model: Atomic Requests, Atomic Transactions

7.2.3 Baseline Snooping System Model: Non-Atomic Requests, Atomic Transactions

7.2.4 Running Example

7.2.5 Protocol Simplifications

7.3 Adding the Exclusive State

7.3.1 Motivation

7.3.2 Getting to the Exclusive State

7.3.3 High-Level Specification of Protocol

7.3.4 Detailed Specification

7.3.5 Running Example

7.4 Adding the Owned State

7.4.1 Motivation

7.4.2 High-Level Protocol Specification

7.4.3 Detailed Protocol Specification

7.4.4 Running Example

7.5 Non-Atomic Bus

7.5.1 Motivation

7.5.2 In-Order vs. Out-of-Order Responses

7.5.3 Non-Atomic System Model

7.5.4 An MSI Protocol with a Split-Transaction Bus

7.5.5 An Optimized, Non-Stalling MSI Protocol with a Split-Transaction Bus

7.6 Optimizations to the Bus Interconnection Network

7.6.1 Separate Non-Bus Network for Data Responses

7.6.2 Logical Bus for Coherence Requests

7.7 Case Studies

7.7.1 Sun Starfire E10000

7.7.2 IBM Power5

7.8 Discussion and the Future of Snooping

7.9 References

8. Directory Coherence Protocols

8.1 Introduction to Directory Protocols

8.2 Baseline Directory System

8.2.1 Directory System Model

8.2.2 High-Level Protocol Specification

8.2.3 Avoiding Deadlock

8.2.4 Detailed Protocol Specification

8.2.5 Protocol Operation

8.2.6 Protocol Simplifications

8.3 Adding the Exclusive State

8.3.1 High-Level Protocol Specification

8.3.2 Detailed Protocol Specification

8.4 Adding the Owned State

8.4.1 High-Level Protocol Specification

8.4.2 Detailed Protocol Specification

8.5 Representing Directory State

8.5.1 Coarse Directory

8.5.2 Limited Pointer Directory

8.6 Directory Organization

8.6.1 Directory Cache Backed by DRAM

8.6.2 Inclusive Directory Caches

8.6.3 Null Directory Cache (with no backing store)

8.7 Performance and Scalability Optimizations

8.7.1 Distributed Directories

8.7.2 Non-Stalling Directory Protocols

8.7.3 Interconnection Networks without Point-to-Point Ordering

8.7.4 Silent vs. Non-Silent Evictions of Blocks in State S

8.8 Case Studies

8.8.1 SGI Origin 2000

8.8.2 Coherent HyperTransport

8.8.3 HyperTransport Assist

8.8.4 Intel QPI

8.9 Discussion and the Future of Directory Protocols

8.10 References

9. Advanced Topics in Coherence

9.1 System Models

9.1.1 Instruction Caches

9.1.2 Translation Lookaside Buffers (TLBs)

9.1.3 Virtual Caches

9.1.4 Write-Through Caches

9.1.5 Coherent Direct Memory Access (DMA)

9.1.6 Multi-Level Caches and Hierarchical Coherence Protocols

9.2 Performance Optimizations

9.2.1 Migratory Sharing Optimization

9.2.2 False Sharing Optimizations

9.3 Maintaining Liveness

9.3.1 Deadlock

9.3.2 Livelock

9.3.3 Starvation

9.4 Token Coherence

9.5 The Future of Coherence

9.6 References

Author Biographies

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE

Series Editor: Mark D. Hill, University of Wisconsin

A Primer on Memory Consistency and Cache Coherence

Daniel J. Sorin, Duke University

Mark D. Hill and David A. Wood, University of Wisconsin, Madison

Series ISSN: 1935-3235

Many modern computer systems and most multicore chips (chip multiprocessors) support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved as well as a variety of solutions. We present both high-level concepts as well as specific, concrete examples from real-world systems.

About SYNTHESIS

This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com

Morgan & Claypool Publishers

ISBN: 978-1-60845-564-5

A Primer on Memory Consistency and Cache Coherence

Synthesis Lectures on Computer Architecture

Editor: Mark D. Hill, University of Wisconsin

Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. The scope will largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and ASPLOS.

A Primer on Memory Consistency and Cache Coherence. Daniel J. Sorin, Mark D. Hill, and David A. Wood. 2011.

Dynamic Binary Modification: Tools, Techniques, and Applications. Kim Hazelwood. 2011.

Quantum Computing for Computer Architects, Second Edition. Tzvetan S. Metodi, Arvin I. Faruque, Frederic T. Chong. 2011.

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities. Dennis Abts, John Kim. 2011.

Processor Microarchitecture: An Implementation Perspective. Antonio González, Fernando Latorre, and Grigorios Magklis. 2011.

Transactional Memory, 2nd edition. Tim Harris, James Larus, and Ravi Rajwar. 2010.

Computer Architecture Performance Evaluation Models. Lieven Eeckhout. 2010.

Introduction to Reconfigurable Supercomputing. Marco Lanzagorta, Stephen Bique, and Robert Rosenberg. 2009.

On-Chip Networks. Natalie Enright Jerger and Li-Shiuan Peh. 2009.

The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It. Bruce Jacob. 2009.

Fault Tolerant Computer Architecture. Daniel J. Sorin. 2009.

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines. Luiz André Barroso and Urs Hölzle. 2009.

Computer Architecture Techniques for Power-Efficiency. Stefanos Kaxiras and Margaret Martonosi. 2008.

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency. Kunle Olukotun, Lance Hammond, and James Laudon. 2007.

Transactional Memory. James R. Larus and Ravi Rajwar. 2006.

Quantum Computing for Computer Architects. Tzvetan S. Metodi and Frederic T. Chong. 2006.

Copyright © 2011 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other), except for brief quotations in printed reviews, without the prior permission of the publisher.

A Primer on Memory Consistency and Cache Coherence

Daniel J. Sorin, Mark D. Hill, and David A. Wood

www.morganclaypool.com

ISBN: 9781608455645 (paperback)

ISBN: 9781608455652 (ebook)

DOI: 10.2200/S00346ED1V01Y201104CAC016

A Publication in the Morgan & Claypool Publishers series SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE

Lecture #16

Series Editor: Mark D. Hill, University of Wisconsin

Series ISSN: 1935-3235 (print), 1935-3243 (electronic)

A Primer on Memory Consistency and Cache Coherence

Daniel J. Sorin, Mark D. Hill, and David A. Wood

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #16

ABSTRACT

Many modern computer systems and most multicore chips (chip multiprocessors) support shared memory in hardware. In a shared memory system, each of the processor cores may read and write to a single shared address space. For a shared memory machine, the memory consistency model defines the architecturally visible behavior of its memory system. Consistency definitions provide rules about loads and stores (or memory reads and writes) and how they act upon memory. As part of supporting a memory consistency model, many machines also provide cache coherence protocols that ensure that multiple cached copies of data are kept up-to-date. The goal of this primer is to provide readers with a basic understanding of consistency and coherence. This understanding includes both the issues that must be solved as well as a variety of solutions. We present both high-level concepts as well as specific, concrete examples from real-world systems.

KEYWORDS

computer architecture, memory consistency, cache coherence, shared memory, memory systems, multicore processor, multiprocessor
