Multi-Core Cache Hierarchies.pdf

Preface
Acknowledgments
Basic Elements of Large Cache Design
Shared Vs. Private Caches
Shared LLC
Private LLC
Workload Analysis
Centralized Vs. Distributed Shared Caches
Non-Uniform Cache Access
Inclusion
Organizing Data in CMP Last Level Caches
Data Management for a Large Shared NUCA Cache
Placement/Migration/Search Policies for D-NUCA
Replication Policies in Shared Caches
OS-based Page Placement
Data Management for a Collection of Private Caches
Discussion
Policies Impacting Cache Hit Rates
Cache Partitioning for Throughput and Quality-of-Service
Introduction
Throughput
QoS Policies
Selecting a Highly Useful Population for a Large Shared Cache
Replacement/Insertion Policies
Novel Organizations for Associativity
Block-Level Optimizations
Summary
Interconnection Networks within Large Caches
Basic Large Cache Design
Cache Array Design
Cache Interconnects
Packet-Switched Routed Networks
The Impact of Interconnect Design on NUCA and UCA Caches
NUCA Caches
UCA Caches
Innovative Network Architectures for Large Caches
Technology
Static-RAM Limitations
Parameter Variation
Modeling Methodology
Mitigating the Effects of Process Variation
Tolerating Hard and Soft Errors
Leveraging 3D Stacking to Resolve SRAM Problems
Emerging Technologies
3T1D RAM
Embedded DRAM
Non-Volatile Memories
Concluding Remarks
Bibliography
Authors' Biographies
Series ISSN: 1935-3235

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE
Series Editor: Mark D. Hill, University of Wisconsin

Multi-Core Cache Hierarchies
Rajeev Balasubramonian, University of Utah
Norman Jouppi, HP Labs
Naveen Muralimanohar, HP Labs

A key determinant of overall system performance and power dissipation is the cache hierarchy since access to off-chip memory consumes many more cycles and energy than on-chip accesses. In addition, multi-core processors are expected to place ever higher bandwidth demands on the memory system. All these issues make it important to avoid off-chip memory access by improving the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks connected by a network and shared by many cores. Hence, many important problems must be solved: cache resources must be allocated across many cores, data must be placed in cache banks that are near the accessing core, and the most important data must be identified for retention. Finally, difficulties in scaling existing technologies require adapting to and exploiting new technology constraints.

The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors. It is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research. The book is suitable as a reference for advanced computer architecture classes as well as for experienced researchers and VLSI engineers.

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis Lectures provide concise, original presentations of important research and development topics, published quickly, in digital and print formats. For more information visit www.morganclaypool.com

Morgan & Claypool Publishers
ISBN: 978-1-59829-753-9
Multi-Core Cache Hierarchies
Synthesis Lectures on Computer Architecture

Editor
Mark D. Hill, University of Wisconsin

Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. The scope will largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and ASPLOS.

Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011

A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011

Dynamic Binary Modification: Tools, Techniques, and Applications
Kim Hazelwood
2011

Quantum Computing for Computer Architects, Second Edition
Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Dennis Abts and John Kim
2011

Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2010

Transactional Memory, 2nd edition
Tim Harris, James Larus, and Ravi Rajwar
2010

Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010

Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009

On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009

The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009

Fault Tolerant Computer Architecture
Daniel J. Sorin
2009

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines (free access)
Luiz André Barroso and Urs Hölzle
2009

Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
2008

Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Kunle Olukotun, Lance Hammond, and James Laudon
2007

Transactional Memory
James R. Larus and Ravi Rajwar
2006

Quantum Computing for Computer Architects
Tzvetan S. Metodi and Frederic T. Chong
2006
Copyright © 2011 by Morgan & Claypool

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopy, recording, or any other) except for brief quotations in printed reviews, without the prior permission of the publisher.

Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
www.morganclaypool.com

ISBN: 9781598297539 (paperback)
ISBN: 9781598297546 (ebook)

DOI 10.2200/S00365ED1V01Y201105CAC017

A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE
Lecture #17
Series Editor: Mark D. Hill, University of Wisconsin

Series ISSN
Synthesis Lectures on Computer Architecture
Print 1935-3235
Electronic 1935-3243
Multi-Core Cache Hierarchies

Rajeev Balasubramonian
University of Utah

Norman P. Jouppi
HP Labs

Naveen Muralimanohar
HP Labs

SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #17

Morgan & Claypool Publishers
ABSTRACT

A key determinant of overall system performance and power dissipation is the cache hierarchy since access to off-chip memory consumes many more cycles and energy than on-chip accesses. In addition, multi-core processors are expected to place ever higher bandwidth demands on the memory system. All these issues make it important to avoid off-chip memory access by improving the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks connected by a network and shared by many cores. Hence, many important problems must be solved: cache resources must be allocated across many cores, data must be placed in cache banks that are near the accessing core, and the most important data must be identified for retention. Finally, difficulties in scaling existing technologies require adapting to and exploiting new technology constraints.

The book attempts a synthesis of recent cache research that has focused on innovations for multi-core processors. It is an excellent starting point for early-stage graduate students, researchers, and practitioners who wish to understand the landscape of recent cache research. The book is suitable as a reference for advanced computer architecture classes as well as for experienced researchers and VLSI engineers.

KEYWORDS

computer architecture, multi-core processors, cache hierarchies, shared and private caches, non-uniform cache access (NUCA), quality-of-service, cache partitions, replacement policies, memory prefetch, on-chip networks, memory cells.