Series ISSN: 1935-3235
SYNTHESIS LECTURES ON
COMPUTER ARCHITECTURE
Series Editor: Mark D. Hill, University of Wisconsin
Multi-Core Cache Hierarchies
Rajeev Balasubramonian, University of Utah
Norman Jouppi, HP Labs
Naveen Muralimanohar, HP Labs
A key determinant of overall system performance and power dissipation is the cache hierarchy,
since access to off-chip memory consumes far more cycles and energy than on-chip
accesses. In addition, multi-core processors are expected to place ever higher bandwidth
demands on the memory system. All these issues make it important to avoid off-chip memory
access by improving the efficiency of the on-chip cache. Future multi-core processors will
have many large cache banks connected by a network and shared by many cores. Hence,
many important problems must be solved: cache resources must be allocated across many
cores, data must be placed in cache banks that are near the accessing core, and the most
important data must be identified for retention. Finally, difficulties in scaling existing
technologies require adapting to and exploiting new technology constraints.
The book attempts a synthesis of recent cache research that has focused on innovations
for multi-core processors. It is an excellent starting point for early-stage graduate students,
researchers, and practitioners who wish to understand the landscape of recent cache research.
The book is suitable as a reference for advanced computer architecture classes as well as for
experienced researchers and VLSI engineers.
Morgan & Claypool Publishers
About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science. Synthesis Lectures
provide concise, original presentations of important research and development
topics, published quickly, in digital and print formats. For more information,
visit www.morganclaypool.com
ISBN: 978-1-59829-753-9
Multi-Core Cache Hierarchies
Synthesis Lectures on Computer Architecture
Editor
Mark D. Hill, University of Wisconsin
Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics
pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. The scope will
largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA,
MICRO, and ASPLOS.
Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011
Dynamic Binary Modification: Tools, Techniques, and Applications
Kim Hazelwood
2011
Quantum Computing for Computer Architects, Second Edition
Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011
High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Dennis Abts and John Kim
2011
Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2010
Transactional Memory, 2nd edition
Tim Harris, James Larus, and Ravi Rajwar
2010
Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010
Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009
On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009
The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009
Fault Tolerant Computer Architecture
Daniel J. Sorin
2009
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines
Luiz André Barroso and Urs Hölzle
2009
Computer Architecture Techniques for Power-Efficiency
Stefanos Kaxiras and Margaret Martonosi
2008
Chip Multiprocessor Architecture: Techniques to Improve Throughput and Latency
Kunle Olukotun, Lance Hammond, and James Laudon
2007
Transactional Memory
James R. Larus and Ravi Rajwar
2006
Quantum Computing for Computer Architects
Tzvetan S. Metodi and Frederic T. Chong
2006
Copyright © 2011 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means—electronic, mechanical, photocopy, recording, or any other except for brief quotations in
printed reviews, without the prior permission of the publisher.
Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
www.morganclaypool.com
ISBN: 9781598297539 paperback
ISBN: 9781598297546 ebook
DOI 10.2200/S00365ED1V01Y201105CAC017
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE
Lecture #17
Series Editor: Mark D. Hill, University of Wisconsin
Series ISSN
Synthesis Lectures on Computer Architecture
Print 1935-3235 Electronic 1935-3243
Multi-Core Cache Hierarchies
Rajeev Balasubramonian
University of Utah
Norman P. Jouppi
HP Labs
Naveen Muralimanohar
HP Labs
SYNTHESIS LECTURES ON COMPUTER ARCHITECTURE #17
Morgan & Claypool Publishers
ABSTRACT
A key determinant of overall system performance and power dissipation is the cache hierarchy,
since access to off-chip memory consumes far more cycles and energy than on-chip accesses.
In addition, multi-core processors are expected to place ever higher bandwidth demands on the
memory system. All these issues make it important to avoid off-chip memory access by improving
the efficiency of the on-chip cache. Future multi-core processors will have many large cache banks
connected by a network and shared by many cores. Hence, many important problems must be solved:
cache resources must be allocated across many cores, data must be placed in cache banks that are near
the accessing core, and the most important data must be identified for retention. Finally, difficulties
in scaling existing technologies require adapting to and exploiting new technology constraints.
The book attempts a synthesis of recent cache research that has focused on innovations for
multi-core processors. It is an excellent starting point for early-stage graduate students, researchers,
and practitioners who wish to understand the landscape of recent cache research. The book is suitable
as a reference for advanced computer architecture classes as well as for experienced researchers and
VLSI engineers.
KEYWORDS
computer architecture, multi-core processors, cache hierarchies, shared and private
caches, non-uniform cache access (NUCA), quality-of-service, cache partitions,
replacement policies, memory prefetch, on-chip networks, memory cells.