Synthesis Lectures on
Computer Architecture
Series ISSN: 1935-3235
Series Editor: Margaret Martonosi, Princeton University
General-Purpose Graphics Processor Architecture
Tor M. Aamodt, University of British Columbia
Wilson Wai Lun Fung, Samsung Electronics
Timothy G. Rogers, Purdue University
Originally developed to support video games, graphics processor units (GPUs) are now increasingly used
for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic
currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs)
by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose
programmability makes contemporary GPUs appealing to software developers in comparison to domain-
specific accelerators. This book provides an introduction to those interested in studying the architecture of
GPUs that support general-purpose computing. It collects together information currently only found among
a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator widely used in
academic research on GPU architectures.
The first chapter of this book describes the basic hardware structure of GPUs and provides a brief
overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of
the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of
the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an
overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core
and memory system.
This book should provide a valuable resource for those wishing to understand the architecture of graphics
processor units (GPUs) used for acceleration of general-purpose applications and for those who want
an introduction to the rapidly growing body of research exploring how to improve the architecture of these
GPUs.
About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis
Digital Library of Engineering and Computer Science. Synthesis
books provide concise, original presentations of important research and
development topics, published quickly, in digital and print formats.
store.morganclaypool.com
General-Purpose
Graphics Processor Architectures
Synthesis Lectures on
Computer Architecture
Editor
Margaret Martonosi, Princeton University
Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison
Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics
pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware
components to create computers that meet functional, performance and cost goals. The scope will
largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA,
MICRO, and ASPLOS.
General-Purpose Graphics Processor Architectures
Tor M. Aamodt, Wilson Wai Lun Fung, and Timothy G. Rogers
2018
Compiling Algorithms for Heterogeneous Systems
Steven Bell, Jing Pu, James Hegarty, and Mark Horowitz
2018
Architectural and Operating System Support for Virtual Memory
Abhishek Bhattacharjee and Daniel Lustig
2017
Deep Learning for Computer Architects
Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks
2017
On-Chip Networks, Second Edition
Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh
2017
Space-Time Computing with Temporal Neural Networks
James E. Smith
2017
Hardware and Software Support for Virtualization
Edouard Bugnion, Jason Nieh, and Dan Tsafrir
2017
Datacenter Design and Management: A Computer Architect’s Perspective
Benjamin C. Lee
2016
A Primer on Compression in the Memory Hierarchy
Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A. Wood
2015
Research Infrastructures for Hardware Accelerators
Yakun Sophia Shao and David Brooks
2015
Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015
Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015
Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015
Single-Instruction Multiple-Data Execution
Christopher J. Hughes
2015
Power-Efficient Computer Architectures: Recent Advances
Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras
2014
FPGA-Accelerated Simulation of Computer Systems
Hari Angepat, Derek Chiou, Eric S. Chung, and James C. Hoe
2014
A Primer on Hardware Prefetching
Babak Falsafi and Thomas F. Wenisch
2014
On-Chip Photonic Interconnects: A Computer Architect’s Perspective
Christopher J. Nitta, Matthew K. Farrens, and Venkatesh Akella
2013
Optimization and Mathematical Modeling in Computer Architecture
Tony Nowatzki, Michael Ferris, Karthikeyan Sankaralingam, Cristian Estan, Nilay Vaish, and
David Wood
2013
Security Basics for Computer Architects
Ruby B. Lee
2013
The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale
Machines, Second Edition
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
2013
Shared-Memory Synchronization
Michael L. Scott
2013
Resilient Architecture Design for Voltage Variation
Vijay Janapa Reddi and Meeta Sharma Gupta
2013
Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013
Performance Analysis and Tuning for General Purpose Graphics Processing Units
(GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012
Automatic Parallelization: An Overview of Fundamental Compiler Techniques
Samuel P. Midkiff
2012
Phase Change Memory: From Devices to Systems
Moinuddin K. Qureshi, Sudhanva Gurumurthi, and Bipin Rajendran
2011
Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011
A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011
Dynamic Binary Modification: Tools, Techniques, and Applications
Kim Hazelwood
2011
Quantum Computing for Computer Architects, Second Edition
Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011
High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Dennis Abts and John Kim
2011
Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2010
Transactional Memory, Second Edition
Tim Harris, James Larus, and Ravi Rajwar
2010
Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010
Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009
On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009
The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009
Fault Tolerant Computer Architecture
Daniel J. Sorin
2009