
GPU Architecture Explained: For Readers Already Familiar with Computer Architecture.pdf

Preface
Acknowledgments
Introduction
The Landscape of Computation Accelerators
GPU Hardware Basics
A Brief History of GPUs
Book Outline
Programming Model
Execution Model
GPU Instruction Set Architectures
NVIDIA GPU Instruction Set Architectures
AMD Graphics Core Next Instruction Set Architecture
The SIMT Core: Instruction and Register Data Flow
One-Loop Approximation
SIMT Execution Masking
SIMT Deadlock and Stackless SIMT Architectures
Warp Scheduling
Two-Loop Approximation
Three-Loop Approximation
Operand Collector
Instruction Replay: Handling Structural Hazards
Research Directions on Branch Divergence
Warp Compaction
Intra-Warp Divergent Path Management
Adding MIMD Capability
Complexity-Effective Divergence Management
Research Directions on Scalarization and Affine Execution
Detection of Uniform or Affine Variables
Exploiting Uniform or Affine Variables in GPU
Research Directions on Register File Architecture
Hierarchical Register File
Drowsy State Register File
Register File Virtualization
Partitioned Register File
RegLess
Memory System
First-Level Memory Structures
Scratchpad Memory and L1 Data Cache
L1 Texture Cache
Unified Texture and Data Cache
On-Chip Interconnection Network
Memory Partition Unit
L2 Cache
Atomic Operations
Memory Access Scheduler
Research Directions for GPU Memory Systems
Memory Access Scheduling and Interconnection Network Design
Caching Effectiveness
Memory Request Prioritization and Cache Bypassing
Exploiting Inter-Warp Heterogeneity
Coordinated Cache Bypassing
Adaptive Cache Management
Cache Prioritization
Virtual Memory Page Placement
Data Placement
Multi-Chip-Module GPUs
Crosscutting Research on GPU Computing Architectures
Thread Scheduling
Research on Assignment of Threadblocks to Cores
Research on Cycle-by-Cycle Scheduling Decisions
Research on Scheduling Multiple Kernels
Fine-Grain Synchronization Aware Scheduling
Alternative Ways of Expressing Parallelism
Support for Transactional Memory
Kilo TM
Warp TM and Temporal Conflict Detection
Heterogeneous Systems
Bibliography
Authors' Biographies
Synthesis Lectures on Computer Architecture
Series ISSN: 1935-3235
Series Editor: Margaret Martonosi, Princeton University

General-Purpose Graphics Processor Architecture
Tor M. Aamodt, University of British Columbia
Wilson Wai Lun Fung, Samsung Electronics
Timothy G. Rogers, Purdue University

Originally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators. This book provides an introduction to those interested in studying the architecture of GPUs that support general-purpose computing. It collects together information currently only found among a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator widely used in academic research on GPU architectures.

The first chapter of this book describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and memory system.

This book should provide a valuable resource for those wishing to understand the architecture of graphics processor units (GPUs) used for acceleration of general-purpose applications and to those who want to obtain an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis books provide concise, original presentations of important research and development topics, published quickly, in digital and print formats.

store.morganclaypool.com
General-Purpose Graphics Processor Architectures
Synthesis Lectures on Computer Architecture

Editor
Margaret Martonosi, Princeton University

Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison

Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. The scope will largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and ASPLOS.

General-Purpose Graphics Processor Architectures
Tor M. Aamodt, Wilson Wai Lun Fung, and Timothy G. Rogers
2018

Compiling Algorithms for Heterogeneous Systems
Steven Bell, Jing Pu, James Hegarty, and Mark Horowitz
2018

Architectural and Operating System Support for Virtual Memory
Abhishek Bhattacharjee and Daniel Lustig
2017

Deep Learning for Computer Architects
Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks
2017

On-Chip Networks, Second Edition
Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh
2017

Space-Time Computing with Temporal Neural Networks
James E. Smith
2017

Hardware and Software Support for Virtualization
Edouard Bugnion, Jason Nieh, and Dan Tsafrir
2017

Datacenter Design and Management: A Computer Architect’s Perspective
Benjamin C. Lee
2016

A Primer on Compression in the Memory Hierarchy
Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A. Wood
2015

Research Infrastructures for Hardware Accelerators
Yakun Sophia Shao and David Brooks
2015

Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015

Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015

Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015

Single-Instruction Multiple-Data Execution
Christopher J. Hughes
2015

Power-Efficient Computer Architectures: Recent Advances
Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras
2014

FPGA-Accelerated Simulation of Computer Systems
Hari Angepat, Derek Chiou, Eric S. Chung, and James C. Hoe
2014

A Primer on Hardware Prefetching
Babak Falsafi and Thomas F. Wenisch
2014

On-Chip Photonic Interconnects: A Computer Architect’s Perspective
Christopher J. Nitta, Matthew K. Farrens, and Venkatesh Akella
2013

Optimization and Mathematical Modeling in Computer Architecture
Tony Nowatzki, Michael Ferris, Karthikeyan Sankaralingam, Cristian Estan, Nilay Vaish, and David Wood
2013

Security Basics for Computer Architects
Ruby B. Lee
2013

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
2013

Shared-Memory Synchronization
Michael L. Scott
2013

Resilient Architecture Design for Voltage Variation
Vijay Janapa Reddi and Meeta Sharma Gupta
2013

Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013

Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012

Automatic Parallelization: An Overview of Fundamental Compiler Techniques
Samuel P. Midkiff
2012

Phase Change Memory: From Devices to Systems
Moinuddin K. Qureshi, Sudhanva Gurumurthi, and Bipin Rajendran
2011

Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011

A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011

Dynamic Binary Modification: Tools, Techniques, and Applications
Kim Hazelwood
2011

Quantum Computing for Computer Architects, Second Edition
Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Dennis Abts and John Kim
2011

Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2010

Transactional Memory, Second Edition
Tim Harris, James Larus, and Ravi Rajwar
2010

Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010

Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009

On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009

The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009

Fault Tolerant Computer Architecture
Daniel J. Sorin
2009