GPU Architecture Explained: For Readers Already Familiar with Computer Architecture.pdf

Published: 2022-05-29 | Posted by: admin | Category: Manuals | File size: 1.44M | Format: pdf

Preface

Acknowledgments

Introduction

The Landscape of Computation Accelerators

GPU Hardware Basics

A Brief History of GPUs

Book Outline

Programming Model

Execution Model

GPU Instruction Set Architectures

NVIDIA GPU Instruction Set Architectures

AMD Graphics Core Next Instruction Set Architecture

The SIMT Core: Instruction and Register Data Flow

One-Loop Approximation

SIMT Execution Masking

SIMT Deadlock and Stackless SIMT Architectures

Warp Scheduling

Two-Loop Approximation

Three-Loop Approximation

Operand Collector

Instruction Replay: Handling Structural Hazards

Research Directions on Branch Divergence

Warp Compaction

Intra-Warp Divergent Path Management

Adding MIMD Capability

Complexity-Effective Divergence Management

Research Directions on Scalarization and Affine Execution

Detection of Uniform or Affine Variables

Exploiting Uniform or Affine Variables in GPU

Research Directions on Register File Architecture

Hierarchical Register File

Drowsy State Register File

Partitioned Register File

RegLess

Memory System

First-Level Memory Structures

Scratchpad Memory and L1 Data Cache

L1 Texture Cache

Unified Texture and Data Cache

On-Chip Interconnection Network

Memory Partition Unit

L2 Cache

Atomic Operations

Memory Access Scheduler

Research Directions for GPU Memory Systems

Memory Access Scheduling and Interconnection Network Design

Caching Effectiveness

Memory Request Prioritization and Cache Bypassing

Exploiting Inter-Warp Heterogeneity

Coordinated Cache Bypassing

Adaptive Cache Management

Cache Prioritization

Virtual Memory Page Placement

Data Placement

Multi-Chip-Module GPUs

Crosscutting Research on GPU Computing Architectures

Thread Scheduling

Research on Assignment of Threadblocks to Cores

Research on Cycle-by-Cycle Scheduling Decisions

Research on Scheduling Multiple Kernels

Fine-Grain Synchronization Aware Scheduling

Alternative Ways of Expressing Parallelism

Support for Transactional Memory

Kilo TM

Warp TM and Temporal Conflict Detection

Heterogeneous Systems

Bibliography

Authors' Biographies

Synthesis Lectures on Computer Architecture
Series ISSN: 1935-3235
Series Editor: Margaret Martonosi, Princeton University

General-Purpose Graphics Processor Architectures
Tor M. Aamodt, University of British Columbia
Wilson Wai Lun Fung, Samsung Electronics
Timothy G. Rogers, Purdue University

Originally developed to support video games, graphics processor units (GPUs) are now increasingly used for general-purpose (non-graphics) applications ranging from machine learning to mining of cryptographic currencies. GPUs can achieve improved performance and efficiency versus central processing units (CPUs) by dedicating a larger fraction of hardware resources to computation. In addition, their general-purpose programmability makes contemporary GPUs appealing to software developers in comparison to domain-specific accelerators.

This book provides an introduction to those interested in studying the architecture of GPUs that support general-purpose computing. It collects together information currently only found among a wide range of disparate sources. The authors led development of the GPGPU-Sim simulator widely used in academic research on GPU architectures.

The first chapter of this book describes the basic hardware structure of GPUs and provides a brief overview of their history. Chapter 2 provides a summary of GPU programming models relevant to the rest of the book. Chapter 3 explores the architecture of GPU compute cores. Chapter 4 explores the architecture of the GPU memory system. After describing the architecture of existing systems, Chapters 3 and 4 provide an overview of related research. Chapter 5 summarizes cross-cutting research impacting both the compute core and memory system.

This book should provide a valuable resource for those wishing to understand the architecture of graphics processor units (GPUs) used for acceleration of general-purpose applications and to those who want to obtain an introduction to the rapidly growing body of research exploring how to improve the architecture of these GPUs.

About SYNTHESIS
This volume is a printed version of a work that appears in the Synthesis Digital Library of Engineering and Computer Science. Synthesis books provide concise, original presentations of important research and development topics, published quickly, in digital and print formats.

Morgan & Claypool
store.morganclaypool.com

General-Purpose Graphics Processor Architectures

Synthesis Lectures on Computer Architecture

Editor
Margaret Martonosi, Princeton University

Founding Editor Emeritus
Mark D. Hill, University of Wisconsin, Madison

Synthesis Lectures on Computer Architecture publishes 50- to 100-page publications on topics pertaining to the science and art of designing, analyzing, selecting and interconnecting hardware components to create computers that meet functional, performance and cost goals. The scope will largely follow the purview of premier computer architecture conferences, such as ISCA, HPCA, MICRO, and ASPLOS.

General-Purpose Graphics Processor Architectures
Tor M. Aamodt, Wilson Wai Lun Fung, and Timothy G. Rogers
2018

Compiling Algorithms for Heterogeneous Systems
Steven Bell, Jing Pu, James Hegarty, and Mark Horowitz
2018

Architectural and Operating System Support for Virtual Memory
Abhishek Bhattacharjee and Daniel Lustig
2017

Deep Learning for Computer Architects
Brandon Reagen, Robert Adolf, Paul Whatmough, Gu-Yeon Wei, and David Brooks
2017

On-Chip Networks, Second Edition
Natalie Enright Jerger, Tushar Krishna, and Li-Shiuan Peh
2017

Space-Time Computing with Temporal Neural Networks
James E. Smith
2017

Hardware and Software Support for Virtualization
Edouard Bugnion, Jason Nieh, and Dan Tsafrir
2017

Datacenter Design and Management: A Computer Architect’s Perspective
Benjamin C. Lee
2016

A Primer on Compression in the Memory Hierarchy
Somayeh Sardashti, Angelos Arelakis, Per Stenström, and David A. Wood
2015

Research Infrastructures for Hardware Accelerators
Yakun Sophia Shao and David Brooks
2015

Analyzing Analytics
Rajesh Bordawekar, Bob Blainey, and Ruchir Puri
2015

Customizable Computing
Yu-Ting Chen, Jason Cong, Michael Gill, Glenn Reinman, and Bingjun Xiao
2015

Die-stacking Architecture
Yuan Xie and Jishen Zhao
2015

Single-Instruction Multiple-Data Execution
Christopher J. Hughes
2015

Power-Efficient Computer Architectures: Recent Advances
Magnus Själander, Margaret Martonosi, and Stefanos Kaxiras
2014

FPGA-Accelerated Simulation of Computer Systems
Hari Angepat, Derek Chiou, Eric S. Chung, and James C. Hoe
2014

A Primer on Hardware Prefetching
Babak Falsafi and Thomas F. Wenisch
2014

On-Chip Photonic Interconnects: A Computer Architect’s Perspective
Christopher J. Nitta, Matthew K. Farrens, and Venkatesh Akella
2013

Optimization and Mathematical Modeling in Computer Architecture
Tony Nowatzki, Michael Ferris, Karthikeyan Sankaralingam, Cristian Estan, Nilay Vaish, and David Wood
2013

Security Basics for Computer Architects
Ruby B. Lee
2013

The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Second Edition
Luiz André Barroso, Jimmy Clidaras, and Urs Hölzle
2013

Shared-Memory Synchronization
Michael L. Scott
2013

Resilient Architecture Design for Voltage Variation
Vijay Janapa Reddi and Meeta Sharma Gupta
2013

Multithreading Architecture
Mario Nemirovsky and Dean M. Tullsen
2013

Performance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, and Wen-mei Hwu
2012

Automatic Parallelization: An Overview of Fundamental Compiler Techniques
Samuel P. Midkiff
2012

Phase Change Memory: From Devices to Systems
Moinuddin K. Qureshi, Sudhanva Gurumurthi, and Bipin Rajendran
2011

Multi-Core Cache Hierarchies
Rajeev Balasubramonian, Norman P. Jouppi, and Naveen Muralimanohar
2011

A Primer on Memory Consistency and Cache Coherence
Daniel J. Sorin, Mark D. Hill, and David A. Wood
2011

Dynamic Binary Modification: Tools, Techniques, and Applications
Kim Hazelwood
2011

Quantum Computing for Computer Architects, Second Edition
Tzvetan S. Metodi, Arvin I. Faruque, and Frederic T. Chong
2011

High Performance Datacenter Networks: Architectures, Algorithms, and Opportunities
Dennis Abts and John Kim
2011

Processor Microarchitecture: An Implementation Perspective
Antonio González, Fernando Latorre, and Grigorios Magklis
2010

Transactional Memory, Second Edition
Tim Harris, James Larus, and Ravi Rajwar
2010

Computer Architecture Performance Evaluation Methods
Lieven Eeckhout
2010

Introduction to Reconfigurable Supercomputing
Marco Lanzagorta, Stephen Bique, and Robert Rosenberg
2009

On-Chip Networks
Natalie Enright Jerger and Li-Shiuan Peh
2009

The Memory System: You Can’t Avoid It, You Can’t Ignore It, You Can’t Fake It
Bruce Jacob
2009

Fault Tolerant Computer Architecture
Daniel J. Sorin
2009
