
CUDA Programming A Developer_'s Guide to Parallel Computing with....pdf

Front Cover
CUDA Programming: A Developer’s Guide to Parallel Computing with GPUs
Copyright
Contents
Preface
Chapter 1 - A Short History of Supercomputing
INTRODUCTION
VON NEUMANN ARCHITECTURE
CRAY
CONNECTION MACHINE
CELL PROCESSOR
MULTINODE COMPUTING
THE EARLY DAYS OF GPGPU CODING
THE DEATH OF THE SINGLE-CORE SOLUTION
NVIDIA AND CUDA
GPU HARDWARE
ALTERNATIVES TO CUDA
CONCLUSION
Chapter 2 - Understanding Parallelism with GPUs
INTRODUCTION
TRADITIONAL SERIAL CODE
SERIAL/PARALLEL PROBLEMS
CONCURRENCY
TYPES OF PARALLELISM
FLYNN’S TAXONOMY
SOME COMMON PARALLEL PATTERNS
CONCLUSION
Chapter 3 - CUDA Hardware Overview
PC ARCHITECTURE
GPU HARDWARE
CPUS AND GPUS
COMPUTE LEVELS
Chapter 4 - Setting Up CUDA
INTRODUCTION
INSTALLING THE SDK UNDER WINDOWS
VISUAL STUDIO
LINUX
MAC
INSTALLING A DEBUGGER
COMPILATION MODEL
ERROR HANDLING
CONCLUSION
Chapter 5 - Grids, Blocks, and Threads
WHAT IT ALL MEANS
THREADS
BLOCKS
GRIDS
WARPS
BLOCK SCHEDULING
A PRACTICAL EXAMPLE—HISTOGRAMS
CONCLUSION
Chapter 6 - Memory Handling with CUDA
INTRODUCTION
CACHES
REGISTER USAGE
SHARED MEMORY
CONSTANT MEMORY
GLOBAL MEMORY
TEXTURE MEMORY
CONCLUSION
Chapter 7 - Using CUDA in Practice
INTRODUCTION
SERIAL AND PARALLEL CODE
PROCESSING DATASETS
PROFILING
AN EXAMPLE USING AES
CONCLUSION
References
Chapter 8 - Multi-CPU and Multi-GPU Solutions
INTRODUCTION
LOCALITY
MULTI-CPU SYSTEMS
MULTI-GPU SYSTEMS
ALGORITHMS ON MULTIPLE GPUS
WHICH GPU?
SINGLE-NODE SYSTEMS
STREAMS
MULTIPLE-NODE SYSTEMS
CONCLUSION
Chapter 9 - Optimizing Your Application
STRATEGY 1: PARALLEL/SERIAL GPU/CPU PROBLEM BREAKDOWN
STRATEGY 2: MEMORY CONSIDERATIONS
STRATEGY 3: TRANSFERS
STRATEGY 4: THREAD USAGE, CALCULATIONS, AND DIVERGENCE
STRATEGY 5: ALGORITHMS
STRATEGY 6: RESOURCE CONTENTIONS
STRATEGY 7: SELF-TUNING APPLICATIONS
CONCLUSION
Chapter 10 - Libraries and SDK
INTRODUCTION
LIBRARIES
CUDA COMPUTING SDK
DIRECTIVE-BASED PROGRAMMING
WRITING YOUR OWN KERNELS
CONCLUSION
Chapter 11 - Designing GPU-Based Systems
INTRODUCTION
CPU PROCESSOR
GPU DEVICE
PCI-E BUS
GEFORCE CARDS
CPU MEMORY
AIR COOLING
LIQUID COOLING
DESKTOP CASES AND MOTHERBOARDS
MASS STORAGE
POWER CONSIDERATIONS
OPERATING SYSTEMS
CONCLUSION
Chapter 12 - Common Problems, Causes, and Solutions
INTRODUCTION
ERRORS WITH CUDA DIRECTIVES
PARALLEL PROGRAMMING ISSUES
ALGORITHMIC ISSUES
FINDING AND AVOIDING ERRORS
DEVELOPING FOR FUTURE GPUS
FURTHER RESOURCES
CONCLUSION
References
Index
CUDA Programming
A Developer’s Guide to Parallel Computing with GPUs

Shane Cook

AMSTERDAM, BOSTON, HEIDELBERG, LONDON, NEW YORK, OXFORD, PARIS, SAN DIEGO, SAN FRANCISCO, SINGAPORE, SYDNEY, TOKYO

Morgan Kaufmann is an Imprint of Elsevier
Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: Andre Cuello
Designer: Kristen Davis

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2013 Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices

Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices, may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of products liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Application submitted

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-415933-4

For information on all MK publications visit our website at http://store.elsevier.com

Printed in the United States of America
13 14 10 9 8 7 6 5 4 3 2 1
Contents

Preface ... xiii

CHAPTER 1 A Short History of Supercomputing ... 1
Introduction ... 1
Von Neumann Architecture ... 2
Cray ... 5
Connection Machine ... 6
Cell Processor ... 7
Multinode Computing ... 9
The Early Days of GPGPU Coding ... 11
The Death of the Single-Core Solution ... 12
NVIDIA and CUDA ... 13
GPU Hardware ... 15
Alternatives to CUDA ... 16
    OpenCL ... 16
    DirectCompute ... 17
    CPU alternatives ... 17
    Directives and libraries ... 18
Conclusion ... 19

CHAPTER 2 Understanding Parallelism with GPUs ... 21
Introduction ... 21
Traditional Serial Code ... 21
Serial/Parallel Problems ... 23
Concurrency ... 24
    Locality ... 25
Types of Parallelism ... 27
    Task-based parallelism ... 27
    Data-based parallelism ... 28
Flynn’s Taxonomy ... 30
Some Common Parallel Patterns ... 31
    Loop-based patterns ... 31
    Fork/join pattern ... 33
    Tiling/grids ... 35
    Divide and conquer ... 35
Conclusion ... 36

CHAPTER 3 CUDA Hardware Overview ... 37
PC Architecture ... 37
GPU Hardware ... 42
CPUs and GPUs ... 46
Compute Levels ... 46
    Compute 1.0 ... 47
    Compute 1.1 ... 47
    Compute 1.2 ... 49
    Compute 1.3 ... 49
    Compute 2.0 ... 49
    Compute 2.1 ... 51

CHAPTER 4 Setting Up CUDA ... 53
Introduction ... 53
Installing the SDK under Windows ... 53
Visual Studio ... 54
    Projects ... 55
    64-bit users ... 55
    Creating projects ... 57
Linux ... 58
    Kernel base driver installation (CentOS, Ubuntu 10.4) ... 59
Mac ... 62
Installing a Debugger ... 62
Compilation Model ... 66
Error Handling ... 67
Conclusion ... 68

CHAPTER 5 Grids, Blocks, and Threads ... 69
What it all Means ... 69
Threads ... 69
    Problem decomposition ... 69
    How CPUs and GPUs are different ... 71
    Task execution model ... 72
    Threading on GPUs ... 73
    A peek at hardware ... 74
    CUDA kernels ... 77
Blocks ... 78
    Block arrangement ... 80
Grids ... 83
    Stride and offset ... 84
    X and Y thread indexes ... 85
Warps ... 91
    Branching ... 92
    GPU utilization ... 93
Block Scheduling ... 95
A Practical Example - Histograms ... 97
Conclusion ... 103
    Questions ... 104
    Answers ... 104

CHAPTER 6 Memory Handling with CUDA ... 107
Introduction ... 107
Caches ... 108
    Types of data storage ... 110
Register Usage ... 111
Shared Memory ... 120
    Sorting using shared memory ... 121
    Radix sort ... 125
    Merging lists ... 131
    Parallel merging ... 137
    Parallel reduction ... 140
    A hybrid approach ... 144
    Shared memory on different GPUs ... 148
    Shared memory summary ... 148
    Questions on shared memory ... 149
    Answers for shared memory ... 149
Constant Memory ... 150
    Constant memory caching ... 150
    Constant memory broadcast ... 152
    Constant memory updates at runtime ... 162
    Constant question ... 166
    Constant answer ... 167
Global Memory ... 167
    Score boarding ... 176
    Global memory sorting ... 176
    Sample sort ... 179
    Questions on global memory ... 198
    Answers on global memory ... 199
Texture Memory ... 200
    Texture caching ... 200
    Hardware manipulation of memory fetches ... 200
    Restrictions using textures ... 201
Conclusion ... 202

CHAPTER 7 Using CUDA in Practice ... 203
Introduction ... 203
Serial and Parallel Code ... 203
    Design goals of CPUs and GPUs ... 203