Heterogeneous Computing with OpenCL 2011.pdf

Front Cover
Heterogeneous Computing with OpenCL
Copyright
Contents
Foreword
Preface
Our Heterogeneous World
OpenCL
This Text
Acknowledgments
About the Authors
Chapter 1: Introduction to Parallel Programming
Introduction
OpenCL
The Goals of This Book
Thinking Parallel
Concurrency and Parallel Programming Models
Structure
Reference
Further Reading and Relevant Websites
Chapter 2: Introduction to OpenCL
Introduction
Platform and Devices
The Execution Environment
Memory Model
Writing Kernels
Full Source Code Example for Vector Addition
Summary
Reference
Chapter 3: OpenCL Device Architectures
Introduction
Hardware trade-offs
The architectural design space
Summary
References
Chapter 4: Basic OpenCL Examples
Introduction
Example Applications
Compiling OpenCL Host Applications
Summary
Chapter 5: Understanding OpenCL's Concurrency and Execution Model
Introduction
Kernels, Work-Items, Workgroups, and the Execution Domain
OpenCL Synchronization: Kernels, Fences, and Barriers
Queuing and Global Synchronization
The Host-Side Memory Model
The Device-Side Memory Model
Summary
Chapter 6: Dissecting a CPU/GPU OpenCL Implementation
Introduction
OpenCL on an AMD Phenom II X6
OpenCL on the AMD Radeon HD6970 GPU
Memory Performance Considerations in OpenCL
Summary
References
Chapter 7: OpenCL Case Study
Introduction
Convolution Kernel
Conclusions
Code Listings
Reference
Chapter 8: OpenCL Case Study
Introduction
Getting Video Frames
Processing a Video in OpenCL
Processing Multiple Videos with Multiple Special Effects
Display to Screen of Final Output
Summary
Chapter 9: OpenCL Case Study
Introduction
Choosing the Number of Workgroups
Choosing the Optimal Workgroup Size
Optimizing Global Memory Data Access Patterns
Using Atomics to Perform Local Histogram
Optimizing Local Memory Access
Local Histogram Reduction
The Global Reduction
Full Kernel Code
Performance and Summary
Chapter 10: OpenCL Case Study
Introduction
Overview of the Computation
GPU Implementation
CPU Implementation
Load Balancing
Performance and Summary
Kernel for Uniform Grid Creation
Kernels for Simulation
Chapter 11: OpenCL Extensions
Introduction
Overview of Extension Mechanism
Device Fission
Double Precision
References
Chapter 12: OpenCL Profiling and Debugging
Introduction
Profiling with Events
AMD Accelerated Parallel Processing Profiler
AMD Accelerated Parallel Processing KernelAnalyzer
Walking through the AMD APP Profiler
Debugging OpenCL Applications
Overview of gDEBugger
AMD Printf Extension
Conclusion
Chapter 13: WebCL
Introduction
Designing the Framework
WebCL Pilot Implementation
WebCL Hands-on
Web Photo Editor
Discussion
Summary
Reference
Further Reading and Relevant Websites
Index
Heterogeneous Computing with OpenCL
Benedict Gaster, Lee Howes, David R. Kaeli, Perhaad Mistry, Dana Schaa
Acquiring Editor: Todd Green
Development Editor: Robyn Day
Project Manager: André Cuello
Designer: Joanne Blank

Morgan Kaufmann is an imprint of Elsevier
225 Wyman Street, Waltham, MA 02451, USA

© 2012 Advanced Micro Devices, Inc. Published by Elsevier Inc. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or any information storage and retrieval system, without permission in writing from the publisher. Details on how to seek permission, further information about the Publisher’s permissions policies and our arrangements with organizations such as the Copyright Clearance Center and the Copyright Licensing Agency, can be found at our website: www.elsevier.com/permissions.

This book and the individual contributions contained in it are protected under copyright by the Publisher (other than as may be noted herein).

Notices
Knowledge and best practice in this field are constantly changing. As new research and experience broaden our understanding, changes in research methods or professional practices may become necessary. Practitioners and researchers must always rely on their own experience and knowledge in evaluating and using any information or methods described herein. In using such information or methods they should be mindful of their own safety and the safety of others, including parties for whom they have a professional responsibility.

To the fullest extent of the law, neither the Publisher nor the authors, contributors, or editors, assume any liability for any injury and/or damage to persons or property as a matter of product liability, negligence or otherwise, or from any use or operation of any methods, products, instructions, or ideas contained in the material herein.

Library of Congress Cataloging-in-Publication Data
Heterogeneous computing with OpenCL / Benedict Gaster ... [et al.].
p. cm.
ISBN 978-0-12-387766-6
1. Parallel programming (Computer science) 2. OpenCL (Computer program language) I. Gaster, Benedict.
QA76.642.H48 2012
005.2’752–dc23
2011020169

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

ISBN: 978-0-12-387766-6
For information on all MK publications visit our website at www.mkp.com
Printed in the United States of America
12 13 14 15 10 9 8 7 6 5 4 3 2 1
Contents
Foreword
Preface
Acknowledgments
About the Authors
Chapter 1: Introduction to Parallel Programming
Chapter 2: Introduction to OpenCL
Chapter 3: OpenCL Device Architectures
Chapter 4: Basic OpenCL Examples
Chapter 5: Understanding OpenCL’s Concurrency and Execution Model
Chapter 6: Dissecting a CPU/GPU OpenCL Implementation
Chapter 7: OpenCL Case Study: Convolution
Chapter 8: OpenCL Case Study: Video Processing
Chapter 9: OpenCL Case Study: Histogram
Chapter 10: OpenCL Case Study: Mixed Particle Simulation
Chapter 11: OpenCL Extensions
Chapter 12: OpenCL Profiling and Debugging
Chapter 13: WebCL (this special section contributed by Jari Nikara, Tomi Aarnio, Eero Aho, and Janne Pietiäinen)
Foreword
For more than two decades, the computer industry has been inspired and motivated by the observation made by Gordon Moore (a.k.a. “Moore’s law”) that the density of transistors on a die was doubling every 18 months. This observation created the anticipation that the performance a certain application achieves on one generation of processors will be doubled within two years, when the next generation of processors is announced. Constant improvement in manufacturing and processor technologies was the main driver of this trend, since it allowed any new processor generation to shrink all the transistor’s dimensions within the “golden factor”, 0.3 (ideal shrink), and to reduce the power supply accordingly. Thus, any new processor generation could double the density of transistors and gain a 50% speed improvement (frequency) while consuming the same power and keeping the same power density. When better performance was required, computer architects focused on using the extra transistors to push the frequency beyond what the shrink provided, and to add new architectural features that mainly aimed at gaining performance improvement for existing and new applications.
During the mid-2000s, the transistor size became so small that the “physics of small devices” started to govern the characterization of the entire chip. Thus frequency improvement and density increase could no longer be achieved without a significant increase in power consumption and power density. A recent report by the International Technology Roadmap for Semiconductors (ITRS) supports this observation and indicates that this trend will continue for the foreseeable future and will most likely become the most significant factor affecting technology scaling and the future of computer-based systems.
To cope with the expectation of doubling the performance every known period of time (no longer 2 years), two major changes happened: (1) instead of increasing the frequency, modern processors increase the number of cores on each die. This trend forces the software to change as well: since we cannot expect the hardware to achieve significantly better performance for a given application anymore, we need to develop new implementations of the same application that take advantage of the multicore architecture; and (2) thermal and power become first-class citizens in the design of any future architecture. These trends encourage the community to start looking at heterogeneous solutions: systems assembled from different subsystems, each optimized to achieve different optimization points or to address different workloads. For example, many systems combine “traditional” CPU architecture with special-purpose FPGAs or graphics processors (GPUs). Such integration can be done at different levels, e.g., at the system level, at the board level, and recently at the core level.
Developing software for homogeneous parallel and distributed systems is considered to be a non-trivial task, even though such development uses well-known paradigms and well-established programming languages, development methods, algorithms, debugging tools, etc. Developing software to support general-purpose