OpenCL Programming Guide (English edition).pdf

Contents
Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors
Part I: The OpenCL 1.1 Language and API
1. An Introduction to OpenCL
What Is OpenCL, or . . . Why You Need This Book
Our Many-Core Future: Heterogeneous Platforms
Software in a Many-Core World
Conceptual Foundations of OpenCL
OpenCL and Graphics
The Contents of OpenCL
The Embedded Profile
Learning OpenCL
2. HelloWorld: An OpenCL Example
Building the Examples
HelloWorld Example
Checking for Errors in OpenCL
3. Platforms, Contexts, and Devices
OpenCL Platforms
OpenCL Devices
OpenCL Contexts
4. Programming with OpenCL C
Writing a Data-Parallel Kernel Using OpenCL C
Scalar Data Types
Vector Data Types
Other Data Types
Derived Types
Implicit Type Conversions
Explicit Casts
Explicit Conversions
Reinterpreting Data as Another Type
Vector Operators
Qualifiers
Keywords
Preprocessor Directives and Macros
Restrictions
5. OpenCL C Built-In Functions
Work-Item Functions
Math Functions
Integer Functions
Common Functions
Geometric Functions
Relational Functions
Vector Data Load and Store Functions
Synchronization Functions
Async Copy and Prefetch Functions
Atomic Functions
Miscellaneous Vector Functions
Image Read and Write Functions
6. Programs and Kernels
Program and Kernel Object Overview
Program Objects
Kernel Objects
7. Buffers and Sub-Buffers
Memory Objects, Buffers, and Sub-Buffers Overview
Creating Buffers and Sub-Buffers
Querying Buffers and Sub-Buffers
Reading, Writing, and Copying Buffers and Sub-Buffers
Mapping Buffers and Sub-Buffers
8. Images and Samplers
Image and Sampler Object Overview
Creating Image Objects
Creating Sampler Objects
OpenCL C Functions for Working with Images
Transferring Image Objects
9. Events
Commands, Queues, and Events Overview
Events and Command-Queues
Event Objects
Generating Events on the Host
Events Impacting Execution on the Host
Using Events for Profiling
Events Inside Kernels
Events from Outside OpenCL
10. Interoperability with OpenGL
OpenCL/OpenGL Sharing Overview
Querying for the OpenGL Sharing Extension
Initializing an OpenCL Context for OpenGL Interoperability
Creating OpenCL Buffers from OpenGL Buffers
Creating OpenCL Image Objects from OpenGL Textures
Querying Information about OpenGL Objects
Synchronization between OpenGL and OpenCL
11. Interoperability with Direct3D
Direct3D/OpenCL Sharing Overview
Initializing an OpenCL Context for Direct3D Interoperability
Creating OpenCL Memory Objects from Direct3D Buffers and Textures
Acquiring and Releasing Direct3D Objects in OpenCL
Processing a Direct3D Texture in OpenCL
Processing D3D Vertex Data in OpenCL
12. C++ Wrapper API
C++ Wrapper API Overview
C++ Wrapper API Exceptions
Vector Add Example Using the C++ Wrapper API
13. OpenCL Embedded Profile
OpenCL Profile Overview
64-Bit Integers
Images
Built-In Atomic Functions
Mandated Minimum Single-Precision Floating-Point Capabilities
Determining the Profile Supported by a Device in an OpenCL C Program
Part II: OpenCL 1.1 Case Studies
14. Image Histogram
Computing an Image Histogram
Parallelizing the Image Histogram
Additional Optimizations to the Parallel Image Histogram
Computing Histograms with Half-Float or Float Values for Each Channel
15. Sobel Edge Detection Filter
What Is a Sobel Edge Detection Filter?
Implementing the Sobel Filter as an OpenCL Kernel
16. Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm
Graph Data Structures
Kernels
Leveraging Multiple Compute Devices
17. Cloth Simulation in the Bullet Physics SDK
An Introduction to Cloth Simulation
Simulating the Soft Body
Executing the Simulation on the CPU
Changes Necessary for Basic GPU Execution
Two-Layered Batching
Optimizing for SIMD Computation and Local Memory
Adding OpenGL Interoperation
18. Simulating the Ocean with Fast Fourier Transform
An Overview of the Ocean Application
Phillips Spectrum Generation
An OpenCL Discrete Fourier Transform
A Closer Look at the FFT Kernel
A Closer Look at the Transpose Kernel
19. Optical Flow
Optical Flow Problem Overview
Sub-Pixel Accuracy with Hardware Linear Interpolation
Application of the Texture Cache
Using Local Memory
Early Exit and Hardware Scheduling
Efficient Visualization with OpenGL Interop
Performance
20. Using OpenCL with PyOpenCL
Introducing PyOpenCL
Running the PyImageFilter2D Example
PyImageFilter2D Code
Context and Command-Queue Creation
Loading to an Image Object
Creating and Building a Program
Setting Kernel Arguments and Executing a Kernel
Reading the Results
21. Matrix Multiplication with OpenCL
The Basic Matrix Multiplication Algorithm
A Direct Translation into OpenCL
Increasing the Amount of Work per Kernel
Optimizing Memory Movement: Local Memory
Performance Results and Optimizing the Original CPU Code
22. Sparse Matrix-Vector Multiplication
Sparse Matrix-Vector Multiplication (SpMV) Algorithm
Description of This Implementation
Tiled and Packetized Sparse Matrix Representation
Header Structure
Tiled and Packetized Sparse Matrix Design Considerations
Optional Team Information
Tested Hardware Devices and Results
Additional Areas of Optimization
A. Summary of OpenCL 1.1
The OpenCL Platform Layer
Contexts
Querying Platform Information and Devices
The OpenCL Runtime
Command-Queues
Buffer Objects
Create Buffer Objects
Read, Write, and Copy Buffer Objects
Map Buffer Objects
Manage Buffer Objects
Query Buffer Objects
Program Objects
Create Program Objects
Build Program Executable
Build Options
Query Program Objects
Unload the OpenCL Compiler
Kernel and Event Objects
Create Kernel Objects
Kernel Arguments and Object Queries
Execute Kernels
Event Objects
Out-of-Order Execution of Kernels and Memory Object Commands
Profiling Operations
Flush and Finish
Supported Data Types
Built-In Scalar Data Types
Built-In Vector Data Types
Other Built-In Data Types
Reserved Data Types
Vector Component Addressing
Vector Components
Vector Addressing Equivalencies
Conversions and Type Casting Examples
Operators
Address Space Qualifiers
Function Qualifiers
Preprocessor Directives and Macros
Specify Type Attributes
Math Constants
Work-Item Built-In Functions
Integer Built-In Functions
Common Built-In Functions
Math Built-In Functions
Geometric Built-In Functions
Relational Built-In Functions
Vector Data Load/Store Functions
Atomic Functions
Async Copies and Prefetch Functions
Synchronization, Explicit Memory Fence
Miscellaneous Vector Built-In Functions
Image Read and Write Built-In Functions
Image Objects
Create Image Objects
Query List of Supported Image Formats
Copy between Image, Buffer Objects
Map and Unmap Image Objects
Read, Write, Copy Image Objects
Query Image Objects
Image Formats
Access Qualifiers
Sampler Objects
Sampler Declaration Fields
OpenCL Device Architecture Diagram
OpenCL/OpenGL Sharing APIs
CL Buffer Objects > GL Buffer Objects
CL Image Objects > GL Textures
CL Image Objects > GL Renderbuffers
Query Information
Share Objects
CL Event Objects > GL Sync Objects
CL Context > GL Context, Sharegroup
OpenCL/Direct3D 10 Sharing APIs
Index
A
B
C
D
E
F
G
H
I
K
L
M
N
O
P
Q
R
S
T
U
V
W
Z
OpenCL Programming Guide
OpenGL® Series
Visit informit.com/opengl for a complete list of available products.
The OpenGL graphics system is a software interface to graphics hardware. (“GL” stands for “Graphics Library.”) It allows you to create interactive programs that produce color images of moving, three-dimensional objects. With OpenGL, you can control computer-graphics technology to produce realistic pictures, or ones that depart from reality in imaginative ways.
The OpenGL Series from Addison-Wesley Professional comprises tutorial and reference books that help programmers gain a practical understanding of OpenGL standards, along with the insight needed to unlock OpenGL’s full potential.
OpenCL Programming Guide
Aaftab Munshi
Benedict R. Gaster
Timothy G. Mattson
James Fung
Dan Ginsburg
Upper Saddle River, NJ • Boston • Indianapolis • San Francisco
New York • Toronto • Montreal • London • Munich • Paris • Madrid
Capetown • Sydney • Tokyo • Singapore • Mexico City
Editor-in-Chief: Mark Taub
Acquisitions Editor: Debra Williams Cauley
Development Editor: Michael Thurston
Managing Editor: John Fuller
Project Editor: Anna Popick
Copy Editor: Barbara Wood
Indexer: Jack Lewis
Proofreader: Lori Newhouse
Technical Reviewers: Andrew Brownsword, Yahya H. Mizra, Dave Shreiner
Publishing Coordinator: Kim Boedigheimer
Cover Designer: Alan Clements
Compositor: The CIP Group
Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals.
The authors and publisher have taken care in the preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein.
The publisher offers excellent discounts on this book when ordered in quantity for bulk purchases or special sales, which may include electronic versions and/or custom covers and content particular to your business, training goals, marketing focus, and branding interests. For more information, please contact:
U.S. Corporate and Government Sales
(800) 382-3419
corpsales@pearsontechgroup.com
For sales outside the United States, please contact:
International Sales
international@pearson.com
Visit us on the Web: informit.com/aw
Cataloging-in-publication data is on file with the Library of Congress.
Copyright © 2012 Pearson Education, Inc.
All rights reserved. Printed in the United States of America.
This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic, mechanical, photocopying, recording, or likewise. For information regarding permissions, write to:
Pearson Education, Inc.
Rights and Contracts Department
501 Boylston Street, Suite 900
Boston, MA 02116
Fax: (617) 671-3447
ISBN-13: 978-0-321-74964-2
ISBN-10: 0-321-74964-2
Text printed in the United States on recycled paper at Edwards Brothers in Ann Arbor, Michigan.
First printing, July 2011
Contents
Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors
Part I: The OpenCL 1.1 Language and API
1. An Introduction to OpenCL
  What Is OpenCL, or . . . Why You Need This Book
  Our Many-Core Future: Heterogeneous Platforms
  Software in a Many-Core World
  Conceptual Foundations of OpenCL
    Platform Model
    Execution Model
    Memory Model
    Programming Models
  OpenCL and Graphics
  The Contents of OpenCL
    Platform API
    Runtime API
    Kernel Programming Language
    OpenCL Summary
  The Embedded Profile
  Learning OpenCL
2. HelloWorld: An OpenCL Example
  Building the Examples
    Prerequisites
    Mac OS X and Code::Blocks
    Microsoft Windows and Visual Studio
    Linux and Eclipse
  HelloWorld Example
    Choosing an OpenCL Platform and Creating a Context
    Choosing a Device and Creating a Command-Queue
    Creating and Building a Program Object
    Creating Kernel and Memory Objects
    Executing a Kernel
  Checking for Errors in OpenCL
3. Platforms, Contexts, and Devices
  OpenCL Platforms
  OpenCL Devices
  OpenCL Contexts
4. Programming with OpenCL C
  Writing a Data-Parallel Kernel Using OpenCL C
  Scalar Data Types
    The half Data Type
  Vector Data Types
    Vector Literals
    Vector Components
  Other Data Types
  Derived Types
  Implicit Type Conversions
    Usual Arithmetic Conversions
  Explicit Casts
  Explicit Conversions
  Reinterpreting Data as Another Type
  Vector Operators
    Arithmetic Operators
    Relational and Equality Operators
    Bitwise Operators
    Logical Operators
    Conditional Operator
    Shift Operators
    Unary Operators
    Assignment Operator
  Qualifiers
    Function Qualifiers
    Kernel Attribute Qualifiers
    Address Space Qualifiers
    Access Qualifiers
    Type Qualifiers
  Keywords
  Preprocessor Directives and Macros
    Pragma Directives
    Macros
  Restrictions
5. OpenCL C Built-In Functions
  Work-Item Functions
  Math Functions
    Floating-Point Pragmas
    Floating-Point Constants
    Relative Error as ulps
  Integer Functions
  Common Functions
  Geometric Functions
  Relational Functions
  Vector Data Load and Store Functions
  Synchronization Functions
  Async Copy and Prefetch Functions
  Atomic Functions
  Miscellaneous Vector Functions
  Image Read and Write Functions
    Reading from an Image
    Samplers
    Determining the Border Color