Contents
Figures
Tables
Listings
Foreword
Preface
Acknowledgments
About the Authors
Part I: The OpenCL 1.1 Language and API
1. An Introduction to OpenCL
What Is OpenCL, or . . . Why You Need This Book
Our Many-Core Future: Heterogeneous Platforms
Software in a Many-Core World
Conceptual Foundations of OpenCL
OpenCL and Graphics
The Contents of OpenCL
The Embedded Profile
Learning OpenCL
2. HelloWorld: An OpenCL Example
Building the Examples
HelloWorld Example
Checking for Errors in OpenCL
3. Platforms, Contexts, and Devices
OpenCL Platforms
OpenCL Devices
OpenCL Contexts
4. Programming with OpenCL C
Writing a Data-Parallel Kernel Using OpenCL C
Scalar Data Types
Vector Data Types
Other Data Types
Derived Types
Implicit Type Conversions
Explicit Casts
Explicit Conversions
Reinterpreting Data as Another Type
Vector Operators
Qualifiers
Keywords
Preprocessor Directives and Macros
Restrictions
5. OpenCL C Built-In Functions
Work-Item Functions
Math Functions
Integer Functions
Common Functions
Geometric Functions
Relational Functions
Vector Data Load and Store Functions
Synchronization Functions
Async Copy and Prefetch Functions
Atomic Functions
Miscellaneous Vector Functions
Image Read and Write Functions
6. Programs and Kernels
Program and Kernel Object Overview
Program Objects
Kernel Objects
7. Buffers and Sub-Buffers
Memory Objects, Buffers, and Sub-Buffers Overview
Creating Buffers and Sub-Buffers
Querying Buffers and Sub-Buffers
Reading, Writing, and Copying Buffers and Sub-Buffers
Mapping Buffers and Sub-Buffers
8. Images and Samplers
Image and Sampler Object Overview
Creating Image Objects
Creating Sampler Objects
OpenCL C Functions for Working with Images
Transferring Image Objects
9. Events
Commands, Queues, and Events Overview
Events and Command-Queues
Event Objects
Generating Events on the Host
Events Impacting Execution on the Host
Using Events for Profiling
Events Inside Kernels
Events from Outside OpenCL
10. Interoperability with OpenGL
OpenCL/OpenGL Sharing Overview
Querying for the OpenGL Sharing Extension
Initializing an OpenCL Context for OpenGL Interoperability
Creating OpenCL Buffers from OpenGL Buffers
Creating OpenCL Image Objects from OpenGL Textures
Querying Information about OpenGL Objects
Synchronization between OpenGL and OpenCL
11. Interoperability with Direct3D
Direct3D/OpenCL Sharing Overview
Initializing an OpenCL Context for Direct3D Interoperability
Creating OpenCL Memory Objects from Direct3D Buffers and Textures
Acquiring and Releasing Direct3D Objects in OpenCL
Processing a Direct3D Texture in OpenCL
Processing D3D Vertex Data in OpenCL
12. C++ Wrapper API
C++ Wrapper API Overview
C++ Wrapper API Exceptions
Vector Add Example Using the C++ Wrapper API
13. OpenCL Embedded Profile
OpenCL Profile Overview
64-Bit Integers
Images
Built-In Atomic Functions
Mandated Minimum Single-Precision Floating-Point Capabilities
Determining the Profile Supported by a Device in an OpenCL C Program
Part II: OpenCL 1.1 Case Studies
14. Image Histogram
Computing an Image Histogram
Parallelizing the Image Histogram
Additional Optimizations to the Parallel Image Histogram
Computing Histograms with Half-Float or Float Values for Each Channel
15. Sobel Edge Detection Filter
What Is a Sobel Edge Detection Filter?
Implementing the Sobel Filter as an OpenCL Kernel
16. Parallelizing Dijkstra’s Single-Source Shortest-Path Graph Algorithm
Graph Data Structures
Kernels
Leveraging Multiple Compute Devices
17. Cloth Simulation in the Bullet Physics SDK
An Introduction to Cloth Simulation
Simulating the Soft Body
Executing the Simulation on the CPU
Changes Necessary for Basic GPU Execution
Two-Layered Batching
Optimizing for SIMD Computation and Local Memory
Adding OpenGL Interoperation
18. Simulating the Ocean with Fast Fourier Transform
An Overview of the Ocean Application
Phillips Spectrum Generation
An OpenCL Discrete Fourier Transform
A Closer Look at the FFT Kernel
A Closer Look at the Transpose Kernel
19. Optical Flow
Optical Flow Problem Overview
Sub-Pixel Accuracy with Hardware Linear Interpolation
Application of the Texture Cache
Using Local Memory
Early Exit and Hardware Scheduling
Efficient Visualization with OpenGL Interop
Performance
20. Using OpenCL with PyOpenCL
Introducing PyOpenCL
Running the PyImageFilter2D Example
PyImageFilter2D Code
Context and Command-Queue Creation
Loading to an Image Object
Creating and Building a Program
Setting Kernel Arguments and Executing a Kernel
Reading the Results
21. Matrix Multiplication with OpenCL
The Basic Matrix Multiplication Algorithm
A Direct Translation into OpenCL
Increasing the Amount of Work per Kernel
Optimizing Memory Movement: Local Memory
Performance Results and Optimizing the Original CPU Code
22. Sparse Matrix-Vector Multiplication
Sparse Matrix-Vector Multiplication (SpMV) Algorithm
Description of This Implementation
Tiled and Packetized Sparse Matrix Representation
Header Structure
Tiled and Packetized Sparse Matrix Design Considerations
Optional Team Information
Tested Hardware Devices and Results
Additional Areas of Optimization
A. Summary of OpenCL 1.1
The OpenCL Platform Layer
Contexts
Querying Platform Information and Devices
The OpenCL Runtime
Command-Queues
Buffer Objects
Create Buffer Objects
Read, Write, and Copy Buffer Objects
Map Buffer Objects
Manage Buffer Objects
Query Buffer Objects
Program Objects
Create Program Objects
Build Program Executable
Build Options
Query Program Objects
Unload the OpenCL Compiler
Kernel and Event Objects
Create Kernel Objects
Kernel Arguments and Object Queries
Execute Kernels
Event Objects
Out-of-Order Execution of Kernels and Memory Object Commands
Profiling Operations
Flush and Finish
Supported Data Types
Built-In Scalar Data Types
Built-In Vector Data Types
Other Built-In Data Types
Reserved Data Types
Vector Component Addressing
Vector Components
Vector Addressing Equivalencies
Conversions and Type Casting Examples
Operators
Address Space Qualifiers
Function Qualifiers
Preprocessor Directives and Macros
Specify Type Attributes
Math Constants
Work-Item Built-In Functions
Integer Built-In Functions
Common Built-In Functions
Math Built-In Functions
Geometric Built-In Functions
Relational Built-In Functions
Vector Data Load/Store Functions
Atomic Functions
Async Copies and Prefetch Functions
Synchronization, Explicit Memory Fence
Miscellaneous Vector Built-In Functions
Image Read and Write Built-In Functions
Image Objects
Create Image Objects
Query List of Supported Image Formats
Copy between Image, Buffer Objects
Map and Unmap Image Objects
Read, Write, Copy Image Objects
Query Image Objects
Image Formats
Access Qualifiers
Sampler Objects
Sampler Declaration Fields
OpenCL Device Architecture Diagram
OpenCL/OpenGL Sharing APIs
CL Buffer Objects > GL Buffer Objects
CL Image Objects > GL Textures
CL Image Objects > GL Renderbuffers
Query Information
Share Objects
CL Event Objects > GL Sync Objects
CL Context > GL Context, Sharegroup
OpenCL/Direct3D 10 Sharing APIs
Index
A
B
C
D
E
F
G
H
I
K
L
M
N
O
P
Q
R
S
T
U
V
W
Z