Front Cover
Heterogeneous Computing with OpenCL
Copyright
Contents
Foreword
Preface
Our Heterogeneous World
OpenCL
This Text
Acknowledgments
About the Authors
Chapter 1: Introduction to Parallel Programming
Introduction
OpenCL
The Goals of This Book
Thinking Parallel
Concurrency and Parallel Programming Models
Structure
Reference
Further Reading and Relevant Websites
Chapter 2: Introduction to OpenCL
Introduction
Platform and Devices
The Execution Environment
Memory Model
Writing Kernels
Full Source Code Example for Vector Addition
Summary
Reference
Chapter 3: OpenCL Device Architectures
Introduction
Hardware Trade-offs
The Architectural Design Space
Summary
References
Chapter 4: Basic OpenCL Examples
Introduction
Example Applications
Compiling OpenCL Host Applications
Summary
Chapter 5: Understanding OpenCL's Concurrency and Execution Model
Introduction
Kernels, Work-Items, Workgroups, and the Execution Domain
OpenCL Synchronization: Kernels, Fences, and Barriers
Queuing and Global Synchronization
The Host-Side Memory Model
The Device-Side Memory Model
Summary
Chapter 6: Dissecting a CPU/GPU OpenCL Implementation
Introduction
OpenCL on an AMD Phenom II X6
OpenCL on the AMD Radeon HD6970 GPU
Memory Performance Considerations in OpenCL
Summary
References
Chapter 7: OpenCL Case Study
Introduction
Convolution Kernel
Conclusions
Code Listings
Reference
Chapter 8: OpenCL Case Study
Introduction
Getting Video Frames
Processing a Video in OpenCL
Processing Multiple Videos with Multiple Special Effects
Display to Screen of Final Output
Summary
Chapter 9: OpenCL Case Study
Introduction
Choosing the Number of Workgroups
Choosing the Optimal Workgroup Size
Optimizing Global Memory Data Access Patterns
Using Atomics to Perform Local Histogram
Optimizing Local Memory Access
Local Histogram Reduction
The Global Reduction
Full Kernel Code
Performance and Summary
Chapter 10: OpenCL Case Study
Introduction
Overview of the Computation
GPU Implementation
CPU Implementation
Load Balancing
Performance and Summary
Kernel for Uniform Grid Creation
Kernels for Simulation
Chapter 11: OpenCL Extensions
Introduction
Overview of Extension Mechanism
Device Fission
Double Precision
References
Chapter 12: OpenCL Profiling and Debugging
Introduction
Profiling with Events
AMD Accelerated Parallel Processing Profiler
AMD Accelerated Parallel Processing KernelAnalyzer
Walking through the AMD APP Profiler
Debugging OpenCL Applications
Overview of gDEBugger
AMD Printf Extension
Conclusion
Chapter 13: WebCL
Introduction
Designing the Framework
WebCL Pilot Implementation
WebCL Hands-on
Web Photo Editor
Discussion
Summary
Reference
Further Reading and Relevant Websites
Index