S05: High Performance Computing with CUDA
Data-parallel Algorithms & Data Structures
John Owens
UC Davis
One Slide Summary of Today
GPUs are great at running many closely coupled but independent threads in parallel
The programming model is SPMD: single program, multiple data
GPU computing boils down to:
Define a computation domain that generates many parallel threads (this is the data structure)
Iterate in parallel over that computation domain, running a program over all threads (this is the algorithm)
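A minimal sketch of this two-step recipe, in plain serial C (the names `square_kernel` and `launch_square` are hypothetical, not CUDA API; on the GPU every loop iteration below would run as an independent thread):

```c
#include <stddef.h>

/* The "program": a kernel that runs once per element of the domain.
 * Each invocation touches only its own element, so all of them are
 * independent and can run in parallel. */
static void square_kernel(size_t tid, const float *in, float *out) {
    out[tid] = in[tid] * in[tid];
}

/* The "launch": the loop bounds are the computation domain (the data
 * structure); the kernel body is the algorithm. A GPU would execute
 * all n iterations concurrently; the serial loop only defines the result. */
void launch_square(const float *in, float *out, size_t n) {
    for (size_t tid = 0; tid < n; tid++)
        square_kernel(tid, in, out);
}
```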
Outline
Data Structures
GPU Memory Model
Taxonomy
Algorithmic Building Blocks
Sample Application
Map
Gather & Scatter
Reductions
Scan (parallel prefix)
Sort, search, …
GPU Memory Model
More restricted memory access than CPU
Allocate/free memory only before computation
Transfers to and from CPU are explicit
The GPU is controlled by the CPU and can’t initiate transfers, access disk, etc.
To generalize …
GPUs are better at accessing data structures
CPUs are better at building data structures
As CPU-GPU bandwidth improves, consider doing data structure tasks on their “natural” processor
GPU Memory Model (continued)
Limited memory access during computation (kernel)
Registers (per fragment/thread)
Read/write
Shared memory (shared among threads)
Does not exist in the general (graphics) GPU model
CUDA allows threads to communicate through shared memory
Global memory (historical)
Read-only during computation
Write-only at end of computation (precomputed address)
Global memory (new)
Allows general scatter/gather (read/write)
No collision rules!
Exposed in all modern GPUs
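The general gather/scatter capability can be pinned down with serial reference semantics (a sketch in plain C; on the GPU each iteration would be an independent thread, which is exactly why scatter collisions matter):

```c
/* Gather: read through an index array, dst[i] = src[idx[i]].
 * Each output element reads from a computed address. */
void gather(const float *src, const int *idx, float *dst, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Scatter: write through an index array, dst[idx[i]] = src[i].
 * Each element writes to a computed address. With no collision rules,
 * two threads scattering to the same index leave an undefined winner
 * when run in parallel; this serial loop merely defines one outcome. */
void scatter(const float *src, const int *idx, float *dst, int n) {
    for (int i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}
```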
Properties of GPU Data Structures
To be efficient, must support:
Parallel read
Parallel write
Parallel iteration
Generalized arrays fit these requirements
Dense (complete) arrays
Sparse (incomplete) arrays
Adaptive arrays
[Figure: a sparse array maps a virtual domain through a page table into physical memory]
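The figure's indirection can be sketched as follows (a hypothetical illustration, with an assumed page size and the convention that -1 marks an unallocated page; real GPU implementations store the page table in a texture or array):

```c
#define PAGE_SIZE 4  /* assumed page size for illustration */

/* Sparse-array read: split the virtual index into (page, offset),
 * translate the page through the page table, then index physical
 * memory. Unallocated pages return a caller-supplied default. */
float sparse_read(const int *page_table, const float *physical,
                  int virtual_index, float default_value) {
    int page      = virtual_index / PAGE_SIZE;
    int offset    = virtual_index % PAGE_SIZE;
    int phys_page = page_table[page];
    if (phys_page < 0)            /* page never allocated */
        return default_value;
    return physical[phys_page * PAGE_SIZE + offset];
}
```

Because every thread performs the same translation independently, this lookup fits the parallel-read requirement above.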
Think In Parallel
The GPU is a data-parallel processor
Thousands of parallel threads
Thousands of data elements to process
All data processed by the same program
SPMD computation model
Contrast with task parallelism and ILP (instruction-level parallelism)
Best results when you “Think Data Parallel”
Design your algorithm for data-parallelism
Understand parallel algorithmic complexity and efficiency
Use data-parallel algorithmic primitives as building blocks
Data-Parallel Algorithms
Efficient algorithms require efficient building blocks
This talk: data-parallel building blocks
Map
Gather & Scatter
Reduce
Scan
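Two of these building blocks can be stated as serial reference semantics (a sketch; parallel GPU versions compute the same results in O(log n) steps using tree-shaped combines rather than these loops):

```c
/* Reduce: combine all elements with an associative operator (sum here).
 * Associativity is what lets the GPU evaluate this as a tree in parallel. */
float reduce_sum(const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Exclusive scan (parallel prefix): out[i] = sum of x[0..i-1], out[0] = 0.
 * The loop defines the result; it is not how the GPU computes it. */
void exclusive_scan(const float *x, float *out, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        out[i] = acc;
        acc += x[i];
    }
}
```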