S05: High Performance Computing with CUDA
Data-parallel Algorithms & Data Structures
John Owens
UC Davis
One Slide Summary of Today
GPUs are great at running many closely coupled but independent threads in parallel
The programming model is SPMD: single program, multiple data
GPU computing boils down to:
Define a computation domain that generates many parallel threads (this is the data structure)
Iterate in parallel over that computation domain, running a program over all threads (this is the algorithm)
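A minimal sketch of this two-step recipe, in plain serial C (the names `square_kernel` and `launch_square` are hypothetical, not CUDA API; on the GPU every loop iteration below would run as an independent thread):

```c
#include <stddef.h>

/* The "program": a kernel that runs once per element of the domain.
 * Each invocation touches only its own element, so all of them are
 * independent and can run in parallel. */
static void square_kernel(size_t tid, const float *in, float *out) {
    out[tid] = in[tid] * in[tid];
}

/* The "launch": the loop bounds are the computation domain (the data
 * structure); the kernel body is the algorithm. A GPU would execute
 * all n iterations concurrently; the serial loop only defines the result. */
void launch_square(const float *in, float *out, size_t n) {
    for (size_t tid = 0; tid < n; tid++)
        square_kernel(tid, in, out);
}
```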
Outline
Data Structures
GPU Memory Model
Taxonomy
Algorithmic Building Blocks
Sample Application
Map
Gather & Scatter
Reductions
Scan (parallel prefix)
Sort, search, …
GPU Memory Model
More restricted memory access than CPU
Allocate/free memory only before computation
Transfers to and from CPU are explicit
The GPU is controlled by the CPU and can’t initiate transfers, access disk, etc.
To generalize …
GPUs are better at accessing data structures
CPUs are better at building data structures
As CPU-GPU bandwidth improves, consider doing data structure tasks on their “natural” processor
GPU Memory Model (continued)
Limited memory access during computation (kernel)
Registers (per fragment/thread)
Read/write
Shared memory (shared among threads)
Does not exist in the general (graphics) GPU model
CUDA allows threads to communicate through shared memory
Global memory (historical)
Read-only during computation
Write-only at end of computation (precomputed address)
Global memory (new)
Allows general scatter/gather (read/write)
No collision rules!
Exposed in all modern GPUs
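The general gather/scatter capability can be pinned down with serial reference semantics (a sketch in plain C; on the GPU each iteration would be an independent thread, which is exactly why scatter collisions matter):

```c
/* Gather: read through an index array, dst[i] = src[idx[i]].
 * Each output element reads from a computed address. */
void gather(const float *src, const int *idx, float *dst, int n) {
    for (int i = 0; i < n; i++)
        dst[i] = src[idx[i]];
}

/* Scatter: write through an index array, dst[idx[i]] = src[i].
 * Each element writes to a computed address. With no collision rules,
 * two threads scattering to the same index leave an undefined winner
 * when run in parallel; this serial loop merely defines one outcome. */
void scatter(const float *src, const int *idx, float *dst, int n) {
    for (int i = 0; i < n; i++)
        dst[idx[i]] = src[i];
}
```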
Properties of GPU Data Structures
To be efficient, must support:
Parallel read
Parallel write
Parallel iteration
Generalized arrays fit these requirements
Dense (complete) arrays
Sparse (incomplete) arrays
Adaptive arrays
[Figure: a sparse array maps a virtual domain through a page table into physical memory]
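The figure's indirection can be sketched as follows (a hypothetical illustration, with an assumed page size and the convention that -1 marks an unallocated page; real GPU implementations store the page table in a texture or array):

```c
#define PAGE_SIZE 4  /* assumed page size for illustration */

/* Sparse-array read: split the virtual index into (page, offset),
 * translate the page through the page table, then index physical
 * memory. Unallocated pages return a caller-supplied default. */
float sparse_read(const int *page_table, const float *physical,
                  int virtual_index, float default_value) {
    int page      = virtual_index / PAGE_SIZE;
    int offset    = virtual_index % PAGE_SIZE;
    int phys_page = page_table[page];
    if (phys_page < 0)            /* page never allocated */
        return default_value;
    return physical[phys_page * PAGE_SIZE + offset];
}
```

Because every thread performs the same translation independently, this lookup fits the parallel-read requirement above.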
Think In Parallel
The GPU is a data-parallel processor
Thousands of parallel threads
Thousands of data elements to process
All data processed by the same program
SPMD computation model
Contrast with task parallelism and ILP (instruction-level parallelism)
Best results when you “Think Data Parallel”
Design your algorithm for data-parallelism
Understand parallel algorithmic complexity and efficiency
Use data-parallel algorithmic primitives as building blocks
Data-Parallel Algorithms
Efficient algorithms require efficient building blocks
This talk: data-parallel building blocks
Map
Gather & Scatter
Reduce
Scan
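Two of these building blocks can be stated as serial reference semantics (a sketch; parallel GPU versions compute the same results in O(log n) steps using tree-shaped combines rather than these loops):

```c
/* Reduce: combine all elements with an associative operator (sum here).
 * Associativity is what lets the GPU evaluate this as a tree in parallel. */
float reduce_sum(const float *x, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Exclusive scan (parallel prefix): out[i] = sum of x[0..i-1], out[0] = 0.
 * The loop defines the result; it is not how the GPU computes it. */
void exclusive_scan(const float *x, float *out, int n) {
    float acc = 0.0f;
    for (int i = 0; i < n; i++) {
        out[i] = acc;
        acc += x[i];
    }
}
```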