
CUDA Chinese Manual.pdf (technical documentation, untagged)

NVIDIA CUDA Compute Unified Device Architecture Programming Guide
Version 1.1, 11/29/2007
Table of Contents

Chapter 1 Introduction to CUDA ... 1
1.1 The Graphics Processor as a Data-Parallel Computing Device ... 1
1.2 CUDA: A New Architecture for Computing on the GPU ... 3
1.3 Document Structure ... 6

Chapter 2 Programming Model ... 7
2.1 A Highly Multithreaded Coprocessor ... 7
2.2 Thread Batching ... 7
2.2.1 Thread Block ... 7
2.2.2 Grid of Thread Blocks ... 8
2.3 Memory Model ... 10

Chapter 3 Hardware Implementation ... 13
3.1 A Set of SIMD Multiprocessors with On-Chip Shared Memory ... 13
3.2 Execution Model ... 14
3.3 Compute Capability ... 15
3.4 Multiple Devices ... 16
3.5 Display Mode Switches ... 16

Chapter 4 Application Programming Interface ... 17
4.1 An Extension to the C Programming Language ... 17
4.2 Language Extensions ... 17
4.2.1 Function Type Qualifiers ... 18
4.2.2 Variable Type Qualifiers ... 19
4.2.3 Execution Configuration ... 21
4.2.4 Built-in Variables ... 21
4.2.5 Compilation with NVCC ... 22
4.3 Common Runtime Component ... 23
4.3.1 Built-in Vector Types ... 23
4.3.2 Mathematical Functions ... 24
4.3.3 Time Function ... 24
4.3.4 Texture Type ... 24
4.4 Device Runtime Component ... 26
4.4.1 Mathematical Functions ... 26
4.4.2 Synchronization Function ... 26
4.4.3 Type Conversion Functions ... 26
4.4.4 Type Casting Functions ... 27
4.4.5 Texture Functions ... 27
4.4.6 Atomic Functions ... 28
4.5 Host Runtime Component ... 28
4.5.1 Common Concepts ... 29
4.5.2 Runtime API ... 32
4.5.3 Driver API ... 39

Chapter 5 Performance Guidelines ... 47
5.1 Instruction Performance ... 47
5.1.1 Instruction Throughput ... 47
5.1.2 Memory Bandwidth ... 49
5.2 Number of Threads per Block ... 62
5.3 Data Transfer between Host and Device ... 63
5.4 Texture Fetch versus Global or Constant Memory Read ... 63
5.5 Overall Performance Optimization Strategies ... 64

Chapter 6 Matrix Multiplication Example ... 67
6.1 Overview ... 67
6.2 Source Code Listing ... 69
6.3 Source Code Walkthrough ... 71
6.3.1 Mul() ... 71
6.3.2 Muld() ... 71

Appendix A Technical Specifications ... 73
A.1 General Specifications ... 74
A.2 Floating-Point Standard ... 74

Appendix B Mathematical Functions ... 77
B.1 Common Runtime Component ... 77
B.2 Device Runtime Component ... 80

Appendix C Atomic Functions ... 83
C.1 Arithmetic Functions ... 83
C.1.1 atomicAdd() ... 83
C.1.2 atomicSub() ... 83
C.1.3 atomicExch() ... 83
C.1.4 atomicMin() ... 84
C.1.5 atomicMax() ... 84
C.1.6 atomicInc() ... 84
C.1.7 atomicDec() ... 84
C.1.8 atomicCAS() ... 84
C.2 Bitwise Functions ... 85
C.2.1 atomicAnd() ... 85
C.2.2 atomicOr() ... 85
C.2.3 atomicXor() ... 85

Appendix D Runtime API Reference ... 87
D.1 Device Management ... 87
D.1.1 cudaGetDeviceCount() ... 87
D.1.2 cudaSetDevice() ... 87
D.1.3 cudaGetDevice() ... 87
D.1.4 cudaGetDeviceProperties() ... 88
D.1.5 cudaChooseDevice() ... 89
D.2 Thread Management ... 89
D.2.1 cudaThreadSynchronize() ... 89
D.2.2 cudaThreadExit() ... 89
D.3 Stream Management ... 89
D.3.1 cudaStreamCreate() ... 89
D.3.2 cudaStreamQuery() ... 89
D.3.3 cudaStreamSynchronize() ... 89
D.3.4 cudaStreamDestroy() ... 89
D.4 Event Management ... 90
D.4.1 cudaEventCreate() ... 90
D.4.2 cudaEventRecord() ... 90
D.4.3 cudaEventQuery() ... 90
D.4.4 cudaEventSynchronize() ... 90
D.4.5 cudaEventDestroy() ... 90
D.4.6 cudaEventElapsedTime() ... 90
D.5 Memory Management ... 91
D.5.1 cudaMalloc() ... 91
D.5.2 cudaMallocPitch() ... 91
D.5.3 cudaFree() ... 91
D.5.4 cudaMallocArray() ... 92
D.5.5 cudaFreeArray() ... 92
D.5.6 cudaMallocHost() ... 92
D.5.7 cudaFreeHost() ... 92
D.5.8 cudaMemset() ... 92
D.5.9 cudaMemset2D() ... 92
D.5.10 cudaMemcpy() ... 93
D.5.11 cudaMemcpy2D() ... 93
D.5.12 cudaMemcpyToArray() ... 94
D.5.13 cudaMemcpy2DToArray() ... 94
D.5.14 cudaMemcpyFromArray() ... 95
D.5.15 cudaMemcpy2DFromArray() ... 95
D.5.16 cudaMemcpyArrayToArray() ... 96
D.5.17 cudaMemcpy2DArrayToArray() ... 96
D.5.18 cudaMemcpyToSymbol() ... 96
D.5.19 cudaMemcpyFromSymbol() ... 96
D.5.20 cudaGetSymbolAddress() ... 97
D.5.21 cudaGetSymbolSize() ... 97
D.6 Texture Reference Management ... 97
D.6.1 Low-Level API ... 97
D.6.2 High-Level API ... 98
D.7 Execution Control ... 100
D.7.1 cudaConfigureCall() ... 100
D.7.2 cudaLaunch() ... 100
D.7.3 cudaSetupArgument() ... 100
D.8 OpenGL Interoperability ... 100
D.8.1 cudaGLRegisterBufferObject() ... 100
D.8.2 cudaGLMapBufferObject() ... 101
D.8.3 cudaGLUnmapBufferObject() ... 101
D.8.4 cudaGLUnregisterBufferObject() ... 101
D.9 Direct3D Interoperability ... 101
D.9.1 cudaD3D9Begin() ... 101
D.9.2 cudaD3D9End() ... 101
D.9.3 cudaD3D9RegisterVertexBuffer() ... 101
D.9.4 cudaD3D9MapVertexBuffer() ... 101
D.9.5 cudaD3D9UnmapVertexBuffer() ... 102
D.9.6 cudaD3D9UnregisterVertexBuffer() ... 102
D.9.7 cudaD3D9GetDevice() ... 102
D.10 Error Handling ... 102
D.10.1 cudaGetLastError() ... 102
D.10.2 cudaGetErrorString() ... 102

Appendix E Driver API Reference ... 103
E.1 Initialization ... 103
E.1.1 cuInit() ... 103
E.2 Device Management ... 103
E.2.1 cuDeviceGetCount() ... 103
E.2.2 cuDeviceGet() ... 103
E.2.3 cuDeviceGetName() ... 103
E.2.4 cuDeviceTotalMem() ... 104
E.2.5 cuDeviceComputeCapability() ... 104
E.2.6 cuDeviceGetAttribute() ... 104
E.2.7 cuDeviceGetProperties() ... 105
E.3 Context Management ... 106
E.3.1 cuCtxCreate() ... 106
E.3.2 cuCtxAttach() ... 106
E.3.3 cuCtxDetach() ... 106
E.3.4 cuCtxGetDevice() ... 106
E.3.5 cuCtxSynchronize() ... 106
E.4 Module Management ... 106
E.4.1 cuModuleLoad() ... 106
E.4.2 cuModuleLoadData() ... 107
E.4.3 cuModuleLoadFatBinary() ... 107
E.4.4 cuModuleUnload() ... 107
E.4.5 cuModuleGetFunction() ... 107
E.4.6 cuModuleGetGlobal() ... 107
E.4.7 cuModuleGetTexRef() ... 108
E.5 Stream Management ... 108
E.5.1 cuStreamCreate() ... 108
E.5.2 cuStreamQuery() ... 108
E.5.3 cuStreamSynchronize() ... 108
E.5.4 cuStreamDestroy() ... 108
E.6 Event Management ... 108
E.6.1 cuEventCreate() ... 108
E.6.2 cuEventRecord() ... 108
E.6.3 cuEventQuery() ... 109
E.6.4 cuEventSynchronize() ... 109
E.6.5 cuEventDestroy() ... 109
E.6.6 cuEventElapsedTime() ... 109
E.7 Execution Control ... 109
E.7.1 cuFuncSetBlockShape() ... 109
E.7.2 cuFuncSetSharedSize() ... 110
E.7.3 cuParamSetSize() ... 110
E.7.4 cuParamSeti() ... 110
E.7.5 cuParamSetf() ... 110
E.7.6 cuParamSetv() ... 110
E.7.7 cuParamSetTexRef() ... 110
E.7.8 cuLaunch() ... 110
E.7.9 cuLaunchGrid() ... 111
E.8 Memory Management ... 111
E.8.1 cuMemGetInfo() ... 111
E.8.2 cuMemAlloc() ... 111
E.8.3 cuMemAllocPitch() ... 111
E.8.4 cuMemFree() ... 112
E.8.5 cuMemAllocHost() ... 112
E.8.6 cuMemFreeHost() ... 112
E.8.7 cuMemGetAddressRange() ... 112
E.8.8 cuArrayCreate() ... 113
E.8.9 cuArrayGetDescriptor() ... 114
E.8.10 cuArrayDestroy() ... 114
E.8.11 cuMemset() ... 114
E.8.12 cuMemset2D() ... 114