
Thrust: An Efficiency-Oriented CUDA Programming Library.pdf

Table of Contents
Introduction
1.1. Installation and Versioning
Vectors
2.1. Thrust Namespace
2.2. Iterators and Static Dispatching
Algorithms
3.1. Transformations
3.2. Reductions
3.3. Prefix-Sums
3.4. Reordering
3.5. Sorting
Fancy Iterators
4.1. constant_iterator
4.2. counting_iterator
4.3. transform_iterator
4.4. permutation_iterator
4.5. zip_iterator
Additional Resources
THRUST QUICK START GUIDE DU-06716-001_v6.0 | February 2014
www.nvidia.com Thrust Quick Start Guide DU-06716-001_v6.0
Chapter 1. INTRODUCTION

Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.

Thrust provides a rich collection of data parallel primitives such as scan, sort, and reduce, which can be composed together to implement complex algorithms with concise, readable source code. By describing your computation in terms of these high-level abstractions you provide Thrust with the freedom to select the most efficient implementation automatically. As a result, Thrust can be utilized in rapid prototyping of CUDA applications, where programmer productivity matters most, as well as in production, where robustness and absolute performance are crucial.

This document describes how to develop CUDA applications with Thrust. The tutorial is intended to be accessible, even if you have limited C++ or CUDA experience.

1.1. Installation and Versioning

Installing the CUDA Toolkit will copy Thrust header files to the standard CUDA include directory for your system. Since Thrust is a template library of header files, no further installation is necessary to start using Thrust.

In addition, new versions of Thrust continue to be available online through the GitHub Thrust project page. The version of Thrust included in this version of the CUDA Toolkit corresponds to version 1.7.0 from the Thrust project page.
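To check which Thrust version a build actually picks up, the version macros defined in thrust/version.h can be turned into a string. The following is a minimal sketch rather than part of the original guide. The helper name thrust_version_string is hypothetical, and the __has_include guard is an assumption added so that the snippet still compiles on a machine where the Thrust headers are not on the include path.

```cpp
#include <string>

// Include the Thrust version header only when the compiler can find it.
// (__has_include is standard since C++17 and a common extension before that.)
#if defined(__has_include)
#  if __has_include(<thrust/version.h>)
#    include <thrust/version.h>
#  endif
#endif

// Hypothetical helper: returns "major.minor.subminor", or "unavailable"
// when thrust/version.h was not found at compile time.
std::string thrust_version_string()
{
#if defined(THRUST_VERSION)
    return std::to_string(THRUST_MAJOR_VERSION) + "." +
           std::to_string(THRUST_MINOR_VERSION) + "." +
           std::to_string(THRUST_SUBMINOR_VERSION);
#else
    return "unavailable";
#endif
}
```

Printing the result of thrust_version_string() from main is enough to see the version. With the toolkit release described here, the string would be expected to read 1.7.0.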
Chapter 2. VECTORS

Thrust provides two vector containers, host_vector and device_vector. As the names suggest, host_vector is stored in host memory while device_vector lives in GPU device memory. Thrust’s vector containers are just like std::vector in the C++ STL. Like std::vector, host_vector and device_vector are generic containers (able to store any data type) that can be resized dynamically. The following source code illustrates the use of Thrust’s vector containers.

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>

    #include <iostream>

    int main(void)
    {
        // H has storage for 4 integers
        thrust::host_vector<int> H(4);

        // initialize individual elements
        H[0] = 14;
        H[1] = 20;
        H[2] = 38;
        H[3] = 46;

        // H.size() returns the size of vector H
        std::cout << "H has size " << H.size() << std::endl;

        // print contents of H
        for(int i = 0; i < H.size(); i++)
            std::cout << "H[" << i << "] = " << H[i] << std::endl;

        // resize H
        H.resize(2);

        std::cout << "H now has size " << H.size() << std::endl;

        // Copy host_vector H to device_vector D
        thrust::device_vector<int> D = H;

        // elements of D can be modified
        D[0] = 99;
        D[1] = 88;

        // print contents of D
        for(int i = 0; i < D.size(); i++)
            std::cout << "D[" << i << "] = " << D[i] << std::endl;

        // H and D are automatically deleted when the function returns
        return 0;
    }

As this example shows, the = operator can be used to copy a host_vector to a device_vector (or vice-versa). The = operator can also be used to copy host_vector to host_vector or device_vector to device_vector. Also note that individual elements of a device_vector can be accessed using the standard bracket notation. However, because each of these accesses requires a call to cudaMemcpy, they should be used sparingly. We’ll look at some more efficient techniques later.
It’s often useful to initialize all the elements of a vector to a specific value, or to copy only a certain set of values from one vector to another. Thrust provides a few ways to do these kinds of operations.

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>

    #include <thrust/copy.h>
    #include <thrust/fill.h>
    #include <thrust/sequence.h>

    #include <iostream>

    int main(void)
    {
        // initialize all ten integers of a device_vector to 1
        thrust::device_vector<int> D(10, 1);

        // set the first seven elements of a vector to 9
        thrust::fill(D.begin(), D.begin() + 7, 9);

        // initialize a host_vector with the first five elements of D
        thrust::host_vector<int> H(D.begin(), D.begin() + 5);

        // set the elements of H to 0, 1, 2, 3, ...
        thrust::sequence(H.begin(), H.end());

        // copy all of H back to the beginning of D
        thrust::copy(H.begin(), H.end(), D.begin());

        // print D
        for(int i = 0; i < D.size(); i++)
            std::cout << "D[" << i << "] = " << D[i] << std::endl;

        return 0;
    }

Here we’ve illustrated use of the fill, copy, and sequence functions. The copy function can be used to copy a range of host or device elements to another host or device vector. Like the corresponding STL function, thrust::fill simply sets a range of elements to a specific value. Thrust’s sequence function can be used to create a sequence of equally spaced values.

2.1. Thrust Namespace

You’ll notice that we use things like thrust::host_vector or thrust::copy in our examples. The thrust:: part tells the C++ compiler that we want to look inside the thrust namespace for a specific function or class. Namespaces are a nice way to avoid name collisions. For instance, thrust::copy is different from std::copy provided in the STL. C++ namespaces allow us to distinguish between these two copy functions.

2.2. Iterators and Static Dispatching

In this section we used expressions like H.begin() and H.end() or offsets like D.begin() + 7. The result of begin() and end() is called an iterator in C++.
In the case of vector containers, which are really just arrays, iterators can be thought of as pointers to array elements. Therefore, H.begin() is an iterator that points to the first element of the array stored inside the H vector. Similarly, H.end() points to the element one past the last element of the H vector.

Although vector iterators are similar to pointers, they carry more information with them. Notice that we did not have to tell thrust::fill that it was operating on a device_vector iterator. This information is captured in the type of the iterator returned by D.begin(), which is different from the type returned by H.begin(). When a Thrust function is called, it inspects the type of the iterator to determine whether to use a host or a device implementation. This process is known as static dispatching since the host/device dispatch is resolved at compile time. Note that this implies that there is no runtime overhead to the dispatch process.

You may wonder what happens when a “raw” pointer is used as an argument to a Thrust function. Like the STL, Thrust permits this usage and it will dispatch the host path of the algorithm. If the pointer in question is in fact a pointer to device memory then you’ll need to wrap it with thrust::device_ptr before calling the function. For example:

    #include <thrust/device_ptr.h>
    #include <thrust/fill.h>

    size_t N = 10;

    // raw pointer to device memory
    int * raw_ptr;
    cudaMalloc((void **) &raw_ptr, N * sizeof(int));

    // wrap raw pointer with a device_ptr
    thrust::device_ptr<int> dev_ptr(raw_ptr);

    // use device_ptr in thrust algorithms
    thrust::fill(dev_ptr, dev_ptr + N, (int) 0);

To extract a raw pointer from a device_ptr the raw_pointer_cast should be applied as follows:

    #include <thrust/device_ptr.h>
    #include <thrust/device_malloc.h>

    size_t N = 10;

    // create a device_ptr
    thrust::device_ptr<int> dev_ptr = thrust::device_malloc<int>(N);

    // extract raw pointer from device_ptr
    int * raw_ptr = thrust::raw_pointer_cast(dev_ptr);

Another reason to distinguish between iterators and pointers is that iterators can be used to traverse many kinds of data structures.
For example, the STL provides a linked list container (std::list) that provides bidirectional (but not random access) iterators. Although Thrust does not provide device implementations of such containers, it is compatible with them.

    #include <thrust/device_vector.h>
    #include <thrust/copy.h>

    #include <list>
    #include <vector>

    int main(void)
    {
        // create an STL list with 4 values
        std::list<int> stl_list;

        stl_list.push_back(10);
        stl_list.push_back(20);
        stl_list.push_back(30);
        stl_list.push_back(40);

        // initialize a device_vector with the list
        thrust::device_vector<int> D(stl_list.begin(), stl_list.end());

        // copy a device_vector into an STL vector
        std::vector<int> stl_vector(D.size());
        thrust::copy(D.begin(), D.end(), stl_vector.begin());

        return 0;
    }

For Future Reference: The iterators we’ve covered so far are useful, but fairly basic. In addition to these normal iterators, Thrust also provides a collection of fancy iterators with names like counting_iterator and zip_iterator. While they look and feel like normal iterators, fancy iterators are capable of more exciting things. We’ll revisit this topic later in the tutorial.
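The static dispatching described in section 2.2 can be illustrated with plain C++ overloading. The sketch below is not Thrust's actual implementation. The sketch namespace, host_tag, device_tag, tagged_iterator, fill_impl and run_demo are all hypothetical names, and the "device" path simply runs on the CPU so that the example compiles without CUDA. It only shows how an iterator's type can select an implementation at compile time, and why an untagged raw pointer ends up on the host path.

```cpp
#include <cassert>
#include <string>

namespace sketch {

// Hypothetical tags standing in for the host and device memory spaces.
struct host_tag {};
struct device_tag {};

// Hypothetical iterator that records its memory space in its type, the way
// a device_vector iterator differs in type from a host_vector iterator.
template <typename Tag>
struct tagged_iterator {
    int* ptr;
};

// One overload per memory space. The tag argument selects between them
// during overload resolution, that is, at compile time.
inline std::string fill_impl(int* first, int* last, int value, host_tag)
{
    for (; first != last; ++first) *first = value;
    return "host";
}

inline std::string fill_impl(int* first, int* last, int value, device_tag)
{
    // A real device path would launch GPU work. This stand-in fills on the
    // CPU so that the sketch runs without CUDA.
    for (; first != last; ++first) *first = value;
    return "device";
}

// Tagged iterators dispatch on their tag type.
template <typename Tag>
std::string fill(tagged_iterator<Tag> first, tagged_iterator<Tag> last, int value)
{
    return fill_impl(first.ptr, last.ptr, value, Tag());
}

// Raw pointers carry no tag, so they take the host path.
inline std::string fill(int* first, int* last, int value)
{
    return fill_impl(first, last, value, host_tag());
}

// Usage: the same call syntax reaches different implementations.
inline void run_demo()
{
    int buffer[4] = {0, 0, 0, 0};

    assert(fill(buffer, buffer + 4, 7) == "host");

    tagged_iterator<device_tag> first = {buffer};
    tagged_iterator<device_tag> last = {buffer + 4};
    assert(fill(first, last, 9) == "device");
    assert(buffer[0] == 9 && buffer[3] == 9);
}

} // namespace sketch
```

Because the choice is made by overload resolution, no branch on the memory space is executed at run time, which matches the guide's point that static dispatch has no runtime overhead. Wrapping a pointer in tagged_iterator here plays the role that thrust::device_ptr plays for raw device pointers.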