
CUDA CUFFT Library PG-00000-003_V1.0 June, 2007

CUFFT Library

Confidential Information

Published by
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050

Notice

This source code is subject to NVIDIA ownership rights under U.S. and international Copyright laws.

This software and the information contained herein is PROPRIETARY and CONFIDENTIAL to NVIDIA and is being provided under the terms and conditions of a Non-Disclosure Agreement. Any reproduction or disclosure to any third party without the express written consent of NVIDIA is prohibited.

NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE CODE FOR ANY PURPOSE. IT IS PROVIDED “AS IS” WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOURCE CODE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE.

U.S. Government End Users. This source code is a “commercial item” as that term is defined at 48 C.F.R. 2.101 (OCT 1995), consisting of “commercial computer software” and “commercial computer software documentation” as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S. Government only as a commercial end item. Consistent with 48 C.F.R. 12.212 and 48 C.F.R. 227.7202-1 through 227.7202-4 (JUNE 1995), all U.S. Government End Users acquire the source code with only those rights set forth herein.

Trademarks

NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the United States and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

Copyright © 2006–2007 by NVIDIA Corporation. All rights reserved.

Table of Contents

CUFFT Library
  CUFFT Types and Definitions
    Type cufftHandle
    Type cufftResult
    Type cufftReal
    Type cufftComplex
    CUFFT Transform Types
    CUFFT Transform Directions
  CUFFT API Functions
    Function cufftPlan1d()
    Function cufftPlan2d()
    Function cufftPlan3d()
    Function cufftDestroy()
    Function cufftExecC2C()
    Function cufftExecR2C()
    Function cufftExecC2R()
  Accuracy and Performance
  CUFFT Code Examples
    1D Complex-to-Complex Transforms
    1D Real-to-Complex Transforms
    2D Complex-to-Complex Transforms
    2D Complex-to-Real Transforms
    3D Complex-to-Complex Transforms

CUFFT Library

This document describes CUFFT, the NVIDIA® CUDA™ (compute unified device architecture) Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.

FFT libraries typically vary in terms of supported transform sizes and data types. For example, some libraries only implement Radix-2 FFTs, restricting the transform size to a power of two, while other implementations support arbitrary transform sizes. This version of the CUFFT library supports the following features:

- 1D, 2D, and 3D transforms of complex and real-valued data.
- Batch execution for doing multiple 1D transforms in parallel.
- 2D and 3D transform sizes in the range [2, 16384] in any dimension.
- 1D transform sizes up to 8 million elements.
- In-place and out-of-place transforms for real and complex data.

CUFFT Types and Definitions

The next sections describe the CUFFT types and transform directions:

- “Type cufftHandle”
- “Type cufftResult”
- “Type cufftReal”
- “Type cufftComplex”

- “CUFFT Transform Types”
- “CUFFT Transform Directions”

Type cufftHandle

    typedef unsigned int cufftHandle;

is a handle type used to store and access CUFFT plans. For example, the user receives a handle after creating a CUFFT plan and uses this handle to execute the plan.

Type cufftResult

    typedef enum cufftResult_t cufftResult;

is an enumeration of values used exclusively as API function return values. The possible return values are defined as follows:

Return Values
CUFFT_SUCCESS            Any CUFFT operation is successful.
CUFFT_INVALID_PLAN       CUFFT is passed an invalid plan handle.
CUFFT_ALLOC_FAILED       CUFFT failed to allocate GPU memory.
CUFFT_INVALID_TYPE       The user requests an unsupported type.
CUFFT_INVALID_VALUE      The user specifies a bad memory pointer.
CUFFT_INTERNAL_ERROR     Used for all internal driver errors.
CUFFT_EXEC_FAILED        CUFFT failed to execute an FFT on the GPU.
CUFFT_SETUP_FAILED       The CUFFT library failed to initialize.
CUFFT_SHUTDOWN_FAILED    The CUFFT library failed to shut down.
CUFFT_INVALID_SIZE       The user specifies an unsupported FFT size.

Type cufftReal

    typedef float cufftReal;

is a single-precision, floating-point real data type.

Type cufftComplex

    typedef float cufftComplex[2];

is a single-precision, floating-point complex data type that consists of interleaved real and imaginary components.

CUFFT Transform Types

The CUFFT library supports complex- and real-data transforms. The cufftType data type is an enumeration of the types of transform data supported by CUFFT:

    typedef enum cufftType_t {
        CUFFT_R2C = 0x2a,  // Real to complex (interleaved)
        CUFFT_C2R = 0x2c,  // Complex (interleaved) to real
        CUFFT_C2C = 0x29   // Complex to complex, interleaved
    } cufftType;

For complex FFTs, the input and output arrays must interleave the real and imaginary parts (the cufftComplex type). The transform size in each dimension is the number of cufftComplex elements. The CUFFT_C2C constant can be passed to any plan creation function to configure a complex-to-complex FFT.

For real-to-complex FFTs, the output array holds only the non-redundant complex coefficients. So for an N-element transform, the output array holds N/2+1 cufftComplex terms. For higher-dimensional real transforms of the form N0×N1×...×Nn, the last dimension is cut in half such that the output data is N0×N1×...×(Nn/2+1) complex elements. Therefore, in order to perform an in-place FFT, the user has to pad the input array in the last dimension to (Nn/2+1) complex elements or 2*(Nn/2+1) real elements. Note that the real-to-complex transform is implicitly forward. Passing the CUFFT_R2C constant to any plan creation function configures a real-to-complex FFT.

The requirements for complex-to-real FFTs are similar to those for real-to-complex. In this case, the input array holds only the non-redundant, N/2+1 complex coefficients from a real-to-complex transform. The output is simply N elements of type cufftReal. However, for an in-place transform, the input size must be padded to 2*(N/2+1) real elements. The complex-to-real transform is implicitly inverse. Passing the CUFFT_C2R constant to any plan creation function configures a complex-to-real FFT.

For 1D complex-to-complex transforms, the stride between signals in a batch is assumed to be the number of cufftComplex elements in the logical transform size. However, for real-data FFTs, the distance between signals in a batch depends on whether the transform is in-place or out-of-place. For in-place FFTs, the input stride is assumed to be 2*(N/2+1) cufftReal elements or N/2+1 cufftComplex elements. For out-of-place transforms, the input and output strides match the logical transform size (N) and the non-redundant size (N/2+1), respectively.

CUFFT Transform Directions

The CUFFT library defines forward and inverse Fast Fourier Transforms according to the sign of the complex exponential term:

    #define CUFFT_FORWARD -1
    #define CUFFT_INVERSE  1

For higher-dimensional transforms (2D and 3D), CUFFT performs FFTs in row-major or C order. For example, if the user requests a 3D transform plan for sizes X, Y, and Z, CUFFT transforms along Z, Y, and then X. The user can configure column-major FFTs by simply changing the order of the size parameters to the plan creation API functions.

CUFFT performs un-normalized FFTs; that is, performing a forward FFT on an input data set followed by an inverse FFT on the resulting set yields data that is equal to the input scaled by the number of elements. Scaling either transform by the reciprocal of the size of the data set is left for the user to perform as seen fit.

CUFFT API Functions

The CUFFT API is modeled after FFTW (see http://www.fftw.org), which is one of the most popular and efficient CPU-based FFT libraries. FFTW provides a simple configuration mechanism called a plan that completely specifies the optimal (that is, the minimum floating-point operation, or flop) plan of execution for a particular FFT size and data type. The advantage of this approach is that once the user creates a plan, the library stores whatever state is needed to execute the plan multiple times without recalculation of the configuration. The FFTW model works well for CUFFT because different kinds of FFTs require different thread configurations and GPU resources, and plans are a simple way to store and reuse configurations.

The CUFFT library initializes internal data upon the first invocation of an API function. Therefore, all API functions could return the CUFFT_SETUP_FAILED error code if the library fails to initialize. CUFFT shuts down automatically when all user-created FFT plans are destroyed.

The CUFFT functions are as follows:

- “Function cufftPlan1d()”
- “Function cufftPlan2d()”
- “Function cufftPlan3d()”
- “Function cufftDestroy()”
- “Function cufftExecC2C()”
- “Function cufftExecR2C()”
- “Function cufftExecC2R()”

Function cufftPlan1d()

    cufftResult cufftPlan1d( cufftHandle *plan, int nx, cufftType type, int batch );

creates a 1D FFT plan configuration for a specified signal size and data type. The batch input parameter tells CUFFT how many 1D transforms to configure.

Input
plan     Pointer to a cufftHandle object
nx       The transform size (e.g., 256 for a 256-point FFT)
type     The transform data type (e.g., CUFFT_C2C for complex to complex)
batch    Number of transforms of size nx
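The size rules for real-data transforms described under CUFFT Transform Types reduce to a little integer arithmetic. The following C sketch is not part of CUFFT: every `demo_` function is a hypothetical helper introduced here for illustration, and the code only restates the N/2+1 and 2*(N/2+1) rules from the text, assuming C integer division.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers (not CUFFT API). They restate the size rules
 * from the text using integer division. */

/* Non-redundant complex outputs of an N-point real-to-complex transform. */
size_t demo_r2c_complex_count(size_t n)
{
    assert(n > 0);
    return n / 2 + 1;
}

/* Real elements per signal needed by an in-place real transform:
 * the same buffer must hold N/2+1 complex values, i.e. 2*(N/2+1) floats. */
size_t demo_inplace_real_count(size_t n)
{
    return 2 * demo_r2c_complex_count(n);
}

/* Complex outputs of an n0 x n1 x n2 real-to-complex transform:
 * only the last dimension is cut to n2/2+1. */
size_t demo_r2c_complex_count_3d(size_t n0, size_t n1, size_t n2)
{
    return n0 * n1 * demo_r2c_complex_count(n2);
}

/* Offset, in real elements, of signal b in a batch of 1D real inputs.
 * In-place: signals are 2*(N/2+1) reals apart.
 * Out-of-place: signals are N reals apart. */
size_t demo_batch_input_offset(size_t n, size_t b, int in_place)
{
    size_t distance = in_place ? demo_inplace_real_count(n) : n;
    return b * distance;
}
```

For N = 256, `demo_r2c_complex_count` gives 129 and `demo_inplace_real_count` gives 258, so an in-place buffer carries two padding floats per signal.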

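To make the sign convention and the missing normalization described under CUFFT Transform Directions concrete, here is a minimal sketch in plain C. It is a naive 4-point DFT written for illustration only, not how CUFFT computes transforms; `demo_dft4`, `DEMO_FORWARD`, and `DEMO_INVERSE` are hypothetical names that mirror the -1/+1 convention above. The size is fixed at 4 because the twiddle factors are then exactly 1, -i, -1, and i, so no math library is needed. Data is stored as interleaved (real, imaginary) floats, matching the cufftComplex layout.

```c
#include <assert.h>

/* Hypothetical stand-ins for the direction constants in the text. */
#define DEMO_FORWARD (-1)
#define DEMO_INVERSE 1

/* Un-normalized 4-point DFT on interleaved (re, im) floats.
 * sign = -1 uses exp(-2*pi*i*j*k/4); sign = +1 uses exp(+2*pi*i*j*k/4).
 * in and out must not alias. */
void demo_dft4(const float in[8], float out[8], int sign)
{
    /* exp(-2*pi*i*m/4) for m = 0..3 is 1, -i, -1, +i. */
    static const float tw_re[4] = { 1.0f, 0.0f, -1.0f, 0.0f };
    static const float tw_im[4] = { 0.0f, -1.0f, 0.0f, 1.0f };

    assert(sign == DEMO_FORWARD || sign == DEMO_INVERSE);

    for (int k = 0; k < 4; ++k) {
        float re = 0.0f, im = 0.0f;
        for (int j = 0; j < 4; ++j) {
            int m = (j * k) % 4;
            float wr = tw_re[m];
            /* The inverse direction uses the complex conjugate twiddle. */
            float wi = (sign < 0) ? tw_im[m] : -tw_im[m];
            float xr = in[2 * j], xi = in[2 * j + 1];
            re += xr * wr - xi * wi;  /* real part of x * w */
            im += xr * wi + xi * wr;  /* imaginary part of x * w */
        }
        out[2 * k] = re;
        out[2 * k + 1] = im;
    }
}
```

Running the forward direction on the real signal {1, 2, 3, 4} and then the inverse direction on the result gives {4, 8, 12, 16}, the input scaled by N = 4. Multiplying by 1/N recovers the original data, which is the scaling step the text leaves to the user.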