CUDA
CUFFT Library
PG-00000-003_V1.0
June, 2007
Confidential Information
Published by
NVIDIA Corporation
2701 San Tomas Expressway
Santa Clara, CA 95050
Notice
This source code is subject to NVIDIA ownership rights under U.S. and international Copyright laws.
This software and the information contained herein is PROPRIETARY and CONFIDENTIAL to NVIDIA
and is being provided under the terms and conditions of a Non‐Disclosure Agreement. Any reproduction
or disclosure to any third party without the express written consent of NVIDIA is prohibited.
NVIDIA MAKES NO REPRESENTATION ABOUT THE SUITABILITY OF THIS SOURCE CODE FOR
ANY PURPOSE. IT IS PROVIDED “AS IS” WITHOUT EXPRESS OR IMPLIED WARRANTY OF ANY
KIND. NVIDIA DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOURCE CODE,
INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY, NONINFRINGEMENT, AND
FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT SHALL NVIDIA BE LIABLE FOR ANY
SPECIAL, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, OR ANY DAMAGES
WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF
CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION
WITH THE USE OR PERFORMANCE OF THIS SOURCE CODE.
U.S. Government End Users. This source code is a “commercial item” as that term is defined at 48 C.F.R.
2.101 (OCT 1995), consisting of “commercial computer software” and “commercial computer software
documentation” as such terms are used in 48 C.F.R. 12.212 (SEPT 1995) and is provided to the U.S.
Government only as a commercial end item. Consistent with 48 C.F.R.12.212 and 48 C.F.R. 227.7202‐1
through 227.7202‐4 (JUNE 1995), all U.S. Government End Users acquire the source code with only those
rights set forth herein.
Trademarks
NVIDIA, CUDA, and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation
in the United States and other countries. Other company and product names may be trademarks of the
respective companies with which they are associated.
Copyright
© 2006–2007 by NVIDIA Corporation. All rights reserved.
Table of Contents
CUFFT Library
  CUFFT Types and Definitions
    Type cufftHandle
    Type cufftResult
    Type cufftReal
    Type cufftComplex
    CUFFT Transform Types
    CUFFT Transform Directions
  CUFFT API Functions
    Function cufftPlan1d()
    Function cufftPlan2d()
    Function cufftPlan3d()
    Function cufftDestroy()
    Function cufftExecC2C()
    Function cufftExecR2C()
    Function cufftExecC2R()
  Accuracy and Performance
  CUFFT Code Examples
    1D Complex-to-Complex Transforms
    1D Real-to-Complex Transforms
    2D Complex-to-Complex Transforms
    2D Complex-to-Real Transforms
    3D Complex-to-Complex Transforms
CUFFT Library
This document describes CUFFT, the NVIDIA® CUDA™ (compute
unified device architecture) Fast Fourier Transform (FFT) library. The
FFT is a divide‐and‐conquer algorithm for efficiently computing
discrete Fourier transforms of complex or real‐valued data sets, and it
is one of the most important and widely used numerical algorithms,
with applications that include computational physics and general
signal processing. The CUFFT library provides a simple interface for
computing parallel FFTs on an NVIDIA GPU, which allows users to
leverage the floating‐point power and parallelism of the GPU without
having to develop a custom, GPU‐based FFT implementation.
FFT libraries typically vary in terms of supported transform sizes and
data types. For example, some libraries only implement Radix‐2 FFTs,
restricting the transform size to a power of two, while other
implementations support arbitrary transform sizes. This version of the
CUFFT library supports the following features:
1D, 2D, and 3D transforms of complex and real‐valued data.
Batch execution for doing multiple 1D transforms in parallel.
2D and 3D transform sizes in the range [2, 16384] in any
dimension.
1D transform sizes up to 8 million elements.
In‐place and out‐of‐place transforms for real and complex data.
CUFFT Types and Definitions
The next sections describe the CUFFT types and transform directions:
“Type cufftHandle”
“Type cufftResult”
“Type cufftReal”
“Type cufftComplex”
“CUFFT Transform Types”
“CUFFT Transform Directions”
Type cufftHandle
typedef unsigned int cufftHandle;
is a handle type used to store and access CUFFT plans. For example,
the user receives a handle after creating a CUFFT plan and uses this
handle to execute the plan.
Type cufftResult
typedef enum cufftResult_t cufftResult;
is an enumeration of values used exclusively as API function return
values. The possible return values are defined as follows:
Return Values
CUFFT_SUCCESS           Any CUFFT operation is successful.
CUFFT_INVALID_PLAN      CUFFT is passed an invalid plan handle.
CUFFT_ALLOC_FAILED      CUFFT failed to allocate GPU memory.
CUFFT_INVALID_TYPE      The user requests an unsupported type.
CUFFT_INVALID_VALUE     The user specifies a bad memory pointer.
CUFFT_INTERNAL_ERROR    Used for all internal driver errors.
CUFFT_EXEC_FAILED       CUFFT failed to execute an FFT on the GPU.
CUFFT_SETUP_FAILED      The CUFFT library failed to initialize.
CUFFT_SHUTDOWN_FAILED   The CUFFT library failed to shut down.
CUFFT_INVALID_SIZE      The user specifies an unsupported FFT size.
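As an illustration of how these return values might be reported, the following sketch maps each code to the description in the table above. To compile on its own it declares a stand-in for the cufftResult enumeration; the numeric values of the enumerators are an assumption, and a real program would include cufft.h instead. The helper demoResultString() is hypothetical and is not part of the CUFFT API.

```c
#include <assert.h>

/* Stand-in for the cufftResult declaration in cufft.h so that this sketch
   compiles on its own. The enumerators follow the order of the table above;
   their numeric values are an assumption and may differ from the header. */
typedef enum cufftResult_t {
    CUFFT_SUCCESS,
    CUFFT_INVALID_PLAN,
    CUFFT_ALLOC_FAILED,
    CUFFT_INVALID_TYPE,
    CUFFT_INVALID_VALUE,
    CUFFT_INTERNAL_ERROR,
    CUFFT_EXEC_FAILED,
    CUFFT_SETUP_FAILED,
    CUFFT_SHUTDOWN_FAILED,
    CUFFT_INVALID_SIZE
} cufftResult;

/* Hypothetical helper (not part of CUFFT): returns the table description
   for a result code, or a fallback string for values outside the table. */
const char *demoResultString(cufftResult result)
{
    static const char *const text[] = {
        "Any CUFFT operation is successful.",
        "CUFFT is passed an invalid plan handle.",
        "CUFFT failed to allocate GPU memory.",
        "The user requests an unsupported type.",
        "The user specifies a bad memory pointer.",
        "Used for all internal driver errors.",
        "CUFFT failed to execute an FFT on the GPU.",
        "The CUFFT library failed to initialize.",
        "The CUFFT library failed to shut down.",
        "The user specifies an unsupported FFT size."
    };
    const unsigned count = (unsigned)(sizeof text / sizeof text[0]);

    /* The lookup table must cover every enumerator of the stand-in enum. */
    assert(count == (unsigned)CUFFT_INVALID_SIZE + 1u);

    if ((unsigned)result >= count)
        return "Unknown cufftResult value.";
    return text[result];
}
```

A caller could pass the value returned by any CUFFT API function to such a helper before deciding whether to continue.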
Type cufftReal
typedef float cufftReal;
is a single‐precision, floating‐point real data type.
Type cufftComplex
typedef float cufftComplex[2];
is a single‐precision, floating‐point complex data type that consists of
interleaved real and imaginary components.
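The interleaved layout can be sketched as follows. The example assumes the application keeps real and imaginary parts in two separate arrays and needs to pack them into cufftComplex elements; the helper demoInterleave() is hypothetical, and the typedef is repeated from the definition above only so that the sketch compiles without cufft.h.

```c
#include <assert.h>

/* Repeated from the definition above so the sketch is self-contained. */
typedef float cufftComplex[2];

/* Hypothetical helper (not part of CUFFT): packs n values held as separate
   real and imaginary arrays into the interleaved layout, where element i
   stores its real part in out[i][0] and its imaginary part in out[i][1]. */
void demoInterleave(const float *re, const float *im,
                    cufftComplex *out, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; ++i) {
        out[i][0] = re[i];
        out[i][1] = im[i];
    }
}
```

Viewed as a flat array of floats, the packed data reads re[0], im[0], re[1], im[1], and so on.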
CUFFT Transform Types
The CUFFT library supports complex‐ and real‐data transforms. The
cufftType data type is an enumeration of the types of transform data
supported by CUFFT:
typedef enum cufftType_t {
    CUFFT_R2C = 0x2a,  // Real to complex (interleaved)
    CUFFT_C2R = 0x2c,  // Complex (interleaved) to real
    CUFFT_C2C = 0x29   // Complex to complex, interleaved
} cufftType;
For complex FFTs, the input and output arrays must interleave the real
and imaginary parts (the cufftComplex type). The transform size in
each dimension is the number of cufftComplex elements. The
CUFFT_C2C constant can be passed to any plan creation function to
configure a complex‐to‐complex FFT.
For real‐to‐complex FFTs, the output array holds only the
non‐redundant complex coefficients. So for an N‐element transform, the
output array holds N/2+1 cufftComplex terms, where the division
rounds down. For higher‐dimensional real transforms of the form
N0×N1×...×Nn, only the last dimension is reduced in this way, so the
output data is N0×N1×...×(Nn/2+1) complex elements. Therefore, in
order to perform an in‐place FFT, the user has to pad the input array
in the last dimension to (Nn/2+1) complex elements or, equivalently,
2*(Nn/2+1) real elements. Note that the real‐to‐complex transform is
implicitly forward. Passing the CUFFT_R2C constant to any plan
creation function configures a real‐to‐complex FFT.
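The size rules for real‐to‐complex transforms can be sketched in a few lines of C. The two helpers below are hypothetical (they are not CUFFT functions) and simply evaluate the expressions given in the text: the number of complex output elements, and the number of real elements a padded in‐place buffer has to hold.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of cufftComplex elements produced by a
   real-to-complex transform of size dims[0] x ... x dims[rank-1].
   Only the last dimension is reduced to dims[rank-1]/2 + 1. */
size_t demoR2COutputComplex(const int *dims, int rank)
{
    assert(rank >= 1);
    size_t count = 1;
    for (int i = 0; i < rank - 1; ++i)
        count *= (size_t)dims[i];
    return count * (size_t)(dims[rank - 1] / 2 + 1);
}

/* Hypothetical helper: number of cufftReal elements a padded in-place
   buffer must hold, i.e. two reals per complex output element. */
size_t demoR2CInPlaceReal(const int *dims, int rank)
{
    return 2 * demoR2COutputComplex(dims, rank);
}
```

For example, an 8‐element 1D transform produces 5 complex terms and needs an in‐place buffer of 10 real elements; a 4×6 transform produces 4×4 = 16 complex terms and needs 32 real elements.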
The requirements for complex‐to‐real FFTs are similar to those for real‐
to‐complex. In this case, the input array holds only the non‐redundant,
N/2+1 complex coefficients from a real‐to‐complex transform. The
output is simply N elements of type cufftReal. However, for an in‐
place transform, the input size must be padded to 2*(N/2+1) real
elements. The complex‐to‐real transform is implicitly inverse. Passing
the CUFFT_C2R constant to any plan creation function configures a
complex‐to‐real FFT.
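The reason N/2+1 coefficients are sufficient is the conjugate symmetry of the discrete Fourier transform of real data: X[N-k] is the complex conjugate of X[k]. The sketch below rebuilds the full N‐point spectrum from the non‐redundant half under that property. The helper demoExpandHalfSpectrum() is hypothetical and runs on the host; it is shown only to illustrate which coefficients are omitted, not how CUFFT processes them.

```c
#include <assert.h>

/* Repeated from the definition above so the sketch is self-contained. */
typedef float cufftComplex[2];

/* Hypothetical helper (not part of CUFFT): expands the n/2+1 non-redundant
   coefficients of a real signal's spectrum into all n coefficients,
   using X[n-k] = conj(X[k]). */
void demoExpandHalfSpectrum(cufftComplex *half, cufftComplex *full, int n)
{
    assert(n >= 1);
    int nh = n / 2 + 1;

    /* The first n/2+1 coefficients are copied unchanged. */
    for (int k = 0; k < nh; ++k) {
        full[k][0] = half[k][0];
        full[k][1] = half[k][1];
    }
    /* The remaining ones are conjugates of stored coefficients. */
    for (int k = nh; k < n; ++k) {
        full[k][0] =  half[n - k][0];
        full[k][1] = -half[n - k][1];
    }
}
```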
For 1D complex‐to‐complex transforms, the stride between signals in a
batch is assumed to be the number of cufftComplex elements in the
logical transform size. However, for real‐data FFTs, the distance
between signals in a batch depends on whether the transform is in‐
place or out‐of‐place. For in‐place FFTs, the input stride is assumed to
be 2*(N/2+1) cufftReal elements or N/2+1 cufftComplex elements.
For out‐of‐place transforms, the input and output strides match the
logical transform size (N) and the non‐redundant size (N/2+1),
respectively.
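The batch strides described above can be written out as offset calculations. The helpers below are hypothetical and only restate the rules in the text, reading them for the real‐to‐complex case: the real side of the batch is strided by 2*(N/2+1) elements in place and by N elements out of place, while the complex side is strided by N/2+1 elements.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: offset, in cufftComplex elements, of signal b in a
   batch of 1D complex-to-complex transforms of size n. */
size_t demoComplexSignalOffset(int n, int b)
{
    assert(n >= 1 && b >= 0);
    return (size_t)b * (size_t)n;
}

/* Hypothetical helper: offset, in cufftReal elements, of signal b on the
   real-valued side of a real-data batch. In-place buffers are padded to
   2*(n/2+1) reals per signal; out-of-place buffers use n reals. */
size_t demoRealSignalOffset(int n, int b, int inPlace)
{
    assert(n >= 1 && b >= 0);
    size_t stride = inPlace ? 2 * (size_t)(n / 2 + 1) : (size_t)n;
    return (size_t)b * stride;
}

/* Hypothetical helper: offset, in cufftComplex elements, of signal b on
   the complex side of a real-data batch (n/2+1 coefficients per signal). */
size_t demoHalfComplexSignalOffset(int n, int b)
{
    assert(n >= 1 && b >= 0);
    return (size_t)b * (size_t)(n / 2 + 1);
}
```

With N = 8, for instance, the fourth signal (b = 3) starts at real element 30 for an in‐place transform and at real element 24 for an out‐of‐place transform.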
CUFFT Transform Directions
The CUFFT library defines forward and inverse Fast Fourier
Transforms according to the sign of the complex exponential term:
#define CUFFT_FORWARD -1
#define CUFFT_INVERSE 1
For higher‐dimensional transforms (2D and 3D), CUFFT performs
FFTs in row‐major or C order. For example, if the user requests a 3D
transform plan for sizes X, Y, and Z, CUFFT transforms along Z, Y, and
then X. The user can configure column‐major FFTs by simply changing
the order of the size parameters to the plan creation API functions.
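Row‐major (C) order means that the last dimension is contiguous in memory. As a minimal sketch, assuming a 3D data set of sizes X, Y, and Z stored in a single linear array, the element at coordinates (x, y, z) would be addressed as shown below. The helper demoIndex3D() is hypothetical and is not part of CUFFT.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: linear index of element (x, y, z) in a row-major
   array of sizes X x ny x nz. The z coordinate varies fastest, so
   consecutive z values are adjacent in memory. The size of the first
   dimension is not needed to compute the index. */
size_t demoIndex3D(int x, int y, int z, int ny, int nz)
{
    assert(x >= 0 && y >= 0 && y < ny && z >= 0 && z < nz);
    return ((size_t)x * (size_t)ny + (size_t)y) * (size_t)nz + (size_t)z;
}
```

For sizes Y = 4 and Z = 5, stepping z by one moves one element, stepping y by one moves 5 elements, and stepping x by one moves 20 elements.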
CUFFT performs un‐normalized FFTs; that is, performing a forward
FFT on an input data set followed by an inverse FFT on the resulting
set yields data that is equal to the input scaled by the number of
elements. Scaling either transform by the reciprocal of the size of the
data set is left to the user.
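The sign convention and the missing normalization can be illustrated with a small host‐side reference transform. The sketch below is not CUFFT code: it is a direct 4‐point discrete Fourier transform, chosen because its complex exponentials are exactly 1, i, -1, and -i, so no trigonometric functions are needed. The helpers demoDFT4() and demoScale4() are hypothetical; the two direction constants are repeated from the definitions above.

```c
#include <assert.h>

/* Repeated from the definitions above so the sketch is self-contained. */
typedef float cufftComplex[2];
#define CUFFT_FORWARD -1
#define CUFFT_INVERSE  1

/* Hypothetical reference transform (not CUFFT): un-normalized 4-point DFT,
   out[k] = sum over j of in[j] * exp(direction * 2*pi*i * j*k / 4). */
void demoDFT4(cufftComplex *in, cufftComplex *out, int direction)
{
    /* Powers of i: i^0 = 1, i^1 = i, i^2 = -1, i^3 = -i. */
    static const float powRe[4] = { 1.0f, 0.0f, -1.0f,  0.0f };
    static const float powIm[4] = { 0.0f, 1.0f,  0.0f, -1.0f };

    assert(direction == CUFFT_FORWARD || direction == CUFFT_INVERSE);
    for (int k = 0; k < 4; ++k) {
        float re = 0.0f, im = 0.0f;
        for (int j = 0; j < 4; ++j) {
            int p = (j * k) % 4;
            /* The direction only flips the sign of the imaginary part
               of the exponential term. */
            float wr = powRe[p];
            float wi = (float)direction * powIm[p];
            re += in[j][0] * wr - in[j][1] * wi;
            im += in[j][0] * wi + in[j][1] * wr;
        }
        out[k][0] = re;
        out[k][1] = im;
    }
}

/* Hypothetical helper: applies the 1/N scaling that is left to the user. */
void demoScale4(cufftComplex *data)
{
    for (int i = 0; i < 4; ++i) {
        data[i][0] /= 4.0f;
        data[i][1] /= 4.0f;
    }
}
```

With the real input {1, 2, 3, 4}, a forward transform followed by an inverse transform yields {4, 8, 12, 16}, that is, the input scaled by the number of elements; calling demoScale4() on that result recovers the original values.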
CUFFT API Functions
The CUFFT API is modeled after FFTW (see http://www.fftw.org),
which is one of the most popular and efficient CPU‐based FFT
libraries. FFTW provides a simple configuration mechanism called a
plan that completely specifies the optimal—that is, the minimum
floating‐point operation (flop)—plan of execution for a particular FFT
size and data type. The advantage of this approach is that once the
user creates a plan, the library stores whatever state is needed to
execute the plan multiple times without recalculation of the
configuration. The FFTW model works well for CUFFT because
different kinds of FFTs require different thread configurations and
GPU resources, and plans are a simple way to store and reuse
configurations.
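The plan idea can be sketched independently of CUFFT. In the hypothetical example below, a plan object stores one piece of size‐dependent state, the bit‐reversal permutation used by radix‐2 FFTs, which is computed once at creation and then reused by every execution. The names demoPlan, demoPlanCreate(), demoPlanExec(), and demoPlanDestroy() are illustrative only, and the sketch makes no claim about what CUFFT or FFTW actually store in their plans.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical plan object: state computed once for a given size. */
typedef struct {
    int  n;       /* transform size (a power of two in this sketch) */
    int *bitrev;  /* precomputed bit-reversal permutation           */
} demoPlan;

/* Creates the plan. Returns 0 on success, or -1 if n is not a power of
   two of at least 2 or if memory cannot be allocated. */
int demoPlanCreate(demoPlan *plan, int n)
{
    if (n < 2 || (n & (n - 1)) != 0)
        return -1;
    plan->bitrev = malloc((size_t)n * sizeof *plan->bitrev);
    if (plan->bitrev == NULL)
        return -1;
    plan->n = n;

    int bits = 0;
    while ((1 << bits) < n)
        ++bits;
    for (int i = 0; i < n; ++i) {
        int r = 0;
        for (int b = 0; b < bits; ++b)
            if (i & (1 << b))
                r |= 1 << (bits - 1 - b);
        plan->bitrev[i] = r;
    }
    return 0;
}

/* Executes the stored reordering; it can be called any number of times
   without recomputing the table. */
void demoPlanExec(const demoPlan *plan, const float *in, float *out)
{
    assert(in != out);
    for (int i = 0; i < plan->n; ++i)
        out[plan->bitrev[i]] = in[i];
}

/* Releases the state owned by the plan. */
void demoPlanDestroy(demoPlan *plan)
{
    free(plan->bitrev);
    plan->bitrev = NULL;
    plan->n = 0;
}
```

The create, execute, and destroy steps mirror the life cycle of a CUFFT plan: configuration cost is paid once, and each later execution only uses the stored state.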
The CUFFT library initializes internal data upon the first invocation of
an API function. Therefore, all API functions could return the
CUFFT_SETUP_FAILED error code if the library fails to initialize. CUFFT
shuts down automatically when all user‐created FFT plans are
destroyed.
The CUFFT functions are as follows:
“Function cufftPlan1d()”
“Function cufftPlan2d()”
“Function cufftPlan3d()”
“Function cufftDestroy()”
“Function cufftExecC2C()”
“Function cufftExecR2C()”
“Function cufftExecC2R()”
Function cufftPlan1d()
cufftResult
cufftPlan1d( cufftHandle *plan, int nx, cufftType type,
int batch );
creates a 1D FFT plan configuration for a specified signal size and data
type. The batch input parameter tells CUFFT how many 1D
transforms to configure.
Input
plan    Pointer to a cufftHandle object
nx      The transform size (e.g., 256 for a 256-point FFT)
type    The transform data type (e.g., CUFFT_C2C for complex to complex)
batch   Number of transforms of size nx
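Before a batched 1D plan is executed, the application has to provide buffers large enough for all batch signals. The sketch below estimates those sizes for an out‐of‐place transform from the same nx, type, and batch values that are passed to cufftPlan1d(). The helpers demoInputBytes1D() and demoOutputBytes1D() are hypothetical and are not part of CUFFT; the type definitions are repeated from the earlier sections so that the sketch compiles without cufft.h.

```c
#include <assert.h>
#include <stddef.h>

/* Repeated from the definitions above so the sketch is self-contained. */
typedef float cufftReal;
typedef float cufftComplex[2];
typedef enum cufftType_t {
    CUFFT_R2C = 0x2a,
    CUFFT_C2R = 0x2c,
    CUFFT_C2C = 0x29
} cufftType;

/* Hypothetical helper: bytes needed for the input buffer of an
   out-of-place batched 1D transform. Returns 0 for an unknown type. */
size_t demoInputBytes1D(int nx, cufftType type, int batch)
{
    assert(nx >= 1 && batch >= 1);
    size_t n = (size_t)nx, count = (size_t)batch;
    switch (type) {
    case CUFFT_R2C: return count * n * sizeof(cufftReal);
    case CUFFT_C2R: return count * (n / 2 + 1) * sizeof(cufftComplex);
    case CUFFT_C2C: return count * n * sizeof(cufftComplex);
    }
    return 0;
}

/* Hypothetical helper: bytes needed for the output buffer of an
   out-of-place batched 1D transform. Returns 0 for an unknown type. */
size_t demoOutputBytes1D(int nx, cufftType type, int batch)
{
    assert(nx >= 1 && batch >= 1);
    size_t n = (size_t)nx, count = (size_t)batch;
    switch (type) {
    case CUFFT_R2C: return count * (n / 2 + 1) * sizeof(cufftComplex);
    case CUFFT_C2R: return count * n * sizeof(cufftReal);
    case CUFFT_C2C: return count * n * sizeof(cufftComplex);
    }
    return 0;
}
```

For nx = 256 and batch = 10, a complex‐to‐complex plan needs 2560 cufftComplex elements on each side, while a real‐to‐complex plan reads 2560 cufftReal elements and writes 1290 cufftComplex elements.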