C674x DSPLIB
From Texas Instruments Embedded Processors Wiki
Contents
1 Introduction
1.1 Installation
1.2 Usage
1.3 Performance
1.4 Interruptibility
1.5 File Structure
1.6 Known Issues
2 Single-Precision Kernels
2.1 DSPF_sp_autocor (Autocorrelation)
2.2 DSPF_sp_biquad (Biquad Filter)
2.3 DSPF_sp_blk_move (Block Copy)
2.4 DSPF_sp_convol (Convolution)
2.5 DSPF_sp_dotp_cplx (Complex Dot Product)
2.6 DSPF_sp_dotprod (Dot Product)
2.7 DSPF_sp_fftSPxSP (Mixed Radix Forward FFT with Bit Reversal)
2.8 DSPF_sp_fir_cplx (Complex FIR Filter)
2.9 DSPF_sp_fir_gen (FIR Filter)
2.10 DSPF_sp_fir_r2 (FIR Filter Alternate Implementation)
2.11 DSPF_sp_fircirc (FIR Filter with Circular Input)
2.12 DSPF_sp_ifftSPxSP (Mixed Radix Inverse FFT with Bit Reversal)
2.13 DSPF_sp_iir (IIR Filter)
2.14 DSPF_sp_iirlat (Lattice IIR Filter)
2.15 DSPF_sp_lms (LMS Adaptive Filter)
2.16 DSPF_sp_mat_mul (Matrix Multiply)
2.17 DSPF_sp_mat_mul_cplx (Complex Matrix Multiply)
2.18 DSPF_sp_mat_trans (Matrix Transpose)
2.19 DSPF_sp_maxidx (Maximum Index)
2.20 DSPF_sp_maxval (Maximum Value)
2.21 DSPF_sp_minerr (VSELP Vocoder Codebook Search)
2.22 DSPF_sp_minval (Minimum Value)
2.23 DSPF_sp_vecmul (Vector Multiplication)
2.24 DSPF_sp_vecrecip (Vector Reciprocal)
2.25 DSPF_sp_vecsum_sq (Vector Sum of Squares)
2.26 DSPF_sp_w_vec (Vector Weighted Sum)
3 Miscellaneous Kernels
3.1 DSPF_blk_eswap16 (16-bit Endianness Swap)
3.2 DSPF_blk_eswap32 (32-bit Endianness Swap)
3.3 DSPF_blk_eswap64 (64-bit Endianness Swap)
3.4 DSPF_fltoq15 (Float to Q.15 Conversion)
3.5 DSPF_q15tofl (Q.15 to Float Conversion)
Introduction
The C674x DSPLIB is a partial C port of the C67x DSPLIB. The pre-existing
library was written in assembly and suffered from bugs and undocumented
code requirements. The new release is intended to correct these problems
and provide a more coherent and maintainable code base. Each kernel
includes source for a "natural" C and optimized C kernel, as well as a
sample project demonstrating its use.
Please note that the C674x DSPLIB is a floating point library. For fixed
point computation, the C674x core is fully compatible with the C64x+
DSPLIB.
Installation
Visit the C674x DSPLIB web page on ti.com and download the installer for
your platform:
Windows Installer: sprc900.zip
Linux Installer: sprc906.gz (run tar -xzf sprc906.gz to uncompress)
Usage
The DSPLIB contains a pre-compiled library file and C header. To use the
DSPLIB, simply include the library file, dsplib674x.lib, in your project
and include the header file in your C source:
#include "dsplib674x.h"
Note that the compiler must be told to search your DSPLIB installation
folder for header files. This is easily done with a compiler option
similar to -i"C:\CCStudio_v3.3\c674x\dsplib_v12".
The DSPLIB can be re-built using the dsplib674x.pjt CCS project file. This
will pull in any modifications that you have made to the individual kernel
source files.
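As a rough illustration of the steps above, the sketch below calls one kernel, DSPF_sp_blk_move, from application code. It assumes the TI compiler's predefined __TI_COMPILER_VERSION__ macro and its DATA_ALIGN pragma to meet the kernel's double word alignment requirement. The names demo_src, demo_dst, DEMO_LEN, and copy_block are hypothetical, and the stand-in function in the #else branch is not library code; it exists only so the sketch also compiles off-target.

```c
#ifdef __TI_COMPILER_VERSION__
#include "dsplib674x.h"            /* DSPLIB top-level header */
#pragma DATA_ALIGN(demo_src, 8)    /* double word = 8 bytes */
#pragma DATA_ALIGN(demo_dst, 8)
#else
/* Hypothetical host stand-in, NOT the library code: plain element copy. */
static void DSPF_sp_blk_move(const float *x, float *y, const int n)
{
    int i;
    for (i = 0; i < n; i++)
        y[i] = x[i];
}
#endif

#define DEMO_LEN 8                 /* must be even and greater than 0 */

float demo_src[DEMO_LEN] = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f };
float demo_dst[DEMO_LEN];

/* Copies demo_src to demo_dst; returns 0 on success, -1 on a bad length. */
int copy_block(void)
{
    if (DEMO_LEN <= 0 || (DEMO_LEN % 2) != 0)
        return -1;
    DSPF_sp_blk_move(demo_src, demo_dst, DEMO_LEN);
    return 0;
}
```

On the target, the library file dsplib674x.lib must still be added to the project so that the call resolves at link time.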
Performance
The performance of the optimized C kernels should be better than or
comparable to the performance of their ASM counterparts. Certain kernels
may retain their older assembly implementation if the C version can't
match its efficiency. Also, some FFT kernels have received new and
improved assembly implementations due to their performance-critical
nature. Detailed comparisons of the kernels' C674x and C67x
implementations are listed in a development notes spreadsheet. This file
is included in the docs folder of the C674x DSPLIB installation.
To benchmark the DSPLIB kernels, TI recommends the use of the C674x Cycle
Accurate Simulator, which is included in Code Composer Studio 3.3 with
Service Release 12 or later. After loading CCS, select
Profile->Clock->Enable. This will allow the kernel demonstration apps to
accurately display cycle counts. Otherwise, the cycle counts will likely
be incorrectly reported as zero.
Interruptibility
All code in the C674x DSPLIB is interrupt tolerant. Many of the C language
kernels are fully interruptible. To check whether a kernel is
interruptible, follow this procedure:
1. Open and build the kernel's demonstration project in the
src/DSPF_ folder in CCS.
2. If the project does not include the source file DSPF_.c,
it is an ASM language kernel and is not interruptible.
3. If the project does include the source file DSPF_.c,
building the project created a file named DSPF_.asm. Open
this file.
4. If the file DSPF_.asm contains a SPLOOP instruction, the
kernel is interruptible. Otherwise, the kernel is not
interruptible.
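Step 4 of the procedure above can be automated with a small host-side helper. The sketch below is only one possible approach: the function name asm_has_sploop is hypothetical, and it does a plain substring search, so it also matches variants such as SPLOOPD and SPLOOPW as well as any comment that happens to mention SPLOOP. Treat a positive result as a hint to inspect the file, not as proof.

```c
#include <stdio.h>
#include <string.h>

/* Returns 1 if the file contains the text "SPLOOP", 0 if it does not,
 * and -1 if the file cannot be opened. Hypothetical helper. */
int asm_has_sploop(const char *path)
{
    char line[512];
    int found = 0;
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;

    /* Scan line by line and stop at the first match. */
    while (!found && fgets(line, sizeof line, fp) != NULL) {
        if (strstr(line, "SPLOOP") != NULL)
            found = 1;
    }

    fclose(fp);
    return found;
}
```

The path passed in would be the generated assembly file from step 3.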
If a C language kernel is not interruptible, it may be possible to
sacrifice some performance to gain interruptibility. One method is to use
the -ms0 (or --opt_for_space) compiler option, which encourages the
compiler to use the SPLOOP instruction. In the top-level DSPLIB project,
you can apply this option to a single source file (and not the entire
library) by right-clicking the file in the project browser and selecting
"File Specific Options..." from the drop-down menu.
File Structure
In addition to the top-level library and project files, the DSPLIB
installation provides several source files for each kernel. Each file is
located in the src/DSPF_ folder within the DSPLIB installation.
Certain files may only be present for C or ASM language kernels.
DSPF_.c: C kernels only! Optimized C source for the kernel. This code is
  used to build the library.
DSPF_.asm: ASM kernels only! Optimized ASM source for the kernel. This
  code is used to build the library. Note: building the demo app for a C
  kernel may create a file with this name.
DSPF_.h: C header for kernel. Included in the library's top-level header
  file.
DSPF__cn.c: Natural C source for the kernel. This code is functionally
  equivalent to the optimized C source, but it is written to maximize
  clarity rather than performance. It is not included in the library
  itself.
DSPF__cn.h: C header for natural C implementation of kernel.
DSPF__opt.c: ASM kernels only! Optimized C source for the kernel. This
  code is provided as an alternative implementation and is not included
  in the library file.
DSPF__d.c: Demonstration C source for the kernel. The demonstration app
  shows a typical use case for the kernel. It calls the optimized C and
  natural C implementations to compare results and efficiency. Note: the
  cycle counts will only return non-zero values when this code is run in
  Code Composer Studio with the profiler enabled.
DSPF__legacy.asm: Assembly source for the same kernel from the older C67x
  DSPLIB. This code is provided purely for comparison purposes within the
  demonstration app and is not included in the library file. Some C67x
  assembly kernels have bugs that cause them to return incorrect results.
  In rare cases, they may even crash the DSP. In this case, the legacy
  kernel will not be called in the demonstration app. If this file is not
  present for a particular kernel, the legacy ASM code is used in the
  library and can be found in DSPF_.asm.
DSPF_.pjt: Code Composer Studio (CCS) project file for the demonstration
  app. This file is used to build the demonstration application in CCS.
link.cmd: Linker command file for demonstration app project. Used to
  build demonstration app.
Each demonstration app also makes use of a common source file that is used
to compare output data from the various implementations of the DSPLIB
kernel. This file, DSPF_util.c, is located in the src/DSPF_util folder
in the DSPLIB installation. This file is not used by the actual library
file itself.
Known Issues
Please refer to this topic.
Single-Precision Kernels
DSPF_sp_autocor (Autocorrelation)
The DSPF_sp_autocor kernel performs autocorrelation on input array x. The
result is stored in output array r.
Function
void DSPF_sp_autocor(float *restrict r, float *restrict x, const int nx,
const int nr)
Parameters
r: Pointer to output array. Must have nr elements.
x: Pointer to input array. Must have nx + nr elements with nr 0-value
  elements at the beginning. Must be double word aligned.
nx: Length of input array. Must be an even number. Must be greater than
  or equal to nr.
nr: Length of autocorrelation sequence to calculate. Must be divisible by
  4 and greater than 0.
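To make the input layout concrete, here is a minimal reference sketch of an autocorrelation that matches the parameter descriptions above. It is not the library's source: the name ref_sp_autocor and the loop structure are assumptions. It treats x[nr] through x[nx + nr - 1] as the valid samples, so that a lag of i reaches back into the nr zero-valued elements at the start of the array.

```c
/* Hypothetical reference routine (not DSPLIB code). */
void ref_sp_autocor(float *r, const float *x, int nx, int nr)
{
    int i, k;

    for (i = 0; i < nr; i++) {
        float sum = 0.0f;

        /* Valid samples start at x[nr]; x[k - i] may fall in the
         * zero padding, which contributes nothing to the sum. */
        for (k = nr; k < nx + nr; k++)
            sum += x[k] * x[k - i];

        r[i] = sum;
    }
}
```

For example, with nr = 4, nx = 4, and the samples 1, 2, 3, 4 placed after four leading zeros, this sketch produces r = {30, 20, 11, 4}.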
DSPF_sp_biquad (Biquad Filter)
The DSPF_sp_biquad kernel performs biquad filtering on input array x using
coefficient arrays a and b. The result is stored in output array y. A biquad
filter is defined as an IIR filter with three (3) forward coefficients
and two (2) feedback coefficients. The basic biquad transfer function can
be expressed as:
(TODO: biquad diagram?)
This kernel uses "delay" coefficients to simplify calculations. The delay
coefficients are defined as follows:
The delay coefficients must be pre-calculated before calling the
DSPF_sp_biquad kernel.
Function
void DSPF_sp_biquad(float *restrict x, float *b, float *a, float *delay,
float *restrict y, const int n)
Parameters
x: Pointer to input array. Must be length n.
b: Pointer to forward coefficient array. Elements are (in order) b0, b1,
  and b2 in the biquad equation. Must be length 3.
a: Pointer to feedback coefficient array. Elements are (in order) a0, a1,
  and a2 in the biquad equation; a0 is not used. Must be length 3.
delay: Pointer to delay coefficient array. The delay coefficients must be
  pre-calculated for the first output sample according to the above
  equations. The delay coefficients are overwritten by the kernel when it
  returns. The array must be length 2.
y: Pointer to output array. Must be length n.
n: Length of input and output arrays. Must be an even number.
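The role of the two delay values can be illustrated with a small reference sketch. It is an assumption, not the library's source: it uses the common transposed direct form II recurrence, with feedback terms subtracted and a0 taken as 1, and the name ref_sp_biquad is hypothetical. Check the sign convention against the kernel's natural C source in the installation before relying on it.

```c
/* Hypothetical reference routine (not DSPLIB code).
 * Assumed recurrence, per sample:
 *   y[i] = b0*x[i] + d0
 *   d0   = b1*x[i] - a1*y[i] + d1
 *   d1   = b2*x[i] - a2*y[i]
 */
void ref_sp_biquad(const float *x, const float *b, const float *a,
                   float *delay, float *y, int n)
{
    int i;
    float d0 = delay[0];
    float d1 = delay[1];

    for (i = 0; i < n; i++) {
        float xi = x[i];
        float yi = b[0] * xi + d0;

        d0 = b[1] * xi - a[1] * yi + d1;   /* a[0] is not used */
        d1 = b[2] * xi - a[2] * yi;
        y[i] = yi;
    }

    /* Write the state back so the next block continues seamlessly. */
    delay[0] = d0;
    delay[1] = d1;
}
```

Under this assumed form, starting from delay = {0, 0} corresponds to a filter with no input history, and passing the updated delay array to the next call lets consecutive blocks be filtered as one continuous stream.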
DSPF_sp_blk_move (Block Copy)
The DSPF_sp_blk_move kernel copies a specified number of data words from
input array x to output array y.
Function
void DSPF_sp_blk_move(const float * x, float *restrict y, const int n)
Parameters
x: Pointer to input array. Must have n elements. Must be double word
  aligned.
y: Pointer to output array. Must have n elements. Must be double word
  aligned.
n: Length of input and output arrays. Must be an even number and greater
  than 0.
DSPF_sp_convol (Convolution)
The DSPF_sp_convol kernel convolves input array x with coefficient array
h. The result is stored in output array y.
Function
void DSPF_sp_convol(const float *x, const float *h, float *restrict y,
const short nh, const short ny)
Parameters
x: Pointer to input array. Must have ny + nh - 1 elements. Typically
  contains nh - 1 zero values at the beginning and end of the array. Must
  be double word aligned.
h: Pointer to coefficient array. Must have nh elements.
y: Pointer to output array. Must have ny elements. Must be double word
  aligned.
nh: Length of coefficient array. Must be an even number and greater than
  0.
ny: Length of input and output arrays. Must be an even number and greater
  than 0.
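The array sizes above suggest how the padded input is indexed. The sketch below shows one indexing scheme that is consistent with those sizes; it is an assumption, not the library's source, and the name ref_sp_convol is hypothetical. Each output y[i] reads the nh input samples x[i] through x[i + nh - 1], which is why x needs ny + nh - 1 elements.

```c
/* Hypothetical reference routine (not DSPLIB code). */
void ref_sp_convol(const float *x, const float *h, float *y,
                   short nh, short ny)
{
    int i, j;

    for (i = 0; i < ny; i++) {
        float sum = 0.0f;

        /* h[0] pairs with the newest sample in the window,
         * h[nh - 1] with the oldest. */
        for (j = 0; j < nh; j++)
            sum += h[j] * x[i + nh - 1 - j];

        y[i] = sum;
    }
}
```

With h = {1, 2} and the signal 1, 2, 3 padded with one zero on each side (x = {0, 1, 2, 3, 0}, nh = 2, ny = 4), this sketch yields y = {1, 4, 7, 6}, the full linear convolution of the two sequences.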
DSPF_sp_dotp_cplx (Complex Dot Product)
The DSPF_sp_dotp_cplx kernel performs a dot product on two complex input
arrays. The real and imaginary portions of the result are stored to
separate output words.
Function
void DSPF_sp_dotp_cplx(const float * x, const float * y, int n,
float * restrict re, float * restrict im)
Parameters
x: Pointer to first complex input array. Real and imaginary elements are
  respectively stored at even and odd index locations. Must have n * 2
  elements. Must be double word aligned.
y: Pointer to second complex input array. Real and imaginary elements are
  respectively stored at even and odd index locations. Must have n * 2
  elements. Must be double word aligned.
n: Number of complex values in input arrays. The length of each array is
  actually 2 * n. Must be an even number and greater than 0.
re: Pointer to real output word. Typically the address of a float
  variable.
im: Pointer to imaginary output word. Typically the address of a float
  variable.
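The interleaved storage described above can be illustrated with a short reference sketch. It is an assumption, not the library's source: the name ref_sp_dotp_cplx is hypothetical, and the sketch multiplies the two arrays without conjugating either operand. If a conjugated product is needed, verify the behaviour against the kernel's natural C source.

```c
/* Hypothetical reference routine (not DSPLIB code).
 * Element i of each array is stored as {real, imag} at
 * indices 2*i and 2*i + 1. No conjugation is applied. */
void ref_sp_dotp_cplx(const float *x, const float *y, int n,
                      float *re, float *im)
{
    int i;
    float real = 0.0f;
    float imag = 0.0f;

    for (i = 0; i < n; i++) {
        float xr = x[2 * i], xi = x[2 * i + 1];
        float yr = y[2 * i], yi = y[2 * i + 1];

        real += xr * yr - xi * yi;
        imag += xr * yi + xi * yr;
    }

    *re = real;
    *im = imag;
}
```

For x = {1+2i, 3+4i} and y = {5+6i, 7+8i}, stored as {1, 2, 3, 4} and {5, 6, 7, 8} with n = 2, the sketch returns re = -18 and im = 68.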
DSPF_sp_dotprod (Dot Product)
The DSPF_sp_dotprod kernel performs a dot product on two input arrays and
returns the result.
Function
float DSPF_sp_dotprod(const float * x, const
float * y, const int n)