XAPP1033 (v1.0) December 5, 2007
Peak Cancellation Crest Factor Reduction 
Reference Design
Authors: Ed Hemphill, Steve Summerfield, George Wang, and Dave Hawke
Application Note: Virtex-5 and Virtex-4 Family
Summary
This application note provides designers with a highly optimized solution for Crest Factor 
Reduction (CFR) that can be adapted to meet the needs of multiple air interfaces with minimum 
effort. The system-level performance of the Peak Cancellation method of CFR is shown to be 
better than other methods such as Peak Windowing and Noise Shaping. In addition, the Peak 
Cancellation method can be implemented more efficiently than the other methods, resulting in 
reduced overall cost.
Accompanying this application note are design files and test vectors for quickly evaluating the 
performance of the reference design within MATLAB®. Instructions on how to integrate the 
reference design into a larger system design are included. Design files are available for both 
Virtex™-4 and Virtex-5 device architectures.
Introduction
The wireless industry is currently following an aggressive drive to reduce Capital Expenditure 
(CapEx) and Operating Expenditure (OpEx). Different dynamics can affect both of these to a 
lesser or greater extent. If a typical base station is broken down into its constituent components, 
it is estimated that an average of 40 to 60 percent of the overall CapEx cost is incurred with the 
radio cards. Since the radio shelf contains the power amplifiers, the radio portion of the design 
is also responsible for much of the OpEx incurred during the lifetime of the site. This is largely 
due to the low efficiency of the power amplifiers when operating in a highly linear region.
The OpEx cost is directly related to the power amplifier efficiency in the base station. Currently, 
a very small proportion of the DC power consumed by the base station is converted to radiated 
energy. The efficiency at which a power amplifier may be operated is a function of the 
transmitted signal. 3G signals have a high Peak to Average Power Ratio (PAPR) or Crest 
Factor. This imposes significant operating restrictions on the power amplifier. In order to handle 
the peaks, it is heavily backed off from its most efficient operating point. To increase efficiency, 
CFR algorithms can be used to decrease the PAPR of the transmitted signal prior to it entering 
the power amplifier. By doing so, the power amplifier can operate with less back off, and thus 
increased efficiency. Another method of improving the efficiency of the power amplifier is to use 
Digital Pre-Distortion (DPD). Rather than use digital signal processing to reduce the dynamic 
range of the transmitted signal (CFR), DPD is used to linearize the power amplifier itself. DPD 
is outside the scope of this document, but is mentioned here because it is a widely used method 
of improving amplifier efficiency.
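The PAPR figure that drives the amplifier back-off can be illustrated with a short calculation. The following is a minimal sketch in Python, not part of the reference design; the function name `papr_db` and the two test signals are assumptions chosen for illustration only.

```python
import cmath
import math

def papr_db(samples):
    """Peak-to-average power ratio of a complex sequence, in dB."""
    powers = [abs(s) ** 2 for s in samples]
    average_power = sum(powers) / len(powers)
    return 10.0 * math.log10(max(powers) / average_power)

# A constant-envelope tone: peak power equals average power.
tone = [cmath.exp(2j * math.pi * 0.05 * n) for n in range(200)]

# Two equal-power tones: the envelope peaks where the tones align in phase,
# so the peak power is twice the average power.
two_tone = [
    cmath.exp(2j * math.pi * 0.05 * n) + cmath.exp(2j * math.pi * 0.13 * n)
    for n in range(200)
]

print(round(papr_db(tone), 2), round(papr_db(two_tone), 2))
```

Summing more carriers raises the ratio further, which is why multi-carrier signals place the heaviest back-off demands on the amplifier.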
In multi-carrier systems, such as WCDMA, TD-SCDMA, and CDMA2000, the PAPR of the 
signal can be higher than in single-carrier systems. In addition, some CFR methods, such as 
Noise Shaping, are costly to implement for multi-carrier systems. The peak 
cancellation CFR (PC-CFR) technique outlined in this application note is well suited to 
multi-carrier systems, and can even be applied to radios where multiple standards may be 
required to share the same radio transmission spectrum. 
This application note also illustrates the dramatic reduction in dynamic power between 
generations of FPGAs. This lets designers estimate the additional cost savings available 
when evaluating the power supply and heatsinking needs of traditional 
chassis-mounted equipment and Remote Radio Head (RRH) applications.
© 2007 Xilinx, Inc. All rights reserved. XILINX, the Xilinx logo, and other designated brands included herein are trademarks of Xilinx, Inc. All other trademarks are the property 
of their respective owners.
Description of Algorithm
This section gives an overview of the PC-CFR algorithm followed by detailed descriptions of 
each main step in the algorithm. See OFDM for Wireless Multimedia Communications for an 
overview of PAPR reduction techniques, including peak cancellation [Ref 1].
Algorithm Overview
The peak cancellation method of CFR reduces the peak to average power ratio (PAPR) of a 
signal by subtracting spectrally shaped pulses from signal peaks that exceed a specified 
threshold. The cancellation pulses are designed to have a spectrum that matches that of the 
CFR input signal and therefore introduce negligible out-of-band interference. In general, the 
CFR input signal and cancellation pulses are complex, and the peak search (described in 
“Peak Detection,” page 4) is carried out on the signal magnitude.
Because the signals are complex, each cancellation pulse must be rotated to match the phase 
of the corresponding signal peak. The peak magnitude of a given cancellation pulse is set 
equal to the difference between the corresponding signal peak magnitude and the desired 
clipping threshold. This method reduces the signal peak magnitudes to the threshold value 
while preserving the signal phase. 
Figure 1 illustrates the peak cancellation process in the time domain. The top plot shows a 
section of the input signal magnitude. The horizontal line overlaid on the plot indicates the 
clipping threshold. Any peak that exceeds this threshold is a candidate for cancellation. The 
middle plot shows the magnitude of the cancellation pulse that is to be subtracted from the 
input signal. The bottom plot shows the magnitude of the output signal after subtracting the 
cancellation pulse from the input signal.
[Figure 1 shows three stacked plots of magnitude (×10^4) versus time in samples (0 to 1000): 
CFR Input Signal Magnitude, Cancellation Pulse Magnitude, and CFR Output Signal Magnitude.]
Figure 1: Time Domain View of Peak Cancellation
Figure 2 illustrates the characteristics of the peak cancellation method in the frequency domain 
for a typical multi-carrier configuration. The power spectral density (PSD) of the input signal is 
overlaid with the PSD of the cancellation pulse signal, also referred to as the clipping noise. The 
cancellation pulse illustrated in Figure 1 has frequency domain content as illustrated in 
Figure 2. In the case of a single carrier, the cancellation pulse would look much smoother. The 
somewhat noisy appearance of the cancellation pulse in the time domain is consistent with the 
non-symmetric multi-carrier spectrum in the frequency domain.
[Figure 2 plots the PSD of Clipping Noise: Signal and Noise traces in dB (-20 to 100) versus 
frequency in MHz (-15 to 15).]
Figure 2: Frequency Domain View of Peak Cancellation
The peak cancellation method is similar to the noise shaping method of CFR that is described 
in XAPP921c, High Density WCDMA Digital Front End Reference Design [Ref 3]. In noise 
shaping, the signal is clipped and then subtracted from the original to produce a clipping noise. 
The clipping noise is filtered to confine its spectrum to that of the input signal. The spectrally 
shaped clipping noise is then subtracted from the original input signal to produce a PAPR 
reduced signal with minimal out-of-band degradation. 
Whereas the noise shaping method filters all samples of the clipping noise, the peak 
cancellation method filters only the peak samples of the clipping noise. Treating the peak 
samples as discrete delta functions allows the convolution to be replaced by a simple scaling of 
the filter impulse response. This results in less signal distortion because the time domain 
spread at the filter output is smaller compared to the noise shaping method. Because the 
filtering of the signal peaks is implemented via simple scaling of the filter impulse response, the 
computational burden is greatly reduced.
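The equivalence between filtering isolated peak samples and scaling the filter impulse response can be checked numerically. The sketch below is illustrative only: the five-tap pulse, the peak positions, and the complex weights are hypothetical values, not taken from the reference design.

```python
def convolve(x, h):
    """Direct-form convolution of two sequences."""
    y = [0j] * (len(x) + len(h) - 1)
    for n, x_n in enumerate(x):
        for k, h_k in enumerate(h):
            y[n + k] += x_n * h_k
    return y

# Hypothetical cancellation pulse and peak list (position -> complex weight).
pulse = [0.25, 0.5, 1.0, 0.5, 0.25]
peaks = {3: 2 - 1j, 10: -0.5 + 0.5j}
length = 16

# Noise-shaping style: filter a sequence that is zero except at the peaks.
impulses = [peaks.get(n, 0j) for n in range(length)]
by_convolution = convolve(impulses, pulse)

# Peak-cancellation style: one scaled, shifted copy of the pulse per peak.
by_scaling = [0j] * (length + len(pulse) - 1)
for position, weight in peaks.items():
    for k, h_k in enumerate(pulse):
        by_scaling[position + k] += weight * h_k

print(max(abs(a - b) for a, b in zip(by_convolution, by_scaling)))
```

The scaled-copy form needs one complex multiply per pulse tap per peak, whereas the convolution form runs the full filter on every sample.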
Algorithm Details
Figure 3 shows a block diagram of the PC-CFR algorithm. Peaks in the input signal are 
detected and cancelled to produce a reduced PAPR signal. The peak detect block works on the 
signal magnitudes to produce a peak location indicator along with magnitude and phase 
information for each peak. The difference between the peak magnitudes and the clipping 
threshold is generated by the peak scaling block. The magnitude difference is combined with 
the phase information to produce the complex weighting that is used to scale the cancellation 
pulse coefficients. The scaling and summation of a limited number of cancellation pulses 
replaces the more computationally intense convolution that is used in the noise shaping 
method. 
Throughout this application note, it is assumed that there are four cancellation pulse generators 
(CPGs) per iteration, which is a convenient choice for four clocks per sample. There is no 
inherent limitation to the algorithm regarding the number of CPGs per iteration. Choosing the 
number of CPGs to match the number of clocks per sample is done for hardware 
efficiency.
Each CPG outputs an unscaled version of the cancellation pulse waveform aligned 
with a peak location. Each CPG can cancel only one peak at a time. The length of the 
cancellation pulse combined with the number of CPGs determines the rate at which signal 
peaks can be cancelled. The allocator block controls the distribution of CPGs to incoming 
peaks. When a new peak is detected, the allocator assigns an available CPG to the 
cancellation of that peak. If all CPGs are busy when a new peak is detected, it will not be 
cancelled. Multiple iterations of the algorithm are necessary to eliminate the peaks that were 
not cancelled during an earlier pass of the algorithm. The final step in the algorithm is to 
subtract the summation of the CPG outputs from a delayed version of the input signal.
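[Figure 3 is a block diagram. The high PAPR signal feeds a Peak Detect block and a Delay 
path. Peak Detect supplies peak locations to the Allocator, and magnitude and phase to the 
Peak Scaling block. The outputs of CPG #1 through CPG #4 are each multiplied by a scaling 
value, summed, and combined with the delayed signal to give the reduced PAPR signal.]
Figure 3: Block Diagram of PC-CFR Algorithm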
Peak Detection
There are multiple ways to define a signal peak. One common method defines a peak as any 
sample that has magnitude greater than its neighboring samples. This method has the 
advantage that it is simple to implement and results in a fixed delay from the detection of the 
peak to the peak location. However, it has the disadvantage that it may result in the detection of 
many local peaks in a single over-threshold region. Attempting to cancel many closely spaced 
peaks at once can cause the cancellation pulses to interfere constructively, which leads to peak 
regrowth. Moreover, the allocation of CPGs will be less than optimal because a cluster of peaks 
may consume all the CPG resources. 
An alternate method of detecting peaks is based on finding the highest peak within an over-
threshold region. This has the advantage that only one peak is detected per over-threshold 
region thus reducing the effects of peak regrowth and improving the CPG allocation statistics. 
This method is illustrated in Figure 4. Note that multiple peaks exist in the second over-
threshold region, but only the highest peak is selected for cancellation. The disadvantage of 
this method is the variable delay from the peak location to the detection of the peak. This is 
because the algorithm must wait for the signal to cross below the clipping threshold before 
declaring the highest peak in that region. The length of the delay is a function of the signal 
characteristics and ratio of sampling rate to occupied bandwidth. The maximum delay from 
signal peak to threshold crossing increases as the ratio of sampling rate to occupied bandwidth 
increases. 
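The region-based search can be sketched as follows. This is a minimal software illustration, not the hardware implementation; the function name `detect_peaks` and the sample magnitudes are assumptions, and a region still open at the end of the buffer is reported immediately, which streaming hardware would not do.

```python
def detect_peaks(magnitudes, threshold):
    """Return the index of the highest sample in each over-threshold region."""
    peaks = []
    best = None  # index of the highest sample seen in the current region
    for n, magnitude in enumerate(magnitudes):
        if magnitude > threshold:
            if best is None or magnitude > magnitudes[best]:
                best = n
        elif best is not None:
            # The signal has dropped back below the threshold, so the
            # highest sample of the region can now be declared.
            peaks.append(best)
            best = None
    if best is not None:  # assumption: flush a region cut off by end of data
        peaks.append(best)
    return peaks

# Hypothetical magnitudes: one single-sample region and one region holding
# two local maxima (indices 6 and 8), of which only the higher is kept.
magnitudes = [1, 2, 5, 3, 1, 1, 6, 4, 7, 5, 2, 1]
print(detect_peaks(magnitudes, threshold=3.5))
```

In this example the peak at index 8 cannot be declared until index 10, when the signal falls below the threshold. This is the variable delay described above.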
Performance can be improved by using a detection threshold that is slightly higher than the 
desired clipping threshold. This allows the algorithm to ignore peaks that are just barely 
crossing the threshold and focus on peaks that exceed the threshold by some delta. The main 
reason this provides improvement is the fact that some peak regrowth occurs, which can result 
in many near threshold peaks. Allocating CPG resources to these small peaks during a second 
iteration would provide minimal PAPR reduction with the risk of missing larger peaks that were 
not cancelled during the first iteration.
[Figure 4 plots Example Signal Peaks: signal magnitude (0 to 8) versus time in samples 
(0 to 15), with two peaks marked Selected and one marked Not Selected.]
Figure 4: Illustration of Peak Detection Method
Peak Scaling
The peak scaling step in the algorithm determines the complex scaling applied to the 
cancellation pulse coefficients for each peak. The magnitude of the scaling is equal to the 
difference between the signal peak and the clipping threshold. The phase is set equal to that of 
the signal peak. Mathematically this is expressed in Equation 1.
$$\alpha = (|x| - \gamma) \times e^{j\theta} \tag{Equation 1}$$
In this equation, α is the complex scaling value, |x| is the magnitude of the signal peak, γ is the 
clipping threshold, and θ is the phase of the signal peak.
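A short numerical check of Equation 1 is given below as an illustration. The function name `scaling_weight` and the sample values are assumptions, and the example assumes a cancellation pulse whose center tap equals one, so that subtracting α at the peak sample is all that matters there.

```python
import cmath

def scaling_weight(peak_sample, threshold):
    """Complex weight from Equation 1: (|x| - gamma) * exp(j * theta)."""
    magnitude = abs(peak_sample)
    theta = cmath.phase(peak_sample)
    return (magnitude - threshold) * cmath.exp(1j * theta)

x = 3 + 4j          # hypothetical peak sample with |x| = 5
gamma = 4.0         # hypothetical clipping threshold
alpha = scaling_weight(x, gamma)

# Assuming the pulse center tap is 1, the peak sample after cancellation is:
cancelled = x - alpha
print(abs(cancelled), cmath.phase(cancelled) - cmath.phase(x))
```

The magnitude of the cancelled sample lands on the threshold while its phase is unchanged, which is the behavior described in the algorithm overview.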
Allocator
The allocator controls the assignment of CPG resources to the task of canceling incoming 
peaks. During startup, all CPGs are available. When the first peak arrives, the allocator assigns 
the first CPG to cancel it and then tags that CPG as being allocated. Once allocated, a CPG 
becomes unavailable for the length of the cancellation pulse (in samples). When subsequent 
peaks arrive, the allocator steps through the status of each CPG and assigns the first one 
available. Peaks that arrive when all CPGs are currently busy will not get cancelled and must be 
picked up by a subsequent iteration of the algorithm.
There are times when the input signal exhibits a high density of over-threshold peaks in clusters 
(for example, two non-adjacent carriers). This can lead to less than optimal allocation of CPGs 
and contribute to high peak regrowth. To mitigate the degradation, an allocator spacing 
parameter is used to prevent cancellation of peaks that are closer than some specified distance 
from an already allocated peak.
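The allocation policy can be modeled in a few lines. This is a behavioral sketch under stated assumptions, not the reference design logic: the function name `allocate`, the `min_spacing` parameter name, and the convention that a CPG is busy from the peak position for one pulse length are all illustrative choices.

```python
def allocate(peak_positions, num_cpgs, pulse_length, min_spacing=0):
    """Assign each peak (ascending sample positions) to the first free CPG.

    Returns (position, cpg_index) pairs; cpg_index is None when the peak
    is skipped and left for a later iteration.
    """
    busy_until = [0] * num_cpgs   # first sample at which each CPG is free
    last_allocated = None         # position of the most recent allocated peak
    assignments = []
    for position in peak_positions:
        too_close = (
            last_allocated is not None
            and position - last_allocated < min_spacing
        )
        chosen = None
        if not too_close:
            for cpg in range(num_cpgs):
                if busy_until[cpg] <= position:
                    chosen = cpg
                    busy_until[cpg] = position + pulse_length
                    last_allocated = position
                    break
        assignments.append((position, chosen))
    return assignments

# Four CPGs and a 255-sample pulse, as assumed in this application note.
burst = allocate([0, 10, 20, 30, 40, 300], num_cpgs=4, pulse_length=255)
spaced = allocate([0, 10, 20], num_cpgs=4, pulse_length=255, min_spacing=15)
print(burst)
print(spaced)
```

In the first call the fifth peak finds all four CPGs busy and is skipped, while the peak at sample 300 reuses the first CPG. In the second call the spacing parameter rejects the peak at sample 10 even though CPGs are free.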
Cancellation Pulse Generator
Each cancellation pulse generator, or CPG, produces an unscaled copy of the stored 
cancellation pulse. The cancellation pulse is designed to occupy the same frequency bands as 
the input signal. The cancellation pulse coefficients can be obtained using any preferred filter 
design methodology and are computed off-line before being written to the PC-CFR design. 
Memory that is external to the design may be used to store multiple sets of cancellation pulse 
coefficients corresponding to pre-determined carrier configurations. Transferring a selected set 
of coefficients into the PC-CFR memory can be handled with some simple multiplexing 
circuitry. Handling configurations that are not pre-determined requires additional processing as 
outlined in the remainder of this section.
For multi-carrier configurations, it is useful to first design a prototype filter that is matched to the 
spectrum of a single carrier. Frequency shifted replicas of the prototype filter are then placed at 
each carrier center frequency before being summed to create a composite multi-band filter. An 
example of this process is illustrated in Figure 5. The prototype filter in this case was obtained 
using the firls function in MATLAB followed by windowing with a Kaiser window. In this example, 
the prototype filter is shifted to six different center frequencies to match the spectrum of a six-
carrier input signal. Mathematically, the composite multi-carrier coefficients, h(k), are generated 
as shown in Equation 2.
$$h(k) = \sum_{i=1}^{M} e^{j 2\pi (k - N/2) f_i / f_s}\, g(k), \qquad k = 0, 1, 2, \ldots, N-1 \tag{Equation 2}$$
In this equation, M is the number of carriers, N is the filter length, f_i is the carrier frequency of 
the ith carrier, f_s is the sampling frequency, and g(k) is the prototype filter.
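Equation 2 translates directly into code. The sketch below is illustrative only: the function name `composite_filter` is an assumption, and the 15-tap raised-cosine window stands in for a real prototype filter, which this application note obtains with firls and a Kaiser window.

```python
import cmath
import math

def composite_filter(prototype, carrier_freqs_hz, fs_hz):
    """Sum frequency-shifted replicas of the prototype filter (Equation 2)."""
    n_taps = len(prototype)
    composite = []
    for k, g_k in enumerate(prototype):
        rotation = sum(
            cmath.exp(2j * math.pi * (k - n_taps / 2) * f_i / fs_hz)
            for f_i in carrier_freqs_hz
        )
        composite.append(rotation * g_k)
    return composite

# Placeholder prototype (a short window), NOT the filter used in the design.
n_taps = 15
prototype = [
    0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (n_taps + 1))
    for k in range(n_taps)
]

fs_hz = 76.8e6
single = composite_filter(prototype, [0.0], fs_hz)
pair = composite_filter(prototype, [-4.0e6, 4.0e6], fs_hz)
print(max(abs(h.imag) for h in pair))
```

A single carrier at 0 Hz returns the prototype unchanged. A pair of carriers placed symmetrically about 0 Hz gives real coefficients, whereas a non-symmetric carrier plan generally gives complex coefficients.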
Although the design of the prototype filter requires some rather complex computations, the 
frequency shifting and summing can be done in firmware using Equation 2. The prototype filter 
can be pre-calculated and then stored in memory, and the frequency shifting and adding can be 
performed either in an external processor or by additional circuitry in the FPGA (not included in 
the PC-CFR design).
As in any filter design, a trade-off exists between cancellation pulse length and frequency 
response characteristics. Achieving sharp transition bands in the frequency domain comes at 
the expense of long filter lengths, which for PC-CFR limits the density of peaks that can be 
cancelled. Conversely, requiring a shorter filter length comes at the expense of wider transition 
bands. It may be acceptable to allow some out-of-band leakage to reduce filter length as long 
as the final signal complies with the spectral emission mask (SEM) and adjacent channel 
leakage ratio (ACLR) requirements. The fact that the clipping noise power is usually 
significantly lower (for example, 20 dB) than the signal power helps in this process.
[Figure 5 shows two plots of magnitude in dB (0 to -100) versus frequency in MHz (-10 to 10): 
Magnitude Response of Prototype Filter and Magnitude Response of Multiband Filter.]
Figure 5: Multi-band Filter Creation from Prototype Filter
One of the key features of the PC-CFR algorithm is its ability to support multiple air interface 
standards simply by changing the prototype filter. In fact, multiple prototype filters could be 
combined to support multiple air interfaces simultaneously. For example, a 5 MHz WCDMA 
carrier could coexist with a 10 MHz WiMAX carrier by shifting the individual prototype filters to 
the corresponding center frequencies of each carrier.
CFR Performance
This section summarizes the performance of the PC-CFR algorithm using TD-SCDMA as an 
example. The methodology and assumptions are included, as well as detailed performance 
results. “Comparison to Other Methods,” page 11, compares the performance of the PC-CFR 
method with two other popular methods: peak windowing CFR (PW-CFR) and noise shaping 
CFR (NS-CFR). Although the results shown are for TD-SCDMA, the general conclusions are 
expected to hold for other air interface standards such as WCDMA, WiMAX, and 3GPP LTE.
Methodology and Assumptions
The results presented in this section were obtained using Gaussian baseband data for each 
TD-SCDMA time slot. Each time slot contains 864 chips' worth of data, the last 16 of which are 
zeroed to model the TD-SCDMA guard period. The chip rate is 1.28 Mcps and the CFR output 
sample rate is 76.8 Msps, giving an interpolation factor of 60 samples per chip. Up to six active 
carriers may be present and each carrier occupies a bandwidth of 1.6 MHz. A total bandwidth 
of 10 MHz is allocated for six adjacent carriers and 15 MHz for six non-adjacent carriers. The 
baseband data for each carrier is interpolated by 60 and pulse shaped using a square-root 
raised-cosine (RRC) filter with roll-off parameter equaling 0.22 as defined in 3GPP TS 25.105 
[Ref 2]. 
The 3GPP TS 25.105 specification defines the EVM measurement interval to be one time slot. 
The results presented here are based on 10 time slots worth of data, where each time slot has 
equal average power. The reason for doing this is that 864 chips worth of data are not sufficient 
to provide statistically significant PAPR results at the 0.01% probability of clip point. In order to 
obtain reasonably accurate complementary cumulative distribution function (CCDF) curves, it 
is necessary to run the simulations for 8640 chips.
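One rough way to see why a single time slot is too short is to count how many samples are expected to lie above the 0.01% point. The arithmetic below uses only the figures quoted in this section; the variable names are illustrative, and treating the count as a guide to statistical confidence is an informal argument rather than a result from the reference design.

```python
chip_rate = 1.28e6          # chips per second
sample_rate = 76.8e6        # CFR output samples per second
chips_per_slot = 864
slots = 10
clip_probability = 0.01 / 100   # the 0.01% CCDF point

samples_per_chip = round(sample_rate / chip_rate)
samples_one_slot = chips_per_slot * samples_per_chip
samples_all_slots = slots * samples_one_slot

# Expected number of samples above the 0.01% level in each record length.
expected_one_slot = clip_probability * samples_one_slot
expected_all_slots = clip_probability * samples_all_slots

print(samples_per_chip, samples_one_slot, samples_all_slots)
print(expected_one_slot, expected_all_slots)
```

One slot leaves only about five samples above the 0.01% level, and neighboring samples are strongly correlated at 60 samples per chip, so the estimate rests on very few independent events. Ten slots raise the count roughly tenfold.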
The baseline requirements for the CFR performance are listed in Table 1. The spectral 
emission mask (SEM) is modified from the one defined in 3GPP TS 25.105 to be consistent 
with a more stringent ACLR requirement of 60 dB.
In this document, the following definition of EVM is used:
$$\mathrm{EVM} = 100 \times \frac{\sigma_{x - \alpha y}}{\sigma_{x}} \tag{Equation 3}$$
In this equation, σ_x is the standard deviation of the CFR input signal and σ_(x-αy) is the standard 
deviation of the error between the CFR input signal and a scaled version of the CFR output 
signal. The scaling term α is obtained by performing a least-squares fit between the CFR input 
and output signals. This definition results in a single measure of EVM for the composite multi-
carrier waveform. There may be some variation between carriers when measuring EVM after 
RRC matched filtering, but the variations are typically within a few tenths of a percent of the 
composite multi-carrier EVM.
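The EVM definition of Equation 3 can be sketched as follows. The function names and the test vectors are assumptions made for illustration, and the sketch assumes a complex-valued least-squares α, which this application note does not state explicitly.

```python
import math

def std(values):
    """Standard deviation of a complex sequence."""
    mean = sum(values) / len(values)
    return math.sqrt(sum(abs(v - mean) ** 2 for v in values) / len(values))

def evm_percent(x, y):
    """EVM per Equation 3, with alpha from a least-squares fit of alpha*y to x."""
    # Assumption: alpha is complex; it minimizes sum(|x - alpha * y|^2).
    alpha = (
        sum(x_n * y_n.conjugate() for x_n, y_n in zip(x, y))
        / sum(abs(y_n) ** 2 for y_n in y)
    )
    error = [x_n - alpha * y_n for x_n, y_n in zip(x, y)]
    return 100.0 * std(error) / std(x)

x = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]          # hypothetical CFR input
y_scaled = [0.5j * x_n for x_n in x]            # pure gain and phase change
y_distorted = [1.2 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]  # one sample altered

print(evm_percent(x, y_scaled), evm_percent(x, y_distorted))
```

Because of the least-squares scaling, a pure gain or phase change between input and output contributes nothing to the EVM, and only the distortion that cannot be explained by scaling is counted.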
Table 1: CFR Performance Requirements
Parameter        Requirement                                  Comments
PAPR Reduction   > 3.0 dB @ 0.01% in 10 MHz bandwidth
                 > 2.8 dB @ 0.01% in 15 MHz bandwidth
EVM              ≤ 7%
ACLR             > 60 dB
SEM              > 40 dB attenuation at 0.8 MHz offset        Exceeds the 3GPP TS 25.105 requirements
                 > 60 dB attenuation beyond 1.0 MHz offset
Results for four different carrier configurations are presented. The definition of each 
configuration is listed in Table 2. An emphasis is placed on the six non-adjacent carrier case 
because it covers what is believed to be a challenging yet realistic carrier configuration. The two 
non-adjacent carriers case is typically the worst case scenario in terms of stressing a CFR 
algorithm, but this case may not be very common in a TD-SCDMA system. The three adjacent 
and six adjacent carrier cases are expected to be more common, and typically result in better 
CFR performance than when the carriers are not adjacent.
Table 2: Carrier Configurations
Description                  Carrier Center Frequencies (MHz)
Six non-adjacent carriers    [-6.4, -3.2, 0, 1.6, 3.2, 6.4]
Two non-adjacent carriers    [-4.0, 4.0]
Three adjacent carriers      [-1.6, 0, 1.6]
Six adjacent carriers        [-4.0, -2.4, -0.8, 0.8, 2.4, 4.0]
PC-CFR Performance
This section summarizes the performance of the PC-CFR method. In all cases, the number of 
cancellation pulse generators is four and the length of the cancellation pulse is 255. Although 
not shown, good results can also be obtained using a different number of CPGs per iteration. 
Tradeoffs between the number of CPGs per iteration, filter length, and the number of iterations 
can be made to tune performance. The cancellation pulse was designed using the firls function 
in MATLAB with Fpass=0.9/Fs and Fstop=1.3×Fpass followed by windowing with a Kaiser window 
(β=5). Results are presented based on using either two or three iterations of the algorithm.
Table 3 shows the PAPR reduction (dPAPR) versus EVM performance of the PC-CFR 
algorithm when using only two iterations for the six non-adjacent carrier case. All dPAPR results 
are referenced at the 0.01% probability of clip point. The PAPR of the CFR input signal is 9.91 
dB. The clipping ratio is defined as the ratio of the clipping threshold to the standard deviation 
of the CFR input signal expressed in dB. The performance when using three iterations is shown 
in Table 4. With the exception of the highest EVM, there is no improvement in going from two 
iterations to three iterations for this case.
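The clipping ratio defined above can be converted to an absolute clipping threshold as sketched below. The function names and sample values are illustrative, and the sketch assumes that the ratio is an amplitude ratio and therefore uses 20·log10, a convention this application note does not spell out.

```python
import math

def std(values):
    """Standard deviation of a complex sequence."""
    mean = sum(values) / len(values)
    return math.sqrt(sum(abs(v - mean) ** 2 for v in values) / len(values))

def clipping_threshold(samples, clipping_ratio_db):
    """Threshold magnitude for a given clipping ratio in dB.

    Assumption: clipping_ratio_db = 20 * log10(threshold / sigma).
    """
    return std(samples) * 10.0 ** (clipping_ratio_db / 20.0)

# Hypothetical zero-mean input with unit standard deviation.
samples = [1 + 0j, -1 + 0j, 1j, -1j]
threshold = clipping_threshold(samples, clipping_ratio_db=6.0)
print(round(threshold, 4))
```

Under this convention, a 6 dB clipping ratio places the threshold at roughly twice the standard deviation of the input signal.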
The upper and lower ACLR values are calculated as described in 3GPP TS 25.105 [Ref 2]. The 
upper ACLR is measured using the first adjacent channel to the right of the highest active 
carrier. The lower ACLR is measured using the first adjacent channel to the left of the highest 
active carrier. For cases where the highest active carrier is adjacent to another active carrier, 
and assuming the carriers have equal power, the lower ACLR should be close to 0 dB.
As was mentioned in “Cancellation Pulse Generator,” page 5, a trade-off exists between 
cancellation pulse length and frequency response characteristics. Longer pulse lengths can 
provide better spectral performance at the expense of increased EVM. Even when the filter 
length is held constant, a tradeoff exists between frequency-domain performance and time-
domain performance. For example, if Fpass = 0.6/Fs and Fstop = 1.0/Fs, then the upper ACLR in 