Signal Reconstruction using Sparse Tree Representations
Chinh La and Minh N. Do
Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
ABSTRACT
Recent studies in linear inverse problems have recognized that the sparse representation of an unknown signal in a
certain basis is useful and effective prior information for solving those problems.
In many multiscale bases
(e.g. wavelets), signals of interest (e.g. piecewise-smooth signals) not only have few significant coefficients, but
also those significant coefficients are well-organized in trees. We propose to exploit the tree-structured sparse
representation as additional prior information for linear inverse problems with limited numbers of measurements.
We present numerical results showing that exploiting sparse tree representations leads to better reconstructions
while requiring less computation time than methods that assume only sparse representations.
1. INTRODUCTION
Recently, there has been a series of remarkable works showing that one can effectively exploit the knowledge that
the unknown signal has a sparse representation in a certain basis, and use it as prior information for linear inverse
problems1–5 (see also6 for a related result in sampling). The information that an unknown signal can be well
approximated using only K coefficients indicates that perhaps the M signal samples can be reconstructed from
only K linear measurements (K ≪ M ). The main difficulty, however, lies in the fact that the K coefficients used
in the approximation must be chosen adaptively, i.e., by computing all M coefficients and keeping only the K most
significant ones; whereas the measurements in inverse problems are typically collected non-adaptively. Despite
this difficulty, it is shown in2, 3 that under certain conditions, with high probability, the sparse representation
can be reconstructed accurately using O(K log M ) non-adaptive linear measurements.
These new results could have great impact in a wide range of inverse problems, from remote sensing to medical
imaging, where due to physical constraints only a limited number of measurements can be acquired from the
unknown object. Note that these results rely only on the fact that the unknown object has a sparse representation
or fast coefficient decay in a certain transform domain.
Fig. 1 shows an example of sparse representations where only a few coefficients in the transform domain
are significant (i.e., have magnitude above a certain threshold). The power of the sparse representations is
demonstrated via accurate reconstruction of the original signal by keeping only the K most significant coefficients,
where K is much smaller than the number M of samples in the original signal. Furthermore, theoretical studies
show that the convergence rates for such nonlinear approximations are optimal for certain classes of signals, such
as the class of piecewise smooth signals.7
Examining the sparse representations in Fig. 1(c), we notice that not only do they contain few significant
coefficients, but also those significant coefficients are well-organized in trees. Significant wavelet coefficients
of piecewise-smooth signals or images are localized at discontinuity points.8
In fact, these embedded tree
structures for significant wavelet coefficients have been used successfully in compression,9 modeling,10, 11 and
approximation.12
We propose to exploit the tree-structured sparse representation as additional prior information for linear
inverse problems with limited numbers of measurements. Heuristically, a general sparse representation with K
coefficients can be specified with 2K numbers: K for the values of the significant coefficients and another K for
the locations of these coefficients. If the K significant coefficients are known to be organized in trees, then the
indexing cost, and hence the total description length, is significantly reduced. Exploiting this embedded tree
structure in addition to the sparse representation prior in inverse problems would potentially lead to: (1) better
reconstructed signals; (2) reconstruction using fewer measurements; and (3) faster reconstruction algorithms.
Figure 1. Examples of sparse representations: (a) original signal with M = 256 samples; (b) wavelet approximation
using K = 32 coefficients; (c) wavelet transform.
In this paper, we introduce a new pursuit algorithm that tracks significant coefficients along trees, and is thus
named Tree-based Orthogonal Matching Pursuit (TOMP). We also present experimental results showing that
TOMP runs faster and gives better reconstructions compared to existing algorithms such as Basis Pursuit13 and
Orthogonal Matching Pursuit.14
The organization of this paper is as follows. Section 2 formulates the inverse problem with sparse tree
representation. Section 3 reviews prior methods used for the inverse problem, namely Orthogonal Matching
Pursuit (OMP) and Basis Pursuit (BP). Our new method, named Tree-based Orthogonal Matching Pursuit
(TOMP), is introduced in Section 4. Section 5 presents numerical results from experiments employing the TOMP
algorithm. We then conclude the paper in Section 6.
2. PROBLEM FORMULATION
Suppose that there is an unknown length-M signal s of which we can make only a limited number K of linear
measurements. Normally, K is much smaller than M. Moreover, these measurements are non-adaptive, using
K measurement kernels m_i as follows:

    ⟨m_i, s⟩ = d_i,    i = 1, 2, ..., K,    (1)

where the d_i are the measurement data. The matrix M consisting of all the m_i as its rows is called the measurement
matrix. The matrix M has size K × M and is non-adaptive, i.e., independent of the signal s. Equation (1) can be
rewritten as

    Ms = d.    (2)

We suppose that s has a basis expansion

    s = Tc,    (3)

so solving for c will give s. Let N = MT; then our measurement equation in (2) becomes

    Nc = d.    (4)
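To make this setup concrete, the following is a minimal sketch in Python/NumPy (the sizes and the random
Gaussian kernels are illustrative assumptions; the measurement matrix is named Phi in the code to avoid clashing
with the signal length M):

    import numpy as np

    M_len, K = 256, 64                       # signal length M and number of measurements K
    rng = np.random.default_rng(0)

    s = rng.standard_normal(M_len)           # a stand-in for the unknown signal s
    Phi = rng.standard_normal((K, M_len))    # non-adaptive kernels m_i as rows; plays the role of M in (2)
    d = Phi @ s                              # measurements d_i = <m_i, s>, i.e., (1)-(2)

    T = np.eye(M_len)                        # placeholder basis; a wavelet matrix in the case of interest
    N = Phi @ T                              # so that N c = d as in (4), with s = T c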
[Figure 2: (a) a sparse signal x; (b) its tree structure, with 1 marking nonzero and 0 marking zero coefficients.]
Figure 2. The sparse and tree structures of an example wavelet coefficient set.
For the case of interest where T is a wavelet transform matrix, c can be divided into two parts as

    c = [l; x],    (5)

where l contains all the scaling coefficients, x contains all the wavelet coefficients of s, and [u; v] denotes vertical
stacking. We furthermore suppose that N can be constructed so that l and x are measured separately. This
scheme is called hybrid compressed sensing in.3, 4 Specifically, we suppose that (4) can be written as

    Nc = [I, 0; 0, A] [l; x] = [l; b] def= d.    (6)

Thus, our inverse problem becomes solving for the wavelet coefficient vector x from the equation

    Ax = b.    (7)
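The block structure of the hybrid scheme in (6) can be sketched as follows (sizes taken from the experiment in
Section 3.1; the random Gaussian block A is an assumption for illustration):

    import numpy as np

    len_l, len_x, K = 8, 248, 55             # lengths of l and x, and number of rows of A
    rng = np.random.default_rng(0)
    A = rng.standard_normal((K, len_x))      # sensing block for the wavelet coefficients

    # N has an identity block acting on l and the block A acting on x, as in (6):
    N = np.block([
        [np.eye(len_l),        np.zeros((len_l, len_x))],
        [np.zeros((K, len_l)), A],
    ])
    # For c = [l; x], N c = [l; A x] = [l; b] = d, so only A x = b remains to be solved.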
We recall that under the limited measurement condition, A has far fewer rows than columns; i.e., (7) is an
under-determined system. To solve this inverse problem, we notice that x has the following two special properties.
P1 Vector x has a sparse structure; i.e., only a few entries of x are nonzero or significant.
P2 Those significant entries of x are well organized in a tree structure (see below).
The tree structure of x is depicted in Figure 2(b). All wavelet coefficients in x are positioned in a tree where
levels (from top to bottom) are associated with the wavelet scales (from coarse to fine). In this paper, we use
the terms “father” and “child” to denote the obvious relationships among the nodes on the tree. In Figure 2(b),
nodes of value 1 stand for nonzero coefficients, and nodes of 0 stand for zero coefficients.
As can be seen from Figure 1(c), property P2 is due to the fact that nonzero coefficients correspond to
discontinuity points in the original signal s. From this, we notice that if a coefficient is nonzero or significant,
then its father and all of its ancestors are likely to be nonzero or significant as well. Exploiting property P2 is
the key point of our proposed algorithm, whereas existing algorithms such as BP and OMP employ only P1 to
solve the inverse problem (7).
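To make the father–ancestor bookkeeping concrete, the following sketch computes the family of a node, under
the simplifying assumption that the coefficients of a single tree are indexed breadth-first from the root at 1, so
that the father of node i is ⌊i/2⌋; this set reappears as F_i in Section 4:

    def father(i):
        # Father of node i in a breadth-first-indexed binary tree (root = 1).
        return i // 2

    def family(i):
        # F_i: node i together with all of its ancestors up to the root.
        F = []
        while i >= 1:
            F.append(i)
            i = father(i)
        return F

    print(family(13))   # [13, 6, 3, 1]: a fine-scale node and its coarser-scale ancestors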
3. EXISTING METHODS
3.1. Basis Pursuit (BP)
In the Basis Pursuit algorithm, which has been studied extensively in,1–4, 13 the solution of (7) is obtained by
solving the following l1-norm minimization problem:

    x̂_BP = arg min_x ||x||_1    subject to Ax = b.    (8)
As described in,8, 13 problem (8) can be converted into a linear programming problem, which can be imple-
mented easily, for example using the function linprog in Matlab.
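A minimal sketch of this conversion, using SciPy's linprog in place of Matlab's, is given below; it relies on the
standard split x = u − v with u, v ≥ 0, so that minimizing ||x||_1 becomes a linear program:

    import numpy as np
    from scipy.optimize import linprog

    def basis_pursuit(A, b):
        # Solve min ||x||_1 subject to A x = b by writing x = u - v with u, v >= 0
        # and minimizing sum(u) + sum(v) under the equality constraint A(u - v) = b.
        K, M = A.shape
        c = np.ones(2 * M)             # objective: 1'u + 1'v, which equals ||x||_1 at the optimum
        A_eq = np.hstack([A, -A])      # encodes A(u - v) = b
        res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None))
        u, v = res.x[:M], res.x[M:]
        return u - v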
Figures 3(a) and 3(c) show the reconstruction results for a piecewise second degree polynomial using the
BP algorithm. The original test signal has length 256 and two discontinuity points, one in the middle and
one due to the border effect∗. In this experiment, a 5-level Daubechies 3 wavelet decomposition is applied.
Therefore, in the wavelet domain, l has length 8 while x has length 248. The total number of measurements
equals 8 (the number of scaling coefficients) plus the number of rows of A. As can be seen, BP reconstruction
using 8 + 55 measurements is noisy, whereas BP reconstruction using 8 + 80 measurements,
about one third of the signal's length, provides good reconstruction with SNR = 33.99 dB.
An extensive study of BP reconstructions with a varying number of measurements is plotted in Figure 3(e).
The same wavelet decomposition as in the above experiment is used. The reconstruction SNRs are averaged over 10
randomly generated A's and different positions of the discontinuity points in the unknown signal s. A key observation
in4 suggests that an acceptable BP reconstruction can be obtained from a number of measurements equal to about
three times the number of nonzero entries in x. This matches the result in Figure 3(e), since the number of
nonzeros in x is observed to vary from 20 to 30.
3.2. Orthogonal Matching Pursuit (OMP)
As described in,5, 14 OMP is a greedy method for solving equation (7). In this method, the measurement matrix A
is considered as a dictionary with the columns a_i of A as atoms. In each iteration, OMP selects from the
dictionary the one atom that minimizes the difference, also called the residual, between b and its approximation.
Specifically, starting with r_0 = b, the OMP algorithm selects the k-th atom as

    λ_k = arg max_i |⟨r_{k−1}, a_i⟩|,    (9)

and it updates the residual as

    r_k = b − P_span{a_λ1, a_λ2, ..., a_λk} b.    (10)
Here, P_V denotes the orthogonal projection onto the subspace V. In parallel, the OMP algorithm applies
Gram-Schmidt orthogonalization to the chosen atoms for efficient computation of the projections.14 Ideally, the
number of iterations equals the number of nonzeros in x, at which point the residual becomes zero.
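For concreteness, a compact Python/NumPy sketch of OMP follows; as a simplification for clarity it computes
the projection in (10) with a least-squares solve rather than the incremental Gram-Schmidt update:

    import numpy as np

    def omp(A, b, n_iter):
        # Greedy atom selection per (9)-(10).
        residual = b.copy()
        selected = []
        for _ in range(n_iter):
            # (9): pick the atom most aligned with the current residual.
            k = int(np.argmax(np.abs(A.T @ residual)))
            selected.append(k)
            # (10): project b onto the span of the selected atoms.
            coef, *_ = np.linalg.lstsq(A[:, selected], b, rcond=None)
            residual = b - A[:, selected] @ coef
        x_hat = np.zeros(A.shape[1])
        x_hat[selected] = coef
        return x_hat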
Numerical experiments with OMP reconstructions, using the same setup as for the BP method described above,
are shown in Figure 3(b), (d), and (e). Through these experiments, we observe that with the same number of
measurements, OMP provides reconstructions comparable to those of BP, while requiring much less execution
time.
∗We use the periodic extension for all wavelet transforms in this paper.
[Figure 3: (a) BP reconstruction from 55 measurements (1.75 sec, SNR = 16.88 dB); (b) OMP reconstruction
from 55 measurements (0.469 sec, SNR = 16.07 dB); (c) BP reconstruction from 80 measurements (2.766 sec,
SNR = 33.99 dB); (d) OMP reconstruction from 80 measurements (0.422 sec, SNR = 35.71 dB); (e) average
SNR (dB) of BP (dashed) and OMP (solid) reconstructions versus the number of rows of A.]
Figure 3. Reconstructions of a piecewise second degree polynomial using BP and OMP with different numbers of measure-
ments (counting only the part for the 248 wavelet coefficients).
4. TREE-BASED ORTHOGONAL MATCHING PURSUIT
The BP and OMP methods described in Section 3 exploit only property P1 of the wavelet representation
x in the inverse problem Ax = b. We now motivate an improved version of OMP, named Tree-based Orthogonal
Matching Pursuit (TOMP), that additionally exploits property P2 of x.
The main idea of TOMP is to select from A the columns associated with tree nodes that correspond to
significant coefficients in x. Let Λ be the index set of all columns {a_i} of the matrix A, and let Λ_k be
the index set of all a_i selected from the beginning up to and including step k. At the k-th iteration, TOMP
selects a new set of columns (or atoms) of A via a two-stage selection process. In the first stage, like OMP,
TOMP examines the inner products of the current residual with the remaining atoms:

    c_i^(k) = |⟨r_{k−1}, a_i⟩|,    i ∈ Λ \ Λ_{k−1},    (11)
    c*^(k) = max{c_i^(k) : i ∈ Λ \ Λ_{k−1}}.    (12)

Based on these inner products, TOMP narrows the search down to a small set of potential candidates:

    S_k = {i : c_i^(k) ≥ α c*^(k)},    (13)

where α is a relaxation coefficient, 0 < α ≤ 1.
As mentioned in Section 2, due to property P2, if a_i is the next selected atom, then all of the ancestors
of a_i likely correspond to nonzero or significant entries in x, and thus should also be selected. We denote by F_i
the index collection of a_i and its ancestors.
In the second stage, TOMP selects from the candidate set S_k the atom that, together with its ancestors,
provides the maximal reduction of the residual:

    i_k = arg min_{i ∈ S_k} ||b − P_span{a_l : l ∈ Λ_{k−1} ∪ F_i} b||_2.    (14)

We then add the whole family F_{i_k} of the chosen node i_k to our collection Λ_k:

    Λ_k = Λ_{k−1} ∪ F_{i_k}.    (15)

The residual is updated by deducting the projection of b onto the selected family:

    r_k = b − P_span{a_l : l ∈ Λ_k} b.    (16)
Similar to OMP, in TOMP we apply a Gram-Schmidt orthogonalization step to each newly selected atom to
compute the projections efficiently.
Equations (14)-(16) mark the major differences between OMP and our proposed TOMP. Intuitively, the
OMP algorithm sequentially selects the next “good” atom based only on its alignment (i.e., large inner product)
with the current residual. In contrast, the TOMP algorithm sequentially selects the whole next “good” family
of atoms. Property P2 of x can be interpreted as saying that if an atom is “good” then it should have a “good
track record”, i.e., all of its ancestors should also be “good”. Thus, by relying on more information when making
decisions, TOMP is more robust than OMP in selecting the correct atoms. This is especially true when
there is only a limited number of measurements.
The iteration steps of TOMP are repeated until the number of selected atoms in Λ_k reaches a predefined
portion (e.g., half) of the number of rows of A. This is because we cannot expect to recover a signal with more
degrees of freedom than the number of measurements. Another stopping rule, for the noisy case, is ||r_k||_2^2 ≤ ε,
where ε is a limit selected according to the noise level in the measurement data. For example, when b is affected
by additive white Gaussian noise of variance σ^2, ε can be chosen as Nσ^2, the noise energy, where N is the
length of b. These choices of the stopping limits regularize the effects of noise on the reconstructed signal ŝ and
will be investigated in Section 5.
Summary of the TOMP algorithm
INPUT:
• An N × M measurement matrix A where N ≪ M
• A data vector b of length N
OUTPUT:
• A reconstruction vector x̂ of length M that solves Ax = b and has a sparse tree structure
PROCEDURE:
1. Initialize the residual r_0 = b, the index set Λ_0 = ∅, and the counter k = 1. Create the index set Λ of all
columns in A.
2. Find the columns of A that have large inner products with the residual:

    c_i^(k) = |⟨r_{k−1}, a_i⟩|,    i ∈ Λ \ Λ_{k−1},    (17)
    c*^(k) = max{c_i^(k) : i ∈ Λ \ Λ_{k−1}},    (18)
    S_k = {i : c_i^(k) ≥ α c*^(k)}.    (19)
3. For each i in S_k, form the family set F_i that consists of i and all of its ancestors.
4. Search among the candidate set S_k for the item that, together with its family, maximizes the reduction of
the residual:

    i_k = arg min_{i ∈ S_k} ||b − P_span{a_l : l ∈ Λ_{k−1} ∪ F_i} b||_2.    (20)

5. Set Λ_k = Λ_{k−1} ∪ F_{i_k}.
6. Update the residual:

    r_k = b − P_span{a_l : l ∈ Λ_k} b.    (21)
7. Compare ||r_k||_2^2 with a preselected limit ε, and compare the number of selected items in Λ_k with a preselected
limit ρN. If neither limit has been reached, increase k by one and return to step 2.
8. The locations of the nonzero coefficients of x̂ are listed in Λ_k. The values of those coefficients are given by
the expansion

    P_span{a_l : l ∈ Λ_k} b = Σ_{λ ∈ Λ_k} x̂_λ a_λ.    (22)
Finally, we put x̂ and the scaling vector l together, then apply the inverse wavelet transform to obtain a
reconstruction ŝ of the original signal s.
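A Python/NumPy sketch assembling steps 1-8 is given below. It assumes, as in the illustration in Section 2,
that column i of A corresponds to node i + 1 of a single breadth-first-indexed coefficient tree (a simplification:
a real wavelet decomposition contains several trees), and it uses least-squares solves in place of Gram-Schmidt:

    import numpy as np

    def family(i):
        # F_i: node i and all its ancestors (breadth-first indexing, root = 1).
        return [i] if i <= 1 else [i] + family(i // 2)

    def tomp(A, b, alpha=0.9, eps=1e-12, rho=0.5):
        N, M = A.shape
        residual, Lambda, coef = b.copy(), set(), np.zeros(0)
        while residual @ residual > eps and len(Lambda) < rho * N:
            # Step 2: inner products and the relaxed candidate set S_k, per (17)-(19).
            c = np.abs(A.T @ residual)
            c[list(Lambda)] = -np.inf                 # restrict to Lambda \ Lambda_{k-1}
            S = np.flatnonzero(c >= alpha * c.max())
            # Steps 3-4: candidate whose family best reduces the residual, per (20).
            best_err, best_fam = np.inf, None
            for i in S:
                fam = {j - 1 for j in family(i + 1)}  # F_i in 0-based column indices
                idx = sorted(Lambda | fam)
                cf, *_ = np.linalg.lstsq(A[:, idx], b, rcond=None)
                err = np.linalg.norm(b - A[:, idx] @ cf)
                if err < best_err:
                    best_err, best_fam = err, fam
            # Steps 5-6: add the whole family and update the residual, per (21).
            Lambda |= best_fam
            idx = sorted(Lambda)
            coef, *_ = np.linalg.lstsq(A[:, idx], b, rcond=None)
            residual = b - A[:, idx] @ coef
        # Step 8: scatter the expansion coefficients into x_hat, per (22).
        x_hat = np.zeros(M)
        x_hat[sorted(Lambda)] = coef
        return x_hat

In the noisy experiments of Section 5, the stopping limit eps would be set around Nσ^2 rather than near zero.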
[Figure 4: (a) original signal s; (b) reconstructed signal using OMP, SNR = 6.60 dB; (c) reconstructed signal
using TOMP, SNR = 60.00 dB; (d) original Haar wavelet coefficients x; (e) reconstructed wavelet coefficients
using OMP; (f) reconstructed wavelet coefficients using TOMP.]
Figure 4. Examples of reconstructions using OMP (orthogonal matching pursuit) and TOMP (tree-based orthogonal
matching pursuit). Both methods reconstruct the wavelet coefficients from 8 linear measurements.
5. EXPERIMENTAL RESULTS AND DISCUSSIONS
5.1. Experiments in no-noise cases
Throughout this subsection, there is no measurement noise and the stopping limit of OMP and TOMP is set to
ε ≈ 0. In the first experiment, we test the OMP and TOMP reconstruction algorithms on piecewise-constant
signals with the Haar wavelet transform. Figure 4(a) shows an example of a piecewise-constant signal of length
M = 32 with a discontinuity at sample k = 15. Figure 4(d) shows the signal in the wavelet domain, with K = 5
nonzero coefficients, which can be organized in a tree. Suppose that N randomly chosen linear measurements
of the signal are given. Figures 4(b) and (c) show example reconstructions with N = 8 random measurements.
In this case, OMP fails to reconstruct the original signal, while TOMP reconstructs it perfectly.
In the second experiment, OMP and TOMP are used to reconstruct a piecewise second degree polynomial
signal with two discontinuities. The wavelet transform is a 5-level decomposition with Daubechies 3 wavelets. The
original signal of length 256 is reconstructed from a small number of measurements, only 40 for the 248
wavelet coefficients. As Figures 5(b) and 5(c) show, TOMP captures the essential structure of the wavelet
representation, while OMP fails. This leads to a reconstruction SNR gain of about 9 dB for TOMP over
OMP.
5.2. Experiments in noisy cases
In this subsection, signals are reconstructed from noisy data using the TOMP and OMP algorithms. White Gaussian
noise is added to the data b at SNR = 10 dB. The stopping limit ε is selected in relation to the noise energy
E = Nσ^2, where N is the length of b and σ^2 is the noise variance. In our experiments, the values used for ε are
0.5E, E, and 2E.
Figure 6 shows reconstruction results for a piecewise 3rd degree polynomial from noisy data using TOMP
and OMP. Since the wavelet decomposition is a 5-level Daubechies 4 decomposition, which has higher order than the signal, the