Message Passing Algorithms for Compressed Sensing: I. Motivation and Construction

David L. Donoho, Department of Statistics, Stanford University
Arian Maleki, Department of Electrical Engineering, Stanford University
Andrea Montanari, Department of Electrical Engineering and Department of Statistics, Stanford University

Abstract—In a recent paper, the authors proposed a new class of low-complexity iterative thresholding algorithms for reconstructing sparse signals from a small set of linear measurements [1]. The new algorithms are broadly referred to as AMP, for approximate message passing. This is the first of two conference papers describing the derivation of these algorithms, their connection with the related literature, extensions of the original framework, and new empirical evidence. In particular, the present paper outlines the derivation of AMP from standard sum-product belief propagation, and its extensions in several directions. We also discuss relations with formal calculations based on statistical mechanics methods.

I. INTRODUCTION

Let s0 be a vector in R^N. We observe n < N linear measurements of this vector through the matrix A, y = A s0. The goal is to recover s0 from (y, A). Although the system of equations is underdetermined, the underlying signal can still be recovered if it is 'simple' or 'structured' in an appropriate sense. A specific notion of 'simplicity' postulates that s is exactly or approximately sparse.

The ℓ1 minimization, also known as basis pursuit [2], has attracted attention for its success in solving such underdetermined systems. It consists in solving the following optimization problem:

  minimize \|s\|_1, subject to As = y.   (1)

The solution of this problem can be obtained through standard linear programming (LP) algorithms. While LP has polynomial complexity, generic LP solvers are too complex for use in large-scale applications, such as magnetic resonance imaging and seismic data analysis. The low computational complexity of iterative thresholding algorithms has made them an appealing choice for such applications. Many variations of these iterative thresholding approaches have been proposed; the interested reader is referred to [3] for a survey and detailed comparison. The final conclusion of that paper is rather disappointing: optimally tuned iterative thresholding algorithms have a significantly worse sparsity-undersampling tradeoff than basis pursuit.

Recently [1], we proposed an algorithm that appears to offer the best of both worlds: the low complexity of iterative thresholding algorithms and the reconstruction power of basis pursuit. This algorithm is in fact an instance of a broader family of algorithms, that was called AMP, for approximate message passing, in [1]. The goal of this paper is to justify AMP by applying sum-product belief propagation to a suitable joint distribution over the variables s1, s2, ..., sN.

The paper is organized as follows. In Section II we explain the notations used in this paper. We then derive the AMP algorithm associated with the basis pursuit problem in Section III. In Section IV, we consider the AMP for the basis pursuit denoising (BPDN) or Lasso problem. In Section V we generalize the algorithm to the Bayesian setting where the distribution of the elements of s0 is known. Finally, in Section VI we explain the connection with formal calculations based on non-rigorous statistical mechanics methods. Due to space limitations, proofs are omitted and can be found in a longer version of this paper [4].
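For concreteness, problem (1) can be handed to a generic LP solver after the standard lifting |s_i| ≤ u_i. The sketch below is illustrative only (our variable names, SciPy's linprog, a small random instance); it is exactly the kind of generic solver whose cost motivates AMP at larger scales.

```python
# Sketch: basis pursuit (1) as a linear program, solved with SciPy.
# Illustrative code under our own naming, not the authors' implementation.
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """min ||s||_1 s.t. A s = y, via the lifting |s_i| <= u_i."""
    n, N = A.shape
    c = np.concatenate([np.zeros(N), np.ones(N)])   # minimize sum(u)
    I = np.eye(N)
    A_ub = np.block([[I, -I], [-I, -I]])            # s - u <= 0, -s - u <= 0
    b_ub = np.zeros(2 * N)
    A_eq = np.hstack([A, np.zeros((n, N))])         # A s = y
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=y,
                  bounds=[(None, None)] * N + [(0, None)] * N)
    return res.x[:N]                                # assumes the LP succeeded

rng = np.random.default_rng(0)
n, N, k = 50, 100, 5
A = rng.choice([1.0, -1.0], size=(n, N)) / np.sqrt(n)   # entries +-1/sqrt(n)
s0 = np.zeros(N); s0[rng.choice(N, k, replace=False)] = rng.normal(size=k)
y = A @ s0
print(np.max(np.abs(basis_pursuit(A, y) - s0)))   # ~0 when recovery succeeds
```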
II. NOTATIONS

The letters a, b, c, ... denote indices in [n] ≡ {1, ..., n} and i, j, k, ... denote indices in [N] ≡ {1, ..., N}. The elements of the vectors y, s, x, and s0 are indicated as y_a, s_i, x_i, and s_{0,i} respectively. The (a, i) element of the matrix A will be indicated as A_{ai}. The ratio δ = n/N is a measure of the indeterminacy of the system of equations. Whenever we refer to the large system limit we consider the case where N, n → ∞ with δ fixed. In this limit the typical entry of A should scale as 1/√n. For the sake of simplicity, in the derivation we assume the concrete case where A_{ai} ∈ {+1/√n, −1/√n}. This assumption is not crucial, and only simplifies the calculations. Although the algorithms are developed from the large system limit, in practice they perform well even in medium size problems with thousands of variables and 'just' hundreds of measurements [5].
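The following snippet (our own setup code, not the authors') instantiates these conventions: a matrix with i.i.d. entries A_{ai} ∈ {+1/√n, −1/√n}, a k-sparse signal s0, and δ = n/N fixed. Later snippets assume instances of this form.

```python
# Illustrative problem instance following the conventions of Section II.
import numpy as np

rng = np.random.default_rng(42)
delta = 0.5                      # undersampling ratio delta = n/N
N = 2000
n = int(delta * N)
k = 100                          # number of nonzeros in s0

A = rng.choice([1.0, -1.0], size=(n, N)) / np.sqrt(n)   # A_ai in {+-1/sqrt(n)}
s0 = np.zeros(N)
support = rng.choice(N, size=k, replace=False)
s0[support] = rng.normal(size=k)
y = A @ s0                       # n < N linear measurements

print(A.shape, np.linalg.norm(A[:, 0]))   # columns have unit norm by construction
```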
III. AMP FOR THE BASIS PURSUIT

In this section we consider the basis pursuit problem as defined in Eq. (1). The derivation of AMP proceeds in 4 steps:

(1) Construct a joint distribution over (s_1, ..., s_N), parameterized by β ∈ R_+, associated with the problem of interest, and write down the corresponding sum-product belief propagation algorithm.
(2) Show, by a central limit theorem argument, that in the large system limit the sum-product messages can be well approximated by families with two scalar parameters. Derive the update rules for these parameters.
(3) Take the limit β → ∞ and get the appropriate rules for the basis pursuit problem.
(4) Approximate the message passing rules in the large system limit. The resulting algorithm is AMP.

A. Construction of the graphical model

We consider the following joint probability distribution over the variables s_1, s_2, ..., s_N:

  \mu(ds) = \frac{1}{Z} \prod_{i=1}^{N} \exp(-\beta |s_i|) \prod_{a=1}^{n} \delta_{\{y_a = (As)_a\}},   (2)

Here δ_{\{y_a=(As)_a\}} denotes a Dirac distribution on the hyperplane y_a = (As)_a. Products of such distributions associated with distinct hyperplanes yield a well defined measure. As we let β → ∞, the mass of μ concentrates around the solution of (1). If the minimizer is unique and we have access to the marginals of μ, we can therefore solve (1). Belief propagation provides low-complexity heuristics for approximating such marginals.

In order to introduce belief propagation, consider the factor graph G = (V, F, E) with variable nodes V = [N], factor nodes F = [n], and edges E = [N] × [n] = {(i, a) : i ∈ [N], a ∈ [n]}. Hence G is the complete bipartite graph with N variable nodes and n factor nodes. It is clear that the joint distribution (2) is structured according to this factor graph. Associated with the edges of this graph are the belief propagation messages {ν_{i→a}}_{i∈V, a∈F} and {ν̂_{a→i}}_{i∈V, a∈F}. In the present case, messages are probability measures over the real line. The update rules for these densities are

  \nu^{t+1}_{i\to a}(s_i) \cong e^{-\beta |s_i|} \prod_{b\neq a} \hat\nu^{t}_{b\to i}(s_i),   (3)

  \hat\nu^{t}_{a\to i}(s_i) \cong \int \prod_{j\neq i} \nu^{t}_{j\to a}(s_j)\, \delta_{\{y_a = (As)_a\}}\, ds,   (4)

where the superscript t denotes the iteration number and the symbol ≅ denotes identity up to a normalization constant¹.

B. Large system limit

A key remark is that in the large system limit, the messages ν̂^t_{a→i}(·) are accurately approximated by Gaussian densities with variances of order N, and the messages ν^t_{i→a}(·) are approximately the product of a Gaussian and a Laplace density. We state this fact formally below. Recall that, given two measures μ1 and μ2 over R, their Kolmogorov distance is ‖μ1 − μ2‖_K ≡ sup_{a∈R} |μ1(−∞, a] − μ2(−∞, a]|.

The first lemma is an estimate of the distance of the messages ν̂^t_{a→i} from a Gaussian.

Lemma III.1. Let x^t_{j→a} and (τ^t_{j→a}/β) be, respectively, the mean and the variance of the distribution ν^t_{j→a}. Assume further that ∫ |s_j|³ dν^t_{j→a}(s_j) ≤ C_t uniformly in N, n. Then there exists a constant C'_t such that

  \|\hat\nu^{t}_{a\to i} - \hat\phi^{t}_{a\to i}\|_K \le \frac{C'_t}{N^{1/2}\,(\hat\tau^{t}_{a\to i})^{3}},

where φ̂^t_{a→i} is a Gaussian density,

  \hat\phi^{t}_{a\to i}(s_i) \cong \exp\Big\{-\frac{\beta}{2\hat\tau^{t}_{a\to i}}\,(s_i - z^{t}_{a\to i})^2\Big\},   (5)

where the distribution parameters are given by

  z^{t}_{a\to i} \equiv y_a - \sum_{j\neq i} A_{aj}\, x^{t}_{j\to a}, \qquad \hat\tau^{t}_{a\to i} \equiv \sum_{j\neq i} A_{aj}^2\, \tau^{t}_{j\to a}.   (6)

Motivated by this lemma, we consider the computation of the means and the variances of the messages ν^{t+1}_{i→a}(s_i). It is convenient to introduce a family of densities

  f_\beta(s; x, b) \equiv \frac{1}{z_\beta(x, b)} \exp\Big\{-\beta |s| - \frac{\beta}{2b}\,(s - x)^2\Big\}.   (7)

Also let F_β and G_β denote its mean and variance, i.e.,

  F_\beta(x; b) \equiv \mathbb{E}_{f_\beta(\cdot;\, x, b)}(Z), \qquad G_\beta(x; b) \equiv \mathrm{Var}_{f_\beta(\cdot;\, x, b)}(Z).   (8)

From Eq. (6), we expect τ̂^t_{a→i} to concentrate tightly. Therefore we assume that it is independent of the edge (i, a) and write τ̂^t for its common value.

Lemma III.2. Suppose that at iteration t, the messages from the factor nodes to the variable nodes are ν̂^t_{b→i} = φ̂^t_{b→i}, with φ̂^t_{b→i} defined as in Eq. (5) with parameters z^t_{b→i} and τ̂^t_{b→i} = τ̂^t. Then at the next iteration we have

  \nu^{t+1}_{i\to a}(s_i) = \phi^{t+1}_{i\to a}(s_i)\,\{1 + O(s_i^2/n)\}, \qquad \phi^{t+1}_{i\to a}(s_i) \equiv f_\beta\Big(s_i;\, \sum_{b\neq a} A_{bi}\, z^{t}_{b\to i},\, \hat\tau^{t}\Big).

The mean and the variance of these messages are given by

  x^{t+1}_{i\to a} = F_\beta\Big(\sum_{b\neq a} A_{bi}\, z^{t}_{b\to i};\, \hat\tau^{t}\Big), \qquad \tau^{t+1}_{i\to a} = \beta\, G_\beta\Big(\sum_{b\neq a} A_{bi}\, z^{t}_{b\to i};\, \hat\tau^{t}\Big).
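At finite β, the functions F_β and G_β of Eq. (8) are one-dimensional integrals and are easy to evaluate numerically. A minimal sketch (ours; the grid width and resolution are arbitrary choices):

```python
# Numerical evaluation of F_beta(x;b) and G_beta(x;b), the mean and variance
# of the density (7): f_beta(s; x, b) ∝ exp(-beta|s| - beta/(2b)(s-x)^2).
# Illustrative code, not from the paper.
import numpy as np

def moments(x, b, beta, num=20001):
    """Mean and variance of f_beta(.; x, b) by quadrature on a grid."""
    s = np.linspace(min(0.0, x) - 5.0, max(0.0, x) + 5.0, num)
    ds = s[1] - s[0]
    logw = -beta * np.abs(s) - beta / (2.0 * b) * (s - x) ** 2
    w = np.exp(logw - logw.max())        # stabilize before exponentiating
    w /= w.sum() * ds                    # normalization plays the role of z_beta
    mean = (s * w).sum() * ds
    var = ((s - mean) ** 2 * w).sum() * ds
    return mean, var

F, G = moments(x=1.5, b=1.0, beta=50.0)
print(F, G)   # F is already close to the soft threshold value 0.5 (Lemma III.3)
```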
C. Large β limit

In the limit β → ∞, we can simplify the functions F_β and G_β. Consider the soft thresholding function η(x; b) = sign(x)(|x| − b)_+. It is well known that this admits the alternative characterization

  \eta(x; b) = \arg\min_{s\in\mathbb{R}} \Big\{ |s| + \frac{1}{2b}\,(s - x)^2 \Big\}.   (9)

In the β → ∞ limit, the integral that defines F_β(x; b) is dominated by the maximum value of the exponent, which corresponds to s* = η(x; b), and therefore F_β(x; b) ≈ η(x; b). The variance (and hence the function G_β(x; b)) can be estimated by approximating f_β(s; x, b) near s*. Two cases can occur. If s* ≠ 0, then a Gaussian approximation holds and G_β(x; b) = Θ(1/β). On the other hand, if s* = 0, the density f_β(s; x, b) can be approximated by a Laplace distribution, leading to G_β(x; b) = Θ(1/β²) (which is negligible). We summarize this discussion in the following.

Lemma III.3. For bounded x, b, we have

  \lim_{\beta\to\infty} F_\beta(x; b) = \eta(x; b), \qquad \lim_{\beta\to\infty} \beta\, G_\beta(x; b) = b\, \eta'(x; b).

¹More precisely, given two non-negative functions p, q : Ω → R over the same space, we write p(s) ≅ q(s) if there exists a positive constant a such that p(s) = a q(s) for every s ∈ Ω.
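The closed form η(x; b) = sign(x)(|x| − b)_+ and the variational characterization (9) are easy to check against each other; the sketch below (illustrative, using SciPy's bounded scalar minimizer) does so on a few points.

```python
# Soft thresholding eta(x;b) and a numerical check of Eq. (9).
# Illustrative code, not from the paper.
import numpy as np
from scipy.optimize import minimize_scalar

def eta(x, b):
    return np.sign(x) * np.maximum(np.abs(x) - b, 0.0)

def eta_argmin(x, b):
    """eta(x;b) as the minimizer of |s| + (1/(2b))(s - x)^2, cf. Eq. (9)."""
    obj = lambda s: abs(s) + (s - x) ** 2 / (2.0 * b)
    return minimize_scalar(obj, bounds=(-abs(x) - 1.0, abs(x) + 1.0),
                           method='bounded').x

for x in (-2.0, -0.3, 0.0, 0.7, 1.5):
    # the two columns agree up to the solver's tolerance
    print(x, eta(x, 1.0), eta_argmin(x, 1.0))
```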
Lemmas III.1, III.2, and III.3 suggest the following equivalent form for the message passing algorithm (for large β):

  x^{t+1}_{i\to a} = \eta\Big(\sum_{b\neq a} A_{bi}\, z^{t}_{b\to i};\, \hat\tau^{t}\Big),   (10)

  z^{t}_{a\to i} \equiv y_a - \sum_{j\neq i} A_{aj}\, x^{t}_{j\to a},   (11)

  \hat\tau^{t+1} = \frac{\hat\tau^{t}}{N\delta} \sum_{i=1}^{N} \eta'\Big(\sum_{b} A_{bi}\, z^{t}_{b\to i};\, \hat\tau^{t}\Big).   (12)

D. From message passing to AMP

The updates in Eqs. (10)-(12) are easy to implement, but the overall algorithm is still rather complex because it requires tracking 2nN messages. The goal of this section is to further simplify the update equations. In order to justify the approximation, we assume that the messages can be approximated as

  x^{t}_{i\to a} = x^{t}_{i} + \delta x^{t}_{i\to a} + O(1/N), \qquad z^{t}_{a\to i} = z^{t}_{a} + \delta z^{t}_{a\to i} + O(1/N),

with δx^t_{i→a}, δz^t_{a→i} = O(1/√N) (here the O(·) errors are assumed uniform in the choice of the edge). We also consider a general message passing algorithm of the form

  x^{t+1}_{i\to a} = \eta_t\Big(\sum_{b\neq a} A_{bi}\, z^{t}_{b\to i}\Big), \qquad z^{t}_{a\to i} \equiv y_a - \sum_{j\neq i} A_{aj}\, x^{t}_{j\to a},   (13)

with {η_t(·)}_{t∈N} a sequence of differentiable nonlinear functions with bounded derivatives. Notice that the algorithm derived at the end of the previous section, cf. Eqs. (10)-(12), is of this form, albeit with η_t non-differentiable at 2 points. But this does not change the result, as long as the nonlinear functions are Lipschitz continuous. In the interest of simplicity, we just discuss the differentiable model.

Lemma III.4. Suppose that the asymptotic behavior described in the paragraph above holds for the message passing algorithm (13). Then x^t_i and z^t_a satisfy the following equations:

  x^{t+1}_{i} = \eta_t\Big(\sum_{a} A_{ai}\, z^{t}_{a} + x^{t}_{i}\Big) + o_N(1),

  z^{t}_{a} = y_a - \sum_{j} A_{aj}\, x^{t}_{j} + \frac{1}{\delta}\, z^{t-1}_{a}\, \langle \eta'_{t-1}(A^* z^{t-1} + x^{t-1}) \rangle + o_N(1),

where the o_N(1) terms vanish as N, n → ∞.

As a consequence, the resulting algorithm can be written in the vector notation as follows:

  x^{t+1} = \eta(A^* z^{t} + x^{t};\, \hat\tau^{t}),   (14)

  z^{t} = y - A x^{t} + \frac{1}{\delta}\, z^{t-1}\, \langle \eta'(A^* z^{t-1} + x^{t-1};\, \hat\tau^{t-1}) \rangle,   (15)

where ⟨·⟩ denotes the average of the entries of a vector. We also get the following recursion for τ̂:

  \hat\tau^{t} = \frac{\hat\tau^{t-1}}{\delta}\, \langle \eta'(A^* z^{t-1} + x^{t-1};\, \hat\tau^{t-1}) \rangle.   (16)

E. Comments

Threshold level. The derivation presented here yields a 'parameter free' algorithm: the threshold level τ̂^t is updated by the recursion in Eq. (16). One could take the alternative point of view that τ̂^t is a parameter to be optimized. This point of view was adopted in [1], [5]. It is expected that the two points of view coincide in the large system limit, but it might be advantageous to consider a general sequence of thresholds.

Mathematical derivation of AMP. We showed that in a specific limit (large systems, and large β) the sum-product update rules can be significantly simplified to (14), (15). We should emphasize that our results concern just a single step of the iterative procedure. As such, they do not prove that the sum-product messages are accurately tracked by Eqs. (14), (15). In principle, it could be that the error terms in our approximation, while negligible at each step, conjure up to become large after a finite number of iterations. We do not expect this to be the case, but it is nevertheless an open mathematical problem.
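Eqs. (14)-(16) translate almost line-for-line into code. The sketch below is a minimal illustrative implementation (our naming; the initialization τ̂⁰ = 1 and the iteration count are arbitrary, untuned choices), with η' taken as the indicator {|x| > b}, its almost-everywhere derivative.

```python
# Minimal AMP for basis pursuit, Eqs. (14)-(16). Our own illustrative sketch.
import numpy as np

def eta(x, b):                        # soft threshold
    return np.sign(x) * np.maximum(np.abs(x) - b, 0.0)

def eta_prime(x, b):                  # a.e. derivative w.r.t. first argument
    return (np.abs(x) > b).astype(float)

def amp_basis_pursuit(A, y, iters=30):
    n, N = A.shape
    delta = n / N
    x, z, tau = np.zeros(N), y.copy(), 1.0     # tau^0 = 1 is an arbitrary choice
    for _ in range(iters):
        pseudo = A.T @ z + x                   # A* z^t + x^t
        g = eta_prime(pseudo, tau).mean()      # <eta'(A* z^t + x^t; tau^t)>
        x = eta(pseudo, tau)                   # Eq. (14)
        z = y - A @ x + z * g / delta          # Eq. (15), with Onsager term
        tau = tau * g / delta                  # Eq. (16)
    return x

# Demo on a random instance well inside the l1 phase transition.
rng = np.random.default_rng(0)
n, N, k = 250, 500, 25
A = rng.choice([1.0, -1.0], size=(n, N)) / np.sqrt(n)
s0 = np.zeros(N); s0[rng.choice(N, k, replace=False)] = rng.normal(size=k)
y = A @ s0
x_hat = amp_basis_pursuit(A, y)
print(np.linalg.norm(x_hat - s0) / np.linalg.norm(s0))   # small when AMP converges
```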
IV. AMP FOR BPDN/LASSO

Another popular reconstruction procedure in compressed sensing is the following problem:

  \text{minimize } \lambda \|s\|_1 + \frac{1}{2}\,\|y - As\|_2^2.   (17)

The derivation of the corresponding AMP is similar to the one in the previous section. We therefore limit ourselves to mentioning a few differences. As before, we define a joint density distribution on the variables s = (s_1, ..., s_N):

  \mu(ds) = \frac{1}{Z} \prod_{i=1}^{N} \exp(-\beta\lambda |s_i|) \prod_{a=1}^{n} \exp\Big\{-\frac{\beta}{2}\,\big(y_a - (As)_a\big)^2\Big\}\, ds.

The mode of this distribution coincides with the solution of the problem (17), and the distribution concentrates on its mode as β → ∞. The sum-product algorithm is

  \nu^{t+1}_{i\to a}(s_i) \cong \exp(-\beta\lambda |s_i|) \prod_{b\neq a} \hat\nu^{t}_{b\to i}(s_i),

  \hat\nu^{t}_{a\to i}(s_i) \cong \int \exp\Big\{-\frac{\beta}{2}\,\big(y_a - (As)_a\big)^2\Big\} \prod_{j\neq i} d\nu^{t}_{j\to a}(s_j).

Proceeding as above, we derive an asymptotically equivalent form of belief propagation for N → ∞ and β → ∞. We get the following algorithm in the vector notation:

  x^{t+1} = \eta(x^{t} + A^* z^{t};\, \lambda + \gamma^{t}),   (18)

  z^{t} = y - A x^{t} + \frac{1}{\delta}\, z^{t-1}\, \langle \eta'(x^{t-1} + A^* z^{t-1};\, \lambda + \gamma^{t-1}) \rangle,   (19)

which generalize Eqs. (14) and (15). The threshold level is computed iteratively as follows:

  \gamma^{t+1} = \frac{\lambda + \gamma^{t}}{\delta}\, \langle \eta'(A^* z^{t} + x^{t};\, \gamma^{t} + \lambda) \rangle.   (20)

Notice that the only deviation from the algorithm in the previous section is in the recursion for the threshold level.
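Relative to the basis pursuit sketch above, only the thresholds change: λ + γ^t inside η and η', and the recursion (20) in place of (16). A minimal illustrative variant (again with our naming and an arbitrary initialization γ⁰ = 1):

```python
# AMP iteration for BPDN/Lasso, Eqs. (18)-(20). Our own illustrative sketch;
# usage mirrors the basis pursuit demo above.
import numpy as np

def eta(x, b):
    return np.sign(x) * np.maximum(np.abs(x) - b, 0.0)

def amp_bpdn(A, y, lam, iters=30):
    n, N = A.shape
    delta = n / N
    x, z, gamma = np.zeros(N), y.copy(), 1.0        # gamma^0 = 1, our choice
    for _ in range(iters):
        pseudo = x + A.T @ z                        # x^t + A* z^t
        g = (np.abs(pseudo) > lam + gamma).mean()   # <eta'(.; lam + gamma^t)>
        x = eta(pseudo, lam + gamma)                # Eq. (18)
        z = y - A @ x + z * g / delta               # Eq. (19), Onsager correction
        gamma = (lam + gamma) * g / delta           # Eq. (20)
    return x
```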
V. AMP FOR RECONSTRUCTION WITH PRIOR INFORMATION

In the two cases discussed so far, the distribution of the signal s0 was not known; this is a very natural assumption. Nevertheless, in specific scenarios it might be possible to estimate the input distribution. This extra information may be used to improve the recovery algorithms. Also, the case of known signal distribution provides a benchmark for the other approaches. In this section we define a very simple and practical iterative thresholding algorithm for these situations.

Let a = a_1 × a_2 × ... × a_N be the joint probability distribution of the variables s_1, s_2, ..., s_N. It is then natural to consider the distribution

  \mu(ds) = \frac{1}{Z} \prod_{a=1}^{n} \exp\Big\{-\frac{1}{2}\,\big(y_a - (As)_a\big)^2\Big\} \prod_{i=1}^{N} a_i(ds_i),

since μ is the a posteriori distribution of s when y = As + w is observed. Here, w is a noise vector with i.i.d. normal entries, independent of s. The sum-product update rules are

  \nu^{t+1}_{i\to a}(ds_i) \cong \prod_{b\neq a} \hat\nu^{t}_{b\to i}(s_i)\, a_i(ds_i),

  \hat\nu^{t}_{a\to i}(s_i) \cong \int \exp\Big\{-\frac{1}{2}\,\big(y_a - (As)_a\big)^2\Big\} \prod_{j\neq i} d\nu^{t}_{j\to a}(s_j).

At each iteration t, the message ν^{t+1}_{i→a}(ds_i) is a probability measure on R, and the first equation gives its density with respect to a_i. The message ν̂^t_{a→i}(s_i) is instead a non-negative measurable function (equivalently, a density). Notice that the above update rules are well defined. Clearly the case studied in Section III corresponds to a_i ≅ exp(−β|s_i|).

In order to derive the simplified version of the message passing algorithm, we introduce the following family of measures over R:

  f_i(ds; x, b) \equiv \frac{1}{\hat z_i(x, b)} \exp\Big\{-\frac{\beta}{2b}\,(s - x)^2\Big\}\, a_i(ds),   (21)

indexed by i ∈ [N], x ∈ R, b ∈ R_+ (β is fixed here). The mean and the variance of this distribution define the functions (here Z ∼ f_i(· ; x, b))

  F_i(x; b) \equiv \mathbb{E}_{f_i(\cdot;\, x, b)}(Z), \qquad G_i(x; b) \equiv \mathrm{Var}_{f_i(\cdot;\, x, b)}(Z).   (22)

These functions have a natural estimation-theoretic interpretation. Let X_i be a random variable with distribution a_i, and assume that Y_i = X_i + W_i is observed, with W_i Gaussian noise with variance b/β. The above functions are, respectively, the conditional expectation and the conditional variance of X_i, given that Y_i = x:

  F_i(x; b) = \mathbb{E}(X_i \mid Y_i = x), \qquad G_i(x; b) = \mathrm{Var}(X_i \mid Y_i = x).

Proceeding as in Section III yields the following AMP (in vector notation):

  x^{t+1} = F(A^* z^{t} + x^{t};\, \gamma^{t}),   (23)

  z^{t} = y - A x^{t} + \frac{1}{\delta}\, z^{t-1}\, \langle F'(A^* z^{t-1} + x^{t-1};\, \gamma^{t-1}) \rangle.   (24)

Here, if x ∈ R^N, F(x; b) ∈ R^N is the vector F(x; b) = (F_1(x_1; b), F_2(x_2; b), ..., F_N(x_N; b)). Analogously, F'(x; b) = (F'_1(x_1; b), F'_2(x_2; b), ..., F'_N(x_N; b)) (the derivative being taken with respect to the first argument). Finally, the threshold level is computed iteratively as follows:

  \gamma^{t+1} = \frac{1}{\delta}\, \langle G(A^* z^{t} + x^{t};\, \gamma^{t}) \rangle.   (25)

A. Comments

The AMP algorithm described in this section is only marginally more complex than the ones in the previous sections. The main difference is that the soft thresholding function η(·) is replaced with the conditional expectation F(·). While the latter does not admit, in general, a closed form expression, it is not hard to construct accurate approximations that are easy to evaluate.
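For a concrete prior, the functions F_i and G_i of Eq. (22) are explicit. The sketch below is our own example, not from the paper: it assumes a Bernoulli-Gaussian prior a_i = (1 − ε)δ_0 + ε N(0, 1) and an effective observation Y = X + W with W ∼ N(0, b), for which the posterior is a reweighted spike-and-slab.

```python
# Conditional mean/variance denoiser, cf. Eq. (22), for a Bernoulli-Gaussian
# prior a_i = (1 - eps)*delta_0 + eps*N(0,1), with Y = X + W, W ~ N(0, b).
# This specific prior is our illustrative assumption.
import numpy as np

def gauss(x, var):
    return np.exp(-x * x / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

def slab_weight(x, b, eps):
    """Posterior probability that X was drawn from the Gaussian slab."""
    num = eps * gauss(x, 1.0 + b)
    return num / (num + (1.0 - eps) * gauss(x, b))

def F(x, b, eps):
    """E[X | Y = x]: slab weight times the slab's posterior mean x/(1+b)."""
    return slab_weight(x, b, eps) * x / (1.0 + b)

def G(x, b, eps):
    """Var(X | Y = x) for the same spike-and-slab posterior."""
    pi = slab_weight(x, b, eps)
    m = pi * x / (1.0 + b)
    return pi * (b / (1.0 + b) + (x / (1.0 + b)) ** 2) - m ** 2

xs = np.linspace(-4, 4, 9)
print(F(xs, 0.5, 0.1))   # shrinks small observations toward 0, like eta but smooth
```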
VI. RELATED WORK

In this section we would like to clarify the relation of the present approach with earlier lines of work and results in the literature. Each of these evolved from different motivations, and there was so far little -if any- contact between them.

A. Other message passing algorithms

The use of message passing algorithms for compressed sensing problems was suggested before; see for instance [6]. However, such proposals faced two major difficulties.

(1) According to the standard prescription, the messages used in the sum-product algorithm should be probability measures over the real line R, cf. Eqs. (3), (4). This is impractical from a computational point of view. (A low-complexity message-passing algorithm was used in [7] for a related problem.)

(2) The factor graph on which the sum-product algorithm is run is the complete bipartite graph with N variable nodes and n factor nodes. In other words, unless the underlying matrix is sparse [8], the graphical model is very dense. This requires updating Nn messages per iteration, and each message update depends on N or n input messages. Again, this is very expensive computationally.

The previous pages show that problem (2) does not add to (1), but in fact solves it! Indeed, the high density of the graph leads, via the central limit theorem, to approximately Gaussian messages from the factor nodes to the variable nodes. Gaussian messages are in turn parametrized by two numbers: mean and variance. It turns out that it is sufficient to keep track only of the means, again because of the high density. Problem (2) is also solved by the high density nature of the graph, since all the messages departing from the same node of the graph are very similar to each other.

One last key difficulty with the use of belief propagation in compressed sensing was the following.

(3) The use of belief propagation requires defining a prior on the vector s0. For most applications, no good prior is available.

The solution of this problem lies in using a Laplace prior, as in Eq. (2). A first justification of this choice lies in the fact that, as β → ∞, the resulting probability measure concentrates around its mode, that is, the solution of the basis pursuit problem (1). A deeper reason for this choice is that the resulting non-linearity, the soft threshold η(x; θ), is step-by-step optimal in a minimax sense [1], [5].
B. Statistical physics background

There is a well studied connection between the sum-product algorithm and statistical physics techniques [9]. In particular, the sum-product algorithm corresponds to the Bethe-Peierls approximation in statistical physics, and its fixed points are stationary points of the Bethe free energy. In the context of spin glass theory, the Bethe-Peierls approximation is also referred to as the 'replica symmetric cavity method'.

The Bethe-Peierls approximation postulates a set of non-linear equations on quantities that are equivalent to the sum-product messages, and which are in correspondence with local marginals. In the special case of spin glasses on the complete graph (the celebrated Sherrington-Kirkpatrick model), these equations reduce to the so-called TAP equations, named after Thouless, Anderson and Palmer who first used them [10]. The original TAP equations were a set of non-linear equations for local magnetizations (i.e. expectations of a single variable). Thouless, Anderson and Palmer first recognized that the naive mean field approximation is not accurate enough in the spin glass model, and corrected it by adding the so-called Onsager reaction term, which is analogous to the last term in Eq. (15). More than 30 years after the original paper, a complete mathematical justification of the TAP equations remains an open problem in spin glass theory, although important partial results exist [11]. While the connection between belief propagation and the Bethe-Peierls approximation [12] stimulated a considerable amount of research, the algorithmic uses of TAP equations have received only sparse attention. Remarkable exceptions include [13], [14], [15].

C. State evolution and replica calculations

In the context of coding theory, message passing algorithms are analyzed through density evolution [16]. The common justification for density evolution is that the underlying graph is random and sparse, and hence converges locally to a tree in the large system limit. On trees, density evolution is exact; hence it is asymptotically exact for sparse random graphs. State evolution is the analog of density evolution in the case of dense graphs. For definitions and results on state evolution we refer to [1], [5]. The success of state evolution cannot be ascribed to a locally tree-like structure of the graph, and calls for new mathematical ideas.

The fixed points of state evolution describe the output of the corresponding AMP, when the latter is run for a sufficiently large number of iterations (independent of the dimensions n, N). It is well known within statistical mechanics [9] that the fixed point equations do indeed coincide with the equations obtained through a completely different non-rigorous approach, the replica method (in its replica-symmetric form).

During the last few months, several papers investigated compressed sensing using the replica method [17], [18], [19]. In view of the discussion above, it is not surprising that these results can be recovered from the state evolution formalism put forward in [1]; this is indeed an instance of a more general equivalence. Let us mention that the latter has several advantages over the replica method: (1) It is more concrete, and its assumptions can be checked quantitatively through simulations; (2) It is intimately related to message passing algorithms; (3) It actually allows to predict the performances of these algorithms.

ACKNOWLEDGMENT

This work was partially supported by a Terman fellowship, the NSF CAREER award CCF-0743978 and the NSF grant DMS-0806211.
REFERENCES

[1] D. L. Donoho, A. Maleki and A. Montanari, "Message Passing Algorithms for Compressed Sensing," Proc. Natl. Acad. Sci. (2009).
[2] S. S. Chen, D. L. Donoho and M. A. Saunders, "Atomic decomposition by basis pursuit," SIAM J. Sci. Comput., 20 (1998) 33-61.
[3] A. Maleki and D. L. Donoho, "Optimally Tuned Iterative Reconstruction Algorithms for Compressed Sensing," IEEE J. Selected Areas in Signal Processing, accepted for publication, 2009.
[4] D. L. Donoho, A. Maleki and A. Montanari, "How to Design Message Passing Algorithms for Compressed Sensing," in preparation (2010).
[5] D. L. Donoho, A. Maleki and A. Montanari, "Message Passing Algorithms for Compressed Sensing: II. Analysis and Validation," IEEE Inform. Theory Workshop, Cairo, January 2010.
[6] D. Baron, S. Sarvotham and R. G. Baraniuk, "Bayesian Compressive Sensing via Belief Propagation," IEEE Trans. Signal Processing (2009), accepted.
[7] Y. Lu, A. Montanari, B. Prabhakar, S. Dharmapurikar and A. Kabbani, "Counter Braids: A Novel Counter Architecture for Per-Flow Measurement," SIGMETRICS, Annapolis, June 2008.
[8] R. Berinde, A. C. Gilbert, P. Indyk, H. Karloff and M. J. Strauss, "Combining geometry and combinatorics: a unified approach to sparse signal recovery," Proc. 46th Annual Allerton Conference, Monticello, IL, September 2008.
[9] M. Mézard and A. Montanari, Information, Physics, and Computation, Oxford University Press, Oxford, 2009.
[10] D. J. Thouless, P. W. Anderson and R. G. Palmer, "Solution of 'Solvable model of a spin glass'," Phil. Mag., 35 (1977) 593-601.
[11] M. Talagrand, Spin Glasses: A Challenge for Mathematicians, Springer-Verlag, Berlin, 2003.
[12] J. S. Yedidia, W. T. Freeman and Y. Weiss, "Constructing free-energy approximations and generalized belief propagation algorithms," IEEE Trans. Inf. Theory, 51 (2005) 2282-2313.
[13] M. Opper and O. Winther, "From Naive Mean Field Theory to the TAP Equations," in Advanced Mean Field Methods: Theory and Practice, MIT Press, 2001.
[14] Y. Kabashima, "A CDMA multiuser detection algorithm on the basis of belief propagation," J. Phys. A, 36 (2003) 11111-11121.
[15] J. P. Neirotti and D. Saad, "Improved message passing for inference in densely connected systems," Europhys. Lett., 71 (2005) 866-872.
[16] T. J. Richardson and R. Urbanke, Modern Coding Theory, Cambridge University Press, Cambridge, 2008.
[17] S. Rangan, A. K. Fletcher and V. K. Goyal, "Asymptotic Analysis of MAP Estimation via the Replica Method and Applications to Compressed Sensing," arXiv:0906.3234 (2009).
[18] Y. Kabashima, T. Wadayama and T. Tanaka, "A typical reconstruction limit for compressed sensing based on Lp-norm minimization," J. Stat. Mech. (2009) L09003.
[19] D. Baron, D. Guo and S. Shamai, "A Single-letter Characterization of Optimal Noisy Compressed Sensing," Proc. 47th Annual Allerton Conference, Monticello, IL, September 2009.