A User Manual for the Fisher Information Matrix

Michele Vallisneri
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA 91109
(Dated: Feb 9 2007)

The Fisher-matrix formalism is used routinely in the literature on gravitational-wave detection to characterize the parameter-estimation performance of gravitational-wave measurements, given parametrized models of the waveforms, and assuming detector noise of known colored Gaussian distribution. Unfortunately, the Fisher matrix can be a poor predictor of the amount of information obtained from typical observations, especially for waveforms with several parameters and relatively low expected signal-to-noise ratios, or for waveforms depending weakly on one or more parameters, when their priors are not taken into proper consideration. I discuss these pitfalls and describe practical recipes to recognize them and cope with them. Specifically, I answer the following questions: (i) What is the significance of (quasi-)singular Fisher matrices, and how do we deal with them? (ii) When is it necessary to take into account prior probability distributions for the source parameters? (iii) When is the signal-to-noise ratio high enough to believe the Fisher-matrix result?

PACS numbers: 04.80.Nn, 95.55.Ym, 02.50.Tt

I. INTRODUCTION

Over the last two decades, the prevailing attitude in the gravitational-wave (GW) source-modeling community has been one of pre-data positioning: in the absence of confirmed detections, the emphasis has been on exploring which astrophysical systems, and which of their properties, would become accessible to GW observations with the sensitivities afforded by planned (or desired) future experiments, with the purpose of committing theoretical effort to the most promising sources, and of directing public advocacy to the most promising detectors. In this positioning and in this exploration, the expected accuracy of GW source parameters, as determined from the signals to be observed, is often employed as a proxy for the amount of physical information that can be gained from detection campaigns.

However, predicting the parameter-estimation performance of future observations is a complex matter, even with the benefit of accurate theoretical descriptions of the expected waveforms and of faithful characterizations of the noise and response of detectors; and in practice, the typical source modeler has had much less to go with. The main problem is that there are few analytical tools that can be applied generally to the problem, before resorting to relatively cumbersome numerical simulations that involve multiple explicit realizations of signal-plus-noise datasets.

In the source-modeling community, the analytical tool of choice has been the Fisher information matrix F_ij[h] = (h_i, h_j): here h_i(t) is the derivative of the gravitational waveform h(t) of interest with respect to the i-th source parameter θ^i, and "(·, ·)" is a signal product weighted by the expected power spectral density of detector noise, as described in Sec. II B. Now, it is usually claimed that the inverse Fisher matrix F^{-1}_ij[h_0] represents the covariance matrix of parameter errors in the parameter-estimation problem for the true signal h_0. This statement can be interpreted in three slightly different ways (all correct), which we examine in detail in Sec. II, and preview here:

1. The inverse Fisher matrix F^{-1}_ij[h_0] is a lower bound (generally known as the Cramér–Rao bound) for the error covariance of any unbiased estimator of the true source parameters.
Thus, it is a frequentist error (see Sec. II A): for any experiment characterized by the true signal h_0 and a certain realization n of detector noise, the parameter estimator θ̂ is a single-valued function of the total detector output s = n + h_0, and F^{-1}_ij[h_0] is a lower bound on the covariance (i.e., the fluctuations) of θ̂ in an imaginary infinite sequence of experiments with different realizations of noise. The Cramér–Rao bound is discussed in Sec. II C.

2. The inverse Fisher matrix F^{-1}_ij[h_0] is the frequentist error covariance for the maximum-likelihood (ML) parameter estimator θ̂^ML, assuming Gaussian noise, in the limit of strong signals (i.e., high signal-to-noise ratio SNR) or, equivalently, in the limit in which the waveforms can be considered as linear functions of their parameters (we shall refer to this limit as the linearized-signal approximation, or LSA). This well-known result is rederived in Sec. II D.

3. The inverse Fisher matrix F^{-1}_ij[h_0] represents the covariance (i.e., the multidimensional spread around the mode) of the posterior probability distribution p(θ_0|s) for the true source parameters θ_0, as it can be inferred (in Bayesian fashion) from a single experiment with true signal h_0, assuming Gaussian noise, in the high-SNR limit (or in the LSA), and in the case where any prior probabilities for the parameters are constant over the
parameter range of interest. Properly speaking, the inverse Fisher matrix is a measure of uncertainty rather than error, since in any experiment the mode will be displaced from the true parameters by an unknown amount due to noise.¹ See Sec. II E for a rederivation of this result.

As pointed out by Jaynes [1], while the numerical identity of these three different error-like quantities has given rise to much confusion, it arises almost trivially from the fact that in a neighborhood of its maximum, the signal likelihood p(s|θ_0) is approximated by a normal probability distribution with covariance F^{-1}_ij.

In this paper, I argue that the Cramér–Rao bound is seldom useful in the work of GW analysts (Sec. II C); and while the high-SNR/LSA frequentist and Bayesian results are legitimate, they raise the question of whether the signals of interest are strong (or linear) enough to warrant the limit, and of what happens if they are not. In addition, if we possess significant information about the prior distribution (or even the allowed range) of source parameters, it is really only in the Bayesian framework that we can fold this information reliably into the Fisher result (Sec. II D). Thus, I recommend the Bayesian viewpoint as the most fruitful way of thinking about the Fisher-matrix result (although I will also derive parallel results from the frequentist viewpoint).

Of course, the study of Bayesian inference for GW parameter-estimation problems need not stop at the leading-order (Fisher-matrix) expression for the posterior likelihood: Markov Chain Monte Carlo (MCMC) algorithms [2] can provide very reliable results, immune from any considerations about signal strength, but they require a significant investment of time to implement, and of computational resources to run. By contrast, the Fisher-matrix formalism is singularly economical (especially since it does not involve explicit realizations of the noise, as MCMC methods necessarily do), and it is clear that it will always be the first recourse of the GW analyst. To use it reliably, however, we must understand the limits of its applicability. The purpose of this paper is to explore these limits. I do so by addressing three questions:

1. What is the significance of the singular or ill-conditioned Fisher matrices that often appear in estimation problems with several source parameters, and how do we deal with them? Can we still believe the Fisher result? (See Sec. IV.)

2. When is it necessary to take into account the prior probability distributions for the parameters, perhaps specified trivially by giving their allowed ranges? (See Sec. V.)

3. When is the high-SNR/LSA approximation warranted? (As we shall see, the high-SNR limit is equivalent to the LSA.) That is, how strong a signal will we need to measure if we are to believe the Fisher-matrix result as its uncertainty? (See Sec. VI.)

In addition, I discuss the extension of the LSA beyond the leading order, in both the frequentist and Bayesian parameter-estimation frameworks (Sec. VII), in a form that the adventurous GW analyst can use to test the reliability of the Fisher result (but higher-order derivatives and many-indexed expressions start to mount rapidly, even at the next-to-leading order). To be sure, these questions were posed already in the seminal treatments of GW detection by Finn [3] and Cutler and Flanagan [4], but it appears that the cautions suggested by these authors have gone largely unheeded.
My treatment follows theirs closely, as well as the classic texts on the statistical analysis of noisy data (e.g., Refs. [5, 6]). I am indebted to Jaynes and Bretthorst [1, 7] for their enlightening (if occasionally blunt) perspective on frequentist and Bayesian parameter estimation. While most of this paper could be described fairly as a critical review of well-established knowledge, as far as I know the procedures and viewpoints described in Secs. V and VI are original, and Sec. IV is offered to help clarify an often-encountered but little-understood situation; last, although higher-order expressions for the variance appear in Ref. [4] and elsewhere, I consider it useful to restate them in a notation compatible with the rest of this paper.

Whenever my discussion requires a practical example, I consider signals from inspiraling binaries of two black holes, both of mass 10 M_⊙, as described by the restricted post-Newtonian approximation for adiabatic, circular inspirals (see Sec. III); in my examples, I assume detection and parameter estimation are performed on Initial-LIGO [8] data, and I adopt the LIGO noise curve of Ref. [9, Table IV]. Throughout, I use geometric units; I assume the Einstein summation convention for repeated indices; and I do not distinguish between covariant and contravariant indices, except in Sec. VII.

1 In the LSA/high-SNR limit with negligible priors, the posterior probability mode, seen as a frequentist statistic, coincides with the ML estimator; thus its fluctuations are again described by the inverse Fisher matrix.
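Before moving to the three derivations, it may help to see the Fisher-matrix recipe as an actual computation. The minimal sketch below (Python with numpy; the monochromatic toy waveform, flat noise spectrum, sampling choices, and finite-difference step are all illustrative assumptions, not the post-Newtonian model used in this paper) assembles F_ij = (h_i, h_j) from numerical parameter derivatives and the noise-weighted inner product defined in Sec. II B, then inverts it to obtain the claimed error covariance.

```python
# Minimal sketch: Fisher matrix F_ij = (h_i, h_j) for a toy signal.
# All concrete choices below are illustrative assumptions.
import numpy as np

def waveform(theta, t):
    # Toy monochromatic "waveform": amp * sin(2 pi f0 t + phi0).
    amp, f0, phi0 = theta
    return amp * np.sin(2.0 * np.pi * f0 * t + phi0)

T, dt = 16.0, 1.0 / 256.0           # observation time and sampling step
t = np.arange(0.0, T, dt)
Sn = 1e-2                           # flat (white) one-sided noise PSD

def inner(a, b):
    # Discrete stand-in for (a, b) = 4 Re int_0^inf a~*(f) b~(f) / Sn(f) df;
    # for white noise this reduces to (2/Sn) int a(t) b(t) dt by Parseval.
    return 2.0 / Sn * np.sum(a * b) * dt

theta0 = np.array([1.0, 10.0, 0.3])         # "true" parameters (amp, f0, phi0)
h0 = waveform(theta0, t)
print("optimal SNR A =", np.sqrt(inner(h0, h0)))

# Waveform derivatives h_i by central finite differences.
eps = 1e-6
hi = []
for i in range(theta0.size):
    dp, dm = theta0.copy(), theta0.copy()
    dp[i] += eps
    dm[i] -= eps
    hi.append((waveform(dp, t) - waveform(dm, t)) / (2.0 * eps))

F = np.array([[inner(a, b) for b in hi] for a in hi])
cov = np.linalg.inv(F)              # the claimed error covariance (Sec. II)
print("1-sigma Fisher errors:", np.sqrt(np.diag(cov)))
```

The conditioning of F in this final inversion is exactly the issue taken up in Sec. IV.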
II. THREE ROADS TO THE FISHER MATRIX

In this section I discuss the "three roads" to the inverse Fisher matrix as a measure of uncertainty for GW observations: the Cramér–Rao bound (Sec. II C), the high-SNR/LSA limit for the frequentist covariance of the ML estimator (Sec. II D), and the high-SNR/LSA limit for the single-experiment covariance of the Bayesian posterior distribution (Sec. II E). Sections II A and II B are refreshers about frequentist and Bayesian parameter estimation, and about the analytical expression of the Gaussian-noise likelihood for GW signals.

A. A refresher on the frequentist and Bayesian frameworks

The frequentist (or orthodox) approach to parameter estimation for GW signals can be summed up as follows:

1. We are given the detector data s, consisting of the true signal h_0 = h(θ_0) (where θ_0 is the vector of the true system parameters) plus additive noise n.

2. We select a point estimator θ̂(s): that is, a function of detector data that (it is hoped) approximates the true values of the source parameters, except for the statistical error due to the presence of noise. One important example of point estimator is the ML estimator θ̂^ML, which maximizes the likelihood p(s|θ) of observing the measured data s given a value θ of the true parameters. For additive noise, this likelihood coincides with the probability of a noise realization n = s − h(θ).

3. We characterize statistical error as the fluctuations of θ̂(s), computed over a very long series of independent experiments where the source parameters are kept fixed, while detector noise n is sampled from its assumed probability distribution (often called the sampling distribution).

The estimator θ̂ is usually chosen according to one or more criteria of optimality: for instance, unbiasedness requires that ⟨θ̂(s)⟩_n (the average of the estimator over the noise probability distribution) be equal to θ_0.

A rather different approach is that of Bayesian inference:

1. We do not assume the true value of the system parameters, but we posit their prior probability distribution p(θ).

2. Given the data s, we do not compute estimators, but rather the full posterior probability distribution p(θ|s), using Bayes' theorem p(θ|s) = p(s|θ) p(θ)/p(s), where p(s) = ∫ p(s|θ) p(θ) dθ.

3. We characterize statistical error in a single experiment by the spread of the posterior distribution p(θ|s).

The differences between the frequentist and Bayesian approaches are not only mathematical, but also epistemic: "frequentists" view probabilities essentially as the relative frequencies of outcomes in repeated experiments, while "Bayesians" view them as subjective² indexes of certainty for alternative propositions. For an introduction to the contrasting views, I refer the reader to the excellent treatise (very partial to the Bayesian worldview) by Jaynes [1], and to Ref. [4] for a more GW-detection–oriented discussion. Once actual detections are made, the Bayesian approach of computing posterior probability distributions for the signal parameters given the observed data seems more powerful than the frequentist usage of somewhat arbitrary point estimators; the latter will always result in throwing away useful information, unless the chosen estimators are sufficient statistics (i.e., unless the likelihood depends on the data only through the estimators).
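To make the Bayesian recipe concrete, here is a minimal sketch for a one-parameter toy problem (the signal model, noise level, prior range, and random seed are all illustrative assumptions): the likelihood follows from the additive-Gaussian-noise rule of item 2 in the frequentist list, the prior is uniform over a grid, and the spread of the gridded posterior provides the single-experiment statistical error of item 3.

```python
# Minimal sketch: gridded Bayesian posterior for one parameter.
# Signal model, noise level, prior range, and seed are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 200)
sigma = 0.5                                  # white noise, per-sample std

def h(theta):
    # Toy one-parameter signal: theta rescales a reference frequency.
    return np.sin(2.0 * np.pi * 3.0 * theta * t)

theta0 = 1.0
s = h(theta0) + sigma * rng.standard_normal(t.size)   # data = signal + noise

grid = np.linspace(0.8, 1.2, 2001)                    # uniform prior support
loglike = np.array([-0.5 * np.sum((s - h(th)) ** 2) / sigma**2 for th in grid])
post = np.exp(loglike - loglike.max())       # p(theta|s) prop. to p(s|theta) p(theta)
post /= np.trapz(post, grid)                 # normalize over the grid

mean = np.trapz(grid * post, grid)
std = np.sqrt(np.trapz((grid - mean) ** 2 * post, grid))
print("posterior mean = %.4f, posterior std = %.4f" % (mean, std))
```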
As for statistical error, it seems preferable to characterize it from the data we have (actually, from the posterior distributions that we infer from the data), rather than from the data we could have obtained (i.e., from the sampling distribution of estimators in a hypothetical ensemble of experiments). As Cutler and Flanagan [4] point out, however, it is in the current pre-data regime that we seek to compute expected parameter accuracies; in the absence of actual confirmed-detection datasets, it seems acceptable to consider ensembles of possible parameter-estimation experiments, and to use frequentist statistical error as an inverse measure of potential physical insight. The best solution, bridging the two approaches, would undoubtedly be to examine the frequentist distribution of some definite measure of Bayesian statistical error; unfortunately, such a hybrid study is generally unfeasible, given the considerable computational requirements of even single-dataset Bayesian analyses.

2 Only in the sense that subjects with different prior assumptions could come to different conclusions after seeing the same data; indeed, Bayesian statistics describes how prior assumptions become deterministically modified by the observation of data.
B. Likelihood for GW signals in Gaussian noise

Under the assumption of stationary and Gaussian detector noise, the likelihood p(s|θ) can be obtained very simply from a noise-weighted inner product of the detector output and of the signal h(θ) (see for instance Ref. [4, Eq. (2.3)]):

  p(s|θ) ∝ e^{−(s − h(θ), s − h(θ))/2};  (1)

the weighting is performed with respect to the expected power spectral density of detector noise, by defining the noise-weighted inner product of two real-valued signals as

  (h, g) = 4 Re ∫₀^{+∞} [h̃(f)* g̃(f) / S_n(f)] df,  (2)

where h̃(f) and g̃(f) are the Fourier transforms of h(t) and g(t), "*" denotes complex conjugation, and S_n(f) is the one-sided power spectral density of the noise. From the definition of S_n(f), as given by ⟨ñ(f)* ñ(f′)⟩_n = (1/2) S_n(|f|) δ(f − f′), we get the useful property

  ⟨(h, n)(n, g)⟩_n = (h, g),  (3)

where again ⟨·⟩_n denotes averaging over the probability distribution of the noise.

C. First road: Derivation and critique of the Cramér–Rao bound

The derivation in this section is inspired by the treatment of Ref. [1, p. 518], and it is given for simplicity in the case of one source parameter. We wish to pose a bound on the frequentist estimator variance

  var θ̂ = ⟨(θ̂(s) − ⟨θ̂(s)⟩_n)²⟩_n;  (4)

to do this, we consider the ensemble product

  ⟨u(s), v(s)⟩_n = ∫ u(s) v(s) p(s|θ_0) ds,  (5)

where p(s|θ_0) is the likelihood of observing the detector output s given the true source parameter θ_0, or equivalently the likelihood of observing the noise realization n = s − h_0. Setting v(s) = θ̂(s) − ⟨θ̂(s)⟩_n, we obtain a bound on ⟨v, v⟩_n ≡ var θ̂ from the Schwarz inequality:

  var θ̂ ≡ ⟨v, v⟩_n ≥ ⟨u, v⟩_n² / ⟨u, u⟩_n.  (6)

This inequality is true for any function u(s) of the data, and it becomes an equality when u(s) ∝ v(s). Since we wish to derive a bound that applies generally to all estimators, we should not have (or try) to provide too much detail about θ̂ (and therefore v(s)). A simple assumption to make about θ̂ is that it is an unbiased estimator:

  ⟨θ̂(s)⟩_n = θ_0 ⇒ ∂_{θ_0}⟨θ̂(s)⟩_n = 1.  (7)

How does this help us? It turns out that we can write a function u(s) whose ensemble product with any other function w(s) yields the derivative ∂_{θ_0}⟨w(s)⟩_n; this function is just u(s) = ∂_{θ_0} log p(s|θ_0), because

  ∫ w(s)[∂_{θ_0} log p(s|θ_0)] p(s|θ_0) ds = ∫ w(s) ∂_{θ_0} p(s|θ_0) ds = ∂_{θ_0} ∫ w(s) p(s|θ_0) ds = ∂_{θ_0}⟨w(s)⟩_n,  (8)

assuming of course³ that we can exchange integration and differentiation with respect to θ_0. For any s, u(s) encodes the local relative change in the likelihood function as θ is changed. It follows that ⟨u(s), v(s)⟩_n = ∂_{θ_0}⟨θ̂(s)⟩_n = 1, so

3 This assumption fails for some (mildly) pathological likelihood functions, which can provide counterexamples to the Cramér–Rao bound.
from Eq. (6) we get⁴

  var θ̂ ≥ 1 / ⟨∂_{θ_0} log p(s|θ_0), ∂_{θ_0} log p(s|θ_0)⟩_n ≡ 1 / ⟨u(s), u(s)⟩_n,  (9)

which is the unbiased-estimator version of the Cramér–Rao bound. If the estimator is biased, we can still use the Schwarz inequality, by providing the derivative of the bias b(θ_0) with respect to θ_0:

  ⟨θ̂(s)⟩_n = θ_0 + b(θ_0) ⇒ ∂_{θ_0}⟨θ̂(s)⟩_n = 1 + ∂_{θ_0}b(θ_0),  (10)

and therefore

  var θ̂ ≥ (1 + ∂_{θ_0}b)² / ⟨∂_{θ_0} log p(s|θ_0), ∂_{θ_0} log p(s|θ_0)⟩_n.  (11)

Generalizing to a multidimensional expression is straightforward, if verbose:

  covar_n(θ̂_i, θ̂_l) ≥ [δ_im + ∂_m b_i(θ_0)] F^{-1}_mj [δ_jl + ∂_j b_l(θ_0)],  (12)

where the Fisher information matrix is defined by

  F_il = ⟨∂_i log p(s|θ_0) ∂_l log p(s|θ_0)⟩_n = −⟨∂_i ∂_l log p(s|θ)⟩_n.  (13)

The second equality is established by taking the gradient of ∫ (∂_i log p(s|θ_0)) p(s|θ_0) ds, and remembering that ∂_i ∫ p(s|θ_0) ds = ∂_i 1 = 0. With the help of Eqs. (1) and (3), we can compute the Fisher matrix for GW signals in additive Gaussian noise, which is the familiar expression F_ij = (h_i, h_j).

The full expression (12) for the Cramér–Rao bound, which includes the effects of bias, has interesting consequences, for it implies that biased estimators can actually outperform⁵ unbiased estimators, since the ∂_m b_i(θ_0) can be negative. Unfortunately, we have no handle on these derivatives without explicitly choosing a particular estimator (which goes against the idea of having a generic bound), so the Cramér–Rao bound can only give us a definite result for the subclass of unbiased estimators. As pointed out by Cutler and Flanagan [4, App. A 5], it follows that the bound cannot be used to place absolute limits on the accuracy of estimators (i.e., lower bounds on frequentist error), which would exclude or severely limit the possibility of inferring the physical properties of sources from their emitted GWs. Even if the lower bound for unbiased estimators is very discouraging, there is always a chance that a biased estimator could do much better, so we cannot use the bound to prove "no go" theorems.

Going back to Eq. (6), we note that the bound is satisfied as an equality when

  u(s) ∝ v(s) ⇒ ∂_{θ_0} log p(s|θ_0) = q(θ_0)[θ̂(s) − ⟨θ̂(s)⟩_n].  (14)

By integrating, we obtain a relation between the likelihood and the estimator:

  p(s|θ_0) = m(s) Z(θ_0) e^{−l(θ_0) θ̂(s)};  (15)

the estimation problems (i.e., the pairings of given likelihoods and chosen estimators) for which this relation holds true are said to belong to the exponential family, and they are the only ones for which the Cramér–Rao bound is satisfied exactly as an equality. Equation (15) generalizes trivially to multidimensional problems by replacing the exponential with exp{−l_k(θ_0) θ̂_k(s)}. Unfortunately, for a given p(s|θ_0) there is no guarantee that any unbiased estimator exists that satisfies Eq. (15) and that therefore can actually achieve the bound; all we can say in general

4 To obtain Eq. (9), we need to notice also that for any w(s), ⟨u(s), ⟨w(s)⟩_n⟩_n = 0: since ⟨w(s)⟩_n does not depend on s (but only on θ_0), the integral of Eq. (8) reduces to ⟨w(s)⟩_n ∫ ∂_{θ_0} p(s|θ_0) ds = ⟨w(s)⟩_n ∂_{θ_0} 1 = 0.

5 This is true even if we evaluate the performance of estimators on the basis of their quadratic error ⟨(θ̂_i − θ_{0i})(θ̂_l − θ_{0l})⟩_n ≥ b_i(θ_0) b_l(θ_0) + [δ_im + ∂_m b_i(θ_0)] F^{-1}_mj [δ_jl + ∂_j b_l(θ_0)] rather than on the basis of their variance.
about the performance of unbiased estimators is that they will fall short of the Cramér–Rao bound, but we do not know by how much. As discussed above, the bound tells us nothing in general about biased estimators. It follows that the bound cannot be used to establish guaranteed levels of accuracy (i.e., upper bounds on frequentist error), which would prove the possibility of inferring the physical properties of sources from their GWs. We can only do so if we can identify a specific estimator that achieves the bound. In the next section we shall see that the ML estimator⁶ does so in the high-SNR limit, where waveforms can be approximated accurately as linear functions of their parameters within the region of parameter space where p(s|θ) is not negligible (so the high-SNR limit coincides with the limit in which the LSA is accurate). We conclude that the Cramér–Rao bound is never useful to the GW analyst as a proper bound, whether to make positive or negative expected-accuracy statements; where it is useful, it reduces to the high-SNR/LSA result for the ML estimator.

D. Second road: Derivation and critique of the frequentist high-SNR/LSA result

We denote the true signal as h_0 (so s = h_0 + n), and expand the generic waveform h(θ) around h_0, normalizing signals by the optimal signal-to-noise ratio of the true signal, A = √(h_0, h_0) (also known in this context as signal strength):

  h(θ) = h_0 + θ^k h_k + θ^j θ^k h_jk/2 + ··· = A (h̄_0 + θ^k h̄_k + θ^j θ^k h̄_jk/2 + ···);  (16)

here we are translating source parameters as needed to have h(0) = h_0, defining h_i = ∂_i h|_{θ=0}, h_ij = ∂_i ∂_j h|_{θ=0} (and so on), and h̄_0 = h_0/A, h̄_k = h_k/A (and so on).⁷ The likelihood is then given by⁸

  p(s|θ) ∝ e^{−(s − h(θ), s − h(θ))/2}
        = exp{ −(n, n)/2 + A [θ^j (n, h̄_j) + θ^j θ^k (n, h̄_jk)/2 + θ^j θ^k θ^l (n, h̄_jkl)/3! + ···]
          − A² [θ^j θ^k (h̄_j, h̄_k) + θ^j θ^k θ^l (h̄_j, h̄_kl) + ···]/2 }.  (17)

The ML equations ∂_j p(s|θ^ML) = 0 are then given by

  0 = (1/A) [(n, h̄_j) + θ̂^k (n, h̄_jk) + θ̂^k θ̂^l (n, h̄_jkl)/2 + ···]
    − [θ̂^k (h̄_j, h̄_k) + θ̂^k θ̂^l ((h̄_j, h̄_kl)/2 + (h̄_k, h̄_jl)) + ···],  (18)

where we have divided everything by A², and we omit the "ML" superscript for conciseness. A careful study of Eq. (18) shows that it can be solved in perturbative fashion by writing θ̂^ML as a series in 1/A,

  θ̂^ML_j = θ̂^(1)_j/A + θ̂^(2)_j/A² + θ̂^(3)_j/A³ + ···,  (19)

and by collecting the terms of the same order in Eq. (18),

  O(1/A):  (n, h̄_j) − θ̂^(1)_k (h̄_j, h̄_k) = 0,
  O(1/A²): θ̂^(1)_k (n, h̄_jk) − θ̂^(1)_k θ̂^(1)_l [(h̄_j, h̄_kl)/2 + (h̄_k, h̄_jl)] − θ̂^(2)_k (h̄_j, h̄_k) = 0,
  O(1/A³): ...  (20)

thus the ML solution θ̂^ML_j is given by

  θ̂^ML_j = (1/A) (h̄_j, h̄_k)^{-1} (h̄_k, n)
        + (1/A²) (h̄_j, h̄_i)^{-1} [ (n, h̄_ik)(h̄_k, h̄_l)^{-1}(h̄_l, n)
          − ((h̄_i, h̄_kl)/2 + (h̄_k, h̄_il)) (h̄_k, h̄_m)^{-1}(h̄_m, n) (h̄_l, h̄_n)^{-1}(h̄_n, n) ]
        + O(1/A³).  (21)

6 Indeed, Eq. (15) implies that if both an efficient (i.e., bound-achieving) unbiased estimator and the ML estimator exist, they must coincide. To show this, we notice that if the ML estimator exists, the log-derivative ∂_i log p(s|θ) = −∂_i l_k(θ)(θ̂_k − θ_k) must be zero at θ = θ̂^ML, from which it follows that θ̂_k = θ̂^ML_k.

7 The statistical uncertainty in the estimated signal strength can still be handled in this notation by taking one of the h̄_k to lie along h̄_0; the corresponding θ^k then represents a fractional correction to the true A.
8 Formally, it is troubling to truncate the series expression for the exponent at any order beyond quadratic, since the integral of the truncated likelihood may become infinite; the important thing to keep in mind, however, is that the series need converge only within a limited parameter range determined self-consistently by the truncated-likelihood estimator, by compact parameter ranges, or (in the Bayesian case) by parameter priors. Similar considerations apply to the derivation of the higher-order corrections given in Sec. VII.
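The perturbative solution (21) invites a direct numerical check: at large A, where the leading 1/A term dominates, the fluctuations of θ̂^ML over noise realizations should match the inverse-Fisher prediction derived below [Eq. (24)], while at low A the higher-order terms should inflate them. The sketch below runs this experiment for a toy one-parameter model in white unit-variance noise (both illustrative assumptions), locating the ML estimate by direct likelihood maximization.

```python
# Minimal sketch: Monte Carlo spread of the ML estimator vs. the Fisher
# prediction, at strong and weak signal strength A. Toy model throughout.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 100)

def hbar(theta):
    # Normalized waveform, (hbar, hbar) = 1 in the discrete inner product.
    w = np.sin(2.0 * np.pi * (3.0 + theta) * t)
    return w / np.sqrt(np.sum(w * w))

# Unit-variance white noise per sample, so (a, b) -> sum(a * b).
dth = 1e-5
hbar1 = (hbar(dth) - hbar(-dth)) / (2.0 * dth)   # derivative at theta = 0
Fbar = np.sum(hbar1 * hbar1)                     # normalized (scalar) Fisher matrix

for A in (50.0, 5.0):                            # strong vs. weak signal
    est = []
    for _ in range(400):
        s = A * hbar(0.0) + rng.standard_normal(t.size)
        res = minimize_scalar(lambda th: np.sum((s - A * hbar(th)) ** 2),
                              bounds=(-0.5, 0.5), method="bounded")
        est.append(res.x)
    print("A = %4.1f: Monte Carlo std = %.4f, Fisher prediction = %.4f"
          % (A, np.std(est), 1.0 / (A * np.sqrt(Fbar))))
```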
Thus we see that the limit of large A (i.e., high SNR) coincides with the linearized-signal approximation (LSA), where only the first derivatives of the signals are included. In the LSA, the likelihood is just

  p(s|θ) ∝ exp{ −(n, n)/2 − θ^j θ^k (h_j, h_k)/2 + θ^j (h_j, n) }
        = exp{ −(n, n)/2 − A² θ^j θ^k (h̄_j, h̄_k)/2 + A θ^j (h̄_j, n) }  (LSA),  (22)

and the ML estimator is given by

  θ̂^ML_j = (1/A) (h̄_j, h̄_k)^{-1} (h̄_k, n)  (LSA).  (23)

Since ⟨(h̄_k, n)⟩_n = 0, we see also that the ML estimator is unbiased. The variance of θ̂^ML is then obtained by averaging over noise realizations,

  ⟨θ̂^ML_j θ̂^ML_k⟩_n = (1/A²) (h̄_j, h̄_l)^{-1} ⟨(h̄_l, n)(n, h̄_m)⟩_n (h̄_m, h̄_k)^{-1}
                    = (1/A²) (h̄_j, h̄_l)^{-1} (h̄_l, h̄_m)(h̄_m, h̄_k)^{-1} = (1/A²) (h̄_j, h̄_k)^{-1}  (LSA),  (24)

and it coincides with the mean quadratic error in the frequentist sense. In Eq. (24), the second equality follows from Eq. (3). The interpretation of the limit is that, for strong signals, the typical θ̂^ML_j − θ_{0j} becomes small enough that the log-likelihood is accurately described by the product of detector data and a linearized signal. Equation (24) is the standard Fisher-information-matrix result, and it implies that in the high-SNR/LSA limit the ML estimator achieves the Cramér–Rao bound. As we shall see in Sec. VII, the next-order correction to the variance scales as 1/A⁴, not 1/A³: this is because all O(1/A³) terms contain odd numbers of n, whose products vanish under the ensemble average. The very fact that there is a next-order correction shows that for generic A the ML estimator does not achieve the bound.

The fact that the Cramér–Rao bound is achieved in the high-SNR/LSA limit, but not beyond it, can also be seen in the light of Eq. (15), which encodes a standard form for the estimation problems in the exponential family. To express the LSA likelihood in this form, we can set m(s) = e^{−(s − h_0, s − h_0)/2} and Z(θ) = e^{θ^j θ^k (h_j, h_k)/2}; it remains to establish that

  −l_j(θ) θ̂^ML_j(s) = θ^j (h_j, s − h_0),  (25)

which is satisfied by l_j(θ) = −(h_j, h_k)θ^k [see Eq. (23)]. Now, if additional terms are added to Eq. (22), beginning with terms cubic in the θ^i, θ̂^ML(s) comes to be a nonlinear function of the signal, such that no −l_j(θ) can multiply it in the right way to reconstruct the likelihood. It then follows that the estimation problem moves outside the exponential family, and the Cramér–Rao bound cannot be achieved.

It is possible (but perhaps not desirable, as we shall see shortly) to modify the ML estimator to take into account prior knowledge about the expected distribution of sources. The resulting maximum-posterior estimator θ̂^MP is defined as the mode of the posterior probability p(θ|s) = p(s|θ) p(θ)/p(s),

  θ̂^MP = maxloc_θ p(θ|s) = maxloc_θ p(s|θ) p(θ).  (26)

This is a biased estimator: in the high-SNR/LSA limit, and with a Gaussian prior p(θ) ∝ exp{−P_ij (θ^i − θ^P_i)(θ^j − θ^P_j)/2} centered at θ^P (the only prior that can be easily handled analytically), we find

  b^MP_i = ⟨θ̂^MP_i⟩_n = [(h̄_i, h̄_j) + P_ij/A²]^{-1} (P_jk/A²) θ^P_k  (LSA/Gaussian prior);  (27)

thus θ̂^MP becomes unbiased for A → ∞ (indeed, in that limit θ̂^MP tends to θ̂^ML). For the frequentist variance we find

  ⟨θ̂^MP_i θ̂^MP_j⟩_n − ⟨θ̂^MP_i⟩_n ⟨θ̂^MP_j⟩_n = (1/A²) [(h̄_i, h̄_k) + P_ik/A²]^{-1} (h̄_k, h̄_l) [(h̄_l, h̄_j) + P_lj/A²]^{-1}  (LSA w/ prior),  (28)

which coincides⁹ with the generalized Cramér–Rao bound of Eq. (12), proving that the estimation problem defined by the LSA likelihood and θ̂^MP belongs to the exponential family.

9 Note that the Fisher matrix that must be substituted into Eq.
(12) is still −⟨∂_j ∂_k log p(s|θ)⟩_n = (h_j, h_k), and not −⟨∂_j ∂_k log[p(s|θ) p(θ)]⟩_n = (h_j, h_k) + P_jk. The prior distribution does not concern the Cramér–Rao bound, which is computed from the likelihood alone for a fixed, known value of the true source parameters. Instead, we happen to be using an estimator that takes into account prior information, which enters the Cramér–Rao bound via the derivative of the bias.
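A one-parameter reading of Eq. (28) makes the pathology discussed next easy to exhibit numerically. In the sketch below (the normalized Fisher scalar F̄ and the prior inverse variance P are illustrative numbers), the frequentist standard deviation of θ̂^MP collapses as A → 0, whereas the width of the LSA posterior with the same Gaussian prior, the standard Gaussian-conjugate expression [A² F̄ + P]^{-1/2}, settles sensibly at the prior width.

```python
# Minimal sketch: one-parameter Eq. (28) vs. the LSA posterior width.
# Fbar and P below are illustrative numbers, not fitted to any detector.
import numpy as np

Fbar = 1.0          # normalized Fisher "matrix" (hbar_1, hbar_1), one parameter
P = 4.0             # Gaussian-prior inverse variance; prior std = 0.5

for A in (100.0, 10.0, 1.0, 0.1, 0.01):
    # Eq. (28) in one dimension: (1/A^2) [Fbar + P/A^2]^-1 Fbar [Fbar + P/A^2]^-1.
    var_mp = (1.0 / A**2) * Fbar / (Fbar + P / A**2) ** 2
    # LSA posterior variance with the same Gaussian prior: [A^2 Fbar + P]^-1.
    var_post = 1.0 / (A**2 * Fbar + P)
    print("A = %6.2f: MP std = %.4f, posterior std = %.4f, prior std = %.4f"
          % (A, np.sqrt(var_mp), np.sqrt(var_post), 1.0 / np.sqrt(P)))
```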
The reason why θ̂^MP is not too useful to characterize future parameter-estimation performance is that we expect a reasonable measure of error to converge to the effective width of the prior in the limit of vanishing signal strength. Instead, in the absence of any information from the experiment, θ̂^MP becomes stuck at the mode of the prior, and its variance [in Eq. (28)] tends to zero. This behavior occurs for any nonuniform prior.¹⁰

E. Third road: Derivation of the Bayesian high-SNR/LSA result

We now wish to show that in any single experiment, if the high-SNR/LSA limit is warranted (and if the parameter priors are uniform over the parameter region of interest), the inverse Fisher information matrix yields the variance of the Bayesian posterior probability distribution. To do so, we rewrite the likelihood (17) in terms of normalized parameters θ̄^i = A θ^i,

  p(s|θ) ∝ exp{ −(n, n)/2 + (n, h̄_j)θ̄^j + (1/A)(n, h̄_jk)θ̄^j θ̄^k/2 + (1/A²)(n, h̄_jkl)θ̄^j θ̄^k θ̄^l/3! + O(1/A³)
        − [(h̄_j, h̄_k)θ̄^j θ̄^k + (1/A)(h̄_j, h̄_kl)θ̄^j θ̄^k θ̄^l + (1/A²)(h̄_jk, h̄_lm)θ̄^j θ̄^k θ̄^l θ̄^m/4
          + (2/A²)(h̄_j, h̄_klm)θ̄^j θ̄^k θ̄^l θ̄^m/3! + O(1/A³)]/2 }.  (29)

We can build the variance from the posterior mean

  ⟨θ̄^i⟩_p ≡ ∫ θ̄^i p(s|θ) dθ / ∫ p(s|θ) dθ  (30)

and the quadratic moment

  ⟨θ̄^i θ̄^j⟩_p ≡ ∫ θ̄^i θ̄^j p(s|θ) dθ / ∫ p(s|θ) dθ,  (31)

where ⟨·⟩_p denotes integration over p(s|θ). The idea is to proceed in perturbative fashion, writing the moments as series in ε = 1/A: taking ⟨θ̄^i⟩_p as an example,

  ⟨θ̄^i⟩_p = Σ_{n=0}^∞ ⟨θ̄^i⟩_p^(n) εⁿ/n!,  with  ⟨θ̄^i⟩_p^(n) = ∂ⁿ⟨θ̄^i⟩_p/∂εⁿ |_{ε=0}.  (32)

Since ε appears in both the numerator and the denominator of Eq. (30), we write

  ⟨θ̄^i⟩_p = [∫ θ̄^i p(0) dθ + ε ∫ θ̄^i (∂p(0)/∂ε) dθ + (ε²/2) ∫ θ̄^i (∂²p(0)/∂ε²) dθ + ···]
           / [∫ p(0) dθ + ε ∫ (∂p(0)/∂ε) dθ + (ε²/2) ∫ (∂²p(0)/∂ε²) dθ + ···]  (33)

(where the argument of p implies that the n-th derivative is evaluated at ε = 0), and therefore

  ⟨θ̄^i⟩_p^(0) = ∫ θ̄^i p(0) dθ / ∫ p(0) dθ,
  ⟨θ̄^i⟩_p^(1) = [∫ θ̄^i (∂p(0)/∂ε) dθ − (∫ θ̄^i p(0) dθ) (∫ (∂p(0)/∂ε) dθ) / ∫ p(0) dθ] / ∫ p(0) dθ,
  ...  (34)

similar expressions hold for ⟨θ̄^i θ̄^j⟩_p, and a general expression for the n-th–order contribution is given in Sec. VII B. The ε → 0 limit coincides with the limit of large signal strengths, or of vanishing derivatives higher than the first, since in that case Eq. (29) truncates to Eq. (22). In this limit,

  ⟨θ̄^i⟩_p^(0) = (h̄_i, h̄_j)^{-1} (n, h̄_j)  (LSA).  (35)

10 For uniform priors (e.g., rectangular distributions corresponding to the allowed parameter ranges), θ̂^MP actually becomes undefined in the A → 0 limit.
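As a closing numerical illustration, the leading term (35) can be checked against direct integration: for a single noise realization and a mildly nonlinear one-parameter toy model (all choices below are illustrative assumptions), the posterior mean computed on a grid should approach (h̄_1, h̄_1)^{-1}(n, h̄_1)/A, i.e., Eq. (35) converted back to unnormalized parameters, as A grows.

```python
# Minimal sketch: posterior mean by direct integration vs. the leading
# term (35) of the perturbative expansion. Toy model, illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
t = np.linspace(0.0, 1.0, 100)

def hbar(theta):
    # Normalized, mildly nonlinear one-parameter waveform.
    w = np.sin(2.0 * np.pi * (3.0 + theta) * t)
    return w / np.sqrt(np.sum(w * w))

n = rng.standard_normal(t.size)        # one fixed noise realization
dth = 1e-5
hbar1 = (hbar(dth) - hbar(-dth)) / (2.0 * dth)
Fbar = np.sum(hbar1 * hbar1)

for A in (100.0, 20.0, 5.0):
    s = A * hbar(0.0) + n              # single experiment, true theta = 0
    grid = np.linspace(-0.5, 0.5, 4001)          # flat prior over this range
    logp = np.array([-0.5 * np.sum((s - A * hbar(th)) ** 2) for th in grid])
    p = np.exp(logp - logp.max())
    p /= np.trapz(p, grid)
    mean_num = np.trapz(grid * p, grid)
    mean_lsa = np.sum(n * hbar1) / (A * Fbar)    # Eq. (35) in unnormalized theta
    print("A = %5.1f: numerical mean = %+.5f, LSA prediction = %+.5f"
          % (A, mean_num, mean_lsa))
```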