The expectation of this matrix is
E
Result 6.4. The quadratic frequentist bound for the Fisher score is the Cramér–Rao bound:
164 Performance bounds for parameter estimation
When the frequentist bias is zero, the Cramér–Rao bound is
$$\mathbf{Q}(\boldsymbol{\theta}) \ge \mathbf{J}_F^{-1}(\boldsymbol{\theta}), \tag{6.51}$$
where $\mathbf{J}_F(\boldsymbol{\theta})$ is the Fisher information.
This bound is sometimes also called the deterministic Cramér–Rao bound, to differentiate it from the corresponding Bayesian (i.e., stochastic) bound in Result 6.7. It is the most celebrated of all quadratic frequentist bounds.
If repeated measurements carry information about one fixed and deterministic parameter $\boldsymbol{\theta}$ through the product pdf $\prod_{i=1}^{M} p(\mathbf{y}_i)$, then the sensitivity matrix remains fixed and the Fisher information matrix scales with $M$. Consequently, the Cramér–Rao bound decreases as $M^{-1}$.
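The scaling with $M$ can be seen in a short derivation. This is a sketch only: the symbols $\mathbf{s}_m$ (the Fisher score of the $m$th measurement) and $\mathbf{J}_F^{(1)}$ (the Fisher information of a single measurement) are introduced here for illustration and do not appear in the text. It uses the facts that the log of a product pdf is a sum, and that independent zero-mean scores are uncorrelated.

```latex
\begin{align*}
\mathbf{s}(\mathbf{y}_1,\dots,\mathbf{y}_M)
  &= \sum_{m=1}^{M} \mathbf{s}_m , \\
E\big[\mathbf{s}\,\mathbf{s}^{H}\big]
  &= \sum_{m=1}^{M}\sum_{l=1}^{M} E\big[\mathbf{s}_m \mathbf{s}_l^{H}\big]
   = \sum_{m=1}^{M} E\big[\mathbf{s}_m \mathbf{s}_m^{H}\big]
   = M\,\mathbf{J}_F^{(1)}(\boldsymbol{\theta}) , \\
\mathbf{Q}(\boldsymbol{\theta})
  &\ge \frac{1}{M}\,\big[\mathbf{J}_F^{(1)}(\boldsymbol{\theta})\big]^{-1} .
\end{align*}
```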
6.3.1 Nuisance parameters
There are many ways to show the effect of nuisance parameters on error bounds, and some of these are given by Scharf (1991, pp. 231–233). But perhaps the easiest, if not most general, way to establish the effect is this. Begin with the Fisher information matrix $\mathbf{J}(\boldsymbol{\theta})$ and the corresponding Cramér–Rao bound for frequentist-unbiased estimators, $\mathbf{Q}(\boldsymbol{\theta}) \ge \mathbf{J}^{-1}(\boldsymbol{\theta})$. (In order to simplify the notation we do not subscript $\mathbf{J}$ as $\mathbf{J}_F$.) The $(i,i)$th element of $\mathbf{J}(\boldsymbol{\theta})$ is $J_{ii}(\boldsymbol{\theta})$ and the $(i,i)$th element of $\mathbf{J}^{-1}(\boldsymbol{\theta})$ is denoted by $(J^{-1})_{ii}(\boldsymbol{\theta})$.
From the definition of a score we see that $J_{ii}(\boldsymbol{\theta})$ is the Fisher information for the $i$th element of $\boldsymbol{\theta}$ when only the $i$th parameter in the parameter vector is unknown.
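The Cauchy–Schwarz inequality says that $(\mathbf{y}^H\mathbf{J}\mathbf{y})(\mathbf{x}^H\mathbf{J}\mathbf{x}) \ge |\mathbf{y}^H\mathbf{J}\mathbf{x}|^2$. Choose $\mathbf{x} = \mathbf{u}_i$ and $\mathbf{y} = \mathbf{J}^{-1}\mathbf{u}_i$, with $\mathbf{u}_i$ the $i$th Euclidean basis vector. Then $\mathbf{y}^H\mathbf{J}\mathbf{y} = (J^{-1})_{ii}(\boldsymbol{\theta})$, $\mathbf{x}^H\mathbf{J}\mathbf{x} = J_{ii}(\boldsymbol{\theta})$, and $\mathbf{y}^H\mathbf{J}\mathbf{x} = 1$, so $(J^{-1})_{ii}(\boldsymbol{\theta})\,J_{ii}(\boldsymbol{\theta}) \ge 1$, or $(J^{-1})_{ii}(\boldsymbol{\theta}) \ge 1/J_{ii}(\boldsymbol{\theta})$. These results generalize to show that any $r$-by-$r$ principal submatrix of the $p$-by-$p$ inverse $\mathbf{J}^{-1}(\boldsymbol{\theta})$ is more positive definite than the inverse of the corresponding $r$-by-$r$ submatrix of the Fisher matrix $\mathbf{J}(\boldsymbol{\theta})$. So nuisance parameters increase the Cramér–Rao bound.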
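The effect of nuisance parameters is easy to check numerically. The following is a minimal sketch, assuming a randomly generated Hermitian positive-definite matrix as a stand-in for a Fisher matrix; the names `J`, `inflation`, and `gap_eigs` are illustrative and not from the text. It compares the diagonal of the inverse with the reciprocal of the diagonal, and checks the submatrix statement for the leading 2-by-2 block.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical Fisher matrix: any Hermitian positive-definite matrix will do.
p = 4
A = rng.standard_normal((p, p)) + 1j * rng.standard_normal((p, p))
J = A.conj().T @ A + p * np.eye(p)

J_inv = np.linalg.inv(J)

# Bound on parameter i when all p parameters are unknown.
crb_with_nuisance = np.real(np.diag(J_inv))
# Bound on parameter i when it is the only unknown parameter.
crb_alone = 1.0 / np.real(np.diag(J))

# Ratio of the two bounds; Cauchy-Schwarz says every entry is >= 1.
inflation = crb_with_nuisance / crb_alone

# Submatrix version: the leading r-by-r block of J^{-1} minus the inverse
# of the leading r-by-r block of J should be positive semidefinite.
r = 2
gap = J_inv[:r, :r] - np.linalg.inv(J[:r, :r])
gap_eigs = np.linalg.eigvalsh((gap + gap.conj().T) / 2)

print("inflation factors:", inflation)
print("eigenvalues of the gap:", gap_eigs)
```

The inflation factor equals 1 only when the corresponding parameter is decoupled from the others, that is, when the off-diagonal entries in its row of `J` vanish.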
6.3.2 The Cramér–Rao bound in the proper multivariate Gaussian model
In Result 2.5, the pdf for a proper complex Gaussian random variable $\mathbf{y}\colon \Omega \to \mathbb{C}^n$ was shown to be
$$p(\mathbf{y}) = \frac{1}{\pi^n \det \mathbf{R}_{yy}} \exp\{-(\mathbf{y} - \boldsymbol{\mu}_y)^H \mathbf{R}_{yy}^{-1} (\mathbf{y} - \boldsymbol{\mu}_y)\}, \tag{6.52}$$
where $\boldsymbol{\mu}_y$ is the mean and $\mathbf{R}_{yy}$ the Hermitian covariance matrix of $\mathbf{y}$. For our purposes we assume that the mean value and the Hermitian covariance matrix are both functions of the unknown parameter vector $\boldsymbol{\theta}$, even though we have not made this dependence explicit in the notation. We shall assume that we have drawn $M$ independent copies of the random vector $\mathbf{y}$ from the pdf $p(\mathbf{y})$, so that the logarithm of the joint pdf of
6.3 Fisher score and the Cram ´er–Rao bound 165
Using the results of Appendix 2 for Wirtinger derivatives, in particular the results for differentiating logarithms and traces in Section A2.2, we may express the $j$th element of the centered measurement score $\mathbf{s}(\mathbf{Y})$ as
sj(Y) = −M ∂
It is a simple matter to show that $\partial \mathbf{S}_{yy}/\partial\theta_j^*$ has mean value zero. So to compute the Hessian term $-E[(\partial/\partial\theta_i)\,s_j(\mathbf{Y})]$ we can ignore any terms that involve a first partial derivative of $\mathbf{S}_{yy}$. The net result, after a few lines of algebra, is
JF,i j = −E
This is the general formula for the $(i,j)$th element of the Fisher information matrix in the proper multivariate Gaussian experiment that brings information about $\boldsymbol{\theta}$ in its mean and covariance. The real version of this result dates at least to Slepian (1954) and the complex version to Bangs (1971). There are a few special cases. If the covariance matrix $\mathbf{R}_{yy}$ is independent of $\boldsymbol{\theta}$, then the first term vanishes. If the mean $\boldsymbol{\mu}_y$ is independent of $\boldsymbol{\theta}^*$, then the second term vanishes, and if it is independent of $\boldsymbol{\theta}$, the third term vanishes.
6.3.3 The separable linear statistical model and the geometry of the Cramér–Rao bound
To gain geometrical insight into the Fisher matrix and the Cramér–Rao bound, we now apply the results of the previous subsection to parameter estimation in the linear model $\mathbf{y} = \mathbf{H}\boldsymbol{\theta} + \mathbf{n}$, where $\mathbf{n}$ is a zero-mean proper Gaussian vector with covariance $\mathbf{R}_{nn}$ independent of $\boldsymbol{\theta}$. We shall find that it is the sensitivity of noise-free measurements to small variations in parameters that determines the performance of an estimator.
Figure 6.4 Geometry of the Cramér–Rao bound in the separable statistical model with multivariate Gaussian errors; the variance is large when mode $\mathbf{g}_i$ lies near the subspace $\langle\mathbf{G}_i\rangle$ of other modes. (Labels in the figure: $\mathbf{g}_i$, $\mathbf{P}_{G_i}\mathbf{g}_i$, $\rho_i$, $\langle\mathbf{G}_i\rangle$.)
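The partial derivatives are $\partial\boldsymbol{\mu}_y/\partial\theta_i = \mathbf{h}_i$, where $\mathbf{h}_i$ is the $i$th column of $\mathbf{H}$, and the Fisher matrix is $\mathbf{J}_F = M\,\mathbf{H}^H\mathbf{R}_{nn}^{-1}\mathbf{H}$. The Cramér–Rao bound for frequentist-unbiased estimators is thus
$$\mathbf{Q}(\boldsymbol{\theta}) \ge \frac{1}{M}\,(\mathbf{H}^H\mathbf{R}_{nn}^{-1}\mathbf{H})^{-1}. \tag{6.56}$$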
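With the definition $\mathbf{G} = \mathbf{R}_{nn}^{-1/2}\mathbf{H}$, the $(i,i)$th element may be written as
$$Q_{ii}(\boldsymbol{\theta}) \ge \frac{1}{M}\,\frac{1}{\mathbf{g}_i^H\mathbf{g}_i}\,\frac{\mathbf{g}_i^H\mathbf{g}_i}{\mathbf{g}_i^H(\mathbf{I} - \mathbf{P}_{G_i})\mathbf{g}_i} = \frac{1}{M}\,\frac{1}{\mathbf{g}_i^H\mathbf{g}_i}\,\frac{1}{\sin^2\rho_i}, \tag{6.57}$$
where $\mathbf{P}_{G_i}$ is the projection onto the subspace $\langle\mathbf{G}_i\rangle$ spanned by all but the $i$th mode in $\mathbf{G}$, $\rho_i$ is the angle that the mode vector $\mathbf{g}_i$ makes with this subspace, and $\mathbf{g}_i^H(\mathbf{I} - \mathbf{P}_{G_i})\mathbf{g}_i/(\mathbf{g}_i^H\mathbf{g}_i)$ is the sine-squared of this angle. Thus, as illustrated in Fig. 6.4, the lower bound on the variance in estimating $\theta_i$ is a large multiple of $(M\,\mathbf{g}_i^H\mathbf{g}_i)^{-1}$ when the $i$th mode can be linearly approximated with the other modes in $\mathbf{G}_i$. For closely spaced modes, only a large number of independent samples or a large value of $\mathbf{g}_i^H\mathbf{g}_i$, producing a large output signal-to-noise ratio $M\,\mathbf{g}_i^H\mathbf{g}_i$, can produce a small lower bound. With low output signal-to-noise ratio and closely spaced modes, any estimator of $\theta_i$ will be poor, meaning that the resolution in amplitude of the $i$th mode will be poor. This result generalizes to mean-value vectors more general than $\mathbf{H}\boldsymbol{\theta}$, on replacing $\mathbf{h}_i$ with $\partial\boldsymbol{\mu}_y/\partial\theta_i$.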
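The equivalence between the matrix-inverse form (6.56) and the geometric form (6.57) can be verified numerically. The sketch below assumes a randomly drawn mode matrix and noise covariance; the sizes `n`, `p`, `M` and all variable names are illustrative choices, not values from the text. It computes the diagonal of the bound directly and then again from the mode energy and the sine-squared of the angle to the subspace of the other modes.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, M = 8, 3, 10  # hypothetical sizes: measurement dim, parameters, copies

# Random complex mode matrix H and Hermitian positive-definite noise covariance.
H = rng.standard_normal((n, p)) + 1j * rng.standard_normal((n, p))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Rnn = B @ B.conj().T + n * np.eye(n)

# Whitened modes G = Rnn^{-1/2} H, using the Hermitian inverse square root.
w, V = np.linalg.eigh(Rnn)
Rnn_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T
G = Rnn_inv_sqrt @ H

# Direct form: diagonal of (1/M) (H^H Rnn^{-1} H)^{-1}.
J = M * H.conj().T @ np.linalg.solve(Rnn, H)
crb_direct = np.real(np.diag(np.linalg.inv(J)))

# Geometric form: 1 / (M * g_i^H g_i * sin^2(rho_i)).
crb_geometric = np.empty(p)
sin2 = np.empty(p)
for i in range(p):
    g = G[:, i]
    Gi = np.delete(G, i, axis=1)  # all modes except the i-th
    # Orthogonal projection onto the span of the other modes.
    P = Gi @ np.linalg.solve(Gi.conj().T @ Gi, Gi.conj().T)
    energy = np.real(g.conj() @ g)
    sin2[i] = np.real(g.conj() @ (g - P @ g)) / energy
    crb_geometric[i] = 1.0 / (M * energy * sin2[i])

print("direct:   ", crb_direct)
print("geometric:", crb_geometric)
```

The two printed vectors should agree to numerical precision, since $\mathbf{g}_i^H(\mathbf{I} - \mathbf{P}_{G_i})\mathbf{g}_i$ is the Schur complement of $\mathbf{G}^H\mathbf{G}$ with respect to its $i$th diagonal entry.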
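Example 6.2. Let the noise covariance matrix in the proper multivariate Gaussian model be $\mathbf{R}_{nn} = \sigma^2\mathbf{I}$ and the matrix $\mathbf{H} = [A\boldsymbol{\psi}_1, A\boldsymbol{\psi}_2]$ with $\boldsymbol{\psi}_k = [1, e^{j\phi_k}, \ldots, e^{j(n-1)\phi_k}]^T$. We call $\boldsymbol{\psi}_k$ a complex exponential mode with mode angle $\phi_k$ and $A$ a complex mode amplitude. The Cramér–Rao bound is
$$Q_{ii}(\boldsymbol{\theta}) \ge \frac{1}{M}\,\frac{1}{n|A|^2/\sigma^2}\,\frac{1}{1 - l_n^2(\phi_1 - \phi_2)}, \tag{6.58}$$
where
$$0 \le l_n^2(\phi) = \frac{1}{n^2}\,\frac{\sin^2(n\phi/2)}{\sin^2(\phi/2)} \le 1 \tag{6.59}$$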
is the Lanczos kernel. In this bound, $\mathrm{snr} = |A|^2/\sigma^2$ is the per-sample or input signal-to-noise ratio, $\mathrm{SNR} = n\,\mathrm{snr}$ is the output signal-to-noise ratio, and $1 - l_n^2(\phi_1 - \phi_2)$ is the sine-squared of the angle between the subspaces $\langle\boldsymbol{\psi}_1\rangle$ and $\langle\boldsymbol{\psi}_2\rangle$. This bound is $(M\,\mathrm{SNR})^{-1}$ at $\phi_1 - \phi_2 = 2\pi/n$, and this angle difference is called the Rayleigh limit to resolution. It governs much of optics, radar, sonar, and geophysics, even though it is quite conservative in the sense that many independent samples or large input SNR can override the aperture effects of the Lanczos kernel.²
This example bounds the error covariance matrix for estimating the linear parameters $\boldsymbol{\theta}$ in the separable linear model, not the nonlinear parameters that would determine the matrix $\mathbf{H}(\boldsymbol{\theta})$. Typically these parameters would be frequency, wavenumber, delay, and so on. The Cramér–Rao bound for these parameters then depends on terms like $\partial\mathbf{h}_i/\partial\theta_j^*$.³
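The bound of Example 6.2 can be evaluated with a few lines of code. This is a sketch under assumed values: the function names `lanczos_sq` and `crb_amplitude`, the choices `n = 16`, `M = 1`, `snr = 10`, and the base mode angle `0.3` are all illustrative and not from the text. The last part cross-checks (6.58) against the direct matrix inverse in (6.56) for two closely spaced modes.

```python
import numpy as np

def lanczos_sq(phi, n):
    """Squared Lanczos kernel l_n^2(phi) of (6.59); equals 1 where sin(phi/2) = 0."""
    phi = np.asarray(phi, dtype=float)
    den = n * np.sin(phi / 2)
    small = np.abs(den) < 1e-12
    safe = np.where(small, 1.0, den)  # avoid dividing by zero
    ratio = np.sin(n * phi / 2) / safe
    return np.where(small, 1.0, ratio ** 2)

def crb_amplitude(dphi, n, M, snr):
    """Right-hand side of (6.58) for mode-angle difference dphi."""
    return 1.0 / (M * n * snr * (1.0 - lanczos_sq(dphi, n)))

n, M, snr = 16, 1, 10.0           # assumed example values
rayleigh = 2 * np.pi / n          # Rayleigh limit to resolution

bound_at_rayleigh = crb_amplitude(rayleigh, n, M, snr)
bound_at_quarter = crb_amplitude(rayleigh / 4, n, M, snr)

# Cross-check against (1/M) (H^H Rnn^{-1} H)^{-1} with Rnn = sigma^2 I.
sigma2, A = 1.0, np.sqrt(snr)     # so that |A|^2 / sigma^2 = snr
k = np.arange(n)
dphi = rayleigh / 4
H = A * np.stack([np.exp(1j * k * 0.3), np.exp(1j * k * (0.3 + dphi))], axis=1)
direct = np.real(np.diag(np.linalg.inv(M * H.conj().T @ H / sigma2)))

print("bound at Rayleigh spacing:        ", bound_at_rayleigh)
print("bound at a quarter of that spacing:", bound_at_quarter)
print("direct inverse, quarter spacing:   ", direct)
```

At the Rayleigh spacing the kernel vanishes and the bound reduces to $(M\,\mathrm{SNR})^{-1}$; at closer spacings the factor $1/(1 - l_n^2)$ inflates it.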
6.3.4 Extension of Fisher score and the Cramér–Rao bound to improper errors and scores
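In order to extend the Cramér–Rao bound to improper errors and scores, we need only compute the complementary expansion-coefficient matrix $\tilde{\mathbf{T}}(\boldsymbol{\theta}) = E[\mathbf{s}(\mathbf{y})\mathbf{e}^T(\mathbf{y})]$ and the complementary Fisher information matrix $\tilde{\mathbf{J}}(\boldsymbol{\theta}) = E[\mathbf{s}(\mathbf{y})\mathbf{s}^T(\mathbf{y})]$. To this end, consider the following conjugate partial derivative of the bias: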
∂
frequentist-unbiased estimators, the complementary expansion-coefficient matrix is zero, and the augmented expansion-coefficient matrix is $\underline{\mathbf{T}}(\boldsymbol{\theta}) = \mathbf{I}$.
For the complementary Fisher information, consider the $p \times p$ Hessian
∂
Taking expectations, we find the identity
$\tilde{\mathbf{J}}_F(\boldsymbol{\theta}) = E[\mathbf{s}(\mathbf{y})\mathbf{s}^T(\mathbf{y})] = -E$
which is the complementary dual to (6.45). The augmented Cramér–Rao bound for improper error and measurement scores is the bound of Result 6.3, applied to the Fisher score.
Result 6.5. For frequentist-unbiased estimators, the widely linear Cramér–Rao bound