We now turn our attention to the inverse problem of estimating $x_{\mathrm{true}}$ from the noisy data $y$. As mentioned in Section 5.2.1, since $\delta < 1$, the problem is underdetermined. Let us consider for a moment the noise-free case with $e = 0$. Then $y = Ax_{\mathrm{true}}$. Since $M < N$, the matrix $A$ has a nontrivial nullspace defined as $\mathrm{null}(A) = \{x : Ax = 0\}$. Notice that for every $x$ in the nullspace of $A$, we have $A(x + x_{\mathrm{true}}) = 0 + y = y$. Thus, we cannot hope to identify $x_{\mathrm{true}}$ uniquely from $y$ given no other information, since there are infinitely many signals that will result in the same set of collected data.
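The nullspace argument above can be checked numerically. The following is a minimal sketch, assuming a small real-valued Gaussian matrix; the sizes, the seed, and the names `Z` and `x_alt` are illustrative choices, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 5, 12                         # fewer measurements than unknowns (M < N)
A = rng.standard_normal((M, N))      # hypothetical measurement matrix
x_true = rng.standard_normal(N)
y = A @ x_true                       # noise-free data, e = 0

# The last N - M right singular vectors span null(A) when A has full row rank.
_, _, Vh = np.linalg.svd(A)
Z = Vh[M:].conj().T                  # shape (N, N - M)

# Adding any nullspace vector gives a different signal with identical data.
x_alt = x_true + Z @ rng.standard_normal(N - M)

print(Z.shape[1])                    # dimension of the nullspace
print(np.allclose(A @ Z, 0.0))       # columns of Z are annihilated by A
print(np.allclose(A @ x_alt, y))     # x_alt explains the data as well as x_true
```

Any choice of coefficients on the columns of `Z` yields another signal consistent with `y`, which is the non-uniqueness the paragraph describes.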
In order to obtain a unique solution to this problem, we need to regularize the problem by adding more assumptions about the unknown signal $x_{\mathrm{true}}$. One common approach is to find the solution with the minimum energy, or $\ell_2$ norm, in which case we would estimate the signal as
$$\hat{x} = \arg\min_{x} \|x\|_2 \quad \text{subject to} \quad Ax = y$$
This formulation yields a unique solution which can be computed in closed form as $\hat{x} = A^{+}y$, where $A^{+}$ is the Moore-Penrose pseudo-inverse¹⁰ of $A$ [20]. While a unique solution is obtained, this low-energy assumption may not be reasonable for some problems of interest. To illustrate this point, consider a simple example with $N = 256$, $M = 75$, and $s = 15$. For reasons that will become clear later, the matrix $A$ contains entries drawn randomly from the Gaussian distribution $\mathcal{N}(0, 1)$. Figure 5-4(a) shows the true signal and the minimum-energy reconstruction. The $\ell_2$ norm penalizes large values, so the resulting estimate contains many small coefficients. Put another way, among the infinite number of possible solutions, the $\ell_2$ regularization favors solutions that combine many atoms with small coefficients over solutions that use a few atoms with large coefficients.
Unfortunately, the signal of interest has precisely the structure that is penalized: a small number of large coefficients.
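The behavior described above can be reproduced with a short numerical sketch. It uses the sizes quoted in the text ($N = 256$, $M = 75$, $s = 15$); the seed, the amplitude distribution of the nonzero coefficients, and the threshold `1e-6` are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, s = 256, 75, 15
A = rng.standard_normal((M, N))              # entries drawn from N(0, 1)

# Sparse test signal: s nonzero entries with magnitudes in [1, 2) (assumed).
x_true = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
y = A @ x_true                               # noise-free data

# Minimum-energy estimate via the Moore-Penrose pseudo-inverse.
x_l2 = np.linalg.pinv(A) @ y

print(np.allclose(A @ x_l2, y))                           # it explains the data
print(np.linalg.norm(x_l2) <= np.linalg.norm(x_true))     # with no more energy
print(np.count_nonzero(np.abs(x_l2) > 1e-6), "of", N)     # but it is not sparse
```

The estimate fits the data exactly and has no more energy than the true signal, yet it spreads that energy over far more than $s$ coefficients.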
Our original motivation for CS techniques was to exploit signals that could be expressed with just a few nonzero coefficients. As such, it seems that we should look for the solution $\hat{x}$ with the smallest $\ell_0$ norm. So, we might try to solve the problem
$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad Ax = y$$
Unfortunately, the $\ell_0$ norm is not differentiable or even continuous, and this problem is in fact NP-hard to solve. Put another way, in general we would need to test every possible combination of active coefficients to verify that we have found the one with the smallest $\ell_0$ norm that allows the signal $y$ to be represented. This combinatorial complexity is intractable for problems of interesting size, where $x$ may contain thousands or even millions of unknowns. As a result, we must explore practical methods for performing sparse reconstruction (SR), which is the estimation of an unknown signal that is assumed to be sparse.
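To make the combinatorial cost concrete, the following sketch performs the exhaustive search on a deliberately tiny problem. The function name `l0_brute_force`, the tolerance, and the problem sizes are hypothetical; this is an illustration of the search described above, not a practical SR method.

```python
import itertools
import math
import numpy as np

def l0_brute_force(A, y, tol=1e-9):
    """Return the sparsest x with A @ x = y by trying supports of growing size."""
    M, N = A.shape
    if np.linalg.norm(y) <= tol:
        return np.zeros(N)
    for k in range(1, N + 1):
        for support in itertools.combinations(range(N), k):
            cols = list(support)
            # Least-squares fit restricted to the candidate support.
            coef, *_ = np.linalg.lstsq(A[:, cols], y, rcond=None)
            if np.linalg.norm(A[:, cols] @ coef - y) <= tol:
                x = np.zeros(N)
                x[cols] = coef
                return x
    return None

rng = np.random.default_rng(2)
M, N = 6, 12                                 # tiny on purpose
A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[3, 8]] = [1.5, -2.0]                 # 2-sparse signal
x_hat = l0_brute_force(A, A @ x_true)

print(np.allclose(x_hat, x_true))
print(math.comb(256, 15))                    # size-15 supports when N = 256
```

Even restricting attention to supports of exactly 15 elements, the count printed on the last line for $N = 256$ shows why the exhaustive approach does not scale.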
¹⁰ Given the SVD as $A = U\Sigma V^H$, the Moore-Penrose pseudo-inverse is calculated as $A^{+} = V\Sigma^{+}U^H$. Since $\Sigma$ is a diagonal matrix of singular values, $\Sigma^{+}$ is obtained by simply transposing the matrix and then taking the reciprocals of the nonzero diagonal elements. When $A$ has full column rank, the pseudo-inverse can be computed with the perhaps more familiar expression $A^{+} = (A^H A)^{-1}A^H$.
Melvin-5220033 book ISBN : 9781891121531 September 14, 2012 17:41 159
Figure 5-4 (caption fragment): ... entries of $A$ are generated randomly from a Gaussian distribution and then scaled to yield unit-norm columns. The true signal is shown with circles, while the estimate is shown with crosses. Pane (a) shows the minimum $\ell_2$ reconstruction, while pane (b) shows the minimum $\ell_1$ reconstruction. A very similar example was shown in [21]. The reconstructions were computed using the CVX software package [22].
5.2.4 $\ell_1$ Regularization
In Section 5.3 we will explore numerous algorithms for SR. Here, we will explore a problem formulation that motivates many of these algorithms. Let us return to the noisy data case, where $e \neq 0$, given in (5.1). In this setting, we would like to find the solution $\hat{x}$ given by
$$\hat{x} = \arg\min_{x} \|x\|_0 \quad \text{subject to} \quad \|Ax - y\|_2 \le \sigma \tag{5.11}$$
However, this problem is once again NP-hard and effectively impossible to solve for problems of interest in radar signal processing. As mentioned earlier, the issue is that the $\ell_0$ norm is not amenable to optimization. Figure 5-1 provides an intuitive alternative: we can replace the intractable $\ell_0$ norm with a similar norm for which optimization is simpler. We have already seen that the $\ell_2$ norm provides one possibility, but the resulting solutions tend to be nonsparse. Instead, we will consider the convex relaxation [23] of (5.11) using the $\ell_1$ norm:
$$\hat{x}_\sigma = \arg\min_{x} \|x\|_1 \quad \text{subject to} \quad \|Ax - y\|_2 \le \sigma \tag{5.12}$$
We will refer to this convex optimization problem as Basis Pursuit De-Noising (BPDN).¹¹ By virtue of being a convex cost function with a convex constraint, the problem described in (5.12) does not suffer from local minima, and a variety of mature techniques exist for solving the problem in polynomial time [24]. Figure 5-4(b) shows the reconstruction of our simple example signal with an $\ell_1$ penalty. In this noise-free case, the signal is reconstructed perfectly using the $\ell_1$-based cost function. Notice that this optimization problem has an obvious parameter, $\sigma$, that could be varied to obtain different solutions. We will explore this idea in depth in Section 5.3.1.1.
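As an illustration of the noise-free case ($\sigma = 0$) of (5.12), the sketch below recasts the $\ell_1$ problem as a linear program and solves it with SciPy. This is not the CVX setup used for Figure 5-4: the split $x = u - v$ with $u, v \ge 0$ is a standard reformulation that assumes real-valued data, and the function name `basis_pursuit`, the seed, and the coefficient amplitudes are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to A x = y (real data) as a linear program."""
    M, N = A.shape
    c = np.ones(2 * N)                   # sum(u) + sum(v) equals ||x||_1 at the optimum
    A_eq = np.hstack([A, -A])            # equality constraint A (u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:N], res.x[N:]
    return u - v

rng = np.random.default_rng(3)
N, M, s = 256, 75, 15
A = rng.standard_normal((M, N))
A /= np.linalg.norm(A, axis=0)           # scale to unit-norm columns

x_true = np.zeros(N)
support = rng.choice(N, size=s, replace=False)
x_true[support] = rng.choice([-1.0, 1.0], size=s) * (1.0 + rng.random(s))
y = A @ x_true                           # noise-free data

x_l1 = basis_pursuit(A, y)
print(np.max(np.abs(x_l1 - x_true)))     # close to zero when recovery succeeds
```

Handling $\sigma > 0$ requires a second-order cone solver rather than a linear program, so this sketch covers only the equality-constrained special case.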
¹¹ See Section 5.3.1.1 for details on our naming convention.
CHAPTER 5 Radar Applications of Sparse Reconstruction
Regularization using the $\ell_1$ norm has a long history; see, for example, [25]. We shall discuss several formulations of the problem described in (5.12) and algorithms for solving it in Section 5.3.1. When the problem is solved with an $\ell_2$ penalty in place of the $\ell_1$ norm, the result is termed Tikhonov regularization [26],¹² which is known in the statistics community as ridge regression [28]. This formulation has the advantage of offering a simple, closed-form solution that can be implemented robustly with an SVD [20]. Unfortunately, as in the noise-free case, this approach does not promote sparsity in the resulting solutions.
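The closed-form SVD solution mentioned above can be sketched as follows. The sketch assumes the unconstrained (penalized) form $\min_x \|Ax - y\|_2^2 + \alpha\|x\|_2^2$ rather than the constrained form of (5.12); the weight `alpha`, the function name `tikhonov_svd`, and the problem sizes are illustrative.

```python
import numpy as np

def tikhonov_svd(A, y, alpha):
    """Minimize ||A x - y||_2^2 + alpha * ||x||_2^2 using the SVD of A."""
    U, sv, Vh = np.linalg.svd(A, full_matrices=False)
    filt = sv / (sv**2 + alpha)          # damped reciprocals of the singular values
    return Vh.conj().T @ (filt * (U.conj().T @ y))

rng = np.random.default_rng(4)
M, N = 20, 50
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
alpha = 0.1                              # hypothetical regularization weight

x_svd = tikhonov_svd(A, y, alpha)

# Cross-check against the regularized normal equations (A^H A + alpha I) x = A^H y.
x_ne = np.linalg.solve(A.conj().T @ A + alpha * np.eye(N), A.conj().T @ y)
print(np.allclose(x_svd, x_ne))
```

The factor `sv / (sv**2 + alpha)` replaces the plain reciprocal used by the pseudo-inverse, so small singular values are damped instead of amplified. That damping is the source of the method's robustness.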
We mention Tikhonov regularization because it has a well-known Bayesian interpretation using Gaussian priors. It turns out that the $\ell_1$-penalized reconstruction can also be derived using a Bayesian approach.
To cast the estimation of $x_{\mathrm{true}}$ in a Bayesian framework, we must adopt priors on the signal and disturbance. First, we will adopt a Laplacian prior¹³ on the unknown signal $x_{\mathrm{true}}$ and assume that the noise $e$ is circular Gaussian with known covariance $\Sigma$, that is,
$$e \sim \mathcal{CN}(0, \Sigma)$$
where the normalization constant on $p(x_{\mathrm{true}})$ is omitted for simplicity. Given no other information, we could set $\Sigma = I$, but we will keep the generality. We can then find the MAP estimate easily as
where $\|x\|_\Sigma^2 = x^H \Sigma^{-1} x$. The resulting optimization problem is precisely what we would expect given the colored Gaussian noise prior. Since $\Sigma$ is a covariance matrix, and hence positive definite and symmetric, the problem is convex and solvable with a variety of techniques. In fact, we can factor the inverse of the covariance using the Cholesky decomposition as $\Sigma^{-1} = R^H R$ to obtain
¹² An account of the early history of Tikhonov regularization, dating to 1955, is given in [27].
¹³ Recent analysis has shown that, while the Laplacian prior leads to several standard reconstruction algorithms, random draws from this distribution are not compressible. Other priors leading to the same $\ell_1$ penalty term but yielding compressible realizations have been investigated. See [29] for details.
5.2 CS Theory 161
where $\bar{A} = RA$ and $\bar{y} = Ry$. This problem is equivalent to (5.12) when $\lambda$ is chosen correctly, as detailed in Section 5.3.1.1. Readers familiar with adaptive processing will recognize the application of $R$ as a pre-whitening step. Indeed, this processing is the $\ell_1$ version of the typical pre-whitening followed by matched filtering operation used in, for example, STAP [15,30].
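The pre-whitening step can be verified numerically: with $\Sigma^{-1} = R^H R$, the weighted data misfit $(Ax - y)^H \Sigma^{-1} (Ax - y)$ equals the ordinary squared $\ell_2$ misfit of the whitened quantities $\bar{A} = RA$ and $\bar{y} = Ry$. The following is a minimal sketch; the covariance `Sigma` is a made-up Hermitian positive definite matrix, and the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)
M, N = 8, 20
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)
x = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # arbitrary candidate

# Hypothetical colored-noise covariance, Hermitian positive definite by construction.
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
Sigma = G @ G.conj().T + M * np.eye(M)

# numpy returns a lower-triangular L with Sigma^{-1} = L L^H, so R = L^H
# satisfies Sigma^{-1} = R^H R.
L = np.linalg.cholesky(np.linalg.inv(Sigma))
R = L.conj().T

A_bar, y_bar = R @ A, R @ y              # whitened matrix and data
r = A @ x - y
weighted = np.real(r.conj() @ np.linalg.solve(Sigma, r))   # r^H Sigma^{-1} r
whitened = np.linalg.norm(A_bar @ x - y_bar) ** 2

print(np.isclose(weighted, whitened))
```

Because the identity holds for every candidate `x`, any solver written for white noise can be applied to the colored-noise problem after substituting `A_bar` and `y_bar`.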
Returning to the geometric interpretation of the problem, examination of Figure 5-1 provides an intuitive geometric reason that the $\ell_1$ norm is effective for obtaining sparse solutions. In particular, sparse solutions contain numerous zero values and thus lie on the coordinate axes in several of their dimensions. Since the $\ell_1$ unit ball is "spiky" (i.e., more pointed along the coordinate axes than the rounded $\ell_2$ ball), a potential solution $x$ with zero entries will tend to have a smaller $\ell_1$ norm than a non-sparse solution. We could of course consider $p < 1$ to obtain ever more "spiky" unit balls, as is considered in [31].
Using $p < 1$ allows sparse signals to be reconstructed from fewer measurements than $p = 1$, but at the expense of solving a non-convex optimization problem that could feature local minima.
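A small numerical comparison illustrates the "spikiness" argument. The sketch below evaluates $(\sum_i |x_i|^p)^{1/p}$ for a 1-sparse vector and a fully dense vector with the same energy; the helper name `lp_measure` and the dimension $N = 16$ are arbitrary choices.

```python
import numpy as np

def lp_measure(x, p):
    """(sum |x_i|^p)^(1/p): a norm for p >= 1, only a quasi-norm for 0 < p < 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

N = 16
sparse = np.zeros(N)
sparse[0] = 1.0                          # all energy in one coefficient
dense = np.full(N, 1.0 / np.sqrt(N))     # same energy spread over every coefficient

for p in (2.0, 1.0, 0.5):
    print(p, lp_measure(sparse, p), lp_measure(dense, p))
```

Both vectors have unit $\ell_2$ norm, so $p = 2$ cannot distinguish them. For the dense vector the measure equals $N^{1/p - 1/2}$, which grows to 4 at $p = 1$ and 64 at $p = 0.5$, while the sparse vector stays at 1. Smaller $p$ therefore penalizes non-sparse candidates more heavily.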
This geometric intuition can be formalized using so-called tube and cone constraints as described in, for example, [1]. Using the authors’ terminology, the tube constraint follows from the inequality constraint in the optimization problem represented in equation (5.12):
$$\|A(x_{\mathrm{true}} - \hat{x}_\sigma)\|_2 \le \|Ax_{\mathrm{true}} - y\|_2 + \|A\hat{x}_\sigma - y\|_2 \le 2\sigma$$
The first inequality is an application of the triangle inequality satisfied by any norm, and the second follows from the assumed bound on the noise, $\|e\|_2 \le \sigma$, and the form of (5.12). Simply put, any vector $x$ that satisfies $\|Ax - y\|_2 \le \sigma$ must lie in a cylinder centered around $Ax_{\mathrm{true}}$. When we solve the optimization problem represented in equation (5.12), we choose the solution inside this cylinder with the smallest $\ell_1$ norm.
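The tube constraint can be sanity-checked by sampling points that satisfy the constraint in (5.12) and measuring how far their images under $A$ lie from $Ax_{\mathrm{true}}$. This is a minimal sketch with made-up sizes and noise level; feasible points are built with the pseudo-inverse purely for convenience, which assumes $A$ has full row rank.

```python
import numpy as np

rng = np.random.default_rng(6)
M, N, sigma = 10, 30, 0.5
A = rng.standard_normal((M, N))
x_true = np.zeros(N)
x_true[[2, 11, 17]] = [1.0, -2.0, 1.5]

e = rng.standard_normal(M)
e *= sigma * rng.random() / np.linalg.norm(e)    # noise with ||e||_2 <= sigma
y = A @ x_true + e

A_pinv = np.linalg.pinv(A)           # A @ A_pinv = I because A has full row rank
worst = 0.0
for _ in range(1000):
    d = rng.standard_normal(M)
    d *= sigma * rng.random() / np.linalg.norm(d)    # residual with ||d||_2 <= sigma
    x = A_pinv @ (y + d)                             # feasible point: A x - y = d
    assert np.linalg.norm(A @ x - y) <= sigma + 1e-9
    worst = max(worst, np.linalg.norm(A @ (x_true - x)))

print(worst <= 2 * sigma)            # no feasible point violates the tube bound
```

The check exercises only the tube constraint. It says nothing about which feasible point has the smallest $\ell_1$ norm, which is the role of the cone constraint.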
Since $\hat{x}_\sigma$ is a solution to the convex problem described in (5.12), and thus a global minimum, and since $x_{\mathrm{true}}$ is itself feasible for (5.12) whenever $\|e\|_2 \le \sigma$, we obtain the cone constraint¹⁴ $\|\hat{x}_\sigma\|_1 \le \|x_{\mathrm{true}}\|_1$. Thus, the solution to (5.12) must lie inside the smallest $\ell_1$ ball that contains $x_{\mathrm{true}}$. Since this $\ell_1$ ball is "spiky", our hope is that its intersection with the cylinder defined by the tube constraint is small, yielding an accurate estimate of the sparse signal $x_{\mathrm{true}}$. These ideas are illustrated in two dimensions in Figure 5-5. The authors of [1] go on to prove just such a result, a performance guarantee for CS. However, sparsity of the true signal is not enough by itself to provide this guarantee.
We will need to make additional assumptions on the matrix A.