
8. CHAPTER VIII. PRINCIPLES OF ADAM SMITH

8.4. Principle of economy

First, we will discuss several equivalent formulations of (5.12). We will adopt the nomenclature and terminology used in [63]. In this framework, the optimization problem solved in (5.12) is referred to as basis pursuit de-noising (BPDN), or BP_σ. This problem solved in the noise-free setting with $\sigma = 0$ is called simply basis pursuit (BP), and its solution is denoted $\hat{x}_{\mathrm{BP}}$. The theory of Lagrange multipliers indicates that we can solve an unconstrained problem that will yield the same solution, provided that the Lagrange multiplier is selected correctly. We will refer to this unconstrained problem as the $\ell_1$-penalized quadratic program and denote it as QP_λ. Similarly, we can solve a constrained optimization problem, but with the constraint placed on the $\ell_1$ norm of the unknown vector instead of the $\ell_2$ norm of the reconstruction error, to obtain yet a third equivalent problem. We will use the name LASSO [64], popular in the statistics community, interchangeably with the notation LS_τ for this problem. The three equivalent optimization problems can be written as

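$$(\mathrm{BP}_\sigma)\quad \hat{x}_\sigma = \operatorname*{argmin}_x \|x\|_1 \ \text{subject to}\ \|Ax - y\|_2 \le \sigma \tag{5.18}$$

$$(\mathrm{QP}_\lambda)\quad \hat{x}_\lambda = \operatorname*{argmin}_x \lambda\|x\|_1 + \|Ax - y\|_2^2 \tag{5.19}$$

$$(\mathrm{LS}_\tau)\quad \hat{x}_\tau = \operatorname*{argmin}_x \|Ax - y\|_2 \ \text{subject to}\ \|x\|_1 \le \tau \tag{5.20}$$

We note that a fourth problem formulation known as the Dantzig selector also appears in the literature [65] and can be expressed as

$$(\mathrm{DS}_\zeta)\quad \hat{x}_\zeta = \operatorname*{argmin}_x \|x\|_1 \ \text{subject to}\ \|A^H(Ax - y)\|_\infty \le \zeta \tag{5.21}$$

but this problem does not yield the same set of solutions as the other three. For a treatment of the relationship between DS_ζ and the other problems, see [41].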

The first three problems are all different ways of arriving at the same set of solutions.

Melvin-5220033 book ISBN : 9781891121531 September 14, 2012 17:41 169

5.3 SR Algorithms 169

To be explicit, the solution to any one of these problems is characterized by a triplet of values $(\sigma, \lambda, \tau)$ which renders $\hat{x}_\sigma = \hat{x}_\lambda = \hat{x}_\tau$. Unfortunately, it is very difficult to map the value of one parameter into the values for the other two. However, once a solution to one problem is available, we can calculate (to at least some accuracy) the parameters for the other two solutions.²²
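The mapping from a computed solution to the remaining parameters can be illustrated numerically. The following is a minimal sketch, not the method of [63]: it assumes a small random real-valued test problem, and it solves QP_λ with a plain iterative shrinkage-thresholding (ISTA) loop chosen here only for brevity. The names `soft_threshold` and `ista_qp`, the problem sizes, and the value of `lam` are all illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink magnitudes by t; works for real or complex entries."""
    mag = np.abs(v)
    scale = np.maximum(mag - t, 0.0) / np.where(mag > 0, mag, 1.0)
    return v * scale

def ista_qp(A, y, lam, n_iter=3000):
    """Plain ISTA for min_x lam*||x||_1 + ||A x - y||_2^2 (no 1/2 factor)."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1 / Lipschitz constant
    x = np.zeros(A.shape[1], dtype=np.result_type(A, y))
    for _ in range(n_iter):
        grad = 2.0 * (A.conj().T @ (A @ x - y))
        x = soft_threshold(x - step * grad, step * lam)
    return x

# Hypothetical test problem: 3-sparse vector, 40 noisy measurements.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.05 * rng.standard_normal(40)

lam = 0.5
x_hat = ista_qp(A, y, lam)
sigma = np.linalg.norm(A @ x_hat - y)   # BP_sigma parameter at this solution
tau = np.linalg.norm(x_hat, 1)          # LS_tau parameter at this solution
print(f"lambda = {lam}, sigma = {sigma:.4f}, tau = {tau:.4f}")
```

Reading $\sigma$ and $\tau$ off a QP_λ solution only requires two norms, which is why this direction of the mapping is comparatively reliable.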

First, notice that only a certain range of parameters makes sense. Consider solving BP_σ with $\sigma = \|y\|_2$. The solution to this problem is obviously $\hat{x}_\sigma = 0$. Any larger value of $\sigma$ will yield the same solution. Similarly, imagine solving LS_τ with $\tau = \|\hat{x}_{\mathrm{BP}}\|_1$. (Recall that $\hat{x}_{\mathrm{BP}}$ is the solution to BP_σ with $\sigma = 0$.) In other words, this is the minimum $\ell_1$ solution such that $A\hat{x}_{\mathrm{BP}} = y$. Any larger value of $\tau$ will produce the same solution. Thus, the solution with $x = 0$ corresponds to a large value of $\lambda$, while the solution $\hat{x}_{\mathrm{BP}}$ corresponds to the limit of the solution to QP_λ as $\lambda$ approaches zero. Values outside this range will not alter the resulting solution.
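The "large value of $\lambda$" endpoint can be made concrete. As a hedged sketch: from the subgradient optimality condition of (5.19), $x = 0$ is a minimizer exactly when $\lambda \ge 2\|A^H y\|_\infty$, so that value plays the role for QP_λ that $\|y\|_2$ plays for BP_σ. The code below checks this on an assumed random real-valued problem by verifying that $x = 0$ is a fixed point of one proximal-gradient step (for a convex problem, such a fixed point is a minimizer). The helper `prox_grad_step` and the problem sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((30, 60)) / np.sqrt(30)
y = rng.standard_normal(30)

sigma_max = np.linalg.norm(y)                     # BP_sigma returns 0 for sigma >= this
lam_max = 2.0 * np.linalg.norm(A.T @ y, np.inf)   # QP_lambda returns 0 for lambda >= this

def prox_grad_step(x, lam, step):
    """One proximal-gradient step on lam*||x||_1 + ||A x - y||_2^2 (real data)."""
    z = x - step * 2.0 * (A.T @ (A @ x - y))
    return np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
x0 = np.zeros(60)

# At lam_max the zero vector does not move; at half that value it does.
stays_zero = np.allclose(prox_grad_step(x0, lam_max, step), 0.0)
leaves_zero = np.any(prox_grad_step(x0, 0.5 * lam_max, step) != 0.0)
print(sigma_max, lam_max, stays_zero, leaves_zero)
```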

The fact that the BP solution is the limit of the solution to QP_λ is important. The algorithms that solve the unconstrained problem cannot be used to precisely compute BP solutions. Algorithms that solve QP_λ exhibit a fundamental deficiency in solving BP, as can be seen by their phase transition. See [66] for results on this issue. Notice that this problem does not arise when dealing with noisy data and solving the problem for $\sigma > 0$, as the corresponding positive $\lambda$ then exists. We will emphasize recovery from noisy data throughout this chapter. In contrast, much of the CS literature centers around solving the noise-free BP problem. From a coding or compression standpoint, this makes a great deal of sense. This distinction, $\sigma > 0$ vs. $\sigma = 0$, colors our discussion, since algorithms that work beautifully for BPDN may work poorly for BP and vice versa. Indeed, an example would be the approximate message passing (AMP) algorithm [66], whose development was at least partially motivated by the inability of algorithms like the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) to solve the BP problem exactly.

We can create a plot of $\|A\hat{x} - y\|_2$ versus $\|\hat{x}\|_1$ which is parametrized by $\lambda$ (or by $\tau$ or $\sigma$) to obtain what is known as the Pareto frontier for our problem of interest. We will denote the Pareto frontier as $\phi(\tau)$. This curve represents the minimum $\ell_2$ error that can be achieved for a given $\ell_1$ bound on the solution norm. Pairs above this curve are suboptimal, and pairs below the curve are unattainable. It turns out that this curve is convex. Furthermore, for a given point on the curve, the three parameters associated with the corresponding solution $\hat{x}$ are given by $\phi(\tau) = \sigma = \|A\hat{x} - y\|_2$, $\tau = \|\hat{x}\|_1$, and $\lambda$ is related to the slope of the Pareto curve at that point [63]. In particular, the slope of the Pareto curve can be calculated explicitly from the solution $\hat{x}$ at that point as

$$\phi'(\tau) = -\left\| \frac{A^H r}{\|r\|_2} \right\|_\infty$$

where $r = y - A\hat{x}_\tau$ [63]. This expression is closely related to $\lambda$, which is given by $\lambda = 2\|A^H r\|_\infty$, as shown in [67].²³ These results are proven and discussed in detail in [63].
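The relationship between $\lambda$ and the slope can be written out in a few lines. This is a sketch under stated assumptions, not a reproduction of the proofs in [63] or [67]: it assumes $\hat{x}_\lambda \neq 0$, uses the standard subgradient optimality condition for (5.19), and takes the slope formula $\phi'(\tau) = -\|A^H r\|_\infty / \|r\|_2$ as given.

```latex
\begin{align*}
0 &\in \lambda\,\partial\|\hat{x}_\lambda\|_1 - 2A^H\!\left(y - A\hat{x}_\lambda\right)
  && \text{(optimality condition for (5.19))} \\
2A^H r &\in \lambda\,\partial\|\hat{x}_\lambda\|_1
  && \text{with } r = y - A\hat{x}_\lambda \\
\|2A^H r\|_\infty &= \lambda
  && \text{(subgradient entries have magnitude} \le 1\text{, with equality on the support)} \\
\phi'(\tau) &= -\frac{\|A^H r\|_\infty}{\|r\|_2} = -\frac{\lambda}{2\sigma}
  && \text{since } \sigma = \|r\|_2
\end{align*}
```

Under these assumptions, $\lambda = -2\sigma\,\phi'(\tau)$: the penalty weight is the magnitude of the Pareto slope scaled by twice the residual norm, and the factor of 2 is the same one that appears because (5.19) omits the 1/2.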

Thus, much like the L-curve [68,69] that may be familiar from Tikhonov regularization, the parameter $\lambda$ can be viewed as a setting which allows a tradeoff between a family of Pareto optimal solutions. An example Pareto frontier plot is shown in Figure 5-6. In the figure, we have labeled the values of the end points already discussed.

²² A good discussion of the numerical issues in moving between the parameters is provided in [61]. In a nutshell, determining $\lambda$ from the solution to one of the constrained problems is fairly difficult. The other mappings are somewhat more reliable.

²³ Note that the factor of 2 stems from the choice to not include a 1/2 in the definition of $\hat{x}_\lambda$ in (5.19).


170 C H A P T E R 5 Radar Applications of Sparse Reconstruction

FIGURE 5-6 An example of the Pareto frontier for the linear model. Points above the curve are suboptimal for any choice of the parameters, and those below the curve represent an unattainable combination of the two cost function terms. At a given point on the curve, $\sigma = \|A\hat{x} - y\|_2$, $\tau = \|\hat{x}\|_1$, and $\lambda$ is related to the slope of the curve. The example was generated using the SPGL1 software [63].

If $A$ is orthogonal, then we can approximately map $\lambda = \sigma\sqrt{2 \log N}$ [63,70]. Otherwise, it is very difficult to determine the value of one parameter given another without first solving the problem, as discussed at length in [61]. This is significant, because the parameter is often easier to choose based on physical considerations for the constrained problems, particularly BP_σ, but the constrained problems are generally harder to solve. As a result, many algorithms solve the unconstrained problem and accept the penalty of more difficult parameter selection. As already mentioned, this issue can be somewhat alleviated by solving the problem for a series of parameter values using a warm-starting or continuation approach. As we shall see in Section 5.4, the unconstrained problem is also beneficial in that we can tack on additional penalty terms to enforce various solution properties and still obtain relatively simple algorithms. Indeed, this addition of multiple penalty terms in a somewhat ad hoc, albeit effective, manner is common in practice [71–73].
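The warm-starting or continuation idea can be sketched as follows. This is an illustrative example under assumptions, not a procedure taken from the cited references: it solves QP_λ with a basic ISTA loop on a small random real-valued problem, sweeps a geometric schedule of decreasing $\lambda$ values (the factor of 1/2 is arbitrary), and reuses each solution as the starting point for the next. The recorded pairs $(\tau, \sigma)$ sample the Pareto frontier.

```python
import numpy as np

def ista_qp(A, y, lam, x0, n_iter=1000):
    """ISTA for lam*||x||_1 + ||A x - y||_2^2 (real data), started from x0."""
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)
    x = x0.copy()
    for _ in range(n_iter):
        z = x - step * 2.0 * (A.T @ (A @ x - y))
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

# Hypothetical test problem.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.05 * rng.standard_normal(40)

lam_max = 2.0 * np.linalg.norm(A.T @ y, np.inf)   # at or above this, the solution is 0
lambdas = lam_max * 0.5 ** np.arange(1, 6)         # assumed decreasing schedule

x = np.zeros(100)
taus, sigmas = [], []
for lam in lambdas:
    x = ista_qp(A, y, lam, x)                      # warm start from previous solution
    taus.append(np.linalg.norm(x, 1))
    sigmas.append(np.linalg.norm(A @ x - y))

for lam, t, s in zip(lambdas, taus, sigmas):
    print(f"lambda = {lam:.4f}  tau = {t:.4f}  sigma = {s:.4f}")
```

As $\lambda$ decreases, the $\ell_1$ norm of the solution grows and the residual shrinks, so a user who knows the desired $\sigma$ can stop the sweep once the residual reaches it.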

Nonetheless, understanding the Pareto frontier and the relationships between the various forms of the optimization problem is highly instructive in interpreting the results. In addition, this explicit mapping between the three problems forms the foundation of the first algorithm discussed in the following section.

5.3.1.2 Solvers

In the last several years, a plethora of solvers has been created for attacking the three $\ell_1$ minimization problems defined in the previous section. We will mention a handful of those approaches here. Our emphasis will be on fast algorithms that do not require explicit access to $A$ and can handle complex-valued signals.
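To illustrate what "no explicit access to $A$" can mean in practice, the sketch below represents an operator as a pair of callables rather than a stored matrix. The choice of a row-subsampled unitary DFT is an assumption made for the example, as are the names `make_partial_fourier`, `forward`, and `adjoint`. The inner-product check at the end is a common sanity test that the adjoint really matches the forward operator for complex data.

```python
import numpy as np

def make_partial_fourier(n, rows):
    """Return (forward, adjoint) callables for a row-subsampled unitary DFT."""
    rows = np.asarray(rows)

    def forward(x):
        # A x: unitary FFT followed by selection of the measured rows.
        return np.fft.fft(x, norm="ortho")[rows]

    def adjoint(v):
        # A^H v: zero-fill the unmeasured rows, then unitary inverse FFT.
        full = np.zeros(n, dtype=complex)
        full[rows] = v
        return np.fft.ifft(full, norm="ortho")

    return forward, adjoint

rng = np.random.default_rng(0)
n, m = 256, 64
rows = rng.choice(n, size=m, replace=False)
forward, adjoint = make_partial_fourier(n, rows)

# Adjoint test: <A x, v> should equal <x, A^H v> for arbitrary complex x and v.
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
v = rng.standard_normal(m) + 1j * rng.standard_normal(m)
adjoint_error = abs(np.vdot(forward(x), v) - np.vdot(x, adjoint(v)))
print(adjoint_error)
```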

Our first example is SPGL1 [63], the algorithm whose primary reference inspired the discussion of the Pareto frontier in the previous section. This algorithm seeks solutions to BP_σ, which is, as we have already mentioned, more difficult than solving QP_λ. The algorithm takes special advantage of the structure of the Pareto frontier to obtain the desired solution. In particular, van den Berg and Friedlander develop a fast projected gradient technique for obtaining approximate solutions to LS_τ. The goal is to approximately solve a sequence of these LASSO problems so that $\tau_0, \tau_1, \ldots, \tau_k$ approaches $\tau_\sigma$, which is the value for $\tau$ which renders the problem equivalent to BP_σ. While slower than solving the unconstrained problem, the fast approximate solutions to these intermediate problems allow the algorithm to solve BP_σ in a reasonable amount of time.

Let us consider a step of the algorithm starting with $\tau_k$. First, we compute the corresponding solution $\hat{x}_\tau^k$. As discussed already, this provides both the value and an estimate of the slope of the Pareto curve as

$$\phi(\tau_k) = \|A\hat{x}_\tau^k - y\|_2, \qquad \phi'(\tau_k) = -\left\| \frac{A^H r^k}{\|r^k\|_2} \right\|_\infty, \qquad r^k = A\hat{x}_\tau^k - y \tag{5.22}$$

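We will choose the next parameter value as $\tau_{k+1} = \tau_k + \Delta\tau_k$. To compute $\Delta\tau_k$, the authors of [63] apply Newton's method. We can linearize the Pareto curve at $\tau_k$ to obtain

$$\phi(\tau) \approx \phi(\tau_k) + \phi'(\tau_k)\,\Delta\tau_k \tag{5.23}$$

We set this expression equal to $\sigma$ and solve for the desired step to obtain

$$\Delta\tau_k = \frac{\sigma - \phi(\tau_k)}{\phi'(\tau_k)} \tag{5.24}$$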
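The Newton update on the Pareto curve can be sketched in a few dozen lines. This is a simplified illustration and not the SPGL1 implementation: it assumes real-valued data, replaces the spectral projected gradient solver of [63] with a plain fixed-step projected gradient loop, and uses a standard sort-based projection onto the $\ell_1$ ball. The function names, iteration counts, and tolerance are assumptions chosen for the example.

```python
import numpy as np

def project_l1_ball(v, tau):
    """Euclidean projection of a real vector onto {x : ||x||_1 <= tau}."""
    if tau <= 0:
        return np.zeros_like(v)
    a = np.abs(v)
    if a.sum() <= tau:
        return v.copy()
    u = np.sort(a)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(u) + 1)
    rho = np.nonzero(u * k > css - tau)[0][-1]
    theta = (css[rho] - tau) / (rho + 1.0)
    return np.sign(v) * np.maximum(a - theta, 0.0)

def lasso_pg(A, y, tau, x0, n_iter=1000):
    """Approximately solve LS_tau by projected gradient on 0.5*||A x - y||_2^2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = project_l1_ball(x0, tau)
    for _ in range(n_iter):
        x = project_l1_ball(x - step * (A.T @ (A @ x - y)), tau)
    return x

def pareto_root_find(A, y, sigma, n_newton=20, tol=1e-6):
    """Newton iteration on phi(tau) = sigma using the value and slope in (5.22)."""
    x = np.zeros(A.shape[1])
    tau = 0.0
    for _ in range(n_newton):
        r = A @ x - y
        phi = np.linalg.norm(r)
        if abs(phi - sigma) <= tol * max(1.0, sigma):
            break
        dphi = -np.linalg.norm(A.T @ r, np.inf) / phi   # slope estimate
        tau += (sigma - phi) / dphi                      # Newton step (5.24)
        x = lasso_pg(A, y, tau, x)                       # warm-started LASSO solve
    return x, tau

# Hypothetical test problem with sigma set to the (known) noise norm.
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 100)) / np.sqrt(40)
x_true = np.zeros(100)
x_true[[5, 37, 80]] = [2.0, -1.5, 1.0]
noise = 0.05 * rng.standard_normal(40)
y = A @ x_true + noise
sigma = np.linalg.norm(noise)

x_hat, tau_hat = pareto_root_find(A, y, sigma)
print(tau_hat, np.linalg.norm(A @ x_hat - y), sigma)
```

Because the Pareto curve is convex and decreasing, starting from $\tau_0 = 0$ the exact Newton iterates approach $\tau_\sigma$ from below; with inexact subproblem solves, small overshoots are possible and are corrected by later steps.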

The authors of [63] provide an explicit expression for the duality gap, which provides a bound on the current iteration error, and prove several results on guaranteed convergence despite the approximate solution of the sub-problems. Further details can be found in [63], and a MATLAB implementation is readily available online. We should also mention that the SPGL1 algorithm can be used for solving more general problems, including weighted norms, sums of norms, the nuclear norm for matrix-valued unknowns, and other cases [74].

We will now discuss two closely related algorithms that were developed in the radar community for SAR imaging for solving generalizations of QP_λ. The algorithms can be used to solve the $\ell_1$ problem specifically, and hence inherit our RIP-based performance guarantees, but they can also solve more general problems of potential interest to radar practitioners. First, we will consider the algorithm developed in [75] which addresses the modified cost function

$$\hat{x} = \operatorname*{argmin}_x \lambda_1\|x\|_p^p + \lambda_2\big\|D|x|\big\|_p^p + \|Ax - y\|_2^2 \tag{5.25}$$

where $D$ is an approximation of the 2-D gradient of the magnitude image whose voxel values are encoded in $|x|$.

This second term, for $p = 1$, is the total variation (TV) norm of the magnitude image. The TV norm is the $\ell_1$ norm of the gradient. In essence, this norm penalizes rapid variation and tends to produce smooth images. As Cetin and Karl point out, this term can help to eliminate speckle and promote sharp edges in SAR imagery. Indeed, TV minimization has seen broad application in the radar, CS, and image processing communities [76]. Notice in (5.25) that the TV norm of the magnitude of the image, rather than the complex-valued reflectivity, is penalized. This choice is made to allow rapid phase variations.²⁴ Notice also that the $\ell_p$ norm is used with $0 < p \le 2$. As we have mentioned, selecting $p < 1$ can improve performance but yields a nonconvex problem. Cetin and Karl replace the $\ell_p$ terms in the cost function with differentiable approximations, derive an approximation for the Hessian of the cost function, and implement a quasi-Newton method.
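The effect of penalizing the magnitude rather than the complex reflectivity can be seen in a toy example. The sketch below assumes a simple anisotropic forward-difference approximation for $D$ (the exact operator used in [75] is not specified here) and an 8-by-8 test image with a constant-magnitude block and random phase; the function name `tv_of_magnitude` is illustrative.

```python
import numpy as np

def tv_of_magnitude(image):
    """l1 norm of forward differences of |image| (anisotropic TV, p = 1)."""
    mag = np.abs(image)
    return np.abs(np.diff(mag, axis=0)).sum() + np.abs(np.diff(mag, axis=1)).sum()

rng = np.random.default_rng(0)
mag = np.zeros((8, 8))
mag[2:6, 2:6] = 1.0                                   # a flat 4x4 "target"
phase = rng.uniform(0.0, 2.0 * np.pi, size=(8, 8))    # rapidly varying phase
img = mag * np.exp(1j * phase)

tv_mag = tv_of_magnitude(img)
# Same differences applied to the complex image itself, for comparison.
tv_complex = np.abs(np.diff(img, axis=0)).sum() + np.abs(np.diff(img, axis=1)).sum()
print(tv_mag, tv_complex)
```

The magnitude-based penalty only sees the edges of the block, whereas differences of the complex image also charge for the random phase inside it, which is the behavior the magnitude formulation is meant to avoid.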

Kragh [77] developed a closely related algorithm (for the case with $\lambda_2 = 0$) along with additional convergence guarantees leveraging ideas from majorization minimization (MM).²⁵ For the case with no TV penalty, both algorithms²⁶ end up with an iteration of the form

$$\hat{x}_{k+1} = \left[A^H A + h(\hat{x}_k)\right]^{-1} A^H y \tag{5.26}$$

where $h(\cdot)$ is a function based on the norm choice $p$. The matrix inverse can be implemented with preconditioned conjugate gradients to obtain a fast algorithm. Notice that $A^H A$ often represents a convolution that can be calculated using fast Fourier transforms (FFTs). A more detailed discussion of these algorithms and references to various extensions to radar problems of interest, including nonisotropic scattering, can be found in [17].
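An iteration of the form (5.26) can be sketched as follows. The specific choice of $h(\cdot)$ here is an assumption, not taken from [75] or [77]: it uses the smoothed penalty $\lambda \sum_i (|x_i|^2 + \epsilon)^{p/2}$, for which setting the gradient to zero gives $h(x) = (\lambda p / 2)\,\mathrm{diag}\big((|x_i|^2 + \epsilon)^{p/2 - 1}\big)$. For brevity the linear system is solved with a dense `np.linalg.solve` rather than the preconditioned conjugate gradients mentioned above, and the problem data, `eps`, and iteration count are illustrative.

```python
import numpy as np

def smoothed_cost(A, y, x, lam, p, eps):
    """lam * sum (|x_i|^2 + eps)^(p/2) + ||A x - y||_2^2."""
    return lam * np.sum((np.abs(x) ** 2 + eps) ** (p / 2.0)) + np.linalg.norm(A @ x - y) ** 2

def irls_lp(A, y, lam, p=1.0, eps=1e-6, n_iter=50):
    """Fixed-point iteration x <- [A^H A + h(x)]^{-1} A^H y with the assumed h."""
    AhA = A.conj().T @ A
    Ahy = A.conj().T @ y
    x = Ahy.copy()                      # start from the back-projected image
    for _ in range(n_iter):
        w = (np.abs(x) ** 2 + eps) ** (p / 2.0 - 1.0)
        x = np.linalg.solve(AhA + 0.5 * lam * p * np.diag(w), Ahy)
    return x

# Hypothetical complex-valued test problem.
rng = np.random.default_rng(0)
m, n = 40, 100
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2 * m)
x_true = np.zeros(n, dtype=complex)
x_true[[5, 37, 80]] = [2.0, -1.5j, 1.0 + 1.0j]
y = A @ x_true + 0.05 * (rng.standard_normal(m) + 1j * rng.standard_normal(m))

lam, p, eps = 0.5, 1.0, 1e-6
x0 = A.conj().T @ y
x_hat = irls_lp(A, y, lam, p, eps)
cost_start = smoothed_cost(A, y, x0, lam, p, eps)
cost_end = smoothed_cost(A, y, x_hat, lam, p, eps)
print(cost_start, cost_end)
```

With this choice of $h$, each update minimizes a quadratic majorizer of the smoothed cost for $p \le 2$, so the cost cannot increase from one iteration to the next.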

These algorithms do not begin to cover the plethora of existing solvers. Nonetheless, these examples have proven useful in radar applications. The next section will consider thresholding algorithms for SR. As we shall see, these algorithms trade generality for faster computation while still providing solutions to QP_λ.
