Niveles de satisfacción y frustración de las NPB
4. Anexos
4.10. Anexo 10: Glosario 2017
parameter kDP or λDP is very sensitive to the accuracy of the estimate fore2. In particular, a too small estimate can lead to dramatic undersmoothing (because k is chosen too large, and λ is too small).
In terms of the L-curve from Figure 4.10, the regularization parameter λDP
chosen by the discrepancy principle corresponds to that point where the L-curve that intersects a vertical line located at A xλ − b2 = νdpe2. Now recall that the L-curve’s almost vertical part is actually located at A xλ− b2 ≈ e2; clearly an underestimate of the error norm in the discrepancy principle will force the intersection of the L-curve and the vertical line to a point with a large solution norm (i.e., a highly undersmoothed solution). Even a small underestimate of the error norm can have a large impact on the regularization parameter, making this a somewhat risky parameter-choice method for real data. We return to examples of this issue in Section 5.6.
Several variants of the discrepancy principle are discussed in [32, Section 7.2];
none of these variants seem to be used much in practice.
5.3 The Intuitive L-Curve Criterion
The overall behavior of the L-curve from Section 4.7 is closely related to the size of the regularization error Δxbias and the perturbation error Δxpert. Therefore, one can learn a lot about the solution’s dependence on the regularization error by studying the L-curve. Here, for simplicity, we consider the TSVD method; recall that kη denotes the index at the transition between decaying and flat|uTi b|-coefficients.
• When k is smaller than kη, then the regularization error dominates in xk, and thus the overall behavior of its norm is largely determined by how many (or few) SVD components are effectively included. The fewer the components, the smaller the solution normxk2and the larger the residual normA xk− b2. Using the analysis from the previous subsection (see also Figure 5.2), we conclude that these norms depend roughly on k as follows:
– The solution normxk2is almost a constant given byxexact2except for very small k wherexk2gets smaller as k → 0.
– The residual norm A xk − b2 increases as k → 0, until it reaches its maximum valueb2for k = 0.
• When k is larger than kη, then the perturbation error dominates xk, and the overall behavior of its norm is determined by this component. Now the norms depend on k in the following way:
– The solution normxk2increases (sometimes dramatically) as k increases, because more and more and increasingly larger error components are in-cluded.
– The residual normA xk− b2, on the other hand, stays almost constant at the approximate valuee2(except when k ≈ n).
From this analysis it follows that the L-curve must have two distinctly different parts—
one part where it is quite flat (when the regularization error dominates), and one part
Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
92 Chapter 5. Getting Serious: Choosing the Regularization Parameter that is more vertical (when the perturbation error dominates). The log-log scale for the L-curve emphasizes the different characteristics of these two parts.
One important observation here is that the L-curve must also include a part with the transition between the vertical and horizontal parts. Very often this transition takes place in a small region, and then the L-curve will have a distinct corner between the two parts. Moreover, this transition region is associated with those values of λ or k for which the dominating error component changes between the perturbation error Δxbias and the regularization error Δxpert.
The key idea in the L-curve criterion is therefore to choose a value of λ that corresponds to the L-curve’s corner, in the hope that this value will provide a good balance of the regularization and perturbation errors. We emphasize that this line of arguments is entirely heuristic; but it is supported with some analysis. It is for this reason that we allow ourselves to label the L-curve criterion as “intuitive.”
We still need an operational definition of the corner of the L-curve, such that we can compute it automatically. For Tikhonov regularization, a natural definition of the corner is the point on the L-curve with maximum curvature. We emphasize that this curvature must be measured in the log-log system; i.e., we must compute the curvature ˆc of the curve ( logA xλ− b2, logxλ2).
At this stage it is most convenient to consider the L-curve for Tikhonov regu-larization. Following the analysis in Section 4.7, we introduce the quantities
ξ = logˆ xλ22 and ρ = logˆ A xλ− b22
such that the L-curve is given by (12ρ ,ˆ 12ξ ). Ifˆ denotes differentiation with respect to λ, then it follows that
ξˆ= ξ
ξ and ρˆ= ρ ρ.
When we use these relations, then it follows immediately that the second derivatives of ˆξ and ˆρ are given by
ξˆ=ξξ− (ξ)2
ξ2 and ρˆ= ρρ− (ρ)2 ρ2 . When we insert these relations into the definition of the curvature,
ˆ
cλ= 2 ρˆξˆ− ˆρξˆ
!(ˆρ)2+ ( ˆξ)2"3/2,
it turns out that we can express the curvature of the log-log L-curve entirely in terms of the three quantities ξ, ρ, and ξ:
ˆ
cλ= 2ξ ρ ξ
λ2ξρ + 2 λ ξ ρ + λ4ξ ξ (λ2ξ2+ ρ2)3/2 .
The two quantities ξ =xλ22and ρ =A xλ− b22 are often easy to compute, so it remains to demonstrate how to compute ξ efficiently. Using the SVD it is straightforward to verify that this quantity is given by
ξ= 4 λxλTzλ,
Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
5.3. The Intuitive L-Curve Criterion 93
10−2 100 102
100 101 102
L−curve
log || A x
λ − b ||
2
log || x λ || 2
10−6 10−4 10−2 100 0
0.5 1
x 10−3
λ Curvature
Figure 5.4. An L-curve and the corresponding curvature ˆcλ as a function of λ. The corner, which corresponds to the point with maximum curvature, is marked by the circle; it occurs for λL= 4.86· 10−3.
where the vector zλ is given by
zλ= (ATA + λ2I)−1AT(A xλ− b).
Recalling the formulas in Section 4.4 for the Tikhonov solution, we see immediately that zλ is the solution to the Tikhonov least squares problem
minz
A λ I
z−
A xλ− b 0
2
.
Therefore, this vector is easy to compute by solving a new Tikhonov problem with the same coefficient matrix and whose right-hand side is the residual vector A xλ− b.
The conclusion of this analysis is that for each value of λ, we can compute the curvature ˆcλ with little overhead, and thus we can use our favorite optimization routine to find the value λL that maximizes ˆcλ:
Choose λ = λL such that the curvature ˆcλ is maximum. (5.7) Figure 5.4 shows an L-curve and the corresponding curvature ˆcλas a function of λ.
The above analysis is not immediately valid for the TSVD method, because this method produces a finite (small) set of solutions for k = 1, 2, . . . , and thus the corresponding “L-curve” really consists of a finite set of points. However, the analysis and the arguments do carry over to this case, and it still often makes good sense to locate the TSVD solution at the overall corner of the discrete L-curve:
Choose k = kL at the overall corner of the discrete L-curve. (5.8) How to actually compute this corner is not as straightforward as it may seem, because a discrete L-curve can have many small local corners; we refer to the reader [9] and [37] for recent algorithms (Regularization Tools uses the latter one).
Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
94 Chapter 5. Getting Serious: Choosing the Regularization Parameter
0 5 10 15
10−6 10−3 100
α = 0
σi | u
i
T b | | u
i T b | / σi
0 5 10 15
10−6 10−3 100
α = 1
0 5 10 15
10−6 10−3 100
α = 4
Figure 5.5. Picard plots for the smooth exact solutions generated by Eq. (5.9), where xexactis from the shaw test problem, and α controls the smoothness (the larger the α, the faster the decay of the coefficients|viTxexact,α|. For all three values of α the L-curve criterion gives the truncation parameter kL= 10. While this is a good choice for α = 0, the smallest error in the TSVD solution is obtained for k = 8 for α = 1, and k = 6 for α = 4.
We conclude this discussion of the L-curve criterion with a few words of caution.
The criterion is essentially a (good) heuristic, and there is no guarantee that it will always produce a good regularization parameter, i.e., that λL produces a solution in which the regularization and perturbation errors are balanced.
One instance where the L-curve criterion is likely to fail is when the SVD com-ponents viTxexact of the exact solution decay quickly to zero (in which case xexactwill appear very smooth, because it is dominated by the first few—and smooth—right sin-gular vectors vi). The L-curve’s corner is located at the point where the solution norm becomes dominated by the noisy SVD components. Often this is precisely where the components start to increase dramatically; but for very smooth solutions this happens for larger k when we have included too many noisy components, and hence λL will lead to an undersmoothed solution.
Figure 5.5 illustrates this deficiency of the L-curve criterion with artificially cre-ated smooth solutions genercre-ated by the model
xexact,α=A−α2 (ATA)α/2xexact= V (Σ/σ1)αVTxexact, (5.9) such that the i th SVD component is damped by the factor (σi/σ1)α. The larger the α, the smoother the solution. For all three values of α used in Figure 5.5, the L-curve chooses kL = 10, while for α = 4 (the smoothest solution), the optimal choice is actually k = 6.
Another instance where the L-curve criterion can fail is when the change in the residual and solution norms is small for two consecutive values of k. Again, in this case, it is necessary to include many noisy SVD coefficients in the solution before its norm has increased so much that the corner appears on the L-curve. Unfortunately this behavior of the SVD coefficients is often observed for large-scale problems, and we illustrate this in Exercise 6.4.
Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
5.4. The Statistician’s Choice—Generalized Cross Validation 95 One final critical remark on the behavior of the L-curve criterion: If we let the noise go to zero, then it has been shown that the regularization parameter λL tends to diverge from the optimal one, and hence xλL may not converge to xexact as
e2 → 0. This undesired nonconverging behavior is not shared by the discrepancy principle (or the generalized cross validation (GCV) method discussed in the next section). Fortunately, noise that goes to zero is a rare situation in practice; usually we work with fixed data with a fixed noise level.