Anexo 10: Glosario 2017 - Niveles de satisfacción y frustración de las NPB

Niveles de satisfacción y frustración de las NPB

4. Anexos

4.10. Anexo 10: Glosario 2017

parameter k_DP or λ_DP is very sensitive to the accuracy of the estimate fore2. In particular, a too small estimate can lead to dramatic undersmoothing (because k is chosen too large, and λ is too small).

In terms of the L-curve from Figure 4.10, the regularization parameter λDP

chosen by the discrepancy principle corresponds to that point where the L-curve that intersects a vertical line located at A xλ − b2 = νdpe2. Now recall that the L-curve’s almost vertical part is actually located at A xλ− b2 ≈ e2; clearly an underestimate of the error norm in the discrepancy principle will force the intersection of the L-curve and the vertical line to a point with a large solution norm (i.e., a highly undersmoothed solution). Even a small underestimate of the error norm can have a large impact on the regularization parameter, making this a somewhat risky parameter-choice method for real data. We return to examples of this issue in Section 5.6.

Several variants of the discrepancy principle are discussed in [32, Section 7.2];

none of these variants seem to be used much in practice.

5.3 The Intuitive L-Curve Criterion

The overall behavior of the L-curve from Section 4.7 is closely related to the size of the regularization error Δxbias and the perturbation error Δxpert. Therefore, one can learn a lot about the solution’s dependence on the regularization error by studying the L-curve. Here, for simplicity, we consider the TSVD method; recall that kη denotes the index at the transition between decaying and ﬂat|u^Ti b|-coeﬃcients.

• When k is smaller than kη, then the regularization error dominates in xk, and thus the overall behavior of its norm is largely determined by how many (or few) SVD components are eﬀectively included. The fewer the components, the smaller the solution normxk2and the larger the residual normA xk− b2. Using the analysis from the previous subsection (see also Figure 5.2), we conclude that these norms depend roughly on k as follows:

– The solution normxk2is almost a constant given byx^exact2except for very small k wherexk2gets smaller as k → 0.

– The residual norm A xk − b2 increases as k → 0, until it reaches its maximum valueb2for k = 0.

• When k is larger than kη, then the perturbation error dominates xk, and the overall behavior of its norm is determined by this component. Now the norms depend on k in the following way:

– The solution normxk2increases (sometimes dramatically) as k increases, because more and more and increasingly larger error components are in-cluded.

– The residual normA xk− b2, on the other hand, stays almost constant at the approximate valuee2(except when k ≈ n).

From this analysis it follows that the L-curve must have two distinctly diﬀerent parts—

one part where it is quite ﬂat (when the regularization error dominates), and one part

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

92 Chapter 5. Getting Serious: Choosing the Regularization Parameter that is more vertical (when the perturbation error dominates). The log-log scale for the L-curve emphasizes the diﬀerent characteristics of these two parts.

One important observation here is that the L-curve must also include a part with the transition between the vertical and horizontal parts. Very often this transition takes place in a small region, and then the L-curve will have a distinct corner between the two parts. Moreover, this transition region is associated with those values of λ or k for which the dominating error component changes between the perturbation error Δxbias and the regularization error Δxpert.

The key idea in the L-curve criterion is therefore to choose a value of λ that corresponds to the L-curve’s corner, in the hope that this value will provide a good balance of the regularization and perturbation errors. We emphasize that this line of arguments is entirely heuristic; but it is supported with some analysis. It is for this reason that we allow ourselves to label the L-curve criterion as “intuitive.”

We still need an operational deﬁnition of the corner of the L-curve, such that we can compute it automatically. For Tikhonov regularization, a natural deﬁnition of the corner is the point on the L-curve with maximum curvature. We emphasize that this curvature must be measured in the log-log system; i.e., we must compute the curvature ˆc of the curve ( logA xλ− b2, logxλ2).

At this stage it is most convenient to consider the L-curve for Tikhonov regu-larization. Following the analysis in Section 4.7, we introduce the quantities

ξ = logˆ xλ²2 and ρ = logˆ A xλ− b²2

such that the L-curve is given by (¹₂ρ ,ˆ ¹₂ξ ). Ifˆ denotes diﬀerentiation with respect to λ, then it follows that

ξˆ= ξ

ξ and ρˆ= ρ ρ.

When we use these relations, then it follows immediately that the second derivatives of ˆξ and ˆρ are given by

ξˆ=ξξ− (ξ)²

ξ² and ρˆ= ρρ− (ρ)² ρ² . When we insert these relations into the deﬁnition of the curvature,

c_λ= 2 ρˆξˆ− ˆρξˆ

!(ˆρ)²+ ( ˆξ)²"3/2,

it turns out that we can express the curvature of the log-log L-curve entirely in terms of the three quantities ξ, ρ, and ξ:

cλ= 2ξ ρ ξ

λ²ξρ + 2 λ ξ ρ + λ⁴ξ ξ (λ²ξ²+ ρ²)^3/2 .

The two quantities ξ =xλ²2and ρ =A xλ− b²2 are often easy to compute, so it remains to demonstrate how to compute ξ eﬃciently. Using the SVD it is straightforward to verify that this quantity is given by

ξ= 4 λx_λ^Tzλ,

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

5.3. The Intuitive L-Curve Criterion 93

10⁻² 10⁰ 10²

10⁰ 10¹ 10²

L−curve

log || A x

λ − b ||

log || x λ || 2

10⁻⁶ 10⁻⁴ 10⁻² 10⁰ 0

0.5 1

x 10⁻³

λ Curvature

Figure 5.4. An L-curve and the corresponding curvature ˆcλ as a function of λ. The corner, which corresponds to the point with maximum curvature, is marked by the circle; it occurs for λL= 4.86· 10⁻³.

where the vector zλ is given by

z_λ= (A^TA + λ²I)⁻¹A^T(A x_λ− b).

Recalling the formulas in Section 4.4 for the Tikhonov solution, we see immediately that zλ is the solution to the Tikhonov least squares problem

minz

A λ I

z−

A xλ− b 0

Therefore, this vector is easy to compute by solving a new Tikhonov problem with the same coeﬃcient matrix and whose right-hand side is the residual vector A xλ− b.

The conclusion of this analysis is that for each value of λ, we can compute the curvature ˆcλ with little overhead, and thus we can use our favorite optimization routine to ﬁnd the value λ_L that maximizes ˆcλ:

Choose λ = λL such that the curvature ˆcλ is maximum. (5.7) Figure 5.4 shows an L-curve and the corresponding curvature ˆcλas a function of λ.

The above analysis is not immediately valid for the TSVD method, because this method produces a ﬁnite (small) set of solutions for k = 1, 2, . . . , and thus the corresponding “L-curve” really consists of a ﬁnite set of points. However, the analysis and the arguments do carry over to this case, and it still often makes good sense to locate the TSVD solution at the overall corner of the discrete L-curve:

Choose k = k_L at the overall corner of the discrete L-curve. (5.8) How to actually compute this corner is not as straightforward as it may seem, because a discrete L-curve can have many small local corners; we refer to the reader [9] and [37] for recent algorithms (Regularization Tools uses the latter one).

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

94 Chapter 5. Getting Serious: Choosing the Regularization Parameter

0 5 10 15

10⁻⁶ 10⁻³ 10⁰

α = 0

σ_i | u

T b | | u

i T b | / σ_i

0 5 10 15

10⁻⁶ 10⁻³ 10⁰

α = 1

0 5 10 15

10⁻⁶ 10⁻³ 10⁰

α = 4

Figure 5.5. Picard plots for the smooth exact solutions generated by Eq. (5.9), where xêxactis from the shaw test problem, and α controls the smoothness (the larger the α, the faster the decay of the coefficients|vi^Txêxact,α|. For all three values of α the L-curve criterion gives the truncation parameter kL= 10. While this is a good choice for α = 0, the smallest error in the TSVD solution is obtained for k = 8 for α = 1, and k = 6 for α = 4.

We conclude this discussion of the L-curve criterion with a few words of caution.

The criterion is essentially a (good) heuristic, and there is no guarantee that it will always produce a good regularization parameter, i.e., that λL produces a solution in which the regularization and perturbation errors are balanced.

One instance where the L-curve criterion is likely to fail is when the SVD com-ponents v_i^Txêxact of the exact solution decay quickly to zero (in which case xêxactwill appear very smooth, because it is dominated by the first few—and smooth—right sin-gular vectors v_i). The L-curve’s corner is located at the point where the solution norm becomes dominated by the noisy SVD components. Often this is precisely where the components start to increase dramatically; but for very smooth solutions this happens for larger k when we have included too many noisy components, and hence λ_L will lead to an undersmoothed solution.

Figure 5.5 illustrates this deﬁciency of the L-curve criterion with artiﬁcially cre-ated smooth solutions genercre-ated by the model

xêxact,α=A^−α2 (A^TA)^α/2xêxact= V (Σ/σ1)^αV^Txêxact, (5.9) such that the i th SVD component is damped by the factor (σi/σ1)^α. The larger the α, the smoother the solution. For all three values of α used in Figure 5.5, the L-curve chooses kL = 10, while for α = 4 (the smoothest solution), the optimal choice is actually k = 6.

Another instance where the L-curve criterion can fail is when the change in the residual and solution norms is small for two consecutive values of k. Again, in this case, it is necessary to include many noisy SVD coeﬃcients in the solution before its norm has increased so much that the corner appears on the L-curve. Unfortunately this behavior of the SVD coeﬃcients is often observed for large-scale problems, and we illustrate this in Exercise 6.4.

Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php

5.4. The Statistician’s Choice—Generalized Cross Validation 95 One ﬁnal critical remark on the behavior of the L-curve criterion: If we let the noise go to zero, then it has been shown that the regularization parameter λ_L tends to diverge from the optimal one, and hence x_λ_L may not converge to x^exact as

e2 → 0. This undesired nonconverging behavior is not shared by the discrepancy principle (or the generalized cross validation (GCV) method discussed in the next section). Fortunately, noise that goes to zero is a rare situation in practice; usually we work with ﬁxed data with a ﬁxed noise level.

5.4 The Statistician’s Choice—Generalized Cross

In document Satisfacción y frustración de lasnecesidades psicológicas básicas en eltrabajo en profesores de educación física (página 185-200)