The gradient with respect to $x$ is $A^T(Ax - b) + \gamma x$, and by setting the gradient to zero we obtain the normal equations for the Tikhonov solution with $\gamma = \lambda^2$. Hence, the constrained problem (4.12) is equivalent to the Tikhonov problem.
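As a quick numerical illustration of this equivalence, the following minimal sketch checks that the gradient expression vanishes at the Tikhonov solution when $\gamma = \lambda^2$. The test matrix, right-hand side, and value of $\lambda$ are arbitrary assumptions chosen only for the check; they do not come from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5))   # hypothetical test matrix (assumption)
b = rng.standard_normal(8)        # hypothetical right-hand side (assumption)
lam = 0.3                         # hypothetical regularization parameter
gamma = lam**2                    # multiplier value gamma = lambda^2

# Tikhonov solution from the normal equations (A^T A + lambda^2 I) x = A^T b
x_lam = np.linalg.solve(A.T @ A + lam**2 * np.eye(A.shape[1]), A.T @ b)

# Gradient A^T (A x - b) + gamma x evaluated at the Tikhonov solution
grad = A.T @ (A @ x_lam - b) + gamma * x_lam
grad_norm = np.linalg.norm(grad)
print("gradient norm at x_lambda:", grad_norm)
```

The printed norm should be at the level of rounding errors, which is what the stationarity condition predicts.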
Similarly, we can minimize the norm of the solution (again, to suppress high-frequency components with large amplitude), subject to the constraint that the residual norm is smaller than some upper bound $\varepsilon$:
$$\min_x \|x\|_2^2 \quad \text{subject to} \quad \|Ax - b\|_2^2 \le \varepsilon^2 .$$
Again, the Lagrange multiplier formulation of this constrained problem leads to the Tikhonov formulation.
As we did for the TSVD solution, let us take a brief look at the statistical aspects of the Tikhonov solution. Via the relation $x_\lambda = (A^T A + \lambda^2 I)^{-1} A^T b$, and assuming again that $\mathrm{Cov}(e) = \eta^2 I$, we can show that the covariance matrix for the Tikhonov solution is
$$\mathrm{Cov}(x_\lambda) = \eta^2 \sum_{i=1}^{n} \bigl(\varphi_i^{[\lambda]}\bigr)^2 \sigma_i^{-2}\, v_i v_i^T ,$$
and its norm is bounded as
$$\|\mathrm{Cov}(x_\lambda)\|_2 \le \frac{\eta^2}{(2\lambda)^2} .$$
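The covariance formula and its norm bound can be checked numerically. The sketch below is a minimal illustration under assumed values: the random matrix, the noise level `eta`, and the parameter `lam` are hypothetical and serve only to compare the SVD form of the covariance with the form obtained directly from $x_\lambda = (A^T A + \lambda^2 I)^{-1} A^T b$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((10, 6))   # hypothetical test matrix (assumption)
eta, lam = 0.05, 0.4               # hypothetical noise level and lambda

U, s, Vt = np.linalg.svd(A, full_matrices=False)
phi = s**2 / (s**2 + lam**2)       # Tikhonov filter factors

# SVD form: eta^2 * sum_i phi_i^2 sigma_i^{-2} v_i v_i^T
cov_svd = eta**2 * (Vt.T * (phi**2 / s**2)) @ Vt

# Direct form: x_lambda = M b with M = (A^T A + lam^2 I)^{-1} A^T,
# so Cov(x_lambda) = eta^2 * M M^T when Cov(e) = eta^2 I
M = np.linalg.solve(A.T @ A + lam**2 * np.eye(A.shape[1]), A.T)
cov_direct = eta**2 * M @ M.T

cov_norm = np.linalg.norm(cov_svd, 2)
bound = eta**2 / (2 * lam)**2
print("||Cov||_2 =", cov_norm, " bound =", bound)
```

The bound follows because $\sigma/(\sigma^2 + \lambda^2) \le 1/(2\lambda)$ for every $\sigma > 0$, with equality at $\sigma = \lambda$.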
Obviously, if λ is chosen somewhat larger than the smallest singular value σn, then we obtain a reduction in the variance for the Tikhonov solution, compared to the naive solution. Again, we achieve this at the expense of introducing a bias in the solution:
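$$E(x_\lambda) = \sum_{i=1}^{n} \varphi_i^{[\lambda]}\, (v_i^T x^{\mathrm{exact}})\, v_i = x^{\mathrm{exact}} - \sum_{i=1}^{n} \bigl(1 - \varphi_i^{[\lambda]}\bigr) (v_i^T x^{\mathrm{exact}})\, v_i .$$
Since $1 - \varphi_i^{[\lambda]} = \lambda^2/(\sigma_i^2 + \lambda^2)$, and since we assume that the discrete Picard condition is satisfied, we can usually expect that the bias term is small compared to $\|x^{\mathrm{exact}}\|_2$.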
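The expression for the expected value can be verified with a few lines of code. In this minimal sketch the matrix, the vector `x_exact`, and `lam` are hypothetical; a random `x_exact` does not satisfy the discrete Picard condition, so the example checks only the algebraic identity, not the claim that the bias is small.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 6))    # hypothetical test matrix (assumption)
x_exact = rng.standard_normal(6)    # hypothetical exact solution (assumption)
lam = 0.4                           # hypothetical regularization parameter
b_exact = A @ x_exact

U, s, Vt = np.linalg.svd(A, full_matrices=False)
phi = s**2 / (s**2 + lam**2)        # Tikhonov filter factors
coeffs = Vt @ x_exact               # coefficients v_i^T x_exact

# With zero-mean noise, E(b) = b_exact, so E(x_lambda) is the
# Tikhonov solution computed from the noise-free right-hand side.
expected_direct = np.linalg.solve(A.T @ A + lam**2 * np.eye(6), A.T @ b_exact)

# SVD forms of the expected value and of the bias term
expected_svd = Vt.T @ (phi * coeffs)
bias = Vt.T @ ((1 - phi) * coeffs)

print("relative bias:", np.linalg.norm(bias) / np.linalg.norm(x_exact))
```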
4.5 Perturbation Theory∗
Regularization methods are designed to filter the influence from the noise, such that the solution is less dominated by the inverted noise, provided we can choose the regularization parameter ($k$ or $\lambda$) properly. The question is then: How sensitive is the regularized solution to the perturbations, and how does the sensitivity depend on the regularization parameter?
In order to study this sensitivity, throughout this section we will be studying two related problems, namely,
$$A x = b \quad \text{and} \quad \tilde{A} \tilde{x} = \tilde{b},$$
Downloaded 07/26/14 to 129.107.136.153. Redistribution subject to SIAM license or copyright; see http://www.siam.org/journals/ojsa.php
where $\tilde{A}$ and $\tilde{b}$ are perturbed versions of $A$ and $b$,
$$\tilde{A} = A + \Delta A \quad \text{and} \quad \tilde{b} = b + \Delta b,$$
i.e., the matrix $\Delta A$ is the perturbation of $A$, and the vector $\Delta b$ is the perturbation of the right-hand side $b$. Think of the two problems $A x = b$ and $\tilde{A} \tilde{x} = \tilde{b}$ as two different noisy realizations of the underlying problem with exact data. The vectors $x$ and $\tilde{x}$ are the solutions to the two problems, and we are interested in bounding the difference $x - \tilde{x}$. More precisely, we are interested in upper bounds for the relative normwise difference $\|x - \tilde{x}\|_2 / \|x\|_2$, which is independent of the scaling of $A$, $b$, and $x$. These bounds tell us how sensitive the solution is to (variations in) the noise.
Naive solutions. The perturbation bound for the naive solutions $x = A^{-1} b$ and $\tilde{x} = \tilde{A}^{-1} \tilde{b}$ (when $A$ is square and invertible) can be found in many textbooks on numerical analysis, and it takes the following form. If the perturbation matrix $\Delta A$ satisfies $\|\Delta A\|_2 < \sigma_n$, then
The requirement that $\Delta A$ satisfy $\|\Delta A\|_2 < \sigma_n \Leftrightarrow \gamma < 1$ is necessary for ensuring that the perturbed matrix $\tilde{A}$ stays nonsingular. The perturbation is governed by $A$'s condition number $\mathrm{cond}(A) = \sigma_1/\sigma_n$.
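The role of the condition number can be illustrated numerically. The sketch below assumes the classical textbook form of the bound, $\|x - \tilde{x}\|_2/\|x\|_2 \le \frac{\mathrm{cond}(A)}{1-\gamma}\bigl(\|\Delta b\|_2/\|b\|_2 + \|\Delta A\|_2/\|A\|_2\bigr)$ with $\gamma = \|\Delta A\|_2/\sigma_n$; this specific form and the definition of $\gamma$ are assumptions made for the illustration, as are the random test data and perturbation sizes.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A = rng.standard_normal((n, n)) + n * np.eye(n)  # hypothetical nonsingular matrix
x = rng.standard_normal(n)
b = A @ x

s = np.linalg.svd(A, compute_uv=False)           # singular values, descending

# Hypothetical perturbations, scaled so that ||dA||_2 = 0.1 * sigma_n
dA = rng.standard_normal((n, n))
dA *= 0.1 * s[-1] / np.linalg.norm(dA, 2)
db = rng.standard_normal(n)
db *= 0.01 * np.linalg.norm(b) / np.linalg.norm(db)

x_tilde = np.linalg.solve(A + dA, b + db)        # perturbed naive solution

gamma = np.linalg.norm(dA, 2) / s[-1]            # assumed definition of gamma
cond = s[0] / s[-1]                              # cond(A) = sigma_1 / sigma_n
rel_err = np.linalg.norm(x - x_tilde) / np.linalg.norm(x)
bound = cond / (1 - gamma) * (
    np.linalg.norm(db) / np.linalg.norm(b) + np.linalg.norm(dA, 2) / s[0]
)
print("relative error:", rel_err, " bound:", bound)
```

The observed relative error is typically well below the bound, which is a worst-case estimate.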
Least squares solutions. The perturbation bound for the least squares solutions to $\min_x \|A x - b\|_2$ and $\min_{\tilde{x}} \|\tilde{A} \tilde{x} - \tilde{b}\|_2$ (when $m > n$ and $A$ has full rank) can be found in many textbooks on least squares problems (such as [5]). If the perturbation matrix satisfies $\|\Delta A\|_2 < \sigma_n$ (which ensures that $\tilde{A}$ also has full rank), then
$\|x - \tilde{x}\|_2$
TSVD solutions [28]. Let $x_k$ and $\tilde{x}_k$ denote the TSVD solutions for the same truncation parameter $k$, and assume that the perturbation matrix $\Delta A$ satisfies $\|\Delta A\|_2 < \sigma_k - \sigma_{k+1}$; then
This result shows that the condition number for the TSVD solution is $\kappa_k = \sigma_1/\sigma_k$, which can be much smaller than the condition number for $A$.
Tikhonov solutions [26]. Let $x_\lambda$ and $\tilde{x}_\lambda$ denote the Tikhonov solutions for the same regularization parameter $\lambda$, and assume that the perturbation matrix $\Delta A$ satisfies $\|\Delta A\|_2 < \lambda$; then
Hence the condition number for the Tikhonov solution is $\kappa_\lambda = \sigma_1/\lambda$, and, similar to the TSVD condition number, it can be much smaller than the condition number for $A$.
The perturbation bounds for the TSVD and Tikhonov solutions are, to no big surprise, quite similar to each other. In both cases, the condition number is governed by the choice of the regularization parameter, $k$ or $\lambda$, and the more filtering the smaller the condition number. And in both cases, the size of the residual $b - b_k = b - A x_k$ or $b - b_\lambda = b - A x_\lambda$ enters the perturbation bound. Also, in both cases, we assume an upper bound on the norm of the perturbation matrix $\Delta A$; this bound ensures that the SVD components of the perturbed and the unperturbed problems are related.
The main difference between the two perturbation bounds is the factor $\hat{\gamma}_k = \sigma_{k+1}/\sigma_k$, which measures the gap between the smallest retained singular value and the largest discarded one, and which appears only in the TSVD bound. If $\hat{\gamma}_k$ is close to one, then $1 - \hat{\gamma}_k$ is small, and only a very small perturbation of the matrix is allowed.
Hence, one should refrain from truncating in the middle of a cluster of singular values, i.e., a set of singular values which are very close and well separated from the others.
The same difficulty does not apply to the Tikhonov solution; if one chooses a value of $\lambda$ inside a cluster of singular values, then all these SVD components are included in the Tikhonov solution because all the associated filter factors $\varphi_i^{[\lambda]} = \sigma_i^2/(\sigma_i^2 + \lambda^2)$ are of the same size.
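To make the contrast concrete, the following minimal sketch compares TSVD and Tikhonov filter factors for a spectrum with a cluster. The singular values, the truncation parameter `k`, and the value `lam` are made up for illustration; the cluster consists of the third to fifth singular values.

```python
import numpy as np

# Hypothetical singular values with a cluster around 0.1 (assumption)
sigma = np.array([1.0, 0.5, 0.101, 0.100, 0.099, 0.001])
lam = 0.1   # Tikhonov parameter chosen inside the cluster
k = 4       # TSVD truncation parameter, also inside the cluster

# Tikhonov filter factors sigma_i^2 / (sigma_i^2 + lambda^2)
phi_tikhonov = sigma**2 / (sigma**2 + lam**2)

# TSVD filter factors: 1 for i <= k, 0 otherwise
phi_tsvd = (np.arange(1, sigma.size + 1) <= k).astype(float)

# Gap ratio sigma_{k+1} / sigma_k (0-based indices k and k-1)
gap_ratio = sigma[k] / sigma[k - 1]

print("Tikhonov filters in cluster:", phi_tikhonov[2:5])
print("TSVD filters in cluster:    ", phi_tsvd[2:5])
print("gap ratio:", gap_ratio)
```

In this made-up spectrum the Tikhonov filter factors across the cluster all lie close to 0.5, whereas TSVD keeps two of the clustered components and discards the third, and the gap ratio is close to one.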
In addition to studying the errors in the solutions, as we have done in the above results, it may be of interest to study the "prediction errors," i.e., the sensitivity of the vectors $A x_k$ and $A x_\lambda$. Under the same assumption as before, we have for the TSVD and Tikhonov solutions
The interesting result here is that the prediction errors do not depend on the condition number—a result which is well known for the naive solution.
We finish this section with a small example that illustrates the sensitivity of the Tikhonov solution $x_\lambda$ to perturbations of $b$, as a function of the regularization parameter $\lambda$. The matrix, the unperturbed right-hand side, and the unperturbed solution are
$$A = \begin{pmatrix} 0.41 & 1.00 \\ -0.15 & 0.06 \end{pmatrix}, \qquad b = \begin{pmatrix} 1.41 \\ -0.09 \end{pmatrix}, \qquad x = \begin{pmatrix} 1.00 \\ 1.00 \end{pmatrix}.$$
[Figure 4.7: four panels titled λ = 0, λ = 0.2, λ = 0.6, and λ = 1.5; both axes run from −1 to 3.]
Figure 4.7. Illustration of the sensitivity of the Tikhonov solutions to perturbations of the right-hand side for a small 2 × 2 test problem. For each of 25 random perturbations we computed $x_\lambda$ for four different values of $\lambda$, given at the top of the plots. The ellipsoidal curves are the level curves for the associated Tikhonov problem.
We generated 25 random perturbations $\tilde{b} = b + \Delta b$ with the perturbation scaled such that $\|\Delta b\|_2/\|b\|_2 = 0.15$, and for each perturbed problem we computed the Tikhonov solutions corresponding to four values of $\lambda$. The results are shown in Figure 4.7, where the black cross indicates the unperturbed solution, while the gray dots represent the perturbed solutions. Also included in the plots are the ellipsoidal level curves for the Tikhonov (least squares) problem for the particular $\lambda$.
We see that as $\lambda$ increases, and more regularization (or filtering) is imposed on the problem, the Tikhonov solution becomes less sensitive to the perturbations. As $\lambda$ increases, the condition number becomes smaller and the level curves become less elliptic. Note that for small values of $\lambda$, where the ellipsoids are more elongated, the perturbation is mainly in the direction of the longer semiaxis. This direction is defined by the right singular vector $v_2$ corresponding to the smaller singular value $\sigma_2$, and so the observed results agree with the theory, namely, that the largest perturbations occur in the directions corresponding to the smallest singular values.
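The experiment can be sketched in a few lines. The matrix, right-hand side, number of perturbations, relative noise level, and values of $\lambda$ are taken from the text; the random number generator, its seed, the Gaussian direction of each perturbation, and the helper name `tikhonov` are assumptions, so the individual perturbed solutions will differ from those in Figure 4.7.

```python
import numpy as np

A = np.array([[0.41, 1.00],
              [-0.15, 0.06]])
b = np.array([1.41, -0.09])
rng = np.random.default_rng(4)   # seed is an arbitrary assumption

def tikhonov(A, b, lam):
    """Tikhonov solution via the normal equations (hypothetical helper)."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam**2 * np.eye(n), A.T @ b)

# 25 random perturbations, each scaled so that ||db||_2 / ||b||_2 = 0.15
dbs = rng.standard_normal((25, 2))
dbs *= 0.15 * np.linalg.norm(b) / np.linalg.norm(dbs, axis=1, keepdims=True)

lams = [0.0, 0.2, 0.6, 1.5]
spread = {}
for lam in lams:
    x_ref = tikhonov(A, b, lam)   # unperturbed Tikhonov solution
    sols = np.array([tikhonov(A, b + db, lam) for db in dbs])
    # Largest deviation of a perturbed solution from the unperturbed one
    spread[lam] = np.max(np.linalg.norm(sols - x_ref, axis=1))
    print(f"lambda = {lam:3.1f}: max deviation = {spread[lam]:.3f}")
```

The maximum deviation shrinks as $\lambda$ grows, because each SVD component of the perturbation is amplified by $\sigma_i/(\sigma_i^2 + \lambda^2)$, which decreases with $\lambda$.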