A natural way to measure the quality of an approximate solution z is by how well it satisfies the equation. A virtue of this is that it is easy to compute the residual
r = b - Az .
In this measure, a good solution z has a small residual. Because of cancellation (see Example 1.10), if we want an accurate residual for a good solution, it will be necessary to compute it in higher precision arithmetic, and this may not be available. The residual provides a Δb for the backward error analysis: since Az = b - r, the vector z is the exact solution of Az = b + Δb with
Δb = -r.
The residual r is connected to the error e by
r = b - Az = Ax - Az = A(x - z) = Ae
or e = A^(-1) r. A small residual r, hence a small Δb, may be perfectly satisfactory from the point of view of backward error analysis even when the corresponding error e is not small.
Example 2.8. To illustrate the distinction between the two points of view, consider the system
0.747 x1 + 0.547 x2 = 0.200
0.623 x1 + 0.457 x2 = 0.166. (2.11)
We carry out the elimination process using three-digit chopped decimal arithmetic. After the first step we have
0.747 x1 + 0.547 x2 = 0.200
0.001 x2 = 0.000.
It then follows that
z2 = 0.000/0.001 = 0.000, z1 = (0.200 - 0.547 z2)/0.747 = 0.267,
so the computed solution is
z = (0.267, 0.000)^T.
The exact solution to (2.11) is easily found to be x1 = 1 and x2 = -1. Therefore the error (in exact arithmetic) is
e = x - z = (0.733, -1.000)^T.
In contrast, the residual (in exact arithmetic) is
r = b - Az = (0.000551, -0.000341)^T.
This says that z is the exact solution of Az = b + Δb, where b1 = 0.200 is perturbed to 0.199449 and b2 = 0.166 is perturbed to 0.166341. Thus, z is the solution of a problem very close to the one posed, even though it differs considerably from the solution x of the original problem.
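The arithmetic of Example 2.8 can be replayed with a short script. This is only a sketch: the coefficients of system (2.11) used below are an assumption, chosen because they agree with the exact solution (1, -1) and with the perturbed right-hand side (0.199449, 0.166341) quoted in the example. The helper names `solve_chopped` and `exact_residual` are hypothetical. Three-digit chopping is imitated with Python's `decimal` module (precision 3, `ROUND_DOWN`), and the residual is then evaluated exactly with rational arithmetic.

```python
from decimal import Decimal, ROUND_DOWN, localcontext
from fractions import Fraction

# Assumed data for system (2.11); see the note above.
A = [[Decimal("0.747"), Decimal("0.547")],
     [Decimal("0.623"), Decimal("0.457")]]
b = [Decimal("0.200"), Decimal("0.166")]


def solve_chopped(A, b, digits=3):
    """Solve a 2 x 2 system by elimination in chopped decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits          # significant digits kept after each operation
        ctx.rounding = ROUND_DOWN  # chop (truncate) instead of rounding
        m = A[1][0] / A[0][0]              # multiplier for the second row
        u22 = A[1][1] - m * A[0][1]        # reduced coefficient of x2
        c2 = b[1] - m * b[0]               # reduced right-hand side
        z2 = c2 / u22                      # back substitution
        z1 = (b[0] - A[0][1] * z2) / A[0][0]
    return [z1, z2]


def exact_residual(A, b, z):
    """Residual r = b - A z, evaluated without any rounding."""
    Af = [[Fraction(a) for a in row] for row in A]
    bf = [Fraction(v) for v in b]
    zf = [Fraction(v) for v in z]
    return [bf[i] - sum(Af[i][j] * zf[j] for j in range(2)) for i in range(2)]


z = solve_chopped(A, b)
r = exact_residual(A, b, z)
e = [Fraction(1) - Fraction(z[0]), Fraction(-1) - Fraction(z[1])]  # x - z

print("z =", [str(v) for v in z])
print("e =", [float(v) for v in e])
print("r =", [float(v) for v in r])
```

Under these assumptions the error components are of order 1 while the residual components are of order 10^(-4), which is the contrast the example is meant to show.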
The fundamental difficulty in Example 2.8 is that the matrix in the system (2.11) is nearly singular. In fact, the first equation is, to within roundoff error, 1.2 times the second. If we examine the elimination process we see that z2 was computed from two quantities that were themselves on the order of roundoff error. Carrying more digits in our arithmetic would have produced a totally different z2. The error in z2 propagates
to an error in z1. This accounts for the computed solution being in error. Why then
are the residuals small? Regardless of z2, the number z1 was computed to make the
residual for the first equation as nearly zero as possible in the arithmetic being used. The residual for the second equation should also be small because the system is close to singular: the first equation is approximately a multiple of the second.
In Section 2.2 we observed that any matrix A could have its rows interchanged to obtain a matrix PA, which can be decomposed as the product of a lower triangular matrix L and an upper triangular matrix U. For simplicity we ignore the permutation matrix P in what follows. An error analysis of elimination using floating point arithmetic shows that L and U are computed with errors ΔL and ΔU, respectively. Then A is not exactly equal to the product (L + ΔL)(U + ΔU). Let ΔA be defined so that
A + ΔA = (L + ΔL)(U + ΔU),
that is,
ΔA = L ΔU + ΔL U + ΔL ΔU.
We might reasonably hope to compute L with errors ΔL that are small relative to L, and the same for U. However, the expression for ΔA shows that the sizes of L and U play important roles in how well A is represented by the computed factors. Partial pivoting keeps the elements of L less than or equal to 1 in magnitude. We also saw in (2.10) that the size of the elements of U was moderated with partial pivoting. In particular, they cannot exceed 2^(n-1) max_ij |aij| for an n × n matrix. It can be shown rigorously, on
taking into account the errors of decomposition and of forward/backward substitution, that the computed solution z of Ax = b satisfies
(A + ΔA)z = b, (2.12)
where the entries of ΔA are usually small. To make precise how small these entries are, we need a way of measuring the sizes of vectors and matrices. One way to measure the size of a vector x of n components is by its norm, which is denoted by ||x||. Several definitions of norm are common in numerical analysis. One that is likely to be
familiar is the Euclidean length of x, sqrt(x1^2 + x2^2 + ... + xn^2). All vector norms possess many of
the properties of length. The norm used in this chapter is the maximum norm
||x|| = max_{1≤i≤n} |xi|. (2.13)
If A is an n × n matrix and x is an n-vector, then Ax is also an n-vector. A matrix norm can be defined in terms of a vector norm by
||A|| = max_{x ≠ 0} ||Ax|| / ||x||. (2.14)
Geometrically, this says that ||A|| is the maximum relative distortion that the matrix A creates when it multiplies a vector x ≠ 0. It is not easy to evaluate ||A|| directly from
(2.14), but it can be shown that for the maximum norm (2.13)
||A|| = max_{1≤i≤n} Σ_{j=1}^{n} |aij|, (2.15)
which is easy enough to evaluate. An important inequality connects norms of vectors and matrices:
||Ax|| ≤ ||A|| ||x||. (2.16)
For x ≠ 0 this follows immediately from the definition (2.14). For x = 0 we note that Ax = 0 and that ||x|| = 0, from which the inequality is seen to hold.
Example 2.9. Let x =
Let
A = [  1  -1   0 ]
    [  2  -2   3 ]
    [ -4   1  -1 ].
Then
||A|| = max[(|1| + |-1| + |0|), (|2| + |-2| + |3|), (|-4| + |1| + |-1|)] = max[(2), (7), (6)] = 7.
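The norm computations can be checked with a few lines of code. The sketch below evaluates the maximum norm of a vector and the row-sum formula for a matrix, using the matrix whose row sums are evaluated in Example 2.9. The vectors `x` and `x_max` are hypothetical test vectors chosen for illustration and are not taken from the text; `x_max` carries the signs of the row with the largest sum, so the inequality ||Ax|| ≤ ||A|| ||x|| holds with equality for it.

```python
def vector_max_norm(x):
    """Maximum norm of a vector: the largest component in magnitude."""
    return max(abs(xi) for xi in x)


def matrix_max_norm(A):
    """Matrix norm induced by the maximum norm: the largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


def matvec(A, x):
    """Matrix-vector product A x for a matrix stored as a list of rows."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


A = [[1, -1, 0],
     [2, -2, 3],
     [-4, 1, -1]]

x = [-1, 2, 3]      # hypothetical test vector
x_max = [1, -1, 1]  # signs of the second row, the row with the largest sum

for v in (x, x_max):
    lhs = vector_max_norm(matvec(A, v))
    rhs = matrix_max_norm(A) * vector_max_norm(v)
    print(v, "||Av|| =", lhs, " ||A|| ||v|| =", rhs)
```

For `x_max` the two sides coincide, which illustrates why the largest absolute row sum is the value of the maximum in the definition of the matrix norm.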
Returning to the roundoff analysis for Gaussian elimination, it can be shown rigorously [11] that the computed solution z satisfies the perturbed equation (2.12) where
||ΔA|| ≤ γn u ||A||. (2.17)
As usual, u is the unit roundoff. The factor γn depends on n and can grow as fast as 2^(n-1). To put this in perspective, suppose that ΔA arises from rounding A to form machine numbers. Then |Δaij| could be as large as u|aij| and ||ΔA|| could be as large as
max_{1≤i≤n} Σ_{j=1}^{n} u|aij| = u ||A||.
According to the bounds, the perturbations due to the decomposition and forward/backward substitution process are at worst a factor of γn times the error made in the initial rounding of the entries of A. If the rigorous bound 2^(n-1) on γn truly reflected practice, we would have to resort to another algorithm for large n. Fortunately, for most problems γn is more like 10, independent of the size of n.
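The worst-case factor 2^(n-1) is tied to how much the elements can grow during elimination with partial pivoting. The sketch below builds a standard worst-case test matrix (ones on the diagonal and in the last column, -1 below the diagonal) and measures the growth of the largest element of U. This matrix is not taken from the text, and the function names are hypothetical. It is meant only to show that the bound can be attained, not that such growth is typical.

```python
def worst_case_matrix(n):
    """Test matrix: 1 on the diagonal and in the last column, -1 below the diagonal."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or j == n - 1:
                A[i][j] = 1.0
            elif i > j:
                A[i][j] = -1.0
    return A


def upper_factor(A):
    """Return U from Gaussian elimination with partial pivoting (A is not modified)."""
    n = len(A)
    U = [row[:] for row in A]
    for k in range(n - 1):
        # Pick the first row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(U[i][k]))
        U[k], U[p] = U[p], U[k]
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return U


def growth(A):
    """Ratio of the largest |u_ij| to the largest |a_ij|."""
    U = upper_factor(A)
    max_u = max(abs(u) for row in U for u in row)
    max_a = max(abs(a) for row in A for a in row)
    return max_u / max_a


for n in (2, 5, 10):
    print(n, growth(worst_case_matrix(n)), 2 ** (n - 1))
```

For this particular matrix the last column doubles at every elimination step, so the measured growth equals 2^(n-1).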
From this it can be concluded that Gaussian elimination practically always produces a solution z that is the exact solution of a problem close to the one posed. Since Az - b = -ΔA z, the residual r satisfies
||r|| ≤ ||ΔA|| ||z|| ≤ γn u ||A|| ||z||.
This says that the size of the residual is nearly always small relative to the sizes of A and z. However, recall that this does not imply that the actual error e is small.
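A small experiment in double precision illustrates both statements. The sketch below solves an ill-conditioned system by Gaussian elimination with partial pivoting and compares the residual, measured relative to ||A|| ||z||, with the relative error. The Hilbert matrix of order 8 is a hypothetical test problem, not one from the text. To keep the comparison honest, the residual and the reference solution are computed exactly with rational arithmetic applied to the stored floating point data.

```python
from fractions import Fraction


def solve(A, b):
    """Gaussian elimination with partial pivoting; works for floats or Fractions."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented copy of the system
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            m = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= m * M[k][j]
    x = [0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def max_norm(v):
    return max(abs(t) for t in v)


def matrix_norm(A):
    return max(sum(abs(a) for a in row) for row in A)


n = 8
A = [[1.0 / (i + j + 1) for j in range(n)] for i in range(n)]  # Hilbert matrix
b = [sum(row) for row in A]                                     # solution near (1, ..., 1)

z = solve(A, b)                                   # computed in double precision

Ax = [[Fraction(a) for a in row] for row in A]    # the stored data, exactly
bx = [Fraction(v) for v in b]
zx = [Fraction(v) for v in z]
x = solve(Ax, bx)                                 # exact solution of the stored system

r = [bx[i] - sum(Ax[i][j] * zx[j] for j in range(n)) for i in range(n)]
e = [x[i] - zx[i] for i in range(n)]

rel_residual = float(max_norm(r) / (matrix_norm(Ax) * max_norm(zx)))
rel_error = float(max_norm(e) / max_norm(x))
print("relative residual:", rel_residual)
print("relative error:   ", rel_error)
```

The relative residual should come out near the unit roundoff, while the relative error is expected to be many orders of magnitude larger because the matrix is nearly singular.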
For additional insight as to why Gaussian elimination tends to produce solutions with small residuals, think of the LU factorization of A discussed in Section 2.2. The forward substitution process used to solve the lower triangular system Ly = b successively computes y1, y2, ..., yn so as to make the residual zero. For example, regardless of the errors in y1 and m2,1, the value of y2 is computed so that m2,1 y1 + y2 = b2, that is, the residual of this equation is zero (in exact arithmetic) with this value of y2. The same thing happens in the back substitution process to compute xn, xn-1, ..., x1 that satisfy Ux = y. Thus, the very nature of the process responds to errors in the data in such a way as to yield a small residual. This is not at all true when x is computed by first calculating the inverse A^(-1) and then forming A^(-1) b.
With a little extra work it is possible to make Gaussian elimination stable in a very strong sense.
Suppose that we have solved Ax = b to obtain an approximate solution z. We can expect it to have some accuracy, although perhaps not all the accuracy possible in the precision used. A little manipulation shows that the error e = x - z satisfies A e = r, where r is the residual of the approximate solution z. We have seen that if Gaussian elimination is organized properly, it is inexpensive to solve this additional system of equations. Of course, we do not expect to solve it exactly either, but we do expect that the computed approximation d to the error in z will have some accuracy. If it does, w = z + d will approximate x better than z does. In principle this process, called iterative refinement, can be repeated to obtain an approximation to x correct in all its digits. The trouble in practice is that for the process to work as described, we have to have an accurate residual, and the better the approximate solution, the more difficult this is to obtain. Skeel [14] has shown that just one step of iterative refinement with the residual computed in the working precision will provide a computed solution that is very satisfactory. This solution will have a small residual and will satisfy exactly a system of equations with each coefficient differing slightly from that of the given system. This is much better than the result for z that states that the perturbation in a coefficient is small compared to the norm of the whole matrix, not that it is small compared to the coefficient itself. So, if we are concerned about the reliability of Gaussian elimination with partial pivoting, we could save copies of the matrix and right-hand side and perform one step of iterative refinement in the working precision to correct the result as necessary.
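One step of iterative refinement in the working precision might be organized as in the sketch below. It assumes the matrix and right-hand side have been saved, factors A once with partial pivoting, and reuses the factors to solve for the correction d. The function names `lu_factor`, `lu_solve`, and `refine_once` and the 3 × 3 test system are hypothetical, chosen only for illustration.

```python
def lu_factor(A):
    """Factor a copy of A with partial pivoting.

    Returns (LU, perm): the multipliers of L (unit diagonal implied) and the
    entries of U packed in one matrix, plus the final order of the rows.
    """
    n = len(A)
    LU = [row[:] for row in A]
    perm = list(range(n))
    for k in range(n - 1):
        p = max(range(k, n), key=lambda i: abs(LU[i][k]))
        LU[k], LU[p] = LU[p], LU[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            LU[i][k] /= LU[k][k]                 # store the multiplier
            for j in range(k + 1, n):
                LU[i][j] -= LU[i][k] * LU[k][j]
    return LU, perm


def lu_solve(LU, perm, b):
    """Solve A x = b using the factors from lu_factor."""
    n = len(LU)
    y = [b[p] for p in perm]                     # apply the row interchanges to b
    for i in range(n):                           # forward substitution
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    for i in range(n - 1, -1, -1):               # back substitution
        y[i] = (y[i] - sum(LU[i][j] * y[j] for j in range(i + 1, n))) / LU[i][i]
    return y


def refine_once(A, b):
    """Return the first solution z and the refined solution w = z + d."""
    LU, perm = lu_factor(A)
    z = lu_solve(LU, perm, b)
    # Residual computed in the working precision from the saved A and b.
    r = [bi - sum(a * zj for a, zj in zip(row, z)) for row, bi in zip(A, b)]
    d = lu_solve(LU, perm, r)                    # approximate error in z
    w = [zi + di for zi, di in zip(z, d)]
    return z, w


A = [[2.0, 1.0, -1.0],
     [-3.0, -1.0, 2.0],
     [-2.0, 1.0, 2.0]]
b = [8.0, -11.0, -3.0]    # exact solution is (2, 3, -1)

z, w = refine_once(A, b)
print("z =", z)
print("w =", w)
```

Because the factorization is reused, the refinement step costs only one residual evaluation and one extra pair of triangular solves.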