Figure 4.2 Graphical interpretation of roots.
can be circumvented by calculating the roots of the scaled function F(x) = x f(x). It must be kept in mind that, as in this example, F(x) has all the roots of f(x), but it might pick up additional roots from g(x). A more substantial example is furnished by an equation to be solved in an exercise:
This function has a simple pole at all the points where cos(x) = cos(π/10) and an apparent singularity at x = 0. Scaling this function with g(x) =
makes computing the roots more straightforward.
Sometimes a natural measure of scale is supplied by a coefficient in the equation. An example is provided by the family of problems f(x) = F(x) - γ with γ > 0. Just as when solving linear equations, the residual r = f(z) = F(z) - γ can be used in a backward error analysis. Obviously z is the exact solution of the problem 0 = F(x) - γ´ , where γ´ = γ + r. If |r| is small compared to |γ|, then z is the exact solution of a problem close to the given problem. For such problems we have a reasonable way to specify how small the residual ought to be.
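The backward error check just described can be sketched in a few lines of Python. The problem F(x) = x^2 with γ = 2, the approximate solution z = 1.4142, and the helper name `backward_error` are illustrative assumptions, not taken from the text.

```python
def backward_error(F, gamma, z):
    """Return (r, gamma_prime, |r|/|gamma|) for the problem F(x) - gamma = 0."""
    r = F(z) - gamma            # residual of the approximate solution z
    gamma_prime = gamma + r     # z solves F(x) - gamma_prime = 0 exactly
    return r, gamma_prime, abs(r) / abs(gamma)


# Illustrative problem (an assumption): F(x) = x^2, gamma = 2, z = 1.4142.
r, gamma_prime, rel = backward_error(lambda x: x * x, 2.0, 1.4142)
print(f"r = {r:.3e}, gamma' = {gamma_prime:.8f}, |r|/|gamma| = {rel:.3e}")
```

A small value of |r|/|γ| says that z solves a problem close to the one posed, whether or not z itself is close to the root.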
4.1
BISECTION, NEWTON’S METHOD, AND THE SECANT RULE
If a continuous function f(x) has opposite signs at points x = B and x = C, then it has at least one zero in the interval with endpoints B and C. The method of bisection (or binary search) is based on this fact. If f(B) f(C) < 0, the function f(x) is evaluated at the midpoint M = (B + C)/2 of the interval. If f(M) = 0, a zero has been found. Otherwise, either f(B) f(M) < 0 or f(M) f(C) < 0. In the first case there is at least one zero between M and B, as in Figure 4.2, and in the second case there is at least one zero between C and M. In this way an interval containing a root is found that has half the length of the original interval. The procedure is repeated until a root is located to whatever accuracy is desired.
In algorithmic form we have the bisection method:
    until |B - C| is sufficiently small or f(M) = 0 begin
        M := (B + C)/2
        if f(B) f(M) < 0 then
            C := M
        else
            B := M
    end until.
Example 4.1. When f(x) = x^2 - 2, the equation (4.1) has the simple root α = √2 ≈ 1.414214. For B = 0, C = 6, the bisection method produces [note: 0.16 (+01) means 0.16 × 10^1]
Note the erratic behavior of the error, although the interval width |B - C| is halved at each step. ■
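The iteration of Example 4.1 can be regenerated with a short Python sketch. The function name `bisect`, the tolerance, and the step limit are assumptions made for illustration; the sign comparison and the midpoint formula are implementation choices, not the book's code.

```python
import math


def bisect(f, b, c, tol=1e-6, max_steps=200):
    """Bisection on a bracket [b, c]; returns the list of midpoints tried."""
    fb, fc = f(b), f(c)
    if fb == 0.0:
        return [b]
    if fc == 0.0:
        return [c]
    if (fb < 0.0) == (fc < 0.0):
        raise ValueError("f(b) and f(c) must have opposite signs")
    midpoints = []
    for _ in range(max_steps):
        m = b + (c - b) / 2.0
        fm = f(m)
        midpoints.append(m)
        if fm == 0.0 or abs(c - b) <= tol:
            break
        if (fb < 0.0) != (fm < 0.0):   # sign change between b and m
            c, fc = m, fm
        else:                          # sign change between m and c
            b, fb = m, fm
    return midpoints


alpha = math.sqrt(2.0)
mids = bisect(lambda x: x * x - 2.0, 0.0, 6.0)
for i, m in enumerate(mids[:8], start=1):
    print(f"{i:2d}  M = {m:.6f}  error = {alpha - m: .2e}")
```

The printed errors change sign and size irregularly from step to step even though the bracket width is halved each time.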
Bisection is often presented in programming books in this manner because it is a numerical algorithm that is both simple and useful. A more penetrating study of the method will make some points important to understanding many methods for computing zeros, points that we require as we develop an algorithm that attempts to get the best from several methods.
An interval [B,C] with f(B) f(C) < 0 is called a bracket. A graphical interpretation tells us somewhat more than just that f(x) has a root in the interval. Zeros of even multiplicity between B and C do not cause a sign change and zeros of odd multiplicity do. If there were an even number of zeros of odd multiplicity between B and C, the sign changes would cancel out and f would have the same sign at both ends. Thus, if f(B) f(C) < 0, there must be an odd number of zeros of odd multiplicity and possibly
some zeros of even multiplicity between B and C. If we agree to count the number of zeros according to their multiplicity (i.e., a zero of multiplicity m counts as m zeros), then we see that there are an odd number of zeros between B and C.
A careful implementation of bisection takes into account a number of matters raised in Chapter 1. There is a test for values of f that are exactly zero; the test for a change of sign is not programmed as a test of f(B) f(C) < 0 because of the potential for underflow of the product; and the midpoint is computed as M = B + (C - B)/2 because it is just as easy to compute and more accurate than M = (B + C)/2.
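Both precautions can be illustrated with a small Python sketch. The helper name `opposite_signs`, the particular tiny function values, and the use of three-digit decimal arithmetic (via the standard `decimal` module) are assumptions chosen to make the effects visible.

```python
from decimal import Decimal, localcontext


def opposite_signs(u, v):
    """True when u and v are nonzero and differ in sign; no product is formed."""
    return u != 0.0 and v != 0.0 and (u < 0.0) != (v < 0.0)


# Sign test: the product of two tiny values underflows to -0.0,
# so the test "product < 0" misses a genuine change of sign.
fb, fc = 1e-200, -1e-200
product = fb * fc
product_test = product < 0.0
sign_test = opposite_signs(fb, fc)
print(product_test, sign_test)

# Midpoint: in three-digit decimal arithmetic, (B + C)/2 can fall
# outside [B, C], while B + (C - B)/2 stays inside.
with localcontext() as ctx:
    ctx.prec = 3
    b, c = Decimal("0.982"), Decimal("0.984")
    naive_mid = (b + c) / 2
    safe_mid = b + (c - b) / 2
print(naive_mid, safe_mid)
```

In the second part the sum B + C is rounded to three digits before it is halved, which pushes the computed midpoint past C.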
We often try to find an approximate root z for which f(z) is as small as possible. In attempting this, the finite word length of the computer must be taken into account and so must the details of the procedure for evaluating f. Eventually even the sign of the computed value may be incorrect. This is what is meant by limiting precision. Figure 1.2 shows the erratic size and sign of function values when the values are so small that the discrete nature of the floating point number system becomes important. If a computed function value has the wrong sign because the argument is very close to a root, it may happen that the bracket selected in bisection does not contain a root. Even so, the approximations computed thereafter will stay in the neighborhood of the root. It is usually said that a bisection code will produce an interval [B,C] of specified length that contains a root because f(B) f(C) < 0. This is superficial. It should be qualified by saying that either this is true, or a root has been found that is as accurate as the precision allows. The qualification “as accurate as the precision allows” means here that either the computed f(z) vanishes, or that one of the computed values f(B), f(C) has the wrong sign.
A basic assumption of the bisection method is that f(x) is continuous. It should be no surprise that the method can fail when this is not the case. Because a bisection code pays no attention to the values of the function, it cannot tell the difference between a pole of odd multiplicity and a root of odd multiplicity [unless it attempts to evaluate f(x) exactly at a pole and there is an overflow]. So, for example, if a bisection code is given the function tan(x) and asked to find the root in [5,7], it will have no difficulty. If asked to find the root in [4,7], it will not realize there is a root in the interval because the sign change due to the simple pole cancels out the sign change due to the simple root. And, what is worse, if asked to find a root in [4,5], it will locate a pole or cause an overflow. We see here another reason for scaling: removing odd-order poles by scaling removes the sign changes that might cause bisection to locate a pole rather than a zero. Here this is done by F(x) = cos(x) tan(x) = sin(x). One of the examples of scaling given earlier is a less trivial illustration of the point. Because of the very real possibility of computing a pole of odd multiplicity, it is prudent when using a bisection code to inspect the residual f(z) of an alleged root z. It would be highly embarrassing to claim that z results in a very small value of f(z) when it actually results in a very large value!
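The three tan(x) cases can be tried with a minimal Python sketch. The function name `bisect` and the tolerance are assumptions; the code uses only the signs of the function values, as the text describes, and prints the residual so that a pole can be recognized.

```python
import math


def bisect(f, b, c, tol=1e-10):
    """Plain bisection using only the signs of f; None if no sign change."""
    fb, fc = f(b), f(c)
    if (fb < 0.0) == (fc < 0.0):
        return None
    while abs(c - b) > tol:
        m = b + (c - b) / 2.0
        fm = f(m)
        if fm == 0.0:
            return m
        if (fb < 0.0) != (fm < 0.0):
            c = m
        else:
            b, fb = m, fm
    return b + (c - b) / 2.0


for interval in [(5.0, 7.0), (4.0, 7.0), (4.0, 5.0)]:
    z = bisect(math.tan, *interval)
    if z is None:
        print(interval, "no sign change detected")
    else:
        print(interval, f"z = {z:.8f}, residual tan(z) = {math.tan(z):.3e}")
```

On [5,7] the result is close to 2π with a tiny residual; on [4,5] the result is close to 3π/2, where the residual is enormous, which is how the pole gives itself away.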
A bisection code can converge to a pole because it makes no use of the value f(M), just its sign. Because of this its rate of convergence is the same whether the root is simple or not and whether the function is smooth or not. Other methods converge much faster when the root is simple and the function is smooth, but they do not work so well when this is not the case.
Bisection has a number of virtues. Provided that an initial bracket can be found, it will converge no matter how large the initial interval known to contain a root. It is easy to decide reliably when the approximation is good enough. It converges reasonably fast and the rate of convergence is independent of the multiplicity of the root and the smoothness of the function. The method deals well with limiting precision.
Bisection also has some drawbacks. If there are an even number of zeros between B and C, it will not realize that there are any zeros at all because there is no sign change. In particular, it is not possible to find a zero of even multiplicity except by accident. It can be fooled by poles. A major disadvantage is that for simple zeros, which seem to be the most common by far, there are methods that converge much more rapidly. There is no way to be confident of calculating a particular root nor of getting all the roots. This is troublesome with all the methods, but some are (much) better at computing the root closest to a guessed value. Bisection does not generalize to functions of a complex variable nor easily to functions of several variables.
Let us now take up two methods that are superior to bisection in some, although not all, of these respects. Both approximate f(x) by a straight line L(x) and then approximate a root of f(x) = 0 by a root of L(x) = 0.
Newton’s method (Figure 4.3) will be familiar from calculus. It takes L(x) as the line tangent to f(x) at the latest approximation x_i, and the next approximation (iterate) is the root x_{i+1} of L(x) = 0. Equivalently, approximating f(x) by the linear terms of a Taylor’s series about x_i,

$$
f(x) \approx L(x) = f(x_i) + f'(x_i)(x - x_i),
$$

suggests solving

$$
0 = f(x_i) + f'(x_i)(x - x_i)
$$

for its root x_{i+1} to approximate α [assuming that f´(x_i) ≠ 0]. The resulting method is known as

Newton's method:

$$
x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}. \tag{4.5}
$$
When it is inconvenient or expensive to evaluate f´(x), a related procedure called the secant rule is preferred because it uses only values of f(x). Let L(x) be the secant line that interpolates f(x) at the two approximations x_{i-1}, x_i:

$$
L(x) = f(x_i) + \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}\,(x - x_i).
$$

The next approximation x_{i+1} is taken to be the root of L(x) = 0. Hence, assuming that f(x_i) ≠ f(x_{i-1}), we have the

secant rule:

$$
x_{i+1} = x_i - f(x_i)\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}. \tag{4.6}
$$
Figure 4.3 Newton’s method.
The method is illustrated graphically in Figure 4.4. Although a picture furnishes a natural motivation for the method, an alternative approach is to approximate the derivative in Newton’s method (4.5) by a difference quotient to get (4.6).
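The substitution can be written out in one line. Taking the difference quotient over the two most recent iterates (the natural choice, stated here as an assumption about which quotient is meant) and inserting it into Newton's formula gives the secant formula:

```latex
f'(x_i) \approx \frac{f(x_i) - f(x_{i-1})}{x_i - x_{i-1}}
\quad\Longrightarrow\quad
x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}
\approx x_i - f(x_i)\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}.
```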
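A little analysis shows that Newton’s method and the secant rule converge much faster than bisection for a simple root of (4.1). Considering first Newton’s method, we have from (4.5)

$$
\alpha - x_{i+1} = \alpha - x_i + \frac{f(x_i)}{f'(x_i)}.
$$

If x_i is near α, then

$$
f(x_i) \approx f(\alpha) + f'(\alpha)(x_i - \alpha) + \frac{f''(\alpha)}{2}(x_i - \alpha)^2,
\qquad
f'(x_i) \approx f'(\alpha) + f''(\alpha)(x_i - \alpha).
$$

Now f(α) = 0 and f´(α) ≠ 0 for a simple root, so

$$
\alpha - x_{i+1} \approx -\frac{f''(\alpha)}{2 f'(\alpha)}\,(\alpha - x_i)^2.
$$

It is seen that if x_i is near a simple root, the error in x_{i+1} is roughly a constant multiple of the square of the error in x_i. This is called quadratic convergence.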
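A similar look at the secant rule (4.6) leads to

$$
\alpha - x_{i+1} \approx -\frac{f''(\alpha)}{2 f'(\alpha)}\,(\alpha - x_i)(\alpha - x_{i-1}).
$$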
Figure 4.4 Secant rule.
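This method does not converge as fast as Newton’s method, but it is much faster than bisection. For both methods it can be shown that if the starting values are sufficiently close to a simple root and f(x) is sufficiently smooth, the iterates will converge to that root. For Newton’s method,

$$
\lim_{i \to \infty} \frac{|\alpha - x_{i+1}|}{|\alpha - x_i|^2} = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right|,
$$

and for the secant rule,

$$
\lim_{i \to \infty} \frac{|\alpha - x_{i+1}|}{|\alpha - x_i|\,|\alpha - x_{i-1}|} = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right|.
$$

A careful treatment of the secant rule even shows that

$$
\lim_{i \to \infty} \frac{|\alpha - x_{i+1}|}{|\alpha - x_i|^p} = \left| \frac{f''(\alpha)}{2 f'(\alpha)} \right|^{1/p},
$$

where p = (1 + √5)/2 ≈ 1.618.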
Example 4.2. As in Example 4.1, let f(x) = x^2 - 2. An easy calculation shows that for the secant rule started with x_1 = 3 and x_2 = 2,
and for Newton’s method started with x_1 = 3,
Both methods converge quite rapidly and the quadratic convergence of Newton’s method is apparent. Comparison with the bisection method of Example 4.1 shows the superiority of the secant rule and Newton’s method (for this problem). ■
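The iterates of Example 4.2 can be regenerated with a short Python sketch. The function names `newton` and `secant` and the step counts are assumptions made for illustration; the update formulas are the usual Newton and secant steps for f(x) = x^2 - 2.

```python
import math


def newton(f, df, x, steps):
    """Newton's method: return the iterates x_1, ..., x_{steps+1}."""
    xs = [x]
    for _ in range(steps):
        x = x - f(x) / df(x)
        xs.append(x)
    return xs


def secant(f, x_prev, x, steps):
    """Secant rule: return the iterates, starting with x_1 and x_2."""
    xs = [x_prev, x]
    for _ in range(steps):
        f_prev, f_cur = f(x_prev), f(x)
        if f_cur == f_prev:          # the secant step is undefined; stop
            break
        x_prev, x = x, x - f_cur * (x - x_prev) / (f_cur - f_prev)
        xs.append(x)
    return xs


f = lambda x: x * x - 2.0
df = lambda x: 2.0 * x
alpha = math.sqrt(2.0)

secant_xs = secant(f, 3.0, 2.0, 6)
newton_xs = newton(f, df, 3.0, 5)
for name, xs in [("secant", secant_xs), ("Newton", newton_xs)]:
    print(name)
    for i, x in enumerate(xs, start=1):
        print(f"  x_{i} = {x:.10f}   error = {alpha - x: .2e}")
```

For Newton's method the exponent of the printed error roughly doubles at each step once the iterates are close to the root.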
If an iteration is such that

$$
\lim_{i \to \infty} \frac{|\alpha - x_{i+1}|}{|\alpha - x_i|^r} = \gamma \neq 0,
$$

the method is said to converge at rate r with constant γ. It has been argued that for a simple root, Newton’s method converges at the rate r = 2 and it has been stated that the secant rule converges at the rate r = p ≈ 1.618. Bisection does not fit into this framework; the widths of the bracketing intervals are halved at every step, but nothing can be said about

$$
\frac{|\alpha - x_{i+1}|}{|\alpha - x_i|}
$$

(see Example 4.1).
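The rates can be estimated numerically. The sketch below is an illustration under stated assumptions: it uses the common estimate r ≈ ln(e_{i+1}/e_i) / ln(e_i/e_{i-1}) with e_i = |α - x_i|, which is not from the text, and it works in 300-digit decimal arithmetic so that rounding does not mask the very small errors. The name `estimated_rate` and the step counts are hypothetical.

```python
from decimal import Decimal, getcontext

getcontext().prec = 300           # many digits, so rounding does not mask the errors

TWO = Decimal(2)
alpha = TWO.sqrt()


def f(x):
    return x * x - TWO


def estimated_rate(xs):
    """Estimate r from the last three errors: ln(e3/e2) / ln(e2/e1)."""
    e1, e2, e3 = (abs(alpha - x) for x in xs[-3:])
    return float((e3 / e2).ln() / (e2 / e1).ln())


# Newton's method from x_1 = 3 (7 steps).
newton_xs = [Decimal(3)]
for _ in range(7):
    x = newton_xs[-1]
    newton_xs.append(x - f(x) / (TWO * x))

# Secant rule from x_1 = 3, x_2 = 2 (10 steps).
secant_xs = [Decimal(3), Decimal(2)]
for _ in range(10):
    x0, x1 = secant_xs[-2:]
    secant_xs.append(x1 - f(x1) * (x1 - x0) / (f(x1) - f(x0)))

newton_rate = estimated_rate(newton_xs)
secant_rate = estimated_rate(secant_xs)
print("Newton rate estimate:", newton_rate)
print("secant rate estimate:", secant_rate)
```

The estimates come out close to 2 for Newton's method and close to 1.618 for the secant rule on this problem.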
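The secant rule is a principal part of the code Zero developed in this chapter, so we now state conditions that guarantee its convergence to a simple root of (4.1) and study how fast it converges. As a first step we derive an expression that relates the function values at three successive iterates x_{i-1}, x_i, x_{i+1}. Let L(x) be the polynomial of degree 1 interpolating f(x) on the set {x_i, x_{i-1}}. The iterate x_{i+1} is the zero of L(x). The error of linear interpolation is

$$
f(x) - L(x) = \frac{f''(\xi)}{2}\,(x - x_i)(x - x_{i-1}),
$$

which in this case is

$$
f(x_{i+1}) - L(x_{i+1}) = \frac{f''(\xi)}{2}\,(x_{i+1} - x_i)(x_{i+1} - x_{i-1}),
$$

or, since L(x_{i+1}) = 0,

$$
f(x_{i+1}) = \frac{f''(\xi)}{2}\,(x_{i+1} - x_i)(x_{i+1} - x_{i-1}) \tag{4.8}
$$

for a suitable (unknown) point ξ. Some manipulation of equation (4.6) gives the two relations

$$
x_{i+1} - x_i = -f(x_i)\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}, \tag{4.9}
$$

$$
x_{i+1} - x_{i-1} = -f(x_{i-1})\,\frac{x_i - x_{i-1}}{f(x_i) - f(x_{i-1})}. \tag{4.10}
$$

A third relation is obtained from the mean value theorem for derivatives:

$$
f(x_i) - f(x_{i-1}) = f'(\zeta)\,(x_i - x_{i-1}), \tag{4.11}
$$

where ζ, a point between x_i and x_{i-1}, is unknown. Combining equations (4.8)–(4.11), we arrive at

$$
f(x_{i+1}) = \frac{f''(\xi)}{2\,[f'(\zeta)]^2}\, f(x_i)\, f(x_{i-1}).
$$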
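The combined relation f(x_{i+1}) = f(x_i) f(x_{i-1}) f''(ξ) / (2 [f'(ζ)]^2) can be checked exactly on a simple case. The sketch below is an illustration, assuming f(x) = x^2 - 2, for which f'' = 2 everywhere and the mean value point is ζ = (x_i + x_{i-1})/2; it uses Python's `fractions` module so that no rounding occurs. The starting values and the list name `checks` are hypothetical.

```python
from fractions import Fraction


def f(x):
    return x * x - 2


checks = []
x_prev, x_cur = Fraction(3), Fraction(2)
for _ in range(4):
    # Secant step in exact rational arithmetic.
    x_next = x_cur - f(x_cur) * (x_cur - x_prev) / (f(x_cur) - f(x_prev))
    # For this quadratic, f'' = 2 and f'(zeta) = 2*zeta with zeta the midpoint.
    zeta = (x_cur + x_prev) / 2
    predicted = f(x_cur) * f(x_prev) * 2 / (2 * (2 * zeta) ** 2)
    checks.append(f(x_next) == predicted)
    print(float(x_next), checks[-1])
    x_prev, x_cur = x_cur, x_next
```

For a general f the points ξ and ζ are unknown, so the relation is used only through bounds on the derivatives.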
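Let us assume that on an appropriate interval we have

$$
|f''(x)| \le M_2, \qquad |f'(x)| \ge m_1 > 0, \tag{4.12}
$$

and that we are computing a simple zero α. (Why must it be simple with these hypotheses?) Then these bounds and the expression above for f(x_{i+1}) imply that

$$
|f(x_{i+1})| \le \frac{M_2}{2 m_1^2}\,|f(x_i)|\,|f(x_{i-1})|.
$$

If we let

$$
\varepsilon_i = \frac{M_2}{2 m_1^2}\,|f(x_i)|,
$$

this inequality leads to

$$
\varepsilon_{i+1} \le \varepsilon_i\,\varepsilon_{i-1}.
$$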
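Supposing that ε = max(ε_1, ε_2) satisfies 0 < ε < 1, it is easy to argue by induction that

$$
\varepsilon_i \le \varepsilon^{\delta_i},
$$

where

$$
\delta_1 = \delta_2 = 1, \qquad \delta_{i+1} = \delta_i + \delta_{i-1}, \qquad \text{so that} \quad \delta_i = \frac{1}{\sqrt{5}} \left[ p^{\,i} - \left( -\frac{1}{p} \right)^{i} \right].
$$

The formal proof is left as an exercise. Since

$$
\left| -\frac{1}{p} \right| \approx 0.618 < 1,
$$

we see that for i large,

$$
\delta_i \approx \frac{p^{\,i}}{\sqrt{5}}.
$$

In any event, δ_i → ∞, and since 0 < ε < 1, we must have ε_i → 0, which is what we wanted to prove. Let us now state a formal theorem and complete the details of its proof.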