
the exception flags. If an exception occurred, it resets the exception flags and restores the sum. Eventually, the operation will be forwarded to the expression dag. In this way, floating-point exceptions do not affect the correctness of a computation and are invisible to the user, as intended.

Double_sum_warning_protection (warnP): This model resets the exception flags prior to an operation and checks them afterwards. If an exception occurs, an error handler is called. Unlike the previous model, this one is free from false positives, but it requires user interaction to handle the situation.

Double_sum_no_protection (noP): This model does not protect from floating-point exceptions in any way. It allows us to determine the cost of the former alternatives.

Since accessing exception flags and making backup copies can be quite expensive, we do not employ the protection mechanisms indiscriminately. For example, addition and subtraction in plaiO do not perform floating-point operations and are therefore free from floating-point exceptions. The same holds for the sign computation in expaO. Therefore, all operations in DoubleSumOperations are accompanied by a tag indicating whether floating-point exceptions may actually occur, which allows protection from floating-point exceptions to be enabled or disabled at compile time.

No overflow in Compress. Another operation that is free from floating-point exceptions in our case is the Compress algorithm in expaO. Compress uses only floating-point additions and subtractions; it is thus unaffected by underflow. Less obviously, it is free from overflow when run on a strongly non-overlapping expansion. Intuitively, compression does not change the value of the sum and, due to the non-overlapping property, all bits, and in particular the largest bit, are already present in the input expansion. Overflow can only occur when Compress generates non-zero bits larger than those present in the input expansion.

Lemma 5.1. Assume floating-point arithmetic over $F$, with rounding to nearest and tie-breaking to even. Let $e = e_1, e_2, \ldots, e_m$ be a strongly non-overlapping expansion. Then no overflow occurs in the call Compress($e$).

This property does not hold for non-overlapping expansions. If we consider the sequence of bits of a strongly non-overlapping expansion, there must be a zero bit at least every $p+1$ bits [104]. When Compress is run on the largest strongly non-overlapping expansion over $F$, no overflow occurs. But if the number is made just a bit larger, i.e., one of the zero bits is set to non-zero and all less significant bits are set to zero, overflow does occur. See Figure 5.2 for an illustration. Our proof makes explicit use of tie-breaking to even. Intuitively, the zero bit every $p+1$ bits catches any carry coming from less significant summands, thus avoiding a carry on the largest summand. The tie-breaking to even rule places this zero bit conveniently at the transition between two summands, which we exploit in our proof. We conjecture that a proof is possible without appealing to the tie-breaking rule. But since tie-breaking to even is needed for strongly non-overlapping expansions and FastExpansionSum anyway, this is not relevant in our context.

Figure 5.2. In Compress, no overflow occurs on the upper expansion, but overflow does occur on the lower one. [Bit diagram of the two expansions not reproduced.]

Proof. To prove Lemma 5.1, we consider a recursive formulation of Compress that allows an inductive argument. The base case occurs when $\sum_{i=1}^{m} e_i$ is a floating-point number. Otherwise, Compress sums $e_m, e_{m-1}, \ldots$ iteratively until the result is not a floating-point number for the first time. Let this occur when adding $e_j$, and let $Q = \sum_{i=j+1}^{m} e_i$. Then we have $g_m$ and $q$ with

$$g_m = Q \oplus e_j = \operatorname{fl}\Bigl(\sum_{i=j}^{m} e_i\Bigr), \qquad g_m + q = \sum_{i=j}^{m} e_i.$$

Compress keeps $g_m$ as an intermediate summand and recursively compresses the sequence $e_1, e_2, \ldots, e_{j-1}, q$, obtaining as output the sequence $f_1, f_2, \ldots, f_{l-2}, f_{l-1}$. Then it computes $f_l$ and $q'$ with

$$f_l = g_m \oplus f_{l-1}, \qquad f_l + q' = g_m + f_{l-1}.$$

Compress returns one of the expansions $f_1, f_2, \ldots, f_{l-2}, q', f_l$ or $f_1, f_2, \ldots, f_{l-2}, f_l$, depending on whether $q'$ is non-zero. The computation of both $g_m$ and $f_l$ is at risk of overflow. We show by induction on the recursive calls that $\operatorname{msb}(f_l) \le \operatorname{msb}(e_m)$ and infer as a side effect that no overflow occurs. We can assume that no input summand is zero, since a zero summand simply results in an iteration of the first loop that does not change any values.

The sequence $|e_j|, |e_{j+1}|, \ldots, |e_m|$ is a strongly non-overlapping expansion, too. Hence, the binary representation of $E = \sum_{i=j}^{m} |e_i|$ contains a zero bit at least every $p+1$ bits. Together with $\operatorname{msb}(E) = \operatorname{msb}(e_m)$ this yields $E < \operatorname{msb}(e_m)(2 - \varepsilon_m)$. In total we have

$$\Bigl|\sum_{i=j}^{m} e_i\Bigr| < \operatorname{msb}(e_m)(2 - \varepsilon_m).$$

The right-hand side $\operatorname{msb}(e_m)(2 - \varepsilon_m)$ is the smallest number which might be rounded to $2\operatorname{msb}(e_m)$ (or $+\infty$, if $2\operatorname{msb}(e_m) \notin F$). Therefore, the sum is rounded towards zero, no overflow occurs in the computation of $g_m$, and $\operatorname{msb}(g_m) \le \operatorname{msb}(e_m)$. Furthermore,

Figure 5.3. A critical case in the proof of Lemma 5.1. [Bit diagram of $Q$, $e_{j+1}$, $e_j$, $e_{j-1}$ and $e_{j-2}$ not reproduced.]

This discussion also settles the base case of the induction, when $\sum_{i=1}^{m} e_i$ itself is a floating-point number and $g_m$ is returned as the compression result.

Now we turn to the induction step. We claim that the sequence $e_1, e_2, \ldots, e_{j-1}, q$ is a strongly non-overlapping expansion. If $e_j$ is not adjacent to $e_{j-1}$, then neither is $q$, because $q \in \operatorname{lsb}(e_j)\mathbb{Z}$ by Equation (2.9). The claim holds in this case. If $e_j$ is adjacent to $e_{j-1}$, then both are powers of two. Now consider the computation of $Q \oplus e_j$. The situation is visualized in Figure 5.3. First, $Q$ is not adjacent to $e_j$, because $e_j$ and $e_{j+1}$ are non-adjacent and $Q \in \operatorname{lsb}(e_{j+1})\mathbb{Z}$. We have $\operatorname{msb}(Q + e_j) \ge \varepsilon_m^{-1} e_j$, because otherwise

$$\operatorname{msb}(Q + e_j) \le \tfrac{1}{2}\varepsilon_m^{-1} e_j = \tfrac{1}{2}\varepsilon_m^{-1} \operatorname{lsb}(Q + e_j)$$

and $Q + e_j \in F$, which contradicts the choice of $j$. With tie-breaking to even, we have $g_m = Q$ and $q = e_j$. This shows that $e_1, e_2, \ldots, e_{j-1}, q$ is a strongly non-overlapping expansion in this case, too.

By induction, the recursive application of Compress gives us a non-adjacent expansion $f_1, f_2, \ldots, f_{l-1}$ with $\operatorname{msb}(f_{l-1}) \le \operatorname{msb}(q)$. We compute $f_l = g_m \oplus f_{l-1}$. Because $g_m$ and $q$ are non-adjacent, $g_m$ and $f_{l-1}$ are non-adjacent, too. But then, by the same reasoning applied to the computation of $g_m$ above,

$$|g_m + f_{l-1}| < \operatorname{msb}(g_m)(2 - \varepsilon_m),$$

no overflow occurs in the computation of $f_l$, and $\operatorname{msb}(f_l) \le \operatorname{msb}(g_m) \le \operatorname{msb}(e_m)$.

With this result, we need no protection from floating-point exceptions for the compression step in expaO or protO.

Avoiding Floating-Point Exceptions. A common strategy in practice is to avoid overflow and underflow in the first place, by detecting that the input consists of very small or very large numbers and taking appropriate action, e.g., rescaling the input numbers to a safe range or at least warning the user.
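A range check of this kind could be sketched as follows. The helper names `exponents_within` and `rescale` and the exponent bounds chosen by the caller are assumptions for illustration; which range is actually safe depends on the computation that follows.

```cpp
#include <cmath>
#include <vector>

// True if every non-zero input is finite and has a binary exponent
// in [min_exp, max_exp], where x = f * 2^e with 0.5 <= |f| < 1.
inline bool exponents_within(const std::vector<double>& xs, int min_exp, int max_exp) {
    for (double x : xs) {
        if (x == 0.0) continue;           // zero has no meaningful exponent
        if (!std::isfinite(x)) return false;
        int e = 0;
        std::frexp(x, &e);
        if (e < min_exp || e > max_exp) return false;
    }
    return true;
}

// Scale all inputs by 2^shift. Scaling by a power of two is exact
// as long as the result neither overflows nor becomes subnormal.
inline void rescale(std::vector<double>& xs, int shift) {
    for (double& x : xs) x = std::ldexp(x, shift);
}
```

A caller would test the input once, rescale or warn if the test fails, and only then run the unprotected arithmetic.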
