slides_pc17_carson.pdf

(1)

Preconditioned GMRES-based

Iterative Refinement

Erin Carson

and Nicholas J. Higham

for the Solution of Sparse, Ill-Conditioned Linear Systems

New York University University of Manchester

August 2, 2017

(2)

Iterative Refinement for

𝐴𝑥 = 𝑏

𝐴

is

𝑛 × 𝑛

, nonsingular

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

(3)

Iterative Refinement for

𝐴𝑥 = 𝑏

𝐴

is

𝑛 × 𝑛

, nonsingular

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

Solve

𝐴𝑑

_𝑖

= 𝑟

_𝑖

𝑥

_𝑖+1

= 𝑥

_𝑖

+ 𝑑

_𝑖

via

𝑑

_𝑖

= 𝑈

−1

(𝐿

−1

𝑟

_𝑖

)

(4)

Notation/Setting

• Assume standard floating point arithmetic

• 𝑢

denotes unit roundoff

• "Gamma notation":

𝛾

_𝑘

=

𝑘𝑢

1−𝑘𝑢

• Condition numbers

• 𝐴 = |(𝑎

_𝑖𝑗

)|

• 𝜅

_∞

𝐴 = 𝐴

−1 _∞

𝐴

_∞

• cond(

𝐴, 𝑥

)

=

𝐴−1 𝐴 𝑥 ∞

𝑥 _∞

• cond(

𝐴

) = cond(

𝐴, 𝑒

) =

𝐴

−1

𝐴

_∞

(5)

Error Bounds ("Traditional" IR)

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

𝑑

_𝑖

= 𝑈

−1

(𝐿

−1

𝑟

_𝑖

)

𝑥

_𝑖+1

= 𝑥

_𝑖

+ 𝑑

_𝑖

(6)

Error Bounds ("Traditional" IR)

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

𝑑

_𝑖

= 𝑈

−1

(𝐿

−1

𝑟

_𝑖

)

𝑥

_𝑖+1

= 𝑥

_𝑖

+ 𝑑

_𝑖

precision

𝑢

precision

𝑢

2

precision

𝑢

(7)

Error Bounds ("Traditional" IR)

• Early analyses by Wilkinson (1963), Moler (1967)

• If

𝜅

_∞

𝐴 𝑢 < 1

, then error contracts (at a rate depending on

𝜅

_∞

(𝐴)

) until

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

𝑑

_𝑖

= 𝑈

−1

(𝐿

−1

𝑟

_𝑖

)

𝑥

_𝑖+1

= 𝑥

_𝑖

+ 𝑑

_𝑖

precision

𝑢

precision

𝑢

2

precision

𝑢

precision

𝑢

𝑥 − 𝑥

_{𝑖 ∞}

𝑥

_∞

≈ 𝑢

(8)

Information in

𝐿 𝑈 ≈ 𝐴

(9)

Information in

𝐿 𝑈 ≈ 𝐴

• Empirically observed by Rump (1990) that if 𝐿 and 𝑈 are computed LU factors of 𝐴 from GEPP, then 𝜅 𝑈−1 𝐿−1𝐴 ≈ 1 + 𝜅 𝐴 𝑢

• Even if 𝜅 𝐴 ≫ 𝑢−1

(10)

Information in

𝐿 𝑈 ≈ 𝐴

• Empirically observed by Rump (1990) that if 𝐿 and 𝑈 are computed LU factors of 𝐴 from GEPP, then 𝜅 𝑈−1 𝐿−1𝐴 ≈ 1 + 𝜅 𝐴 𝑢

• Even if 𝜅 𝐴 ≫ 𝑢−1

A =

gallery('randsvd', n,10^(n+5))

A = invhilb(n)

(11)

New Analysis Summary

• New rounding error analysis of IR

• Identifies a mechanism by which iterative refinement can work

when

𝜅

_∞

𝐴 > 𝑢

−1

(12)

New Analysis Summary

• New rounding error analysis of IR

• Identifies a mechanism by which iterative refinement can work

when

𝜅

_∞

𝐴 > 𝑢

−1

• Requires that we can solve the equations for the updates

𝑑

_𝑖

with some relative accuracy

• Accomplished by using existing LU factors as

(13)

New Analysis Summary

• New rounding error analysis of IR

• Identifies a mechanism by which iterative refinement can work

when

𝜅

_∞

𝐴 > 𝑢

−1

• Requires that we can solve the equations for the updates

𝑑

_𝑖

with some relative accuracy

• Accomplished by using existing LU factors as

preconditioners in GMRES method

⟹

GMRES-IR

• Even when

𝜅

_∞

𝐴 ≳ 𝑢

−1

, GMRES-IR produces

𝑥

for which

𝑥 − 𝑥

_∞

𝑥

_∞

≈ 𝑢

(14)

New Analysis Summary

• New rounding error analysis of IR

• Identifies a mechanism by which iterative refinement can work

when

𝜅

_∞

𝐴 > 𝑢

−1

• Requires that we can solve the equations for the updates

𝑑

_𝑖

with some relative accuracy

• Accomplished by using existing LU factors as

preconditioners in GMRES method

⟹

GMRES-IR

• Even when

𝜅

_∞

𝐴 ≳ 𝑢

−1

, GMRES-IR produces

𝑥

for which

𝑥 − 𝑥

_∞

𝑥

_∞

≈ 𝑢

(15)

The quantity

𝜃

_𝑖

• Assume computed solution to

𝐴𝑑

_𝑖

= 𝑟

_𝑖

satisfies

𝑑

_𝑖

− 𝑑

_{𝑖 ∞}

𝑑

_{𝑖 ∞}

= 𝜃

𝑖

𝑢

• 𝜃

_𝑖

depends on

𝐴

,

𝑟

_𝑖

,

𝑛

,

𝑢

, and the method of solving

𝐴𝑑

_𝑖

= 𝑟

_𝑖

(16)

The quantity

𝜇

_𝑖

(17)

The quantity

𝜇

_𝑖

• Traditional IR analyses use the bound: 𝐴(𝑥 − 𝑥_𝑖) _∞ ≤ 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞}

• Need a tighter bound; define

𝐴(𝑥 − 𝑥_𝑖) _∞ = 𝜇_𝑖 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞}

• Note that 𝜅_∞ 𝐴 −1 ≤ 𝜇_𝑖 ≤ 1

(18)

The quantity

𝜇

_𝑖

𝐴(𝑥 − 𝑥_𝑖) _∞ = 𝜇_𝑖 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞}

(19)

The quantity

𝜇

_𝑖

𝐴(𝑥 − 𝑥_𝑖) _∞ = 𝜇_𝑖 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞}

𝜇_𝑖 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞} = 𝐴(𝑥 − 𝑥_𝑖) _∞ = 𝑏 − 𝐴 𝑥_{𝑖 ∞} = 𝑟_{𝑖 ∞}

• For a stable solver, in early stages we expect 𝑟_𝑖

𝐴 𝑥_𝑖 ≈ 𝑢 ≪

𝑥 − 𝑥_𝑖

𝑥 𝜇𝑖 ≪ 1

(20)

The quantity

𝜇

_𝑖

𝐴(𝑥 − 𝑥_𝑖) _∞ = 𝜇_𝑖 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞}

𝜇_𝑖 𝐴 _∞ 𝑥 − 𝑥_{𝑖 ∞} = 𝐴(𝑥 − 𝑥_𝑖) _∞ = 𝑏 − 𝐴 𝑥_{𝑖 ∞} = 𝑟_{𝑖 ∞}

• For a stable solver, in early stages we expect 𝑟_𝑖

𝐴 𝑥_𝑖 ≈ 𝑢 ≪

𝑥 − 𝑥_𝑖 𝑥

• But close to convergence,

𝑟_𝑖 ≈ 𝐴 𝑥 − 𝑥_𝑖

𝜇_𝑖 ≪ 1

(21)

Theorem (C. & Higham, 2017)

Let IR in precisions 𝑢 and 𝑢2 be applied to a linear system 𝐴𝑥 = 𝑏 with

nonsingular 𝐴 ∈ ℝ𝑛×𝑛 and a given approximate solution 𝑥₀. Assume that the solver for the corrective term 𝑑_𝑖 satisfies 𝑑_𝑖 − 𝑑_{𝑖 ∞} 𝑑_{𝑖 ∞} = 𝜃_𝑖𝑢. Then for 𝑖 ≥ 0, the computed iterate 𝑥_𝑖+1 satisfies

𝑥 − 𝑥_{𝑖+1 ∞} ≤ 2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 𝑥 − 𝑥_{𝑖 ∞}

+𝑛𝑢2 1 + 𝜃_𝑖𝑢 𝐴−1 𝑏 + 𝐴 𝑥_𝑖 _∞ + 𝑢 𝑥_𝑖+1

(22)

Theorem (C. & Higham, 2017)

𝑥 − 𝑥_{𝑖+1 ∞} ≤ 2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 𝑥 − 𝑥_{𝑖 ∞}

+𝑛𝑢2 1 + 𝜃_𝑖𝑢 𝐴−1 𝑏 + 𝐴 𝑥_𝑖 _∞ + 𝑢 𝑥_𝑖+1

As long as for all 𝑖,

2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 < 1,

the error will contract until a limiting normwise relative error of order

(23)

Theorem (C. & Higham, 2017)

𝑥 − 𝑥_{𝑖+1 ∞} ≤ 2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 𝑥 − 𝑥_{𝑖 ∞}

+𝑛𝑢2 1 + 𝜃_𝑖𝑢 𝐴−1 𝑏 + 𝐴 𝑥_𝑖 _∞ + 𝑢 𝑥_𝑖+1

2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 < 1,

2𝑛𝑢2 1 + 𝜃𝑢 cond 𝐴, 𝑥 + 𝑢 is achieved, where 𝜃 is an upper bound on the 𝜃_𝑖 terms.

≈ 𝑢 if

cond 𝐴, 𝑥 𝑢 ≲ 1

(essentially indep. of

(24)

Theorem (C. & Higham, 2017)

𝑥 − 𝑥_{𝑖+1 ∞} ≤ 2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 𝑥 − 𝑥_{𝑖 ∞}

+𝑛𝑢2 1 + 𝜃_𝑖𝑢 𝐴−1 𝑏 + 𝐴 𝑥_𝑖 _∞ + 𝑢 𝑥_𝑖+1

2𝜇_𝑖𝜅_∞ 𝐴 𝑢 + 𝜃_𝑖𝑢 < 1,

2𝑛𝑢2 1 + 𝜃𝑢 cond 𝐴, 𝑥 + 𝑢 is achieved, where 𝜃 is an upper bound on the 𝜃_𝑖 terms.

𝜇_𝑖𝜅_∞ 𝐴 𝑢 < 1 (condition on iteration) 𝜃_𝑖𝑢 < 1 (condition on solver, data)

≈ 𝑢 if

cond 𝐴, 𝑥 𝑢 ≲ 1

(25)

Standard (LU-based) iterative refinement

• If

𝜅

_∞

𝐴 > 𝑢

−1

,

𝜃

_𝑖

𝑢 < 1

can not be guaranteed no matter how

precision is used in the substitutions

(26)

Standard (LU-based) iterative refinement

• If

𝜅

_∞

𝐴 > 𝑢

−1

,

𝜃

_𝑖

𝑢 < 1

can not be guaranteed no matter how

precision is used in the substitutions

• Assume that the solve

𝑑

_𝑖

=

𝑈

−1

𝐿

−1 _𝑖

𝑟

is carried out exactly:

𝐴 + Δ𝐴 = 𝐿

𝑈,

Δ𝐴 ≤ 𝛾

_𝑛

| 𝐿||

𝑈|

𝑑

_𝑖

=

𝑈

−1

𝐿

−1 _𝑖

𝑟

= 𝐴 + Δ𝐴

−1 _𝑖

𝑟

𝜃

_𝑖

𝑢 =

𝑑

𝑖

− 𝑑

𝑖 ∞

𝑑

_{𝑖 ∞}

≈

𝐴

−1

Δ𝐴𝑑

_{𝑖 ∞}

(27)

Standard (LU-based) iterative refinement

• If

𝜅

_∞

𝐴 > 𝑢

−1

,

𝜃

_𝑖

𝑢 < 1

can not be guaranteed no matter how

precision is used in the substitutions

• Assume that the solve

𝑑

_𝑖

=

𝑈

−1

𝐿

−1 _𝑖

𝑟

is carried out exactly:

𝐴 + Δ𝐴 = 𝐿

𝑈,

Δ𝐴 ≤ 𝛾

_𝑛

| 𝐿||

𝑈|

𝑑

_𝑖

=

𝑈

−1

𝐿

−1 _𝑖

𝑟

= 𝐴 + Δ𝐴

−1 _𝑖

𝑟

𝜃

_𝑖

𝑢 =

𝑑

𝑖

− 𝑑

𝑖 ∞

𝑑

_{𝑖 ∞}

≈

𝐴

−1

Δ𝐴𝑑

_{𝑖 ∞}

𝑑

_{𝑖 ∞}

≤ 𝛾

𝑛

𝐴

−1

𝐿

𝑈

∞

at least as large as cond(𝐴), usually similar size to 𝜅_∞(𝐴)

(28)

GMRES-based iterative refinement

• To compute the updates

𝑑

_𝑖

, apply GMRES to

𝑈

−1

𝐿

−1

𝐴𝑑

_𝑖

=

𝑈

−1

𝐿

−1

𝑟

_𝑖

(29)

GMRES-based iterative refinement

• To compute the updates

𝑑

_𝑖

, apply GMRES to

𝑈

−1

𝐿

−1

𝐴𝑑

_𝑖

=

𝑈

−1

𝐿

−1

𝑟

_𝑖

𝐴

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

Solve

𝐴𝑑

_𝑖

= 𝑟

_𝑖

𝑥

_𝑖+1

= 𝑥

_𝑖

+ 𝑑

_𝑖

via

𝑑

_𝑖

= 𝑈

−1

(𝐿

−1

𝑟

_𝑖

)

Standard IR:

𝑟

𝑖

(30)

GMRES-based iterative refinement

• To compute the updates

𝑑

_𝑖

, apply GMRES to

𝑈

−1

𝐿

−1

𝐴𝑑

_𝑖

=

𝑈

−1

𝐿

−1

𝑟

_𝑖

𝐴

Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

for

𝑖 = 0: maxit

𝑟

_𝑖

= 𝑏 − 𝐴𝑥

_𝑖

Solve

𝐴𝑑

_𝑖

= 𝑟

_𝑖

𝑥

_𝑖+1

= 𝑥

_𝑖

+ 𝑑

_𝑖

via GMRES on

𝐴𝑑

_𝑖

= 𝑟

_𝑖

GMRES-IR:

𝑟

(31)

Extending GMRES backward stability results

• Backward error results for GMRES of Paige, Rozložník, Strakoš

(2006) can be extended to the left-preconditioned case

(32)

Extending GMRES backward stability results

• Backward error results for GMRES of Paige, Rozložník, Strakoš

(2006) can be extended to the left-preconditioned case

• As long as within GMRES,

𝐴

(not explicitly formed) is applied to

a vector with sufficient accuracy,

𝑑

_𝑖

− 𝑑

_{𝑖 ∞}

𝑑

_{𝑖 ∞}

= 𝜃

𝑖

𝑢 ≲ 𝛾

𝑛

𝜅

∞

(

𝐴)

(33)

Extending GMRES backward stability results

𝜅

_∞

𝐴 ≤ 1 + 𝛾

_𝑛

𝐴

−1

𝐿 𝑈

_∞ 2

≪ 𝜅

_∞

(𝐴)

• Backward error results for GMRES of Paige, Rozložník, Strakoš

(2006) can be extended to the left-preconditioned case

• As long as within GMRES,

𝐴

(not explicitly formed) is applied to

a vector with sufficient accuracy,

𝑑

_𝑖

− 𝑑

_{𝑖 ∞}

𝑑

_{𝑖 ∞}

= 𝜃

𝑖

𝑢 ≲ 𝛾

𝑛

𝜅

∞

(

𝐴)

see (C. & Higham, 2017)

(34)

Extending GMRES backward stability results

𝜅

_∞

𝐴 ≤ 1 + 𝛾

_𝑛

𝐴

−1

𝐿 𝑈

_∞ 2

≪ 𝜅

_∞

(𝐴)

• Backward error results for GMRES of Paige, Rozložník, Strakoš

(2006) can be extended to the left-preconditioned case

• As long as within GMRES,

𝐴

(not explicitly formed) is applied to

a vector with sufficient accuracy,

𝑑

_𝑖

− 𝑑

_{𝑖 ∞}

𝑑

_{𝑖 ∞}

= 𝜃

𝑖

𝑢 ≲ 𝛾

𝑛

𝜅

∞

(

𝐴)

(35)

Extending GMRES backward stability results

𝜅

_∞

𝐴 ≤ 1 + 𝛾

_𝑛

𝐴

−1

𝐿 𝑈

_∞ 2

≪ 𝜅

_∞

(𝐴)

• Backward error results for GMRES of Paige, Rozložník, Strakoš

(2006) can be extended to the left-preconditioned case

• As long as within GMRES,

𝐴

(not explicitly formed) is applied to

a vector with sufficient accuracy,

𝑑

_𝑖

− 𝑑

_{𝑖 ∞}

𝑑

_{𝑖 ∞}

= 𝜃

𝑖

𝑢 ≲ 𝛾

𝑛

𝜅

∞

(

𝐴)

(usually

𝜅

_∞

(

𝐴) ≈ 1 + 𝜅

_∞

𝐴 𝑢

)

⟹

Even if

𝜅

_∞

𝐴 > 𝑢

−1

,

𝜃

_𝑖

𝑢 < 1

(36)

Numerical experiments

𝑢 = 2−53 _{(double), 𝑢}2 _{= 2}−113 _(quad)

UFSMC matrix: oscil_dcop_06, 𝑛 = 430

cond 𝐴 = 2 ⋅ 1018, 𝜅_∞ 𝐴 = 1 ⋅ 1021, 𝜅 𝐴 = 45 𝑏 = randn 𝑛, 1

Standard IR GMRES-IR

Standard IR steps GMRES-IR steps GMRES its.

− 2 7 (3,4)

𝜃_𝑖𝑢 = 𝑑_𝑖 − 𝑑_{𝑖 ∞}/ 𝑑_{𝑖 ∞} 𝑒_𝑖 = 𝑥 − 𝑥_{𝑖 ∞} 𝑥 _∞ 𝜇_𝑖(∞) = 𝐴 𝑥− 𝑥𝑖 ∞

(37)

Numerical experiments

𝑢 = 2−53 _{(double), 𝑢}2 _{= 2}−113 _(quad)

UFSMC matrix: oscil_dcop_43, 𝑛 = 430

cond 𝐴 = 1 ⋅ 1018, 𝜅_∞ 𝐴 = 8 ⋅ 1020, 𝜅 𝐴 = 2.1 𝑏 = randn 𝑛, 1

− 3 10 (2,4,4)

𝐴 _∞ 𝑥− 𝑥_{𝑖 ∞}

(38)

Numerical experiments

𝑢 = 2−53 _{(double), 𝑢}2 _{= 2}−113 _(quad)

UFSMC matrix: mhda416, 𝑛 = 416

cond 𝐴 = 1 ⋅ 1019, 𝜅_∞ 𝐴 = 2 ⋅ 1025, 𝜅 𝐴 = 7 ⋅ 109 𝑏 = randn 𝑛, 1

5 2 3 (1,2)

(39)

Two-Stage IR

• Sometimes standard (LU-based) IR converges despite

𝜅

_∞

𝐴 > 𝑢

−1

• Cheaper than GMRES-IR per refinement step

• But hard to predict

(40)

Two-Stage IR

• Sometimes standard (LU-based) IR converges despite

𝜅

_∞

𝐴 > 𝑢

−1

• Cheaper than GMRES-IR per refinement step

• But hard to predict

• Two-Stage IR

• Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

• Attempt standard IR

(41)

Two-Stage IR

• Sometimes standard (LU-based) IR converges despite

𝜅

_∞

𝐴 > 𝑢

−1

• Cheaper than GMRES-IR per refinement step

• But hard to predict

• Two-Stage IR

• Solve

𝐴𝑥

₀

= 𝑏

by LU factorization

• Attempt standard IR

• If convergence is slow, or divergence, switch to GMRES-IR

(making use of existing LU factorization)

• Decision to switch can be based on, e.g., stopping criteria for

forward error of Demmel et al. (2006)

• Future work...

(42)

Extensions

• Pivoting

• common to use pivoting strategy to minimize fill

(43)

Extensions

• Pivoting

• common to use pivoting strategy to minimize fill

• static pivoting, threshold pivoting

• Incomplete LU factorizations

• As long as

𝜅

_∞

𝐴 𝑢 < 1

,

𝜃

_𝑖

𝑢 < 1

, so expect refinement process to

converge

(44)

Extensions

• Pivoting

• common to use pivoting strategy to minimize fill

• static pivoting, threshold pivoting

• Incomplete LU factorizations

• As long as

𝜅

_∞

𝐴 𝑢 < 1

,

𝜃

_𝑖

𝑢 < 1

, so expect refinement process to

converge

• Other solvers

• Left-preconditioned, unrestarted GMRES used here for theoretical

purposes

• In practice, many potential modifications may improve

performance while still resulting in IR convergence

• Restarted GMRES

• Right, split preconditioned GMRES, FGMRES

(45)

Extensions II: Iterative refinement in 3 precisions

• Emerging architectures feature built-in support for multiprecision

computation, rising interest in low-precision storage and computation (performance and energy savings!)

(46)

Extensions II: Iterative refinement in 3 precisions

• Half precision (FP16) defined as storage format in 2008 IEEE standard

• Intel Ivy bridge, 2012: supports half precision for storage

• NVIDIA Tesla P100, 2016: native hardware ISA support for 16-bit FP arithmetic

• TSUBAME3.0 supercomputer, 2017: projected 12.2 double-precision petaflops, 64.3 half-precision petaflops

• Intel Xeon Phi (Knights Mill), 2017: will support 16-bit FP

(47)

Extensions II: Iterative refinement in 3 precisions

• Google Tensorflow processor (TPU): quantizes 32-bit FP computations into 8-bit arithmetic

• Can we use lower precision in the most expensive part of solving 𝐴𝑥 = 𝑏 using IR (the LU factorization) and still obtain accurate solutions?

(48)

Extensions II: Iterative refinement in 3 precisions

• Google Tensorflow processor (TPU): quantizes 32-bit FP computations into 8-bit arithmetic

• Can we use lower precision in the most expensive part of solving 𝐴𝑥 = 𝑏 using IR (the LU factorization) and still obtain accurate solutions?

• Three precisions:

(49)

Extensions II: Iterative refinement in 3 precisions

• Existing analyses:

• Wilkinson (1963): fixed-point arithmetic.

• Moler (1967): floating-point arithmetic.

• Higham (1997, 2002): more general analysis for arbitrary solver.

• Langou et al. (2006): lower precision LU.

• All the above support at most two precisions and require

𝜅

_∞

𝐴 𝑢 < 1

.

(50)

Extensions II: Iterative refinement in 3 precisions

• Existing analyses:

• Wilkinson (1963): fixed-point arithmetic.

• Moler (1967): floating-point arithmetic.

• Higham (1997, 2002): more general analysis for arbitrary solver.

• Langou et al. (2006): lower precision LU.

• All the above support at most two precisions and require

𝜅

_∞

𝐴 𝑢 < 1

.

SSD DDQ HHS HHD HHQ SSQ SSS DDD HHH SDD HSS DQQ HDD HQQ SQQ HSD HSQ HDQ SDQ Traditional Wilkinson (1948)

New analysis generalizes and extends existing types of IR: (

𝑢

_𝑓

, 𝑢, 𝑢

_𝑟

)

(51)

Extensions II: Iterative refinement in 3 precisions

• Three precisions:

• 𝑢_𝑓: factorization precision

• 𝑢: working precision

• 𝑢_𝑟: residual computation precision

For IR in precisions 𝒖_𝒇 ≥ 𝒖 ≥ 𝒖_𝒓, if

𝜙_𝑖 = 2𝒖_𝒇 min cond 𝐴 , 𝜅_∞ 𝐴 𝜇_𝑖 + 𝒖_𝒇𝜃_𝑖

is sufficiently less than 1, then the forward error is reduced on the 𝑖th iteration by a factor ≈ 𝜙_𝑖 until an iterate 𝑥 is produced for which

𝑥 − 𝑥 _∞

𝑥 _∞ ≲ 4𝑛𝒖𝒓cond 𝐴, 𝑥 + 𝒖.

Theorem (C. & Higham, 2017)

(52)

Extensions II: Iterative refinement in 3 precisions

Backward error

𝑢_𝑓 𝑢 𝑢_𝑟 𝜅_∞(𝐴) norm comp Forward error H S S 104 S S cond 𝐴, 𝑥 ⋅ 10−8

H S D 104 S S S

H D D 104 D D cond 𝐴, 𝑥 ⋅ 10−16

H D Q 104 _D _D _D

S S S 108 S S cond 𝐴, 𝑥 ⋅ 10−8

S S D 108 S S S

S D D 108 D D cond 𝐴, 𝑥 ⋅ 10−16

S D Q 108 D D D

(53)

Extensions II: Iterative refinement in 3 precisions

Backward error

H S D 104 S S S

H D D 104 D D cond 𝐴, 𝑥 ⋅ 10−16

H D Q 104 _D _D _D

S S S 108 S S cond 𝐴, 𝑥 ⋅ 10−8

S S D 108 S S S

S D D 108 D D cond 𝐴, 𝑥 ⋅ 10−16

S D Q 108 D D D

Standard (LU-based) IR in three precisions:

(54)

Extensions II: Iterative refinement in 3 precisions

Backward error

H S D 104 S S S

H D D 104 D D cond 𝐴, 𝑥 ⋅ 10−16

H D Q 104 _D _D _D

S S S 108 S S cond 𝐴, 𝑥 ⋅ 10−8

S S D 108 S S S

S D D 108 D D cond 𝐴, 𝑥 ⋅ 10−16

S D Q 108 D D D

(55)

Extensions II: Iterative refinement in 3 precisions

Backward error

H S D 104 S S S

H D D 104 D D cond 𝐴, 𝑥 ⋅ 10−16

H D Q 104 _D _D _D

S S S 108 S S cond 𝐴, 𝑥 ⋅ 10−8

S S D 108 S S S

S D D 108 D D cond 𝐴, 𝑥 ⋅ 10−16

S D Q 108 D D D

(56)

Extensions II: Iterative refinement in 3 precisions

Backward error

H S D 104 S S S

H D D 104 D D cond 𝐴, 𝑥 ⋅ 10−16

H D Q 104 _D _D _D

S S S 108 S S cond 𝐴, 𝑥 ⋅ 10−8

S S D 108 S S S

S D D 108 D D cond 𝐴, 𝑥 ⋅ 10−16

S D Q 108 D D D

(57)

Extensions II: Iterative refinement in 3 precisions

Backward error

𝑢_𝑓 𝑢 𝑢_𝑟 𝜅_∞(𝐴) norm comp Forward error

LU-IR H S D 104 S S S

GMRES-IR H S D 108 S S S

LU-IR S D Q 108 D D D

GMRES-IR S D Q 1016 D D D

LU-IR H D Q 104 D D D

GMRES-IR H D Q 1012 _D _D _D

Benefits of GMRES-IR:

(58)

Extensions II: Iterative refinement in 3 precisions

Backward error

(59)

Extensions II: Iterative refinement in 3 precisions

Backward error

(60)

Extensions II: Iterative refinement in 3 precisions

Backward error

(61)

Numerical experiments

𝑢_𝑓 = 2−11 (half), 𝑢 = 2−24 (single), 𝑢_𝑟 = 2−53 (double) A = gallery('randsvd',100,kappa,2)

b = randn(100,1)

𝜅_∞ 𝐴 = 2 ⋅ 102 𝜅_∞ 𝐴 = 14

GMRES its: 11 (5,6)

(62)

Numerical experiments

b = randn(100,1)

𝜅_∞ 𝐴 = 2 ⋅ 102 𝜅_∞ 𝐴 = 14

GMRES its: 11 (5,6)

𝜅_∞ 𝐴 = 2 ⋅ 106 𝜅_∞ 𝐴 = 8 ⋅ 106

(63)

Numerical experiments

b = randn(100,1)

𝜅_∞ 𝐴 = 2 ⋅ 102 𝜅_∞ 𝐴 = 14

GMRES its: 11 (5,6)

𝜅_∞ 𝐴 = 2 ⋅ 106 𝜅_∞ 𝐴 = 8 ⋅ 106

GMRES its: 40 (7,24,9)

(64)

GMRES convergence rate

• If LU factorization is computed in lower precision, can diminish effectiveness as a preconditioner

• Theory only guarantees that GMRES will converge to an accurate solution within 𝑛 iterations

(65)

GMRES convergence rate

• If close to 𝑛 iterations, no expected performance benefit

• If 𝐴 is nonnormal, spectrum of 𝐴 is irrelevant to GMRES convergence rate (Greenbaum, Pták, Strakoš, 1996)

(66)

GMRES convergence rate

• If 𝐴 is normal, spectrum of 𝐴 determines GMRES convergence rate (Liesen & Tichý, 2004)

• But small 𝜅_∞( 𝐴) may not mean fast GMRES convergence

(67)

GMRES convergence rate

• If 𝐴 is normal, spectrum of 𝐴 determines GMRES convergence rate (Liesen & Tichý, 2004)

• But small 𝜅_∞( 𝐴) may not mean fast GMRES convergence

• e.g., if 𝐴 has a cluster of eigenvalues close to the origin

⇒ Can only make guarantees on fast GMRES convergence in some cases, e.g., normality and no eigenvalue cluster near origin

• Potential fixes for slow GMRES convergence: apply additional preconditioner, deflation, other Krylov subspace methods