Note that one requires strong convexity, or at least strict convexity, of J for the linearised Bregman method to be well-defined. In [13], Beck & Teboulle illustrate that the popular mirror descent algorithm [156] can be rewritten as a linearised Bregman method.

We highlight this interpretation of first-order descent methods for two reasons. First, Bregman iterations appear in several contexts in this thesis: as a motivation for introducing Bregman discrete gradient methods in Chapter 6, and in the study of algorithmic differentiation in Chapter 7. Second, this interpretation provides a way of defining gradient flows with respect to non-Euclidean energies, via so-called minimising movements schemes [5], an idea we return to in Chapter 6 and Section 8.3.1.

See [212] for a recent review of first-order optimisation methods, with a focus on Bregman iterations. We furthermore refer the reader to [104] for a review of various energy-diminishing discretisation methods for gradient systems, including implicit Euler and discrete gradient methods.

2.7 Geometric numerical integration and discrete gradients

In Section 1.1.3, we discussed numerical integration and geometric numerical integration, and their applications to optimisation. In what follows we define discrete gradients, introduce the three most common examples of discrete gradients, and consider their applicability to the Euclidean gradient flow (1.12).

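Definition 2.40 (Discrete gradient). Let $f$ be a continuously differentiable function. A discrete gradient is a continuous map $\overline{\nabla} f : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}^n$ such that for all $x, y \in \mathbb{R}^n$,

\begin{align}
\langle \overline{\nabla} f(x, y), y - x \rangle &= f(y) - f(x) && \text{(Mean value)}, \tag{2.5} \\
\lim_{y \to x} \overline{\nabla} f(x, y) &= \nabla f(x) && \text{(Consistency)}. \tag{2.6}
\end{align}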

Before we present the discrete gradient method, we briefly consider the dissipative structure of the gradient flow (1.12). By applying the chain rule, we compute

\[
\frac{d}{dt} f(x(t)) = \langle \nabla f(x(t)), \dot{x}(t) \rangle = -\|\nabla f(x(t))\|^2 = -\|\dot{x}(t)\|^2 \le 0. \tag{2.7}
\]

Thus the gradient flow is characterised by the decrease of $f(x(t))$ along $x(t)$ at the rate of $\|\nabla f\|^2$, or equivalently $\|\dot{x}\|^2$.

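We now introduce the discrete gradient method for optimisation. For $x^0 \in \mathbb{R}^n$ and time steps $(\tau_k)_{k \in \mathbb{N}} \subset (0, +\infty)$, we solve

\[
x^{k+1} = x^k - \tau_k \overline{\nabla} f(x^k, x^{k+1}). \tag{2.8}
\]

This scheme preserves the dissipative structure of gradient flows, as can be seen by applying (2.5):

\[
f(x^{k+1}) - f(x^k) = \langle \overline{\nabla} f(x^k, x^{k+1}), x^{k+1} - x^k \rangle = -\tau_k \|\overline{\nabla} f(x^k, x^{k+1})\|^2 = -\tau_k \left\| \frac{x^{k+1} - x^k}{\tau_k} \right\|^2. \tag{2.9}
\]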

Note that the decrease holds for all time steps $\tau_k > 0$, and that (2.9) can be seen as a discrete analogue of the dissipative structure of gradient flows (2.7), replacing derivatives by finite differences.

We assume throughout the thesis that there are bounds $\tau_{\max} \ge \tau_{\min} > 0$ such that for all $k \in \mathbb{N}$,

\[
\tau_{\min} \le \tau_k \le \tau_{\max}. \tag{2.10}
\]

While there are infinitely many discrete gradients, there are three constructions that are of particular relevance. We state these here.

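1. The Gonzalez discrete gradient [97] (also known as the midpoint discrete gradient) is given by

\[
\overline{\nabla} f(x, y) = \nabla f\left(\frac{x + y}{2}\right) + \frac{f(y) - f(x) - \left\langle \nabla f\left(\frac{x + y}{2}\right), y - x \right\rangle}{\|x - y\|^2} (y - x), \quad x \ne y. \tag{2.11}
\]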

This discrete gradient was introduced by Oscar Gonzalez in 1996, with the aim of providing a formalistic way of numerically solving Hamiltonian systems.

2. The mean value discrete gradient [106], used for example in the average vector field method [45], is given by

\[
\overline{\nabla} f(x, y) = \int_0^1 \nabla f\big((1 - s)x + sy\big) \, ds, \tag{2.12}
\]

where the integral of the vector-valued integrand is understood componentwise.


3. The Itoh–Abe discrete gradient [116] (also known as the coordinate increment discrete gradient) is given by

\[
\overline{\nabla} f(x, y) =
\begin{pmatrix}
\dfrac{f(y_1, x_2, \dots, x_n) - f(x)}{y_1 - x_1} \\[2ex]
\dfrac{f(y_1, y_2, x_3, \dots, x_n) - f(y_1, x_2, \dots, x_n)}{y_2 - x_2} \\[1ex]
\vdots \\[1ex]
\dfrac{f(y) - f(y_1, \dots, y_{n-1}, x_n)}{y_n - x_n}
\end{pmatrix}, \tag{2.13}
\]

where $0/0$ is interpreted as $[\nabla f(x)]_i$.

Proposition 2.41. The mappings defined by (2.11)-(2.13) are discrete gradients.

Proof. Continuity of the mappings follows from continuous differentiability of the function f .

The mean value property (2.5) is straightforward to verify for the Gonzalez and Itoh–Abe discrete gradients, by plugging in their respective expressions. For the mean value discrete gradient, we derive

\[
\left\langle \int_0^1 \nabla f\big((1 - s)x + sy\big) \, ds, \; y - x \right\rangle = \int_0^1 \left\langle \nabla f\big((1 - s)x + sy\big), \; y - x \right\rangle ds = f(y) - f(x),
\]

where the final equality follows by applying the fundamental theorem of calculus [198, Theorem 7.16] to the function $g(s) := f((1 - s)x + sy)$, whose derivative is $g'(s) = \langle \nabla f((1 - s)x + sy), y - x \rangle$ by the chain rule.

Finally, as with continuity of the mappings, the consistency property (2.6) can be verified directly using continuous differentiability of f .
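As an informal numerical check of Proposition 2.41, the following Python sketch implements the three discrete gradients (2.11)-(2.13) and evaluates the residual of the mean value property (2.5) at one pair of points. The function names, the test function and the test points are illustrative assumptions and do not come from the thesis. The integral in (2.12) is approximated by composite Simpson quadrature, which is a substitute chosen here for simplicity; it is exact only because the gradient of the chosen quartic test function is cubic along line segments, and it would be an approximation for a general function.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gonzalez_dg(f, grad, x, y):
    """Gonzalez (midpoint) discrete gradient, cf. (2.11)."""
    mid = [(a + b) / 2.0 for a, b in zip(x, y)]
    g = grad(mid)
    diff = [b - a for a, b in zip(x, y)]
    norm_sq = dot(diff, diff)
    if norm_sq == 0.0:
        return g  # x == y: use the limit from the consistency property (2.6)
    c = (f(y) - f(x) - dot(g, diff)) / norm_sq
    return [gi + c * di for gi, di in zip(g, diff)]


def mean_value_dg(grad, x, y, panels=8):
    """Mean value discrete gradient, cf. (2.12), via composite Simpson
    quadrature on [0, 1] (an assumption of this sketch)."""
    n = 2 * panels
    total = [0.0] * len(x)
    for j in range(n + 1):
        s = j / n
        weight = 1.0 if j in (0, n) else (4.0 if j % 2 else 2.0)
        g = grad([(1.0 - s) * a + s * b for a, b in zip(x, y)])
        total = [t + weight * gi for t, gi in zip(total, g)]
    return [t / (3.0 * n) for t in total]


def itoh_abe_dg(f, grad, x, y):
    """Itoh-Abe (coordinate increment) discrete gradient, cf. (2.13)."""
    z = list(x)
    out = []
    for i in range(len(x)):
        before = f(z)
        if y[i] == x[i]:
            out.append(grad(x)[i])  # the 0/0 convention stated after (2.13)
            continue
        z[i] = y[i]
        out.append((f(z) - before) / (y[i] - x[i]))
    return out


# Hypothetical test function (quartic, nonconvex) and its gradient.
def f(x):
    return 0.25 * (x[0] ** 4 + x[1] ** 4) + x[0] * x[1]


def grad_f(x):
    return [x[0] ** 3 + x[1], x[1] ** 3 + x[0]]


x, y = [1.0, -2.0], [0.5, 3.0]
diff = [b - a for a, b in zip(x, y)]
target = f(y) - f(x)

# Residuals of <DG(x, y), y - x> = f(y) - f(x) for each construction.
residuals = {
    "gonzalez": abs(dot(gonzalez_dg(f, grad_f, x, y), diff) - target),
    "mean_value": abs(dot(mean_value_dg(grad_f, x, y), diff) - target),
    "itoh_abe": abs(dot(itoh_abe_dg(f, grad_f, x, y), diff) - target),
}
for name, value in residuals.items():
    print(name, value)
```

All three residuals should be at the level of floating-point rounding for this test function. Note that only the Itoh–Abe construction avoids gradient evaluations when $y_i \ne x_i$ for every $i$.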

While the first two discrete gradients are gradient-based, the Itoh–Abe discrete gradient is derivative-free, and is evaluated by computing successive, coordinate-wise difference quotients. In an optimisation setting, the Itoh–Abe discrete gradient is often preferable to the others, as it is relatively computationally inexpensive. Solving the implicit equation (2.8) with this discrete gradient amounts to successively solving $n$ scalar equations of the form

\begin{align*}
x_1^{k+1} &= x_1^k - \tau_k \frac{f(x_1^{k+1}, x_2^k, \dots, x_n^k) - f(x^k)}{x_1^{k+1} - x_1^k}, \\
x_2^{k+1} &= x_2^k - \tau_k \frac{f(x_1^{k+1}, x_2^{k+1}, x_3^k, \dots, x_n^k) - f(x_1^{k+1}, x_2^k, \dots, x_n^k)}{x_2^{k+1} - x_2^k}, \\
&\;\;\vdots \\
x_n^{k+1} &= x_n^k - \tau_k \frac{f(x^{k+1}) - f(x_1^{k+1}, x_2^{k+1}, \dots, x_{n-1}^{k+1}, x_n^k)}{x_n^{k+1} - x_n^k}.
\end{align*}
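The coordinate-wise scalar equations above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the solver studied in the thesis: the name `itoh_abe_step`, the test function and the parameters `eps` and `n_bisect` are hypothetical, and the scalar root-finder (bracketing by doubling, then bisection) is a simple choice made here. Writing the $i$-th update as $x_i^{k+1} = x_i^k + s t$ with direction $s \in \{+1, -1\}$ and step length $t > 0$, the $i$-th equation becomes $t + \tau_k \big(f(z + s t e_i) - f(z)\big)/t = 0$, where $z$ holds the coordinates already updated. The sketch assumes $f$ is continuous and bounded below, so that the left-hand side is eventually positive as $t$ grows.

```python
def itoh_abe_step(f, x, tau, eps=1e-6, n_bisect=100):
    """One sweep of the discrete gradient method (2.8) with the Itoh-Abe
    discrete gradient: n scalar equations solved in succession."""
    z = list(x)
    for i in range(len(z)):
        base, xi = f(z), z[i]

        def residual(t, s):
            # r(t) = t + tau * (f(z + s*t*e_i) - f(z)) / t, for t > 0.
            z[i] = xi + s * t
            value = t + tau * (f(z) - base) / t
            z[i] = xi
            return value

        # Derivative-free choice of direction: r must be negative for small t.
        direction = next(
            (s for s in (1.0, -1.0) if residual(eps, s) < 0.0), None
        )
        if direction is None:
            continue  # coordinate is (numerically) stationary; leave it as is

        # Bracket a root of r by doubling the step length.
        lo, hi = eps, 2.0 * eps
        for _ in range(200):
            if residual(hi, direction) > 0.0:
                break
            lo, hi = hi, 2.0 * hi
        else:
            raise ValueError("no bracket found; is f bounded below?")

        # Bisection on [lo, hi], where r(lo) <= 0 < r(hi).
        for _ in range(n_bisect):
            mid = 0.5 * (lo + hi)
            if residual(mid, direction) < 0.0:
                lo = mid
            else:
                hi = mid
        z[i] = xi + direction * 0.5 * (lo + hi)
    return z


# Hypothetical strongly convex test function.
def f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2 + x[0] * x[1]


x, tau = [3.0, 1.0], 5.0  # deliberately large time step
values, gaps = [f(x)], []
for _ in range(30):
    x_new = itoh_abe_step(f, x, tau)
    step_sq = sum((a - b) ** 2 for a, b in zip(x_new, x))
    # Residual of the dissipation identity (2.9).
    gaps.append(abs(f(x_new) - f(x) + step_sq / tau))
    x = x_new
    values.append(f(x))

print(values[0], values[-1], max(gaps))
```

In this sketch the function values should decrease monotonically despite the large time step, and the recorded residuals of (2.9) should be at the level of rounding error, consistent with the remark that the decrease holds for all $\tau_k > 0$. Each sweep uses only function evaluations, which reflects the derivative-free character of the Itoh–Abe discrete gradient.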

Chapter 3

The foundations of discrete gradient methods for smooth optimisation

3.1 Introduction

This chapter is based on the preprint [81] and is joint work with Matthias J. Ehrhardt, Torbjørn Ringholm, and Carola-Bibiane Schönlieb.

As discussed in the previous chapter, discrete gradient methods yield unconditionally stable optimisation schemes when applied to the gradient flow (1.12). While these methods are well understood in the setting of geometric numerical integration, only in recent years have they been considered as optimisation schemes, and thus the analysis is lacking in this context. In this chapter, we seek to lay the foundations for discrete gradient methods for smooth optimisation, providing a comprehensive analysis of the well-posedness of the discrete gradient equation (2.8), optimal choices of time steps $\tau_k$, convergence rates for different classes of functions, and guarantees of convergence to a unique limit. We thus consider the unconstrained optimisation problem

\[
\min_{x \in \mathbb{R}^n} F(x), \tag{3.1}
\]

where the function $F : \mathbb{R}^n \to \mathbb{R}$ is continuously differentiable.

3.1.1 Contributions and outline

While discrete gradient methods have existed in geometric integration since the 1980s, only recently have they been studied in the context of optimisation, leaving significant gaps in our understanding of these schemes. In this chapter, we resolve fundamental questions about the discrete gradient methods, including their well-posedness, efficiency, and optimal tuning.

In Section 3.2 we define discrete gradients and introduce the four discrete gradient methods considered in this thesis. In Section 3.3, we prove that the discrete gradient equation (the update formula) (2.8) is well-posed, meaning that for any time step $\tau_k > 0$ and $x^k \in \mathbb{R}^n$, a solution $x^{k+1}$ exists, under mild assumptions on $F$. This result, which relies on the Brouwer fixed point theorem, is the first existence result for the discrete gradient equation without a bound on the time step. In Section 3.4, we propose an efficient and stable method for solving the discrete gradient equation and prove convergence guarantees.

In Section 3.5, we analyse the dependence of the iterates on the choice of time step, and obtain estimates for preferable time steps in the cases of L-smoothness and strong convexity. In Section 3.6, we establish convergence rates for convex functions with Lipschitz continuous gradients, and for functions that satisfy the Polyak–Łojasiewicz (PŁ) inequality [120]. In Section 3.7, we establish convergence guarantees for functions that satisfy the strong Kurdyka–Łojasiewicz inequality. In Section 3.9, we present numerical results for several test problems, and a numerical comparison of different numerical solvers for the discrete gradient equation (2.8).

We emphasise that the majority of these results hold for nonconvex functions.