
We begin this subsection by recalling the result of Luo and Sturm [89], who established a global error bound for the zero set of a quadratic function.

Theorem 1.4.10. [89] Let $f : \mathbb{R}^n \to \mathbb{R}$ be a quadratic function. There exists a constant $\tau > 0$ such that

\[ \operatorname{dist}(x, [f = 0]) \le \tau\left(|f(x)| + |f(x)|^{1/2}\right), \quad \forall x \in \mathbb{R}^n. \]

This result is recovered by the works of [101, Corollary 5] and [50, Corollary 2]. Note that this theorem does not hold for an arbitrary polynomial, as the following example shows.

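Example 4. Let $f(x, y) = (xy - 1)^2 + (x - 1)^2$ for all $(x, y) \in \mathbb{R}^2$. One has $[f \le 0] = \{(1, 1)\}$. Consider the sequence $(x_k, y_k) = (1/k, k)$, $k \in \mathbb{N}$. It is easy to check that

\[ 0 < f(x_k, y_k) = \left(1 - \tfrac{1}{k}\right)^2 < 1 \quad \text{for all } k \ge 2, \qquad \operatorname{dist}\big((x_k, y_k), [f \le 0]\big) \to +\infty \quad (k \to +\infty). \]

Since $f$ stays bounded along this sequence while the distance to $[f \le 0]$ is unbounded, $f$ does not possess a Hölder global error bound.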
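The computation in Example 4 can be checked numerically. The following is only a small sketch under the definitions of the example; the helper names `f`, `dist_to_solution_set` and `sample` are introduced here for illustration and do not come from the text.

```python
import math


def f(x, y):
    """Polynomial of Example 4: (xy - 1)^2 + (x - 1)^2."""
    return (x * y - 1.0) ** 2 + (x - 1.0) ** 2


def dist_to_solution_set(x, y):
    """Euclidean distance to [f <= 0] = {(1, 1)}."""
    return math.hypot(x - 1.0, y - 1.0)


def sample(ks=(2, 10, 100, 1000)):
    """Evaluate f and the distance along the sequence (1/k, k)."""
    rows = []
    for k in ks:
        x, y = 1.0 / k, float(k)
        rows.append((k, f(x, y), dist_to_solution_set(x, y)))
    return rows


if __name__ == "__main__":
    for k, value, distance in sample():
        # value stays in (0, 1) while distance grows roughly like k
        print(k, value, distance)
```

Along the sampled indices the function values remain in $(0, 1)$ while the distance grows without bound, so no inequality of the form $\operatorname{dist} \le \tau([f]_+^a + [f]_+^b)$ can hold on the whole space.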

However, when $f$ is a convex polynomial, such a bound was proved by Yang [124]; we present it in Theorem 1.4.15.

Let us now present the characterization of global error bounds for semi-algebraic functions, which was proved by Ha [59].

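Suppose that $f : \mathbb{R}^n \to (-\infty, +\infty]$ has a Hölder global error bound,

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+^a + [f(x)]_+^b\right), \quad \forall x \in \mathbb{R}^n. \tag{1.7} \]

It is easy to see that, for any sequence $(x_k)_{k \in \mathbb{N}} \subset \mathbb{R}^n$, the following two assertions hold:

(i) If $f(x_k) \to 0$, then $\operatorname{dist}(x_k, [f \le 0]) \to 0$.

(ii) If $\operatorname{dist}(x_k, [f \le 0]) \to +\infty$, then $f(x_k) \to +\infty$.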

Conversely, Ha proved in [59] that a polynomial function satisfying the two conditions above possesses a Hölder global error bound. This result was extended to the class of continuous semi-algebraic functions, see [50, Theorem 2]. The definition of a semi-algebraic function is well known; see, for instance, [50, Definition 1].

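Theorem 1.4.11 (Characterization of global error bounds for semi-algebraic functions). [59, 50] Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuous semi-algebraic function. The following statements are equivalent:

1. For any sequence $(x_k)_{k \in \mathbb{N}} \subset \mathbb{R}^n \setminus [f \le 0]$ with $\|x_k\| \to +\infty$, we have:

(i) If $f(x_k) \to 0$, then $\operatorname{dist}(x_k, [f \le 0]) \to 0$.

(ii) If $\operatorname{dist}(x_k, [f \le 0]) \to +\infty$, then $f(x_k) \to +\infty$.

2. There exist $\tau > 0$ and $a, b > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+^a + [f(x)]_+^b\right), \quad \forall x \in \mathbb{R}^n. \]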

Proof. The implication $(2) \Rightarrow (1)$ is obvious; we now prove $(1) \Rightarrow (2)$. The proof is divided into two parts. Using (i), we prove that an error bound holds in a neighborhood of $[f \le 0]$, while (ii) provides a bound when $\operatorname{dist}(x, [f \le 0])$ is large.

Assume (i) holds. Let us prove that there exist $\tau_1 > 0$, $a > 0$ and $r > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau_1 [f(x)]_+^a, \quad \forall x \in [f \le r]. \]

For $t \in \mathbb{R}$, put $\varphi(t) = \sup\{\operatorname{dist}(x, [f \le 0]) : f(x) = t\}$. It is a semi-algebraic function. Thanks to (i), there exists $r > 0$ such that $\varphi(t) < \infty$ for all $t \in [0, r]$. We can choose $r$ sufficiently small such that $\varphi$ is continuous and $\varphi(t) \ne 0$ on $(0, r]$. By the Puiseux lemma,

\[ \varphi(t) = \tau t^a + o(t^a) \quad (t \to 0). \]

From assumption (i), it can be seen that $\tau > 0$ and $a > 0$. So, shrinking $r$ if necessary, there exists $\tau_1 > 0$ such that $\varphi(t) \le \tau_1 t^a$ for all $t \in (0, r]$. This means that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau_1 [f(x)]_+^a, \quad \forall x \in [f \le r]. \]

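Using (ii), let us prove that there exist $\tau_2 > 0$, $b > 0$ and $R > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau_2 [f(x)]_+^b, \quad \forall x \in [f \ge R]. \]

This conclusion is clear when $f$ is bounded from above. We thus assume that $\sup_{\mathbb{R}^n} f = \sup \varphi = +\infty$. It appears that $\varphi(t) > 0$ when $t$ is sufficiently large, so there exist $\tau > 0$ and $b > 0$ such that

\[ \varphi(t) = \tau t^b + o(t^b) \quad (t \to +\infty). \]

This implies that there are $\tau_2 > 0$ and $R > 0$ for which the inequality above holds on $[f \ge R]$.

Finally, (ii) implies the existence of $M > 0$ such that $\operatorname{dist}(x, [f \le 0]) < M$ for all $x \in [r < f < R]$. Since $f(x)/r > 1$ on this set, this gives, for any fixed $\alpha > 0$,

\[ \operatorname{dist}(x, [f \le 0]) \le \frac{M}{r^{\alpha}} f(x)^{\alpha}, \quad \forall x \in [r < f < R]. \]

Combining this with the inequalities obtained on $[f \le r]$ and $[f \ge R]$, we reach the conclusion.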

The implication $(2) \Rightarrow (1)$ in the latter theorem explains why we need the two terms $[f(x)]_+^a$ and $[f(x)]_+^b$ in the global error bound (1.7). One ensures that inequality (1.7) holds when $\operatorname{dist}(x, [f \le 0]) \to 0$, and the other keeps the inequality valid when $\operatorname{dist}(x, [f \le 0]) \to +\infty$. In general, the two exponents are different.

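Example 5. [60] Let $f(x, y) = x^2 + y^4$ for all $(x, y) \in \mathbb{R}^2$. It can be seen that $[f \le 0] = \{(0, 0)\}$ and

\[ \operatorname{dist}\big((x, y), [f \le 0]\big) \le f^{1/4}(x, y) + f^{1/2}(x, y), \quad \forall (x, y) \in \mathbb{R}^2. \]

On the other hand, by taking the two sequences $(x_k^1, y_k^1) = (k, 0)$ and $(x_k^2, y_k^2) = (0, 1/k)$, $k \in \mathbb{N}$, it follows that there do not exist $\alpha \in \mathbb{R}$ and $\tau > 0$ such that

\[ \operatorname{dist}\big((x, y), [f \le 0]\big) \le \tau [f(x, y)]_+^{\alpha}, \quad \forall (x, y) \in \mathbb{R}^2. \]

Indeed, the first sequence forces $\alpha \ge 1/2$, while the second forces $\alpha \le 1/4$.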
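The role of the two exponents in Example 5 can be illustrated numerically. The sketch below is only an illustration under the definitions of the example; the helper names and the sampling grid are assumptions made here. It checks the two-exponent bound on a grid and evaluates the ratio $\operatorname{dist}/f^{\alpha}$ along the two sequences, which equals $k^{1 - 2\alpha}$ and $k^{4\alpha - 1}$ respectively.

```python
import math


def f(x, y):
    """Function of Example 5."""
    return x ** 2 + y ** 4


def dist(x, y):
    """Distance to [f <= 0] = {(0, 0)}."""
    return math.hypot(x, y)


def two_exponent_bound_holds(points, tol=1e-12):
    """Check dist <= f^(1/4) + f^(1/2) on a finite set of points."""
    return all(
        dist(x, y) <= f(x, y) ** 0.25 + f(x, y) ** 0.5 + tol
        for x, y in points
    )


def single_exponent_ratios(alpha, k):
    """Ratios dist / f^alpha along (k, 0) and (0, 1/k)."""
    far = dist(k, 0.0) / f(k, 0.0) ** alpha              # equals k^(1 - 2*alpha)
    near = dist(0.0, 1.0 / k) / f(0.0, 1.0 / k) ** alpha  # equals k^(4*alpha - 1)
    return far, near


if __name__ == "__main__":
    grid = [(i / 4.0, j / 4.0) for i in range(-20, 21) for j in range(-20, 21)]
    print(two_exponent_bound_holds(grid))
    for alpha in (0.1, 0.25, 0.3, 0.5, 1.0):
        print(alpha, single_exponent_ratios(alpha, 1.0e4))
```

For every $\alpha$, at least one of the two ratios grows with $k$, which is the numerical counterpart of the fact that a single exponent cannot serve both near and far from the solution set.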

By using Theorem 1.4.11, Ha [59] provided a global error bound for polynomial functions under a Palais-Smale condition. His result was later improved in [50] to cover continuous semi-algebraic functions.

We recall that $f$ is said to satisfy the Palais-Smale condition (PS) at $r_0$ if any sequence $(x_k)_{k \in \mathbb{N}}$ such that $f(x_k) \to r_0$ and $\operatorname{dist}(0, \partial f(x_k)) \to 0$ possesses a convergent subsequence.

Theorem 1.4.12. [59, 50] Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuous semi-algebraic function. Suppose that $f$ satisfies the Palais-Smale condition at each $r > 0$. Then there exist constants $\tau > 0$ and $a, b > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+^a + [f(x)]_+^b\right), \quad \forall x \in \mathbb{R}^n. \]

Proof. It is enough to show that $f$ satisfies the two conditions (i) and (ii) in Theorem 1.4.11. First we establish (i). By contradiction, assume that there exist a sequence $(x_k)_{k \in \mathbb{N}}$ and a constant $\delta > 0$ such that

\[ \|x_k\| \to \infty, \quad f(x_k) \to 0 \quad \text{and} \quad \operatorname{dist}(x_k, [f \le 0]) > \delta. \]

Put $X = \{x \mid f(x) \ge 0\}$; then $X$ is a complete metric space. Applying Ekeland's principle (see [53]), there is a sequence $(y_k)_{k \in \mathbb{N}} \subset X$ such that

\[ f(y_k) \le f(x_k) = \varepsilon_k, \qquad f(y_k) \le f(x) + \sqrt{\varepsilon_k}\, \operatorname{dist}(x, y_k), \quad \forall x \in X. \]

It is clear that $f(y_k) \to 0$ and $\|y_k\| \to +\infty$. We can suppose that $\operatorname{dist}(y_k, [f \le 0]) \ge \delta/2$; therefore, for all $t \in (0, \delta/2)$ and all $u \in \mathbb{R}^n$ with $\|u\| = 1$, we obtain

\[ \frac{f(y_k + tu) - f(y_k)}{t} \ge -\sqrt{\varepsilon_k}. \]

Thus $|\nabla f|(y_k) \le \sqrt{\varepsilon_k}$. On the other hand, $\|\partial f(y_k)\| \le |\nabla f|(y_k)$ (see [10, Remark 6.1]); therefore $\|\partial f(y_k)\| \to 0$, which contradicts the Palais-Smale condition.

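Now we prove that $f$ satisfies condition (ii) of Theorem 1.4.11. By contradiction, suppose that there exists a sequence $(x_k)_{k \in \mathbb{N}} \subset \mathbb{R}^n$ such that

\[ \|x_k\| \to \infty, \quad \operatorname{dist}(x_k, [f \le 0]) \to +\infty \quad \text{and} \quad f(x_k) \to t \in \mathbb{R}. \]

Set $X = \{x \mid f(x) \ge 0\}$; $X$ is a complete metric space. Applying Ekeland's principle, there is a sequence $(y_k)_{k \in \mathbb{N}} \subset X$ such that

\[ f(y_k) \le f(x_k) = t_k, \qquad \operatorname{dist}(x_k, y_k) \le \frac{\operatorname{dist}(x_k, [f \le 0])}{2}, \]

\[ f(y_k) \le f(x) + \frac{2 f(x_k)}{\operatorname{dist}(x_k, [f \le 0])}\, \operatorname{dist}(x, y_k), \quad \forall x \in X. \]

Therefore, without loss of generality, we can assume that the sequence $(f(y_k))$ is convergent, $\|y_k\| \to \infty$ and $\operatorname{dist}(y_k, [f \le 0]) \to +\infty$. Hence

\[ \|\nabla f(y_k)\| \le |\nabla f|(y_k) \le \frac{2 f(x_k)}{\operatorname{dist}(x_k, [f \le 0])} \to 0, \]

which contradicts the Palais-Smale condition.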

1.4.2.2 Convex case

We begin this subsection with a result of Facchinei and Pang [54], who assert that, for a lower semicontinuous convex function, a Hölder-type error bound on a level set can be extended to a global error bound.

Theorem 1.4.13. [54] Let $f$ be a lower semicontinuous convex function on $\mathbb{R}^n$ with $[f \le 0]$ nonempty. Suppose that there exist $\delta > 0$, $\tau > 0$ and $\theta > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+ + [f(x)]_+^{\theta}\right), \quad \forall x \in [f \le \delta]. \]

Then there exists $\tau' > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau'\left([f(x)]_+ + [f(x)]_+^{\theta}\right), \quad \forall x \in \mathbb{R}^n. \]

Taking $\theta = 1$, this means that, for a convex function, a Lipschitz error bound on a level set can be extended to a global error bound.

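Proof. Let $x \in \mathbb{R}^n$ be such that $f(x) > \delta$ and let $p = P_{[f \le 0]} x$. It is clear that $f(p) = 0$. For any $\lambda \in (0, 1)$, we denote $x_\lambda = \lambda x + (1 - \lambda) p$. It can be seen that $p = P_{[f \le 0]} x_\lambda$ and $\operatorname{dist}(x_\lambda, [f \le 0]) = \lambda \operatorname{dist}(x, [f \le 0])$. By convexity, we get

\[ f(x_\lambda) \le \lambda f(x) + (1 - \lambda) f(p) = \lambda f(x). \]

We deduce that

\[ \operatorname{dist}(x, [f \le 0]) \le \frac{\operatorname{dist}(x_\lambda, [f \le 0])}{f(x_\lambda)}\, f(x). \]

On the other hand, by choosing $\lambda = \dfrac{\delta}{2 f(x)}$, we get

\[ f(x_\lambda) \le \lambda f(x) = \frac{\delta}{2} < \delta. \]

Therefore, thanks to the assumption on error bounds, we obtain

\[ \operatorname{dist}(x_\lambda, [f \le 0]) \le \tau\left(f(x_\lambda) + f^{\theta}(x_\lambda)\right). \]

It follows that

\[ \frac{\operatorname{dist}(x_\lambda, [f \le 0])}{f(x_\lambda)} \le \tau\left(1 + f^{\theta - 1}(x_\lambda)\right) \le \tau\left(1 + \left(\frac{\delta}{2}\right)^{\theta - 1}\right). \]

Combining the above inequalities, we get

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left(1 + \left(\frac{\delta}{2}\right)^{\theta - 1}\right) f(x). \]

This means that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left(1 + \left(\frac{\delta}{2}\right)^{\theta - 1}\right)\left(f(x) + f^{\theta}(x)\right), \quad \forall x \in \mathbb{R}^n. \]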

Combining this result with Theorem 1.4.2, we immediately obtain a result similar to [24, Theorem 3] and [49, Theorem 6].

Theorem 1.4.14. Let $f_i : \mathbb{R}^n \to \mathbb{R}$ $(i = 1, \dots, m)$ be continuous, convex and subanalytic functions. Assume that the set

\[ S = \{x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i = 1, \dots, m\} \]

is nonempty and compact. Then there exist $\tau, \theta > 0$ such that

\[ \operatorname{dist}(x, S) \le \tau\left([f(x)]_+ + [f(x)]_+^{\theta}\right), \quad \forall x \in \mathbb{R}^n, \]

where $f(x) = \sum_{i=1}^{m} [f_i(x)]_+$.

We remark that if $f_i$ is coercive, then for all $r \in \mathbb{R}$ the set $[f_i \le r]$ is compact.

Definition 4. [81, 78] A continuous function $f$ on $\mathbb{R}^n$ is said to be a piecewise convex polynomial function if there exist finitely many polyhedra $P_1, \dots, P_k$ with $\bigcup_{j=1}^{k} P_j = \mathbb{R}^n$ such that the restriction of $f$ to each $P_j$, denoted by $f_j$, is a convex polynomial function. The degree of $f$, denoted by $\deg(f)$, is defined as the maximum of the $\deg(f_j)$.

In [81], Li studied error bounds for convex piecewise quadratic functions. More precisely, let $f$ be a convex piecewise quadratic function. Then there exists $\tau > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+ + \sqrt{[f(x)]_+}\right), \quad \forall x \in \mathbb{R}^n. \tag{1.8} \]

By using Theorem 1.4.5 and Theorem 1.4.13, Li [77] showed that, for a convex polynomial function $f$ on $\mathbb{R}^n$ with degree $d$, there exists $\tau > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+ + [f(x)]_+^{\frac{1}{\kappa(n,d)}}\right), \quad \forall x \in \mathbb{R}^n. \tag{1.9} \]

This result is further improved by Yang [124].

Theorem 1.4.15. [124] Let $f$ be a convex polynomial with degree $d$. There exists $\tau > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+ + [f(x)]_+^{\frac{1}{d}}\right), \quad \forall x \in \mathbb{R}^n. \]

The two results (1.8) and (1.9) above have been extended by Li [78] to general piecewise convex polynomial functions.

Theorem 1.4.16. [78] Let $f$ be a piecewise convex polynomial function on $\mathbb{R}^n$ with degree $d$. Suppose that one of the following two conditions holds:

(i) If $\operatorname{dist}(x, [f \le 0]) \to +\infty$, then $f(x) \to +\infty$.

(ii) $f$ is convex.

Then there exists $c > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le c\left([f(x)]_+ + [f(x)]_+^{\frac{1}{\kappa(n,d)}}\right), \quad \forall x \in \mathbb{R}^n. \]

Let us now present global error bounds for systems of convex polynomial functions. In [87], under the Slater condition, Luo and Luo proved that a global Lipschitzian error bound holds for convex quadratic systems. Later, without the Slater condition, Pang and Wang [121] showed that any system of convex quadratic inequalities has a global error bound.

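Theorem 1.4.17. [121] Let $f_1, f_2, \dots, f_m$ be convex quadratic functions. Assume that

\[ S = \{x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i = 1, \dots, m\} \]

is nonempty. Then there exist a positive integer $d \le n + 1$ and a scalar $c > 0$ such that

\[ \operatorname{dist}(x, S) \le c \max\left\{ \|[f(x)]_+\|,\ \|[f(x)]_+\|^{\frac{1}{2^{d}}} \right\}, \quad \forall x \in \mathbb{R}^n, \]

where $f(x) = (f_i(x))_{i = 1, \dots, m}$ for all $x \in \mathbb{R}^n$.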

As with Theorem 1.3.14, the latter result is extended to Banach spaces in [103, Theorem 7], with

\[ f_i(x) = \frac{1}{2}\langle A_i x, x \rangle + \langle B_i, x \rangle + c_i, \]

where $A_i : X \times X \to \mathbb{R}$ is a symmetric, continuous and positive semidefinite bilinear form, $B_i \in X^*$ and $c_i \in \mathbb{R}$, for $i = 1, \dots, m$.

Note that this result does not hold for a general system of convex polynomial functions, see Example 2. However, in some particular cases, a global error bound holds for such systems.

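Theorem 1.4.18. Let $f_1, \dots, f_m$ be convex polynomial functions on $\mathbb{R}^n$ whose degrees are at most $d$. Let $f(x) = \max_{i = 1, \dots, m} f_i(x)$ for all $x \in \mathbb{R}^n$. Then the following statements hold.

1. [77] If $f_i(x) \ge 0$ for all $x \in \mathbb{R}^n$ and $i = 1, \dots, m$, then there exists $\tau > 0$ such that

\[ \operatorname{dist}(x, [f \le 0]) \le \tau\left([f(x)]_+ + [f(x)]_+^{\frac{1}{\kappa(n,d)}}\right), \quad \forall x \in \mathbb{R}^n. \]

2. [102] If $S = \{x \in K \mid f(x) \le 0\}$ is a nonempty compact set, where $K$ is a convex polyhedron in $\mathbb{R}^n$, then there exists $\tau > 0$ such that

\[ \operatorname{dist}(x, S) \le \tau\left([f(x)]_+ + [f(x)]_+^{\frac{1}{\kappa(n,2d)}}\right), \quad \forall x \in K. \]

3. [102] Let $K$ be a convex polyhedron in $\mathbb{R}^n$ such that $S = \{x \in K \mid f(x) \le 0\}$ is nonempty. Assume that, for each $v \in K^{\infty}$, $\max_{i = 1, \dots, m} f_i^{\infty}(v) = 0 \Rightarrow f_i^{\infty}(v) = 0$ (see (1.4)). Then there exists $\tau > 0$ such that

\[ \operatorname{dist}(x, S) \le \tau\left([f(x)]_+ + [f(x)]_+^{\frac{1}{\kappa(n,2d)}}\right), \quad \forall x \in K, \]

where $K^{\infty}$ is the recession cone of $K$, defined by $K^{\infty} = \{v \in \mathbb{R}^n \mid \dots\}$.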

Chapter 2 From error bounds to the complexity of first-order descent methods for convex functions

Abstract. This chapter shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. One can show the equivalence between the two concepts for convex functions having a moderately flat profile near the set of minimizers (such as functions with Hölderian growth). A counterexample shows that the equivalence is no longer true for extremely flat functions. This fact reveals the relevance of an approach based on the KL inequality. In a second stage, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results for the globalization of KL inequalities in the convex framework.

2.1 Overview and main results

A brief insight into the theory of error bounds. Since Hoffman's celebrated result on error bounds for systems of linear inequalities [63], the study of error bounds has been successfully applied to problems in sensitivity, convergence rate estimation, and feasibility issues. In the optimization world, the first natural extensions were made to convex functions by Robinson [113], Mangasarian [93], and Auslender-Crouzeix [6]. However, the most striking discovery came years before in the pioneering works of Łojasiewicz [84, 85] at the end of the fifties: under a mere compactness assumption, the existence of error bounds for arbitrary continuous semi-algebraic functions was provided. Despite their remarkable depth, these works remained unnoticed by the optimization community for a long period (see [88]). At the beginning of the nineties, motivated by numerous applications, many researchers started working along these lines, in quest of quantitative results that could produce more effective tools. The survey of Pang [106] provides a comprehensive panorama of results obtained around this time. The works of Luo [87, 88, 89] and Dedieu [45] are also important milestones in the theory. The recent works [78, 80, 59, 79, 15] provide even stronger quantitative results by using the powerful machinery of algebraic geometry or advanced techniques of convex optimization.

A methodology for complexity of first-order descent methods. Let us introduce the concepts used in this work and show how they can be arranged to devise a new and systematic approach to complexity. Let $H$ be a real Hilbert space, and let $f : H \to (-\infty, +\infty]$ be a proper lower-semicontinuous convex function achieving its minimum $\min f$, so that $\operatorname{argmin} f \ne \emptyset$. In its simplest version, an error bound is an inequality of the form

\[ \omega\big(f(x) - \min f\big) \ge \operatorname{dist}(x, \operatorname{argmin} f), \tag{2.1} \]

where $\omega$ is an increasing function vanishing at 0, called here the residual function, and where $x$ may evolve either in the whole space or in a bounded set. Hölderian error bounds, which are very common in practice, have a simple power form

\[ f(x) - \min f \ge \gamma \operatorname{dist}^p(x, \operatorname{argmin} f), \]

with $\gamma > 0$, $p \ge 1$, and thus $\omega(s) = \left(\frac{1}{\gamma} s\right)^{\frac{1}{p}}$. When functions are semi-algebraic on $H = \mathbb{R}^n$ and "regular" (for instance, continuous), the above inequality is known to hold on any compact set [84, 85], a modern reference being [42]. This property is known in real algebraic geometry under the name of Łojasiewicz inequality. However, since we work here mainly in the sphere of optimization and pursue complexity purposes, we shall refer to this inequality as the Łojasiewicz error bound inequality.
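To make the residual function concrete, the following sketch checks a Hölderian error bound on a one-dimensional toy problem. The objective (the squared distance to the interval $[-1, 1]$), the constants and the helper names are assumptions chosen here for illustration only; they do not come from the text.

```python
def f(x):
    """Hypothetical objective: squared distance to [-1, 1]; min f = 0."""
    return max(abs(x) - 1.0, 0.0) ** 2


def dist_to_argmin(x):
    """Distance to argmin f = [-1, 1]."""
    return max(abs(x) - 1.0, 0.0)


def residual(s, gamma, p):
    """Residual function omega(s) = (s / gamma) ** (1 / p)."""
    return (s / gamma) ** (1.0 / p)


def error_bound_holds(xs, gamma=1.0, p=2.0, tol=1e-12):
    """Check omega(f(x) - min f) >= dist(x, argmin f) on the sample xs."""
    return all(residual(f(x), gamma, p) + tol >= dist_to_argmin(x) for x in xs)


if __name__ == "__main__":
    bounded = [i / 10.0 for i in range(-20, 21)]   # points of [-2, 2]
    print(error_bound_holds(bounded, gamma=1.0, p=2.0))         # global bound
    print(error_bound_holds(bounded, gamma=1.0, p=3.0))         # holds on [-2, 2]
    print(error_bound_holds(bounded + [5.0], gamma=1.0, p=3.0))  # fails far away
```

With $p = 2$ and $\gamma = 1$ the bound holds everywhere (with equality), whereas with $p = 3$ it holds on the bounded set $[-2, 2]$ but fails at $x = 5$. This illustrates why one distinguishes bounds valid on the whole space from bounds valid on a bounded set.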

Once the question of computing constants and exponents (here $\gamma$ and $p$) for a given minimization problem is settled (see the fundamental works [89, 78, 15, 59]), it is natural to wonder whether these concepts are connected to the complexity properties of first-order methods for minimizing $f$. Despite the important success of the error bound theory in several branches of optimization, we are not aware of a solid theory connecting the error bounds we consider (as defined in (2.1)) with the study of the complexity of general descent methods. There are, however, several works connecting error bounds with convergence rate results for first-order methods (see e.g., [92, 97, 16, 40, 108]). See also the new and interesting work [79] that provides a wealth of error bounds and some applications to convergence rate analysis. An important fraction of these works involves "first-order error bounds"¹ (see [88, 92]) that are different from those we consider here.

Our answer to the connection between complexity and "zero-order error bounds" will partially come from a related notion, also discovered by Łojasiewicz and further developed by Kurdyka in the semi-algebraic world: the Łojasiewicz gradient inequality. This inequality, also called the Kurdyka-Łojasiewicz (KL) inequality (see [23]), asserts that for any smooth semi-algebraic function $f$ there is a smooth concave function $\varphi$ such that

\[ \|\nabla(\varphi \circ (f - \min f))(x)\| \ge 1 \]

for all $x$ in some neighborhood of the set $\operatorname{argmin} f$. Its generalization to the nonsmooth case [21, 22] has opened very surprising roads in the nonconvex world and has made it possible to perform convergence rate analyses for many important algorithms in optimization [4, 26, 56]. In a first stage of the present paper we show, when $f$ is convex, that error bounds are equivalent to nonsmooth KL inequalities provided the residual function has a moderate behavior close to 0 (meaning that its derivative blows up at a reasonable rate). Our result includes, in particular, all power-type examples like the ones that are often met in practice².

¹ That is, involving inequalities of the type $\|\nabla f(x)\| \ge \omega(\operatorname{dist}(x, \operatorname{argmin} f))$.

² An absolutely crucial asset of error bounds and KL inequalities in the convex world is their global nature under a mere coercivity assumption; see Section 2.6.

Once we know that error bounds provide a KL inequality, one still needs to make the connection with the actual complexity of first-order methods. This is probably the main contribution of this paper: to any given convex objective $f : H \to (-\infty, +\infty]$ and descent sequence of the form

(i) $f(x_k) + a\|x_k - x_{k-1}\|^2 \le f(x_{k-1})$,

(ii) $\|\omega_k\| \le b\|x_k - x_{k-1}\|$, where $\omega_k \in \partial f(x_k)$, $k \ge 1$,

we associate a worst-case one-dimensional proximal method

\[ \alpha_k = \operatorname{argmin}\left\{ \varphi^{-1}(s) + \frac{1}{2\zeta}(s - \alpha_{k-1})^2 : s \ge 0 \right\}, \qquad \alpha_0 = \varphi(f(x_0)), \]

where $\zeta$ is a constant depending explicitly on the triplet of positive real numbers $(a, b, \ell)$, where $\ell > 0$ is a Lipschitz constant of $(\varphi^{-1})'$. Our complexity result asserts, under weak assumptions, that the "1-D prox" governs the complexity of the original method through the elementary and natural inequality

\[ f(x_k) - \min f \le \varphi^{-1}(\alpha_k), \quad k \ge 0. \]

Similar results for the sequence are provided. These ideas are already present in [20] and [18, Section 3.2]. The function $\varphi^{-1}$ above (the inverse of a desingularizing function for $f$ on a convenient domain) contains almost all the information our approach provides on the complexity of descent methods. As explained previously, it depends on the precise knowledge of a KL inequality and thus, in this convex setting, of an error bound. The reader familiar with second-order methods might have recognized the spirit of the majorant method of Kantorovich [68], where a reduction to dimension one is used to study Newton's method.
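The one-dimensional proximal recursion can be sketched numerically. The example below is only an illustration under stated assumptions: it takes the hypothetical choice $\varphi^{-1}(s) = C s^2$ (so that $(\varphi^{-1})'$ is Lipschitz with constant $2C$) and an arbitrary value of $\zeta$, since the explicit dependence of $\zeta$ on $(a, b, \ell)$ is not reproduced here. The names `prox_step`, `one_d_prox_sequence`, `C` and `ZETA` are introduced for this sketch. For this choice, the optimality condition $2Cs + (s - \alpha_{k-1})/\zeta = 0$ gives $\alpha_k = \alpha_{k-1}/(1 + 2C\zeta)$, which the code compares with a generic ternary search.

```python
def prox_step(phi_inv, alpha_prev, zeta, iters=200):
    """Approximate argmin over s >= 0 of phi_inv(s) + (s - alpha_prev)^2 / (2 zeta).

    Assumes phi_inv is convex and nondecreasing on [0, +inf), so the objective
    is convex and its minimizer lies in [0, alpha_prev]; ternary search applies.
    """
    def objective(s):
        return phi_inv(s) + (s - alpha_prev) ** 2 / (2.0 * zeta)

    lo, hi = 0.0, alpha_prev
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2   # a minimizer lies in [lo, m2]
        else:
            lo = m1   # a minimizer lies in [m1, hi]
    return 0.5 * (lo + hi)


def one_d_prox_sequence(phi_inv, alpha0, zeta, n_steps):
    """Return [alpha_0, ..., alpha_n] for the 1-D proximal recursion."""
    alphas = [alpha0]
    for _ in range(n_steps):
        alphas.append(prox_step(phi_inv, alphas[-1], zeta))
    return alphas


C = 1.0      # hypothetical constant in phi_inv(s) = C * s^2
ZETA = 0.5   # hypothetical value of zeta


def phi_inv(s):
    return C * s * s


def closed_form(alpha0, k):
    """alpha_k = alpha_0 / (1 + 2 C zeta)^k for the quadratic phi_inv above."""
    return alpha0 / (1.0 + 2.0 * C * ZETA) ** k


if __name__ == "__main__":
    alphas = one_d_prox_sequence(phi_inv, 2.0, ZETA, 5)
    for k, alpha in enumerate(alphas):
        # phi_inv(alpha) is the resulting bound on f(x_k) - min f
        print(k, alpha, closed_form(2.0, k), phi_inv(alpha))
```

In this quadratic case the bound $\varphi^{-1}(\alpha_k)$ decays geometrically, which corresponds to a linear rate for the original descent method; other desingularizing functions would lead to other rates.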

Deriving complexity bounds in practice: applications. Our theoretical results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method and finally compute the complexity using the one-dimensional worst-case proximal sequence. We consider first some classic well-posed problems: finding a point in an intersection of closed convex sets with regular intersection or uniformly convex problems, and we show how the complexity of some classical methods can be obtained or recovered. We revisit the iterative shrinkage thresholding algorithm (ISTA) applied to a least squares objective with $\ell^1$ regularization [44] and we prove that its complexity is of the form $O(q^k)$ with $q \in (0, 1)$ (see [97] for a pioneering work in this direction and also [83] for further geometrical insights). This result contrasts with what was known on the subject [17, 51] and suggests that many questions on the complexity of first-order methods remain open.
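For readers who want to experiment with the $O(q^k)$ behavior, here is a minimal pure-Python sketch of ISTA for $\min_x \tfrac{1}{2}\|Ax - b\|^2 + \lambda\|x\|_1$. It is an illustration under assumptions, not the analysis of the text: the step size uses the squared Frobenius norm of $A$ as an upper bound on the Lipschitz constant of the gradient, and the test problem, the function names and the parameter values are chosen here for convenience.

```python
def soft_threshold(v, t):
    """Proximal map of t * |.| applied to the scalar v."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(A, b, lam, x):
    """0.5 * ||A x - b||^2 + lam * ||x||_1 for list-of-lists A."""
    residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + lam * sum(abs(xj) for xj in x)


def ista(A, b, lam, n_iter=200):
    """Run ISTA from x = 0 and return the last iterate and objective history."""
    n = len(A[0])
    # Squared Frobenius norm: an upper bound on the largest eigenvalue of A^T A.
    L = sum(a * a for row in A for a in row)
    x = [0.0] * n
    history = [objective(A, b, lam, x)]
    for _ in range(n_iter):
        residual = [sum(a * xj for a, xj in zip(row, x)) - bi for row, bi in zip(A, b)]
        grad = [sum(A[i][j] * residual[i] for i in range(len(A))) for j in range(n)]
        # Forward (gradient) step followed by backward (soft-thresholding) step.
        x = [soft_threshold(xj - g / L, lam / L) for xj, g in zip(x, grad)]
        history.append(objective(A, b, lam, x))
    return x, history


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 2.0]]
    b = [1.0, 2.0]
    x, history = ista(A, b, lam=0.5)
    print(x)            # approaches (0.5, 0.875) for this separable problem
    print(history[:5])  # objective values decrease monotonically
```

On this small separable problem the minimizer can be computed by hand as $(0.5, 0.875)$, and the objective gap shrinks geometrically along the iterations, in line with the linear rate discussed above.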

Theoretical aspects and complementary results. As explained before, our paper led us to establish several theoretical results and to clarify some questions appearing in a somewhat disparate manner in the literature. We first explain how to pass from error bounds to KL inequalities in the general setting of Hilbert spaces and vice versa; similar questions appear in [21, 80, 79]. This result is proved by considering the interplay between the contraction semigroup generated by the subdifferential function and the $L^1$ contraction property of this flow. These results are connected to the geometry of the residual functions $\omega$ and break down when error bounds are too flat. This is shown in Section 2.6 by a two-dimensional counterexample presented in [23] for another purpose.

Our investigations also led us to consider the problem of KL inequalities for convex functions, a problem partly tackled in [23]. We show how to extend convex KL inequalities from a level set to the whole space. We also show that compactness and semi-algebraicity ensure that real semi-algebraic or definable coercive convex functions are automatically KL on the whole space. This result has an interesting theoretical consequence in terms of complexity: abstract descent methods for coercive semi-algebraic convex problems are systematically amenable to a full complexity analysis, provided that a desingularizing function (known to exist) is explicitly computable.

2.2 Preliminaries

In this section, we recall the basic concepts, notation and some well-known results to be used throughout the paper. In what follows, $H$ is a real Hilbert space and $f : H \to (-\infty, +\infty]$ is proper, lower-semicontinuous and convex. We are interested in some properties of the function $f$ around the set of its minimizers, which we suppose to be nonempty and denote by $\operatorname{argmin} f$ or $S$. We assume, without loss of generality, that $\min f = 0$.
