We begin this subsection by recalling the result of Luo and Sturm [89], who established a global error bound for the zero set of a quadratic function.
Theorem 1.4.10. [89] Let $f : \mathbb{R}^n \to \mathbb{R}$ be a quadratic function. There exists a constant $\tau > 0$ such that
$$\operatorname{dist}(x, [f = 0]) \le \tau\big(|f(x)| + |f(x)|^{1/2}\big), \quad \forall x \in \mathbb{R}^n.$$
This result was recovered in [101, Corollary 5] and [50, Corollary 2]. Note that the theorem no longer holds for an arbitrary polynomial, as the following example shows.
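Example 4. Let $f(x, y) = (xy - 1)^2 + (x - 1)^2$ for all $(x, y) \in \mathbb{R}^2$.
One has $[f \le 0] = \{(1, 1)\}$. Consider the sequence $(x_k, y_k) = (1/k, k)$, $k \ge 2$. It is easy to check that
$$0 < f(x_k, y_k) = \Big(1 - \frac{1}{k}\Big)^2 < 1 \quad \forall k \ge 2, \qquad \operatorname{dist}\big((x_k, y_k), [f \le 0]\big) \to +\infty \ (k \to +\infty).$$
Since any Hölder-type bound $\tau([f]_+^a + [f]_+^b)$ stays below $2\tau$ along this sequence, $f$ does not possess a Hölder global error bound.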
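A quick numerical sanity check of Example 4 (a sketch, not part of the original argument): along $(x_k, y_k) = (1/k, k)$ the values of $f$ stay in $(0, 1)$, while the distance to $(1, 1)$ is at least $k - 1$. The helper names below are illustrative.

```python
import math


def f(x: float, y: float) -> float:
    """f(x, y) = (xy - 1)^2 + (x - 1)^2 from Example 4."""
    return (x * y - 1.0) ** 2 + (x - 1.0) ** 2


def dist_to_solution_set(x: float, y: float) -> float:
    """Euclidean distance from (x, y) to [f <= 0] = {(1, 1)}."""
    return math.hypot(x - 1.0, y - 1.0)


# Along (1/k, k) the first square (almost) vanishes, so f = (1 - 1/k)^2 < 1,
# whereas the distance to (1, 1) is at least k - 1.
for k in (2, 10, 100, 1000):
    xk, yk = 1.0 / k, float(k)
    print(k, f(xk, yk), dist_to_solution_set(xk, yk))
```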
However, when $f$ is a convex polynomial, a Hölder global error bound does hold; this was proved by Yang [124] and is stated in Theorem 1.4.15 below.
Let us now present the characterization of global error bounds for semi-algebraic functions, due to Ha [59].
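Suppose that $f : \mathbb{R}^n \to (-\infty, +\infty]$ has a Hölder global error bound, that is, there exist $\tau > 0$ and $a, b > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+^{a} + [f(x)]_+^{b}\big), \quad \forall x \in \mathbb{R}^n. \tag{1.7}$$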
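It is easy to see that, for any sequence $(x_k)_{k\in\mathbb{N}} \subset \mathbb{R}^n$, the following two assertions hold:
(i) if $f(x_k) \to 0$, then $\operatorname{dist}(x_k, [f \le 0]) \to 0$;
(ii) if $\operatorname{dist}(x_k, [f \le 0]) \to +\infty$, then $f(x_k) \to +\infty$.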
Conversely, Ha [59] proved that every polynomial function satisfying these two conditions possesses a Hölder global error bound. This result was extended to the class of continuous semi-algebraic functions in [50, Theorem 2]; for the (well-known) definition of a semi-algebraic function, see for instance [50, Definition 1].
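Theorem 1.4.11 (Characterization of global error bounds for semi-algebraic functions). [59, 50] Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuous semi-algebraic function. The following statements are equivalent:
1. For any sequence $(x_k)_{k\in\mathbb{N}} \subset \mathbb{R}^n \setminus [f \le 0]$ with $\|x_k\| \to +\infty$, we have:
(i) if $f(x_k) \to 0$, then $\operatorname{dist}(x_k, [f \le 0]) \to 0$;
(ii) if $\operatorname{dist}(x_k, [f \le 0]) \to +\infty$, then $f(x_k) \to +\infty$.
2. There exist $\tau > 0$ and $a, b > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+^{a} + [f(x)]_+^{b}\big), \quad \forall x \in \mathbb{R}^n.$$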
Proof. The implication (2) ⇒ (1) is obvious; we now prove (1) ⇒ (2). The proof is divided into two parts: using (i), we show that an error bound holds on a neighborhood of $[f \le 0]$, and using (ii), we obtain a bound where $\operatorname{dist}(x, [f \le 0])$ is large.
Assume that (i) holds. Let us prove that there exist $\tau_1 > 0$, $a > 0$ and $r > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau_1 [f(x)]_+^{a}, \quad \forall x \in [f \le r].$$
For $t \in \mathbb{R}$, put $\varphi(t) = \sup\{\operatorname{dist}(x, [f \le 0]) : f(x) = t\}$. This is a semi-algebraic function. Thanks to (i), there exists $r > 0$ such that $\varphi(t) < \infty$ for all $t \in [0, r]$. Choosing $r$ sufficiently small, we may assume that $\varphi$ is continuous and $\varphi(t) \ne 0$ on $(0, r]$. By the Puiseux lemma,
$$\varphi(t) = \tau t^{a} + o(t^{a}) \quad (t \to 0^+).$$
From assumption (i), it follows that $\tau > 0$ and $a > 0$. Hence there exist $r > 0$ and $\tau_1 > 0$ such that $\varphi(t) \le \tau_1 t^{a}$ for all $t \in (0, r]$, which means that
$$\operatorname{dist}(x, [f \le 0]) \le \tau_1 [f(x)]_+^{a}, \quad \forall x \in [f \le r].$$
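Using (ii), let us now prove that there exist $\tau_2 > 0$, $b > 0$ and $R > 0$ such that $\operatorname{dist}(x, [f \le 0]) \le \tau_2 [f(x)]_+^{b}$ for all $x \in [f > R]$.
This conclusion is clear when $f$ is bounded from above. We may thus assume that $\sup_{\mathbb{R}^n} f = \sup_{\mathbb{R}^n} \varphi = +\infty$. Then $\varphi(t) > 0$ for $t$ sufficiently large, so, by the Puiseux lemma at infinity, there exist $\tau > 0$ and $b > 0$ such that
$$\varphi(t) = \tau t^{b} + o(t^{b}) \quad (t \to +\infty).$$
This implies that there are $\tau_2 > 0$ and $R > r$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau_2 [f(x)]_+^{b}, \quad \forall x \in [f > R].$$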
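It is easily seen that (ii) implies the existence of $M > 0$ such that $\operatorname{dist}(x, [f \le 0]) < M$ for all $x \in [r < f \le R]$. Since $f(x) > r$ on this set, we get $\operatorname{dist}(x, [f \le 0]) \le \frac{M}{r^{\alpha}} f(x)^{\alpha}$ for any $\alpha > 0$. Combining this with the inequalities obtained on $[f \le r]$ and $[f > R]$, we obtain the conclusion.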
The equivalence in Theorem 1.4.11 explains why two exponents, $a$ and $b$, are needed in the global error bound (1.7): one ensures that (1.7) holds when $\operatorname{dist}(x, [f \le 0]) \to 0$, and the other ensures that it holds when $\operatorname{dist}(x, [f \le 0]) \to +\infty$. In general, the two exponents are different.
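Example 5. [60] Let $f(x, y) = x^2 + y^4$ for all $(x, y) \in \mathbb{R}^2$.
Then $[f \le 0] = \{(0, 0)\}$ and, since $|x| \le f^{1/2}(x, y)$ and $|y| \le f^{1/4}(x, y)$,
$$\operatorname{dist}\big((x, y), [f \le 0]\big) \le f^{1/4}(x, y) + f^{1/2}(x, y), \quad \forall (x, y) \in \mathbb{R}^2.$$
On the other hand, consider the two sequences $(x_k^1, y_k^1) = (k, 0)$ and $(x_k^2, y_k^2) = (0, 1/k)$. Along the first one, $\operatorname{dist} = k$ and $f = k^2$, which forces $\alpha \ge 1/2$; along the second one, $\operatorname{dist} = 1/k$ and $f = k^{-4}$, which forces $\alpha \le 1/4$. Hence there do not exist $\alpha \in \mathbb{R}$ and $\tau > 0$ such that $\operatorname{dist}((x, y), [f \le 0]) \le \tau [f(x, y)]_+^{\alpha}$ for all $(x, y) \in \mathbb{R}^2$.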
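Both claims of Example 5 can be checked numerically. The sketch below (illustrative helper names, random samples on an arbitrary box) tests the two-exponent bound and shows that, for a single exponent $\alpha$, the ratio $\operatorname{dist}/f^{\alpha}$ behaves like $k^{1-2\alpha}$ along $(k, 0)$ and like $k^{4\alpha-1}$ along $(0, 1/k)$; the larger of the two is always at least $k^{1/3}$.

```python
import math
import random


def f(x: float, y: float) -> float:
    """f(x, y) = x^2 + y^4 from Example 5; [f <= 0] = {(0, 0)}."""
    return x ** 2 + y ** 4


def dist0(x: float, y: float) -> float:
    """Distance from (x, y) to [f <= 0] = {(0, 0)}."""
    return math.hypot(x, y)


def ratio(x: float, y: float, alpha: float) -> float:
    """dist / f^alpha; it would stay bounded if a single-exponent bound held."""
    return dist0(x, y) / f(x, y) ** alpha


# Two-exponent bound dist <= f^(1/4) + f^(1/2), checked on random samples.
random.seed(0)
for _ in range(10_000):
    x, y = random.uniform(-50.0, 50.0), random.uniform(-50.0, 50.0)
    assert dist0(x, y) <= f(x, y) ** 0.25 + f(x, y) ** 0.5 + 1e-9

# Single exponent: along (k, 0) or along (0, 1/k) the ratio blows up.
for alpha in (0.25, 1.0 / 3.0, 0.5):
    k = 1e4
    print(alpha, ratio(k, 0.0, alpha), ratio(0.0, 1.0 / k, alpha))
```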
Using Theorem 1.4.11, Ha [59] established a global error bound for polynomial functions under a Palais–Smale condition. This result was later extended in [50] to continuous semi-algebraic functions.
Recall that $f$ is said to satisfy the Palais–Smale condition (PS) at $r_0$ if every sequence $(x_k)_{k\in\mathbb{N}}$ with $f(x_k) \to r_0$ and $\operatorname{dist}(0, \partial f(x_k)) \to 0$ possesses a convergent subsequence.
Theorem 1.4.12. [59, 50] Let $f : \mathbb{R}^n \to \mathbb{R}$ be a continuous semi-algebraic function. Suppose that $f$ satisfies the Palais–Smale condition at each $r > 0$. Then there exist constants $\tau > 0$ and $a, b > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+^{a} + [f(x)]_+^{b}\big), \quad \forall x \in \mathbb{R}^n.$$
Proof. It is enough to show that $f$ satisfies conditions (i) and (ii) of Theorem 1.4.11.
First we establish (i). Arguing by contradiction, assume that there exist a sequence $(x_k)_{k\in\mathbb{N}}$ and a constant $\delta > 0$ such that
$$\|x_k\| \to \infty, \quad f(x_k) \to 0 \quad \text{and} \quad \operatorname{dist}(x_k, [f \le 0]) > \delta.$$
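Put $X = \{x \mid f(x) \ge 0\}$; since $f$ is continuous, $X$ is closed and hence a complete metric space. Applying Ekeland's variational principle (see [53]) to $f$ on $X$ with $\varepsilon_k = f(x_k)$ and $\lambda_k = \sqrt{\varepsilon_k}$, we obtain a sequence $(y_k)_{k\in\mathbb{N}} \subset X$ such that
$$f(y_k) \le f(x_k) = \varepsilon_k, \qquad \|x_k - y_k\| \le \sqrt{\varepsilon_k},$$
$$f(y_k) \le f(x) + \sqrt{\varepsilon_k}\, \|x - y_k\|, \quad \forall x \in X.$$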
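It is clear that $f(y_k) \to 0$ and $\|y_k\| \to +\infty$. Since $\|x_k - y_k\| \le \sqrt{\varepsilon_k} \to 0$, we may assume that $\operatorname{dist}(y_k, [f \le 0]) \ge \delta/2$. Therefore $y_k + tu \in X$ for all $t \in (0, \delta/2)$ and all $u \in \mathbb{R}^n$ with $\|u\| = 1$, and we obtain
$$\frac{f(y_k + tu) - f(y_k)}{t} \ge -\sqrt{\varepsilon_k}.$$
Thus the strong slope satisfies $|\nabla f|(y_k) \le \sqrt{\varepsilon_k}$. On the other hand, $\operatorname{dist}(0, \partial f(y_k)) \le |\nabla f|(y_k)$ (see [10, Remark 6.1]), so $\operatorname{dist}(0, \partial f(y_k)) \to 0$. Since $\|y_k\| \to +\infty$, no subsequence of $(y_k)$ converges, which contradicts the Palais–Smale condition.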
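Now we prove that $f$ satisfies condition (ii) of Theorem 1.4.11. Arguing by contradiction, suppose that there exists a sequence $(x_k)_{k\in\mathbb{N}} \subset \mathbb{R}^n$ such that
$$\|x_k\| \to \infty, \quad \operatorname{dist}(x_k, [f \le 0]) \to +\infty \quad \text{and} \quad f(x_k) \to t \in \mathbb{R}.$$
Set again $X = \{x \mid f(x) \ge 0\}$, a complete metric space, and $t_k = f(x_k)$. Applying Ekeland's principle with $\lambda_k = \operatorname{dist}(x_k, [f \le 0])/2$, we obtain a sequence $(y_k)_{k\in\mathbb{N}} \subset X$ such that
$$f(y_k) \le f(x_k) = t_k, \qquad \|x_k - y_k\| \le \frac{\operatorname{dist}(x_k, [f \le 0])}{2},$$
$$f(y_k) \le f(x) + \frac{2 f(x_k)}{\operatorname{dist}(x_k, [f \le 0])}\, \|x - y_k\|, \quad \forall x \in X.$$
Without loss of generality, we may assume that the sequence $(f(y_k))$ converges. Moreover, $\operatorname{dist}(y_k, [f \le 0]) \ge \operatorname{dist}(x_k, [f \le 0])/2 \to +\infty$, so $\|y_k\| \to \infty$. Arguing as in the first part, we get
$$\operatorname{dist}(0, \partial f(y_k)) \le |\nabla f|(y_k) \le \frac{2 f(x_k)}{\operatorname{dist}(x_k, [f \le 0])} \to 0,$$
which contradicts the Palais–Smale condition.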
1.4.2.2 Convex case
We begin this subsection with a result of Facchinei and Pang [54]: for a lower semicontinuous convex function, a Hölder-type error bound on a sublevel set can be extended to a global error bound.
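Theorem 1.4.13. [54] Let $f$ be a lower semicontinuous convex function on $\mathbb{R}^n$ with $[f \le 0]$ nonempty. Suppose that there exist $\delta > 0$, $\tau > 0$ and $\theta > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+ + [f(x)]_+^{\theta}\big), \quad \forall x \in [f \le \delta].$$
Then there exists $\tau' > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau'\big([f(x)]_+ + [f(x)]_+^{\theta}\big), \quad \forall x \in \mathbb{R}^n.$$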
Taking $\theta = 1$, this shows that, for a convex function, a Lipschitz error bound on a sublevel set extends to a global error bound.
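Proof. Let $x \in \mathbb{R}^n$ be such that $f(x) > \delta$ (if $f(x) = +\infty$ there is nothing to prove), and let $p = P_{[f \le 0]} x$ be the projection of $x$ onto the closed convex set $[f \le 0]$, so that $f(p) \le 0$. For $\lambda \in (0, 1)$, denote $x_\lambda = \lambda x + (1 - \lambda) p$. Then $p = P_{[f \le 0]} x_\lambda$ and $\operatorname{dist}(x_\lambda, [f \le 0]) = \lambda \operatorname{dist}(x, [f \le 0])$. By convexity,
$$f(x_\lambda) \le \lambda f(x) + (1 - \lambda) f(p) \le \lambda f(x).$$
Since $x_\lambda \notin [f \le 0]$, we have $f(x_\lambda) > 0$, and we deduce that
$$\operatorname{dist}(x, [f \le 0]) \le \frac{\operatorname{dist}(x_\lambda, [f \le 0])}{f(x_\lambda)}\, f(x).$$
On the other hand, the function $\lambda \mapsto f(x_\lambda)$ is convex, finite and lower semicontinuous on $[0, 1]$, hence continuous, with $f(x_0) = f(p) \le 0$ and $f(x_1) = f(x) > \delta$. We may therefore choose $\lambda \in (0, 1)$ such that $f(x_\lambda) = \delta/2 < \delta$. Thanks to the assumed error bound on $[f \le \delta]$, we obtain
$$\operatorname{dist}(x_\lambda, [f \le 0]) \le \tau\big(f(x_\lambda) + f^{\theta}(x_\lambda)\big),$$
and hence
$$\frac{\operatorname{dist}(x_\lambda, [f \le 0])}{f(x_\lambda)} \le \tau\big(1 + f^{\theta - 1}(x_\lambda)\big) = \tau\Big(1 + \Big(\frac{\delta}{2}\Big)^{\theta - 1}\Big).$$
Combining the above inequalities, we get
$$\operatorname{dist}(x, [f \le 0]) \le \tau\Big(1 + \Big(\frac{\delta}{2}\Big)^{\theta - 1}\Big) f(x), \quad \forall x \in [f > \delta].$$
Since $\tau(1 + (\delta/2)^{\theta - 1}) \ge \tau$, this means that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\Big(1 + \Big(\frac{\delta}{2}\Big)^{\theta - 1}\Big)\big([f(x)]_+ + [f(x)]_+^{\theta}\big), \quad \forall x \in \mathbb{R}^n.$$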
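The key step of the proof is that the ratio $\operatorname{dist}(\cdot, [f \le 0])/f$ can only decrease when one moves away from $[f \le 0]$ along the segment $[p, x]$. A small numerical illustration, on the hypothetical one-dimensional example $f(x) = x^2 - 1$ (not taken from the text), so that $[f \le 0] = [-1, 1]$: for $x = 5$ one has $p = 1$, $x_\lambda = 1 + 4\lambda$, and the ratio equals $1/(2 + 4\lambda)$.

```python
def f(x: float) -> float:
    """Hypothetical convex example f(x) = x^2 - 1, with [f <= 0] = [-1, 1]."""
    return x * x - 1.0


def dist_to_sublevel(x: float) -> float:
    """Distance from x to [f <= 0] = [-1, 1]."""
    return max(abs(x) - 1.0, 0.0)


def ratio_along_segment(x: float, lam: float) -> float:
    """dist(x_lam, [f <= 0]) / f(x_lam), where x_lam = lam * x + (1 - lam) * p
    and p is the projection of x onto [-1, 1]. Requires |x| > 1, lam in (0, 1]."""
    p = max(-1.0, min(1.0, x))
    x_lam = lam * x + (1.0 - lam) * p
    return dist_to_sublevel(x_lam) / f(x_lam)


x = 5.0
lams = [i / 10 for i in range(1, 11)]
ratios = [ratio_along_segment(x, lam) for lam in lams]
# Nonincreasing in lam: the last entry (lam = 1) is dist(x)/f(x), the smallest one.
print([round(r, 4) for r in ratios])
```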
Combining Theorem 1.4.13 with Theorem 1.4.2, we immediately obtain the following result, similar to [24, Theorem 3] and [49, Theorem 6].
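Theorem 1.4.14. Let $f_i : \mathbb{R}^n \to \mathbb{R}$, $i = 1, \dots, m$, be continuous, convex and subanalytic functions. Assume that the set
$$S = \{x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i = 1, \dots, m\}$$
is nonempty and compact. Then there exist $\tau, \theta > 0$ such that
$$\operatorname{dist}(x, S) \le \tau\big([f(x)]_+ + [f(x)]_+^{\theta}\big), \quad \forall x \in \mathbb{R}^n,$$
where $f(x) = \sum_{i=1}^{m} [f_i(x)]_+$.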
We remark that if $f_i$ is coercive, then the sublevel set $[f_i \le r]$ is compact for every $r \in \mathbb{R}$.
Definition 4. [81, 78] A continuous function $f$ on $\mathbb{R}^n$ is said to be a piecewise convex polynomial function if there exist finitely many polyhedra $P_1, \dots, P_k$ with $\bigcup_{j=1}^{k} P_j = \mathbb{R}^n$ such that the restriction of $f$ to each $P_j$, denoted by $f_j$, is a convex polynomial function. The degree of $f$, denoted by $\deg(f)$, is defined as the maximum of $\deg(f_j)$, $j = 1, \dots, k$.
In [81], Li studied error bounds for convex piecewise quadratic functions. More precisely, if $f$ is a convex piecewise quadratic function, then there exists $\tau > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\Big([f(x)]_+ + \sqrt{[f(x)]_+}\Big), \quad \forall x \in \mathbb{R}^n. \tag{1.8}$$
Using Theorem 1.4.5 and Theorem 1.4.13, Li [77] showed that, for a convex polynomial function $f$ on $\mathbb{R}^n$ of degree $d$, there exists $\tau > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+ + [f(x)]_+^{1/\kappa(n,d)}\big), \quad \forall x \in \mathbb{R}^n. \tag{1.9}$$
This result was further improved by Yang [124].
Theorem 1.4.15. [124] Let $f$ be a convex polynomial of degree $d$. There exists $\tau > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+ + [f(x)]_+^{1/d}\big), \quad \forall x \in \mathbb{R}^n.$$
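A one-line example, added here as an illustration and not taken from [124], shows that the exponent $1/d$ in Theorem 1.4.15 cannot be increased: take $n = 1$ and $f(x) = x^d$ with $d \ge 2$ even, so that $[f \le 0] = \{0\}$.

```latex
\begin{aligned}
&\operatorname{dist}(x,[f\le 0]) = |x| = f(x)^{1/d}, \qquad x \in \mathbb{R};\\
&\text{if } |x| \le \tau\big(|x|^{d} + |x|^{d\beta}\big) \text{ for } x \ne 0 \text{ near } 0, \text{ with } \beta > 1/d,\\
&\text{then } 1 \le \tau\big(|x|^{d-1} + |x|^{d\beta - 1}\big) \to 0 \quad (x \to 0), \text{ a contradiction.}
\end{aligned}
```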
Results (1.8) and (1.9) were extended by Li [78] to general piecewise convex polynomial functions.
Theorem 1.4.16. [78] Let $f$ be a piecewise convex polynomial function on $\mathbb{R}^n$ of degree $d$. Suppose that one of the following two conditions holds:
(i) if $\operatorname{dist}(x, [f \le 0]) \to +\infty$, then $f(x) \to +\infty$;
(ii) $f$ is convex.
Then there exists $c > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le c\big([f(x)]_+ + [f(x)]_+^{1/\kappa(n,d)}\big), \quad \forall x \in \mathbb{R}^n.$$
Let us now present global error bounds for systems of convex polynomial inequalities. In [87], Luo and Luo proved that, under the Slater condition, a global Lipschitzian error bound holds for convex quadratic systems. Later, without the Slater condition, Pang and Wang [121] showed that every system of convex quadratic inequalities has a global error bound.
Theorem 1.4.17. [121] Let $f_1, f_2, \dots, f_m$ be convex quadratic functions. Assume that
$$S = \{x \in \mathbb{R}^n \mid f_i(x) \le 0,\ i = 1, \dots, m\}$$
is nonempty. Then there exist a positive integer $d \le n + 1$ and a scalar $c > 0$ such that
$$\operatorname{dist}(x, S) \le c \max\Big\{\|[f(x)]_+\|,\ \|[f(x)]_+\|^{1/2^{d}}\Big\}, \quad \forall x \in \mathbb{R}^n,$$
where $f(x) = (f_i(x))_{i=1,\dots,m}$ and $[\cdot]_+$ is taken componentwise.
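Without the Slater condition, a fractional exponent cannot be avoided. A minimal illustration (not from the text): $n = m = 1$ and $f_1(x) = x^2$, so that $S = \{0\}$ and the Slater condition fails.

```latex
\begin{aligned}
&\operatorname{dist}(x, S) = |x| = \|[f(x)]_+\|^{1/2},\\
&\text{so } \operatorname{dist}(x, S) \le c\,\|[f(x)]_+\| = c\,x^2 \text{ fails as } x \to 0 \text{ for every } c > 0,\\
&\text{while } \operatorname{dist}(x, S) \le \max\big\{\|[f(x)]_+\|,\ \|[f(x)]_+\|^{1/2}\big\} \text{ holds for all } x.
\end{aligned}
```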
Similarly to Theorem 1.3.14, Theorem 1.4.17 was extended to Banach spaces in [103, Theorem 7], with
$$f_i(x) = \frac{1}{2}\langle A_i x, x\rangle + \langle B_i, x\rangle + c_i,$$
where $A_i : X \times X \to \mathbb{R}$ is a symmetric, continuous, positive semidefinite bilinear form, $B_i \in X^*$ and $c_i \in \mathbb{R}$, for $i = 1, \dots, m$.
Note that this result does not hold for general systems of convex polynomial inequalities; see Example 2. However, in some particular cases, a global error bound does hold for such systems.
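Theorem 1.4.18. Let $f_1, \dots, f_p$ be convex polynomial functions on $\mathbb{R}^n$ whose degrees are at most $d$, and let $f(x) = \max_{i=1,\dots,p} f_i(x)$ for all $x \in \mathbb{R}^n$. Then the following statements hold.
1. [77] If $f_i(x) \ge 0$ for all $x \in \mathbb{R}^n$ and $i = 1, \dots, p$, then there exists $\tau > 0$ such that
$$\operatorname{dist}(x, [f \le 0]) \le \tau\big([f(x)]_+ + [f(x)]_+^{1/\kappa(n,d)}\big), \quad \forall x \in \mathbb{R}^n.$$
2. [102] Let $K$ be a convex polyhedron in $\mathbb{R}^n$. If $S = \{x \in K \mid f(x) \le 0\}$ is a nonempty compact set, then there exists $\tau > 0$ such that
$$\operatorname{dist}(x, S) \le \tau\big([f(x)]_+ + [f(x)]_+^{1/\kappa(n,2d)}\big), \quad \forall x \in K.$$
3. [102] Let $K$ be a convex polyhedron in $\mathbb{R}^n$ such that $S = \{x \in K \mid f(x) \le 0\}$ is nonempty. Assume that, for each $v \in K_\infty$,
$$\max_{i=1,\dots,p} f_i^\infty(v) = 0 \ \Longrightarrow\ f_i^\infty(v) = 0 \text{ for all } i = 1, \dots, p$$
(see (1.4)). Then there exists $\tau > 0$ such that
$$\operatorname{dist}(x, S) \le \tau\big([f(x)]_+ + [f(x)]_+^{1/\kappa(n,2d)}\big), \quad \forall x \in K,$$
where $K_\infty$ denotes the recession cone of $K$, that is, the set of directions $v \in \mathbb{R}^n$ such that $x + tv \in K$ for all $x \in K$ and $t \ge 0$.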
Chapter 2
From error bounds to the complexity of first-order descent methods for convex functions
Abstract. This chapter shows that error bounds can be used as effective tools for deriving complexity results for first-order descent methods in convex minimization. In a first stage, this objective led us to revisit the interplay between error bounds and the Kurdyka-Łojasiewicz (KL) inequality. We show that the two concepts are equivalent for convex functions having a moderately flat profile near the set of minimizers (such as functions with Hölderian growth). A counterexample shows that the equivalence no longer holds for extremely flat functions. This fact reveals the relevance of an approach based on the KL inequality. In a second stage, we show how KL inequalities can in turn be employed to compute new complexity bounds for a wealth of descent methods for convex problems. Our approach is completely original and makes use of a one-dimensional worst-case proximal sequence in the spirit of the famous majorant method of Kantorovich. Our result applies to a very simple abstract scheme that covers a wide class of descent methods. As a byproduct of our study, we also provide new results for the globalization of KL inequalities in the convex framework.
2.1 Overview and main results
A brief insight into the theory of error bounds. Since Hoffman's celebrated result on error bounds for systems of linear inequalities [63], the study of error bounds has been successfully applied to problems in sensitivity, convergence rate estimation, and feasibility issues. In the optimization world, the first natural extensions were made to convex functions by Robinson [113], Mangasarian [93], and Auslender-Crouzeix [6]. However, the most striking discovery came years before, in the pioneering works of Łojasiewicz [84, 85] at the end of the fifties: under a mere compactness assumption, the existence of error bounds for arbitrary continuous semi-algebraic functions was established. Despite their remarkable depth, these works remained unnoticed by the optimization community for a long period (see [88]). At the beginning of the nineties, motivated by numerous applications, many researchers started working along these lines, in quest of quantitative results that could produce more effective tools. The survey of Pang [106] provides a comprehensive panorama of results obtained around this time. The works of Luo [87, 88, 89] and Dedieu [45] are also important milestones in the theory. The recent works [78, 80, 59, 79, 15] provide even stronger quantitative results by using the powerful machinery of algebraic geometry or advanced techniques of convex optimization.
A methodology for complexity of first-order descent methods. Let us introduce the concepts used in this work and show how they can be arranged to devise a new and systematic approach to complexity. Let $H$ be a real Hilbert space, and let $f : H \to (-\infty, +\infty]$ be a proper lower-semicontinuous convex function achieving its minimum $\min f$, so that $\operatorname{argmin} f \ne \emptyset$. In its simplest version, an error bound is an inequality of the form
$$\omega\big(f(x) - \min f\big) \ge \operatorname{dist}(x, \operatorname{argmin} f), \tag{2.1}$$
where $\omega$ is an increasing function vanishing at $0$, called here the residual function, and where $x$ may range either over the whole space or over a bounded set. Hölderian error bounds, which are very common in practice, have a simple power form
$$f(x) - \min f \ge \gamma \operatorname{dist}^p(x, \operatorname{argmin} f),$$
with $\gamma > 0$, $p \ge 1$, and thus $\omega(s) = (s/\gamma)^{1/p}$. When functions are semi-algebraic on $H = \mathbb{R}^n$ and "regular" (for instance, continuous), the above inequality is known to hold on any compact set [84, 85]; a modern reference is [42]. This property is known in real algebraic geometry as the Łojasiewicz inequality. However, since we work here mainly in optimization and pursue complexity purposes, we refer to this inequality as the Łojasiewicz error bound inequality.
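To make (2.1) concrete, take the function $f(x_1, x_2) = x_1^2 + x_2^4$ of Example 5, so that $\min f = 0$ and $\operatorname{argmin} f = \{0\}$. On the unit ball, $(x_1^2 + x_2^2)^2 \le 2(x_1^4 + x_2^4) \le 2(x_1^2 + x_2^4)$, so the Hölderian bound holds with $\gamma = 1/2$, $p = 4$ and $\omega(s) = (2s)^{1/4}$. Along $(k, 0)$ the same $\omega$ fails for large $k$, which illustrates why $x$ may have to be restricted to a bounded set. The sketch below checks both facts numerically (sampling scheme and names are illustrative).

```python
import math
import random

GAMMA, P = 0.5, 4.0  # Hölderian constants valid on the unit ball (derived above)


def f(x1: float, x2: float) -> float:
    """f(x1, x2) = x1^2 + x2^4, with min f = 0 and argmin f = {(0, 0)}."""
    return x1 ** 2 + x2 ** 4


def omega(s: float) -> float:
    """Residual function of the Hölderian bound: omega(s) = (s / gamma)^(1/p)."""
    return (s / GAMMA) ** (1.0 / P)


random.seed(1)
worst = -math.inf
for _ in range(20_000):
    # Sample uniformly in the unit disc (polar coordinates with sqrt radius).
    r, t = math.sqrt(random.random()), random.uniform(0.0, 2.0 * math.pi)
    x1, x2 = r * math.cos(t), r * math.sin(t)
    worst = max(worst, math.hypot(x1, x2) - omega(f(x1, x2)))
print("max of dist - omega(f - min f) on the unit ball:", worst)  # expected < 0

# Far from argmin f this omega is too small: along (k, 0), dist = k while
# omega(f) = (2 k^2)^(1/4) grows only like sqrt(k).
print([math.hypot(k, 0.0) - omega(f(k, 0.0)) for k in (10.0, 100.0)])
```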
Once the question of computing constants and exponents (here $\gamma$ and $p$) for a given minimization problem is settled (see the fundamental works [89, 78, 15, 59]), it is natural to ask whether these concepts are connected to the complexity properties of first-order methods for minimizing $f$. Despite the important success of error bound theory in several branches of optimization, we are not aware of a solid theory connecting the error bounds we consider (as defined in (2.1)) with the complexity of general descent methods. There are, however, several works connecting error bounds with convergence rate results for first-order methods (see, e.g., [92, 97, 16, 40, 108]). See also the recent and interesting work [79], which provides a wealth of error bounds and some applications to convergence rate analysis. An important fraction of these works involves "first-order error bounds"¹ (see [88, 92]), which are different from those we consider here.
Our answer to the connection between complexity and "zero-order error bounds" will come partially from a related notion, also discovered by Łojasiewicz and further developed by Kurdyka in the semi-algebraic world: the Łojasiewicz gradient inequality. This inequality, also called the Kurdyka-Łojasiewicz (KL) inequality (see [23]), asserts that for any smooth semi-algebraic function $f$ there is a smooth concave function $\varphi$ such that
$$\big\|\nabla\big(\varphi \circ (f - \min f)\big)(x)\big\| \ge 1$$
for all $x \notin \operatorname{argmin} f$ in some neighborhood of the set $\operatorname{argmin} f$. Its generalization to the nonsmooth case [21, 22] has opened very surprising roads in the nonconvex world and has made it possible to perform convergence rate analyses for many important algorithms in optimization [4, 26, 56]. In a first stage of the present paper, we show, when $f$ is convex, that error bounds are equivalent to nonsmooth KL inequalities provided the residual function has a moderate behavior close to $0$ (meaning that its derivative blows up at a reasonable rate). Our result includes, in particular, all power-type examples like the ones that are often met in practice².
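For instance (an illustration added here), let $f(x) = |x|^p$ on $\mathbb{R}$ with $p > 1$, so that $\min f = 0$ and $\operatorname{argmin} f = \{0\}$, and take $\varphi(s) = s^{1/p}$, which is concave with $\varphi(0) = 0$. For $x \ne 0$, the KL inequality holds with equality:

```latex
\big(\varphi \circ f\big)'(x) = \varphi'\big(f(x)\big)\, f'(x)
  = \tfrac{1}{p}\,|x|^{1-p} \cdot p\,|x|^{p-1}\operatorname{sign}(x)
  = \operatorname{sign}(x),
\qquad\text{so}\qquad \big|(\varphi \circ f)'(x)\big| = 1 .
```

In this example $\varphi$ coincides with the residual function $\omega(s) = (s/\gamma)^{1/p}$ of the Hölderian error bound $f(x) \ge \gamma |x|^p$ with $\gamma = 1$.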
¹ That is, involving inequalities of the type $\|\nabla f(x)\| \ge \omega(\operatorname{dist}(x, \operatorname{argmin} f))$.
² An absolutely crucial asset of error bounds and KL inequalities in the convex world is their global nature under a mere coercivity assumption; see Section 2.6.
Once we know that error bounds provide a KL inequality, one still needs to make the connection with the actual complexity of first-order methods. This is probably the main contribution of this paper: to any given convex objective $f : H \to (-\infty, +\infty]$ and descent sequence of the form
(i) $f(x_k) + a\|x_k - x_{k-1}\|^2 \le f(x_{k-1})$,
(ii) $\|\omega_k\| \le b\|x_k - x_{k-1}\|$, where $\omega_k \in \partial f(x_k)$, $k \ge 1$,
we associate a worst-case one-dimensional proximal method
$$\alpha_k = \operatorname{argmin}\Big\{\varphi^{-1}(s) + \frac{1}{2\zeta}(s - \alpha_{k-1})^2 : s \ge 0\Big\}, \qquad \alpha_0 = \varphi\big(f(x_0) - \min f\big),$$
where $\zeta$ is a constant depending explicitly on the triplet of positive real numbers $(a, b, \ell)$, and $\ell > 0$ is a Lipschitz constant of $(\varphi^{-1})'$. Our complexity result asserts, under weak assumptions, that the "1-D prox" governs the complexity of the original method through the elementary and natural inequality
$$f(x_k) - \min f \le \varphi^{-1}(\alpha_k), \quad k \ge 0.$$
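Plain gradient descent is one instance of the abstract scheme (i)-(ii). Assuming that $f$ is differentiable with $L$-Lipschitz gradient and using the step $1/L$, the descent lemma gives (i) with $a = L/2$, and $\|\nabla f(x_k)\| \le \|\nabla f(x_k) - \nabla f(x_{k-1})\| + \|\nabla f(x_{k-1})\| \le 2L\|x_k - x_{k-1}\|$ gives (ii) with $b = 2L$. These constants are our own derivation for this illustration, not necessarily those used later in the chapter; the sketch checks both conditions on a hypothetical quadratic.

```python
import math

# Hypothetical smooth convex test problem: f(x) = 0.5 * (x1^2 + 4 x2^2), min f = 0.
# Its gradient is L-Lipschitz with L = 4.
L = 4.0
STEP = 1.0 / L      # gradient step h = 1/L
A_CONST = L / 2.0   # a in condition (i): descent lemma with h = 1/L
B_CONST = 2.0 * L   # b in condition (ii): triangle inequality + Lipschitz gradient


def f(x):
    return 0.5 * (x[0] ** 2 + 4.0 * x[1] ** 2)


def grad(x):
    return (x[0], 4.0 * x[1])


def gradient_descent(x0, n_iter):
    xs = [x0]
    for _ in range(n_iter):
        g = grad(xs[-1])
        xs.append((xs[-1][0] - STEP * g[0], xs[-1][1] - STEP * g[1]))
    return xs


xs = gradient_descent((3.0, -2.0), 30)
for x_prev, x in zip(xs, xs[1:]):
    step = math.hypot(x[0] - x_prev[0], x[1] - x_prev[1])
    # (i) sufficient decrease
    assert f(x) + A_CONST * step ** 2 <= f(x_prev) + 1e-12
    # (ii) relative error, with omega_k = grad f(x_k), the only subgradient
    assert math.hypot(*grad(x)) <= B_CONST * step + 1e-12
print(f(xs[0]), f(xs[-1]))
```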
Similar results are provided for the sequence $(x_k)$ itself. These ideas are already present in [20] and [18, Section 3.2]. The function $\varphi^{-1}$ above, the inverse of a desingularizing function for $f$ on a convenient domain, contains almost all the information our approach provides on the complexity of descent methods. As explained previously, it depends on the precise knowledge of a KL inequality and thus, in this convex setting, of an error bound. The reader familiar with second-order methods might have recognized the spirit of the majorant method of Kantorovich [68], where a reduction to dimension one is used to study Newton's method.
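The one-dimensional worst-case sequence can be computed numerically. A minimal sketch, assuming that $\varphi$ is concave and increasing, so that $\varphi^{-1}$ is convex and nondecreasing on $[0, \infty)$ with a known derivative; $\zeta$ is simply passed in as a number (its explicit dependence on $(a, b, \ell)$ is not reproduced here), the prox is centered at the previous term $\alpha_{k-1}$, and the initialization $\alpha_0 = \varphi(f(x_0) - \min f)$ makes the bound exact at $k = 0$. The desingularizing function $\varphi(s) = c\sqrt{s}$ and all names are illustrative; for this choice the prox has the closed form $\alpha_k = q\,\alpha_{k-1}$ with $q = 1/(1 + 2\zeta/c^2)$, which the bisection should reproduce.

```python
import math
from typing import Callable, List


def prox_1d(dphi_inv: Callable[[float], float], center: float, zeta: float) -> float:
    """argmin_{s >= 0} phi_inv(s) + (s - center)^2 / (2 * zeta), for center >= 0.

    Assumes phi_inv is convex, nondecreasing and differentiable on [0, inf), with
    derivative dphi_inv. Then g(s) = dphi_inv(s) + (s - center) / zeta is
    nondecreasing, and its zero (if any) lies in [0, center].
    """
    def g(s: float) -> float:
        return dphi_inv(s) + (s - center) / zeta

    if g(0.0) >= 0.0:        # the minimizer is the boundary point s = 0
        return 0.0
    lo, hi = 0.0, center     # g(lo) < 0 <= g(hi) because dphi_inv >= 0
    for _ in range(200):     # bisection on the optimality condition g(s) = 0
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def worst_case_bounds(phi: Callable[[float], float],
                      phi_inv: Callable[[float], float],
                      dphi_inv: Callable[[float], float],
                      r0: float, zeta: float, n_iter: int) -> List[float]:
    """Return [phi_inv(alpha_k) for k = 0..n_iter], the worst-case bounds on
    f(x_k) - min f; r0 = f(x_0) - min f and alpha_0 = phi(r0)."""
    alpha = phi(r0)
    bounds = [phi_inv(alpha)]
    for _ in range(n_iter):
        alpha = prox_1d(dphi_inv, alpha, zeta)   # prox centered at alpha_{k-1}
        bounds.append(phi_inv(alpha))
    return bounds


# Illustrative data: phi(s) = c * sqrt(s), hence phi_inv(s) = (s / c)^2.
c, zeta = 2.0, 1.0
bounds = worst_case_bounds(lambda s: c * math.sqrt(s),
                           lambda s: (s / c) ** 2,
                           lambda s: 2.0 * s / c ** 2,
                           r0=4.0, zeta=zeta, n_iter=10)
q = 1.0 / (1.0 + 2.0 * zeta / c ** 2)   # closed-form ratio alpha_k / alpha_{k-1}
print(q, bounds[:4])
```

With these numbers $\alpha_k = 4(2/3)^k$, so the bounds decay geometrically like $4(4/9)^k$; bisection is used instead of the closed form so that the same routine applies to other convex choices of $\varphi^{-1}$.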
Deriving complexity bounds in practice: applications. Our theoretical results inaugurate a simple methodology: derive an error bound, compute the desingularizing function whenever possible, identify essential constants in the descent method and finally compute the complexity using the one-dimensional worst-case proximal sequence. We first consider some classic well-posed problems, namely finding a point in an intersection of closed convex sets with regular intersection, and uniformly convex problems, and we show how the complexity of some classical methods can be obtained or recovered. We revisit the iterative shrinkage-thresholding algorithm (ISTA) applied to a least squares objective with $\ell_1$ regularization [44] and prove that its complexity is of the form $O(q^k)$ with $q \in (0, 1)$ (see [97] for a pioneering work in this direction and also [83] for further geometrical insights).
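ISTA alternates a gradient step on the least-squares term with soft-thresholding, the proximal map of the $\ell_1$ term, and it fits the abstract scheme (i)-(ii). The sketch below runs it on a small hypothetical instance with step $1/\bar L$, where $\bar L = \|A\|_F^2$ is a crude upper bound on $\|A^\top A\|$, and checks the sufficient-decrease condition (i) with $a = \bar L/2$. This value follows from the standard proximal-gradient inequality $F(x_k) \le F(x_{k-1}) - (1/h - L/2)\|x_k - x_{k-1}\|^2$ for a step $h \le 1/L$. The sketch does not attempt to measure the linear rate $O(q^k)$.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]

# Hypothetical small instance of F(x) = 0.5 * ||A x - b||^2 + mu * ||x||_1.
A: Matrix = [[1.0, 2.0, 0.0],
             [0.0, 1.0, -1.0],
             [3.0, 0.0, 1.0],
             [1.0, 1.0, 1.0]]
b: Vector = [1.0, -2.0, 0.5, 3.0]
mu = 0.5


def matvec(M: Matrix, x: Vector) -> Vector:
    """Computes M x."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in M]


def rmatvec(M: Matrix, y: Vector) -> Vector:
    """Computes M^T y."""
    return [sum(M[i][j] * y[i] for i in range(len(M))) for j in range(len(M[0]))]


def objective(x: Vector) -> float:
    r = [ri - bi for ri, bi in zip(matvec(A, x), b)]
    return 0.5 * sum(ri * ri for ri in r) + mu * sum(abs(xi) for xi in x)


def soft_threshold(v: float, t: float) -> float:
    """Proximal map of t * |.|, applied coordinatewise in ISTA."""
    return math.copysign(max(abs(v) - t, 0.0), v)


def ista(x0: Vector, n_iter: int) -> Tuple[List[Vector], float]:
    # ||A||_F^2 >= largest eigenvalue of A^T A, hence a valid Lipschitz
    # constant for the gradient A^T (A x - b) of the least-squares term.
    l_bar = sum(a * a for row in A for a in row)
    h = 1.0 / l_bar
    xs = [x0]
    for _ in range(n_iter):
        x = xs[-1]
        grad = rmatvec(A, [ri - bi for ri, bi in zip(matvec(A, x), b)])
        xs.append([soft_threshold(xi - h * gi, h * mu) for xi, gi in zip(x, grad)])
    return xs, l_bar


xs, l_bar = ista([0.0, 0.0, 0.0], 200)
for x_prev, x in zip(xs, xs[1:]):
    step2 = sum((u - v) ** 2 for u, v in zip(x, x_prev))
    # Condition (i) with a = l_bar / 2.
    assert objective(x) + 0.5 * l_bar * step2 <= objective(x_prev) + 1e-10
print(objective(xs[0]), objective(xs[-1]))
```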
Theoretical aspects and complementary results. As explained before, our study led us to establish several theoretical results and to clarify some questions appearing in a somewhat disparate manner in the literature. We first explain how to pass from error bounds to the KL inequality in the general setting of Hilbert spaces, and vice versa; similar questions appear in [21, 80, 79]. This result is proved by considering the interplay between the contraction semigroup generated by the subdifferential of the function and the L1 contraction property of this flow. These results are connected to the geometry of the residual functions $\omega$ and break down when error bounds are too flat. This is shown in Section 2.6 by a two-dimensional counterexample presented in [23] for another purpose.
Our investigations also led us to consider the problem of KL inequalities for convex functions, a problem partly tackled in [23]. We show how to extend convex KL inequalities from a level set to the whole space. We also show that compactness and semi-algebraicity ensure that real semi-algebraic or definable coercive convex functions are automatically KL on the whole space. This result has an interesting theoretical consequence in terms of complexity: abstract descent methods for coercive semi-algebraic convex problems are systematically amenable to a full complexity analysis, provided that a desingularizing function, known to exist, is explicitly computable.
2.2 Preliminaries
In this section, we recall the basic concepts, notation and some well-known results to be used throughout the paper. In what follows, H is a real Hilbert space and f : H → (−∞, +∞] is proper, lower-semicontinuous and convex. We are interested in some properties of the function
f around the set of its minimizers, which we suppose to be nonempty and denote by argmin f
or S. We assume, without loss of generality, that min f = 0.