4.2.1 The MUSTARD algorithm
Combining the ideas of multi-step inertial methods and variable metric methods, we propose a new operator splitting algorithm, the variable metric MUlti-Step inerTial operAtoR splitting methoD, dubbed “MUSTARD”, for solving (Pinc) and (Popt). The details of the algorithm are given in Algorithm 7.
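Algorithm 7: The MUSTARD algorithm
Initial: $s \in \mathbb{N}_+$ and $S \stackrel{\text{def}}{=} \{0, \dots, s-1\}$; let $\mathcal{M}_\nu$ be as defined in (4.1.5). Take $x_0 \in \mathcal{H}$ and $x_{-i} = x_0$, $i \in S$. Choose $\underline{\epsilon}, \overline{\epsilon} > 0$ such that $\underline{\epsilon} \le 2\beta\nu - \overline{\epsilon}$.
repeat
  Let $\{a_{i,k}\}_{i \in S}, \{b_{i,k}\}_{i \in S} \in\ ]-1, 2]^s$, $V_k \in \mathcal{M}_\nu$ and $\gamma_k \in [\underline{\epsilon}, 2\beta\nu - \overline{\epsilon}]$:
  $$\begin{aligned}
  y_{a,k} &= x_k + \sum_{i \in S} a_{i,k}(x_{k-i} - x_{k-i-1}), \\
  y_{b,k} &= x_k + \sum_{i \in S} b_{i,k}(x_{k-i} - x_{k-i-1}), \\
  x_{k+1} &= J_{\gamma_k V_k^{-1} A}\big(y_{a,k} - \gamma_k V_k^{-1} B(y_{b,k})\big).
  \end{aligned} \tag{4.2.1}$$
  $k = k + 1$;
until convergence;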
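The iteration (4.2.1) can be sketched in a few lines for the metric-free case. The following is a minimal illustration only, under assumptions that are not in the source: $V_k \equiv \mathrm{Id}$, constant inertial parameters, $A = \partial(\lambda\|\cdot\|_1)$ (so the resolvent is soft-thresholding) and $B = \nabla\big(\tfrac12\|Kx - f\|^2\big)$. The names `mustard_lasso` and `soft_threshold` are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Resolvent of t * d||.||_1 (proximal operator of t * ||.||_1)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def mustard_lasso(K, f, lam, a, b, gamma, n_iter=500):
    """Sketch of (4.2.1) with V_k = Id for min_x 0.5*||K x - f||^2 + lam*||x||_1.

    a, b  : sequences of s constant inertial parameters a_i, b_i (assumption:
            constant in k; the algorithm allows them to vary with k).
    gamma : step-size, assumed to lie in ]0, 2*beta[ with beta = 1/||K||^2.
    """
    s = len(a)
    n = K.shape[1]
    # history[j] stores x_{k-j}; initialised so that x_{-i} = x_0 = 0.
    history = [np.zeros(n) for _ in range(s + 1)]
    for _ in range(n_iter):
        x = history[0]
        # Differences x_{k-i} - x_{k-i-1} for i in S = {0, ..., s-1}.
        diffs = [history[i] - history[i + 1] for i in range(s)]
        y_a = x + sum(a[i] * diffs[i] for i in range(s))
        y_b = x + sum(b[i] * diffs[i] for i in range(s))
        grad = K.T @ (K @ y_b - f)                      # B(y_b)
        x_new = soft_threshold(y_a - gamma * grad, gamma * lam)  # J_{gamma A}
        history = [x_new] + history[:-1]
    return history[0]
```

With `a = b = [0.0]` the sketch reduces to the plain Forward–Backward iteration; passing two parameters, for instance `a = b = [0.3, -0.05]`, gives a two-step inertial variant with a negative second parameter.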
Remark 4.2.1. The main reasons for considering the monotone inclusion (Pinc) instead of the optimization problem (Popt), and for involving a variable metric, are as follows:
(i) Problem (Popt) is only a special case of (Pinc). In particular, for various operator splitting methods (e.g. non-relaxed DR, GFB, and Primal–Dual splitting), the fixed-point iterations can be written as monotone inclusion problems in which the monotone operators involved are not subdifferentials of convex functions.
(ii) As we have seen from the examples of Primal–Dual splitting methods (e.g. (2.4.14) in Section 2.4.3.3 and (3.4.17) in Section 3.4.3), the corresponding monotone inclusion formulations are posed under a different metric (i.e. the metric V for the two examples).
As a result, considering the monotone inclusion problem and involving the variable metric allow us to extend the MUSTARD algorithm to a broad class of operator splitting methods.
Relation to previous work. In terms of form, the MUSTARD Algorithm 7 is the most general variable metric Forward–Backward splitting we are aware of, and it is new to the literature for $s \ge 2$. For the case $s = 1$, it is more general than Algorithm 6 since a variable metric is considered, and it recovers the variable metric Forward–Backward proposed in [62] if $s = 0$.
If we choose $s = 1$ and $V_k = \mathrm{Id}$, then, depending on the choice of the inertial parameters $a_k$ and $b_k$, the relations between Algorithm 7 and the aforementioned work are as follows:
• $a_k = 0$, $b_k = 0$: this is the original FB method [119,140].
• $a_k \in [0, \bar{a}]$, $b_k = 0$: this is the case studied in [131] for (Pinc). In the context of optimization with $R = 0$, one recovers the heavy ball method [146].
• $a_k \in [0, \bar{a}]$, $b_k = a_k$: this corresponds to the work of [120] for solving (Pinc). If we moreover restrict $\gamma_k \in\ ]0, \beta]$ and let $a_k \to 1$, then Algorithm 6 specializes to FISTA-type methods [24,50,13,12] developed for optimization.
• $a_k \in [0, \bar{a}]$, $b_k \in\ ]0, \bar{b}]$, $a_k \neq b_k$: the general inertial FB scheme, Algorithm 6.
Below we also highlight several important characteristics of the MUSTARD algorithm.
(i) Similarly to some existing inertial methods [131,120,115], the range of step-sizes $\gamma_k$ allowed by MUSTARD is $]0, 2\beta[$ if $V_k \equiv \mathrm{Id}$.
(ii) The algorithm allows multiple steps, the number of which is given by $s$. In particular, for $s = 2$, we show that very promising practical results can be obtained (see Section 4.5).
(iii) Negative inertial parameters are allowed. For $s = 1$, the inertial parameters should be non-negative and lie in $[0, 1[$ to ensure convergence. However, for $s \ge 2$, we will show that one can benefit from negative choices of the inertial parameters. In particular, for $s = 2$, the numerical experiments of Section 4.5 suggest that a good choice of the inertial parameters is $a_{0,k}, b_{0,k} \in\ ]0, 2]$ and $a_{1,k}, b_{1,k} \in\ ]-1, 0]$. Such an inertial setting can be investigated from the dynamical system perspective; see below for a short introduction.
(iv) As the problem we are considering is the general monotone inclusion problem, we can generalize the MUSTARD algorithm to methods whose iterations are related to some monotone inclusion problem, for example the non-relaxed DR, GFB and several Primal–Dual splitting methods.

MUSTARD as a discretised dynamical system. Now consider the metric-free case of the MUSTARD algorithm for the optimization problem (Popt), i.e. with $V = \mathrm{Id}$. Consider the following second-order dynamical system
$$\ddot{x}(t) + c_0(t)\,\dot{x}(t) + (\partial R + \nabla F)(x(t)) = 0, \tag{4.2.2}$$
where $c_0(t) \ge 0$ is an asymptotically vanishing viscous damping function. Typically, $c_0(t)$ decreases moderately to 0, i.e. $\lim_{t \to +\infty} c_0(t) = 0$ and $\int^{+\infty} c_0(t)\,\mathrm{d}t = +\infty$.
Let 0 < ω2 < ω1 be two weights such that ω1+ ω2 = 1, h > 0 be the time step-size, tk = kh and xk = x(tk). Consider an implicit (Euler backward) discretization w.r.t. ∂R and an explicit (Euler forward) discretization w.r.t. ∇F , and a weighted sum of explicit and implicit discretization of ¨x(t), i.e.
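$$0 \in \frac{\omega_1}{h^2}(x_{k+1} - 2x_k + x_{k-1}) + \frac{\omega_2}{h^2}(x_k - 2x_{k-1} + x_{k-2}) + \frac{c_0(kh)}{h}(x_k - x_{k-1}) + \partial R(x_{k+1}) + \nabla F(y_{b,k}),$$
then we obtain the following inclusion
$$x_k + a_{0,k}(x_k - x_{k-1}) + a_{1,k}(x_{k-1} - x_{k-2}) - \gamma \nabla F(y_{b,k}) \in x_{k+1} + \gamma\,\partial R(x_{k+1}), \tag{4.2.3}$$
where we have
$$a_{0,k} = 1 - \frac{\omega_2}{\omega_1} - \frac{h\,c_0(kh)}{\omega_1}, \qquad a_{1,k} = \frac{\omega_2}{\omega_1}, \qquad \gamma = \frac{h^2}{\omega_1}.$$
If we moreover set
$$y_{b,k} = x_k + b_{0,k}(x_k - x_{k-1}) + b_{1,k}(x_{k-1} - x_{k-2}),$$
with $b_{0,k}, b_{1,k}$ properly chosen, then we obtain the MUSTARD scheme for the case $s = 2$ and $V_k \equiv \mathrm{Id}$.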
If we choose $c_0(kh) = \frac{d}{kh}$, $d > 3$, then (4.2.3) simplifies to the following inclusion
$$x_k + \Big(1 - \frac{d}{\omega_1 k}\Big)(x_k - x_{k-1}) - \frac{\omega_2}{\omega_1}(x_k - 2x_{k-1} + x_{k-2}) - \gamma \nabla F(y_{b,k}) \in x_{k+1} + \gamma\,\partial R(x_{k+1}).$$
If we further let $\omega_1 = 1$, $\omega_2 = 0$ and
$$y_{b,k} = x_k + b_k(x_k - x_{k-1}), \qquad b_k \in [0, 1],$$
then we recover a special case of Algorithm 6. If one moreover sets $b_k = a_k = 1 - d/k$, then we obtain the FISTA scheme as studied in [160,11,50].
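The algebra linking the weighted discretisation to the inclusion (4.2.3) can be checked numerically. The sketch below is an illustration only: it assumes the smooth case $R = 0$ (so that the inclusion becomes an equality) and treats $\nabla F(y_{b,k})$ as a given vector; the function names are hypothetical.

```python
import numpy as np

def mustard_coefficients(omega1, omega2, h, c0_kh):
    """Inertial parameters and step-size induced by the discretisation."""
    a0 = 1.0 - omega2 / omega1 - h * c0_kh / omega1
    a1 = omega2 / omega1
    gamma = h ** 2 / omega1
    return a0, a1, gamma

def explicit_step(x, x_prev, x_prev2, grad, omega1, omega2, h, c0_kh):
    """x_{k+1} from (4.2.3) when R = 0 (assumption made for this check)."""
    a0, a1, gamma = mustard_coefficients(omega1, omega2, h, c0_kh)
    return x + a0 * (x - x_prev) + a1 * (x_prev - x_prev2) - gamma * grad

def discretised_residual(x_next, x, x_prev, x_prev2, grad, omega1, omega2, h, c0_kh):
    """Right-hand side of the discretised system with R = 0; should vanish."""
    return (omega1 / h ** 2 * (x_next - 2 * x + x_prev)
            + omega2 / h ** 2 * (x - 2 * x_prev + x_prev2)
            + c0_kh / h * (x - x_prev)
            + grad)
```

For any choice of the iterates and of the gradient vector, feeding the output of `explicit_step` into `discretised_residual` should return a vector that is zero up to rounding, which confirms the expressions for $a_{0,k}$, $a_{1,k}$ and $\gamma$.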
4.2.2 Global convergence of MUSTARD
In this section, we present the global convergence analysis for the MUSTARD algorithm. We summarize our results as follows:
(i) $s \ge 2$: In Theorem 4.2.3, we establish conditional convergence of the sequence $\{x_k\}_{k \in \mathbb{N}}$ for a fixed metric $V$. Here “conditional convergence” refers to the fact that, for the sequence to converge, the sequences $\{a_{i,k}\}_{i \in S}$, $\{b_{i,k}\}_{i \in S}$ have to be chosen depending (conditionally) on the sequence $\{x_k\}_{k \in \mathbb{N}}$ in such a way that an appropriate condition holds, e.g. (4.2.5). Unfortunately, for the case $s \ge 2$ we so far only have a result for a fixed metric $V_k \equiv V$. However, this is sufficient to cover many algorithms of interest.
(ii) $s = 1$:
(a) In Theorem 4.2.5 we prove conditional convergence of $\{x_k\}_{k \in \mathbb{N}}$ for a variable metric $V_k$.
(b) We also devise choices of the inertial parameters and metrics that are independent of $\{x_k\}_{k \in \mathbb{N}}$ and still guarantee global convergence (see Theorem 4.2.9). We dub this unconditional convergence.
All the proofs of the above results are gathered in Section 4.6.
For the sake of generality, we consider the inexact version of the MUSTARD algorithm. The following definition is needed.
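Definition 4.2.2 (ε-enlargement). Let $A : \mathcal{H} \rightrightarrows \mathcal{H}$ be a set-valued maximal monotone operator and $\varepsilon \ge 0$. Then the ε-enlargement of $A$ is defined by
$$A^{\varepsilon}(x) \stackrel{\text{def}}{=} \big\{ v \in \mathcal{H} : \langle u - v, y - x \rangle \ge -\varepsilon,\ \forall y \in \mathcal{H},\ u \in A(y) \big\}.$$
From the definition, it is easy to verify that for $0 \le \varepsilon_1 \le \varepsilon_2$ we have $A^{\varepsilon_1}(x) \subset A^{\varepsilon_2}(x)$ and $A^{0}(x) = A(x)$. Thus $A^{\varepsilon}$ is an enlargement of $A$.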
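As a small worked example (not from the source; the setting $\mathcal{H} = \mathbb{R}$ and $A = \mathrm{Id}$ is an assumption made purely for illustration), the enlargement can be computed in closed form:

```latex
% Assumed illustrative setting: H = R and A = Id, so that u = A(y) = y.
\begin{align*}
A^{\varepsilon}(x)
  &= \{ v \in \mathbb{R} : (y - v)(y - x) \ge -\varepsilon,\ \forall y \in \mathbb{R} \} \\
  % the quadratic in y is minimised at y = (v + x)/2, with minimum -(v - x)^2 / 4
  &= \{ v \in \mathbb{R} : -\tfrac{1}{4}(v - x)^2 \ge -\varepsilon \} \\
  &= [\, x - 2\sqrt{\varepsilon},\ x + 2\sqrt{\varepsilon} \,].
\end{align*}
```

For $\varepsilon = 0$ the interval collapses to $\{x\} = A(x)$, and it grows with $\varepsilon$, in line with the two properties of the definition.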
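For the update of $x_{k+1}$ in (4.2.1), consider the following inexact form
$$y_{a,k} - \gamma_k\big(V_k^{-1} B(y_{b,k}) + \xi_k\big) - x_{k+1} \in \gamma_k V_k^{-1} A^{\varepsilon_k}(x_{k+1}),$$
where $\xi_k \in \mathcal{H}$ is the error in the evaluation of the gradient operator $B$, and $\varepsilon_k$ is the enlargement error. Then we obtain the inexact form of MUSTARD
$$\begin{aligned}
y_{a,k} &= x_k + \sum_{i \in S} a_{i,k}(x_{k-i} - x_{k-i-1}), \\
y_{b,k} &= x_k + \sum_{i \in S} b_{i,k}(x_{k-i} - x_{k-i-1}), \\
y_{a,k} - \gamma_k\big(V_k^{-1} B(y_{b,k}) + \xi_k\big) - x_{k+1} &\in \gamma_k V_k^{-1} A^{\varepsilon_k}(x_{k+1}).
\end{aligned} \tag{4.2.4}$$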
4.2.2.1 Conditional convergence
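We first present the conditional convergence of the inexact MUSTARD algorithm. For each $i \in S$, define
$$\zeta_{i,k} \stackrel{\text{def}}{=} a_{i,k} - \frac{\gamma_k}{2\beta\nu}\, b_{i,k} \qquad \text{and} \qquad \bar{a}_i \stackrel{\text{def}}{=} \sup_{k \in \mathbb{N}} |a_{i,k}|.$$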
Theorem 4.2.3 (Conditional convergence, $s \ge 2$). For the inexact MUSTARD iteration (4.2.4), let conditions (A.1)-(A.3) hold, fix the metric $V_k \equiv V \in \mathcal{M}_\nu$, and let $\xi_k \equiv 0$. Suppose that the following two conditions hold:
(i) the errors satisfy $\{\varepsilon_k\}_{k \in \mathbb{N}} \in \ell_+^1$;
(ii) the inertial parameters $\{a_{i,k}\}_{i \in S}$ are such that $\sum_{i \in S} \bar{a}_i < 1$.
Then the generated sequence $\{x_k\}_{k \in \mathbb{N}}$ is bounded. If moreover the following summability condition holds
$$\sum_{k \in \mathbb{N}} \max\Big\{ \max_{i \in S} \zeta_{i,k}^2,\ \max_{i \in S} |b_{i,k}|,\ \max_{i \in S} |a_{i,k}| \Big\} \sum_{i \in S} \|x_{k-i} - x_{k-i-1}\|^2 < +\infty, \tag{4.2.5}$$
then there exists an $x^\star \in \mathrm{zer}(A + B)$ such that the sequence $\{x_k\}_{k \in \mathbb{N}}$ converges weakly to $x^\star$.
The proof of the theorem can be found in Section 4.6, starting on page 83.
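Remark 4.2.4. If the inertial parameters $\{a_{i,k}\}_{i \in S}$, $\{b_{i,k}\}_{i \in S}$ are chosen in $[0, 1]$ such that
$$\zeta_{i,k}^2 = \Big(a_{i,k} - \frac{\gamma_k b_{i,k}}{2\beta\nu}\Big)^2 \le a_{i,k}^2,$$
then condition (4.2.5) simplifies to
$$\sum_{k \in \mathbb{N}} \max\Big\{ \max_{i \in S} |b_{i,k}|,\ \max_{i \in S} |a_{i,k}| \Big\} \sum_{i \in S} \|x_{k-i} - x_{k-i-1}\|^2 < +\infty.$$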
Condition (4.2.5) can be enforced by a simple online updating rule such as, for each $i \in S$, given $\bar{a}_i, \bar{b}_i \in [0, 1]$,
$$a_{i,k} = \min\{\bar{a}_i, c_{a,i,k}\}, \qquad b_{i,k} = \min\{\bar{b}_i, c_{b,i,k}\}, \tag{4.2.6}$$
where $c_{a,i,k}, c_{b,i,k} > 0$, and $\max\{c_{a,i,k}, c_{b,i,k}\} \sum_{i \in S} \|x_{k-i} - x_{k-i-1}\|^2$ is summable. For instance, one can choose
$$c_{a,i,k} = \frac{c_{a,i}}{k^{1+\delta} \sum_{i \in S} \|x_{k-i} - x_{k-i-1}\|^2}, \qquad c_{a,i} > 0,\ \delta > 0,$$
and similarly for $c_{b,i,k}$.
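The online rule (4.2.6) is straightforward to implement. The following sketch is an illustration under stated assumptions: the caps (written $a_i$, $b_i$ or $\bar{a}_i$, $\bar{b}_i$ in the text) lie in $[0, 1]$, the denominator of $c_{a,i,k}$ uses squared norms, and the function name and argument layout are hypothetical.

```python
import numpy as np

def online_inertial_parameters(history, a_bar, b_bar, k, c_a, c_b, delta=0.1):
    """Inertial parameters at iteration k >= 1 following rule (4.2.6).

    history : list with history[j] = x_{k-j}, j = 0, ..., s
    a_bar, b_bar : caps in [0, 1], one per i in S
    c_a, c_b : positive constants c_{a,i}, c_{b,i}, one per i in S
    """
    a_bar = np.asarray(a_bar, dtype=float)
    b_bar = np.asarray(b_bar, dtype=float)
    s = len(a_bar)
    # sum_{i in S} ||x_{k-i} - x_{k-i-1}||^2
    sq = sum(np.linalg.norm(history[i] - history[i + 1]) ** 2 for i in range(s))
    if sq == 0.0:
        # The thresholds c_{a,i,k}, c_{b,i,k} are infinite: the caps are used.
        return a_bar.copy(), b_bar.copy()
    scale = 1.0 / (k ** (1.0 + delta) * sq)
    a_k = np.minimum(a_bar, np.asarray(c_a, dtype=float) * scale)
    b_k = np.minimum(b_bar, np.asarray(c_b, dtype=float) * scale)
    return a_k, b_k
```

By construction, each returned parameter multiplied by the sum of squared differences is at most $c/k^{1+\delta}$, which is summable in $k$; as long as the iterates settle quickly, the minimum is attained by the cap and the rule stays inactive.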
When $s = 1$, we have the following theorem with a variable metric $V_k$. To lighten the notation, for $s = 1$ we denote $a_k = a_{0,k}$ and $b_k = b_{0,k}$.
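Theorem 4.2.5 (Conditional convergence, $s = 1$). For the inexact MUSTARD iteration (4.2.4), let conditions (A.1)-(A.3) hold. Suppose that the following conditions are satisfied:
(i) for the metric sequence $\{V_k\}_{k \in \mathbb{N}} \subset \mathcal{M}_\nu$ with $\nu > 0$, there exists a non-negative sequence $\{\eta_k\}_{k \in \mathbb{N}} \in \ell_+^1$ such that
$$\mu = \sup_{k \in \mathbb{N}} \|V_k\| < +\infty \qquad \text{and} \qquad (1 + \eta_k) V_k \succcurlyeq V_{k+1};$$
(ii) the inertial parameter $a_k \in [0, 1]$ is such that $\bar{c} \stackrel{\text{def}}{=} \sup_{k \in \mathbb{N}} a_k(1 + \eta_{k-1}) < 1$, and
$$\sup_{k \in \mathbb{N}} \frac{1}{1 - \bar{c}} \sum_{m=1}^{k} \big(1 - \bar{c}^{\,k-m+1}\big) \frac{\eta_m}{1 + \eta_m} < 1; \tag{4.2.7}$$
(iii) the errors satisfy $\{\varepsilon_k\}_{k \in \mathbb{N}} \in \ell_+^1$ and $\{\|\xi_k\|\}_{k \in \mathbb{N}} \in \ell_+^1$.
Then the generated sequence $\{x_k\}_{k \in \mathbb{N}}$ is bounded. If moreover the following summability condition holds
$$\sum_{k \in \mathbb{N}} \max\{a_k, b_k\}\, \|x_k - x_{k-1}\|^2 < +\infty, \tag{4.2.8}$$
then there exists an $x^\star \in \mathrm{zer}(A + B)$ such that the sequence $\{x_k\}_{k \in \mathbb{N}}$ converges weakly to $x^\star$.
The proof of the theorem can be found in Section 4.6, starting on page 86.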
Remark 4.2.6.
(i) If the sequence $\{\eta_k\}_{k \in \mathbb{N}}$ satisfies $\sum_{k \in \mathbb{N}} \eta_k < 1 - \bar{c}$, then condition (4.2.7) is in force. Given $\bar{c} \in [0, 1[$, let $\delta, \kappa > 0$, and set
$$\eta_k = \frac{\kappa}{k^{1+\delta}}.$$
Then, for fixed $\delta$, (4.2.7) can be met with a proper choice of $\kappa$.
(ii) If $a_k \ge b_k$, then (4.2.8) recovers the conditions in [131,120] for the conditional convergence of $\{x_k\}_{k \in \mathbb{N}}$.
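Whether a given pair $(\delta, \kappa)$ is a “proper choice” in item (i) can be probed numerically. The sketch below evaluates the left-hand side of (4.2.7) over a finite horizon; this is an assumption-laden check, since a finite truncation can only give evidence about the supremum over all $k$, and the function names are hypothetical.

```python
import numpy as np

def eta_sequence(kappa, delta, n_terms):
    """eta_k = kappa / k^(1 + delta) for k = 1, ..., n_terms."""
    k = np.arange(1, n_terms + 1)
    return kappa / k ** (1.0 + delta)

def condition_427_lhs(c_bar, eta):
    """Largest value over k <= len(eta) of
    (1 / (1 - c_bar)) * sum_{m=1}^{k} (1 - c_bar^(k-m+1)) * eta_m / (1 + eta_m).
    """
    eta = np.asarray(eta, dtype=float)
    ratio = eta / (1.0 + eta)
    worst = 0.0
    for k in range(1, len(eta) + 1):
        exponents = np.arange(k, 0, -1)        # k - m + 1 for m = 1, ..., k
        weights = 1.0 - c_bar ** exponents
        worst = max(worst, float(weights @ ratio[:k]) / (1.0 - c_bar))
    return worst
```

For example, with $\bar{c} = 0.5$ and $\delta = 1$, the choice $\kappa = 0.2$ gives $\sum_k \eta_k = 0.2\,\pi^2/6 < 1 - \bar{c}$, so the returned value should stay below 1, whereas a large value such as $\kappa = 5$ should violate the bound.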
An empirical choice of the inertial parameters. We introduce two empirical ways to set up the inertial parameters. For the sake of simplicity, let $V_k \equiv \mathrm{Id}$, hence $\nu = 1$. Consider the constant parameter setting
$$\gamma \in\ ]0, 2\beta[ \qquad \text{and} \qquad b_i = a_i \in\ ]-1, 2[, \quad i \in S.$$
Moreover, let $(a_i)_{i \in S}$ be monotone non-increasing, i.e. $a_0 \ge a_1 \ge \cdots \ge a_{s-1}$.
Summarizing from multi-step inertial Forward–Backward and gradient descent, we obtain the following two empirical bounds for the sum $\sum_{i \in S} a_i$:
$$\text{“Upper bound 1”:}\ \ \sum_{i} a_i \in \Big[0, \min\Big\{1, \frac{2\beta - \gamma}{\gamma}\Big\}\Big[, \qquad \text{“Upper bound 2”:}\ \ \sum_{i} a_i \in \Big[0, \min\Big\{1, \frac{2\beta - \gamma}{2|\beta - \gamma|}\Big\}\Big[. \tag{4.2.9}$$
In practice, to ensure the convergence of the generated sequence {xk}k∈N, these two bounds should be applied together with the online updating rule of inertial parameters (4.2.6). Most of the time, with proper choice of each ai, (4.2.6) may never be triggered.
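The two bounds in (4.2.9) are easy to tabulate. The sketch below reads the second bound as $\min\{1, (2\beta - \gamma)/(2|\beta - \gamma|)\}$ and returns 1 at $\gamma = \beta$, where the ratio is unbounded; the function names are hypothetical and $\gamma \in\ ]0, 2\beta[$ is assumed.

```python
def upper_bound_1(gamma, beta):
    """min{1, (2*beta - gamma) / gamma} for gamma in ]0, 2*beta[."""
    return min(1.0, (2.0 * beta - gamma) / gamma)

def upper_bound_2(gamma, beta):
    """min{1, (2*beta - gamma) / (2*|beta - gamma|)} for gamma in ]0, 2*beta[."""
    if gamma == beta:
        return 1.0  # the ratio is unbounded here, so the cap of 1 applies
    return min(1.0, (2.0 * beta - gamma) / (2.0 * abs(beta - gamma)))
```

With $\beta = 1$ and $\gamma = 1.5$, the first bound evaluates to $1/3$ and the second to $1/2$; for $\gamma \le \beta$ both equal 1.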
Remark 4.2.7. Comparing (4.2.9) with condition (ii) of Theorem 4.2.3, the main difference is that here the sum is taken with signs. This means that we can choose positive inertial parameters bigger than 1 and then compensate with negative ones. As a matter of fact, as we will see in the numerical experiments, negative inertial parameters can make the convergence even faster. For instance, for the case $s = 2$ with $\sum_i a_i$ fixed, the choice $a_1 < 0 < a_0$ may outperform the one with $a_1, a_0 \ge 0$.
The two upper bounds are shown graphically in Figure 4.1. It can be observed that for $\gamma \le \beta$, the largest value that can be allowed is 1, which corresponds to the choice of the FISTA method, whose inertial parameter tends to 1 as $k \to +\infty$.
Figure 4.1: Two empirical upper bounds for the sum of inertial parameters $\sum_i a_i$ (horizontal axis: $\gamma/\beta$ from 0 to 2; vertical axis: $\sum_i a_i$ from 0 to 1): “Upper bound 1”, $\sum_{i \in S} a_i \in \big[0, \min\{1, \frac{2\beta - \gamma}{\gamma}\}\big[$; “Upper bound 2”, $\sum_{i \in S} a_i \in \big[0, \min\{1, \frac{2\beta - \gamma}{2|\beta - \gamma|}\}\big[$.
Remark 4.2.8.
(i) Between the two bounds in (4.2.9), “Upper bound 2” is much less stringent than “Upper bound 1”.
(ii) For inertial Forward–Backward, a value of $\sum_i a_i$ too close to 1 is not a good choice. Such an observation (e.g. from the numerical experiments in Section 4.5) coincides with existing studies on FISTA (e.g. the local oscillation). A theoretical explanation of the behaviour observed when $\sum_i a_i$ is too close to 1 is left to Chapter 6.
All the above remarks will be made clear in the numerical experiment section, typically for the multi-step inertial Forward–Backward splitting method.
Lastly, it should be emphasised that the two empirical bounds in (4.2.9) are designed for multi-step inertial FB, gradient descent, and the original PPA method. They may not work for other inertial schemes. As a matter of fact, as we will see in the numerical experiments of Section 4.5, the choices of inertial parameters for the inertial Douglas–Rachford and Chambolle–Pock Primal–Dual splitting methods [51] are rather limited. Moreover, compared to inertial Forward–Backward and gradient descent, the gains from inertia in DR and Primal–Dual splitting are very small.
The reasons underlying such differences in the acceleration brought by inertia to different algorithms are quite complicated to justify in general. However, they can be explained partly through the local linear convergence analysis that we describe in Chapters 6-8.
4.2.2.2 Unconditional convergence
Besides the conditional convergence, we can devise choices of {{ai,k}i∈S}k∈N and {{bi,k}i∈S}k∈N that are independent of {xk}k∈N, and still guarantee the global convergence. We dub this unconditional convergence. The following result generalizes those in [5,131,120,115].
For the unconditional convergence of Algorithm 7, we restrict ourselves to the case $s = 1$.
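Theorem 4.2.9 (Unconditional convergence). For the inexact MUSTARD iteration (4.2.4), let conditions (A.1)-(A.3) hold. Suppose that the following conditions are satisfied:
(i) for the metric sequence $\{V_k\}_{k \in \mathbb{N}} \subset \mathcal{M}_\nu$ with $\nu > 0$, there exists a non-negative sequence $\{\eta_k\}_{k \in \mathbb{N}} \in \ell_+^1$ such that
$$\mu = \sup_{k \in \mathbb{N}} \|V_k\| < +\infty \qquad \text{and} \qquad (1 + \eta_k) V_k \succcurlyeq V_{k+1};$$
(ii) the inertial parameters $a_k, b_k \in [0, 1]$ are chosen such that (4.2.7) holds and, moreover, there exists $\tau > 0$ such that
$$\begin{cases}
1 + a_k - \dfrac{\gamma_k}{2\beta\nu}(1 + b_k)^2 + \eta_{k-1} b_k (b_k + 1) \ge \tau & : \ a_k \le \dfrac{\gamma_k}{2\beta\nu} b_k, \\[2mm]
1 - (3 + 2\eta_k) a_k - \dfrac{\gamma_k}{2\beta\nu}(1 + b_k)^2 + \eta_{k-1} b_k (b_k - 1) \ge \tau & : \ b_k \le a_k \ \text{ or } \ \dfrac{\gamma_k}{2\beta\nu} b_k \le a_k < b_k;
\end{cases} \tag{4.2.10}$$
(iii) the errors satisfy $\{\varepsilon_k\}_{k \in \mathbb{N}} \in \ell_+^1$ and $\{\|\xi_k\|\}_{k \in \mathbb{N}} \in \ell_+^1$.
Then $\sum_{k \in \mathbb{N}} \|x_k - x_{k-1}\|^2 < +\infty$, and there exists $x^\star \in \mathrm{zer}(A + B)$ such that the sequence $\{x_k\}_{k \in \mathbb{N}}$ converges weakly to $x^\star$.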
See Section 4.6 for the proof, starting on page 90. When the metric $V_k$ is fixed, i.e. $V_k \equiv V \in \mathcal{M}_\nu$, then $\eta_k \equiv 0$ and condition (4.2.10) simplifies to
$$\begin{cases}
1 + a_k - \dfrac{\gamma_k}{2\beta\nu}(1 + b_k)^2 \ge \tau & : \ a_k \le \dfrac{\gamma_k}{2\beta\nu} b_k, \\[2mm]
1 - 3a_k - \dfrac{\gamma_k}{2\beta\nu}(1 + b_k)^2 \ge \tau & : \ b_k \le a_k \ \text{ or } \ \dfrac{\gamma_k}{2\beta\nu} b_k \le a_k < b_k.
\end{cases} \tag{4.2.11}$$
Figure 4.2 shows the conditions (4.2.11) graphically. We choose $\tau = 0.01$ and consider two different choices of $\gamma$. It can be observed that as $\gamma$ becomes bigger, the range of $a, b$ in (4.2.10) becomes smaller. Moreover, compared to the empirical choice of inertial parameters, the choices allowed by (4.2.10) are quite conservative. For instance, for the case $b_k \equiv a_k \equiv a$ and $\gamma = \beta\nu$, the biggest value that can be allowed is $a \equiv \sqrt{5} - 2$. In comparison, when $B = 0$, $b_k$ vanishes and the upper bound of $a_k$ is $1/3$, which coincides with the result of [4,5].