
The generalisation from the Euclidean distance to Bregman distances is significant for optimisation and regularisation theory. In what follows, we briefly consider the Bregman proximal method and show that the derivative convergence result of Theorem 7.33 extends to this case under certain conditions.

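Denote by $J:\mathbb{R}^n\to\mathbb{R}$ a function that is $C^1$-smooth on $\operatorname{int}\operatorname{dom}J$ and 1-strongly convex,$^3$ and by $D_J(x,y) = J(x) - J(y) - \langle\nabla J(y), x - y\rangle$ the Bregman distance it generates. For $F = V + R$ given by (7.18), the iteration map of the Bregman proximal method is given by

$$
\mathcal{A}_k(y,\vartheta) := \operatorname*{arg\,min}_{x\in\mathbb{R}^n} f_k^J(x,y,\vartheta), \qquad
f_k^J(x,y,\vartheta) := \frac{1}{\tau_k}D_J(x,y) + R(x,\vartheta) + \langle\nabla_x V(y,\vartheta), x - y\rangle. \tag{7.29}
$$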
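For orientation, here is a standard special case of (7.29), which is not part of the argument above: with the Euclidean generating function $J = \tfrac12\|\cdot\|^2$, the Bregman distance is $D_J(x,y) = \tfrac12\|x-y\|^2$, and completing the square shows that the Bregman proximal step reduces to the usual proximal gradient (forward-backward) step, assuming $R(\cdot,\vartheta)$ is convex so that its proximal map is well defined.

```latex
\begin{aligned}
J(x) = \tfrac12\|x\|^2
  \;&\Longrightarrow\; D_J(x,y) = \tfrac12\|x-y\|^2,\\
f_k^J(x,y,\vartheta)
  &= \tfrac{1}{2\tau_k}\bigl\|x - \bigl(y - \tau_k\nabla_x V(y,\vartheta)\bigr)\bigr\|^2
     + R(x,\vartheta) - \tfrac{\tau_k}{2}\|\nabla_x V(y,\vartheta)\|^2,\\
\mathcal{A}_k(y,\vartheta)
  &= \operatorname{prox}_{\tau_k R(\cdot,\vartheta)}\bigl(y - \tau_k\nabla_x V(y,\vartheta)\bigr).
\end{aligned}
```

The last term in the second line does not depend on $x$, so it does not affect the minimiser.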

Algorithm 6 Bregman proximal method

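Input: starting point $x^0\in\mathbb{R}^n$, parameter $\vartheta\in\Omega$, time steps $(\tau_k)_{k\in\mathbb{N}}\subset[\varepsilon, 1/L]$ for some $\varepsilon > 0$

for $k = 0, 1, 2, \dots$ do

  $x^{k+1} = \mathcal{A}_k(x^k,\vartheta)$, with $\mathcal{A}_k$ given in (7.29)

end for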
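The loop in Algorithm 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the parameter $\vartheta$ is held fixed and folded into `grad_V`, and the inner minimisation in (7.29) is delegated to a hypothetical user-supplied `bregman_step(y, g, tau)`. Two closed-form instances are included: the Euclidean case $J = \tfrac12\|\cdot\|^2$ with $R = \lambda\|\cdot\|_1$, where the step is soft-thresholding, and the entropy case $J(x) = \sum_i x_i\log x_i$ with $R = 0$ (positivity is enforced by the domain of $J$), where the step is multiplicative. The quadratic test objectives are also assumptions.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Step = Callable[[Vector, Vector, float], Vector]


def bregman_proximal_method(
    grad_V: Callable[[Vector], Vector],
    bregman_step: Step,
    x0: Sequence[float],
    taus: Sequence[float],
) -> Vector:
    """Iterate x^{k+1} = A_k(x^k) with time steps tau_k (theta fixed, no inertial step).

    bregman_step(y, g, tau) is assumed to return the minimiser over x of
    (1/tau) * D_J(x, y) + R(x) + <g, x - y>, where g = grad V(y).
    """
    x = list(x0)
    for tau in taus:
        x = bregman_step(x, grad_V(x), tau)
    return x


def euclidean_l1_step(lam: float) -> Step:
    """J = 0.5*||x||^2 and R = lam*||x||_1: soft-threshold the gradient step y - tau*g."""
    def step(y: Vector, g: Vector, tau: float) -> Vector:
        out = []
        for yi, gi in zip(y, g):
            z = yi - tau * gi
            out.append(math.copysign(max(abs(z) - tau * lam, 0.0), z))
        return out
    return step


def entropy_step(y: Vector, g: Vector, tau: float) -> Vector:
    """J = sum_i x_i log x_i and R = 0 on the positive orthant: multiplicative update."""
    return [yi * math.exp(-tau * gi) for yi, gi in zip(y, g)]


def quadratic_grad(c: Vector) -> Callable[[Vector], Vector]:
    """Gradient of the hypothetical objective V(x) = 0.5*||x - c||^2 (L-smooth with L = 1)."""
    def grad(x: Vector) -> Vector:
        return [xi - ci for xi, ci in zip(x, c)]
    return grad


x_l1 = bregman_proximal_method(quadratic_grad([2.0, 0.2]), euclidean_l1_step(0.5), [0.0, 0.0], [1.0] * 50)
print(x_l1)  # [1.5, 0.0], the soft-thresholding of c = (2.0, 0.2) at level 0.5

x_ent = bregman_proximal_method(quadratic_grad([0.5, -1.0]), entropy_step, [0.9, 0.9], [1.0] * 200)
print(x_ent)  # first entry tends to 0.5, second to 0 while staying positive
```

Passing the inner step as a function keeps the outer loop identical for every choice of $J$ and $R$; only the closed form (or an inner solver) changes.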

As before, we assume that $\tau_k\to\tau$. Compared with Algorithm 5, this algorithm is more restrictive, as there is no inertial step, i.e. $a_k = 0$, and $\tau_k\le 1/L$. Regarding the restriction on $a_k$, as pointed out in [212], FISTA does not seem to extend directly to the Bregman distance setting, and while other acceleration variants have been proposed [213], we do not consider them here. Regarding the restriction on $\tau_k$, time steps up to $2/L - \varepsilon$ are possible depending on the Bregman distance generating function $J$; see [212, Definition 4.1] and the surrounding discussion.

$^3$ We only consider smooth $J$, since otherwise the Bregman proximal map would depend on a choice of subgradient $p\in\partial J(y)$, further complicating the algorithmic differentiation.

Suppose the objective function $V + R$ satisfies Assumption 7.23. Arguing as in Section 7.5.1 and noting the $L$-smoothness of $V$, $f_k^J$ satisfies Assumption 7.23 with $(y,\vartheta)$ treated as the parameters. Denote by $I^J_{\tau_k V,\tau_k R}(y,\vartheta)$ the index set (7.13) corresponding to (7.29). The following result is the analogue of Lemma 7.31 and Proposition 7.32 for the Bregman proximal method.

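Lemma 7.35. Suppose $F = V + R:\mathbb{R}^n\times\Omega\to\mathbb{R}$ is given by (7.18) and satisfies Assumption 7.23. Then the Bregman proximal mapping $\mathcal{A}_k(y,\vartheta)$ in (7.29) is piecewise smooth in both arguments, with differential $D\mathcal{A}_k(y,\vartheta) = [\nabla_y\mathcal{A}_k(y,\vartheta),\, D_\vartheta\mathcal{A}_k(y,\vartheta)]$ having the minimal local representation

$$
\left\{\Big[\big(\nabla^2_{M_i}J(x) + \tau_k\nabla^2_{M_i}R(x,\vartheta)\big)^\dagger\big(\nabla^2 J(y) - \tau_k\nabla^2_x V(y,\vartheta)\big),\;
-\tau_k\big(\nabla^2_{M_i}J(x) + \tau_k\nabla^2_{M_i}R(x,\vartheta)\big)^\dagger\big(D_\vartheta\nabla_{M_i}R(x,\vartheta) + D_\vartheta\nabla_x V(y,\vartheta)\big)\Big]\right\}_{i\in I^J_{\tau_k V,\tau_k R}(y,\vartheta)}, \tag{7.30}
$$

where $x = \mathcal{A}_k(y,\vartheta)$.

Furthermore, if (ND) holds for $F(\cdot,\vartheta)$ at $x^*$, then $\mathcal{A}_k$ is locally continuously differentiable near $(x^*,\vartheta)$.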

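Proof. Piecewise smoothness follows from Theorem 7.29 applied to $f_k^J(x,y,\vartheta)$.

For the second part, it is sufficient to show that $0\in\operatorname{ri}\partial_x f_k^J(x^*,x^*,\vartheta)$ and apply [130, Theorem 5.7]. Since $\nabla_x D_J(x,y) = \nabla J(x) - \nabla J(y)$, we have

$$
\partial_x f_k^J(x^*,x^*,\vartheta) = \frac{1}{\tau_k}\big(\nabla J(x^*) - \nabla J(x^*)\big) + \partial_x R(x^*,\vartheta) + \nabla_x V(x^*,\vartheta) = \partial_x F(x^*,\vartheta),
$$

and by (ND), $0\in\operatorname{ri}\partial_x F(x^*,\vartheta)$. This completes the proof.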

Theorem 7.36. Let the function $F\equiv V + R:\mathbb{R}^n\times\Omega\to\mathbb{R}$ be given by (7.18) and suppose it satisfies Assumption 7.23. Furthermore, suppose for $\vartheta\in\Omega$ that the iterates $x^k(\vartheta)$ generated by Algorithm 6 converge to a minimiser $x^*\in\operatorname{int}\operatorname{dom}J$ of $F(\cdot,\vartheta)$, and that (ND) holds for $F(\cdot,\vartheta)$ at $x^*$. Then the sequence of (semi)derivatives $Dx^k(\vartheta)$ converges linearly to the single-valued limit $Dx(\vartheta)$.

7.5 Algorithmic differentiation 163

Proof. We argue along the same lines as in the proof of Theorem 7.33. Let $M\subset\mathbb{R}^n$ be a smooth manifold such that $F$ is partly smooth at $(x^*,\vartheta)$ relative to $M\times\mathbb{R}^n$. By Lemma 7.35, there is $K\in\mathbb{N}$ such that for all $k\ge K$, $\mathcal{A}_k$ is continuously differentiable near $(x^k,\vartheta)$.

Applying (7.20) to (7.29), we have

$$
Dx^{k+1}(\vartheta) = A_k\,Dx^k(\vartheta) + b_k, \tag{7.31}
$$

where

$$
A_k := \nabla_y\mathcal{A}_k(x^k,\vartheta), \qquad b_k := D_\vartheta\mathcal{A}_k(x^k,\vartheta).
$$

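Write $f^J := \lim_{k\to\infty} f_k^J$. By Lemma 7.35, there is $K\in\mathbb{N}$ such that for all $k\ge K$, the iteration maps $\mathcal{A}_k$ are locally continuously differentiable near $(x^k,\vartheta)$, and we have

$$
\begin{aligned}
A_k &\to \big(\nabla^2_M J(x^*) + \tau\nabla^2_M R(x^*,\vartheta)\big)^\dagger\big(\nabla^2 J(x^*) - \tau\nabla^2_x V(x^*,\vartheta)\big) =: A\in\mathbb{R}^{n,n},\\
b_k &\to -\tau\big(\nabla^2_M J(x^*) + \tau\nabla^2_M R(x^*,\vartheta)\big)^\dagger\big(D_\vartheta\nabla_M R(x^*,\vartheta) + D_\vartheta\nabla_x V(x^*,\vartheta)\big) =: b\in\mathbb{R}^{n,m}.
\end{aligned}
$$

Write for shorthand

$$
M_J := \nabla^2_M J(x^*), \qquad M_R := \nabla^2_M R(x^*,\vartheta), \qquad M_V := \nabla^2_x V(x^*,\vartheta),
$$

so that $A = (M_J + \tau M_R)^\dagger(M_J - \tau M_V)$.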

We need to show that $\rho(A) < 1$. Suppose $Ax = \lambda x$ for some $x\in\mathbb{C}^n$, $\lambda\in\mathbb{C}\setminus\{0\}$. Note that any such eigenvector $x$ of $A$ must lie in the subspace $T_{x^*}M$, so the spectrum of $A$ on $\mathbb{R}^n$ coincides with its spectrum restricted to $T_{x^*}M$. Furthermore, restricted to this subspace, $A$ satisfies the conditions of Proposition 2.8, so $\lambda\in\mathbb{R}$. Since $x\in T_{x^*}M$, multiplying $\lambda x = Ax$ by $M_J + \tau M_R$ and rearranging gives

$$
(1 - \lambda)M_J x = \tau(\lambda M_R + M_V)x.
$$

Taking the inner product of each side with $x$, we get

$$
(1 - \lambda)\langle x, M_J x\rangle = \tau\lambda\langle x, M_R x\rangle + \tau\langle x, M_V x\rangle. \tag{7.32}
$$

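By strong convexity of $F$ and $J$, there are $\mu,\nu\ge 0$ with $\mu + \nu > 0$ such that $\langle x, M_J x\rangle\ge\|x\|^2$, $\tau\langle x, M_R x\rangle\ge\tau\nu\|x\|^2$, and $\tau\langle x, M_V x\rangle\in[\varepsilon\mu\|x\|^2, \|x\|^2]$. One can then verify that for (7.32) to hold, we must have

$$
\lambda = \frac{\langle x, M_J x\rangle - \tau\langle x, M_V x\rangle}{\langle x, M_J x\rangle + \tau\langle x, M_R x\rangle}\in[0, 1),
$$

since the numerator is nonnegative and the denominator exceeds it by $\tau\langle x, (M_R + M_V)x\rangle\ge\varepsilon(\mu + \nu)\|x\|^2 > 0$. Hence $\rho(A) < 1$.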

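Therefore, by Lemma 7.35, $Dx^k(\vartheta)$ converges linearly to $(I - A)^{-1}b$. It remains to show that $(I - A)\,Dx(\vartheta) = b$. Write $D_V = D_\vartheta\nabla_M V(x(\vartheta),\vartheta)$ and $D_R = D_\vartheta\nabla_M R(x(\vartheta),\vartheta)$, so that the implicit derivative is $Dx(\vartheta) = -(M_V + M_R)^\dagger(D_V + D_R)$. Using $I - (M_J + \tau M_R)^\dagger(M_J - \tau M_V) = \tau(M_J + \tau M_R)^\dagger(M_R + M_V)$ on $T_{x^*}M$, we have

$$
(I - A)\,Dx(\vartheta) = -\big[I - (M_J + \tau M_R)^\dagger(M_J - \tau M_V)\big](M_V + M_R)^\dagger(D_V + D_R) = -\tau(M_J + \tau M_R)^\dagger(D_V + D_R) = b.
$$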

This concludes the proof.
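The limiting recursion in the proof can be checked numerically on a small instance. The sketch below assumes $M = \mathbb{R}^2$ (so every pseudo-inverse is an ordinary inverse), $m = 1$, $\tau = 0.5$, and hypothetical matrices $M_J$, $M_R$, $M_V$ and right-hand side $D_V + D_R$, chosen so that $\langle x, M_J x\rangle\ge\|x\|^2$, $M_R\succeq 0$ and $\tau\le 1/L$ with $L$ the largest eigenvalue of $M_V$. It forms $A$ and $b$, computes $\rho(A)$ from the $2\times 2$ characteristic polynomial, runs $D^{k+1} = A D^k + b$, and compares the limit with the implicit derivative $-(M_V + M_R)^{-1}(D_V + D_R)$.

```python
import cmath
from typing import List

Mat = List[List[float]]
Vec = List[float]


def add(P: Mat, Q: Mat, s: float = 1.0) -> Mat:
    """Return P + s*Q for 2x2 matrices."""
    return [[P[i][j] + s * Q[i][j] for j in range(2)] for i in range(2)]


def matmul(P: Mat, Q: Mat) -> Mat:
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def matvec(P: Mat, v: Vec) -> Vec:
    return [P[i][0] * v[0] + P[i][1] * v[1] for i in range(2)]


def inv2(P: Mat) -> Mat:
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det], [-P[1][0] / det, P[0][0] / det]]


def spectral_radius2(P: Mat) -> float:
    """Largest eigenvalue modulus of a 2x2 matrix via its characteristic polynomial."""
    tr = P[0][0] + P[1][1]
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    s = cmath.sqrt(tr * tr - 4 * det)
    return max(abs((tr + s) / 2), abs((tr - s) / 2))


tau = 0.5
MJ = [[2.0, 0.0], [0.0, 1.0]]   # <x, MJ x> >= ||x||^2
MR = [[1.0, 0.5], [0.5, 1.0]]   # positive semidefinite
MV = [[1.5, 0.5], [0.5, 1.0]]   # largest eigenvalue L ~ 1.81, so tau <= 1/L
DV_plus_DR = [1.0, 2.0]

B_inv = inv2(add(MJ, MR, tau))                  # (MJ + tau*MR)^{-1}
A = matmul(B_inv, add(MJ, MV, -tau))            # A = (MJ + tau*MR)^{-1} (MJ - tau*MV)
b = [-tau * v for v in matvec(B_inv, DV_plus_DR)]

rho = spectral_radius2(A)
D = [0.0, 0.0]
for _ in range(200):                            # D^{k+1} = A D^k + b
    D = [ad + bi for ad, bi in zip(matvec(A, D), b)]

implicit = [-v for v in matvec(inv2(add(MV, MR)), DV_plus_DR)]
print(rho, D, implicit)  # rho < 1; D and implicit both close to [0.0, -1.0]
```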

First, as mentioned earlier, we do not consider nonsmooth Bregman distance generating functions $J:\mathbb{R}^n\to\mathbb{R}$, as this would involve differentiation with respect to an additional variable, namely the subgradients $p^k\in\partial J(x^k)$. We therefore leave this for future research.

Second, in Theorem 7.36, we assume that $x^*\in\operatorname{int}\operatorname{dom}J$. This ensures that $A_k$ converges to a unique limit. However, this assumption does not hold in general, including for some popular Bregman distances such as the Kullback–Leibler divergence $D_J(x,y) = x(\log x - \log y) - (x - y)$ generated by the entropy function $J(x) = x\log x$ (in one dimension). Furthermore, as was demonstrated in [166, 167], one can construct iterative methods that solve nonsmooth variational problems yet whose iteration map $\mathcal{A}(x,\vartheta)$ is continuously differentiable, provided the nonsmoothness can be expressed as convex constraints that coincide with $\operatorname{cl}\operatorname{dom}J$. In these settings, one expects $x^*\notin\operatorname{dom}J$.
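As a quick check of the formula above, the generic Bregman distance $D_J(x,y) = J(x) - J(y) - J'(y)(x - y)$ (the standard definition, assumed here) reproduces the one-dimensional Kullback–Leibler expression for $J(x) = x\log x$. The sketch also records where the boundary issue enters: $J'(y) = \log y + 1$ is undefined at $y = 0$.

```python
import math
from typing import Callable


def bregman_distance(J: Callable[[float], float], dJ: Callable[[float], float], x: float, y: float) -> float:
    """D_J(x, y) = J(x) - J(y) - J'(y) * (x - y) for a differentiable convex J."""
    return J(x) - J(y) - dJ(y) * (x - y)


def entropy(x: float) -> float:
    return x * math.log(x)


def entropy_derivative(x: float) -> float:
    return math.log(x) + 1.0  # undefined at x = 0, the boundary of dom J


def kl_divergence(x: float, y: float) -> float:
    """One-dimensional Kullback-Leibler divergence x(log x - log y) - (x - y)."""
    return x * (math.log(x) - math.log(y)) - (x - y)


for x, y in [(0.3, 0.7), (2.0, 0.5), (1.0, 1.0)]:
    print(x, y, bregman_distance(entropy, entropy_derivative, x, y), kl_divergence(x, y))
```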

While we do not prove convergence results for the case $x^*\notin\operatorname{int}\operatorname{dom}J$, we show for a simple example with the Kullback–Leibler divergence that the derivatives $Dx^k$ of the algorithmic iterates do converge to the implicit derivative $Dx$ even when $x^* = 0\notin\operatorname{dom}J$.

Example 7.37. Consider the simple example

$$
x(\vartheta) = \operatorname*{arg\,min}_{x\in\mathbb{R}^n} V(x,\vartheta) + \delta_{\ge 0}(x),
$$

and $J(x) = \sum_{i=1}^n x_i\log x_i$. The Bregman distance is the Kullback–Leibler divergence given by

$$
D_J(x,y) = \sum_{i=1}^n \big[x_i(\log x_i - \log y_i) - (x_i - y_i)\big].
$$

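We assume that $x^0\in\mathbb{R}^n$ is such that $\{x : V(x,\vartheta)\le V(x^0,\vartheta)\}\subset[0,1]^n$, as this ensures that $J$ is 1-strongly convex on this set, since $\nabla^2 J(x) = \operatorname{diag}(1/x_1,\dots,1/x_n)\succeq I$ for $x\in(0,1]^n$.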


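For $\tau_k\in[\varepsilon, 1/L]$, setting the gradient of $f_k^J(\cdot, x^k, \vartheta)$ to zero gives $\tfrac{1}{\tau_k}(\log x - \log x^k) + \nabla_x V(x^k,\vartheta) = 0$ (the constraint $x\ge 0$ is inactive, since the minimiser is strictly positive), so the iterates of Algorithm 6 are given by the multiplicative updates

$$
x^{k+1}(\vartheta) = x^k(\vartheta)\odot\exp\bigl(-\tau_k\nabla_x V(x^k(\vartheta),\vartheta)\bigr)\to x(\vartheta) =: x^*,
$$

where $\exp$ is applied element-wise and $\odot$ denotes the element-wise product. We differentiate this with respect to $\vartheta$ and obtain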

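$$
[Dx^{k+1}(\vartheta)]_i = \exp\bigl(-\tau_k[\nabla_x V(x^k(\vartheta),\vartheta)]_i\bigr)[Dx^k(\vartheta)]_i + x^k_i\,D_\vartheta\Bigl[\exp\bigl(-\tau_k[\nabla_x V(x^k(\vartheta),\vartheta)]_i\bigr)\Bigr],
$$

where $[\cdot]_i$ denotes the $i$th entry of a vector and the $i$th row of $Dx^k(\vartheta)$. For each $i$, if $x^k_i\to 0$, then

$$
[Dx^{k+1}(\vartheta)]_i = \exp\bigl(-\tau_k[\nabla_x V(x^k(\vartheta),\vartheta)]_i\bigr)[Dx^k(\vartheta)]_i + O(x^k_i).
$$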

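In this case, condition (ND) holds if and only if $[\nabla_x V(x^*,\vartheta)]_i > 0$ for each $i$ such that $x^*_i = 0$. Then the factor $\exp(-\tau_k[\nabla_x V(x^k(\vartheta),\vartheta)]_i)$ tends to $\exp(-\tau[\nabla_x V(x^*,\vartheta)]_i) < 1$, and we see that $[Dx^k(\vartheta)]_i\to 0$ linearly, matching $[Dx(\vartheta)]_i = 0$ for these indices. In conclusion, we have $Dx^k(\vartheta)\to Dx(\vartheta)$.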
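The behaviour in Example 7.37 can be reproduced with forward-mode differentiation of the multiplicative update. The sketch is illustrative only and rests on assumptions not in the example: $n = 2$, a scalar parameter, constant steps $\tau_k\equiv 1$, and the hypothetical objective $V(x,\vartheta) = \tfrac12\|x - c(\vartheta)\|^2$ with $c(\vartheta) = (\vartheta, \vartheta - 2)$. For $\vartheta\in(0,1]$ this gives $L = 1$, $x^*(\vartheta) = (\vartheta, 0)$, $[\nabla_x V(x^*,\vartheta)]_2 = 2 - \vartheta > 0$ (so (ND) holds) and implicit derivative $Dx(\vartheta) = (1, 0)$; starting from $x^0 = (0.9, 0.9)$, the iterates remain in $(0,1]^2$.

```python
import math
from typing import List, Tuple


def kl_prox_with_derivative(theta: float, tau: float = 1.0, iters: int = 200) -> Tuple[List[float], List[float]]:
    """Run x^{k+1} = x^k * exp(-tau * grad V(x^k, theta)) and propagate d = dx^k/dtheta.

    Hypothetical objective: V(x, theta) = 0.5 * ||x - c(theta)||^2, c(theta) = (theta, theta - 2).
    """
    c = [theta, theta - 2.0]
    dc = [1.0, 1.0]          # dc_i / dtheta
    x = [0.9, 0.9]           # x^0 does not depend on theta ...
    d = [0.0, 0.0]           # ... so Dx^0 = 0
    for _ in range(iters):
        new_x, new_d = [], []
        for i in range(2):
            g = x[i] - c[i]                   # [grad_x V(x^k, theta)]_i
            dg = d[i] - dc[i]                 # total derivative of g w.r.t. theta
            e = math.exp(-tau * g)
            new_x.append(x[i] * e)
            # product rule: e * [Dx^k]_i + x_i^k * D_theta e, with D_theta e = -tau * e * dg
            new_d.append(e * d[i] - tau * x[i] * e * dg)
        x, d = new_x, new_d
    return x, d


x, d = kl_prox_with_derivative(0.5)
print(x, d)  # x approaches (0.5, 0); d approaches the implicit derivative (1, 0)
```

The second component of `d` is nonzero during the transient and decays to zero together with `x[1]`, which is the mechanism described in the example.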
