Capacidad para una buena articulación intersectorial

PERSONAS DETENIDAS REQUISITORIDAS TRASLADADAS POR LA DIVREQ PNP PERIODO 2010

OMISIÓN DE LA DIVISIÓN DE REQUISITORIAS OMISIÓN POR PARTE DEL JUZGADO

2. Consecuencias de las detenciones arbitrarias por mandato judicial en la División de Requisitorias de la PNP

2.2 Capacidad para una buena articulación intersectorial

Before formally analyzing the dynamic regret and fit for BanSaP, we assume that the following conditions are satisfied.

(as1) For everyt, the functions ft(x) and gt(x) are convex.

(as2) Functionft(x) is bounded over the setX , meaning |ft(x)| ≤ F, ∀x ∈ X ; while ft(x) and

148 (as3) For a small constantγ, there exists a constant η > 0, and an interior point ˜x∈ (1 − γ)X such thatgt(˜x)≤ −η1, ∀t.

(as4) With B:={x ∈ Rd_:_{kxk ≤ 1} denoting the unit ball, there exist constants 0 < r ≤ R such}

thatrB_{⊆ X ⊆ RB.}

Assumptions (as1)-(as2) are typical in OCO with both full- and partial-information feedback [65, 106, 52]; (as3) is Slater’s condition modified for our BCO setting, which guarantees the existence of a bounded Lagrange multiplier [13] in the constrained optimization; and, (as4) requires the action set to be bounded within a ball that contains the origin. When (as4) appears to be restrictive, it is tantamount to assuming_{X is compact and has a nonempty interior, because one} can always apply an affine transformation (a.k.a. reshaping) onX to satisfy (as4); see [52, Section 3.2].

Under these assumptions, we are on track to first provide upper bounds for the dynamic regret, and the dynamic fit of the BanSaP solver with one-point feedback.

Theorem 11 (one-point feedback). Suppose that (as1)-(as4) are satisfied, and consider the parametersα, µ, δ, γ defined in (6.11)-(6.13), and constants F , G, r, R defined in (as2)-(as4). If the dual variable is initialized byλ1= 0, then the BanSaP with one-point feedback in (6.7)-(6.8)

has dynamic regret bounded by

Regd_T _≤ R αV (x ∗ 1:T) + R2 2α+ d2_G2_R2_αT δ2 +2GδT + γGRT 1 +k¯λk +2µG 2_R2_T _(6.22)

where_{k¯λk := max}tkλtk, and the accumulated variation of the per-slot minimizers x∗t in(6.20)

is given by V (x∗_1:T) := T X t=1 kx∗t − x∗t−1k. (6.23)

In addition, the dynamic fit defined in(6.21) is bounded by

Fitd_T_≤k¯λk µ + G2√_{N T} 2β +δG √ N T +β√N Tα 2_d2_F2 δ2 +α 2_G2 k¯λk2 (6.24)

whereβ > 0 is a pre-selected constant. Furthermore, if we choose the stepsizes as α = µ = O(T−34), and the parameters δ = O(T−

4), β = T 1

4 andγ = δ/r, then the online decisions

generated by BanSaP are feasible, i.e.,x1,t ∈ X ; and also yield the following dynamic regret and

fit

Regd_T=OV (x∗_1:T)T34

For BanSaP with one-point feedback, Theorem 11 asserts that its dynamic regret and fit are upper-bounded by some constants depending on the those parameters, the time horizon, and the accumulated variation of per-slot minimizers. Interestingly, the crucial constantδ controlling the perturbation of random actions appears in both the denominator and numerator of (6.22) and (6.24), which correspond to the variance and bias of the gradient estimator. Therefore, simply setting a smallδ will not only reduce the bias, but it will also boost the variance - a clear manifestation of the that is known as bias-variance tradeoff in BCO [143]. Optimally choosing parameters implies that the dynamic fit is sub-linearly growing, and the dynamic regret is sub-linear given that the variation of the per-slot minimizer is slow enough; i.e.,V (x∗

1:T) = o(T

1 4).

Regarding BanSaP with two-point feedback, we can prove the following result that parallels Theorem 11.

Theorem 12 (two-point feedback). Consider the assumptions and the definitions of constants in Theorem 11. If the dual variable is initialized byλ1 = 0, then BanSaP with two-point feedback

has dynamic regret bounded by

Regd_T ≤ R αV (x ∗ 1:T) + R2 2α +2µG 2_R2_{T +αd}2_G2_{T +γGRT (1 +}_{k¯λk)+2δGT} _(6.26)

and has dynamic fit in(6.21) bounded by

Fitd_T_≤k¯λk µ + G2√_{N T} 2β +δG √ N T +β√N T α2d2G2+α2G2_k¯λk2. (6.27)

In this case, if we choose the stepsizes asα = µ =_O(T−12), and set the parameters as β = T 1 2,

δ =O(T−1_{), and γ = δ/r, then the online decisions generated by BanSaP are feasible, and its}

dynamic regret and fit are bounded by

Regd_T=_OV (x∗_1:T)T12

and Fitd_T =_{O T}12 (6.28)

whereV (x∗_1:T) is the accumulated variation of the per-slot minimizers x∗_t in(6.23).

Comparing with the bounds in (6.22) and (6.24), the perturbation constantδ only appears in the numerator of (6.26) and (6.27) because our gradient estimator here replies on two points. In this case, the additional function evaluation allows BanSaP to choose an arbitrarily smallδ to minimize the bias of stochastic gradient, without increasing its variance. This observation is aligned with those in BCO without long-term constraints [3, 143]. Furthermore, Theorem 12

150 establishes that the dynamic regret and fit are sub-linear ifV (x∗_1:T) = o(T12), which markedly

improves those in Theorem 11 under one-point feedback.

For the case of BanSaP withM > 2 points, slightly improved bounds can be proved without changing the order of regret and fit, but they are omitted here for brevity. In addition, the bounds in Theorems 11 and 12 can be achieved without any knowledge ofV (x∗

1:T). When the order of

V (x∗_1:T) is known, or, can be estimated a-priori, tighter regret and fit bounds can be obtained by adjusting stepsizes accordingly. Formally, we can arrive at the following corollary.

Corollary 2. Under the conditions of Theorems 11 and 12, suppose that there exists a constant ρ _{∈ [0, 1) such that the variation satisfies V (x}∗

1:T) = o(Tρ). If the stepsizes of BanSaP with

one-point feedback are chosen asα = µ =O T34(ρ−1), and the parameters are δ = O(T 1 4(ρ−1)),

β = T34(1−ρ), andγ = δ/r, then the dynamic regret and fit in (6.25) become

Regd_T=_OT14(ρ+3)

and Fitd_T =_OT14(ρ+3)

. (6.29)

Likewise, if the stepsizes of BanSaP with two-point feedback are chosen such that α = µ = O T12(ρ−1), and the parameters are δ = O(T

2(ρ−1)), β = T 1

2(1−ρ), andγ = δ/r, then the

dynamic regret and fit in(6.25) become Regd T=O T12(ρ+1) and Fitd T =O T12(ρ+1) . (6.30)

Apparently, Corollary 2 implies that sub-linear dynamic regret and fit are both possible, provided that the accumulated variation of the minimizers is growing sub-linearly (ρ < 1), and it is available to the learner in advance. It provides valuable insights for choosing optimal stepsizes in dynamic environments. Specifically, adjusting stepsizes to match the variability of the environment is the key to achieving the optimal dynamic regret and fit. Intuitively, when the variation is fast (largeρ), slowly decaying stepsizes (thus larger stepsizes) can better track the potential changes; and vice versa.

Remark10 (Optimal regret). As a special case of Theorems 11 and 12, by confining x∗₁=_{· · · = x}∗_T so thatV (x∗

1:T) = 0, the dynamic regret bounds (6.25) and (6.28) reduce to the static ones, which

correspond to O(T34) in the one-point feedback case, and to O(√T ) in the two-point case.

This pair of bounds markedly improves the regret versus fit tradeoff in [106], and matches the order of regret in [52], and [3, 50], which are the best possible ones that can be achieved by efficientalgorithms even in the BCO setup without the long-term constraints. Considering the full-information setting in [27] as a special case, the regret and fit of BanSaP outperform those in

Algorithm 9 BanSaP for edge computing 1: Initialize: primal iterates{ˆynk

1 } and {ˆz1n}, dual iterate λ1, parametersδ and γ, and proper

stepsizesα and µ. 2: fort = 1, 2 . . . do 3: form = 1, . . . , M do

4: Fog nodes perform perturbed offloading decisions to cloud{zn

m,t}, to neighbor edges

{ynk

m,t}, and locally process {ynnm,t} based on ˆxt.

5: end for

6: Fog nodes observe the (possibly multiple) losses to update (6.31) with stochastic gradients obtained via (6.32).

7: Fog nodes observe the actual demands from devices to update the dual variables (6.33). 8: end for

[27].

Remark11 (Dynamic regret). Theorems 11, 12 and Corollary 2 extend the dynamic regret analysis in [62, 71, 25] to the regime of bandit online learning with long-term time-varying constraints. Interestingly though, in the BCO setting of our interest, sub-linear dynamic regret and fit are possible to achieve when the per-slot minimizer does not vary on average, that is,V (x∗

1:T) is

sub-linearly growing withT .

6.5 Numerical tests

In this section, we demonstrate how the edge computing task can benefit from BanSaP.

In document Las detenciones arbitrarias por mandato judicial en la sede de la División de Requisitorias de la Policía Nacional del Perú, durante el período 2010 al 2014 : análisis crítico desde las políticas públicas (página 70-90)