Apoyo al aprendizaje y al desarrollo infantil

Defining the QFE parameter estimation error E

τ,γ( ˜Θk) = Eτ,γ(Θk − ˆΘk), the error

dynamics using (22), (24) can be represented as E τ,γ( ˜Θ 0 kl+1)= E_τ,γ( ˜Θ j k + W_kjZ_kj,eΞ_Bj,eT(k) 1 + Zej,T k W j kZ j,e k ), k = k_l0 (27) E τ,γ( ˜Θ j+1 kl )= E_τ,γ( ˜Θ j k + W_kjZ_kj,eΞ_Bj,eT(k) 1 + Zej,T k W j kZ j,e k ), k0 l < k < kl0+1 (28)

Remark 8 When there is no data-loss, the Q-function estimator is updated and the control policy is updated as soon as it is computed. This requires the broadcast scheme to generate an acknowledgment signal whenever the packets are successfully received at the subsystems [22]. A suitable scheduling protocol has to ensure that the data lost in the network is kept minimal.

Next an event-sampling condition has to be selected for the proposed scheme to work.

Consider a quadratic function fi_(k) _{= ¯x}

i(k)TΓi¯xi(k), with Γi > 0, for the it h subsystem.

The event-sampling condition should satisfy

fi(k) ≤ λ fi(kl+ 1), ∀k ∈ [kl+ 1, kl+1), (29)

for stability, when λ < 1, as shown in the next section.

Remark 9 The event-sampling condition presented here depends only on the local sub-

system state information. The Lyapunov function based event-sampling condition is also presented in [23] for a single system. The hybrid learning algorithm presented in this paper is independent of the event-sampling condition.

The following result will be used to prove the stability of the closed-loop system during the learning period.

Lemma 2 Consider the system in (5) and the QFE (27). Define ˜U(kl−1)= U(kl−1)− ˆU(kl−1)

and ˜Gux_k

l−1 = G

ux k_l−1− ˆG

k_l−1. If the control policy is updated such that, whenever ˆG uu

k_l−1− R¯x is

not positive definite or ˆGuu_k

l−1 is singular, ˆG

kl−1is replaced by R¯x in the control policy, then

E τ,γ( ˜U(kl−1)) ≤ Eτ,γ{2 R −1 ¯x G ux k_l−1 ¯X_k_l−1 + R−_¯x1 ˜G ux k_l−1 ¯X_k_l−1 } (30)

Proof: See Appendix.

Definition 1 [29] A regression vectorϕ(xk) is said to be persistently exciting if there exists

positive constantsδ,α₋,¯α and kd ≥ 1 such that α₋I ≤ Í_kk+δ_=k_dϕ(xk)ϕT(xk) ≤ ¯αI, where I is

Lemma 3 Consider both the QFE in (27) with an initial admissible control policy U₀∈ <m. Let the Assumption 1-4 hold, and the QFE parameter vector ˆΘ(0) be initialized in a compact

set ΩΘ. When the QFE is updated at the event-sampling instants using (22),(23) and during

the inter-sampling period using (24),(25), the QFE parameter estimation error E

τ,γ( ˜Θ j kl) is bounded. Under the assumption that the regression vectorξ_kj

l satisfies the PE condition, the QFE parameter estimation error ˜Θ_kj

l for all ˆΘ(0) ∈ ΩΘ converges to zero asymptotically in the mean square, with event-sampled instants kl → ∞.

Proof: See Appendix.

Remark 10 Covariance resetting technique [29] is used to reset W whenever W ≤ Wmin.

This condition will also be used in the Lyapunov analysis to ensure stability of the closed- loop system. With the covariance resetting, the parameter convergence proof in Lemma 3 will still be valid [29].

Next, the Lyapunov analysis is used to derive the conditions for the stability of the closed-loop system, with the controller designed in this section.

Theorem 1 Consider the closed-loop system (5), parameter estimation error dynamics (37)

along with the control input (20). Let the Assumptions 1-4 hold, and let U(0) ∈ Ωu be an

initial admissible control policy. Suppose the last held state vector, ¯X_ke, j

l , and the QFE parameter vector, ˆΘ_kj

l are updated by using, (22),(23) at the event-sampled instants, and (24),(25) during the inter-sampling period. Then, there exists a constantγ_min > 0 such that the closed-loop system state vector ¯X_kj

l for all ¯X(0) ∈ Ωx converges to zero asymptotically in the mean square and the QFE parameter estimation error ˜Θ_kj

l for all ˆΘ(0) ∈ ΩΘremains bounded. Further, under the assumption that the regression vector ξ_kj

l satisfies the PE condition, the QFE parameter estimation error ˜Θ_kj

l for all ˆΘ(0) ∈ ΩΘ converges to zero asymptotically in the mean square, with event-sampled instants kl → ∞, provided the

inequalityγ_min > µ + ρ₁is satisfied. γmin, µ, ρ1are positive constants, defined in the proof.

The evolution of the Lyapunov function is depicted in Fig. 2.2a. During the event- sampled instant, due to the updated control policy (21), the Lyapunov function decreases. Due to the event-sampling condition (39) and the iterative learning within the event-sampling instants, the Lyapunov function decreases during the inter-sampling period.

Since the iterative learning does not take place in the time-driven Q-learning [6], the first difference of the parameter estimation error is zero for the inter-event period. This makes the Lyapunov function negative semi-definite during this period. The evolution of the Lyapunov function is depicted in Fig. 2.2b for the time-driven Q-learning.

Remark 11 The design constants R_¯x, Wmin, W0are selected based on the inequalities that

are analytically derived in Theorem 1 using the bounds on A_¯xk, B_¯xk, Sk. Then, the constants

Γand ¯Π can be found to ensure closed-loop system stability.

Remark 12 The requirement of PE condition is necessary so that the regression vector

is non-zero until the parameter error goes to zero. By satisfying the PE condition in the regression vector, the expected value of the parameter estimation error ˜Θ_k will converge to zero. This PE signal is viewed as the exploration signal in the reinforcement learning literature [4].

Remark 13 An initial identification process can be used to obtain the nominal values of

A_¯xk, B_¯xk which can be used to initialize the Q-function parameters.

(a) (b)

Remark 14 The algorithm proposed in this section can be used as a time-driven Q-learning scheme by not performing the iterative learning between the event-sampling instants, in stochastic framework. Also, if the iteration index, j → ∞, for each kl, the algorithm

becomes the traditional policy iteration based ADP scheme.

The event-sampling and broadcast algorithm for the subsystems followed by the proposed hybrid learning algorithm is summarized next.

In document INFANCIAS EN LA CIUDAD DE MÉXICO 2020 (página 74-79)