Conclusions - PROCESSING, PRESENTATION AND ANALYSIS OF THE

4.4. Conclusions

To learn the state-space-dependent structure of the inverse dynamics modeling errors, we assume that the error model is identifiable in the space $x = (q, \dot{q}, \ddot{q})$. Furthermore, since every task execution may vary slightly, we follow an incremental learning process: each task execution generates a new training data set that can be used to update and improve our error model. Our error models are therefore indexed by $k$, denoting the $k$-th learning iteration. For $k = 0$, meaning that no error model exists yet for the task at hand, we simply assume $f_{\text{iderr}}^{0}(x; \theta^{0}) = 0$. Given this, the total torque applied to the system is the approximate rigid body dynamics model $\hat{\tau}_{\text{rbd}}(x_d)$ (if available) plus an offline-learned error model $f_{\text{iderr}}^{k}$ and a feedback term $\tau_{\text{fb}}$:

$$\tau_{\text{total}} = \hat{\tau}_{\text{rbd}}(x_d) + f_{\text{iderr}}^{k}(x_d; \theta^{k}) + \tau_{\text{fb}} \tag{6.8}$$
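The following Python function is a minimal sketch of how Equation 6.8 composes the command. It adds the three terms and treats a missing error model as $f_{\text{iderr}}^{0} = 0$. The function name `total_torque`, its callable interfaces, the pendulum parameters and the linear error model in the usage lines are all illustrative assumptions, not part of the original method.

```python
import numpy as np


def total_torque(tau_rbd_hat, f_iderr, theta, x_d, tau_fb):
    """Sketch of Eq. 6.8: tau_total = tau_rbd_hat(x_d) + f_iderr^k(x_d; theta^k) + tau_fb.

    tau_rbd_hat: callable x -> torque (hypothetical approximate RBD model);
                 pass `lambda x: 0.0` when no RBD model is available.
    f_iderr:     callable (x, theta) -> torque, or None at iteration k = 0.
    """
    # At k = 0 no error model exists yet, so f_iderr^0(x; theta^0) = 0.
    tau_err = 0.0 if f_iderr is None else f_iderr(x_d, theta)
    return tau_rbd_hat(x_d) + tau_err + tau_fb


# Hypothetical 1-DoF pendulum, x = (q, qd, qdd); the approximate model omits friction.
m, l, g = 1.0, 0.5, 9.81
rbd_hat = lambda x: m * l**2 * x[2] + m * g * l * np.sin(x[0])
x_d = np.array([0.0, 1.0, 2.0])

tau_k0 = total_torque(rbd_hat, None, None, x_d, tau_fb=0.1)
# Hypothetical error model that is linear in joint velocity, with theta = 0.2.
tau_k1 = total_torque(rbd_hat, lambda x, th: th * x[1], 0.2, x_d, tau_fb=0.1)
```

Passing a zero-returning callable for `tau_rbd_hat` corresponds to the case without a rigid body dynamics model.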

Here we show how the direct and indirect loss functions can be combined into one loss function that uses two different data sources. To do so, we 1) discuss the loss functions in the context of offline error model learning, 2) show that the two loss functions create two different training signals for the error model, and 3) use this result to combine direct and indirect learning.

6.2.1 Indirect Loss Function

We start by discussing the details of learning an error model with an indirect loss. We compute the torque command based on the current state $(q, \dot{q})$ and the desired accelerations $\ddot{q}_d$, apply this torque, and then measure the actual accelerations $\ddot{q}_a$. We now know which torque command achieved these measured accelerations and can use this data point to learn an inverse dynamics model. We collect all such data points for one task execution, for $t = 1, \dots, T$, so that we have $T$ data points to learn the parameters $\theta^{k}$, initialized with the parameters $\theta^{k-1}$.

In the indirect formulation, we optimize the parameters $\theta^{k}$ such that the difference between the applied torque $\tau_{\text{total}}$ and the inverse dynamics model $f_{\text{id}}$ evaluated at $x_a$ is minimized:

$$L_{\text{indirect}}(\theta^{k}) = \sum_{t=1}^{T} \left\| \tau_{\text{total}}^{t} - f_{\text{id}}(x_a^{t}; \theta^{k}) \right\|^2 \tag{6.9}$$

Here we would like to use an approximate rigid body dynamics model (if available) and learn an error model $f_{\text{iderr}}$ in order to optimize the $f_{\text{id}}$ model. Note that our approach does not require a rigid body dynamics model: all derivations also hold when assuming a constant model $\hat{\tau}_{\text{rbd}} := 0$. In this case we would learn the full inverse dynamics model without using any domain-specific knowledge. However, where possible we can and should leverage the existing rigid body dynamics model. To compute the RBD error at input $x_a^{t}$, we evaluate $\hat{\tau}_{\text{rbd}}$ at $x_a^{t}$ and subtract it from the total applied torque $\tau_{\text{total}}$, so that in the $k$-th learning iteration we optimize

$$L_{\text{indirect}}(\theta^{k}) = \sum_{t=1}^{T} \left\| \tau_{\text{total}}^{t} - \hat{\tau}_{\text{rbd}}(x_a^{t}) - f_{\text{iderr}}^{k}(x_a^{t}; \theta^{k}) \right\|^2$$

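Thus, using the indirect learning approach, we optimize $f_{\text{iderr}}^{k}(x; \theta^{k})$ on the following data set:

$$\mathcal{D}_{\text{indirect}}^{k} = \left\{ x^{t} \leftarrow x_a^{t},\; y^{t} \leftarrow \tau_{\text{total}}^{t} - \hat{\tau}_{\text{rbd}}(x_a^{t}) \right\}_{t=1}^{T} \tag{6.10}$$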
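The construction of $\mathcal{D}_{\text{indirect}}^{k}$ can be sketched in Python for a single joint logged over one task execution. Several pieces are hypothetical: the helper `indirect_dataset`, the pendulum parameters, and the viscous friction coefficient `b`, which stands in for the unmodeled dynamics. The logged torques are generated from an assumed true model so that the resulting targets can be checked.

```python
import numpy as np


def indirect_dataset(q, qd, qdd_a, tau_total, tau_rbd_hat):
    """Sketch of Eq. 6.10 for one task execution of length T.

    Inputs are x_a^t = (q^t, qd^t, qdd_a^t); targets are the applied torque minus
    the approximate RBD prediction evaluated at those *measured* inputs.
    """
    X = np.column_stack([q, qd, qdd_a])                  # shape (T, 3) for one joint
    Y = tau_total - np.array([tau_rbd_hat(x) for x in X])
    return X, Y


# Hypothetical pendulum; b is viscous friction missing from the approximate model.
m, l, g, b = 1.0, 0.5, 9.81, 0.3
rbd_hat = lambda x: m * l**2 * x[2] + m * g * l * np.sin(x[0])

T = 50
rng = np.random.default_rng(0)
q, qd, qdd_a = rng.normal(size=(3, T))                   # logged states and measured accelerations
# Torque that produced qdd_a under the assumed true dynamics (RBD plus friction).
tau_total = m * l**2 * qdd_a + m * g * l * np.sin(q) + b * qd

X, Y = indirect_dataset(q, qd, qdd_a, tau_total, rbd_hat)
```

The assumed system differs from the approximate model only by friction. The targets `Y` therefore recover `b * qd` at the measured inputs.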

The quality of this training data set depends on how well we have tracked the task policy or trajectory. With accurate tracking, a single learning run should already yield a good approximation of the modeling errors. If tracking is poor, however, the measured inputs $x_a^{t}$ lie far from the desired inputs $x_d^{t}$ at which the error model is queried in Equation 6.8. In that case we may well require several learning iterations to estimate a good error model.

6.2.2 Direct Loss Function

To overcome the limitations of the indirect learning process, N. Ratliff et al. (2016) propose using a direct loss (Equation 6.7) to learn modeling errors. Here we use this loss in acceleration space (N. Ratliff et al. 2016) to derive an additional data source for inverse dynamics learning. We start with Equation 6.7, drop the weighting of the acceleration error by the inertia matrix $M$, and instead multiply the accelerations by $M$:

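$$\begin{aligned}
L_{\text{direct}}(\theta) &= \sum_{t=1}^{T} \left\| M\ddot{q}_d^{t} - M\ddot{q}_a^{t}(\theta^{k}) \right\|^2 \\
&= \sum_{t=1}^{T} \left\| \left( M\ddot{q}_d^{t} + h \right) - M\ddot{q}_a^{t}(\theta^{k}) - h \right\|^2
\end{aligned} \tag{6.11}$$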

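where we have also added and subtracted $h$. The true dynamics model $(M, h)$ is never evaluated in our loss formulation; it is merely used to derive the direct loss formulation, as shown in the following. We can now identify the first term, $M\ddot{q}_d^{t} + h$, as the true rigid body dynamics model $\tau_{\text{rbd}}(\ddot{q}_d^{t})$ evaluated at the desired accelerations. We then expand $\ddot{q}_a^{t}(\theta^{k})$ using the forward dynamics $\ddot{q}_a^{t} = M^{-1}\left[ f_{\text{id}}(\ddot{q}_d^{t}; \theta^{k}) - h \right]$:

$$\begin{aligned}
L_{\text{direct}}(\theta) &= \sum_{t=1}^{T} \left\| \tau_{\text{rbd}}(\ddot{q}_d^{t}) - M\ddot{q}_a^{t}(\theta^{k}) - h \right\|^2 \\
&= \sum_{t=1}^{T} \left\| \tau_{\text{rbd}}(\ddot{q}_d^{t}) - M M^{-1}\left[ f_{\text{id}}(\ddot{q}_d^{t}; \theta^{k}) - h \right] - h \right\|^2 \\
&= \sum_{t=1}^{T} \left\| \tau_{\text{rbd}}(\ddot{q}_d^{t}) - \left( \hat{\tau}_{\text{rbd}}(\ddot{q}_d^{t}) + f_{\text{iderr}}^{k}(x_d^{t}; \theta^{k}) \right) \right\|^2 \\
&= \sum_{t=1}^{T} \left\| \left( \tau_{\text{rbd}}(\ddot{q}_d^{t}) - \hat{\tau}_{\text{rbd}}(\ddot{q}_d^{t}) \right) - f_{\text{iderr}}^{k}(x_d^{t}; \theta^{k}) \right\|^2
\end{aligned} \tag{6.12}$$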

where $f_{\text{id}}(\ddot{q}_d^{t})$ represents the state-based rigid body dynamics model plus the error model. It excludes the feedback term, which should ideally be zero. We have now transformed the loss on accelerations into a loss on torque commands at the input point $x_d^{t}$. Intuitively, this transformed loss means that we want to minimize the difference between our error model $f_{\text{iderr}}^{k}(x_d^{t}; \theta^{k})$ and the true modeling error $\tau_{\text{error}}^{t} = \tau_{\text{rbd}}(\ddot{q}_d^{t}) - \hat{\tau}_{\text{rbd}}(\ddot{q}_d^{t})$ at input $x_d^{t} = (q^{t}, \dot{q}^{t}, \ddot{q}_d^{t})$. While intuitively pleasing, this loss cannot be used directly, because we do not have access to the true modeling error $\tau_{\text{error}}^{t}$. However, we can obtain an estimate of the modeling error, $\hat{\tau}_{\text{error}}^{t} = \tau_{\text{fb}}^{t} + f_{\text{iderr}}^{k-1}(x_d^{t}; \theta^{k-1})$. This estimate combines the feedback term $\tau_{\text{fb}}^{t}$ with the error model $f_{\text{iderr}}^{k-1}$ from the previous task execution, similar to feedback error learning (Nakanishi et al. 2004). It results in the following loss:

$$L_{\text{direct}}(\theta) = \sum_{t=1}^{T} \left\| \hat{\tau}_{\text{error}}^{t} - f_{\text{iderr}}^{k}(x_d^{t}; \theta^{k}) \right\|^2 \tag{6.13}$$
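The step from Equation 6.11 to the last line of Equation 6.12 can be checked numerically. The Python sketch below draws a random symmetric positive definite $M$ and a vector $h$ for a hypothetical three-joint system. As in the derivation, the true model is used only for the check. The sketch simulates the acceleration produced by commanding $f_{\text{id}} = \hat{\tau}_{\text{rbd}} + f_{\text{iderr}}^{k}$ without feedback, then compares the acceleration-space and torque-space losses for a single time step. The imperfect model `tau_rbd_hat` and all numbers are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 3                                         # hypothetical number of joints
A = rng.normal(size=(n, n))
M = A @ A.T + n * np.eye(n)                   # true inertia matrix, symmetric positive definite
h = rng.normal(size=n)                        # true nonlinear terms (Coriolis, gravity, ...)
qdd_d = rng.normal(size=n)                    # desired acceleration at one time step

tau_rbd = M @ qdd_d + h                       # true RBD torque at x_d
tau_rbd_hat = 0.8 * tau_rbd + 0.1             # an arbitrary imperfect approximate model
f_iderr = rng.normal(size=n)                  # current error-model output f_iderr^k(x_d)

# Forward dynamics when commanding f_id = tau_rbd_hat + f_iderr (no feedback term).
f_id = tau_rbd_hat + f_iderr
qdd_a = np.linalg.solve(M, f_id - h)

loss_acc = np.sum((M @ qdd_d - M @ qdd_a) ** 2)                 # first line of Eq. 6.11
loss_tau = np.sum(((tau_rbd - tau_rbd_hat) - f_iderr) ** 2)     # last line of Eq. 6.12
print(loss_acc, loss_tau)                                       # equal up to round-off
```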

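Similar to indirect learning, we can now construct a data set

$$\mathcal{D}_{\text{direct}}^{k} = \left\{ x^{t} \leftarrow x_d^{t},\; y^{t} \leftarrow \tau_{\text{fb}}^{t} + f_{\text{iderr}}^{k-1}(x_d^{t}; \theta^{k-1}) \right\}_{t=1}^{T} \tag{6.14}$$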

which can be used to learn or update the new error model $f_{\text{iderr}}^{k}$. We now receive data points directly at the desired accelerations. However, the quality of this data set also depends on tracking accuracy. With low feedback gains, the initial $\tau_{\text{fb}}$ may capture the errors only poorly, so that the first learning iteration may capture only part of the modeling errors.
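A matching sketch builds $\mathcal{D}_{\text{direct}}^{k}$ (Equation 6.14). Inputs are taken at the desired accelerations, and targets are the error estimate $\hat{\tau}_{\text{error}}^{t} = \tau_{\text{fb}}^{t} + f_{\text{iderr}}^{k-1}(x_d^{t}; \theta^{k-1})$. The helper name, the random stand-in for logged data and the linear previous error model in the usage lines are hypothetical.

```python
import numpy as np


def direct_dataset(q, qd, qdd_d, tau_fb, f_iderr_prev, theta_prev):
    """Sketch of Eq. 6.14 for one task execution of length T.

    Inputs are x_d^t = (q^t, qd^t, qdd_d^t); targets are the error estimate
    tau_fb^t + f_iderr^{k-1}(x_d^t; theta^{k-1}).
    """
    X = np.column_stack([q, qd, qdd_d])                  # shape (T, 3) for one joint
    if f_iderr_prev is None:                             # k - 1 = 0, so f_iderr^0 = 0
        prev = np.zeros_like(tau_fb, dtype=float)
    else:
        prev = np.array([f_iderr_prev(x, theta_prev) for x in X])
    Y = tau_fb + prev
    return X, Y


T = 20
rng = np.random.default_rng(0)
q, qd, qdd_d, tau_fb = rng.normal(size=(4, T))           # hypothetical logged data

# First learning iteration (k = 1): no previous model, so the targets are just tau_fb.
X1, Y1 = direct_dataset(q, qd, qdd_d, tau_fb, None, None)
# Later iteration with a hypothetical previous model f(x; theta) = theta * qd.
X2, Y2 = direct_dataset(q, qd, qdd_d, tau_fb, lambda x, th: th * x[1], 0.2)
```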

Thus, with low feedback gains, we may require multiple learning iterations to learn an accurate error model. Increasing the feedback gains $g$, on the other hand, would improve tracking and thereby the fidelity of the data, but at the cost of compliance. Here we simply propose to do both. We use $g_{\text{low}}$ to compute the feedback terms $\tau_{\text{fb}}(g_{\text{low}})$, which are sent to the system, and $g_{\text{high}}$ to compute the feedback terms $\tau_{\text{fb}}(g_{\text{high}})$, which are sent to the learner. Note that the feedback term $\tau_{\text{fb}}(g_{\text{high}})$ is never applied to the system. Thus, we maintain a very compliant system while obtaining better error data for the portion of the state space reached with $g_{\text{low}}$. This can help $f_{\text{iderr}}$ break stiction or counteract high friction after fewer iterations. With traditional inverse dynamics learning approaches and low gains, this would otherwise not be possible.
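The dual-gain idea can be sketched as follows. The text does not specify the feedback law, so a joint-space PD controller with gains $g = (k_p, k_d)$ is assumed here purely for illustration. The function names and gain values are also hypothetical. The first return value is what would be sent to the robot. The second is only logged, where it would take the place of $\tau_{\text{fb}}^{t}$ in the targets of $\mathcal{D}_{\text{direct}}^{k}$.

```python
import numpy as np


def pd_feedback(q, qd, q_des, qd_des, kp, kd):
    """Assumed PD feedback law tau_fb(g) with gains g = (kp, kd)."""
    return kp * (q_des - q) + kd * (qd_des - qd)


def dual_gain_feedback(q, qd, q_des, qd_des, g_low, g_high):
    """Return (tau_fb(g_low), tau_fb(g_high)) computed from the same measured state.

    Only the first value is sent to the robot; the second is logged as the
    feedback part of the learner's target and is never applied.
    """
    tau_fb_low = pd_feedback(q, qd, q_des, qd_des, *g_low)
    tau_fb_high = pd_feedback(q, qd, q_des, qd_des, *g_high)
    return tau_fb_low, tau_fb_high


# Same tracking error, two gain settings (hypothetical values).
q, qd = np.array([0.0]), np.array([0.0])
q_des, qd_des = np.array([0.1]), np.array([0.0])
applied, learner_fb = dual_gain_feedback(q, qd, q_des, qd_des,
                                         g_low=(10.0, 1.0), g_high=(100.0, 10.0))
```

Both terms are computed from the same measured state. Computing the high-gain term therefore leaves the robot's compliant behavior unchanged.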

6.2.3 Joint Inverse Dynamics Learning

The key insight of our approach is that we can use the feedback term as an error estimate at the desired accelerations. The error model learning problem thereby has two data sources, $\mathcal{D}_{\text{indirect}}^{k}$ and $\mathcal{D}_{\text{direct}}^{k}$. Both have the same structure for optimizing the error model. Hence, we can formulate a joint function approximation problem of the form

$$L_{\text{joint}}(\theta^{k}) = \sum_{(x, y) \in \mathcal{D}_{\text{joint}}^{k}} \left\| y - f_{\text{iderr}}^{k}(x; \theta^{k}) \right\|^2 \tag{6.15}$$

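where data points at the actual accelerations $\ddot{q}_a^{t}$ for every time step $t$ can be used, as well as data points at the desired accelerations $\ddot{q}_d^{t}$, as described by

$$\mathcal{D}_{\text{joint}}^{k} = \mathcal{D}_{\text{direct}}^{k} \cup \mathcal{D}_{\text{indirect}}^{k} \tag{6.16}$$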
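Minimizing Equation 6.15 over the union in Equation 6.16 is an ordinary regression problem. Purely for illustration, the sketch below assumes a single-joint error model that is linear in a hand-picked feature vector and fits it by ridge regression. The feature map, the regularizer `lam` and the idealized noise-free targets (viscous plus Coulomb friction) are hypothetical choices, and any function approximator could take their place. A closed-form fit like this one ignores the warm start from $\theta^{k-1}$; an iterative learner would use it.

```python
import numpy as np


def features(X):
    """Hypothetical feature map phi(x) for a 1-DoF error model f_iderr(x; theta) = phi(x) @ theta."""
    q, qd, qdd = X[:, 0], X[:, 1], X[:, 2]
    return np.column_stack([qd, np.sign(qd), np.sin(q), qdd, np.ones_like(q)])


def fit_joint(D_direct, D_indirect, lam=1e-6):
    """Minimize Eq. 6.15 over D_joint = D_direct ∪ D_indirect (Eq. 6.16) by ridge regression."""
    X = np.vstack([D_direct[0], D_indirect[0]])       # stack inputs x_d^t and x_a^t
    Y = np.concatenate([D_direct[1], D_indirect[1]])  # stack the matching targets
    Phi = features(X)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ Y)


# Idealized, noise-free example: the unmodeled error is viscous plus Coulomb friction.
b, c = 0.3, 0.1
true_error = lambda X: b * X[:, 1] + c * np.sign(X[:, 1])

T = 100
rng = np.random.default_rng(0)
X_a = rng.normal(size=(T, 3))                         # measured inputs (indirect data)
X_d = rng.normal(size=(T, 3))                         # desired inputs (direct data)
theta = fit_joint((X_d, true_error(X_d)), (X_a, true_error(X_a)))
```

In this idealized case `theta` comes out close to `[0.3, 0.1, 0, 0, 0]`. With real data, the direct targets are only the estimate $\hat{\tau}_{\text{error}}^{t}$, and both data sources carry measurement noise.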

In document Inteligencias múltiples en alumnos de 3º, 4º Y 5º de secundaria de un centro educativo privado en Guadalupe, la Libertad (pages 127-130)