To learn the state-space-dependent structure of the inverse dynamics modeling errors, we assume that the error model is identifiable in the space of $x = (q, \dot{q}, \ddot{q})$. Furthermore, since every task execution may vary slightly, we follow an incremental learning process, meaning that each task execution generates a new training data set that can be used to update and improve our error model. Thus, our error models are indexed by $k$, indicating the $k$-th learning iteration. For $k = 0$, meaning that no error model exists for the task at hand, we simply assume $f^{0}_{\mathrm{iderr}}(x; \theta^{0}) = 0$. Given this, the total torque applied to the system is the approximate rigid body dynamics model $\hat{\tau}_{\mathrm{rbd}}(x_d)$ (if available) plus an offline learned error model $f^{k}_{\mathrm{iderr}}$ and a feedback term $\tau_{\mathrm{fb}}$:

\[
\tau_{\mathrm{total}} = \hat{\tau}_{\mathrm{rbd}}(x_d) + f^{k}_{\mathrm{iderr}}(x_d; \theta^{k}) + \tau_{\mathrm{fb}}. \tag{6.8}
\]
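As an illustration of Equation 6.8, the following minimal sketch composes the torque command from its three terms. The names `total_torque`, `zero_error_model` and `toy_rbd` are hypothetical, and the toy rigid body model (torque proportional to acceleration) is an assumption made only so the example runs; it is not the model used in this work.

```python
from typing import Callable, List, Sequence, Tuple

# x = (q, q_dot, q_ddot), each a per-joint sequence of floats.
State = Tuple[Sequence[float], Sequence[float], Sequence[float]]
Model = Callable[[State], List[float]]


def zero_error_model(x: State) -> List[float]:
    """Error model for k = 0: no correction is available yet."""
    q, _, _ = x
    return [0.0] * len(q)


def toy_rbd(x: State) -> List[float]:
    """Hypothetical approximate RBD model: tau = 2.0 * q_ddot per joint."""
    _, _, q_ddot = x
    return [2.0 * a for a in q_ddot]


def total_torque(x_d: State, tau_rbd_hat: Model, f_iderr: Model,
                 tau_fb: Sequence[float]) -> List[float]:
    """Eq. (6.8): tau_total = tau_rbd_hat(x_d) + f_iderr(x_d) + tau_fb."""
    rbd = tau_rbd_hat(x_d)
    err = f_iderr(x_d)
    return [r + e + fb for r, e, fb in zip(rbd, err, tau_fb)]


x_d = ([0.0, 0.0], [0.0, 0.0], [1.0, 2.0])
tau = total_torque(x_d, toy_rbd, zero_error_model, [0.5, -0.5])
```

Passing `lambda x: [0.0] * len(x[0])` as `tau_rbd_hat` corresponds to the case in which no rigid body dynamics model is available.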
Here we show how the direct and indirect loss functions can be combined into one loss function that uses two different data sources. To do so, we (1) discuss the loss functions in the context of offline error model learning, (2) show that the two loss functions create two different training signals for the error model, and (3) use this result to combine direct and indirect learning.
6.2.1 Indirect Loss Function
We start by discussing the details of learning an error model with an indirect loss. We compute the torque command based on the current state $(q, \dot{q})$ and desired accelerations $\ddot{q}_d$, apply this torque, and then measure the actual accelerations $\ddot{q}_a$. We now know which torque command achieves these measured accelerations and can use this data point to learn an inverse dynamics model. We collect all of these data points for one task execution, for $t = 1, \dots, T$, such that we have $T$ data points to learn the parameters $\theta^{k}$, initialized with the parameters $\theta^{k-1}$.
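In the indirect formulation, we try to optimize the parameters $\theta^{k}$ such that the difference between the applied torque $\tau_{\mathrm{total}}$ and the inverse dynamics model $f_{\mathrm{id}}$ at $x_a$ is minimized:

\[
L_{\mathrm{indirect}}(\theta^{k}) = \sum_{t=1}^{T} \left\| \tau^{t}_{\mathrm{total}} - f_{\mathrm{id}}(x^{t}_{a}; \theta^{k}) \right\|^{2} \tag{6.9}
\]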
Here we would like to utilize an approximate rigid body dynamics (RBD) model (if available) and learn an error model $f_{\mathrm{iderr}}$ in order to optimize the $f_{\mathrm{id}}$ model. Note that our approach does not require a rigid body dynamics model: all derivations also hold when assuming a constant model $\hat{\tau}_{\mathrm{rbd}} := 0$. In this case we would learn the full inverse dynamics model without using any domain-specific knowledge. However, if possible, we can and should leverage the existing rigid body dynamics model. To compute the RBD error at input $x^{t}_{a}$, we have to evaluate $\hat{\tau}_{\mathrm{rbd}}$ at $x^{t}_{a}$ and subtract it from the total torque applied, $\tau^{t}_{\mathrm{total}}$, such that, in the $k$-th learning iteration, we optimize

\[
L_{\mathrm{indirect}}(\theta^{k}) = \sum_{t=1}^{T} \left\| \tau^{t}_{\mathrm{total}} - \hat{\tau}_{\mathrm{rbd}}(x^{t}_{a}) - f^{k}_{\mathrm{iderr}}(x^{t}_{a}; \theta^{k}) \right\|^{2}.
\]

Thus, using the indirect learning approach, we optimize $f^{k}_{\mathrm{iderr}}(x; \theta^{k})$ on the following data set:

\[
\mathcal{D}^{k}_{\mathrm{indirect}} = \left\{ x^{t} \leftarrow x^{t}_{a},\; y^{t} \leftarrow \tau^{t}_{\mathrm{total}} - \hat{\tau}_{\mathrm{rbd}}(x^{t}_{a}) \right\}_{t=1}^{T}. \tag{6.10}
\]
The quality of this training data set depends on how well we have tracked the task policy or trajectory. With accurate tracking, one learning run should already give us a good approximation of the modeling errors. With poor tracking, however, several learning iterations may be required to estimate a good error model.
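The construction of the indirect data set in Equation 6.10 can be sketched as follows. This is only an illustration under stated assumptions: `build_indirect_dataset` and `toy_rbd` are hypothetical names, and the toy rigid body model is a placeholder rather than a real robot model.

```python
from typing import Callable, List, Sequence, Tuple

# x = (q, q_dot, q_ddot), each a per-joint sequence of floats.
State = Tuple[Sequence[float], Sequence[float], Sequence[float]]
Sample = Tuple[State, List[float]]


def toy_rbd(x: State) -> List[float]:
    """Hypothetical approximate RBD model: tau = 2.0 * q_ddot per joint."""
    _, _, q_ddot = x
    return [2.0 * a for a in q_ddot]


def build_indirect_dataset(
    x_actual: Sequence[State],
    tau_total: Sequence[Sequence[float]],
    tau_rbd_hat: Callable[[State], List[float]],
) -> List[Sample]:
    """Eq. (6.10): x^t <- x_a^t, y^t <- tau_total^t - tau_rbd_hat(x_a^t)."""
    dataset = []
    for x_a, tau in zip(x_actual, tau_total):
        rbd = tau_rbd_hat(x_a)
        # The target is the part of the applied torque the RBD model misses.
        y = [t - r for t, r in zip(tau, rbd)]
        dataset.append((x_a, y))
    return dataset


# Two time steps of a 1-DoF system: measured states and applied torques.
x_actual = [([0.0], [0.0], [1.0]), ([0.1], [0.2], [0.5])]
tau_total = [[2.5], [1.5]]
d_indirect = build_indirect_dataset(x_actual, tau_total, toy_rbd)
```

Note that the inputs are the measured states $x^{t}_{a}$, so the data only cover the region of the state space that was actually visited.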
6.2.2 Direct Loss Function
To overcome the limitations of the indirect learning process, N. Ratliff et al. (2016) propose to use a direct loss (Equation 6.7) to learn modeling errors. Here we use this loss in acceleration space (N. Ratliff et al. 2016) to derive an additional data source for inverse dynamics learning. We start with Equation 6.7, drop the weighting of the acceleration error by the inertia matrix $M$, and instead multiply the accelerations with $M$:

\begin{align}
L_{\mathrm{direct}}(\theta) &= \sum_{t=1}^{T} \left\| M\ddot{q}^{t}_{d} - M\ddot{q}^{t}_{a}(\theta^{k}) \right\|^{2} \tag{6.11} \\
&= \sum_{t=1}^{T} \left\| (M\ddot{q}^{t}_{d} + h) - M\ddot{q}^{t}_{a}(\theta^{k}) - h \right\|^{2}, \notag
\end{align}

where we have also added and subtracted $h$. The true dynamics model $(M, h)$ is never evaluated in our loss formulation; it is merely used to derive the direct loss formulation, as shown in the following. We can now summarize the first term as the true rigid body dynamics model $\tau_{\mathrm{rbd}}(\ddot{q}^{t}_{d})$,
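evaluated at the desired accelerations, and we expand $\ddot{q}^{t}_{a}(\theta^{k})$ as follows:

\begin{align}
L_{\mathrm{direct}}(\theta) &= \sum_{t=1}^{T} \left\| \tau_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) - M\ddot{q}^{t}_{a}(\theta^{k}) - h \right\|^{2} \tag{6.12} \\
&= \sum_{t=1}^{T} \left\| \tau_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) - M M^{-1}\left[ f_{\mathrm{id}}(\ddot{q}^{t}_{d}; \theta^{k}) - h \right] - h \right\|^{2} \notag \\
&= \sum_{t=1}^{T} \left\| \tau_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) - \left( \hat{\tau}_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) + f^{k}_{\mathrm{iderr}}(x^{t}_{d}; \theta^{k}) \right) \right\|^{2} \notag \\
&= \sum_{t=1}^{T} \left\| \left( \tau_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) - \hat{\tau}_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) \right) - f^{k}_{\mathrm{iderr}}(x^{t}_{d}; \theta^{k}) \right\|^{2}, \notag
\end{align}

where $f_{\mathrm{id}}(\ddot{q}^{t}_{d})$ represents the state based rigid body dynamics and error model without a feedback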
term, which should ideally be zero. We have now transformed the loss on accelerations into a loss on torque commands at the input point $x^{t}_{d}$. Intuitively, this transformed loss means that we want to minimize the difference between our error model $f^{k}_{\mathrm{iderr}}(x^{t}_{d}; \theta^{k})$ and the true modeling error $\tau^{t}_{\mathrm{error}} = \tau_{\mathrm{rbd}}(\ddot{q}^{t}_{d}) - \hat{\tau}_{\mathrm{rbd}}(\ddot{q}^{t}_{d})$ at input $x^{t}_{d} = (q^{t}, \dot{q}^{t}, \ddot{q}^{t}_{d})$. While intuitively pleasing, we unfortunately do not have access to the true modeling error $\tau^{t}_{\mathrm{error}}$. However, we can get an estimate of the modeling error, $\hat{\tau}^{t}_{\mathrm{error}} = \tau^{t}_{\mathrm{fb}} + f^{k-1}_{\mathrm{iderr}}(x^{t}_{d}; \theta^{k-1})$, by combining the feedback term $\tau^{t}_{\mathrm{fb}}$ and the error model $f^{k-1}_{\mathrm{iderr}}$ from the previous task execution (similar to feedback error learning (Nakanishi et al. 2004)), which results in the following loss
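\[
L_{\mathrm{direct}}(\theta) = \sum_{t=1}^{T} \left\| \hat{\tau}^{t}_{\mathrm{error}} - f^{k}_{\mathrm{iderr}}(x^{t}_{d}; \theta^{k}) \right\|^{2} \tag{6.13}
\]

Similar to the indirect learning, we can now construct a data set

\[
\mathcal{D}^{k}_{\mathrm{direct}} = \left\{ x^{t} \leftarrow x^{t}_{d},\; y^{t} \leftarrow \tau^{t}_{\mathrm{fb}} + f^{k-1}_{\mathrm{iderr}}(x^{t}_{d}; \theta^{k-1}) \right\}_{t=1}^{T},
\]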
which can be used to learn or update the new error model $f^{k}_{\mathrm{iderr}}$. We now receive data points directly at the desired accelerations. However, the quality of this data set also depends on tracking accuracy. With low feedback gains, the initial $\tau_{\mathrm{fb}}$ may not capture the errors very well, such that the first learning iteration may capture only part of the modeling errors.
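The direct data set can be assembled analogously to the indirect one. The sketch below assumes that the feedback terms and the previous error model are available per time step; `build_direct_dataset` and `prev_error_model` are hypothetical names, and the constant previous model is a placeholder chosen only to keep the example small.

```python
from typing import Callable, List, Sequence, Tuple

# x = (q, q_dot, q_ddot), each a per-joint sequence of floats.
State = Tuple[Sequence[float], Sequence[float], Sequence[float]]
Sample = Tuple[State, List[float]]


def prev_error_model(x: State) -> List[float]:
    """Hypothetical f_iderr^{k-1}: a constant 0.25 torque offset per joint."""
    q, _, _ = x
    return [0.25] * len(q)


def build_direct_dataset(
    x_desired: Sequence[State],
    tau_fb: Sequence[Sequence[float]],
    f_iderr_prev: Callable[[State], List[float]],
) -> List[Sample]:
    """x^t <- x_d^t, y^t <- tau_fb^t + f_iderr^{k-1}(x_d^t)."""
    dataset = []
    for x_d, fb in zip(x_desired, tau_fb):
        prev = f_iderr_prev(x_d)
        # Feedback plus previous correction estimates the modeling error.
        y = [f + p for f, p in zip(fb, prev)]
        dataset.append((x_d, y))
    return dataset


# Two time steps of a 1-DoF system: desired states and feedback torques.
x_desired = [([0.0], [0.0], [1.0]), ([0.1], [0.2], [0.5])]
tau_fb = [[0.5], [-0.25]]
d_direct = build_direct_dataset(x_desired, tau_fb, prev_error_model)
```

For $k = 0$ the previous error model is zero, so the targets reduce to the feedback terms alone.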
Thus, with low feedback gains, we may require multiple learning iterations to learn an accurate error model. Increasing the feedback gains $g$, in contrast, would lead to improved tracking and thus increase the fidelity of the data, at the cost of compliance. Here we simply propose to do both: use $g_{\mathrm{low}}$ to compute the feedback terms $\tau_{\mathrm{fb}}(g_{\mathrm{low}})$, which are sent to the system, and use $g_{\mathrm{high}}$ to compute the feedback terms $\tau_{\mathrm{fb}}(g_{\mathrm{high}})$, which are sent to the learner. Note that the feedback term $\tau_{\mathrm{fb}}(g_{\mathrm{high}})$ is never applied to the system. Thus, we maintain a very compliant system while obtaining better error data for the portion of the state space reached with $g_{\mathrm{low}}$. This can help to break stiction or counteract high friction with $f_{\mathrm{iderr}}$ after fewer iterations, which would otherwise not be possible with traditional inverse dynamics learning approaches and low gains.
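The two-gain scheme can be sketched as follows. The text does not specify the form of the feedback law, so a per-joint PD controller is assumed here purely for illustration; `pd_feedback`, `feedback_terms` and the gain values are hypothetical.

```python
from typing import List, Sequence, Tuple

Gains = Tuple[float, float]  # (proportional gain, derivative gain)


def pd_feedback(gains: Gains, q_des: Sequence[float], q: Sequence[float],
                qd_des: Sequence[float], qd: Sequence[float]) -> List[float]:
    """Assumed PD law: tau_fb = kp * (q_des - q) + kd * (qd_des - qd)."""
    kp, kd = gains
    return [kp * (a - b) + kd * (c - d)
            for a, b, c, d in zip(q_des, q, qd_des, qd)]


def feedback_terms(g_low: Gains, g_high: Gains, q_des: Sequence[float],
                   q: Sequence[float], qd_des: Sequence[float],
                   qd: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return (tau_fb for the system, tau_fb for the learner).

    Both terms are computed from the same tracking error. Only the
    low-gain term is applied to the robot; the high-gain term is used
    solely as a training target.
    """
    tau_fb_system = pd_feedback(g_low, q_des, q, qd_des, qd)
    tau_fb_learner = pd_feedback(g_high, q_des, q, qd_des, qd)
    return tau_fb_system, tau_fb_learner


tau_sys, tau_learn = feedback_terms(
    g_low=(10.0, 1.0), g_high=(100.0, 10.0),
    q_des=[1.0], q=[0.5], qd_des=[0.0], qd=[1.0],
)
```

Because both terms use the tracking error observed under $g_{\mathrm{low}}$, the learner only receives data for the states the compliant system actually reached.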
6.2.3 Joint Inverse Dynamics Learning
The key insight for our approach is that we can use the feedback term as an error estimate for the desired accelerations. The error model learning problem therefore has two data sources, indirect and direct. Both exhibit the same structure to optimize the error model. Hence, we can formulate a joint function approximation problem of the form

\[
L_{\mathrm{joint}}(\theta^{k}) = \sum_{(x, y) \in \mathcal{D}^{k}_{\mathrm{joint}}} \left\| y - f^{k}_{\mathrm{iderr}}(x; \theta^{k}) \right\|, \tag{6.15}
\]

where data points for the actual accelerations $\ddot{q}^{t}_{a}$ for every time step $t$ can be used as well as data points for the desired accelerations $\ddot{q}^{t}_{d}$, as described by

\[
\mathcal{D}^{k}_{\mathrm{joint}} = \mathcal{D}^{k}_{\mathrm{direct}} \cup \mathcal{D}^{k}_{\mathrm{indirect}}. \tag{6.16}
\]
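To make the joint formulation concrete, the following sketch merges the two data sets and fits a small error model on the union. Several assumptions are made for illustration only: the system has one degree of freedom, the error model is linear in two hypothetical features (a bias and the joint velocity), and the fit minimizes squared residuals in closed form, in line with Equations 6.9 and 6.13. The function approximator used in practice may differ, and an incremental learner would start from $\theta^{k-1}$ rather than solve from scratch.

```python
from typing import List, Sequence, Tuple

# x = (q, q_dot, q_ddot) for a 1-DoF system; y is a scalar torque error.
State = Tuple[float, float, float]
Sample = Tuple[State, float]


def joint_dataset(d_direct: Sequence[Sample],
                  d_indirect: Sequence[Sample]) -> List[Sample]:
    """Eq. (6.16): union of both data sets, stored as one list."""
    return list(d_direct) + list(d_indirect)


def features(x: State) -> Tuple[float, float]:
    """Hypothetical features: constant bias and joint velocity."""
    _, q_dot, _ = x
    return (1.0, q_dot)


def fit_error_model(dataset: Sequence[Sample]) -> Tuple[float, float]:
    """Least-squares fit of f(x; theta) = theta[0] + theta[1] * q_dot."""
    # Accumulate the 2x2 normal equations A theta = b.
    a00 = a01 = a11 = b0 = b1 = 0.0
    for x, y in dataset:
        p0, p1 = features(x)
        a00 += p0 * p0
        a01 += p0 * p1
        a11 += p1 * p1
        b0 += p0 * y
        b1 += p1 * y
    det = a00 * a11 - a01 * a01
    if abs(det) < 1e-12:
        raise ValueError("features are not sufficiently excited")
    theta0 = (a11 * b0 - a01 * b1) / det
    theta1 = (a00 * b1 - a01 * b0) / det
    return theta0, theta1


# Synthetic targets generated from y = 0.5 + 2.0 * q_dot.
d_direct = [((0.0, 1.0, 0.0), 2.5), ((0.0, -1.0, 0.0), -1.5)]
d_indirect = [((0.0, 0.0, 0.0), 0.5), ((0.0, 2.0, 0.0), 4.5)]
theta = fit_error_model(joint_dataset(d_direct, d_indirect))
```

Since both data sets share the same input-target structure, the learner does not need to distinguish where a sample came from.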