$\Theta := \bar\Theta \times A^N$). One should think of $\bar\theta$ as a global parameter and of $\alpha_i \in A$ as a local parameter identifiable by node $i$ only. Let $(P_\theta)_{\theta \in \Theta}$ be a collection of probability measures on $(\Omega, \mathcal{F})$. We denote by $f_\theta(z_1, \dots, z_N)$ the p.d.f. of $(Z_1, \dots, Z_N)$ induced by the model $P_\theta$, w.r.t. some arbitrary reference measures on $(X \times Y)^N$. We denote by $g_\theta(y_1, \dots, y_N)$ the p.d.f. of the observations induced by $P_\theta$.
We assume that the observations $(Y_1, \dots, Y_N)$ have an unknown p.d.f. $\pi$ under some probability $P_\pi$ on $(\Omega, \mathcal{F})$. Here, $P_\pi$ represents the actual probability under which the observed samples are generated. For a better understanding, it might be convenient to think of $\pi$ as $\pi = g_{\theta^\star}$ for some "true" parameter $\theta^\star$; however, our algorithm and our analysis do not need such a hypothesis.
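Denote by $\langle \cdot\,, \cdot \rangle$ the inner product in $\mathbb{R}^p$ and by $|\cdot|$ the Euclidean norm.

Assumption A.1. For any $\theta = (\bar\theta, \alpha_1, \dots, \alpha_N)$,
i) For any $z_1, \dots, z_N$, $f_\theta(z_1, \dots, z_N) = \prod_i f_{i;\bar\theta,\alpha_i}(z_i)$, where the marginal p.d.f. $f_{i;\bar\theta,\alpha_i}(z_i)$ coincides with:
$$h_i(z_i) \exp\big( -\bar\psi(\bar\theta) - \psi_i(\alpha_i) + \langle S_i(z_i), \bar\phi(\bar\theta) + \phi_i(\alpha_i) \rangle \big)$$
where $\bar\psi : \bar\Theta \to \mathbb{R}$, $\bar\phi : \bar\Theta \to \mathbb{R}^p$, $\psi_i : A \to \mathbb{R}$, $\phi_i : A \to \mathbb{R}^p$ and $S_i : X \times Y \to \mathbb{R}^p$ are some measurable functions and $h_i(z_i)$ is a normalization factor.
ii) The r.v. $E_\theta[S_i(Z_i) \mid Y_i]$ is well defined for any $i$.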
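To make Assumption A.1 concrete, the following sketch checks numerically that one simple model can be written in the required exponential form. The model is a hypothetical illustration that is not taken from the text: a hidden variable $X \sim N(\bar\theta, 1)$ with a global mean, observed through $Y \mid X \sim N(X, \alpha)$, where the variance $\alpha > 0$ plays the role of the local parameter. Under this assumption one may take $p = 2$, $S(z) = (x, (y-x)^2)$, $\bar\phi(\bar\theta) = (\bar\theta, 0)$, $\phi_i(\alpha) = (0, -1/(2\alpha))$, $\bar\psi(\bar\theta) = \bar\theta^2/2$ and $\psi_i(\alpha) = \tfrac12 \log \alpha$. All function names are illustrative.

```python
import numpy as np

# Hypothetical toy model (an assumption, not from the text):
#   X ~ N(theta_bar, 1) is hidden, Y | X ~ N(X, alpha) is observed, alpha > 0.

def log_density_direct(x, y, theta_bar, alpha):
    """log f(x, y) written as the sum of two Gaussian log-densities."""
    log_px = -0.5 * np.log(2 * np.pi) - 0.5 * (x - theta_bar) ** 2
    log_py_given_x = -0.5 * np.log(2 * np.pi * alpha) - 0.5 * (y - x) ** 2 / alpha
    return log_px + log_py_given_x

def log_density_exponential_form(x, y, theta_bar, alpha):
    """Same quantity as log h - psi_bar - psi_i + <S, phi_bar + phi_i>."""
    log_h = -np.log(2 * np.pi) - 0.5 * x ** 2       # depends on z only
    psi_bar = 0.5 * theta_bar ** 2                   # global log-normalizer
    psi_i = 0.5 * np.log(alpha)                      # local log-normalizer
    S = np.array([x, (y - x) ** 2])                  # statistic S(z) in R^2
    phi_bar = np.array([theta_bar, 0.0])             # global natural parameter
    phi_i = np.array([0.0, -0.5 / alpha])            # local natural parameter
    return log_h - psi_bar - psi_i + S @ (phi_bar + phi_i)
```

The two functions agree for any admissible arguments, which is exactly the factorization required in item i) for this particular model.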
In the sequel, we assume that a sequence of independent and identically distributed (i.i.d.) observations is available at each sensor. More precisely, for each $i = 1, \dots, N$, we introduce a time series $Z_{i,n} = (X_{i,n}, Y_{i,n})$ ($n = 1, 2, \dots$) such that, under $P_\theta$, $(Z_{i,n})_{n \ge 1}$ is i.i.d. and has the same distribution as $Z_i$. Here, $(Y_{i,n})_{n \ge 1}$ represents the sequence of observations of sensor $i$ while $(X_{i,n})_{n \ge 1}$ represents the sequence of hidden r.v.s.
A.3 Centralized EM algorithms
We review centralized EM algorithms, assuming that a fusion center is able to gather all the information from all sensors at each instant $n$. Although we are interested in on-line algorithms, we first review the usual batch version of the EM algorithm for convenience.
A.3.1 Batch EM
Assume that each sensor $i$ collects $T$ observations $Y_{i,1:T} := (Y_{i,1}, \dots, Y_{i,T})$. The so-called intermediate quantity of the EM algorithm plays a central role:
$$Q_T(\theta', \theta) := \frac{1}{NT} \sum_{n=1}^{T} \sum_{i=1}^{N} E_{\theta'}\big[ \log f_{i;\bar\theta,\alpha_i}(Z_{i,n}) \mid Y_{i,n} \big] \tag{A.1}$$
where $\theta', \theta \in \bar\Theta \times A^N$, $\theta = (\bar\theta, \alpha_1, \dots, \alpha_N)$ and $E_\theta$ is the expectation associated with $P_\theta$. The EM algorithm is an iterative procedure which generates an estimate $\theta^{(k)} = (\bar\theta^{(k)}, \alpha_1^{(k)}, \dots, \alpha_N^{(k)})$ at each iteration $k$. The update is done in two steps:
E-step: Compute the function $\theta \mapsto Q_T(\theta^{(k)}, \theta)$;
M-step: Set $\theta^{(k+1)} := \arg\max_\theta Q_T(\theta^{(k)}, \theta)$.
In practice, such an algorithm makes sense only if each of the above steps can be carried out at low computational cost. Under Assumption A.1, both steps simplify as follows. Consider $i = 1, \dots, N$, $\theta = (\bar\theta, \alpha_1, \dots, \alpha_N)$ and $\theta' = (\bar\theta', \alpha'_1, \dots, \alpha'_N)$. Let us introduce a function $y \mapsto \sigma_{i;\bar\theta,\alpha_i}(y)$ defined on $Y$ such that w.p.1:
$$\sigma_{i;\bar\theta,\alpha_i}(Y_i) = E_\theta(S_i(Z_i) \mid Y_i). \tag{A.2}$$
By Assumption A.1, the RHS of the above equality depends on $\theta$ only through $\bar\theta$ and $\alpha_i$. It is straightforward to show that $E_{\theta'}[\log f_{i;\bar\theta,\alpha_i}(Z_i) \mid Y_i]$ coincides with:
$$-\bar\psi(\bar\theta) - \psi_i(\alpha_i) + \langle \sigma_{i;\bar\theta',\alpha'_i}(Y_i), \bar\phi(\bar\theta) + \phi_i(\alpha_i) \rangle$$
up to an additive random value $E_{\theta'}(\log h_i(Z_i) \mid Y_i)$ which does not depend on $\theta$ and which shall thus play no role in the M-step. Thus, up to a constant w.r.t. $\theta$, the intermediate function $Q_T(\theta^{(k)}, \theta)$ at iteration $k$ coincides with:
$$-\bar\psi(\bar\theta) + \langle \bar s^{(k)}, \bar\phi(\bar\theta) \rangle + \frac{1}{N} \sum_{i=1}^{N} \big( -\psi_i(\alpha_i) + \langle s_i^{(k)}, \phi_i(\alpha_i) \rangle \big) \tag{A.3}$$
where $s_i^{(k)} := \frac{1}{T} \sum_{n=1}^{T} \sigma_{i;\bar\theta^{(k)},\alpha_i^{(k)}}(Y_{i,n})$ and where $\bar s^{(k)} = \frac{1}{N} \sum_{i=1}^{N} s_i^{(k)}$. We will refer to these quantities as the local and global sufficient statistics, respectively. The E-step reduces to the computation of $s_i^{(k)}$ for any $i = 1, \dots, N$, and their average $\bar s^{(k)}$. The maximization of (A.3) can be achieved separately with respect to (w.r.t.) $\bar\theta, \alpha_1, \dots, \alpha_N$. Assume that the following functions are well defined for any $s$ in a relevant domain, and that their numerical computation is inexpensive:
$$M(s) := \arg\max_{\bar\theta \in \bar\Theta} \; -\bar\psi(\bar\theta) + \langle s, \bar\phi(\bar\theta) \rangle \tag{A.4}$$
$$M_i(s) := \arg\max_{\alpha \in A} \; -\psi_i(\alpha) + \langle s, \phi_i(\alpha) \rangle. \tag{A.5}$$
The standard batch EM algorithm is summarized below in Algorithm 6.
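As a rough illustration of how the E-step and M-step above fit together, here is a minimal Python sketch. The generic routine `batch_em` only assumes that the user supplies callables playing the roles of $\sigma_{i;\bar\theta,\alpha_i}$, $M$ and $M_i$; the names `sigma`, `M_bar` and `M_loc` are hypothetical. The toy model used to exercise it (hidden $X \sim N(\bar\theta, 1)$, observed $Y \mid X \sim N(X, \alpha_i)$) is an assumption made for this example and does not come from the text.

```python
import numpy as np

def batch_em(Y, sigma, M_bar, M_loc, theta_bar, alphas, n_iter=100):
    """Batch EM sketch. Y has shape (N, T): T observations for each of N sensors.

    sigma(i, theta_bar, alpha_i, y) -> array (T, p): conditional statistics, cf. (A.2)
    M_bar(s) -> global parameter, cf. (A.4);  M_loc(i, s) -> local parameter, cf. (A.5)
    """
    N = Y.shape[0]
    alphas = list(alphas)
    for _ in range(n_iter):
        # E-step: local statistics s_i^(k) (time averages) and their average
        s = np.array([sigma(i, theta_bar, alphas[i], Y[i]).mean(axis=0)
                      for i in range(N)])
        s_bar = s.mean(axis=0)
        # M-step: separate maximization in the local and global parameters
        alphas = [M_loc(i, s[i]) for i in range(N)]
        theta_bar = M_bar(s_bar)
    return theta_bar, alphas

# --- Toy model (assumption): S(z) = (x, (y - x)^2) ---
def sigma_toy(i, theta_bar, alpha, y):
    v = alpha / (1.0 + alpha)                      # Var(X | Y)
    m = (alpha * theta_bar + y) / (1.0 + alpha)    # E[X | Y]
    return np.stack([m, (y - m) ** 2 + v], axis=-1)

def M_bar_toy(s):
    return s[0]   # argmax of -theta^2/2 + s[0]*theta

def M_loc_toy(i, s):
    return s[1]   # argmax over alpha > 0 of -log(alpha)/2 - s[1]/(2*alpha)

rng = np.random.default_rng(0)
true_theta, true_alphas, T = 1.0, [0.5, 2.0], 5000
X = rng.normal(true_theta, 1.0, size=(2, T))
Y = X + rng.normal(size=(2, T)) * np.sqrt(np.array(true_alphas))[:, None]
theta_hat, alpha_hat = batch_em(Y, sigma_toy, M_bar_toy, M_loc_toy, 0.0, [1.0, 1.0])
```

In this toy model both maximizers are available in closed form, which is the situation in which the text says the procedure is practical.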
Algorithm 6: Centralized batch EM algorithm (EM)
Initialize: $s^{0}_{0,i}$, $\bar s_{0,i}$ for all $i = 1, \dots, N$.
Update: At each iteration $k \ge 0$ do
E-step: Compute $s_i^{(k)}$ for any $i$, and the average $\bar s^{(k)}$.
M-step: For all $i = 1, \dots, N$, set $\alpha_i^{(k+1)} := M_i(s_i^{(k)})$. Set $\bar\theta^{(k+1)} := M(\bar s^{(k)})$.

A.3.2 On-line EM
From now on to the end of this paper, we assume that each sensor observes the time series $(Y_{n,i})_{n \ge 1}$. We are interested in on-line algorithms, i.e., algorithms which are able to update the estimate any time new samples come in. The idea behind the on-line EM algorithm of [36] is simply to replace the batch sufficient statistics with their on-line counterparts. In such a case, there is no difference between the indices $n$ and $k$, and the E-step is computed any time a new observation comes in. Assume that each agent $i$ has access to its time series $(Y_{n,i})_{n \ge 1}$. The algorithm replaces the batch average of the E-step by an iterative stochastic approximation step in order to track this average value at the same time that the M-step tracks the estimated parameters. The estimate $\theta_n$ at time $n$ is generated similarly to Algorithm 6 in two steps after an arbitrary initialization of values $s_{0,1}, \dots, s_{0,N}$. The on-line E-step is given by the following recursion:
$$s_{n,i} = s_{n-1,i} + \gamma_n \big( \sigma_{i;\bar\theta_{n-1},\alpha_{n-1,i}}(Y_{n,i}) - s_{n-1,i} \big) \tag{A.6}$$
$$\bar s_n = \frac{1}{N} \sum_{i=1}^{N} s_{n,i}, \tag{A.7}$$
where $\gamma_n$ is a positive step size (gain). We refer to $s_{n,i}$ as a summary statistic. Next, the estimate $\theta_n$ is updated by the following M-step:
$$\bar\theta_n = M(\bar s_n) \quad \text{and} \quad \forall i,\ \alpha_{n,i} = M_i(s_{n,i}). \tag{A.8}$$
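One iteration of the on-line recursion can be sketched as follows. This is only an illustration under stated assumptions: the callables `sigma`, `M_bar` and `M_loc` are hypothetical stand-ins for $\sigma_{i;\bar\theta,\alpha_i}$, $M$ and $M_i$, and the usage example relies on a deliberately trivial placeholder model in which the statistic is the observation itself and both maximizers are the identity.

```python
import numpy as np

def online_em_step(s_prev, theta_bar, alphas, y_new, gamma, sigma, M_bar, M_loc):
    """One on-line EM iteration for N sensors.

    s_prev : array (N, p) of summary statistics s_{n-1,i}
    y_new  : length-N sequence holding the new observations Y_{n,i}
    gamma  : step size gamma_n
    """
    N = s_prev.shape[0]
    # Conditional statistics evaluated at the previous parameter estimates
    targets = np.array([sigma(i, theta_bar, alphas[i], y_new[i]) for i in range(N)])
    s = s_prev + gamma * (targets - s_prev)        # on-line E-step, cf. (A.6)
    s_bar = s.mean(axis=0)                         # network average, cf. (A.7)
    theta_bar = M_bar(s_bar)                       # global M-step, cf. (A.8)
    alphas = [M_loc(i, s[i]) for i in range(N)]    # local M-steps, cf. (A.8)
    return s, theta_bar, alphas

# --- Placeholder model (assumption): fully observed, identity maximizers ---
def sigma_id(i, theta_bar, alpha, y):
    return np.array([y])

def M_bar_id(s):
    return float(s[0])

def M_loc_id(i, s):
    return float(s[0])

s = np.zeros((2, 1))
theta_bar, alphas = 0.0, [0.0, 0.0]
observations = [np.array([2.0, 4.0]), np.array([2.0, 4.0])]
for n, y_new in enumerate(observations, start=1):
    s, theta_bar, alphas = online_em_step(
        s, theta_bar, alphas, y_new, 1.0 / n, sigma_id, M_bar_id, M_loc_id)
```

With the gain $\gamma_n = 1/n$ used in this usage example, the recursion (A.6) reduces to a running average of the conditional statistics.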
The asymptotic analysis of the above centralized algorithm is available in [36] under the hypothesis of vanishing gains $\gamma_n$ such that:
Assumption A.2. The sequence $(\gamma_n)_{n \ge 0}$ is positive, non-increasing, and satisfies:
i) $\sum_n \gamma_n = +\infty$,
ii) $\sum_n \gamma_n^2 < \infty$.
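A standard family satisfying both conditions is $\gamma_n = \gamma_0 / (n+1)^a$ with $\gamma_0 > 0$ and $1/2 < a \le 1$: the series $\sum_n (n+1)^{-a}$ diverges for $a \le 1$, while $\sum_n (n+1)^{-2a}$ converges for $a > 1/2$. The sketch below is an illustration only; the function name `step_size` and the default values are choices made here, not taken from the text.

```python
import numpy as np

def step_size(n, gamma0=1.0, a=0.7):
    """Step size gamma_n = gamma0 / (n + 1)**a for n >= 0.

    The restriction 1/2 < a <= 1 makes the sequence positive, non-increasing,
    non-summable and square-summable.
    """
    if gamma0 <= 0 or not (0.5 < a <= 1.0):
        raise ValueError("need gamma0 > 0 and 1/2 < a <= 1")
    return gamma0 / (np.asarray(n, dtype=float) + 1.0) ** a

n = np.arange(100_000)
gammas = step_size(n)
partial_sum = gammas.sum()                # keeps growing as more terms are added
partial_sum_sq = (gammas ** 2).sum()      # stays bounded as more terms are added
```

Values of `a` close to 1/2 give larger steps and faster tracking at the price of noisier statistics, whereas `a = 1` averages more aggressively.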
Remark A.1. The convergence result given in [36] holds under the assumption that the algorithm is stable: the sequence of summary statistics remains almost surely in some compact set, strictly included in the domain of definition of the functions $M$ and $M_i$. Verifying this assumption is not an easy task. Instead, it is common practice in stochastic approximation to force stability by confining the updated sequence (A.6) to a given convex compact set $S$ (see [98, p. 120] for a discussion).
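The confinement device mentioned in the remark can be sketched for one particularly simple choice of set. The example assumes that the convex compact set is a coordinate-wise box $[\ell, u]^p$, in which case the Euclidean projection is a clipping of each coordinate; the box shape and the function names are assumptions made for illustration, and the set used in the analysis need not have this form.

```python
import numpy as np

def project_box(s, lower, upper):
    """Euclidean projection onto the box {s : lower <= s_j <= upper for all j}."""
    return np.clip(s, lower, upper)

def projected_e_step(s_prev, target, gamma, lower, upper):
    """Update (A.6) followed by a projection back onto the box.

    target stands for the conditional statistic evaluated at the new observation.
    """
    s_unconstrained = s_prev + gamma * (target - s_prev)
    return project_box(s_unconstrained, lower, upper)

# Usage: the first coordinate would leave the box [0, 2]^2 and is clipped.
s_next = projected_e_step(np.array([0.0, 1.0]), np.array([10.0, 1.0]),
                          gamma=0.5, lower=0.0, upper=2.0)
```

For other convex sets the projection generally has no closed form this simple, which is one reason boxes or balls are convenient choices in practice.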
Here, we shall follow this approach. We denote by $\Pi_S$ the Euclidean projector onto the set $S$. Thus, we introduce the following assumption as a consequence of the previous Remark A.1.
Assumption A.3. There exists a convex open set $S$ such that the following holds for any $i = 1, \dots, N$. Functions $\bar M : S \to \bar\Theta$ and $M_i : S \to A$ are well defined by (A.4) and (A.5), i.e., the