If the interface cliques are too large, one approach is to approximate the joint on the interface as the product of marginals of smaller terms; this is the basic idea behind the BK algorithm [BK98b, BK98a, BK99].

More precisely, BK constructs the junction tree for the 1½-slice DBN $H_t$, but does not require all the interface nodes to be in the same clique; i.e., $P(I_t|y_{1:t})$ is no longer represented as a single clique potential. Instead, we approximate the belief state by a product of marginals, $P(I_t|y_{1:t}) \approx \prod_{c=1}^{C} P(I_t^c|y_{1:t})$, where $P(I_t^c|y_{1:t})$ is the distribution on nodes in cluster $c$. The set of clusters partitions the nodes in the interface. Hence rather than connecting together all the nodes in the interface, we just connect together the nodes in each cluster. The assumption is that this will result in smaller cliques in the 2TBN. Figures 4.3 and 4.4 show that the max clique size is indeed reduced in some cases. The basic reason is that we only had to triangulate the two-slice network, not the unrolled network.
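The product-of-marginals approximation can be illustrated with a small sketch. The table `joint`, its weights, and the helper names `marginal` and `product_of_marginals` are hypothetical, chosen only for illustration; the belief state is stored as a flat table rather than as clique potentials.

```python
from itertools import product

def marginal(joint, cluster):
    """Sum a joint table {state tuple: prob} onto the node positions in `cluster`."""
    out = {}
    for states, p in joint.items():
        key = tuple(states[i] for i in cluster)
        out[key] = out.get(key, 0.0) + p
    return out

def product_of_marginals(joint, clusters):
    """Approximate `joint` by the product of its cluster marginals."""
    margs = [marginal(joint, c) for c in clusters]
    approx = {}
    for states in joint:
        p = 1.0
        for c, m in zip(clusters, margs):
            p *= m[tuple(states[i] for i in c)]
        approx[states] = p
    return approx

# Hypothetical belief state over three binary interface nodes.
weights = [4, 1, 1, 2, 1, 2, 2, 3]
total = sum(weights)
joint = {s: w / total for s, w in zip(product((0, 1), repeat=3), weights)}

exact = product_of_marginals(joint, [(0, 1, 2)])            # one cluster: no approximation
partial = product_of_marginals(joint, [(0, 1), (2,)])       # two clusters
factored = product_of_marginals(joint, [(0,), (1,), (2,)])  # one cluster per node
```

With a single cluster the table is reproduced unchanged. With several clusters each cluster marginal is preserved, but dependencies between clusters are discarded.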

The accuracy of the BK algorithm depends on the clusters that we use to approximate the belief state. Exact inference corresponds to using a single cluster, containing all the interface nodes. The most aggressive approximation corresponds to using $D$ clusters, one per variable; we call this the “fully factorized” approximation.
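To give a feel for how the choice of clusters changes the size of the belief state, the sketch below counts table entries for the three clusterings of the BATnet interface listed in the caption of Figure 4.3. The assumption that every node is binary is made purely for illustration (the source does not state the node cardinalities), and `belief_state_size` is a hypothetical helper.

```python
def belief_state_size(clusters, cardinality):
    """Total number of table entries needed to store one marginal per cluster."""
    total = 0
    for cluster in clusters:
        entries = 1
        for node in cluster:
            entries *= cardinality[node]
        total += entries
    return total

interface = [7, 14, 16, 17, 19, 20, 21, 23, 25, 27]
cardinality = {node: 2 for node in interface}  # assumption: every node is binary

exact = belief_state_size([interface], cardinality)
partial = belief_state_size([interface[:5], interface[5:]], cardinality)
factored = belief_state_size([[n] for n in interface], cardinality)
print(exact, partial, factored)  # 1024 64 20
```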

We can implement BK by slightly modifying the interface algorithm (see Section 3.4), as we explain below.

Forwards pass

1. Initialize all clique and separator potentials in $J_t$ to 1s, as before.

2. Include the CPDs and evidence for slice $t$ as before.

3. To incorporate the prior, we proceed as follows. For each cluster $c$, we find the smallest cliques $C_{t-1}$ and $D_t$ in $J_{t-1}$ and $J_t$ whose domains include $c$; we then marginalize $\phi_{C_{t-1}}$ onto $I^c_{t-1}$ and multiply this onto $\phi_{D_t}$.

[Figure 4.2 diagram: the two-slice BATnet, with 28 numbered nodes arranged in columns labelled slice t, slice t+1, and evidence. Node names include SensorValid, Xdot, Ydot, InLane, LeftClr, RightClr, LatAction, FwdAction, Stopped, EngStatus, and FrontBackStatus, plus sensor nodes such as XdotSens and TurnSignal.]

Figure 4.2: The BATnet, designed to monitor the state of an autonomous car (the Bayesian Automated Taxi, or BATmobile) [FHKR95]. The transient nodes are only shown for the second slice, to minimize clutter. The dotted arcs group together nodes that will be used in the BK approximation: see Section 4.2.1. Thanks to Daphne Koller for providing this figure. Nodes are numbered in topological order, as required by BNT.

[Figure 4.3 panels (a), (b), (c): histograms with x-axis “clique size” and y-axis “number”.]

Figure 4.3: Clique size distribution for different levels of BK approximation applied to the BATnet (Figure 4.2). The cliques of size 2 are due to the observed leaves. (a) Exact: the interface is {7,14,16,17,19,20,21,23,25,27}. (b) Partial approximation: the interface sets are {7,14,16,17,19} and {20,21,23,25,27}, as in Figure 4.2. (c) Fully factorized approximation: the interface sets are the singletons.

[Figure 4.4 panels (a), (b), (c): histograms with x-axis “clique size” and y-axis “number”.]

Figure 4.4: Clique size distribution for different levels of BK approximation applied to the water DBN (Figure 4.1). The cliques of size 2 are due to the observed leaves. (a) Exact: the interface is {1, ..., 8}. (b) Partial approximation: the interface sets are {1,2}, {3,4,5,6} and {7,8}, as in Figure 4.1. (c) Fully factorized approximation: the interface sets are the singletons.

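[Figure 4.5 diagram: a row labelled “coupled” holds $\hat{\alpha}_t$ and $\hat{\alpha}_{t+1}$; a row labelled “factored” holds $\tilde{\alpha}_{t-1}$, $\tilde{\alpha}_t$ and $\tilde{\alpha}_{t+1}$; arrows labelled U and P alternate between the two rows.]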

Figure 4.5: The BK algorithm as a series of update (U) and projection (P) steps.

4. We collect and distribute evidence to/from any clique.

We must collect and distribute evidence because the clusters that constitute the interface may be in several cliques, and hence all of them must be updated before being used as the prior for the next step. If there is only one cluster, it suffices to collect to the clique that contains this cluster.
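The forwards pass can be sketched on a toy model. This is not the junction tree implementation described in the steps above: it replaces clique potentials with brute-force enumeration over the joint states of two binary hidden nodes, which is only feasible for tiny models. The CPD `p_on`, the sensor likelihood `lik`, and the function names are hypothetical.

```python
from itertools import product

STATES = list(product((0, 1), repeat=2))  # joint states of two binary hidden nodes

def project(joint, clusters):
    """Keep only the marginal of each cluster."""
    marginals = []
    for c in clusters:
        m = {}
        for s, p in joint.items():
            key = tuple(s[i] for i in c)
            m[key] = m.get(key, 0.0) + p
        marginals.append(m)
    return marginals

def bk_step(marginals, clusters, trans, lik):
    """One forwards step: factored prior -> coupled posterior -> factored posterior."""
    # Rebuild the factored prior as a product of cluster marginals.
    prior = {}
    for s in STATES:
        p = 1.0
        for c, m in zip(clusters, marginals):
            p *= m[tuple(s[i] for i in c)]
        prior[s] = p
    # Exact prediction and conditioning on the evidence, by enumeration.
    post = {}
    for new in STATES:
        predicted = sum(prior[old] * trans[old][new] for old in STATES)
        post[new] = predicted * lik[new]
    z = sum(post.values())
    post = {s: p / z for s, p in post.items()}
    return project(post, clusters), post

def p_on(i, old):
    """Hypothetical CPD: P(node i = 1 | previous slice); depends on both nodes."""
    return 0.1 + 0.6 * old[i] + 0.2 * old[1 - i]

trans = {}
for old in STATES:
    trans[old] = {}
    for new in STATES:
        p = 1.0
        for i in (0, 1):
            p *= p_on(i, old) if new[i] == 1 else 1.0 - p_on(i, old)
        trans[old][new] = p

# Hypothetical noisy sensor on node 0 that reads 1 at every step.
lik = {s: 0.9 if s[0] == 1 else 0.1 for s in STATES}

clusters = [(0,), (1,)]  # fully factorized
marginals = [{(0,): 0.5, (1,): 0.5}, {(0,): 0.5, (1,): 0.5}]
for _ in range(3):
    marginals, coupled = bk_step(marginals, clusters, trans, lik)
```

Passing a single cluster `[(0, 1)]` makes the projection lossless, so the same function then performs exact filtering.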

Backwards pass

A backwards (smoothing) pass for BK was described in [BK98a]. ([KLD01] describes a similar algorithm for general BNs called “up and down mini buckets”.) Again, it can be implemented by a slight modification to the interface algorithm:

1. Construct $J_t$ from $f_{t|t}$ as before.

2. To incorporate the backwards message, we proceed as follows. For each cluster $c$, we find the smallest cliques $D_{t+1}$ in $J_{t+1}$ and $C_t$ in $J_t$ whose domains include $c$; we then marginalize $\phi_{D_{t+1}}$ onto $I^c_t$ and absorb it into $\phi_{C_t}$.

3. We collect and distribute evidence to/from any clique.

Analysis of error in BK

The forwards pass may be viewed more abstractly as shown in Figure 4.5. We start out with a factored prior, $\tilde{\alpha}_{t-1}$, do one step of exact Bayesian updating using the junction tree algorithm (which may couple some of the clusters together), and then “project” the coupled posterior $\hat{\alpha}_t$ down to a factored posterior, $\tilde{\alpha}_t$, by computing marginals. This projection step computes $\tilde{\alpha}_t = \arg\min_{q \in S} D(\hat{\alpha}_t \| q)$, where $S$ is the set of factored distributions, and $D(p \| q) \stackrel{\mathrm{def}}{=} \sum_x p(x) \log p(x)/q(x)$ is the Kullback-Leibler divergence [CT91]. This is an example of assumed density filtering (ADF) [Min01].
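The claim that computing marginals minimizes the KL divergence over factored distributions can be checked numerically on a small case. The sketch below uses a hypothetical coupled posterior over two binary nodes and compares the product of its marginals against a brute-force grid of fully factorized candidates; the names `kl`, `factored`, and `coupled` are illustrative only.

```python
import math

def kl(p, q):
    """D(p||q) = sum_x p(x) log(p(x)/q(x)) over a shared set of states."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def factored(a, b):
    """Joint over two binary nodes with independent P(x0=1)=a and P(x1=1)=b."""
    return {(x0, x1): (a if x0 else 1 - a) * (b if x1 else 1 - b)
            for x0 in (0, 1) for x1 in (0, 1)}

# Hypothetical coupled posterior over two binary nodes.
coupled = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}

# Projection by computing marginals.
a_star = coupled[(1, 0)] + coupled[(1, 1)]  # P(x0 = 1)
b_star = coupled[(0, 1)] + coupled[(1, 1)]  # P(x1 = 1)
projected = factored(a_star, b_star)

# Brute-force search over a grid of factored distributions.
grid = [i / 10 for i in range(1, 10)]
best = min(((a, b) for a in grid for b in grid),
           key=lambda ab: kl(coupled, factored(*ab)))
```

The grid minimizer coincides with the marginals of `coupled`. The divergence at that point is still positive, which is the projection error incurred at this step.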

As an algorithm, BK is quite straightforward. The main contribution of [BK98b] was the proof that the error remains bounded over time, in the following sense:

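$$E\, KL(\alpha_t \| \tilde{\alpha}_t) \le \frac{\epsilon_t}{\gamma^*}$$

where the expectation is taken over the possible observation sequences, $\gamma^*$ is the mixing rate of the DBN, and $\epsilon_t$ is the additional error incurred by projection, over and above any error inherited from the factored prior:

$$\epsilon_t = KL(\alpha_t \| \tilde{\alpha}_t) - KL(\alpha_t \| \hat{\alpha}_t).$$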

The intuition is that, even though projection introduces an error at every time step, the stochastic nature of the transitions and the informative nature of the observations reduce the error sufficiently to stop it from building up.

The best case for BK is when the observations are perfect (noiseless), or when the transition matrix is maximally stochastic (i.e., $P(X_t|X_{t-1})$ is independent of $X_{t-1}$), since in either case, errors in the prior are irrelevant. Conversely, the worst case for BK is when the observations are absent or uninformative, and the transition matrix is deterministic, since in this case, the error grows over time. (This is consistent with the theorem, since deterministic processes can have infinite mixing time.) In Section 5.2, we will see that the best case for BK is the worst case for particle filtering (when we propose from the prior), and vice versa.
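The two extreme transition models can be compared in a small sketch, assuming no observations. It pushes an exact belief and a perturbed belief through one prediction step and measures the KL divergence between them before and after. All numbers are hypothetical, and the sketch covers only how an existing error propagates, not the fresh error added by each projection.

```python
import math

def predict(belief, trans):
    """One prediction step: new[j] = sum_i belief[i] * trans[i][j]."""
    n = len(belief)
    return [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]

def kl(p, q):
    """Kullback-Leibler divergence between two distributions given as lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

exact = [0.7, 0.2, 0.1]
approx = [0.5, 0.3, 0.2]  # hypothetical belief carrying earlier projection error

# Maximally stochastic: every row is the same, so the next state ignores the current one.
independent = [[0.2, 0.5, 0.3]] * 3
# Deterministic: a cyclic permutation of the three states.
deterministic = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]

kl_before = kl(exact, approx)
kl_indep = kl(predict(exact, independent), predict(approx, independent))
kl_det = kl(predict(exact, deterministic), predict(approx, deterministic))
```

Under the independent transition the two predictions coincide, so the inherited error vanishes. Under the permutation the divergence is carried forward unchanged, so nothing offsets the error that each new projection adds.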