
We proved above that the interface algorithm always takes at least $\Omega(K^{I+1})$ time, where $I$ is the size of the forwards interface. But this is a lower bound on the algorithm, not on the problem itself. To get a lower bound on the problem, we need to distinguish offline and online inference.

3.5.1 Offline inference

The simplest way to do exact inference in a DBN is to “unroll” the DBN for $T$ slices and then apply any inference algorithm to the resulting static Bayes net. As discussed in Appendix B, the cost of this is determined by the tree width:

Theorem. Consider a 2TBN $G$ with $D$ nodes per slice, each of which has a maximum of $K$ possible discrete values. Let $G_T = U_T(G)$ be this DBN unrolled for $T$ slices. Then the complexity of any offline inference algorithm is at least $\Omega(K^w)$, where $w$ is the treewidth, i.e., the max clique size of triangulate(moralize($G_T$)) using an optimal elimination ordering.

In general, $w \ge I$, the size of the forwards interface, but this is not always the case. For example, in Figure 3.8, $I = 4$ but $w = 2$, since the graph is a tree.
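The unroll, moralize and eliminate pipeline behind the theorem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the arc-list encoding of a 2TBN, the function names, and the small coupled two-chain model are all hypothetical and are not taken from BNT or from any figure in this chapter. The function reports the largest clique created by a given elimination ordering, which is the quantity the text calls $w$ when the ordering is optimal.

```python
from itertools import combinations

def unroll(intra, inter, D, T):
    """Return {child: set(parents)} for a 2TBN unrolled for T slices.

    Nodes are (slice, index) pairs. `intra` and `inter` are lists of
    (parent_index, child_index) arcs within a slice and between slices.
    """
    parents = {(t, i): set() for t in range(T) for i in range(D)}
    for t in range(T):
        for p, c in intra:
            parents[(t, c)].add((t, p))
        if t > 0:
            for p, c in inter:
                parents[(t, c)].add((t - 1, p))
    return parents

def moralize(parents):
    """Drop arc directions and marry all parents of each node."""
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    return adj

def max_clique_size(adj, order):
    """Eliminate nodes in `order`; return the size of the largest clique created."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    biggest = 0
    for v in order:
        nbrs = adj.pop(v)
        biggest = max(biggest, len(nbrs) + 1)
        for a, b in combinations(nbrs, 2):  # add fill-in edges
            adj[a].add(b)
            adj[b].add(a)
        for n in nbrs:
            adj[n].discard(v)
    return biggest

# Hypothetical model: two chains, where chain 0 also feeds chain 1.
coupled = moralize(unroll([], [(0, 0), (1, 1), (0, 1)], D=2, T=3))
print(max_clique_size(coupled, sorted(coupled)))
print(max_clique_size(coupled, [(0, 1), (0, 0), (1, 1), (1, 0), (2, 0), (2, 1)]))
```

On this toy model the two orderings give largest cliques of different sizes, which illustrates why the bound is stated for an optimal elimination ordering.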

3.5.2 Constrained elimination orderings

When doing inference with sequences that can have variable lengths, it becomes too expensive to repeatedly unroll the DBN and convert it to a jtree. The approach taken in [Zwe98] is to unroll the DBN once, to some maximum length $T_{max}$, construct a corresponding jtree, and then to “splice out” redundant cliques from the jtree when doing inference on a shorter sequence.

To ensure there is some repetitive structure to the junction tree which can be spliced out, we must use a constrained elimination ordering, in which we eliminate all nodes from slice $t$ before any from slice $t+1$. The temporal constraint ensures that we create “vertical” cliques, which only contain nodes from neighboring time-slices, instead of “horizontal” cliques, which can span many time-slices. The resulting jtree is said to be constrainedly triangulated.
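A minimal sketch of a constrained greedy ordering is given below, assuming the moralized unrolled graph is stored as a dictionary from (slice, index) nodes to neighbor sets. The function names and the tie-breaking rule are assumptions made for illustration; this is not the BNT implementation. The min-fill score is the usual greedy one (the number of edges that eliminating a node would add), applied only to the nodes of the current slice.

```python
from itertools import combinations

def fill_in_count(adj, v):
    """Number of edges that eliminating v would add among its neighbors."""
    return sum(1 for a, b in combinations(adj[v], 2) if b not in adj[a])

def constrained_min_fill(adj):
    """Greedy min-fill ordering that finishes slice t before touching slice t+1.

    `adj` maps each node (slice, index) to the set of its neighbors in the
    moralized, unrolled graph. Returns the elimination order as a list.
    """
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    order = []
    for t in sorted({v[0] for v in adj}):
        remaining = {v for v in adj if v[0] == t}
        while remaining:
            # Ties are broken by node label to keep the result deterministic.
            v = min(remaining, key=lambda u: (fill_in_count(adj, u), u))
            nbrs = adj.pop(v)
            for a, b in combinations(nbrs, 2):
                adj[a].add(b)
                adj[b].add(a)
            for n in nbrs:
                adj[n].discard(v)
            remaining.remove(v)
            order.append(v)
    return order

# Hypothetical moral graph: two coupled chains unrolled for 3 slices.
moral = {
    (0, 0): {(0, 1), (1, 0), (1, 1)},
    (0, 1): {(0, 0), (1, 1)},
    (1, 0): {(0, 0), (1, 1), (2, 0), (2, 1)},
    (1, 1): {(0, 0), (0, 1), (1, 0), (2, 1)},
    (2, 0): {(1, 0)},
    (2, 1): {(1, 0), (1, 1)},
}
order = constrained_min_fill(moral)
print(order)
```

Because the outer loop walks the slices in temporal order, every clique formed during elimination can only involve the current slice and nodes already linked to it, which is what produces the “vertical” cliques described above.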

For example, consider the DBN in Figure 3.3. Using the constrained min-fill heuristic, as implemented in BNT, resulted in the following elimination ordering: 7,1,2,4,5,6,7,3,8. The corresponding jtree, for 4 slices, is shown in Figure 3.9. The cliques themselves are shown in Figure 3.11. Notice how the jtree essentially has a head, a repeating body, and then a tail; the head and tail are due to the boundaries on the left and right. This is shown schematically in Figure 3.10. Zweig [Zwe98] shows how to identify the repeating structure (by looking for “isomorphic” cliques); this can then be “spliced out”, to create a jtree suitable for offline inference on shorter sequences. By contrast, Figure 3.12 shows the cliques which result from an unconstrained elimination ordering, for which there is no repeating pattern.
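One simple way to look for repeating cliques can be sketched as follows. It assumes that the DBN nodes are numbered consecutively slice by slice, and that two cliques count as “isomorphic” when one is the other shifted by exactly one slice; this reading is an assumption made for illustration and may be narrower than the definition used in [Zwe98]. The clique list in the example is hypothetical.

```python
def shifted_copies(cliques, nodes_per_slice):
    """Return pairs (i, j) where clique j equals clique i shifted by one slice."""
    index = {frozenset(c): i for i, c in enumerate(cliques)}
    pairs = []
    for i, c in enumerate(cliques):
        shifted = frozenset(n + nodes_per_slice for n in c)
        j = index.get(shifted)
        if j is not None:
            pairs.append((i, j))
    return pairs

# Hypothetical cliques over a DBN with 3 nodes per slice.
cliques = [{1, 2}, {2, 3, 4}, {5, 6, 7}, {8, 9, 10}]
print(shifted_copies(cliques, nodes_per_slice=3))
```

Chains of such pairs mark the repeating body of the jtree, which is the part that can be spliced out for shorter sequences.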

Figure 3.9: The jtree for 4 slices of the Mildew DBN (Figure 3.3) created using a constrained elimination ordering. Notice how the backward interfaces (cliques 9, 17 and 25) are separators. Node 28 in slice 4 is not connected to the rest of the DBN. The corresponding clique (26) is arbitrarily connected to clique 1 (nodes 3,5,7,8 in slice 1) to make the jtree a tree instead of a forest.

Figure 3.10: A schematic illustration of a generic jtree for a DBN. The diamonds represent the head and tail cliques, the boxes represent the ’body’ cliques, that get repeated. The dotted arc illustrates how some slices can be ’spliced out’ to create a shorter jtree. Based on Figure 3.10 of [Zwe98].

3.5.3 Consequences of using constrained elimination orderings

Using a constrained elimination ordering for offline inference is suboptimal. For example, Figure 3.8 shows a DBN where the treewidth is 2 (since the graph is essentially a tree), but a constrained elimination ordering creates cliques of size $D = 4$ for slices $t \ge 4$ (see footnote 8). This is a consequence of the following theorem.

Footnote 8: You can verify this and other facts experimentally using the following BNT script: BNT/examples/dynamic/jtree-clq-test.m.


Figure 3.11: A representation of the cliques in the constrained Mildew jtree (Figure 3.9). Each row represents a clique, each column represents a DBN variable. The repetitive structure in the middle of the jtree is evident. For example, rows 17 and 25 are isomorphic, because their “bit pattern” is the same (101100011), and they are exactly 9 columns apart. (The bit pattern refers to the 0s and 1s, which is a way of representing which DBN nodes belong to each clique set.)


Figure 3.12: This is like Figure 3.11, but using an unconstrained elimination ordering. Notice how some cliques span more than two consecutive time slices, e.g., clique 22 is $\{18, 26, 27, 30\}$.

Theorem (Constrained elimination ordering) [RTL76]. Let $A_1, \ldots, A_n$ be an elimination sequence triangulating the (moral) graph $G$, and let $A_i$, $A_j$ be two non-neighbors in $G$, $i < j$. Then the elimination sequence introduces the fill-in $A_i - A_j$ iff there is a path $A_i - X_1 - \ldots - A_j$ such that all intermediate nodes $X_k$ are eliminated before $A_i$.
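The path criterion in the theorem can be checked numerically on small graphs. The sketch below computes the fill-in edges in two ways, by actually running the elimination and by searching for a path whose interior nodes are all eliminated earlier, and the two sets can then be compared. The function names and the 4-cycle example are hypothetical illustrations.

```python
from itertools import combinations

def fill_ins_by_elimination(adj, order):
    """Fill-in edges produced by eliminating nodes in `order`."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    fills = set()
    for v in order:
        nbrs = adj.pop(v)
        for a, b in combinations(nbrs, 2):
            if b not in adj[a]:
                adj[a].add(b)
                adj[b].add(a)
                fills.add(frozenset((a, b)))
        for n in nbrs:
            adj[n].discard(v)
    return fills

def fill_ins_by_paths(adj, order):
    """Fill-in edges predicted by the path criterion of the theorem."""
    pos = {v: k for k, v in enumerate(order)}
    fills = set()
    for a, b in combinations(order, 2):  # a is eliminated before b
        if b in adj[a]:
            continue  # already neighbors, so not a fill-in
        allowed = {x for x in order if pos[x] < pos[a]}
        seen, stack, reached = {a}, [a], False
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w == b:
                    reached = True
                elif w in allowed and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if reached:
            fills.add(frozenset((a, b)))
    return fills

# Hypothetical example: a 4-cycle 1 - 2 - 3 - 4 - 1.
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
print(fill_ins_by_elimination(cycle, [1, 2, 3, 4]))
print(fill_ins_by_paths(cycle, [1, 2, 3, 4]))
```

In the 4-cycle, eliminating node 1 first joins its two neighbors, and the path 2 - 1 - 4 is exactly the path required by the theorem.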

For example, in Figure 3.8, for any $t \ge 4$, every node in slice $t$ is connected to every other node in slice $t$ via some (undirected) path through the past; hence all the nodes in slice $t$ become connected in one big clique in the triangulated graph.

Another example is the coupled HMM in Figure 3.13. Even though chain 1 is not directly connected to chain 4, they become correlated once we unroll the DBN. Indeed, the unrolled DBN looks rather like a grid-structured Markov Random Field, for which exact inference is known to be intractable. (More precisely, the tree width of an $N = n \times n$ 2D grid with nearest-neighbor (4 or 8) connectivity is $O(n)$ [RS91].)

Figure 3.13: A coupled HMM with 4 chains. Even though chain 1 is not directly connected to chain 4, they become correlated once we unroll the DBN, as indicated by the dotted line.

In my experience, the constraint nearly always helps the min-fill heuristic find much better elimination orders; similar findings are reported in [Dar01]. (Recall that finding the optimal elimination order is NP-hard.) However, Bilmes reports (personal communication) that by using the unconstrained min-fill heuristic with multiple restarts, the best such ordering tends to be one that eliminates nodes which are not temporally far apart, i.e., it is similar to imposing a constraint that we eliminate all nodes within a small temporal window, but without having to specify the width of the window ahead of time (since the optimal window might span several slices). The cost of finding this optimal elimination ordering can be amortized over all inference runs.

3.5.4 Online inference


Figure 3.14: A DBN in which the nodes in the last slice do not become connected when we eliminate the first 3 or more slices. Dotted lines represent moralization arcs.

For an online algorithm to use constant space and time per iteration, there must be some finite time $t$ at which it eliminates all earlier nodes. Hence online inference must use constrained elimination orderings.

The above suggests that perhaps online inference always takes at least $\Omega(K^{I+1})$ time. However, this is false: Figure 3.14 provides a counter-example. In this case, the largest clique in both the constrained and unconstrained jtree is the triple $\{X^3_{t-1}, X^4_{t-1}, X^4_t\}$, because there is no path through the past $X$’s which connects all the nodes in the last slice. So in this case inference is cheaper than $O(K^I)$.

Figure 3.15: The adjacency matrices for the closure of the moralized unrolled graphs for two DBNs. (a) The DBN in Figure 3.8; the bottom right $4 \times 4$ block is all 1s, indicating that the interface is all the nodes. (b) The DBN in Figure 3.14; the bottom right $4 \times 4$ block is sparse, indicating that the interface is $\{1\}$, $\{2\}$, and $\{3, 4\}$.

The interface in Figure 3.14 has size 4, even though it could be represented in factored form without loss of accuracy. We can identify this sort of situation as follows: unroll the DAG for $D$ slices, where $D$ is the number of nodes per slice; moralize the unrolled graph; find the transitive closure of the moral graph; extract the subgraph corresponding to the last slice; finally, find the strongly connected components of the subgraph, and call them $C_1, \ldots, C_C$; these are the factors in the interface. See Figure 3.15 for some examples.
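The procedure for finding the interface factors can be sketched as follows. The arc-list encoding and the example 2TBN are assumptions: the example was chosen only to be consistent with the description of Figure 3.14 in the text (node 4 depends on the previous values of nodes 3 and 4, and the other nodes only on themselves), and it uses 0-based indices. Since two last-slice nodes are adjacent in the transitive closure exactly when a path joins them in the moral graph, the sketch groups the last-slice nodes by reachability rather than building the closure matrix explicitly.

```python
from itertools import combinations

def interface_factors(intra, inter, D):
    """Group the nodes of the last slice into interface factors.

    `intra` and `inter` list (parent_index, child_index) arcs within a
    slice and between consecutive slices; D is the number of nodes per slice.
    """
    # Unroll the 2TBN for D slices.
    parents = {(t, i): set() for t in range(D) for i in range(D)}
    for t in range(D):
        for p, c in intra:
            parents[(t, c)].add((t, p))
        if t > 0:
            for p, c in inter:
                parents[(t, c)].add((t - 1, p))
    # Moralize: drop directions and marry co-parents.
    adj = {v: set() for v in parents}
    for child, ps in parents.items():
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(ps, 2):
            adj[p].add(q)
            adj[q].add(p)
    # Closure, last-slice subgraph and components, done via reachability.
    factors, assigned = [], set()
    for i in range(D):
        if i in assigned:
            continue
        start = (D - 1, i)
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        factor = sorted(j for (t, j) in seen if t == D - 1)
        factors.append(factor)
        assigned.update(factor)
    return factors

# Hypothetical 2TBN consistent with the description of Figure 3.14.
print(interface_factors([], [(0, 0), (1, 1), (2, 2), (3, 3), (2, 3)], D=4))
```

With 0-based indices, the factors found for this example correspond to $\{1\}$, $\{2\}$ and $\{3, 4\}$ in the numbering used by the text.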

We will denote this sequence of operations as

$$(C_1, \ldots, C_C) = \mathrm{cc}(\mathrm{cl}(m(U_D(G))) \cap V_D)$$

where $\mathrm{cc}(G)$ returns the connected components of $G$, $G \cap V_D$ means the subgraph of $G$ containing the nodes in slice $D$, $\mathrm{cl}(G)$ means the transitive closure of $G$, $m(G)$ means the moral graph of $G$, and $U_D(G)$ means unroll the 2TBN $G$ for $D$ slices. We summarize our results as follows.

Conjecture. Consider a 2TBN $G$ with $D$ nodes per slice, each of which has a maximum of $K$ possible discrete values. Let $(C_1, \ldots, C_C) = \mathrm{cc}(\mathrm{cl}(m(U_D(G))) \cap V_D)$ be the connected components, as described above. Let $m = \max_{i=1}^{C} |C_i|$ be the size of the largest interface factor, and let $F_{in}$ be the maximal number of parents of any node within the same slice. Then the complexity of any online inference algorithm is at least $\Omega(K^{m + F_{in} + 1})$.

We can achieve this lower bound using what I call the “factored interface” algorithm: this is a simple modification to the interface algorithm, which maintains the distributions over the $C_i$’s in factored form.
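As a rough check of the conjectured bound, take the DBN of Figure 3.14, whose interface factors are $\{1\}$, $\{2\}$ and $\{3, 4\}$, so that $m = 2$. Assuming this DBN has no arcs within a slice, so that $F_{in} = 0$ (an assumption, since the figure is not reproduced here), the bound evaluates to:

```latex
\Omega\!\left(K^{m + F_{in} + 1}\right)
  = \Omega\!\left(K^{2 + 0 + 1}\right)
  = \Omega\!\left(K^{3}\right),
\qquad \text{compared with } K^{I+1} = K^{5} \text{ for } I = 4 .
```

Under that assumption, the exponent agrees with the earlier observation that the largest clique in this example is a triple.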

3.5.5 Conditionally tractable substructure

The pessimistic result in the previous section is purely graph-theoretic, and hence is a worst-case (distribution-free) result. It sometimes happens that the CPDs encode conditional independencies that are not evident in the graph structure. This can lead to significant speedups. For example, consider Figure 3.16. This is a schematic for two “processes” (here represented by single nodes), $B$ and $C$, both of which only interact with each other via an “interface” layer, $R$.

Figure 3.16: Two “processes”, here represented by single nodes, $B$ and $C$, only interact with each other via an “interface” layer, $R$.

The forward interface is $\{R, B, C\}$; since $B$ and $C$ represent whole subprocesses, this might be quite large. Now suppose we remove the dotted arcs, so $R$ becomes a “root”, which influences $B$ and $C$. Again, the forward interface is $\{R, B, C\}$. Finally, suppose that $R$ is in fact a static node, i.e., $P(R_t \mid R_{t-1}) = \delta(R_t, R_{t-1})$. (For example, $R$ might be a fixed parameter.) In this case, the model can be simplified as shown in Figure 3.17. This model enjoys the property that, conditioned on $R$, the forward interface factorizes [TDW02]:

$$P(R, B_t, C_t \mid y_{1:t}) = P(R \mid y_{1:t})\, P(B_t \mid R, y^B_{1:t})\, P(C_t \mid R, y^C_{1:t})$$

where $y_t = (y^B_t, y^C_t)$. This follows since $B_t \perp y^C_{1:t} \mid R$ and $C_t \perp y^B_{1:t} \mid R$, i.e., evidence which is local to a process does not influence other processes: the $R$ node acts like a barrier. Hence we can recursively update each subprocess separately:

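$$P(B_t \mid R, y^B_{1:t}) = P(B_t \mid R, y^B_{1:t-1}, y^B_t) \propto P(y^B_t \mid B_t) \sum_b P(B_t \mid B_{t-1} = b, R)\, P(B_{t-1} = b \mid R, y^B_{1:t-1})$$

The distribution over the root variable can be computed at any time using

$$P(R \mid y_{1:t}) \propto \sum_b P(R)\, P(B_t = b \mid R, y_{1:t})$$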
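The recursive update of a single subprocess, for one fixed value of $R$, can be sketched in plain Python as follows. The list-based representation, the function name and the numbers in the example are assumptions made for illustration; in a full filter the same step would be repeated for every value of $R$ and, separately, for process $C$.

```python
def update_process(belief, trans, lik):
    """One filtering step for process B, conditioned on a fixed value of R.

    belief[b]    = P(B_{t-1} = b | R, y^B_{1:t-1})
    trans[b][b2] = P(B_t = b2 | B_{t-1} = b, R)
    lik[b2]      = P(y^B_t | B_t = b2)
    Returns the normalized P(B_t | R, y^B_{1:t}) as a list.
    """
    n = len(belief)
    unnorm = [
        lik[b2] * sum(trans[b][b2] * belief[b] for b in range(n))
        for b2 in range(n)
    ]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Hypothetical two-state process with a sticky transition model.
prior = [0.5, 0.5]
trans = [[0.9, 0.1], [0.2, 0.8]]
lik = [0.7, 0.1]
posterior = update_process(prior, trans, lik)
print(posterior)
```

Only the local evidence $y^B_t$ enters the update, which reflects the barrier role of $R$ described above.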

The reason we cannot apply the same factoring trick to the model in Figure 3.16 is that $P(B_t \mid R_t, y_{1:t}) \neq P(B_t \mid R_t, y^B_{1:t})$, since there is a path (e.g., via $R_{t-1}$) connecting $Y^C_t$ to $B_t$. If we could condition on the whole chain $R_{1:t}$ instead of just on $R_t$, we could factor the problem. Of course, we cannot condition on all possible values of $R_{1:t}$, since there are $K^t$ of them; however, we can sample representative instantiations. Given these samples, we can update the processes exactly. See Section 5.3.3 for an application of this idea to the SLAM problem.

We discuss how to exploit other kinds of “parametric” (non-graphical) conditional independence for exact inference in Section B.6.2. In general, the nodes in the interface will not be conditionally independent, even if we exploit parametric properties in the CPDs. However, they may be only weakly correlated. This is the basis of the approximation algorithms we discuss in the next two chapters.

3.6 Continuous state spaces