The frontier algorithm uses all the hidden nodes in a slice to d-separate the past from the future. This set is larger than it needs to be, and hence the algorithm is sub-optimal (see Table 3.1). I claim that the set of nodes with outgoing arcs to the next time-slice is sufficient to d-separate the past from the future; following [Dar01], I will call this set the “forward interface”. For example, the forward interface for Figure 3.4 is $\{X^1_t, X^4_t\}$. I now state and prove this result formally.
Definition. Let the set of temporal arcs between slices $t-1$ and $t$ be denoted by $E^{tmp}(t) = \{(u, v) \in E \mid u \in V_{t-1}, v \in V_t\}$, where $V_t$ are the nodes in slice $t$. The forward interface is defined as
$$I_t \stackrel{\text{def}}{=} \{u \in V_t \mid (u, v) \in E^{tmp}(t+1),\ v \in V_{t+1}\},$$
i.e., the nodes which have children in the next slice. The set of non-interface nodes is $N_t = V_t \setminus I_t$.
Lemma. $\{V_{1:t-1}, N_t\} \perp V_{t+1:T} \mid I_t$, i.e., the forward interface d-separates the past from the future, where the past is $N_t$ and earlier nodes, and the future is $V_{t+1}$ and later nodes.
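As a minimal sketch, the forward interface and its complement can be read directly off the temporal arcs of a 2TBN. The arc-list encoding and the function names below are hypothetical. The arcs encode the Uffe DBN of Figure 3.4 as it can be read off the slice-1 elimination example later in this section (persistence arcs on nodes 1 and 4). This is an assumption, not a transcription of the figure.

```python
# Hypothetical encoding of a 2TBN over node indices 1..n:
#   temporal[k] = (u, v): arc from node u in slice t to node v in slice t+1

def forward_interface(temporal):
    """I_t: nodes of slice t with at least one child in slice t+1."""
    return {u for (u, _) in temporal}

def non_interface(n, temporal):
    """N_t = V_t minus I_t."""
    return set(range(1, n + 1)) - forward_interface(temporal)

# Uffe DBN (Figure 3.4), temporal arcs assumed from the elimination example.
UFFE_TEMPORAL = [(1, 1), (4, 4)]

print(sorted(forward_interface(UFFE_TEMPORAL)))   # [1, 4]
print(sorted(non_interface(4, UFFE_TEMPORAL)))    # [2, 3]
```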
[Figure 3.3 diagram: two slices, each with nodes Fungi, Mildew, LAI, Precip, Micro, Temp, Photo, Solar, Dry, numbered 1 to 9.]
Figure 3.3: The Mildew DBN, designed for forecasting the gross yield of wheat based on climatic data, observations of leaf area index (LAI) and extension of mildew, and knowledge of amount of fungicides used and time of usage [Kja95]. Nodes are numbered in topological order, as required by BNT.
DBN       Figure   Slice size   Frontier   Back   Fwd
BAT       4.2      28           18         9      10
Water     4.1      12           8          8      8
Mildew    3.3      9            9          4      5
Cascade   3.5      4            4          4      1
Uffe      3.4      4            4          3      2
Table 3.1: Sizes of various separating sets for different DBNs. The slice size is the total number of nodes per slice. The frontier is the total number of hidden nodes per slice. Back is the size of the backward interface, i.e., the number of nodes with incoming temporal arcs, plus parents of such nodes in the same slice (see text for precise definition). Fwd is the size of the forward interface, i.e., the number of nodes with children in the next slice.
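Proof. Let $I \in I_t$ be an interface node, let $P$ be a neighbor of $I$ in the past, and let $F$ be a neighbor of $I$ in the future ($F$ must be a child of $I$, by definition). If $P$ is a parent, the graph looks like this: $P \to I \to F$. If $P$ is a child, the graph looks like this: $P \leftarrow I \to F$. Either way, we have $P \perp F \mid I$, since $I$ is never at the bottom of a v-structure. Since all paths between any node in the past and any node in the future are blocked by some node in the interface, the result follows.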
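The lemma can be checked mechanically on a small case. The sketch below is an illustration, not part of the interface algorithm. It unrolls an assumed encoding of the Uffe DBN for three slices: intra-slice arcs 1→2, 1→3, 2→3, 3→4, and persistence arcs on nodes 1 and 4. It then tests d-separation with the standard moralized-ancestral-graph criterion: keep the ancestors, moralize, delete the conditioning set, and test reachability. The helper names `unroll` and `d_separated` are hypothetical.

```python
from itertools import combinations

# Assumed Uffe structure (node indices 1..4 in every slice).
INTRA = [(1, 2), (1, 3), (2, 3), (3, 4)]
TEMPORAL = [(1, 1), (4, 4)]          # arcs from slice t-1 to slice t

def unroll(T):
    """Parent sets of the DBN unrolled for T slices; nodes are (i, t)."""
    parents = {(i, t): set() for t in range(1, T + 1) for i in range(1, 5)}
    for t in range(1, T + 1):
        for u, v in INTRA:
            parents[(v, t)].add((u, t))
        if t > 1:
            for u, v in TEMPORAL:
                parents[(v, t)].add((u, t - 1))
    return parents

def d_separated(parents, X, Y, Z):
    """Test X ⊥ Y | Z with the moralized ancestral graph criterion."""
    # 1. Keep only the ancestors of X ∪ Y ∪ Z (including those nodes).
    anc, stack = set(), list(X | Y | Z)
    while stack:
        node = stack.pop()
        if node not in anc:
            anc.add(node)
            stack.extend(parents[node])
    # 2. Moralize: drop arc directions and marry co-parents.
    adj = {node: set() for node in anc}
    for child in anc:
        for p in parents[child]:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(parents[child], 2):
            adj[a].add(b)
            adj[b].add(a)
    # 3. Delete Z and check whether any node of X can still reach Y.
    seen, stack = set(), [x for x in X if x not in Z]
    while stack:
        node = stack.pop()
        if node in seen or node in Z:
            continue
        seen.add(node)
        stack.extend(adj[node])
    return not (seen & Y)

parents = unroll(3)
past = {(i, 1) for i in range(1, 5)} | {(2, 2), (3, 2)}   # V_1 and N_2
future = {(i, 3) for i in range(1, 5)}                     # V_3
print(d_separated(parents, past, future, {(1, 2), (4, 2)}))  # True: Z = I_2
print(d_separated(parents, past, future, {(1, 2)}))          # False
```

Conditioning on $I_2 = \{X^1_2, X^4_2\}$ separates $V_1 \cup N_2$ from $V_3$. Dropping $X^4_2$ from the conditioning set reopens the path $X^4_1 \to X^4_2 \to X^4_3$.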
Kjaerulff [Kja95] defines a related quantity, which he calls the interface, but which I will call the backward interface, to avoid confusion with the forward interface. He defines the backward interface to be all nodes $v$ such that $v$, or one of its children, has a parent in slice $t-1$, i.e.,
$$\mathrm{int}(t) = \{v \in V_t \mid (u, v) \in E^{tmp}(t) \text{ or } \exists w \in \mathrm{ch}(v) : (u, w) \in E^{tmp}(t),\ u \in V_{t-1}\}.$$
The reason for this definition is the following: when we eliminate a node from slice $t-1$, we create a term which involves all nodes which depend on it, including those in slice $t$; some of the slice-$t$ terms will involve parents in slice $t$ as well. For example, consider eliminating the nodes from slice 1 of Figure 3.4. We have
$$\underbrace{\sum_{X^3_1} P(X^4_1 \mid X^3_1) \underbrace{\sum_{X^2_1} \sum_{X^1_1} P(X^1_1)\, P(X^2_1 \mid X^1_1)\, P(X^3_1 \mid X^1_1, X^2_1)\, P(X^1_2 \mid X^1_1)}_{\lambda(X^3_1,\, X^1_2)}}_{\lambda(X^1_2,\, X^4_1)}$$
and
$$\underbrace{\sum_{X^4_1} P(X^4_2 \mid X^3_2, X^4_1)\, \lambda(X^1_2, X^4_1)}_{\lambda(X^1_2,\, X^4_2,\, X^3_2)}$$
Clearly we have coupled all the nodes in the backward interface, since the backward interface for Figure 3.4 is $\{X^1_t, X^3_t, X^4_t\}$.

Figure 3.4: The “Uffe” DBN, from Figure 3 of [Kja95]. Nodes are numbered topologically, as required by BNT. The dotted undirected arcs are moralization arcs. The backward interface is shaded.
Figure 3.5: The “cascade” DBN, from Figure 2 of [Dar01]. This graph has a treewidth of 2.
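Kjaerulff's definition can be computed in the same spirit. Assume the Uffe structure read off the elimination example above: intra-slice arcs 1→2, 1→3, 2→3, 3→4, and persistence arcs on nodes 1 and 4. Under that assumption, the hypothetical helper below returns a backward interface of size 3 and a forward interface of size 2. This agrees with the Uffe row of Table 3.1.

```python
def forward_interface(temporal):
    """Nodes of slice t with a child in slice t+1."""
    return {u for (u, _) in temporal}

def backward_interface(intra, temporal):
    """int(t): nodes of slice t that have, or whose child has, a parent in slice t-1."""
    has_temporal_parent = {v for (_, v) in temporal}
    # v is also included if one of its intra-slice children w has a temporal parent.
    parents_of_those = {v for (v, w) in intra if w in has_temporal_parent}
    return has_temporal_parent | parents_of_those

UFFE_INTRA = [(1, 2), (1, 3), (2, 3), (3, 4)]   # assumed structure
UFFE_TEMPORAL = [(1, 1), (4, 4)]

print(sorted(backward_interface(UFFE_INTRA, UFFE_TEMPORAL)))  # [1, 3, 4]
print(sorted(forward_interface(UFFE_TEMPORAL)))               # [1, 4]
```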
The forward interface can sometimes be dramatically smaller than the backward interface (see Table 3.1). For an extreme example, consider Figure 3.5: the forward interface is $\{1\}$, but the backward interface is all the nodes in the slice. It is easy to see that the forward interface is never larger than the backward interface if all temporal arcs are persistence arcs, i.e., edges of the form $X^i_{t-1} \to X^i_t$. In that case, every node with a child in the next slice also has a parent in the previous slice, so $I_t \subseteq \mathrm{int}(t)$.
The other problem with the backward interface is that using it does not lead to a simple online inference algorithm; the one in [Kja95] involves a complicated procedure for dynamically modifying jtrees. Below I present a much simpler algorithm, which always uses the same jtree structure, constructed from a modified two-slice temporal Bayes net (2TBN) using an unconstrained elimination ordering, but with the restriction that the nodes in the forward interface must belong to one clique.
Figure 3.6: A schematic illustration of how to join the junction trees for each 1½-slice DBN. $I_t$ are the interface nodes for slice $t$, and $N_t$ are the non-interface nodes. $D_t$ is the clique in $J_t$ containing $I_{t-1}$. $C_t$ is the clique in $J_t$ containing $I_t$. The square box represents a separator, whose domain is $I_t$.
Figure 3.7: The DAGs for an HMM with mixture-of-Gaussians output, glued together by their interface nodes, $X_t$. The non-interface nodes are $N_t = (S_t, Y_t)$.
3.4.1 Constructing the junction tree
Let $G_t$ be the DAG created from slices $t-1$ and $t$ of the unrolled DBN. A “1½-slice DBN” $H_t$ (H for half) is the DAG created by removing all non-interface nodes (and their arcs) from the first slice of $G_t$, i.e., $H_t = I_{t-1} \cup V_t$. (For $t = 1$, $H_1 = V_1$.)
We now construct a jtree, $J_t$, for each $H_t$. (See Section B.3 for details on how to construct a jtree.) We must enforce the constraint that $I_{t-1}$ and $I_t$ each form a clique, since we need $P(I_{t-1})$ and $P(I_t)$. This can be ensured by simply adding edges to the moral graph between all nodes in $I_{t-1}$, and similarly for $I_t$, before constructing $J_t$.
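A minimal sketch of this construction follows. It assumes the 2TBN is given as lists of intra-slice and temporal arcs over node indices 1..n, which is a hypothetical encoding and not BNT's data structures. The function keeps only the interface nodes of slice $t-1$, moralizes $H_t$, and adds the edges that make $I_{t-1}$ and $I_t$ complete. Triangulation and the jtree itself (Section B.3) are not shown. The example structure is the Uffe DBN as assumed from the elimination example earlier in the section.

```python
from itertools import combinations

def half_slice_moral_graph(intra, temporal, n):
    """Moral graph of H_t = I_{t-1} plus V_t, with I_{t-1} and I_t made complete.

    Nodes are (i, "prev") for slice t-1 and (i, "cur") for slice t.
    """
    iface = {u for (u, _) in temporal}                      # forward interface
    nodes = [(i, "prev") for i in sorted(iface)]            # I_{t-1} only
    nodes += [(i, "cur") for i in range(1, n + 1)]          # all of V_t
    parents = {node: set() for node in nodes}
    for u, v in intra:
        parents[(v, "cur")].add((u, "cur"))
        if u in iface and v in iface:                       # arcs kept inside I_{t-1}
            parents[(v, "prev")].add((u, "prev"))
    for u, v in temporal:
        parents[(v, "cur")].add((u, "prev"))
    edges = set()
    for child, ps in parents.items():
        edges |= {frozenset((p, child)) for p in ps}                 # drop arc directions
        edges |= {frozenset(pair) for pair in combinations(ps, 2)}   # marry co-parents
    for layer in ("prev", "cur"):                           # complete I_{t-1} and I_t
        members = [(i, layer) for i in sorted(iface)]
        edges |= {frozenset(pair) for pair in combinations(members, 2)}
    return nodes, edges

nodes, edges = half_slice_moral_graph(
    intra=[(1, 2), (1, 3), (2, 3), (3, 4)],   # assumed Uffe structure
    temporal=[(1, 1), (4, 4)],
    n=4,
)
print(len(nodes), len(edges))   # 6 9
```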
We can now “glue” all the junction trees together via their interfaces, as shown in Figures 3.6 and 3.7. We can perform inference in each tree separately, and then pass messages between them via the interface nodes, first forwards and then backwards. We give the details below.
3.4.2 Forwards pass
In the forwards pass, we are given a prior belief state $P(I_{t-1} \mid y_{1:t-1})$, which is passed from $C_{t-1}$ to $D_t$, where $C_{t-1}$ is the clique in $J_{t-1}$ containing $I_{t-1}$ and $D_t$ is the clique in $J_t$ containing $I_{t-1}$. We then call collect-evidence on $J_t$ with $C_t$ as the root, which has the effect of doing one step of Bayesian updating. Finally we marginalize down the distribution over $C_t$ onto $I_t$ to compute $P(I_t \mid y_{1:t})$, and pass this to the next slice. In more detail, the steps are as follows.
1. Construct $J_t$, where all clique and separator potentials are initialized to the identity element (1s in the case of discrete potentials).
2. From $J_{t-1}$, extract the potential on $C_{t-1}$, and marginalize it down to $I_{t-1}$; this represents the prior, $P(I_{t-1} \mid y_{1:t-1})$. Multiply it onto the potential on $D_t$.
3. Multiply the CPDs for each node in slice $t$ onto the appropriate potential in $J_t$, using $y_t$ where necessary.
4. Collect evidence to the root $C_t$.
5. Return all clique and separator potentials in $J_t$. We will denote this operator abstractly as
$$(f_{t|t}, L_t) = \mathrm{Fwd}(f_{t-1|t-1}, y_t),$$
where $f_{t|t}$ contains the clique and separator potentials in $J_t$. As with HMMs, to prevent underflow, we must normalize all the messages (or else work in the log domain). The normalization constant at the root, $C_t$, will be $c_t = P(y_t \mid y_{1:t-1})$, which can be used to compute the log-likelihood, $L_t$.
For the first slice, we skip the step involving the prior (step 2). This will be denoted by
$$(f_{1|1}, L_1) = \mathrm{Fwd}_1(y_1).$$
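Consider a plain discrete HMM (not the mixture-of-Gaussians model of Figure 3.7). Its forward interface is just $X_t$, and $J_t$ can be collapsed to a single clique over $(X_{t-1}, X_t)$. Then $C_t = D_t$, and Fwd reduces to the normalized forward algorithm. The sketch below uses hypothetical names (`fwd1`, `fwd`, `filter_hmm`) and made-up parameters. It treats $L_t$ as $\log c_t$, sums these terms, and checks the total against brute-force enumeration.

```python
import itertools
import numpy as np

def fwd1(prior, B, y1):
    """First slice: no prior message (step 2 is skipped)."""
    alpha = prior * B[:, y1]                 # unnormalized P(X_1, y_1)
    c = alpha.sum()                          # c_1 = P(y_1)
    return alpha / c, np.log(c)

def fwd(alpha_prev, A, B, y):
    """One forward step; alpha_prev = P(X_{t-1} | y_{1:t-1}) is the prior on D_t."""
    # Step 3: multiply in P(X_t | X_{t-1}) and the likelihood P(y_t | X_t).
    clique = alpha_prev[:, None] * A * B[:, y][None, :]   # over (X_{t-1}, X_t)
    # Step 4: collect to the root and marginalize onto the interface X_t.
    alpha = clique.sum(axis=0)
    c = alpha.sum()                          # c_t = P(y_t | y_{1:t-1})
    return alpha / c, np.log(c)

def filter_hmm(prior, A, B, ys):
    alpha, loglik = fwd1(prior, B, ys[0])
    alphas = [alpha]
    for y in ys[1:]:
        alpha, log_c = fwd(alpha, A, B, y)
        alphas.append(alpha)
        loglik += log_c                      # log P(y_{1:t}) = sum of log c_t
    return alphas, loglik

def brute_force_loglik(prior, A, B, ys):
    """Sum P(x_{1:T}, y_{1:T}) over every hidden state sequence."""
    total = 0.0
    for xs in itertools.product(range(len(prior)), repeat=len(ys)):
        p = prior[xs[0]] * B[xs[0], ys[0]]
        for t in range(1, len(ys)):
            p *= A[xs[t - 1], xs[t]] * B[xs[t], ys[t]]
        total += p
    return np.log(total)

A = np.array([[0.9, 0.1], [0.2, 0.8]])   # A[i, j] = P(X_t = j | X_{t-1} = i)
B = np.array([[0.7, 0.3], [0.1, 0.9]])   # B[i, k] = P(Y_t = k | X_t = i)
prior = np.array([0.5, 0.5])
ys = [0, 1, 1, 0]
alphas, loglik = filter_hmm(prior, A, B, ys)
print(np.isclose(loglik, brute_force_loglik(prior, A, B, ys)))   # True
```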
After collecting to $C_t$, not all nodes (cliques) in $J_t$ will have “seen” all the evidence $y_{1:t}$. For example, in Figure 3.7, if we collect to $X_2$, then the distribution on $S_2$ will be $P(S_2 \mid y_2)$ rather than the full filtered distribution, $P(S_2 \mid y_{1:2})$, since $S_2$ will only have received a message from $Y_2$ below, and not from $X_2$ above. Hence we must additionally perform a distribute-evidence operation from $C_t$ to compute the distribution over all nodes in $J_t$; this will be performed in the backwards pass. Thus, even when filtering, we must perform a backwards pass, but only within a single slice.
3.4.3 Backwards pass
In the backwards pass, we distribute evidence from $C_t$, and then pass a message from $D_t$ to $C_{t-1}$. The details are as follows.
1. The input is $f_{t|t}$, which contains the clique and separator potentials over all nodes in $J_t$, and $b_{t+1|T}$, which contains the clique and separator potentials over all nodes in $J_{t+1}$.
2. From $b_{t+1|T}$, extract the potential on $D_{t+1}$, and marginalize it down to $I_t$; this represents $P(I_t \mid y_{1:T})$.
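3. Update the potential on $C_t$ in $J_t$ by absorbing from the potential on $D_{t+1}$ in $J_{t+1}$ as follows:
$$\phi^*_C = \phi_C \times \frac{\sum_{D \setminus C} \phi_D}{\sum_{C \setminus D} \phi_C},$$
where $C = C_t$ and $D = D_{t+1}$.⁶
4. Distribute evidence from the root $C_t$.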
5. Return all clique (and optionally separator) potentials. We will denote this operator by
$$b_{t|T} = \mathrm{Back}(b_{t+1|T}, f_{t|t}).$$
To start the process at the final slice, we just distribute evidence from $C_T$ (there is no need to absorb from $J_{T+1}$, which does not exist):
$$b_{T|T} = \mathrm{Back}_T(f_{T|T}).$$
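The absorption in step 3 can be sketched for dense discrete potentials whose axes are labelled by variable names. The labelling scheme and the names `marginal` and `absorb` are assumptions for illustration. As is usual in junction-tree propagation, 0/0 is taken to be 0. After absorbing, the marginal of $\phi^*_C$ on the separator matches that of $\phi_D$, which is what makes $C_t$ consistent with $D_{t+1}$.

```python
import numpy as np

def marginal(phi, vars_, keep):
    """Sum a potential (axes labelled by vars_) down to the variables in keep."""
    letter = {v: chr(ord("a") + i) for i, v in enumerate(vars_)}
    spec = "".join(letter[v] for v in vars_) + "->" + "".join(letter[v] for v in keep)
    return np.einsum(spec, phi)

def absorb(phi_C, vars_C, phi_D, vars_D):
    """phi*_C = phi_C * (sum of phi_D over D minus C) / (sum of phi_C over C minus D)."""
    sep = [v for v in vars_C if v in vars_D]     # C ∩ D (here I_t), in C's axis order
    new_msg = marginal(phi_D, vars_D, sep)        # message from D_{t+1}
    old_msg = marginal(phi_C, vars_C, sep)        # what C_t currently holds
    ratio = np.divide(new_msg, old_msg, out=np.zeros_like(new_msg), where=old_msg != 0)
    # Re-insert singleton axes for the variables of C that are not in the separator.
    shape = [phi_C.shape[i] if v in sep else 1 for i, v in enumerate(vars_C)]
    return phi_C * ratio.reshape(shape)

rng = np.random.default_rng(0)
phi_C = rng.random((2, 3))          # C_t over (a, s); s plays the role of I_t
phi_D = rng.random((3, 4))          # D_{t+1} over (s, b)
new_C = absorb(phi_C, ["a", "s"], phi_D, ["s", "b"])
print(np.allclose(marginal(new_C, ["a", "s"], ["s"]),
                  marginal(phi_D, ["s", "b"], ["s"])))   # True
```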
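⁶ In the forwards pass, we implicitly computed
$$\phi^*_D = \phi_D \times \frac{\sum_{C \setminus D} \phi_C}{\sum_{D \setminus C} \phi_D},$$
where $C = C_{t-1}$ and $D = D_t$. However, since $D_t$ is initially all 1s, we simplified this (up to a constant factor) to $\phi_D = \sum_{C \setminus D} \phi_C = P(I_{t-1} \mid y_{1:t-1})$. In other words, we simply set the potential on $D_t$ to be $P(I_{t-1} \mid y_{1:t-1})$, with a suitably extended domain.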
Figure 3.8: A DBN in which all the nodes in the last slice become connected when we eliminate the first 3 or more slices, even though the max clique size is 2.
3.4.4 Complexity of the interface algorithm
The complexity of the interface algorithm can be characterized as follows.
Theorem. The complexity of the interface algorithm is between $\Omega(K^{I+1})$ and $O(K^{I+D})$, where $I$ is the size of the forward interface, $D$ is the number of hidden nodes per slice, and $K$ is the maximum number of values each discrete hidden node can take.
Proof.⁷ When we create the 1½-slice jtree, the nodes are $I_{t-1} \cup V_t$. Let $w$ be the size of the maximum clique created by eliminating this set according to some ordering $\pi$. Clearly $I + 1 \le w$, since all the nodes in $I_{t-1}$ have a target in $V_t$, and $w \le I + D$, since each node in $I_{t-1}$ can be connected to at most $D$ nodes in $V_t$. We now show these bounds are tight. The lower bound can be achieved by a Markov chain, where $I = 1$, so $w = 2$ (the cliques correspond to $(X_{t-1}, X_t)$). The upper bound can be achieved by the DBN shown in Figure 3.2.
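The bounds can be illustrated by eliminating the moral graph of a 1½-slice DBN and recording the largest clique $w$, which governs the $K^w$ cost. In the sketch below, the node labels and elimination orderings are hand-picked and hypothetical, and the orderings are not necessarily optimal. The Uffe edge list is the moral graph of $H_t$ with both interfaces completed, under the structure assumed from the elimination example. The Markov chain attains $w = I + 1 = 2$. For the Uffe DBN ($I = 2$, $D = 4$), this ordering gives $w = 4$, inside the range $[3, 6]$.

```python
from itertools import combinations

def max_clique_size(edges, order):
    """Largest clique created when eliminating an undirected graph's nodes in `order`."""
    adj = {v: set() for v in order}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    w = 0
    for v in order:
        nbrs = adj.pop(v)
        w = max(w, len(nbrs) + 1)            # v and its remaining neighbours form a clique
        for a, b in combinations(nbrs, 2):   # add fill-in edges among the neighbours
            adj[a].add(b)
            adj[b].add(a)
        for u in nbrs:                       # v is removed from the graph
            adj[u].discard(v)
    return w

# Markov chain: H_t = {X_{t-1}, X_t}, forward interface size I = 1.
chain_w = max_clique_size([("X_prev", "X")], ["X_prev", "X"])

# Uffe DBN: moral graph of H_t, I_{t-1} = {X1_prev, X4_prev}, I_t = {X1, X4}.
uffe_edges = [
    ("X1_prev", "X1"), ("X1", "X2"), ("X1", "X3"), ("X2", "X3"), ("X3", "X4"),
    ("X4_prev", "X4"), ("X3", "X4_prev"),   # moralization: X3, X4_prev share child X4
    ("X1_prev", "X4_prev"), ("X1", "X4"),   # interface completion
]
uffe_w = max_clique_size(uffe_edges, ["X2", "X1_prev", "X4_prev", "X1", "X3", "X4"])

K = 2  # binary hidden nodes, for illustration
print(chain_w, K ** chain_w)   # 2 4
print(uffe_w, K ** uffe_w)     # 4 16
```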