Recall from Example 1 in Chapter 2.2 that the CHDS example consists of the following four variables:
• X1 = social background: binary variable: ‘low’, ‘high’
• X2 = family economic situation: binary variable: ‘low’, ‘high’
• X3 = number of family life events: variable with three categories: ‘low’, ‘average’, ‘high’
• X4 = hospital admission: binary variable: ‘yes’, ‘no’,
and that this can be represented by a BN on four variables. Based on the conclusions of Fergusson et al. [1986] I originally deduced the BN in Figure 2.1. This stated that the economic situation has no effect on hospital admission once adjusting for the social background and the family life events, and further that this is the only non-trivial conditional independence statement. In this section I will now instead use the BD metric to find the best fitting BN structure given the data set of the CHDS example discussed in Chapter 1.2.1.
To set up the Dirichlet prior distributions on the $\theta_{ij} = p(x_i \mid pa(x_i) = j; \theta)$, I assume a uniform prior on $p(x \mid B_c)$ such that the distribution over all possible configurations is uniform, and hence the hyperparameters $\alpha_{ijk}$ of the $p(\theta_{ij})$ are given by equation 3.11. I further specify an equivalent sample size of $\alpha = 3$, the maximum number of categories taken by a variable in the CHDS problem, as recommended in Neapolitan [2004]. Finally, I assume that structures are a priori equally likely and hence Bayes Factors are used throughout for the comparison of different models. An exhaustive search using the ‘deal’ package in R [Bøttcher and Dethlefsen, 2003] over all possible BNs on the four variables scores each BN according to the logarithm of the marginal likelihood of the structure given the data and finds the MAP model to be the DAG given in Figure 3.1 with associated CPVs given in Table 3.2.
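The prior set-up above can be sketched in a few lines of Python. This is a minimal illustration, not the ‘deal’ implementation: it assumes that equation 3.11 takes the standard form $\alpha_{ijk} = \alpha / (r_i q_i)$ implied by a uniform joint distribution, where $r_i$ is the number of categories of $x_i$ and $q_i$ the number of parent configurations. The names `levels`, `bdeu_alpha` and `log_bd_local` are hypothetical.

```python
from math import lgamma

ESS = 3.0  # equivalent sample size alpha, as specified in the text

# Number of categories r_i of each variable in the CHDS example.
levels = {"X1": 2, "X2": 2, "X3": 3, "X4": 2}


def bdeu_alpha(var, parents, ess=ESS):
    """Return alpha_ijk = ess / (r_i * q_i), assuming a uniform joint prior."""
    q = 1  # number of parent configurations q_i
    for parent in parents:
        q *= levels[parent]
    return ess / (levels[var] * q)


def log_bd_local(counts, alpha_ijk):
    """Log BD contribution of a single variable.

    counts[j][k] is N_ijk, the number of cases in which the parents take
    configuration j and the variable takes its k-th category.
    """
    total = 0.0
    for row in counts:
        alpha_ij = alpha_ijk * len(row)  # sum of alpha_ijk over k
        n_ij = sum(row)                  # sum of N_ijk over k
        total += lgamma(alpha_ij) - lgamma(alpha_ij + n_ij)
        for n_ijk in row:
            total += lgamma(alpha_ijk + n_ijk) - lgamma(alpha_ijk)
    return total


# Hyperparameters for the families X1 (no parents) and X3 with parent X1.
print(bdeu_alpha("X1", []))      # 3 / (2 * 1)
print(bdeu_alpha("X3", ["X1"]))  # 3 / (3 * 2)
```

Summing `log_bd_local` over the four variables of a DAG would give the logarithm of the marginal likelihood used as the BN score, provided the counts are taken from the CHDS data set.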
[Figure 3.1: DAG with edges X1 → X2, X1 → X3 and X3 → X4, where X1 = social background, X2 = economic situation, X3 = life events and X4 = admissions]
Figure 3.1: The Maximum a Posteriori BN of the CHDS example on social background, economic situation, life events and hospital admission. BN score (logarithm of the marginal likelihood) log L(B|N) = -2489.776
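To give a sense of the size of the exhaustive search, the following sketch enumerates every DAG on four labelled nodes. It is an illustration in plain Python and not the search routine of the ‘deal’ package; the helper names `is_acyclic` and `all_dags` are hypothetical, and the nodes are indexed 0 to 3 for X1 to X4.

```python
from itertools import combinations, product


def is_acyclic(edges, n):
    """Check acyclicity by repeatedly removing nodes with no incoming edge."""
    remaining = set(range(n))
    while remaining:
        sources = [
            v for v in remaining
            if not any(b == v and a in remaining for a, b in edges)
        ]
        if not sources:
            return False  # every remaining node has a parent, so there is a cycle
        remaining -= set(sources)
    return True


def all_dags(n):
    """Yield every DAG on n labelled nodes as a tuple of directed edges (a, b)."""
    pairs = list(combinations(range(n), 2))
    # Each unordered pair is either absent (0), oriented a -> b (1) or b -> a (2).
    for choice in product((0, 1, 2), repeat=len(pairs)):
        edges = []
        for (a, b), c in zip(pairs, choice):
            if c == 1:
                edges.append((a, b))
            elif c == 2:
                edges.append((b, a))
        if is_acyclic(edges, n):
            yield tuple(edges)


dags = list(all_dags(4))
print(len(dags))  # 543 labelled DAGs on four nodes

# The MAP structure X1 -> X2, X1 -> X3, X3 -> X4 is one of the candidates.
map_edges = {(0, 1), (0, 2), (2, 3)}
print(any(set(d) == map_edges for d in dags))
```

With only 543 candidate structures, scoring each one is feasible here; the number of DAGs grows super-exponentially with the number of variables, so this approach does not scale to larger problems.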
Similar to the network structure derived from Fergusson et al. [1986] (Figure 2.1), the MAP model suggests that hospital admission is independent of the economic situation given the social background and the number of life events. However, the MAP model exhibits several additional conditional independencies between the variables: it suggests that the economic situation and the family life events are independent given the social background ($X_3 \perp\!\!\!\perp X_2 \mid X_1$) and expresses that a direct dependency occurs only between the life events and the hospital admissions and not between social background and admissions ($X_4 \perp\!\!\!\perp X_1, X_2 \mid X_3$). Table 3.2 shows that the probability of hospital admission varies between 12% and 25.7% depending on the number of life events.

Conditional Probability Vector                                  Values
(P(X1=High), P(X1=Low))                                         (0.569, 0.431)
(P(X2=High|X1=High), P(X2=Low|X1=High))                         (0.468, 0.532)
(P(X2=High|X1=Low), P(X2=Low|X1=Low))                           (0.122, 0.878)
(P(X3=Low|X1=High), P(X3=Average|X1=High), P(X3=High|X1=High))  (0.461, 0.347, 0.192)
(P(X3=Low|X1=Low), P(X3=Average|X1=Low), P(X3=High|X1=Low))     (0.248, 0.311, 0.441)
(P(X4=No admission|X3=Low), P(X4=Admission|X3=Low))             (0.880, 0.120)
(P(X4=No admission|X3=Average), P(X4=Admission|X3=Average))     (0.789, 0.211)
(P(X4=No admission|X3=High), P(X4=Admission|X3=High))           (0.743, 0.257)
Table 3.2: The CPVs associated with the MAP BN from Figure 3.1
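As an illustration of how the CPVs of Table 3.2 combine under the MAP structure, the following sketch computes the admission probability for each social background and the overall admission probability by summing over the life events. The dictionary names are hypothetical; the numbers are copied from Table 3.2.

```python
# CPVs from Table 3.2 (MAP BN: X1 -> X2, X1 -> X3, X3 -> X4).
p_x1 = {"High": 0.569, "Low": 0.431}
p_x3_given_x1 = {
    "High": {"Low": 0.461, "Average": 0.347, "High": 0.192},
    "Low": {"Low": 0.248, "Average": 0.311, "High": 0.441},
}
p_adm_given_x3 = {"Low": 0.120, "Average": 0.211, "High": 0.257}

# P(X4 = admission | X1): since X4 depends on X1 only through X3,
# sum P(X3 = k | X1) * P(X4 = admission | X3 = k) over k.
p_adm_given_x1 = {
    a: sum(p_x3_given_x1[a][k] * p_adm_given_x3[k] for k in p_adm_given_x3)
    for a in p_x1
}

# Marginal P(X4 = admission): average over the social background.
p_adm = sum(p_x1[a] * p_adm_given_x1[a] for a in p_x1)

print({a: round(p, 3) for a, p in p_adm_given_x1.items()})
print(round(p_adm, 3))  # about 0.191
```

Under this model the social background shifts the admission probability only indirectly, through its effect on the distribution of life events.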
Nevertheless, the exhaustive search over all possible structures reveals two further BN structures that score only slightly lower than the MAP model; these are given in Figure 3.2.

[Figure 3.2, panel (a): 2nd BN with edges X1 → X2, X1 → X3 and X1 → X4, log L(B|N) = -2490.073. Panel (b): 3rd BN with the edges of the MAP BN (X1 → X2, X1 → X3, X3 → X4) plus an edge between X2 and X3, log L(B|N) = -2490.751]
Figure 3.2: High scoring BN structures for the CHDS example on social background, economic situation, life events and hospital admission

Network structure (a) swaps the directed edge from family life events to admissions with an edge from the social background to the admissions. Structure (b) introduces an extra edge between the economic situation and the family life events. In comparison to the MAP model, the log Bayes Factors are 0.297 and 0.975 in favour of the MAP model. By Table 3.1, which gives the scale of evidence for Bayes Factors, these differences in scores are negligible and hence, given the data set provided, all three structures are believed to be similarly plausible. As noted in Section 3.1, model selection may be sensitive to the selected equivalent sample size. Nevertheless, in this case, increasing the equivalent sample size leads to the same three highest scoring models, together with the BN derived from Fergusson’s results.
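The comparison of the three structures can be reproduced from the reported scores. The sketch below assumes that the scores are natural-log marginal likelihoods and therefore negative, which is the reading under which the 2nd and 3rd BN score lower than the MAP model. The posterior probabilities are renormalised over these three structures only, under equal prior weights, so they are not posterior probabilities over the full model space. All variable names are hypothetical.

```python
from math import exp

# Reported log marginal likelihoods (assumed natural logarithms, negative sign).
log_scores = {
    "MAP (Figure 3.1)": -2489.776,
    "2nd (Figure 3.2a)": -2490.073,
    "3rd (Figure 3.2b)": -2490.751,
}

best = max(log_scores.values())

# Log Bayes Factor of the MAP model against each structure.
log_bf = {name: best - score for name, score in log_scores.items()}

# Bayes Factors on the original scale.
bf = {name: exp(value) for name, value in log_bf.items()}

# Posterior probabilities restricted to these three structures, equal priors.
weights = {name: exp(score - best) for name, score in log_scores.items()}
total = sum(weights.values())
posterior = {name: w / total for name, w in weights.items()}

for name in log_scores:
    print(name, round(log_bf[name], 3), round(bf[name], 2), round(posterior[name], 2))
```

Subtracting the best score before exponentiating keeps the computation numerically stable, since exponentiating values near -2490 directly would underflow to zero.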
Although all three models suggested in Figures 3.1 and 3.2 have similar scores, the conclusions drawn from these three BN structures differ. While all three structures suggest that the social background affects the economic situation and the life events and that the economic situation does not influence hospital admissions, it is not clear in what way the social background and the life events affect the hospital admissions and whether the life events depend on the economic situation. This suggests that a model that combines features of different competing BNs may be closer to the underlying true model. When searching the CEG space it will always be possible to find the CEG corresponding to the MAP BN structure, as the class of BNs is a subclass of the class of CEGs. However, when considering the BN structures in Figure 3.1, it seems likely that we will be able to find a CEG which combines vertices into stages and positions in an asymmetric way and hence results in a higher model score.