• No se han encontrado resultados

Fase 3 Al finalizar la obra

3.4.2 Plan general de trabajo

An important parameter of local-to-global learning previously unnoticed in algorithms such as SCA and MMHC is the ordering of variables when executing the local causal discovery variable-by- variable (i.e., not in parallel). We will assume that results are shared among local learning runs of GLL-PC, that is when we start learning PC(X) by GLL-PC rather than starting with an empty

TPC(X) set, we start with all variables Y : XPC(Y). This constitutes a sound instantiation of the GLL-PC algorithm template as explained in Aliferis et al. (2010). Figure 11 gives two extreme examples where the right order can “make-or-break” an LGL algorithm.

In Figure 11(a) it is straightforward (and left to the reader to verify) that an order of local learning<X1,X2, . . . ,X100,Y >without symmetry correction (the latter being a reasonable choice as we have seen) requires a quadratic number of conditional independence tests (CITs) for the unoriented graph to be correctly learned. However, the order of local learning<Y,X1,X2, . . . ,X100> requires up to an exponential number of CITs as max-k and sample are allowed to grow without bounds. Even with modest max-k values, the number of CITs is higher-order polynomial and thus intractable. Even when Y is not in the beginning but as long as a non-trivial number of X ’s are after it in the ordering, the algorithm will be intractable or at least very slow. The latter setting occurs in the majority of runs of the algorithm with random orderings.

In Table 12 we provide data from a simulation experiment showing the above in concrete terms and exploring the effects of limited sample and connectivity at the same time. As can be seen, under fixed sample, running HHC with order from larger to smaller connectivity, as long as the sample is enough for the number of parents to be learned (i.e., number of parents is≤20), increases run time by more than 100-fold. However because sample is fixed, as the number of parents grows the number of conditional independence tests equalizes between the two strategies because CITs that have too large conditioning sets for the fixed sample size are not executed. Although the number of CITs is self-limiting under these conditions, quality (in terms of number of missing edges, that is, number of undiscovered parents of T ) drops very fast as the number of parents increases. The random ordering strategy trades off quality for execution time with the wrong (larger-to-smaller connectivity) ordering, however in all instances the right ordering offers better quality and 2 to 100-fold faster execution that random ordering.

!"#$%&'()' *+&%,-.'()'/ %0-&+' %12%. #3..3,2' %12%. 456. %0-&+'%12%. #3..3,2' %12%. 456. %0-&+' %12%. #3..3,2' %12%. 456. !" 7 8 9: 7 8 7;<9= 7 8 <;:7> #" < 8 7:: <?@ >?7 79;78: > @ 7A;@@< $" =7 8 >79 =7 =7?< <=;<AA == 7= A;878 %" =: 8 A8< =9?< 78?= >=;79A =A :: >;979 &" 77 @ =;<7B 7B?B :8 =9;B7B :< <: <;=<A '" 7A @ 7;88= :7?A :>?@ :9;A>8 :B >< :;B97 (" <= =A 7;@@: <>?@ :@?A 7<;<>9 >> 9: <;<9< )" >B 7B :;9>7 9>?< >>?= =7;9:8 @8 @< >;87: *" 99 :> <;9:< @7?: >@?9 =9;@=B B@ B> >;>A7

!"" @@ << >;>A< BB?@ B8 =9;799 A9 A< @;77A

(&1%&')&(#'C(DE-(EF32F' G(,,%G-3H3-I &+,1(#'(&1%& J+H%&+2%'&%."C-.'(H%&'=8'(&1%&.K (&1%&')&(#'F32FE-(EC(D' G(,,%G-3H3-I

Table 12: Results of simulation experiment with HHC algorithm. The graphical structure is de- picted on Figure 11(a). HHC was run on a random sample of size 1,000 with G2test for conditional independence,α=0.05, max-k = 5, Dirichlet weight = 10, BDeu priors.

A more dramatic difference exists for the structure in Figure 11(b) where Y is a parent of all

X ’s. Here the number of tests required to find the parent (Y ) of each Xi is quadratic to the number of variables with the right ordering (low-to-high connectivity) whereas an exponential number is needed with the wrong ordering (large-to-small connectivity). Because the sample requirements are constant to the number of children of Y , quality is affected very little and there is no self- restricting effect of the number of CITs, opposite to what holds for causal structure in Figure 11(a). Hence the number of CITs grows exponentially larger for the large-to-small connectivity ordering versus the opposite ordering and a similar trend is also present for the average random ordering in full concordance with our theoretical expectations. See Table 13 for results of related simulation experiments.

These results show that in some cases, it is possible to transform an intractable local learning

problem into a tractable one by employing a global learning strategy (i.e., by exploiting asymmetries in connectivity). Thus the variable order in local-to-global learning may have promise for substantial

speedup and improved quality in real-life data sets (assuming the order of connectivity is known or can be estimated). However the optimal order is a priori unknown for some domain. Can we use local variable connectivity as a proxy to optimal order in real data? The next experiment assumes the existence of an oracle that gives the true local connectivity for each variable. The experiment examines empirically the effect of three orders (low-to-high connectivity, lexicographical (random) order, and high-to-low connectivity order) on the quality of learning and number of CITs in the MMHC evaluation data sets. It also compares the sensitivity of HHC to order.

As can be seen in Figure 12, the order does have an effect on computational efficiency however not nearly as dramatic in the majority of these more realistic data sets compared to the simpler structures of Figure 11. An exception is the Link data set in which low-to-high connectivity allows HHC to run 17 times faster than lexicographical (random) order and 27 times faster than high-to-low connectivity order. For the majority of cases, running these algorithms with lexicographical (i.e.,

!"#$%&'()' *+,-.&%/'()'0 %12&3' %.4%5 #,55,/4' %.4%5 6785 %12&3'%.4%5 #,55,/4' %.4%5 6785 %12&3' %.4%5 #,55,/4' %.4%5 6785 !" 9 : 9:; 9 : <=>?< 9 : ?=>;; #" 99 : ?@A ABC : 9?9=9?@ A : >CC=??@ $" 9@ : 9=9C> 9;B@ : <=><9=:>: 9C : D=:<:=?:: %" <? : 9=A;@ E E E E E E &" >> : >=9A: E E E E E E '" ?@ : D=:>9 E E E E E E (" D> : ;=@AA E E E E E E )" C9 : @=A>A E E E E E E *" C; : 99=??@ E E E E E E !"" AD : 9?=;CC E E E E E E (&.%&')&(#'-(FE2(E+,4+' *(//%*2,G,2H &3/.(#'(&.%& I3G%&34%'&%5"-25'(G%&'9:'(&.%&5J (&.%&')&(#'+,4+E2(E-(F' *(//%*2,G,2H

Table 13: Results of simulation experiment with HHC algorithm. The graphical structure is de- picted on Figure 11(b). HHC was run on a random sample of size 1,000 with G2test for conditional independence,α=0.05, max-k=5, Dirichlet weight = 10, BDeu priors. Empty cells correspond to experiments when the algorithm did not terminate within 10,000,000 CITs.

random) order is very robust and does not affect quality adversely but affects run time and number of CITs to a small degree (details in Table S21 in the online supplement).

Thus, while connectivity affects which variable order is optimal in LGL algorithms, ranking by local connectivity does not exactly correspond to the optimal order. Figure S3 in the online supplement shows the number of CITs plotted against true local connectivity in each one of the 9 data sets used in this section. Related to the above, Figure S4 in the supplement also shows the distribution of true local connectivity in each data set. Consistent trends indicating the shape of the distributions by which the degree of local connectivity may determine an advantage of orderings low-to-high to high-to-low connectivity are not apparent in these data sets.

We hypothesize that more robust criteria for the effect of variable ordering in LGL algorithms can be devised. For example, the number or total cost of CITs required to locally learn the neigh- borhood of each variable. Such criteria are also more likely to be available or to be approximated well during practical execution of an algorithm than true connectivity. A variant of HHC, algorithm HHC-OO (standing for HHC with optimal order) (Aliferis and Statnikov, 2008) orders variables dynamically according to heuristic approximations to the total number of CITs for each variable. We also conjecture that the strategy for piecing together the local learning results strongly interacts with the local variable ordering to determine the tradeoff between the quality and efficiency of LGL algorithms. Evaluation of these hypotheses is outside the scope of the present paper.