Structure

Structure learning algorithms for BNs can be divided into two categories: constraint-based algorithms and score-based algorithms. Constraint-based algorithms aim to determine the CIs in the dataset and build a structure satisfying these CIs. The tests required for determining CIs may become computationally infeasible as the BN gets larger. Therefore, these algorithms make several simplifying assumptions, such as limiting the maximum number of parents that a variable can have.

A common statistical test for identifying CIs in data is the χ² test. This test calculates the false-rejection probability of a CI hypothesis. The mutual information measure, which is mathematically related to the χ² test, is also used for testing the same hypothesis. A more recent CI test for constraint-based learning was developed by Dash and Druzdzel (2003), and a non-parametric test was proposed by Margaritis (2004).
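As an illustration of such a test, the following minimal sketch computes the Pearson χ² statistic and the closely related G statistic for the hypothesis that X is independent of Y given Z in discrete data. The G statistic equals 2N times the empirical conditional mutual information (in nats), which is one way to see the relation between the two measures. The function name `ci_statistics`, the helper `make_rows` and the toy counts are assumptions made for this example and are not taken from any of the cited works. Converting the statistic to a false-rejection probability requires the χ² distribution with the returned degrees of freedom, which is omitted here.

```python
import math
from collections import Counter

def ci_statistics(rows, x, y, z):
    """Return (chi2, g, dof) for the hypothesis X independent of Y given Z.

    rows: list of dicts mapping variable names to discrete values.
    x, y: variable names; z: tuple of conditioning variable names (may be empty).
    """
    n_xyz = Counter((r[x], r[y], tuple(r[v] for v in z)) for r in rows)
    n_xz = Counter((r[x], tuple(r[v] for v in z)) for r in rows)
    n_yz = Counter((r[y], tuple(r[v] for v in z)) for r in rows)
    n_z = Counter(tuple(r[v] for v in z) for r in rows)
    x_vals = {r[x] for r in rows}
    y_vals = {r[y] for r in rows}

    chi2, g = 0.0, 0.0
    for zv, nz in n_z.items():
        for xv in x_vals:
            for yv in y_vals:
                # Expected count if X and Y were independent within this Z stratum.
                expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / nz
                if expected == 0:
                    continue
                observed = n_xyz[(xv, yv, zv)]
                chi2 += (observed - expected) ** 2 / expected
                if observed > 0:
                    g += 2 * observed * math.log(observed / expected)
    dof = (len(x_vals) - 1) * (len(y_vals) - 1) * len(n_z)
    return chi2, g, dof

def make_rows(counts):
    """Expand {(x, y, z): count} into a list of row dictionaries."""
    return [dict(zip("XYZ", key)) for key, c in counts.items() for _ in range(c)]

# Toy data: X and Y both depend on Z but are independent once Z is known.
counts = {
    (0, 0, 0): 36, (0, 1, 0): 12, (1, 0, 0): 12, (1, 1, 0): 4,
    (0, 0, 1): 4, (0, 1, 1): 12, (1, 0, 1): 12, (1, 1, 1): 36,
}
rows = make_rows(counts)

chi2_marg, g_marg, dof_marg = ci_statistics(rows, "X", "Y", ())
chi2_cond, g_cond, dof_cond = ci_statistics(rows, "X", "Y", ("Z",))
print(chi2_marg, dof_marg)   # marginal test: X and Y look dependent
print(chi2_cond, dof_cond)   # conditional test: no evidence against the CI
```

In this toy dataset the marginal statistic exceeds 3.84, the 5% critical value of the χ² distribution with one degree of freedom, whereas the conditional statistic is zero, so a constraint-based algorithm would retain the CI of X and Y given Z.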

Constraint-based algorithms such as IC (Pearl and Verma, 1991) can learn a part of the causal relations from data. However, the true, complete causal structure is not identifiable from the data. Even if a learning algorithm identifies all of the CIs in the probability distribution, it may not find the true causal structure, as multiple BN structures can represent the same probability distribution (see Section 2.2). Moreover, since data is noisy, we can never be sure about the CIs identified by the learning algorithm. Notable constraint-based structure learning algorithms include IC (Pearl and Verma, 1991), LCD (Cooper, 1997), PC (Spirtes et al., 2001), Grow-shrink (Margaritis, 2003) and TDPA (Cheng et al., 2002).

Score-based algorithms aim to find the BN structure that maximises a likelihood score. Adding edges to a BN improves the fit of the model to the probability distribution in the data, but it can also reduce the quality of parameter estimation because the data are divided among a larger number of parent configurations. Therefore, the scoring functions for these algorithms are often a combination of the goodness of fit and a penalty for additional edges. Commonly used scoring functions include the Bayesian information criterion (Cruz-Ramírez et al., 2006; Schwarz, 1978), minimum description length (Lam and Bacchus, 1994), minimum message length (Wallace et al., 1996; Wallace and Korb, 1999) and the BDe score (Heckerman et al., 1995).
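The trade-off between goodness of fit and complexity can be illustrated with the Bayesian information criterion. The sketch below assumes complete discrete data and scores a structure as the maximised log-likelihood minus (log N)/2 times the number of free parameters. The function name `bic_score`, the representation of a structure as a dictionary from each variable to its parents, and the toy datasets are assumptions made for this example.

```python
import math
from collections import Counter

def bic_score(rows, parents):
    """BIC of a discrete BN structure on complete data.

    rows: list of dicts mapping variable names to discrete values.
    parents: dict mapping each variable to a tuple of its parents.
    """
    n = len(rows)
    score = 0.0
    for child, pa in parents.items():
        joint = Counter((tuple(r[p] for p in pa), r[child]) for r in rows)
        pa_counts = Counter(tuple(r[p] for p in pa) for r in rows)
        # Goodness of fit: log-likelihood at the maximum likelihood estimates.
        loglik = sum(c * math.log(c / pa_counts[cfg]) for (cfg, _), c in joint.items())
        # Complexity: (child states - 1) free parameters per parent configuration.
        n_params = len({r[child] for r in rows}) - 1
        for p in pa:
            n_params *= len({r[p] for r in rows})
        score += loglik - 0.5 * math.log(n) * n_params
    return score

empty = {"A": (), "B": ()}
edge = {"A": (), "B": ("A",)}

# A and B exactly independent: the extra edge buys no fit, only a penalty.
independent = [{"A": a, "B": b} for a in (0, 1) for b in (0, 1) for _ in range(25)]
# B always equals A: the edge improves the fit far more than it costs.
dependent = [{"A": a, "B": a} for a in (0, 1) for _ in range(50)]

print(bic_score(independent, empty) > bic_score(independent, edge))
print(bic_score(dependent, edge) > bic_score(dependent, empty))
```

Because the score is a sum of one term per variable and its parents, a search algorithm only needs to rescore the families changed by an edge operation when the data are complete.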

Based on the selected scoring function, a score-based algorithm searches the space of possible BN structures to find the structure with the maximum score. The search is done by removing, adding or reversing edges between the variables available in data. The algorithms can either search the space of singular BN structures or the space of equivalent structure classes. Notable search algorithms include Cooper and Herskovits (1992), Glover and Laguna (1997), Chickering (2003; 1996), Chickering and Meek (2002), and Castelo and Kocka (2003).
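A greedy version of this search can be sketched as follows. The example is a plain hill-climbing procedure over single BN structures that uses a BIC-style score on complete discrete data; it is not an implementation of any of the algorithms cited above. All function names and the toy data are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import permutations

def family_bic(rows, child, pa):
    """BIC contribution of one variable given its parent set."""
    pa = tuple(sorted(pa))
    joint = Counter((tuple(r[p] for p in pa), r[child]) for r in rows)
    pa_counts = Counter(tuple(r[p] for p in pa) for r in rows)
    loglik = sum(c * math.log(c / pa_counts[cfg]) for (cfg, _), c in joint.items())
    n_params = len({r[child] for r in rows}) - 1
    for p in pa:
        n_params *= len({r[p] for r in rows})
    return loglik - 0.5 * math.log(len(rows)) * n_params

def total_bic(rows, parents):
    return sum(family_bic(rows, v, pa) for v, pa in parents.items())

def is_acyclic(parents):
    """Repeatedly strip variables whose parents have all been stripped."""
    remaining = {v: set(pa) for v, pa in parents.items()}
    while remaining:
        roots = [v for v, pa in remaining.items() if not pa]
        if not roots:
            return False
        for v in roots:
            del remaining[v]
        for pa in remaining.values():
            pa.difference_update(roots)
    return True

def neighbours(parents):
    """Yield every structure one edge addition, removal or reversal away."""
    for a, b in permutations(parents, 2):
        cand = {v: set(pa) for v, pa in parents.items()}
        if a in parents[b]:                    # edge a -> b exists
            cand[b].discard(a)
            yield cand                         # removal
            rev = {v: set(pa) for v, pa in cand.items()}
            rev[a].add(b)
            yield rev                          # reversal
        elif b not in parents[a]:              # no edge in either direction
            cand[b].add(a)
            yield cand                         # addition

def hill_climb(rows, variables):
    """Start from the empty graph and move to the best neighbour until none improves."""
    current = {v: set() for v in variables}
    best = total_bic(rows, current)
    while True:
        scored = [(total_bic(rows, c), c) for c in neighbours(current) if is_acyclic(c)]
        top_score, top = max(scored, key=lambda t: t[0])
        if top_score <= best + 1e-9:           # no strict improvement: local maximum
            return current, best
        current, best = top, top_score

# Toy data generated from the chain A -> B -> C (C is independent of A given B).
counts = {
    (0, 0, 0): 81, (0, 0, 1): 9, (0, 1, 0): 1, (0, 1, 1): 9,
    (1, 1, 1): 81, (1, 1, 0): 9, (1, 0, 1): 1, (1, 0, 0): 9,
}
rows = [dict(zip("ABC", k)) for k, c in counts.items() for _ in range(c)]

dag, score = hill_climb(rows, ["A", "B", "C"])
skeleton = {frozenset((p, v)) for v, pa in dag.items() for p in pa}
print(sorted(sorted(e) for e in skeleton))
```

The search recovers the adjacencies of the chain, but the edge directions it returns may differ from the generating structure, since several structures in the same equivalence class receive the same score. This is one motivation for searching the space of equivalence classes instead.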

Tsamardinos et al. (2006) proposed a combination of score-based and constraint-based methods for structure learning. Their algorithm, called max-min hill-climbing (MMHC), defines a skeleton for the BN structure based on a constraint-based method, and orients the edges in the skeleton by maximising a scoring function.

Structure learning is more complicated when missing values exist in the data. Calculation of the scoring functions becomes more difficult as these functions do not decompose when missing values exist. Daly et al. (2011) and Koller and Friedman (2009b) provide a thorough review of structure learning methods for complete and incomplete data.

Parameters

A popular approach for parameter learning is to find the parameters that maximise the likelihood of the data given the model. For discrete variables, the maximum likelihood estimates can be found by calculating the related conditional relative frequencies in the data. Replacing zero probabilities with small values can increase the performance of the model in other datasets. Parameters can also be estimated by a Bayesian approach, which uses a prior distribution, representing the background knowledge, for the parameters and updates the prior based on data. The Bayesian approach can provide better results, especially for small datasets, as it incorporates expert knowledge into parameter learning.
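Both estimators can be illustrated for a single conditional probability table. In the sketch below, a pseudo-count of zero gives the maximum likelihood estimate, and a positive pseudo-count gives the posterior mean under a symmetric Dirichlet prior, which is one simple form of the Bayesian approach and also removes zero probabilities. The function name `estimate_cpt`, the `pseudo_count` parameter and the toy data are assumptions made for this example.

```python
from collections import Counter

def estimate_cpt(rows, child, parents, states, pseudo_count=0.0):
    """Estimate P(child | parents) from complete discrete data.

    pseudo_count = 0 returns relative frequencies (maximum likelihood);
    pseudo_count > 0 adds that many imaginary observations to every cell.
    Only parent configurations that occur in the data are returned.
    """
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    pa_counts = Counter(tuple(r[p] for p in parents) for r in rows)
    cpt = {}
    for cfg, n_cfg in pa_counts.items():
        denom = n_cfg + pseudo_count * len(states)
        cpt[cfg] = {s: (joint[(cfg, s)] + pseudo_count) / denom for s in states}
    return cpt

# Toy data: B = 0 is never observed together with A = 1.
rows = [{"A": 0, "B": 0}] * 3 + [{"A": 0, "B": 1}] + [{"A": 1, "B": 1}] * 2

mle = estimate_cpt(rows, "B", ("A",), states=(0, 1))
smoothed = estimate_cpt(rows, "B", ("A",), states=(0, 1), pseudo_count=1.0)

print(mle[(1,)][0])        # 0.0: the unseen combination gets zero probability
print(smoothed[(1,)][0])   # 0.25: (0 + 1) / (2 + 2)
```

With only two observations for A = 1, the prior has a visible effect on the estimate; as the number of observations grows, the two estimates converge.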

Parameter learning becomes more difficult when data contains missing values. A simple way to deal with missing values is to complete the data by assigning values to them. The values can be assigned randomly, sampled from a distribution or estimated from the data. This approach is called imputation in statistics. After the missing values are assigned, standard parameter learning methods can be used.

Expectation-maximisation (EM) is an iterative algorithm that uses the BN structure to deal with missing values (Lauritzen, 1995). EM starts by assigning initial values either to the BN parameters or to the missing values. In each iteration, EM calculates the parameters based on the expected values of the missing values, and it updates the expected values based on the new parameters. EM is guaranteed to converge to a local maximum of the likelihood, which is not necessarily the global maximum. EM has also been applied to learn the parameters of canonical models such as noisy-OR (Meek and Heckerman, 1997).
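The two alternating steps can be sketched for the smallest possible case: a BN with two binary variables, A → B, in which some values of A are missing and B is always observed. The initial parameter values, the function names and the toy data are assumptions made for this example; a general implementation would compute the expected counts by inference in the full BN.

```python
import math

def joint(a, b, p_a, p_b):
    """P(A=a, B=b) for the BN A -> B, where p_a = P(A=1) and p_b[a] = P(B=1 | A=a)."""
    prob_a = p_a if a == 1 else 1 - p_a
    prob_b = p_b[a] if b == 1 else 1 - p_b[a]
    return prob_a * prob_b

def em_step(rows, p_a, p_b):
    n_a = [0.0, 0.0]     # expected number of rows with A = a
    n_ab = [0.0, 0.0]    # expected number of rows with A = a and B = 1
    for a, b in rows:
        if a is None:
            # E-step: posterior P(A = a | B = b) under the current parameters.
            like = [joint(0, b, p_a, p_b), joint(1, b, p_a, p_b)]
            weights = [l / sum(like) for l in like]
        else:
            weights = [1.0 - a, float(a)]
        for v in (0, 1):
            n_a[v] += weights[v]
            n_ab[v] += weights[v] * b
    # M-step: maximum likelihood estimates computed from the expected counts.
    return n_a[1] / len(rows), [n_ab[0] / n_a[0], n_ab[1] / n_a[1]]

def observed_loglik(rows, p_a, p_b):
    """Log-likelihood of the observed values, summing over missing A."""
    total = 0.0
    for a, b in rows:
        states = (0, 1) if a is None else (a,)
        total += math.log(sum(joint(v, b, p_a, p_b) for v in states))
    return total

def fit_em(rows, p_a=0.5, p_b=(0.4, 0.6), iterations=50):
    p_b = list(p_b)
    history = [observed_loglik(rows, p_a, p_b)]
    for _ in range(iterations):
        p_a, p_b = em_step(rows, p_a, p_b)
        history.append(observed_loglik(rows, p_a, p_b))
    return p_a, p_b, history

# Toy data as (A, B) pairs; None marks a missing value of A.
rows = ([(1, 1)] * 6 + [(1, 0)] * 2 + [(0, 0)] * 6 + [(0, 1)] * 2
        + [(None, 1)] * 4 + [(None, 0)] * 4)

p_a, p_b, history = fit_em(rows)
print(p_a, p_b)
print(history[0], history[-1])   # the log-likelihood never decreases
```

Recording the log-likelihood after each iteration makes the convergence behaviour visible: the sequence is non-decreasing, but the point it converges to depends on the initial values.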

Bayesian learning can also be used for datasets with missing values. While calculating the posteriors in Bayesian learning is often trivial for complete datasets, it becomes computationally expensive, and sometimes infeasible, when missing values are present. In complete datasets, the parameters of different CPDs are independent of each other, and the posterior often has a compact form that can be solved analytically. However, the parameters become correlated when missing values exist. A thorough introduction to Bayesian parameter learning with complete and incomplete data is presented by Koller and Friedman (2009c; 2009d).