6.2.1 Preliminaries
The concept of Markov blankets was first introduced by Pearl [93] in Bayesian networks. We first review Bayesian networks and then Markov blankets in detail. Then, we justify why Markov blankets can be used in subspace outlier detection. Hereafter, the terms “variable” and “attribute” will be used interchangeably.
Given a training data set O = {o1, o2, · · · , on} containing n training instances, O is defined on the set of dimensions D = {D1, D2, · · · , Dd}, where d is the dimensionality.
Let P be a joint probability distribution over a set of random variables D, and let G be a directed acyclic graph over D. We call the triplet ⟨D, G, P⟩ a Bayesian network if ⟨D, G, P⟩ satisfies the Markov condition: every variable is independent of any subset of its non-descendant variables conditioned on its parents in G [93]. A simple Bayesian network for lung cancer is shown in Figure 6.1 as an example.
Figure 6.1: Example of a Bayesian network [75]. Nodes: Anxiety, Peer Pressure, Yellow Fingers, Smoking, Genetics, Allergy, Coughing, Fatigue, Attention Disorder, Lung Cancer.
With the Markov condition, a Bayesian network encodes the joint probability P over a set of variables D and decomposes it into a product of the conditional probability distributions of each variable given its parents in G. Assuming Pa(Di) is the set of parents of Di (1 ≤ i ≤ d) in G, the joint probability P is

\[ P(D_1, D_2, \cdots, D_d) = \prod_{i=1}^{d} P(D_i \mid Pa(D_i)). \]
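The factorization above can be illustrated with a minimal sketch. The three-variable chain, the variable names, and all probability values below are hypothetical and chosen only for illustration; they are not taken from Figure 6.1 or from any data set.

```python
from itertools import product

# Hypothetical chain Smoking -> LungCancer -> Coughing over binary variables.
# All numbers are made-up illustration values.
parents = {
    "Smoking": [],
    "LungCancer": ["Smoking"],
    "Coughing": ["LungCancer"],
}

# cpt[var][parent_values] = P(var = 1 | parents = parent_values)
cpt = {
    "Smoking": {(): 0.3},
    "LungCancer": {(0,): 0.05, (1,): 0.4},
    "Coughing": {(0,): 0.1, (1,): 0.8},
}


def joint_probability(assignment):
    """P(D1, ..., Dd) as the product of P(Di | Pa(Di)) over all variables."""
    prob = 1.0
    for var, pa in parents.items():
        p_one = cpt[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob


variables = list(parents)
# Summing the factorized joint over all assignments should give 1.
total = sum(
    joint_probability(dict(zip(variables, values)))
    for values in product([0, 1], repeat=len(variables))
)
```

Each variable contributes exactly one factor, so the table sizes depend on the number of parents of each variable rather than on d.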
Definition 6.1 (Faithfulness). [93] A Bayesian network ⟨D, G, P⟩ is said to satisfy the faithfulness condition if and only if every conditional independence entailed by G is also present in P.
Theorem 6.1. [93] If a Bayesian network satisfies the faithfulness condition, then the Markov blanket of a variable X in the Bayesian network is the set of children, parents, and spouses of X, where a spouse of X is another parent of one of X's children.
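Theorem 6.1 translates directly into a graph lookup. The sketch below reads a Markov blanket off a known DAG given as an edge list. The edge list is an assumption: the node names match Figure 6.1, but the edges cannot be recovered from the text, so they are a plausible guess used only for illustration.

```python
def markov_blanket(edges, x):
    """Parents, children, and spouses of x in a DAG given as (parent, child) pairs."""
    parents = {u for u, v in edges if v == x}
    children = {v for u, v in edges if u == x}
    # Spouses: other parents of x's children.
    spouses = {u for u, v in edges if v in children and u != x}
    return parents | children | spouses


# Assumed edges (hypothetical; not confirmed by the text).
edges = [
    ("Anxiety", "Smoking"),
    ("Peer Pressure", "Smoking"),
    ("Smoking", "Yellow Fingers"),
    ("Smoking", "Lung Cancer"),
    ("Genetics", "Lung Cancer"),
    ("Genetics", "Attention Disorder"),
    ("Allergy", "Coughing"),
    ("Lung Cancer", "Coughing"),
    ("Lung Cancer", "Fatigue"),
    ("Coughing", "Fatigue"),
]

mb_lung_cancer = markov_blanket(edges, "Lung Cancer")
```

Under these assumed edges, "Allergy" enters the blanket of "Lung Cancer" only as a spouse, through the shared child "Coughing". In practice the graph is usually unknown and the blanket has to be learned from data, which this sketch does not attempt.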
6.2.2 Markov Blanket Subspaces
In subspace outlier detection, existing studies have shown that outliers appear only in correlated subspaces [82]. The challenge is that the number of candidate subspaces is exponential in the number of attributes. To tackle this challenge, we propose the concept of Markov blanket subspaces for efficient selection of correlated subspaces.
In a Bayesian network, for any attribute X, its Markov blanket satisfies the following property.
Property 6.1. [5] In a Bayesian network ⟨D, G, P⟩, if a subset MB(X) ⊆ D − {X} is a Markov blanket of X, then the following holds:

\[ D_{KL}\big(P(X \mid MB(X)) \,\|\, P(X \mid D)\big) = 0, \]

where D_KL is the Kullback-Leibler (KL) divergence [70] between the estimated distribution P(X|MB(X)) and the true distribution P(X|D), given by

\[ D_{KL}(p \,\|\, q) = \sum_{f \in D} p(f) \log \frac{p(f)}{q(f)}. \]
From Property 6.1, we can see that, given an attribute X, conditioning on its Markov blanket renders X statistically independent of all the remaining attributes in the Bayesian network. Therefore, only the attributes in the Markov blanket of X are highly correlated with X and provide meaningful information about X. The remaining attributes are irrelevant to X.
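Property 6.1 can be checked numerically on a small example. The sketch below assumes a hypothetical chain Z → Y → X over binary variables, so that MB(X) = {Y}, and it reads the "true" distribution as X conditioned on all other attributes. The probability values are made up.

```python
import math
from itertools import product


def kl_divergence(p, q):
    """D_KL(p || q) = sum_f p(f) * log(p(f) / q(f)), skipping terms with p(f) = 0."""
    return sum(pf * math.log(pf / q[f]) for f, pf in p.items() if pf > 0)


# Hypothetical conditional probability tables for the chain Z -> Y -> X.
p_z = {0: 0.6, 1: 0.4}
p_y_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}    # indexed [z][y]
p_x_given_y = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.25, 1: 0.75}}  # indexed [y][x]

# Joint distribution keyed by (z, y, x), built from the factorization.
joint = {
    (z, y, x): p_z[z] * p_y_given_z[z][y] * p_x_given_y[y][x]
    for z, y, x in product([0, 1], repeat=3)
}


def conditional_x(given):
    """P(X | given); `given` maps a position (0 = Z, 1 = Y) to its observed value."""
    weights = {}
    for x in (0, 1):
        weights[x] = sum(
            p for (z, y, xv), p in joint.items()
            if xv == x and all((z, y)[i] == v for i, v in given.items())
        )
    norm = sum(weights.values())
    return {x: w / norm for x, w in weights.items()}


# Conditioning on the Markov blanket {Y} versus on all other attributes {Z, Y}.
blanket_gaps = [
    kl_divergence(conditional_x({1: y}), conditional_x({0: z, 1: y}))
    for z, y in product([0, 1], repeat=2)
]

# Conditioning on the non-blanket attribute Z alone loses information about X.
z_only_gap = kl_divergence(conditional_x({0: 0}), conditional_x({0: 0, 1: 1}))
```

Every entry of `blanket_gaps` is zero up to floating-point error, whereas `z_only_gap` is strictly positive. This matches the reading that Y carries all the information about X and Z adds nothing once Y is known.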
For instance, in Figure 6.1, only the attributes in the Markov blanket of “Lung Cancer” provide meaningful information for detecting abnormal patients with lung cancer, while the remaining attributes outside this Markov blanket are irrelevant to “Lung Cancer”. Furthermore, according to [5], the Markov blanket of any attribute is unique in a faithful Bayesian network.
Property 6.2. [5] If a Bayesian network satisfies the faithfulness condition, then the Markov blanket of each attribute is unique.
Accordingly, in Bayesian networks, an attribute and its Markov blanket form a natural subspace that makes this attribute independent of the remaining attributes.
Definition 6.2 (Markov blanket subspace). A Markov blanket subspace consists of an attribute and its Markov blanket.
Given a data set O with d attributes, it is natural to consider the d Markov blanket subspaces as candidate subspaces for outlier detection. Interestingly, to efficiently select meaningful subspaces from the 2^d − 1 possible subspaces, Müller et al. [82, 65, 17] proposed the concept of high contrast subspaces. In the following, we analyze the connections between Markov blanket subspaces and high contrast subspaces.
Given a subspace s ⊂ D and any Di ∈ s, the contrast of s is measured by comparing the conditional probability density P(Di|s − {Di}) to the corresponding marginal probability density P(Di) [65]. If a subspace s is uncorrelated, it satisfies the following equation:

\[ P(D_1, D_2, \cdots, D_{|s|}) = \prod_{i=1}^{|s|} P(D_i). \]
For an uncorrelated subspace s, the joint probability density P(D1, D2, ..., D|s|) is equal to the product of the marginal probability densities of the attributes in s, and thus the ratio between the marginal density P(Di) and its corresponding conditional probability density P(Di|s − {Di}) is equal to 1, that is,

\[ \forall D_i \in s, \quad \frac{P(D_i)}{P(D_i \mid s - \{D_i\})} = 1. \]
Accordingly, an uncorrelated subspace is not a high contrast subspace and does not contain any meaningful outliers [82, 65]. Only a subspace that shows high dependencies between its attributes is a high contrast subspace. Using the correlation of dimensions in a subspace as an objective function for computing subspace contrasts, if a subspace s is a candidate high contrast subspace, it should satisfy the following condition:

\[ \exists D_i \in s, \quad \frac{P(D_i)}{P(D_i \mid s - \{D_i\})} > 1. \]
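The two conditions above can be contrasted on discrete toy data. The following sketch evaluates the ratio P(Di)/P(Di|s − {Di}) cell by cell for a two-attribute subspace. Both joint tables are hypothetical, and discrete probabilities stand in for the densities used in the text.

```python
def contrast_ratios(joint):
    """Return P(a) / P(a | b) for every cell (a, b) of a two-attribute joint table."""
    p_a, p_b = {}, {}
    for (a, b), p in joint.items():
        p_a[a] = p_a.get(a, 0.0) + p
        p_b[b] = p_b.get(b, 0.0) + p
    # P(a | b) = P(a, b) / P(b); cells with zero probability are skipped.
    return {
        (a, b): p_a[a] / (p / p_b[b])
        for (a, b), p in joint.items()
        if p > 0
    }


# Uncorrelated subspace: the joint is the product of the marginals.
marg_a = {0: 0.3, 1: 0.7}
marg_b = {0: 0.6, 1: 0.4}
independent = {(a, b): pa * pb for a, pa in marg_a.items() for b, pb in marg_b.items()}

# Correlated subspace: mass concentrated on the diagonal.
dependent = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

ratios_independent = contrast_ratios(independent)
ratios_dependent = contrast_ratios(dependent)
```

In the independent table every ratio equals 1 up to rounding. In the dependent table the off-diagonal cells give a ratio of 2.5, so the existence condition is met.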
The number of candidate high contrast subspaces is exponentially large. Furthermore, to find high contrast subspaces, within any subspace s we need to search for condition sets among 2^|s| attribute sets for each attribute Di ∈ s in order to compute the contrast between its conditional probability densities and its marginal density. This is computationally very costly or even prohibitive.
With the discussion above, we observe that for any attribute Di ∈ D (1 ≤ i ≤ d), its Markov blanket MB(Di) is the minimum condition set that makes the Markov blanket subspace, {Di} ∪ MB(Di), a high contrast subspace. The explanation is as follows. For the Markov blanket subspace, P(Di)/P(Di|MB(Di)) > 1 holds. So the Markov blanket subspace, {Di} ∪ MB(Di), is a high contrast subspace. Moreover, for any s ⊂ D with MB(Di) ⊂ s, Property 6.1 gives P(Di)/P(Di|s) = P(Di)/P(Di|MB(Di)). Thus, MB(Di) is the minimum condition set that makes P(Di)/P(Di|MB(Di)) > 1 hold.
The observation illustrates that for any attribute Di ∈ D (1 ≤ i ≤ d), its Markov blanket not only provides meaningful information about Di, but also forms a minimum-size condition set for comparing the marginal density P(Di) with the conditional probability density P(Di|MB(Di)). Accordingly, we can use exactly the d Markov blanket subspaces as the candidate high contrast subspaces for outlier detection instead of enumerating the 2^d − 1 possible subspaces.
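The reduction in candidates can be sketched as follows. The attribute names and blankets are hypothetical, and the blankets are assumed to be already known (for example, learned by a Markov blanket discovery algorithm that is not shown here).

```python
def markov_blanket_subspaces(blankets):
    """Build one candidate subspace {Di} ∪ MB(Di) per attribute Di."""
    return {x: frozenset({x}) | frozenset(mb) for x, mb in blankets.items()}


# Hypothetical blankets for a four-attribute data set.
blankets = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": set(),
}

subspaces = markov_blanket_subspaces(blankets)

d = len(blankets)
num_candidates = len(subspaces)   # d candidates, one per attribute
num_exhaustive = 2 ** d - 1       # all non-empty subsets of the d attributes
```

For d = 4 this gives 4 candidates against 15 non-empty subsets. The gap widens exponentially as d grows, since the candidate count stays linear in d.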
Table 6.1 lists the frequently used notations in this chapter.

Notation                     Description
D = {D1, . . . , Dd}         a d-dimensional space
O = {o1, o2, · · · , on}     a training data set with n training instances
X                            an arbitrary attribute, X = Di (1 ≤ i ≤ d) ∈ D
P                            a joint probability distribution
G                            a directed acyclic graph
s                            a subspace, s ⊂ D and ∀Di ∈ s
Pa(Di)                       the set of parents of Di (1 ≤ i ≤ d) in G
MB(X)                        the Markov blanket of attribute X
Table 6.1: Summary of frequently used notations in Chapter 6