6.3 Similarity Domination on Uncertain Data

In this section, we tackle the following problem: given three uncertain objects A, B and Q in a multidimensional space $\mathbb{R}^d$, determine whether object A is closer to Q than B w.r.t. a distance function defined on the objects in $\mathbb{R}^d$. If this is the case, we say A dominates B w.r.t. Q. In contrast to Chapter 5, where this problem is solved for certain data, in the context of uncertain objects this domination relation is not a predicate that is either true or false, but rather a (dichotomous) random variable as defined in Definition 30. In the example depicted in Figure 6.1, there are three uncertain objects A, B and R, each bounded by a rectangle representing the possible locations of the object in $\mathbb{R}^2$. The PDFs of A, B and R are depicted as well. In this scenario, we cannot determine for sure whether object A dominates B w.r.t. R. However, it is possible to determine that object A dominates object B w.r.t. R with a high probability. If Q = R is the uncertain query object of a probabilistic 1NN query, we can guarantee that B has a low probability of being a result of this query, because A has a high probability of being closer to Q than B.

The problem at issue is to determine the domination probability $P(A \prec_Q B)$ as defined in Definition 31. Naively, we can compute $P(A \prec_Q B)$ by simply integrating the probability of all possible worlds in which A dominates B w.r.t. Q, exploiting inter-object independence:

\[
P(A \prec_Q B) = \int_{a \in A} \int_{b \in B} \int_{q \in Q} I(a, b, q) \cdot P(A = a) \cdot P(B = b) \cdot P(Q = q) \, da \, db \, dq,
\]

where $I(a, b, q)$ is the following (crisp) indicator function:

\[
I(a, b, q) =
\begin{cases}
1, & \text{if } dist(a, q) < dist(b, q) \\
0, & \text{else}
\end{cases}
\]

The problem with this naive approach is the computational cost of the triple integral. The integrals of the PDFs of A, B and Q may in general not be representable as closed-form expressions, and the integral of $I(a, b, q)$ does not have a closed-form expression. Therefore, an expensive numeric approximation is required for this approach. In the rest of this section, we propose methods that efficiently derive bounds for $P(A \prec_Q B)$, which can be used to prune objects, thus avoiding integral computations.
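To make the cost of the naive approach concrete, the triple integral can be approximated by sampling possible worlds. The following is a minimal Python sketch, not the method proposed in this section. It assumes, purely for illustration, that each object is uniformly distributed over its axis-aligned rectangle and that the Euclidean distance is used; the names `sample_uniform`, `sq_dist` and `estimate_domination` are hypothetical.

```python
import random


def sample_uniform(rect, rng):
    """Draw one point uniformly from a rectangle given as [(lo, hi), ...]."""
    return [rng.uniform(lo, hi) for lo, hi in rect]


def sq_dist(x, y):
    """Squared Euclidean distance (sufficient for comparing distances)."""
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def estimate_domination(rect_a, rect_b, rect_q, n_samples=20000, seed=0):
    """Monte Carlo estimate of P(A dominates B w.r.t. Q).

    Assumption: uniform PDFs over the rectangles and mutually
    independent objects, so each sampled triple is one possible world.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        a = sample_uniform(rect_a, rng)
        b = sample_uniform(rect_b, rng)
        q = sample_uniform(rect_q, rng)
        # Indicator I(a, b, q): 1 if a is strictly closer to q than b is.
        if sq_dist(a, q) < sq_dist(b, q):
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    unit = [(0.0, 1.0), (0.0, 1.0)]
    far = [(10.0, 11.0), (0.0, 1.0)]
    print(estimate_domination(unit, far, unit))
```

Each estimate requires thousands of sampled worlds per object triple, which is the kind of cost the bounds derived in the remainder of this section are meant to avoid.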

6.3.1 Complete Domination

First, we show how to detect whether A completely dominates B w.r.t. Q (i.e., whether $P(A \prec_Q B) = 1$) based only on the rectangular approximations A, B and Q. The state-of-the-art criterion to detect spatial domination on rectangular uncertainty regions uses minimum/maximum distance approximations. This criterion states that A dominates B w.r.t. Q if the minimum distance between Q and B is greater than the maximum distance between Q and A. Although correct, this criterion is not tight (cf. Chapter 5), i.e., not every case where A dominates B w.r.t. Q is detected by the min/max-domination criterion. The problem is that the dependency between the distance between A and Q and the distance between B and Q is ignored. Obviously, the distance between A and Q as well as the distance between B and Q depend on the location of Q. However, since Q can only have a unique location within its uncertainty region, both distances are mutually dependent. To obtain a tighter decision criterion, we adopt the spatial domination concepts proposed in Chapter 5 for rectangular uncertainty regions.
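The min/max criterion and its lack of tightness can be illustrated with a short sketch. The code below is an illustrative Python rendering under assumptions that are not part of the text: rectangles are lists of (lower, upper) intervals, one per dimension, and the function names are hypothetical. In the second, one-dimensional example, every location of A lies between Q and B, so A always dominates B, yet the min/max test does not detect it.

```python
def min_dist(rect_x, rect_y, p=2):
    """Smallest Lp distance between any point of rect_x and any point of rect_y."""
    total = 0.0
    for (xlo, xhi), (ylo, yhi) in zip(rect_x, rect_y):
        # Gap between the two intervals in this dimension (0 if they overlap).
        gap = max(0.0, xlo - yhi, ylo - xhi)
        total += gap ** p
    return total ** (1.0 / p)


def max_dist(rect_x, rect_y, p=2):
    """Largest Lp distance between any point of rect_x and any point of rect_y."""
    total = 0.0
    for (xlo, xhi), (ylo, yhi) in zip(rect_x, rect_y):
        # Largest absolute coordinate difference in this dimension.
        span = max(xhi - ylo, yhi - xlo)
        total += span ** p
    return total ** (1.0 / p)


def minmax_dominates(rect_a, rect_b, rect_q, p=2):
    """Min/max criterion: MaxDist(Q, A) < MinDist(Q, B)."""
    return max_dist(rect_q, rect_a, p) < min_dist(rect_q, rect_b, p)


# Detected: A coincides with Q, B is far away.
print(minmax_dominates([(0, 1)], [(5, 6)], [(0, 1)]))  # True

# Not detected: A = [4, 5] always lies between Q = [0, 2] and B = [6, 7],
# but MaxDist(Q, A) = 5 is not smaller than MinDist(Q, B) = 4.
print(minmax_dominates([(4, 5)], [(6, 7)], [(0, 2)]))  # False
```

The miss in the second case arises because the maximum distance to A is attained for q = 0 while the minimum distance to B is attained for q = 2, two locations that Q cannot occupy at the same time.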

Corollary 6 (Complete Domination). Let A, B, Q be uncertain objects having rectangular space approximations A, B and Q, respectively. The following implication holds:

\[
P(A \prec_Q B) = 1 \;\Leftarrow\; \sum_{i=1}^{d} \max_{q_i \in \{Q_i^{min}, Q_i^{max}\}} \left( MaxDist(A_i, q_i)^p - MinDist(B_i, q_i)^p \right) < 0, \tag{6.1}
\]

where $A_i$, $B_i$ and $Q_i$ denote the projection intervals of the respective rectangular uncertainty regions A, B and Q on the $i$-th dimension; $Q_i^{min}$ ($Q_i^{max}$) denotes the lower (upper) bound of interval $Q_i$, and $p$ corresponds to the used $L_p$ norm. The functions $MaxDist(A_i, q_i)$ and $MinDist(A_i, q_i)$ denote the maximal (respectively minimal) distance between the one-dimensional interval $A_i$ and the one-dimensional point $q_i$.

Proof. In Section 5.4 of Chapter 5 it is shown that the right-hand side of Implication 6.1 is equivalent to the following statement:

\[
\forall a \in A, b \in B, q \in Q : dist(a, q) < dist(b, q) \tag{6.2}
\]

which is true if and only if for each $a \in A$, $b \in B$, $q \in Q$ it holds that a is closer to q than b is, where A, B and Q are rectangular approximations. By definition of the possible worlds model, the set of combinations $a \in A$, $b \in B$, $q \in Q$ corresponds to a superset of all possible worlds. Consequently, Predicate 6.2 implies that

\[
\forall a \in A, b \in B, q \in Q : I(a, b, q) = 1
\]

by definition of the indicator function I. Using Equation 8.1 we obtain the triple sum

\[
P(A \prec_Q B) = \sum_{a_i \in A} \sum_{b_j \in B} \sum_{q_k \in Q} 1 \cdot P(a_i) \cdot P(b_j) \cdot P(q_k).
\]

As we can see, the above triple sum is equal to the sum of the probabilities of all possible worlds, which is equal to one. Consequently, we obtain $P(A \prec_Q B) = 1$.
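The criterion of Corollary 6 can be evaluated dimension by dimension using only the interval endpoints. The following Python sketch is one possible rendering, assuming rectangles are given as lists of (lower, upper) intervals; the function names are hypothetical and not taken from the text.

```python
def max_dist_1d(interval, x):
    """Maximal distance between a 1-D interval and a 1-D point."""
    lo, hi = interval
    return max(abs(x - lo), abs(x - hi))


def min_dist_1d(interval, x):
    """Minimal distance between a 1-D interval and a 1-D point."""
    lo, hi = interval
    if lo <= x <= hi:
        return 0.0
    return min(abs(x - lo), abs(x - hi))


def completely_dominates(rect_a, rect_b, rect_q, p=2):
    """Right-hand side of Implication 6.1.

    For every dimension i, take the worse of the two endpoints of Q_i
    for the term MaxDist(A_i, q_i)^p - MinDist(B_i, q_i)^p, then sum
    over all dimensions and test whether the sum is negative.
    """
    total = 0.0
    for a_i, b_i, q_i in zip(rect_a, rect_b, rect_q):
        total += max(
            max_dist_1d(a_i, q) ** p - min_dist_1d(b_i, q) ** p
            for q in q_i  # iterates over (Q_i^min, Q_i^max)
        )
    return total < 0


# A = [4, 5] lies between Q = [0, 2] and B = [6, 7] in one dimension.
print(completely_dominates([(4, 5)], [(6, 7)], [(0, 2)]))  # True
print(completely_dominates([(6, 7)], [(4, 5)], [(0, 2)]))  # False
```

In the first call, the per-endpoint terms are 25 - 36 = -11 for q = 0 and 9 - 16 = -7 for q = 2, so the sum is -7 and complete domination is detected, although the maximum distance between Q and A (5) exceeds the minimum distance between Q and B (4).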

In addition, the following holds:

Corollary 7. $P(A \prec_Q B) = 1 \Leftrightarrow P(B \prec_Q A) = 0$

Proof. The above corollary is evident, since in any world where A = a, B = b and Q = q it holds that $(A \prec_Q B) = 1 - (B \prec_Q A)$. Thus, $(A \prec_Q B) = 1$ in all possible worlds if and only if $(B \prec_Q A) = 0$ in all possible worlds.

Lemma 16 (Complete Non-Domination). Let A, B, Q be uncertain objects having rectangular space approximations A, B and Q, respectively. The following statement holds:

\[
P(B \prec_Q A) = 0 \;\Leftarrow\; \sum_{i=1}^{d} \max_{q_i \in \{Q_i^{min}, Q_i^{max}\}} \left( MaxDist(A_i, q_i)^p - MinDist(B_i, q_i)^p \right) < 0
\]

Proof. This lemma follows directly from Corollary 6 and Corollary 7.

Figure 6.2: Similarity Domination. (a) Complete domination. (b) Probabilistic domination.

In the example depicted in Figure 6.2(a), the grey region on the right shows all points that definitely are closer to A than to B and the grey region on the left shows all points that definitely are closer to B than to A. Consequently, A dominates B (B dominates A) if Q completely falls into the right (left) grey-shaded half-space.¹

6.3.2 Probabilistic Domination

Now, we consider the case where A does not completely dominate B w.r.t. Q. In consideration of the possible world semantics, there may exist worlds in which A dominates B w.r.t. Q, but not all possible worlds may satisfy this criterion. Let us consider the example shown in Figure 6.2(b), where the uncertainty region of A is decomposed into five partitions, each assigned to one of the five grey-shaded regions illustrating which points are closer to the partition in A than to B. As we can see, Q only completely falls into three grey-shaded regions. This means that A does not completely dominate B w.r.t. Q. However, we know that in some possible worlds (at least in all possible worlds where A is located in A1, A2 or A3) A does dominate B w.r.t. Q. The question at issue is how to determine the probability $P(A \prec_Q B)$ that A dominates B w.r.t. Q in an efficient way.

The key idea is to decompose the uncertainty region of an object X into subregions for which we know the probability that X is located in that subregion (as done for object A in our example). Therefore, if neither $(A \prec_Q B)$ nor $(B \prec_Q A)$ holds, then there may still exist subregions $A' \subset A$, $B' \subset B$ and $Q' \subset Q$ such that $(A' \prec_{Q'} B')$ holds. Given disjunctive rectangular decomposition schemes $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{Q}$, we can identify triples of subregions $(A' \in \mathcal{A}, B' \in \mathcal{B}, Q' \in \mathcal{Q})$ for which $(A' \prec_{Q'} B')$ holds. Let $I(A', B', Q')$ be the following indicator function:

\[
I(A', B', Q') =
\begin{cases}
1, & \text{if } (A' \prec_{Q'} B') \\
0, & \text{else}
\end{cases}
\]

¹ Note that the grey regions are not explicitly computed; we only include them in Figure 6.2(a) for

Lemma 17. Let A, B and Q be uncertain objects with disjunctive rectangular object decompositions $\mathcal{A}$, $\mathcal{B}$ and $\mathcal{Q}$, respectively. To derive a lower bound $P_{LB}(A \prec_Q B)$ of the probability $P(A \prec_Q B)$ that A dominates B w.r.t. Q, we can accumulate the probabilities of combinations of these subregions as follows:

\[
P_{LB}(A \prec_Q B) = \sum_{A' \in \mathcal{A},\, B' \in \mathcal{B},\, Q' \in \mathcal{Q}} P(a \in A') \cdot P(b \in B') \cdot P(q \in Q') \cdot I(A', B', Q'),
\]

where $P(X \in X')$ denotes the probability that object X is located within the region $X'$.

Proof. The probability of a combination $(A', B', Q')$ can be computed as $P(a \in A') \cdot P(b \in B') \cdot P(q \in Q')$ due to the assumption of mutually independent objects. These probabilities can be aggregated due to the assumption of disjunctive subregions, which implies that any two different combinations of subregions $(A' \in \mathcal{A}, B' \in \mathcal{B}, Q' \in \mathcal{Q})$ and $(A'' \in \mathcal{A}, B'' \in \mathcal{B}, Q'' \in \mathcal{Q})$ with $A' \neq A'' \lor B' \neq B'' \lor Q' \neq Q''$ must represent disjunctive sets of possible worlds. Obviously, in all possible worlds defined by combinations $(A', B', Q')$ where $I(A', B', Q') = 1$, A dominates B w.r.t. Q. However, not all possible worlds in which A dominates B w.r.t. Q are covered by these combinations; the uncovered worlds do not contribute to $P_{LB}(A \prec_Q B)$. Consequently, $P_{LB}(A \prec_Q B)$ lower bounds $P(A \prec_Q B)$.
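The accumulation in Lemma 17 can be sketched in a few lines of Python. The sketch below assumes that each decomposition is supplied as a list of (rectangle, probability) pairs and that complete domination of a triple of subregions is tested with the criterion of Corollary 6; the data layout and all function names are hypothetical choices for illustration.

```python
from itertools import product


def max_dist_1d(interval, x):
    lo, hi = interval
    return max(abs(x - lo), abs(x - hi))


def min_dist_1d(interval, x):
    lo, hi = interval
    if lo <= x <= hi:
        return 0.0
    return min(abs(x - lo), abs(x - hi))


def completely_dominates(rect_a, rect_b, rect_q, p=2):
    """Criterion of Corollary 6 on rectangles given as [(lo, hi), ...]."""
    total = 0.0
    for a_i, b_i, q_i in zip(rect_a, rect_b, rect_q):
        total += max(
            max_dist_1d(a_i, q) ** p - min_dist_1d(b_i, q) ** p for q in q_i
        )
    return total < 0


def lower_bound(parts_a, parts_b, parts_q, p=2):
    """Lower bound of P(A dominates B w.r.t. Q) in the spirit of Lemma 17.

    Each parts_* argument is a list of (rectangle, probability) pairs whose
    probabilities sum to one (assumed, not checked here).
    """
    total = 0.0
    for (ra, pa), (rb, pb), (rq, pq) in product(parts_a, parts_b, parts_q):
        # I(A', B', Q') = 1 only if A' completely dominates B' w.r.t. Q'.
        if completely_dominates(ra, rb, rq, p):
            total += pa * pb * pq
    return total


# One-dimensional toy data: A is in [0, 1] or in [9, 10], each with mass 0.5.
parts_a = [([(0.0, 1.0)], 0.5), ([(9.0, 10.0)], 0.5)]
parts_b = [([(4.0, 5.0)], 1.0)]
parts_q = [([(0.0, 1.0)], 1.0)]
print(lower_bound(parts_a, parts_b, parts_q))  # 0.5
```

Only the subregion [0, 1] of A completely dominates B w.r.t. Q in this toy example, so exactly its mass of 0.5 is accumulated.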

Analogously, we can define an upper bound of $P(A \prec_Q B)$:

Lemma 18. An upper bound $P_{UB}(A \prec_Q B)$ of $P(A \prec_Q B)$ can be derived as follows:

\[
P_{UB}(A \prec_Q B) = 1 - P_{LB}(B \prec_Q A)
\]

Naturally, the more refined the decompositions are, the tighter the bounds that can be computed and the higher the corresponding cost of deriving them. In particular, starting from the entire MBRs of the objects, we can progressively partition them to iteratively derive tighter bounds for their dependency relationships until a desired degree of certainty is achieved (based on some threshold). However, in the next section, we show that the domination count DomCount(B, Q) of a given object B (cf. Definition 32), which is the main module of prominent probabilistic queries, cannot be derived straightforwardly from these bounds, and we propose a methodology based on generating functions for this purpose.
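The progressive refinement described above can be sketched as follows. This is an illustrative Python sketch under two assumptions that the text does not prescribe: each object has a uniform PDF over its MBR (so halving a subregion halves its probability mass), and every subregion is halved along its longest side in each round. The halves share a boundary, which has probability zero under a continuous PDF. All names are hypothetical.

```python
from itertools import product


def max_dist_1d(interval, x):
    lo, hi = interval
    return max(abs(x - lo), abs(x - hi))


def min_dist_1d(interval, x):
    lo, hi = interval
    return 0.0 if lo <= x <= hi else min(abs(x - lo), abs(x - hi))


def completely_dominates(rect_a, rect_b, rect_q, p=2):
    """Criterion of Corollary 6 on rectangles given as [(lo, hi), ...]."""
    return sum(
        max(max_dist_1d(a_i, q) ** p - min_dist_1d(b_i, q) ** p for q in q_i)
        for a_i, b_i, q_i in zip(rect_a, rect_b, rect_q)
    ) < 0


def split(parts):
    """Halve every subregion along its longest side (uniform PDF assumed)."""
    out = []
    for rect, prob in parts:
        k = max(range(len(rect)), key=lambda i: rect[i][1] - rect[i][0])
        lo, hi = rect[k]
        mid = (lo + hi) / 2.0
        out.append((rect[:k] + [(lo, mid)] + rect[k + 1:], prob / 2.0))
        out.append((rect[:k] + [(mid, hi)] + rect[k + 1:], prob / 2.0))
    return out


def bounds(parts_a, parts_b, parts_q, p=2):
    """Return (P_LB(A < B), P_UB(A < B)) with P_UB = 1 - P_LB(B < A)."""
    lower, upper = 0.0, 1.0
    for (ra, pa), (rb, pb), (rq, pq) in product(parts_a, parts_b, parts_q):
        weight = pa * pb * pq
        if completely_dominates(ra, rb, rq, p):
            lower += weight
        elif completely_dominates(rb, ra, rq, p):
            upper -= weight
    return lower, upper


def refine(rect_a, rect_b, rect_q, max_gap=0.1, max_rounds=4):
    """Start from the whole MBRs and split until the bounds are close enough."""
    parts = [[(list(r), 1.0)] for r in (rect_a, rect_b, rect_q)]
    lower, upper = bounds(*parts)
    for _ in range(max_rounds):
        if upper - lower <= max_gap:
            break
        parts = [split(ps) for ps in parts]
        lower, upper = bounds(*parts)
    return lower, upper


print(refine([(0.0, 4.0)], [(2.0, 6.0)], [(0.0, 1.0)]))
```

The number of subregion triples grows eightfold per round in this sketch, which reflects the trade-off between tighter bounds and higher derivation cost noted above; `max_gap` plays the role of the certainty threshold.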

Figure 6.3: A1 and A2 dominate B w.r.t. Q with a probability of 50%, respectively.