PROYECTO EXPEDIENTE SOLICITUD FECHA UBICACIÓN Botadero la

For the intersect_boolean predicate, it suffices to find a single intersecting query and database container pair in order to issue the database id. Obviously, it is desirable to detect such intersecting pairs as early as possible in order to avoid unnecessary

blobintersection tests. In this section, we present two optimizations striking for this

goal. First, we introduce a fast second filter step which tries to determine intersecting pairs without examining the BLOBs. This test is entirely based on aggregated information of the gray containers. Secondly, we introduce a probability model which leads to an ordering for the candidate pairs such that the most promising blobinter-

section tests are carried out first. In order to put these optimizations into practice, we

pass D(C_gray) and H(C_gray) as additional parameters to the user-defined aggregate function table. Thus, the following two optimizations can easily be integrated into this user-defined aggregate function. If the fast second filter step determines an intersecting container pair, all other candidate pairs are deleted so that the resulting table of candidate pairs, called ctable, consists only of one intersecting pair. Nevertheless, there might be database objects where this second filter step does not determine an intersection for any of the corresponding candidate pairs. In this case, we sort the candidate pairs at the end of our user-defined aggregation function table such that the most promising blobintersection tests are carried out first.

Optimizations for one-dimensional gray intervals. Let us first mention that a gray interval with maximum density is also called a black interval. In this paragraph, we will discuss how gray and black intervals have to look like so that we can decide whether two interlacing intervals intersect each other or not without accessing their BLOBs. These optimizations are only suitable for the intersect_boolean-predicate.

Two Black Intervals. If two black intervals interlace, they necessarily intersect as

well.

Black and Gray Intervals. In this case, the situation is a little bit more complicated.

In almost all cases where a black interval I_black = (l_black, u_black) interlaces a gray interval I_gray, with H(I_gray) = (l_gray, u_gray), there is an intersection of I_black and I_gray as well. If any of the two conditions depicted in Table 3 holds, then I_black and I_gray intersect.

Object Relational Query Processing 141

Two Gray Intervals. Obviously, two gray intervals I_grayand I’_gray, with H(I_gray) = (l_gray, u_gray) and H(I’_gray)=(l’_gray, u’_gray), which interlace do not have to intersect. Fortunately, there are two cases where we can assert that they intersect without examining the detailed black interval sequences. The two cases are illustrated in Table 4.

No intersection. There are only two situations, depicted in Table 5, where we can

determine that two interlacing intervals do not intersect. In this case, the interlacing

Condition Explanation Figure

L(I_black) >

G(I_gray)

If the black interval is longer than the maximum gap between two black intervals of the detailed black interval sequence of I_graythan the two intervals intersect.

u_interlace = u_gray or

l_interlace = l_gray

If one of the two conditions presented in the box on the left holds, then the black and gray interval intersect. This is due to the fact that the gray intervals end and start with black intervals.

Table 3: Intersection between an interlacing black and gray interval.

Condition Explanation Figure

u_gray = u’_gray or

l_gray = l’_gray or

l_gray = u’_gray

If one of the three conditions depicted in the box on the left holds, then the two gray intervals intersect (gray intervals start and end with black intervals). This test is similar to the polygon boundary test in [HJR 97].

N_w(I_gray) +

N_w(I’_gray) <

L_interlace

If the sum of the number of the “white cells” of two gray interlacing intervals is smaller than the length of the interlacing area, then the two intervals necessarily intersect. This is the generalization of the first case of Table 3. This test is similar to the false area

test in [BKSS 94].

Table 4: Intersection between two interlacing gray intervals.

smaller than G(Igray) G(Igray) larger than linterlace=lgray uinterlace=ugray l’gray = ugray

u’gray = ugray

l’_gray = l_gray

L (I_gray)

L’ (I_gray)

Linterlace

interval pair is not added to ctable, i.e. we do not carry out the blobintersection test for this interval pair.

False Area Test. In this paragraph we generalize the false area test [BKSS 94] to multi-dimensional intervals, i.e. boxes and tiles. If there exist two containers which interlace, and both containers have a high density, i.e. the number of white cells is rather small, then we know that the two objects intersect without accessing the data stored in the BLOB.

Lemma 3. Let O = (id, 〈 C₁ , ..., C_m 〉) and O’ = (id’, 〈 C’₁ , ..., C’_m’ 〉) be two objects

with their hulls H(C_gray) = and H(C’_gray) = . Then the

following statement holds:

Proof. Let be the intersection volume

between two gray containers C_i and C’_j. Then and

holds. As ≥ >

V holds, there must be at least one object voxel v belonging to O and O’, i.e. O and O’ intersect.■

Condition Explanation Figure

N_b(I_gray) = 2 and

l_gray < l’_gray and

u_gray > u’_gray

If I_gray consists only of two black cells and

I’_grayis totally “included” in I_gray, then we know that the two intervals cannot intersect each other, although they interlace.

N_b(I_gray) =

N_b(I’_gray) = 2 and

If both gray intervals consist only of two black cells and, furthermore, have distinct interval bounds, then the two intervals cer- tainly do not intersect.

Table 5: No intersection between two interlacing gray intervals.

u’gray < ugray

L_gray L’_gray l_gray < l’_gray

Nb(Igray)= 2

l_gray≠l’_gray≠ u_gray≠u’_gray

u’_gray=u_gray

Lgray L’gray l’_gray=l_gray N_b(I_gray)= 2 N’_b(I_gray)= 2 Xd k=1 hl k h_uk [ , ] _Xd k=1 h’l k h’_uk [ , ] i {1..m} j {1..m’}:N_w( )C_i N_w( )C’_j (min h( k_u,h’_uk)–max h( _lk,h’_lk)+1) k=1 d

∏

< + ∈ ∃ ∈ ∃ intersect⇒ (O O', ) = true V (min h( _uk,h’_uk)–max h( k_l,h’k_l)+1) k=1 d ∏ = N_b( )C_i ≥V–N_w( )C_i N_b( )C’_j ≥V–N_w( )C’_j N_b( )C_i +N_b( )C’_j V–N_w( )C_i +V–N_w( )C'_j

Spatial Join 143

Ranking. As shown above, we can pinpoint, based on relatively little information, whether two container pairs intersect or not. Nevertheless, there might be cases where we cannot do this for any of the database and query candidate pair. But if the hulls of two gray containers intersect, it is still helpful if we can predict how likely an object intersection according to Definition 18 might be in order to rank the pairs properly in the set of all candidate pairs belonging to the same database object.

Lemma 4. Let C_gray and C’_gray be two gray intervals with densities d = D(C_gray),

d’ = D(C’_gray) and hulls H(C_gray) = and H(C’_gray) = . Fur-

thermore, let denote the intersec-

tion volume. Then the probability for an object intersection is:

Proof. As we assume that the black cells of both gray containers are equally dis- tributed, we can conclude that N = d×V black cells of C_gray are included in the testing area and likewise N’ = d’×V black cells from C’_gray. The number of all dif- ferent possibilities how N’ black cells of C’_gray can be placed over the V cells of the testing area is . Assuming that all N black cells of C_grayare already distributed over V, then the number of different possibilities how N’ black cells of

C’_gray can be placed over the remaining V-N white cells such that no intersection

occurs is .

Thus, the probability for a non-intersection is equal to . ■

6.5 Spatial Join

One of the most common query types in spatial database management systems is the spatial intersection join [GG 98]. This join retrieves all pairs of intersecting ob- jects. A usual spatial join example of 2D geographical data is “find all cities which are crossed by a river”. In the automobile industry, spatial join processing of more complex 3D high-resolution objects is required. One query example is to find all temperature sensitive parts touching hot parts of the engine or the exhaust pipe. The high complexity of the objects and the fact that these objects are highly clustered yield a great challenge for efficient join processing.

Xd k=1 hl k h_uk [ , ] _Xd k=1 h'l k h'_uk [ , ] V (min h( _uk,h'_uk)–max h( k_l,h'k_l)+1) k=1 d

∏

= >0 P 1 V–d×V d’×V     V d’×V     ⁄ – = V N’     V d’×V     = V–N N’     V–d×V d’×V     = V–d×V d’×V     V d’×V     ⁄

In this section, we demonstrate the benefits of our cost-based decompositioning approach of high-resolution spatial objects by integrating it into a simple nested-loop join procedure and a more efficient sort-merge join variant. Both methods do not assume the presence of pre-existing spatial indices on the relations.

We start with two relations R and S, both containing a set of tuples (id, mbr, link), where id denotes a unique object identifier, mbr denotes the minimal bounding rect- angle conservatively approximating the respective object and link refers to an exter- nal file containing the complete voxel set of the object (cf. Figure 75a). In this section, we assume that the voxel representation of the objects is accurate enough to determine intersecting objects without any further refinement step. In order to carry out the intersection tests efficiently, we decompose the high-resolution voxelized objects. We store the generated approximations in an auxiliary temporary relation (cf. Figure 75b), which allows us to reload certain approximations on demand keep- ing the main-memory footprint small.

The remainder of the section is organized as follows: In Section 6.5.1, we shortly present the related work in the area of spatial joins discussing three different classes dependent on the availability of index structures. In Section 6.5.2 and Section 6.5.3, we introduce a cost-based grouping approach into gray intervals, quite similar to the one presented in Section 6.3. The main difference is that we do not assume a potential query distribution, but use the statistics of the join partner relation as input parameter for the grouping algorithm. In Section 6.5.4, we introduce two different join algorithms based on our cost-based decompositioning. The experimental evaluation is deferred to Section 6.6.5, where we will present the benefits of our approach.

. . .

Figure 75: Spatial join procedure.

a) Object relation, b) Auxiliary relation with decomposed object approximations

Object Relation a) id A B C mbr link ’file_A’ ’file_B’ ’file_C’ Auxiliary Relation A B b) approx id

Spatial Join 145

6.5.1 Related Work

In this section, we will shortly discuss different aspects of efficient spatial join processing of complex spatial objects. Numerous spatial join algorithms have been proposed over the last decade. Most of them rely on the paradigm of multi-step query processing [BKSS 94]. A fast filter step excludes all objects that cannot satisfy the join predicate. The subsequent refinement step is applied to the join candidate pairs which are returned from the filter step. Thereby, the main focus of research is on the

filter step which is applied to geometric object approximations. On the basis of the

availability of indices for processing the filter step, spatial join methods operating on two relations can be divided into three classes:

• Class 1: Index on both relations • Class 2: Index on one relation • Class 3: No indices

The common solutions for the spatial join methods of Class 1 are the algorithms based on matching two R-trees as presented in [BKS 93b]. In the last few years, the international research community has focused on methods of Class 2 and Class 3. A simple Class 2 approach is the index nested loop, where each tuple of the non- indexed relation is used as query applied to the indexed relation. In [LR 94] seeded trees were introduced in order to process spatial joins efficiently when only one R-tree is available. The authors propose to create a second R-tree using the available tree as a skeleton and apply thereafter a Class 1 algorithm. For spatial join algorithms of Class 3, initially no indices are available which could be used to improve the query performance. Several techniques have been proposed which partition the tuples into buckets and then use hash based techniques, e.g. the spatial-hash join [LR 96] or the

partition based spatial merge join [PD 96]. The scalable sweeping-based spatial join

[APR+ 98] is w.r.t. worst-case efficiency the most promising algorithm for processing spatial joins. Let us note that we use a simplified version of this algorithm as starting point for our sort-merge join variant presented in Section 6.5.4. Furthermore, there exist different other approaches to improve Class 3 join algorithms, e.g. in [DS 00] the problem of redundancy and duplicate detection was investigated and in [DSTW 02] a generic technique called progressive merge join (PMJ) was introduced that eliminates the blocking behavior of sorted-based join algorithms.

Most of the approaches presented in the literature focus on the efficient computa- tion of the filter step where MBRs are used for approximating spatial objects. Our approach deals with very complex 3D objects, where the MBRs form rather poor approximations, leading to low filter selectivities. In the following sections, we will present a cost-based decompositioning algorithm for spatial join processing, which aims at finding an optimal trade-off between low redundancy and high approxima- tion quality.

6.5.2 Cost Model

For our decompositioning algorithm we take the estimated join cost between a gray interval I_gray and a join-partner relation T into account. Let us note that T can be either of the tables R or S (cf. Figure 75a), or any temporary table containing derived information from the original tables R and S (cf. Figure 75b). The overall join cost

cost_join for a gray interval I_gray and a join-partner relation T are composed of two parts, the filter cost cost_filter and the refinement cost cost_refine:

cost_join(I_gray,T) = cost_filter(I_gray,T) + cost_refine(I_gray,T).

Filter Cost. The cost_filter(I_gray,T) can be computed by the expected number of gray intervals I_gray,Tof the join partner relation T. We penalize each intersection test by a constant c_f which reflects the cost related to the access of one gray interval I_gray,T and the evaluation of the join predicate for the pair (H(I_gray),H(I_gray,T)):

cost_filter(I_gray,T) = N_gray(T) · c_f,

where N_voxel(T) (number of voxels) ≥ N_gray(T) (number of gray intervals) ≥

N_object(T) (number of objects) holds for the join-partner relation T. The value of the parameter c_f depends on the used system.

Refinement Cost. The cost of the refinement step cost_refineis determined by the selectivity of the filter step. For each candidate pair resulting from the filter step, we have to retrieve the exact geometry B(I_gray) in order to verify the intersection predicate. Consequently, our cost-based decompositoning algorithm is based on the following two parameters:

• Selectivity σ_filterof the filter step.

Spatial Join 147

The refinement cost of a join related to a gray interval I_gray can be computed as follows:

cost_refine(I_gray, T) = N_gray(T) · σ_filter(I_gray,T) · cost_eval(I_gray).

In Section 6.3.4, it was shown how to calculate the evaluation cost costeval. In the following, we show how to estimate the selectivity of the filter step σ_filter. We use simple statistics of the join-partner relation T to estimate the selectivity σfilter(Igray,T). In order to cope with arbitrary interval distributions, histograms can be employed to capture the data characteristics at any desired resolution (cf. Section 4.3.1). The selectivity σ_filter(I_gray,T) related to a gray interval I_gray can be determined by using an appropriate interval histogram H(T,ν) of the join partner relation T. Based on H(T,ν), we compute a selectivity estimate by evaluating the intersection of I_gray with each bucket span b_i,_ν (cf. Definition 7 and Figure 38).

σfilter(Igray, T) = ,

Join cost. To sum up, the join cost cost_join(I_gray) related to a gray interval I_gray and a join-partner relation T can be expressed as follows:

cost_join(I_gray,T) =N_gray(T) · (c_f +σ_filter(I_gray,T) ·cost_eval(I_gray)).

6.5.3 Decompositioning Algorithm

For each object, there exist a lot of different possibilities to decompose the voxelized object into a gray interval sequence.

Lemma 5. Let O_voxel= (id, {v₁, ..., v_n}) be a voxelized object and be a space filling curve. Furthermore, let W = {(l, u) ∈ IN2, l ≤ u} be the domain of intervals and let b₁ = (l₁, u₁), …, b_n = (l_n, u_n) ∈ W be a sequence of intervals where

u_i+ 1 < l_i+1representing the set {ρ(v₁), ...,ρ(v_n)}. Then, there exist O(2n) different

gray interval sequences O_gray= (id, 〈 , , …,

〉).

Proof. The interval sequence representing O_voxel consists of n-1 “gaps”. For each of these gaps we can decide whether it is included in a gray interval, or whether it separates two gray intervals. Thus we have 2n-1 different possible gray interval sequences. ■ overlap H I( ( _gray),b_i_,_ν) β ---⋅n_i i=1 ν

∑

n_i i=1 ν

∑

--- ρ:INd→IN b_i 0+1,...,bi1 〈 〉 b_i 1+1,...,bi2 〈 〉 b_i m–1+1,...,bim 〈 〉

Based on the formulas for join cost related to a gray interval I_gray and a join-part- ner relation T, we can find a cost optimum decompositioning algorithm. Unfortunate- ly, as shown in the above lemma, there exist exponentially many decompositioning possibilities, which results in a high runtime of an optimum cost-based decompositioning algorithm. In this section, we will present a greedy algorithm with a guaran- teed worst-case runtime complexity of O(n) which produces decompositions helping to accelerate the query process considerably (cf. Section 6.6.5).

For fulfilling the grouping rules presented in Section 6.3, we introduce the following top-down decompositioning algorithm for gray intervals, called JoinGroupCon

(cf. Figure 76). JoinGroupCon is a recursive algorithm which starts with a gray inter- val I_gray initially covering the complete object. In each step of our algorithm, we look

for the longest remaining gap. We carry out the split at this gap, if the estimated join cost caused by the decomposed intervals is smaller than the estimated cost caused by our input interval I_gray. The expected join cost cost_join(I_gray,T) can be computed as described above. Data compressors which have a high compression rate and a fast decompression method, result in an early stop of the JoinGroupCon algorithm gener- ating a small number of gray intervals. Let us note that the inequality “cost_gray>

cost_dec” in Figure 76 is independent of N_gray(T), and thus N_gray(T) is not required

In document Contribución a la actualización del plan de gestión integral de residuos sólidos del Municipio de Santa Rosa de Cabal (página 65-68)