
Our algorithm for discovering bases takes advantage of previously generated and tested combinations to decide which alternatives must be explored, according to the necessary conditions introduced in Section 4.5.1. The algorithm is built around three sets:

Feasible bases: In the i-th iteration, this set contains the combinations of size i that satisfy the three necessary conditions introduced in Section 4.5.1.

Candidate sets: In the i-th iteration, this set contains those feasible bases refuted as bases with data.

Bases: In the i-th iteration, this set contains those feasible bases, up to size i, verified as bases with data.

Our algorithm has two inputs: the concept we are looking for bases of (for example, the endDurationPrice), and its FD-tree (see Figure 4.9). The algorithm starts by considering each root concept as a feasible base (see Figure 4.11, step 3). This is sound, since root concepts are unary sets that fulfill the necessary conditions by definition (and thus we can directly consider them feasible bases). Every combination in the feasible bases is verified (see step 4ci and Section 4.5.2.1) to see whether, according to data, it is indeed a base. If it is, the combination is added to the base sets (see step 4ciA); if it is not, it is added to the candidate sets (see step 4ciiA).

Note that we only explore the FD-descendants of a combination Z if Z determines A (see step 4ci and Section 4.5.2.1). This is a direct application of Prop. 5: the FD-descendants of a given combination Z cannot determine A if Z does not determine A. From here on, we refer to this application of Prop. 5 in our algorithm as the Intermediate Set Rule (ISR). A combination is generated by ISR, in the gen_comb_by_FD function, if its direct FD-ancestors are known to determine A (which is why this function needs the base sets as a parameter). Importantly, ISR only needs to check the direct FD-ancestors to decide whether a given FD-descendant must be generated. This holds because FD-ancestors are generated either in the gen_comb_by_FD function (by ISR) or in gen_comb_by_SS, and both functions guarantee that the new combinations generated fulfill the three necessary conditions (see Section 4.5.3 for further details on the former, and Section 4.5.2.2 for further details on the latter).

function seek_bases (Concept A, Fd-Tree M) returns Set<Base>
1. Set<Concept> Comb; Ordered_Set<Comb> Candidates_Sets, Feasible_Bases;
2. int i := 1; Set<Comb> Bases := {};
3. Feasible_Bases := Get_Root_Concepts(A, M);
4. while (Feasible_Bases != ∅)
   (a) Candidates_Sets := {};
   (b) Comb := Get_First_Combination(Feasible_Bases);
   (c) while (Comb != ∅)
       i. if (Determines(Comb, A)) then
          A. Bases += Comb;
          B. if (Has_fd-Descendants(Comb, M)) then
             Feasible_Bases += Gen_Comb_by_FD(Comb, Bases, M);
       ii. else
          A. Candidates_Sets += Comb;
       iii. Feasible_Bases -= Comb;
       iv. Comb := Get_Next_Combination(Feasible_Bases);
   (d) i++;
   (e) Feasible_Bases := Gen_Comb_by_SS(i, Bases, Candidates_Sets, M);
5. return Bases;

Figure 4.11: AMDO: an algorithm for discovering bases
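The control flow of Figure 4.11 can be sketched in Python as follows. This is only an illustrative sketch, not the original implementation: the function signature, the three callbacks (`determines`, `gen_comb_by_fd`, `gen_comb_by_ss`) and the `seen` guard against re-queuing a combination are assumptions introduced here, and the check for FD-descendants (step 4ciB) is assumed to be folded into `gen_comb_by_fd`, which may return nothing.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

Comb = FrozenSet[str]


def seek_bases(
    root_concepts: Iterable[str],
    determines: Callable[[Comb], bool],
    gen_comb_by_fd: Callable[[Comb, Set[Comb]], Iterable[Comb]],
    gen_comb_by_ss: Callable[[int, Set[Comb], List[Comb]], Iterable[Comb]],
) -> Set[Comb]:
    """Sketch of the main loop of Figure 4.11 (hypothetical signature)."""
    bases: Set[Comb] = set()
    i = 1                                                 # step 2
    # Step 3: every root concept is a unary feasible base.
    feasible: List[Comb] = [frozenset([c]) for c in root_concepts]
    seen: Set[Comb] = set(feasible)  # sketch-only guard against re-queuing
    while feasible:                                       # step 4
        candidates: List[Comb] = []                       # step 4a
        while feasible:                                   # step 4c
            comb = feasible.pop(0)                        # steps 4b, 4ciii, 4civ
            if determines(comb):                          # step 4ci
                bases.add(comb)                           # step 4ciA
                for desc in gen_comb_by_fd(comb, bases):  # step 4ciB
                    if desc not in seen:
                        seen.add(desc)
                        feasible.append(desc)
            else:
                candidates.append(comb)                   # step 4ciiA
        i += 1                                            # step 4d
        # Step 4e: build the feasible bases of the next size.
        feasible = [c for c in gen_comb_by_ss(i, bases, candidates)
                    if c not in seen]
        seen.update(feasible)
    return bases
```

The callbacks stand in for the data check of Section 4.5.2.1 and the two generation functions; any concrete versions plugged in here would be placeholders rather than the functions defined in this chapter.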

For example, if {beginningDate, endingDate, rentalDuration} is a base, then by means of ISR we must add {beginningDate, endingDate, rentalDurationName}, {beginningDate, endingDate, minimumDuration}, {beginningDate, endingDate, maximumDuration} and {beginningDate, endingDate, timePeriod} to the feasible bases. Given a verified base, the number of new combinations added to the feasible bases by applying ISR is, in the worst case, linear in the number of direct FD-descendants the base has (see the properties of FD-edges in Section 4.5.1.2).
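The example above can be reproduced with a small Python sketch that replaces one concept of a verified base by each of its direct FD-descendants. The function name `isr_descendants` and the `FD_CHILDREN` mapping are hypothetical; the mapping only encodes the FD-edges implied by the example. The sketch deliberately omits the check that all direct FD-ancestors of a new combination are already known bases, so it is a simplification of what the actual generation function does.

```python
from typing import Dict, FrozenSet, List, Set

Comb = FrozenSet[str]

# FD-tree fragment assumed from the example: rentalDuration determines
# each of the four concepts listed below.
FD_CHILDREN: Dict[str, List[str]] = {
    "rentalDuration": [
        "rentalDurationName",
        "minimumDuration",
        "maximumDuration",
        "timePeriod",
    ],
}


def isr_descendants(base: Comb, fd_children: Dict[str, List[str]]) -> Set[Comb]:
    """Swap one concept of `base` for each of its direct FD-descendants."""
    result: Set[Comb] = set()
    for concept in base:
        for child in fd_children.get(concept, []):
            if child not in base:
                result.add((base - {concept}) | {child})
    return result


base = frozenset({"beginningDate", "endingDate", "rentalDuration"})
for comb in sorted(isr_descendants(base, FD_CHILDREN), key=sorted):
    print(sorted(comb))
```

Each direct FD-descendant yields at most one new combination, which matches the linear worst case stated above.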

Since the combinations generated fulfill the necessary conditions, we directly add them to the feasible bases set. Thus, each new combination is eventually checked against data. If it is a base, then we iteratively apply ISR and continue exploring its FD-descendants. Interestingly, note that an ISR-generated combination that is refuted as a base (i.e., the determines function returns false and it is therefore queued in the candidate sets) may give rise to bases of size greater than i when combined with other SS-descendants in step 4e. Following our example, suppose that {beginningDate, endingDate, rentalDurationName} happens to be a base, but not the rest of the FD-descendants generated. In this case, according to ISR, this combination could generate new FD-descendants, which would be queued in the feasible bases set (however, this is not the case here, as none of the concepts in this combination has FD-descendants).

function Gen_Comb_by_SS (int i, Set<Comb> Bases, Ordered_Set<Comb> Candidates_Sets, FdTree M) returns Set<Comb>
1. Set<Comb> Combinations := {};
2. For (int j = 0; j < sizeof(Candidates_Sets); j++)
   (a) CS1 := get_candidate_set(Candidates_Sets, j);
   (b) For (int z = j+1; z < sizeof(Candidates_Sets); z++)
       i. CS2 := get_candidate_set(Candidates_Sets, z);
       ii. if (Have_(i-1)_Concepts_In_Common(CS1, CS2) AND (i != 2 OR orthogonal(CS1, CS2)))
           A. if (Find_Subsets(CS1, CS2, Bases, Candidates_Sets, z)) then
              B. Combinations += Eliminate_Duplicates(CS1 ∪ CS2);
3. return Combinations;

Figure 4.12: AMDO: an algorithm to compute SS-descendants

The rest of the combinations would be queued in the candidate sets and could eventually generate (i+1)-sized sets (for example, {beginningDate, endingDate, minimumDuration, carGroup} from {beginningDate, endingDate, minimumDuration}).

When the current iteration of the algorithm is done (i.e., all the i-sized combinations have been treated; see step 4c), the gen_comb_by_SS function generates feasible bases of size i+1 from the i-sized candidate sets (see step 4e and Section 4.5.2.2 for further details). The algorithm iterates until no feasible bases of size i+1 can be generated.

4.5.2.1 The determines Function

This function is called when the three necessary conditions are guaranteed (i.e., we have identified a feasible base). Then, we verify whether this combination determines A by querying data. Before querying the instances, we introduce a final pruning rule:

Proposition 6. Let Z be a feasible base. We say that Z is still a feasible base if it is able to identify all instances of A. In other words, if the cardinality of A is less than or equal to the product of the cardinalities of the concepts in Z (i.e., $\prod_{Z_i \in Z} |Z_i| \geq |A|$).
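As a minimal sketch of this cardinality test, assuming the cardinalities have already been obtained by some means, the check reduces to one comparison. The function name `passes_cardinality_rule` and the numbers in the usage lines are illustrative assumptions, not values from the case study.

```python
import math
from typing import Iterable


def passes_cardinality_rule(card_a: int, concept_cards: Iterable[int]) -> bool:
    """Return True when the product of the |Z_i| is at least |A|."""
    return math.prod(concept_cards) >= card_a


# Hypothetical figures: 1000 instances of A.
print(passes_cardinality_rule(1000, [10, 100]))  # product 1000, kept
print(passes_cardinality_rule(1000, [10, 50]))   # product 500, discarded
```

A combination that fails this test cannot tell all instances of A apart, so it can be discarded without querying the instances.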

Note that this pruning rule discards combinations by just querying the RDBMS catalog, as follows (expressed in Oracle syntax):

SELECT NUM_ROWS FROM USER_TABLES WHERE TABLE_NAME = t;

SELECT NUM_DISTINCT FROM USER_TAB_COLS WHERE TABLE_NAME = t AND COLUMN_NAME = c;

where t is the name of a table and c the name of a column. If the ontology concept maps to a relational table, the first query gives the cardinality of t; if it maps to a relational attribute, the second query gives the number of distinct values it has. Combinations satisfying this rule are still candidates to be a base, and we verify them with the following query (in Oracle syntax):

SELECT 'base' FROM DUAL
WHERE NOT EXISTS (SELECT attrSet FROM tables WHERE joinConds
                  GROUP BY attrSet HAVING COUNT(*) > 1)

function Find_Subsets (Comb CS1, Comb CS2, Set<Comb> Bases, Ordered_Set<Comb> Candidates_Sets, int z) returns Boolean
1. Set<Comb> SubSets := Generate_Subsets(CS1, CS2);
2. For (int i = 0; i < sizeof(Bases); i++)
   (a) BaseAux := get_base(Bases, i);
   (b) if (BaseAux in SubSets) then
       i. return false;
3. For (int w = z+1; w < sizeof(Candidates_Sets); w++)
   (a) CSAux := get_candidate_set(Candidates_Sets, w);
   (b) if (CSAux in SubSets) then
       i. SubSets -= {CSAux};
4. foreach (SubSet in SubSets) do
   (a) if (all_root_concepts(SubSet))
       i. return false;
5. return true;

Figure 4.13: AMDO: an algorithm for generating (i+1)-sized combinations

where DUAL is the dummy table in Oracle, attrSet are the attributes forming the feasible base to be verified, tables is the list of tables containing those attributes, and joinConds are the join clauses needed to join the tables involved in the query. If we find two rows with the same values for the base hypothesis, then this combination is not a base according to data. Notice that we use a NOT EXISTS expression so that the RDBMS engine can stop the query as soon as it finds a counterexample for this combination.
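The same verification query can be tried out on a toy table with Python's standard `sqlite3` module. This is a sketch under several assumptions: SQLite stands in for Oracle, a one-row subquery replaces DUAL, the feasible base lives in a single table (so no join conditions are needed), and the `rental` table, its rows and the helper name `determines` are made up for illustration. Identifiers are interpolated directly into the SQL text, so they must come from trusted metadata.

```python
import sqlite3
from typing import Sequence


def determines(conn: sqlite3.Connection, table: str, attr_set: Sequence[str]) -> bool:
    """Return True when no two rows of `table` share the same attr_set values."""
    cols = ", ".join(attr_set)
    query = (
        "SELECT 'base' FROM (SELECT 1) AS dual "
        "WHERE NOT EXISTS ("
        f"SELECT {cols} FROM {table} GROUP BY {cols} HAVING COUNT(*) > 1)"
    )
    # A row comes back only if the subquery found no duplicated group.
    return conn.execute(query).fetchone() is not None


# Hypothetical data: two rentals share the same pair of dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental (beginningDate TEXT, endingDate TEXT, carGroup TEXT)"
)
conn.executemany(
    "INSERT INTO rental VALUES (?, ?, ?)",
    [
        ("2024-01-01", "2024-01-05", "A"),
        ("2024-01-01", "2024-01-05", "B"),
        ("2024-01-02", "2024-01-06", "A"),
    ],
)

print(determines(conn, "rental", ["beginningDate", "endingDate"]))
print(determines(conn, "rental", ["beginningDate", "endingDate", "carGroup"]))
```

With this toy data the pair of dates is refuted by the first two rows, whereas adding carGroup removes the duplicate.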

4.5.2.2 Generating Combinations of Size (i+1): The gen_comb_by_SS Function

Once the i-sized combinations have been verified (i.e., either proved to be bases and thus added to the bases set, or refuted and thus added to the candidate sets), the gen_comb_by_SS function (see Figure 4.12) generates (i+1)-sized combinations from the i-sized candidate sets obtained in the previous iteration of the algorithm. This function looks for pairs of sets having (i-1) concepts in common (see Figure 4.12, step 2bii). To do so, it scans the candidate sets in such an order that it does not generate the same pair twice (see the configuration of the two main loops; steps 2 and 2b). From every pair identified, it produces an (i+1)-sized combination (the paired combinations share i-1 concepts, so their union has i+1 concepts).
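The pairing step can be sketched in Python as follows. Here k denotes the size of the candidate sets, so the generated combinations have size k+1, and the orthogonality test is applied only when 2-sized sets are built (k = 1), which is how the condition in Figure 4.12 is read in this sketch. The names `pair_candidates`, `orthogonal` and `reaches`, and the dictionary encoding of FD-edges, are assumptions; the minimality check performed by the subset test of Figure 4.13 is left out.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

Comb = FrozenSet[str]


def reaches(src: str, dst: str, fd_children: Dict[str, List[str]]) -> bool:
    """True if dst is an FD-descendant of src (depth-first search)."""
    stack, seen = [src], {src}
    while stack:
        for child in fd_children.get(stack.pop(), []):
            if child == dst:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def orthogonal(c1: str, c2: str, fd_children: Dict[str, List[str]]) -> bool:
    """Neither concept appears in the FD-tree of the other."""
    return not reaches(c1, c2, fd_children) and not reaches(c2, c1, fd_children)


def pair_candidates(
    candidates: List[Comb], fd_children: Dict[str, List[str]]
) -> Set[Comb]:
    """Union every pair of k-sized candidate sets sharing k-1 concepts."""
    result: Set[Comb] = set()
    # combinations() visits each unordered pair exactly once.
    for cs1, cs2 in combinations(candidates, 2):
        k = len(cs1)
        if len(cs2) != k or len(cs1 & cs2) != k - 1:
            continue
        if k == 1 and not orthogonal(
            next(iter(cs1)), next(iter(cs2)), fd_children
        ):
            continue  # e.g. rentalDuration determines minimumDuration
        result.add(cs1 | cs2)  # a set union removes duplicates by itself
    return result
```

Collecting the unions in a set plays the role of the duplicate elimination in step 2(b)iiB.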

Importantly, this function only generates combinations fulfilling the three necessary conditions, as follows:

Prop. 4 is guaranteed in step 2(b)ii. A 2-sized set is generated only if the concepts combined are orthogonal (i.e., if neither is present in the FD-tree of the other). As shown in Section 4.5.3, if 2-sized sets are orthogonal, then Props. 3 and 4 guarantee that no non-orthogonal sets of size greater than 2 are generated. Thus, it is enough to check this proposition for 2-sized sets. In our example, after the first iteration of the algorithm, the gen_comb_by_SS function is called for i = 2 (note that we first increase i and then call this function; see step 4d). If the candidate sets contain {rentalDuration} and {minimumDuration}, they would not be combined to form {rentalDuration, minimumDuration}, since rentalDuration → minimumDuration.

Prop. 3 is guaranteed in step 2(b)iiA by the find_subsets function (described in Figure 4.13). This function generates all the i-sized subsets of the current (i+1)-sized set (note that it can be done in linear time: for an (i+1)-sized combination, we must generate i+1 subsets, each one omitting one of the concepts in the combination), and verifies that they are not bases, as follows:

(1) If any of the subsets of the (i+1)-sized set is in the base sets, then it must not be considered an (i+1)-sized feasible base (see step 2), since it is not minimal.

(2) If all the i-sized subsets are in the candidate sets, then we guarantee that the (i+1)-sized set is minimal (see Figure 4.13, step 3) and thus a feasible base.

(3) Alternatively, due to our pruning rules, it may happen that a subset is neither in the candidate sets nor in the base sets. In this case, we have two possible scenarios. On the one hand, (3.1) if the subset is composed only of root concepts, the (i+1)-sized combination must be refuted, since our algorithm is exhaustive regarding root concepts. Thus, we can be sure that this subset is an SS-descendant of a base (see step 4a). On the other hand, (3.2) in any other case, this subset is an FD-descendant of an i-sized feasible base refuted with data (see step 5) and therefore fulfills the three necessary conditions. We justify this decision in Section 4.5.3.

Finally, Prop. 5 is guaranteed by the candidate sets definition, since they have been refuted as bases.
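Cases (1), (2), (3.1) and (3.2) can be summarised in a short Python sketch. It follows the prose description rather than the index-based scan of Figure 4.13: membership is tested against the whole candidate set, and the set of root concepts is passed in explicitly. The function name, its signature and the `roots` parameter are assumptions made for illustration.

```python
from typing import FrozenSet, Set

Comb = FrozenSet[str]


def find_subsets(
    cs1: Comb,
    cs2: Comb,
    bases: Set[Comb],
    candidates: Set[Comb],
    roots: Set[str],
) -> bool:
    """Return True if cs1 | cs2 may be kept as a feasible base."""
    union = cs1 | cs2
    # One subset per omitted concept: i+1 subsets of size i.
    subsets = [union - {concept} for concept in union]
    for subset in subsets:
        if subset in bases:
            return False      # (1) a subset is a base: not minimal
    for subset in subsets:
        if subset in candidates:
            continue          # (2) refuted with data: no objection
        if subset <= roots:
            return False      # (3.1) only root concepts, yet never refuted
        # (3.2) pruned FD-descendant of a refuted base: no objection
    return True
```

For the 3-sized set {beginningDate, minimumDuration, money}, the function returns False as soon as one of its 2-sized subsets is a known base, and True when all three are in the candidate sets.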

For example, suppose that the gen_comb_by_SS function combines {beginningDate, minimumDuration} and {beginningDate, money} to produce {beginningDate, minimumDuration, money}. This scenario could happen in the second iteration of the algorithm, when i = 3 and we combine 2-sized sets having one concept in common to eventually produce 3-sized sets. Since i != 2, Prop. 4 does not apply, and we just need to focus on the find_subsets function, which generates the 2-sized subsets of {beginningDate, minimumDuration, money}: {beginningDate, minimumDuration}, {minimumDuration, money} and {beginningDate, money}. According to (1), if any of them is a base, then the 3-sized set is not generated (since it is not minimal). Conversely, according to (2), if all of them are in the candidate sets, we can guarantee that the 3-sized set is minimal, and it is generated.

Consider now that, according to (3), the {beginningDate, money} and {beginningDate, minimumDuration} subsets are neither in the candidate sets nor in the base sets. In the first case, since the subset is composed of root concepts, the 3-sized set is not minimal (our algorithm is exhaustive for root concepts, so either beginningDate or money must be a base). In the second case, since {beginningDate, minimumDuration} is an FD-descendant of {beginningDate, rentalDuration}, it does not invalidate the 3-sized set. The reason is that {beginningDate, minimumDuration} could only have been generated by the gen_comb_by_FD function (had it been generated by the gen_comb_by_SS function, this subset would have been included in the base sets or candidate sets and thus considered by either (1) or (2)). Thus, {beginningDate, rentalDuration} must be a feasible base refuted with data. Consequently, since it was refuted as a base, Prop. 5 lets us foresee that {beginningDate, minimumDuration} would not be a base, and so it was not even generated. However, this set fulfills the three necessary conditions and therefore does not invalidate the 3-sized set.
