1.2 CURRENT STATE OF KNOWLEDGE
1.2.1 LOCALLY DEVELOPED EXPERIENCES
After each strategy has been executed and STROMA has calculated a relation type, the final type is checked once more for validity; this step is part of the type verification. The verification is necessary because the strategies only regard the relation between the leaf concepts of the matching paths, yet the relation type may also depend on the position of the concepts within the ontology (the concept path). According to the evaluation results, is-a relations are especially susceptible to such erroneous decisions, but equal and part-of relations can also be erroneous because of the specifics of the ontology structures.
The verification is carried out by the verification module, the so-called verificator of STROMA. Each enriched correspondence is passed to this verificator, which comprises several verification techniques for each correspondence type. Correspondences of type related are not further verified, though, as there are no plausible techniques to prove or disprove such a co-hyponym relation.
Verification only becomes possible if at least one of the two matching concepts has a parent concept and the concepts are atomic. Therefore, the verificator has the same pre-conditions as the Structure Strategy.
6.7.1 Verifying Is-a Relations
In some cases, an obvious is-a correspondence like (Children shoes, is-a, Shoes) turns out to be an equal relation if the overall correspondence path is analyzed:
Clothing.Children_shoes ↔ Clothing.Children.Shoes
This correspondence is obviously an equal correspondence, because both concepts express the same thing (shoes for children). However, several strategies (e.g., the Compound Strategy) would have suggested an is-a relation, because the two concepts Children shoes and Shoes are in an is-a relation. The is-a verification investigates the parent element of each leaf concept and tries to figure out whether the is-a relation seems justified or whether an equal relation might be correct.
In a correspondence between concepts X, Y, let X0 resp. Y0 be the parent concept of X resp. Y. The verificator uses several methods to prove or disprove that either (X0 + X) = Y or (Y0 + Y) = X holds. The operator + can be interpreted as "combined" or "concatenated" and only serves to illustrate the problem. In the following, we will focus on the case (Y0 + Y) = X, as in the above example, but the argumentation for the other case is the same.
The first and simplest method is to concatenate Y0 and Y, put a space, underscore or hyphen in between and check whether the result matches X. This would already work in the above example. It holds Y0 = Children, Y = Shoes, and the concatenation of the two terms, including a space character, yields the term X = Children Shoes.
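The concatenation check can be illustrated with a minimal sketch. It is not STROMA's implementation: the function names and the dot-separated path format are assumptions made for illustration, and instead of trying each separator in turn, the sketch normalizes spaces, underscores and hyphens to a single space before comparing.

```python
import re


def normalize(term):
    """Lower-case a term and map spaces, underscores and hyphens to one space."""
    return re.sub(r"[\s_\-]+", " ", term).strip().lower()


def concatenation_matches(x_path, y_path):
    """Check whether (Y0 + Y) = X for dot-separated concept paths.

    X is the leaf of x_path; Y0 and Y are the parent and leaf of y_path.
    (Hypothetical helper; the path format is an assumption.)
    """
    x = x_path.split(".")[-1]
    y_parts = y_path.split(".")
    if len(y_parts) < 2:
        return False  # Y has no parent concept, so nothing can be verified
    parent, leaf = y_parts[-2], y_parts[-1]
    return normalize(parent + " " + leaf) == normalize(x)


print(concatenation_matches("Clothing.Children_shoes", "Clothing.Children.Shoes"))
print(concatenation_matches("Clothing.Baby_shoes", "Clothing.Children.Shoes"))
```

The first call succeeds because Children + Shoes equals Children_shoes after normalization; the second does not, so the original is-a type would be left unchanged by this method alone.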
Much more complex scenarios are conceivable, though. For instance, let us assume that the source concept is not Children shoes, but Kids shoes. In this case, background knowledge is used to discover the equivalence between the two concepts. As will be shown in Part III, SemRep is able to discover an equal relation between Children shoes and Kids shoes because of the equivalence between Children and Kids.
Furthermore, let us assume the source concept is not Children shoes or Kids shoes, but Shoes for children. To some degree, STROMA is able to handle even such a case and recognize the equivalence between the two concepts. In this case, the concepts Y0 and Y are concatenated again and a lexicographic matcher calculates a similarity value between (Y0 + Y) and X. This matcher uses the similarity measures Trigram, Jaccard Distance and Jaro-Winkler Distance [133]. If the similarity value is above a specific threshold, the concepts are considered to be equivalent. In the above case, the similarity between Shoes for Children and Children Shoes would be high enough for STROMA to consider the correspondence an equal relation.
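The lexicographic comparison can be sketched as follows. This is a simplified illustration under several assumptions, not STROMA's matcher: it combines only a token-based Jaccard similarity and a character-trigram similarity (computed here as the Jaccard coefficient over trigram sets), it omits Jaro-Winkler for brevity, it averages the two scores, and the threshold of 0.5 is a hypothetical value, since the text does not state the actual threshold or how the measures are combined.

```python
import re


def normalize(term):
    """Lower-case a term and map spaces, underscores and hyphens to one space."""
    return re.sub(r"[\s_\-]+", " ", term).strip().lower()


def jaccard(a, b):
    """Jaccard coefficient of two sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def token_jaccard(s, t):
    """Jaccard coefficient over the word sets of both terms."""
    return jaccard(set(normalize(s).split()), set(normalize(t).split()))


def trigram_similarity(s, t):
    """Jaccard coefficient over the character trigram sets of both terms."""
    def grams(u):
        u = normalize(u)
        return {u[k:k + 3] for k in range(len(u) - 2)}
    return jaccard(grams(s), grams(t))


def lexicographic_similarity(s, t):
    """Average of the two measures (the combination is an assumption)."""
    return (token_jaccard(s, t) + trigram_similarity(s, t)) / 2


THRESHOLD = 0.5  # hypothetical value, not taken from the source


def looks_equal(parent, leaf, x, threshold=THRESHOLD):
    """Compare (Y0 + Y) with X and report whether they seem equivalent."""
    return lexicographic_similarity(parent + " " + leaf, x) >= threshold


print(looks_equal("Children", "Shoes", "Shoes for children"))
print(looks_equal("Children", "Shoes", "Baby_shoes"))
```

Under these assumptions, Shoes for children is close enough to Children Shoes to be treated as equivalent, while Baby shoes is not.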
Such a verification is quite effective and can boost the mapping quality to some degree. However, it is important to avoid false conclusions, i.e., to prevent an erroneous type change of an originally correct is-a relation. For instance, the following correspondence is a true is-a correspondence, but looks very similar to the correspondence above:
Clothing.Baby_shoes ↔ Clothing.Children.Shoes
However, STROMA would recognize that this is no equal relation and would not change the type, because SemRep would not confirm any equal relation between baby and children resp. baby shoes and children shoes. Besides, the lexicographic overlap between Baby shoes and Children Shoes is too low, so no equal relation would be assumed.
6.7.2 Verifying Equal Relations
Similar to is-a relations, equal relations can be misleading as well, as the following example shows:
Clothing.Children.Shoes ↔ Clothing.Shoes.
Although the relation between the leaf concepts Shoes and Shoes is apparently equal, the concepts are in an is-a relation, because the left concept refers to children shoes, while the right concept refers to shoes in general.
Discovering such a pitfall is more difficult and more prone to errors than discovering a false is-a relation. Let X1, X2, ..., Xm be the concepts in the source path and Y1, Y2, ..., Yn the concepts in the target path, with Xi, Yj being the specific concepts within the paths and X1 and Y1 being the leaf concepts. It obviously holds X1 equal Y1, as the correspondence was denoted as an equal correspondence.
The implemented approach turns the equal type into is-a if there are i, j with 1 < i ≤ m and 1 < j ≤ n so that Xi equal Yj and i > j. By contrast, it turns it into inverse is-a if j > i holds.
This method is called Common Predecessor.
In the above example, X3 and Y2 are obviously equivalent. It therefore holds i = 3 and j = 2 and thus i > j. The introduced strategy works in this case, as it can be quite naturally assumed that Shoes (X1) must be more specific than Shoes (Y1) because of the additional element Children (X2) between the leaf and the two Clothing elements (X3, Y2), which is also illustrated in Fig. 6.5 a). However, this technique fails in many cases and can therefore either improve or degrade the overall result. A counterexample is depicted in Fig. 6.5 b), where the equal relation is correct. The strategy is impaired by the additional element Shirts in the left taxonomy, which is more fine-grained than the right taxonomy.
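The Common Predecessor check can be sketched as below. This is a minimal illustration, not STROMA's code. It assumes dot-separated paths written from root to leaf, uses case-insensitive string equality as a stand-in for the equal relation between predecessors, lets the indices range up to and including the root (as in the example, where i = 3 = m), and returns the first pair of equal predecessors with differing depth; the text does not specify how several such pairs would be resolved.

```python
def common_predecessor(source_path, target_path):
    """Re-type an equal correspondence based on equal predecessor concepts.

    Returns "is-a", "inverse is-a" or "equal" (unchanged).
    Hypothetical helper; path format and tie-breaking are assumptions.
    """
    # Reverse the paths so that index 1 (list position 0) is the leaf concept.
    xs = source_path.split(".")[::-1]
    ys = target_path.split(".")[::-1]

    # Scan all predecessor pairs (i, j), starting closest to the leaves.
    for i in range(2, len(xs) + 1):
        for j in range(2, len(ys) + 1):
            same = xs[i - 1].lower() == ys[j - 1].lower()
            if same and i != j:
                # A deeper source leaf suggests the source is more specific.
                return "is-a" if i > j else "inverse is-a"
    return "equal"


print(common_predecessor("Clothing.Children.Shoes", "Clothing.Shoes"))
print(common_predecessor("Clothing.Shoes", "Clothing.Children.Shoes"))
print(common_predecessor("Clothing.Shoes", "Clothing.Shoes"))
```

For the example above, the sketch finds X3 = Y2 = Clothing with i = 3 > j = 2 and therefore returns is-a. It would make the same decision whenever one taxonomy merely inserts an extra intermediate level, which is the failure mode described for Fig. 6.5 b).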
Experiments showed that the equal verification leads to worse average results. It is therefore not used in the default configuration, but can be enabled and disabled at any time.
6.7.3 Verifying Part-of Relations
Figure 6.5: Two examples where the Common Predecessor technique works correctly (a) and where it fails (b).
Part-of relations are less prone to such errors than is-a or equal relations, though they can still be misleading, as the following examples show:
Clothing.Zips ↔ Clothing.Zips.Pants
Home.Door_handles ↔ Home.Handles.Door
Both correspondences are actually of type equal, i.e., they both express zips for pants resp. door handles, but regarding only the leaf concepts, STROMA would decide on part-of relations. Such false part-of relations become possible if a specific concept can be part of different objects, as a zip can be part of pants, jackets, cardigans, etc. If the taxonomy does not follow the natural part-of hierarchy (clothing – pants – zip or home – door – handle), STROMA may decide on a part-of relation although it is an equal relation.
Since part-of relations occur less frequently than is-a or equal relations, such cases hardly occurred during the evaluations. Still, this problem was also addressed, and it requires a strategy similar to the is-a verification. Instead of combining the last two concepts of each concept path in their original order, which would yield terms like Zips Pants or Handles Door, the terms are swapped, i.e., leaf concept and parent concept are concatenated instead of parent concept and leaf concept. Then again, the verificator tries to figure out whether the concepts are equivalent by using match techniques, as well as the Compound Strategy and Background Knowledge Strategy.
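The swapped concatenation can be illustrated with a short sketch. It only covers the plain string comparison and not the Compound Strategy or Background Knowledge Strategy mentioned above; the function names and the dot-separated path format are assumptions made for illustration.

```python
import re


def normalize(term):
    """Lower-case a term and map spaces, underscores and hyphens to one space."""
    return re.sub(r"[\s_\-]+", " ", term).strip().lower()


def swapped_concatenation_matches(x_path, y_path):
    """Check whether (Y + Y0) = X, i.e. leaf first, then parent concept.

    X is the leaf of x_path; Y0 and Y are the parent and leaf of y_path.
    (Hypothetical helper; the path format is an assumption.)
    """
    x = x_path.split(".")[-1]
    y_parts = y_path.split(".")
    if len(y_parts) < 2:
        return False  # Y has no parent concept, so nothing can be verified
    parent, leaf = y_parts[-2], y_parts[-1]
    return normalize(leaf + " " + parent) == normalize(x)


print(swapped_concatenation_matches("Home.Door_handles", "Home.Handles.Door"))
```

Here the swapped term Door Handles matches Door_handles, so the part-of type would be changed to equal. A correspondence that lacks such a direct string match would have to be resolved by the additional match techniques, which this sketch does not attempt.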