
The approach of the previous section is the basis for the experiments reported in this section. We have developed a simple matcher that gives us full control over the extraction process. This matcher compares the class and property names of the ontologies to be aligned using the Levenshtein distance. On top of the matcher we compare the different extraction methods presented in the previous section. We use the following naming convention to refer to these methods.

t → 1:1 - A standard approach of extracting an alignment from a similarity matrix. First a threshold t is applied and then a 1:1 alignment is extracted from the remaining correspondences. In particular, we extract a 1:1 alignment that is optimal with respect to its sum of confidences.

t → 1:1 → ∆ - The standard approach is extended by a subsequent repairing step in which we compute a global optimal diagnosis. Thus, the 1:1 extraction is independent of the subsequent diagnostic approach.

9.2. EXTRACTING FROM A SIMILARITY MATRIX 113

t → ∆1:1 - After applying a threshold, the optimal 1:1 extraction method is combined with the diagnostic approach to compute a global optimal diagnosis in a single step.
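The first stage shared by all three methods, computing name similarities and applying the threshold t, can be sketched as follows. This is a minimal illustration, not the matcher used in the experiments: the normalisation of the Levenshtein distance to a similarity in [0, 1], the lower-casing of names, the inclusive comparison with t, and the function names `similarity` and `hypotheses` are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer name."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def hypotheses(names1, names2, t):
    """All name pairs whose similarity reaches the threshold t (assumed inclusive)."""
    result = []
    for n1 in names1:
        for n2 in names2:
            confidence = similarity(n1, n2)
            if confidence >= t:
                result.append((n1, n2, confidence))
    return result


# Hypothetical entity names, chosen only for illustration.
pairs = hypotheses(["Author", "Paper"], ["Authors", "Papers", "Review"], 0.8)
```

The resulting list of weighted correspondences is the set of hypotheses from which a 1:1 alignment is extracted and, depending on the method, repaired.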

We restrict our experiments to the CONFERENCE dataset. Note that our minimal matching system generates acceptable results for the CONFERENCE dataset, but cannot compete with the other systems on the other tracks. Moreover, we focus only on the global optimal diagnosis. The results of the previous chapter have shown that the global optimal diagnosis is, with respect to the quality of the repaired/extracted alignment, a better choice than the local optimal diagnosis.

The results of our experiments are presented in Table 9.2. In the first column we list the threshold t that we applied prior to any other extraction method. Then there are three blocks that contain the results for each of the extraction methods. Each block comprises three columns headed with the letters p, f, and r, which refer to precision, f-measure, and recall. In the two rightmost columns we present the difference between the first method (t → 1:1) and the second method (t → 1:1 → ∆), and the difference between the second method (t → 1:1 → ∆) and the third method (t → ∆1:1) in terms of f-measure. In the last row we show the average scores over all thresholds.
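For reference, the three scores p, f, and r can be computed for a single alignment as sketched below. The function name `evaluate` and the representation of correspondences as (source, target) pairs are assumptions made for the example. How the scores are aggregated over the test cases of the dataset is not specified here, so the sketch covers one alignment only.

```python
def evaluate(alignment, reference):
    """Return (precision, f-measure, recall) of an alignment against a reference."""
    alignment, reference = set(alignment), set(reference)
    correct = len(alignment & reference)
    precision = correct / len(alignment) if alignment else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, 0.0, recall
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, f_measure, recall


# Hypothetical alignment and reference, chosen only for illustration.
generated = {("A", "X"), ("B", "Y"), ("C", "Z")}
gold = {("A", "X"), ("B", "Y"), ("D", "W"), ("E", "V")}
p, f, r = evaluate(generated, gold)  # two of three correct, two of four found
```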

First of all, the results of repairing our simple matching system conform with the results we measured in the previous chapter. Repairing the alignments of our simple matching system yields results similar to repairing the alignments of an OAEI participant. On average we can increase the f-measure of the 1:1 alignment by 0.018. In the worst case (highest threshold) we gain 0.009 in f-measure and in the best case (lowest threshold) we gain 0.035. There are only a few exceptions to the general trend: the lower the threshold, the higher the gain in f-measure.

Our main interest with respect to these results, however, is related to the differences between the subsequent and the combined approach. The relevant differences in terms of f-measure are depicted in the last column. Contrary to our expectations, there are only minor differences that do not imply a general tendency. In some cases we do not observe any differences, sometimes results are slightly worse and sometimes slightly better. We cannot conclude that the f-measure can be improved with the combined approach.
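A toy example shows one way in which the sequential and the combined approach can produce different alignments. The sketch below is a brute-force illustration under strong assumptions: the conflict sets are simply given (in practice they would result from reasoning about incoherence), a global optimal diagnosis is assumed to remove the correspondences with the smallest total confidence, and all names and confidence values are hypothetical. It is only feasible for a handful of correspondences.

```python
from itertools import combinations

# Hypothetical correspondences: (source entity, target entity, confidence).
C_A = ("A", "X", 0.9)
C_B = ("B", "Y", 0.8)
C_C = ("B", "Z", 0.7)
CANDIDATES = [C_A, C_B, C_C]

# Assumed to be given: sets of correspondences that are jointly incoherent.
CONFLICTS = [frozenset({C_A, C_B})]


def is_one_to_one(subset):
    """No source entity and no target entity occurs more than once."""
    sources = [c[0] for c in subset]
    targets = [c[1] for c in subset]
    return len(set(sources)) == len(sources) and len(set(targets)) == len(targets)


def is_coherent(subset):
    """No conflict set is completely contained in the subset."""
    chosen = set(subset)
    return not any(conflict <= chosen for conflict in CONFLICTS)


def best_subset(candidates, accept):
    """Exhaustively find the accepted subset with the highest sum of confidences."""
    best, best_score = (), 0.0
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            score = sum(c[2] for c in subset)
            if accept(subset) and score > best_score:
                best, best_score = subset, score
    return list(best)


def sequential(candidates):
    """t → 1:1 → ∆: extract the best 1:1 alignment first, then repair it."""
    extracted = best_subset(candidates, is_one_to_one)
    return best_subset(extracted, is_coherent)


def combined(candidates):
    """t → ∆1:1: enforce the 1:1 constraint and coherence in a single step."""
    return best_subset(candidates, lambda s: is_one_to_one(s) and is_coherent(s))


print(sequential(CANDIDATES))  # [('A', 'X', 0.9)]
print(combined(CANDIDATES))    # [('A', 'X', 0.9), ('B', 'Z', 0.7)]
```

In this example the 1:1 extraction discards (B, Z) in favour of (B, Y), which the repair step then removes. The combined search keeps (B, Z) instead, so its result is larger. Whether such cases occur often enough to change the f-measure on real data is the question examined in this section.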

However, a tendency can be observed when we directly compare precision and recall of both approaches. The combined approach slightly increases recall but decreases the precision of the alignments at the same time. This is also illustrated in Figure 9.1. It depicts the precision/recall value pairs of both approaches in a precision/recall graph. We can see that the red curve (∆1:1) sits above the black curve (1:1 → ∆) for many different thresholds. However, there is also an offset to the left, which illustrates the lower precision.
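The tendency can be checked directly against the precision and recall values listed in the result table. The sketch below copies those values for the two approaches and computes the per-threshold differences. The variable and function names are chosen for the example only.

```python
# (threshold, p and r for 1:1 → ∆, p and r for ∆1:1), copied from the result table.
ROWS = [
    (0.625, 0.549, 0.49,  0.526, 0.497),
    (0.65,  0.585, 0.507, 0.567, 0.51),
    (0.675, 0.622, 0.507, 0.601, 0.507),
    (0.7,   0.665, 0.513, 0.654, 0.52),
    (0.725, 0.699, 0.5,   0.688, 0.503),
    (0.75,  0.738, 0.487, 0.737, 0.493),
    (0.775, 0.743, 0.49,  0.744, 0.493),
    (0.8,   0.761, 0.49,  0.763, 0.493),
    (0.825, 0.787, 0.484, 0.789, 0.49),
    (0.85,  0.804, 0.484, 0.801, 0.487),
    (0.875, 0.812, 0.48,  0.808, 0.48),
    (0.9,   0.821, 0.48,  0.817, 0.48),
    (0.925, 0.822, 0.467, 0.817, 0.467),
    (0.95,  0.822, 0.467, 0.817, 0.467),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Differences "combined minus sequential" for every threshold.
precision_diffs = [p_comb - p_seq for _, p_seq, _, p_comb, _ in ROWS]
recall_diffs = [r_comb - r_seq for _, _, r_seq, _, r_comb in ROWS]

mean_precision_diff = mean(precision_diffs)  # negative: precision decreases
mean_recall_diff = mean(recall_diffs)        # positive: recall increases

# Thresholds at which the combined approach has lower precision.
lower_precision = [row[0] for row, d in zip(ROWS, precision_diffs) if d < 0]
```

With these values, the recall of the combined approach is never below that of the sequential approach, while its precision is lower at 11 of the 14 thresholds.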

Another interesting aspect is related to the relation between threshold and recall. Recall values for high thresholds can be increased only to a very limited degree (from 0.447 to 0.52) by decreasing the threshold. A top score of 0.52 for recall is reached at a threshold of 0.7 for the combined ∆1:1 approach. Note that


         t → 1:1                 t → 1:1 → ∆             t → ∆1:1                +/- f-measure
t        p      f      r         p      f      r         p      f      r         repairing  rep vs. ext
0.625    0.471  0.483  0.497     0.549  0.518  0.49      0.526  0.511  0.497     0.035      -0.007
0.65     0.513  0.513  0.513     0.585  0.543  0.507     0.567  0.537  0.51      0.03       -0.006
0.675    0.551  0.53   0.51      0.622  0.559  0.507     0.601  0.55   0.507     0.029      -0.009
0.7      0.592  0.551  0.516     0.665  0.579  0.513     0.654  0.579  0.52      0.028      0
0.725    0.623  0.557  0.503     0.699  0.583  0.5       0.688  0.581  0.503     0.026      -0.002
0.75     0.679  0.569  0.49      0.738  0.587  0.487     0.737  0.591  0.493     0.017      0.004
0.775    0.686  0.574  0.493     0.743  0.591  0.49      0.744  0.593  0.493     0.016      0.003
0.8      0.702  0.58   0.493     0.761  0.596  0.49      0.763  0.599  0.493     0.017      0.003
0.825    0.745  0.589  0.487     0.787  0.599  0.484     0.789  0.605  0.49      0.01       0.006
0.85     0.759  0.591  0.484     0.804  0.604  0.484     0.801  0.606  0.487     0.013      0.002
0.875    0.778  0.594  0.48      0.812  0.604  0.48      0.808  0.602  0.48      0.01       -0.001
0.9      0.786  0.596  0.48      0.821  0.606  0.48      0.817  0.605  0.48      0.01       -0.001
0.925    0.79   0.587  0.467     0.822  0.596  0.467     0.817  0.595  0.467     0.009      -0.001
0.95     0.79   0.587  0.467     0.822  0.596  0.467     0.817  0.595  0.467     0.009      -0.001
∅        0.676  0.564  0.492     0.731  0.583  0.489     0.723  0.582  0.492     0.018      -0.001

Table 9.3: Extracting from a similarity matrix. The column entitled ‘repairing’ refers to the difference in f-measure between t → 1:1 and t → 1:1 → ∆. The column entitled ‘rep vs. ext’ refers to the difference between the sequential approach of first extracting a 1:1 alignment that is repaired afterwards and the approach of combining 1:1 extraction and resolving incoherence in one step (i.e., it compares t → 1:1 → ∆ against t → ∆1:1).

it is not possible for our simple matching system to exceed a certain degree of recall without losing a significant degree of precision. Before applying one of our extraction methods, directly after applying a threshold of 0.625, we have a precision of 0.26 and a recall of 0.529. This recall is also the upper bound for the recall of any extraction method that is applied to the set of hypotheses. A recall of 0.52 is thus a good result and shows that some of the effects described in the example of the previous section also occur for real matching problems. However, their impact is limited.
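The upper bound follows from the fact that an extraction method only selects correspondences from the thresholded set of hypotheses and never adds new ones. A short derivation, using notation introduced here for illustration only ($A_t$ for the hypotheses that remain after applying threshold $t$, $A'$ for an extracted alignment, and $R$ for the reference alignment):

```latex
\[
A' \subseteq A_t
\;\Longrightarrow\;
\mathrm{recall}(A') = \frac{|A' \cap R|}{|R|}
\;\le\; \frac{|A_t \cap R|}{|R|} = \mathrm{recall}(A_t),
\qquad
\mathrm{recall}(A_{0.625}) = 0.529 .
\]
```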

We can conclude that there are only minor differences between the sequential and the combined approach. Both approaches increase the quality of the alignment in terms of its f-measure to a similar degree. This differs from our expectations. With respect to recall we can observe a tendency. The combined approach results in an increased recall and a decreased precision. However, the results comply with our expectations only to a limited degree. The following two reasons have to be taken into account.

1. The simple string-based similarity measure, which is the basis of our matcher, cannot exceed a certain upper bound for recall. For that purpose an approach
