
To give a better feel for the output of the system, we start by presenting the results of all three approaches for the concept apple, shown in Table 8.28. Note that although each approach yields a score, the scores are not comparable across approaches, so the order of the facts carries more information than the score itself. For apple, the OOA approach does not provide a score for the first two facts, because they do not appear in the introduction of the Wikipedia entry. The largest differences between facts in this example are, first, the OOA result for genus Malus compared with the other two approaches (because of the more scientific nature of the Wikipedia entry) and, second, the fact “apple is eatable”, especially under SPMI (users do not normally use the term eatable for apple, even when they mention eating it; if we replace eatable with eat, the order changes to 3).

Table 8.28: Order of facts for the concept apple (a dash marks a fact for which OOA provides no order).

| Fact | SPMI | OOA | Games |
|---|---|---|---|
| hasProperty Green | 1 | - | 3 |
| hasProperty Sweet | 2 | - | 4 |
| isA Fruit | 3 | 2 | 1 |
| usedFor Cooking | 4 | 6 | 6 |
| grownOn Apple tree | 5 | 1 | 5 |
| growsFrom Seed | 6 | 5 | 7 |
| usedFor Cider | 7 | 8 | 8 |
| hasProperty Eatable | 8 | 7 | 2 |
| grownIn Central Asia | 9 | 4 | 9 |
| belongsTo Genus Malus | 10 | 3 | 10 |

The main metric we consider for comparing the approaches is the correlation of orders, which shows how similar the orderings produced by the three approaches are to each other. For this purpose we use the “Pearson Product-Moment Correlation Coefficient” as presented in Equation 8.8. Subtracting the average order in the equation helps remove some of the noise that results from the missing orders of some facts in the OOA approach. An example of this situation is the highest correlation in our data-set, between games and OOA for the concept “hate”, where OOA provides the orders of only 4 out of 9 total facts for this concept.

$$
r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^{2} \, \sum (y - \bar{y})^{2}}} \tag{8.8}
$$
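As a minimal sketch of how Equation 8.8 can be evaluated, the following Python snippet computes the coefficient for the SPMI and Games orders of the concept apple from Table 8.28. The function name `pearson` and the variable names are illustrative assumptions, and the OOA column is left out because the handling of its two missing orders is not specified here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Numerator of Equation 8.8: sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Denominator terms: sums of squared deviations.
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Orders for the concept apple, copied from Table 8.28
# (SPMI and Games columns only; OOA has two missing orders).
spmi_order = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
games_order = [3, 4, 1, 6, 5, 7, 8, 2, 9, 10]

r_apple = pearson(spmi_order, games_order)
print(round(r_apple, 2))  # 0.67
```

The value of about 0.67 is in line with the .7 listed for Game & SPMI for Apple in Table 8.29.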

Figure 8.4 and Table 8.29 show the relation between the number of hits and the correlation of the different approaches. The data suggest that correlation tends to be higher for concepts with either a high or a low number of hits. To create this figure we ordered the concepts by number of hits and removed the raw numbers, mainly because the hit numbers differ so widely that they become meaningless after scaling. The correlations, on the other hand, show a similar trend across the different concepts in general. Despite this similarity in trend, the correlation between games and OOA differs the most, mainly because these two approaches are not related to hit numbers in any way. As mentioned previously, the highest correlation we have is for the concept hate between games and OOA, whereas the lowest correlation is between OOA and SPMI for the concept Armadillo, with -0.97. In both cases OOA does not cover all the facts in the data-set (for Armadillo, four out of nine facts were found in the Wikipedia article).

Figure 8.4: Number of hits and correlation relation

Table 8.29: Concepts and correlation of approaches

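| Concept | Game & OOA Corr | Game & SPMI Corr | OOA & SPMI Corr |
|---|---|---|---|
| Apple | .19 | .7 | .32 |
| Piano | .71 | -.78 | -.76 |
| Cow | .71 | .63 | -.2 |
| Hatred | 1 | .9 | .4 |
| Math | .38 | .08 | .2 |
| IPad | .73 | .59 | .38 |
| Armadillo | -.77 | .6 | -.97 |
| Brownie | .9 | .88 | .9 |
| Barack Obama | .44 | .85 | .29 |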

Next, we discuss the run time of OOA and SPMI. Games have no specific time window in which to run, and the time each user takes to play a round is up to the user, so we do not consider the games' run times in this part of the work. In the following, we first discuss the time required to retrieve the data for one fact and then extend this to the run time of the full measurement for each concept.

For each fact, SPMI requires two search queries and one simple calculation. For each OOA calculation, on the other hand, we have to retrieve one Web page and process on average 8 lines of text. For each sentence we remove the stop words, run a tokenizer, normalize the sentence, and then calculate the similarity of the fact to the sentence in question. Considering all these steps, the processing cost of this approach is higher than that of SPMI for one fact. We also ran a separate set of experiments in which the intermediary steps of the process (normalizing and tokenizing) were removed. This change simplifies and shortens the process considerably: we retrieve the page and search the introduction for the word in the fact, which depends only on the length of the text and has linear time complexity. Table 8.30 shows the results of these experiments. For one fact, the modified OOA is the fastest approach, and SPMI follows it by a small margin. The original OOA is slower than the other two but, based on the ordering results, performs around 30% better (it found 3 more facts out of 10 than the modified OOA).

When considering the scores for all the facts in one concept, the SPMI run time greatly exceeds that of the other approaches, because for all approaches the most time-consuming part is retrieving pages. In both OOA and modified OOA we retrieve the page only once and the rest of the process runs locally, whereas SPMI requires one search query for each fact, so its run time grows with the number of facts. Considering that SPMI always returns results and that in most cases these results are satisfactory, there is a trade-off between better speed (OOA and modified OOA) and better performance (SPMI).
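The two OOA variants described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the stop-word list, the Jaccard token overlap used as the similarity measure, the threshold, the function names, and the example sentences are all hypothetical, since the text does not specify them.

```python
import re

# Hypothetical, deliberately tiny stop-word list (not taken from the source).
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "in", "on", "and", "to", "for", "by"}


def normalize(text):
    """Lowercase the text, tokenize it, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def jaccard(tokens_a, tokens_b):
    """Token-set overlap; an assumed stand-in for the unspecified similarity."""
    set_a, set_b = set(tokens_a), set(tokens_b)
    if not set_a or not set_b:
        return 0.0
    return len(set_a & set_b) / len(set_a | set_b)


def ooa_position(fact, intro_sentences, threshold=0.15):
    """Original variant: 1-based index of the first sentence that is
    similar enough to the fact, or None if no sentence matches."""
    fact_tokens = normalize(fact)
    for position, sentence in enumerate(intro_sentences, start=1):
        if jaccard(fact_tokens, normalize(sentence)) >= threshold:
            return position
    return None


def modified_ooa_offset(fact_word, intro_text):
    """Modified variant: plain substring search, linear in the text length.
    Returns the character offset of the first match, or None."""
    offset = intro_text.lower().find(fact_word.lower())
    return None if offset < 0 else offset


# Made-up introduction sentences for illustration (not actual Wikipedia text).
intro = [
    "An apple is a fruit produced by an apple tree.",
    "Apples are used for cooking and for making cider.",
]
intro_text = " ".join(intro)

print(ooa_position("fruit", intro))             # 1
print(ooa_position("cider", intro))             # 2
print(ooa_position("seed", intro))              # None
print(modified_ooa_offset("seed", intro_text))  # None
```

In this sketch a fact that never matches yields `None`, which mirrors the way OOA provides no order for facts that are absent from the introduction.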

Table 8.30: Time consumption comparison of approaches (in seconds)

| Approach | Single fact | One concept |
|---|---|---|
| SPMI | 2.84 | 17.2 |
| OOA | 4.21 | 4.9 |
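The difference between the two columns can be illustrated by counting remote retrievals per concept. The sketch below is a simplified model and does not reproduce the measured timings; the function name `remote_requests` and the parameter `queries_per_fact` are assumptions, the latter because the text mentions both two queries for a single fact and one query per fact when scoring a whole concept.

```python
def remote_requests(approach, n_facts, queries_per_fact=1):
    """Count the network retrievals needed to score all facts of one concept."""
    if n_facts < 1:
        raise ValueError("n_facts must be at least 1")
    if approach == "SPMI":
        # Every fact needs its own search queries: linear growth in n_facts.
        return queries_per_fact * n_facts
    if approach in ("OOA", "modified OOA"):
        # The page is fetched once; all remaining work runs locally.
        return 1
    raise ValueError(f"unknown approach: {approach}")


for n in (1, 5, 10):
    print(n, remote_requests("SPMI", n), remote_requests("OOA", n))
```

Under this model the SPMI cost rises in proportion to the number of facts while the OOA cost stays constant, which is consistent with the gap in the one-concept column of Table 8.30.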

First, an important factor in all the results is the importance of the isA relation in our data-set. In more than 95% of the concepts, the isA fact achieved the highest rate of concept recognition. This result holds even in some of the cases where there are multiple instances of the isA relation for one concept. Still, there is an exception. For concepts whose category or named entity is so well known that it is considered common sense, so that discussing the concept does not require explicitly mentioning what kind of object it is, the rank decreases dramatically (especially in the SPMI and OOA approaches).
