5 LNG Transport in Galicia
5.1 Route Generation
We conducted experiments on a small data set to verify the applicability of our approach. For the first set of experiments we took the top 50 queries for each category of question (i.e. 50 for what, 50 for when, etc.) as retrieved from Google and used our algorithm to convert them into RDF triples. Table 3.4 shows the results of this experiment.
We can see from the table above that our algorithm performs consistently well across question types. However, Which questions are harder to convert, since they mostly contain complex nouns and complex verbs. Similarly, How many questions mostly contain a complex verb in which one verb is expressed as part of another. Our approach cannot currently handle such relationships, hence the low conversion rate for these types of questions.
For the second set of experiments we considered question subsets from Dataset [2004] and took a sample data set from Data [2012]. We compared the answers generated by our approach with the first answer returned by a Google query. We used the RDF triple generated by the query translation to look for answers in the RDF files, and submitted the same query to the Google search engine. We used Wikipedia as the information source for semantic search in these experiments. We used 20 random questions of each type (as defined in Table 1) and repeated the experiment 10 times. We then averaged the numbers; Table 3.5 shows the results of our experiments.

Table 3.5: Question types and subtypes

Question Word   Google Answer Ratio   SW Answer Ratio
What            0.57                  0.52
When            0.85                  0.90
Where           0.83                  0.95
Which           0.50                  0.15
Who             0.77                  0.66
How             0.50                  0.50
How old         0.36                  0.56
How long        0.98                  0.98
How many        0.50                  0.50
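The sampling protocol described above (20 random questions per type, 10 repetitions, averaged) could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `average_answer_ratio`, the `answered_correctly` callback, and the fixed seed are hypothetical and do not come from the original experiments.

```python
import random
from statistics import mean


def average_answer_ratio(pool, answered_correctly, sample_size=20, runs=10, seed=0):
    """Average the fraction of correctly answered questions over several runs.

    pool               -- all available questions of one type
    answered_correctly -- hypothetical callback: question -> bool
    sample_size, runs  -- 20 and 10, as stated in the text
    seed               -- assumption, added only to make the sketch repeatable
    """
    rng = random.Random(seed)
    ratios = []
    for _ in range(runs):
        # Draw a fresh random sample of questions for this run.
        sample = rng.sample(pool, sample_size)
        hits = sum(1 for question in sample if answered_correctly(question))
        ratios.append(hits / sample_size)
    return mean(ratios)


if __name__ == "__main__":
    # Made-up stand-in data: 50 placeholder questions, a dummy correctness check.
    questions = [f"question {i}" for i in range(50)]
    print(average_answer_ratio(questions, lambda q: q.endswith("0")))
```

The same function would be called once per question type and once per system (Google or the semantic approach) to fill one cell of the table.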
We can see from the results that our technique performs fairly well in comparison to the Google search results. Our approach performs better than Google for When, Where and How old questions, whereas Google does better on What, Which and Who questions. The two produce the same ratios for How, How long and How many questions.
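The grouping of question types stated above can be checked directly against the ratios in Table 3.5. The short sketch below does that; the values are copied from the table, while the names `RATIOS` and `compare` are illustrative only.

```python
# Answer ratios copied from Table 3.5: (Google, semantic-web approach).
RATIOS = {
    "What": (0.57, 0.52),
    "When": (0.85, 0.90),
    "Where": (0.83, 0.95),
    "Which": (0.50, 0.15),
    "Who": (0.77, 0.66),
    "How": (0.50, 0.50),
    "How old": (0.36, 0.56),
    "How long": (0.98, 0.98),
    "How many": (0.50, 0.50),
}


def compare(ratios):
    """Group question types by which system has the higher answer ratio."""
    groups = {"ours": [], "google": [], "tie": []}
    for qtype, (google, ours) in ratios.items():
        if ours > google:
            groups["ours"].append(qtype)
        elif google > ours:
            groups["google"].append(qtype)
        else:
            groups["tie"].append(qtype)
    return groups


groups = compare(RATIOS)
for winner, qtypes in groups.items():
    print(winner, qtypes)
```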
The results also show that the best outcomes were obtained for quantitative questions. We attribute this to the fact that, when an answer is found for such a question, it is very likely to be correct, since a no-match is simply counted as an invalid answer. Text-based answers had the lowest ratio of successful answers. The reason is that our technique works on exact matching, whereas these questions may have multiple answers, all of which could be valid. This makes it very difficult to identify the best answer for text-based questions and to place much confidence in any one candidate. Similarly, number-based answers (e.g. when, how far) showed better results. These results give us better insight into the semantic QA process. This limited set of experiments shows the applicability of our approach and serves as a proof of concept for our solution.
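To make the effect of exact matching concrete, here is a minimal sketch of how such a scoring rule might behave. It assumes a simple case- and whitespace-insensitive string comparison, which is a guess and not necessarily the matching rule used in the experiments; the helper names and the sample pairs are hypothetical.

```python
def normalize(text):
    """Lower-case and collapse whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def is_successful(candidate, reference):
    """Exact-match check; a missing answer (None) counts as invalid."""
    if candidate is None:
        return False
    return normalize(candidate) == normalize(reference)


def answer_ratio(pairs):
    """Fraction of (candidate, reference) pairs that match exactly."""
    hits = sum(1 for candidate, reference in pairs if is_successful(candidate, reference))
    return hits / len(pairs)


# Illustrative pairs only: a numeric answer matches exactly, a textual
# variant of a valid name does not, and a no-match is counted as invalid.
pairs = [
    ("1982", "1982"),
    ("Stanley Prusiner", "Stanley B. Prusiner"),
    (None, "1982"),
]
print(answer_ratio(pairs))
```

Under this rule a numeric answer either matches or it does not, while a textual answer can be valid and still be rejected because it is worded differently from the reference.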
Our proof-of-concept experiments highlighted some problem points of our approach. The first concerns the semantic meaning of a combination of query words. For example, in the question "what is the biggest hit of Insane Clown Posse" we have to interpret the term "biggest hit". There are multiple possible interpretations, e.g. the biggest hit in terms of revenue, popularity, or number of records sold. The missing information could be guessed using heuristic measures based on word frequency, e.g. biggest hit is mostly associated with the number of albums sold. However, this is not the semantic meaning of the combination of words, so questions of this type are difficult to answer.

The second issue arises when a question has a domain-specific, multi-stage answer. For the question "who discovered prions" there is no single-subject answer, since the discovery of prions is attributed to three different stages. During the 1960s, radiation biologist Tikvah Alper and mathematician John Stanley Griffith developed the hypothesis; Francis Crick recognized the potential importance of the Griffith protein-only hypothesis for scrapie propagation in his book; and finally, in 1982, Stanley B. Prusiner of the University of California, San Francisco announced that his team had purified the hypothetical infectious prion. Hence a lot of domain-specific knowledge is needed to construct answers for single questions of this kind.

The third issue is with questions such as "who was the lead singer of nirvana". Our current N-gram approach ranks the N-gram lead singer as the most appropriate search tuple, since it has a very high frequency. A human could translate lead singer into singer by knowing the semantic rule that if a band has only one singer, then he or she must be the lead singer, but domain-specific rules of this type do not exist in question answering systems. One approach to overcoming this problem is to use the root of the N-grams being searched; however, this is computationally expensive and does not work in all cases.
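The lead singer problem can be illustrated with a small sketch. It assumes that N-grams are ranked purely by a frequency table and that the "root" of an N-gram is its last word; both are simplifying assumptions made for illustration, and the frequency counts and property vocabulary below are invented, not taken from the experiments.

```python
def ngrams(tokens, n_max=2):
    """All contiguous n-grams of length 1..n_max, joined into strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(tokens) - n + 1)
    ]


def rank_by_frequency(candidates, freq):
    """Order candidate n-grams from most to least frequent."""
    return sorted(candidates, key=lambda gram: freq.get(gram, 0), reverse=True)


def resolve_property(ngram, known_properties):
    """Try the n-gram itself, then fall back to its last word (assumed root)."""
    if ngram in known_properties:
        return ngram
    root = ngram.split()[-1]
    return root if root in known_properties else None


# Made-up frequency counts and knowledge-base properties, for illustration only.
FREQ = {"lead singer": 900, "singer": 400, "lead": 300}
KNOWN = {"singer"}

ranked = rank_by_frequency(ngrams(["lead", "singer"]), FREQ)
best = ranked[0]
print(best, "->", resolve_property(best, KNOWN))
```

The fallback adds a second lookup for every unmatched N-gram, which hints at the extra cost mentioned above, and it fails whenever the last word is not itself a usable property.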