
R1  Woody Allen     hasWonPrize  Academy Award for Best Director
    Woody Allen     directed     Annie Hall
    Woody Allen     actedIn      Annie Hall

R2  Mel Gibson      hasWonPrize  Academy Award for Best Director
    Mel Gibson      directed     Braveheart
    Mel Gibson      actedIn      Braveheart

R3  Clint Eastwood  hasWonPrize  Academy Award for Best Director
    Clint Eastwood  directed     Million Dollar Baby
    Clint Eastwood  actedIn      Million Dollar Baby

Table 1.2: Results for the example query “directors who have won an Academy Award and movies they directed and in which they also acted”
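
To make the structure of these results concrete, the following is a minimal sketch of how such a triple-pattern query could be evaluated by joining patterns on shared variables. The variable names (?d, ?m), the helper functions and the extra non-matching director are assumptions made for illustration; the source does not prescribe this evaluator.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Toy knowledge base: the triples behind Table 1.2 plus one director
# without a matching actedIn triple (added here for illustration).
KB: List[Triple] = [
    ("Woody Allen", "hasWonPrize", "Academy Award for Best Director"),
    ("Woody Allen", "directed", "Annie Hall"),
    ("Woody Allen", "actedIn", "Annie Hall"),
    ("Mel Gibson", "hasWonPrize", "Academy Award for Best Director"),
    ("Mel Gibson", "directed", "Braveheart"),
    ("Mel Gibson", "actedIn", "Braveheart"),
    ("Clint Eastwood", "hasWonPrize", "Academy Award for Best Director"),
    ("Clint Eastwood", "directed", "Million Dollar Baby"),
    ("Clint Eastwood", "actedIn", "Million Dollar Baby"),
    ("Ang Lee", "hasWonPrize", "Academy Award for Best Director"),
    ("Ang Lee", "directed", "Life of Pi"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables; everything else is a constant."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Return the binding extended so that pattern matches triple, or None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            # Bind the variable, or reject if it is already bound to another value.
            if extended.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return extended


def evaluate(patterns: List[Triple], kb: List[Triple],
             binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Nested-loop join: match the first pattern, then recurse on the rest."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    for triple in kb:
        extended = match_pattern(patterns[0], triple, binding)
        if extended is not None:
            yield from evaluate(patterns[1:], kb, extended)


# Assumed triple-pattern form of the example query.
QUERY: List[Triple] = [
    ("?d", "hasWonPrize", "Academy Award for Best Director"),
    ("?d", "directed", "?m"),
    ("?d", "actedIn", "?m"),
]

results = [(b["?d"], b["?m"]) for b in evaluate(QUERY, KB)]
```

Each result corresponds to one row group (R1 to R3) of the table: a binding of ?d and ?m for which all three patterns match. The matching is Boolean, so the director without an actedIn triple is dropped entirely.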

1.4. Research Challenges

Fully utilizing RDF knowledge bases still requires addressing several major challenges, which we highlight next.

Data Incompleteness. Large RDF knowledge bases contain a vast amount of information in the form of SPO triples, obtained either from structured data sources or via automatic information extraction from semi-structured and textual sources. Nevertheless, the majority of information on the Web is available only as free text. Augmenting RDF knowledge bases with text can therefore widen their scope and make them much richer sources of information. For example, the set of RDF triples in Table 1.1 represents information about movies, directors, actors and awards. While this covers a wide range of interesting information, some information cannot be easily captured as RDF triples, for example movie plots, taglines and users’ comments. Such information naturally appears as free text, and omitting it altogether loses a lot of valuable information.

Flexible Querying. Even though triple-pattern queries are highly expressive, they are also very restrictive since they rely on Boolean matching (i.e., a result either matches a query or it does not). It is thus crucial to equip triple-pattern search with flexible querying capabilities and to support approximate matching, so that RDF knowledge bases can be searched more effectively. For example, consider our example query asking for directors who have won an Academy Award and movies they directed and in which they also acted. Directors who have been nominated for an Academy Award or have won a Golden Globe, and movies they directed and in which they also acted, are all potentially relevant results for the original information need. Similarly, directors or even actors who have won any award, and movies they directed and in which they also acted, are still somewhat relevant to the given query. Thus, allowing approximate matches can improve the recall of such advanced queries, especially for queries with an insufficient number of exact matches.
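
One way to picture approximate matching is as query relaxation: each relaxed query replaces one term of the original query with a related term or a variable, and carries a weight below 1.0. The sketch below is only an illustration under assumptions: the relaxation table, the predicate name wasNominatedFor, the variable ?award and all weights are hypothetical and not taken from the source.

```python
from typing import Dict, Iterator, List, Tuple

Pattern = Tuple[str, str, str]

# Assumed triple-pattern form of the example query.
QUERY: List[Pattern] = [
    ("?d", "hasWonPrize", "Academy Award for Best Director"),
    ("?d", "directed", "?m"),
    ("?d", "actedIn", "?m"),
]

# Hypothetical relaxation rules: (position in pattern, original term)
# maps to a list of (replacement term, weight in (0, 1)).
RELAXATIONS: Dict[Tuple[int, str], List[Tuple[str, float]]] = {
    (1, "hasWonPrize"): [("wasNominatedFor", 0.8)],
    (2, "Academy Award for Best Director"): [
        ("Golden Globe Award for Best Director", 0.7),
        ("?award", 0.4),  # any award at all
    ],
}


def relax(query: List[Pattern]) -> Iterator[Tuple[List[Pattern], float]]:
    """Yield (relaxed query, weight) for every single-term relaxation."""
    for i, pattern in enumerate(query):
        for pos, term in enumerate(pattern):
            for replacement, weight in RELAXATIONS.get((pos, term), []):
                relaxed = list(pattern)
                relaxed[pos] = replacement
                yield query[:i] + [tuple(relaxed)] + query[i + 1:], weight


# Relaxed queries ordered from closest to furthest from the original.
relaxed_queries = sorted(relax(QUERY), key=lambda qw: qw[1], reverse=True)
```

Evaluating the original query first and then the relaxed queries in decreasing weight order would return nominees, Golden Globe winners and winners of any award as approximate matches, which is where the recall gain comes from.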

In addition, assume that the user is interested in finding movies that have something to do with, say, boxing, or movies that were directed by controversial directors. Unless there exist explicit entities corresponding to “boxing” and “controversy”, and explicit relationships linking these entities to movies and directors, there is no way to express such queries. However, if RDF knowledge bases were extended with text and keyword conditions were allowed, this would go a long way toward addressing a wider range of information needs such as the ones just mentioned.
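
A keyword condition can be sketched as follows, assuming each triple is stored together with a free-text snippet. The data layout, the function names and the snippets themselves are invented for illustration; the source does not specify how text is attached to triples.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Toy text-augmented triples; the text snippets are made up for this example.
AUGMENTED: List[Tuple[Triple, str]] = [
    (("Clint Eastwood", "directed", "Million Dollar Baby"),
     "a veteran trainer coaches a determined woman in boxing"),
    (("Woody Allen", "directed", "Annie Hall"),
     "a comedian reflects on a failed romance in New York"),
    (("Mel Gibson", "directed", "Braveheart"),
     "a Scottish warrior leads a rebellion against the English king"),
]


def matches(pattern: Triple, triple: Triple) -> bool:
    """Single-pattern match; assumes each variable occurs once in the pattern."""
    return all(p.startswith("?") or p == t for p, t in zip(pattern, triple))


def keyword_search(pattern: Triple, keywords: Iterable[str],
                   data: List[Tuple[Triple, str]]) -> List[Triple]:
    """Return triples matching the pattern whose text contains every keyword."""
    wanted = {k.lower() for k in keywords}
    hits = []
    for triple, text in data:
        tokens = set(text.lower().split())
        if matches(pattern, triple) and wanted <= tokens:
            hits.append(triple)
    return hits


# "Movies that have something to do with boxing", without a boxing entity.
boxing_movies = [t[2] for t in keyword_search(("?d", "directed", "?m"),
                                              ["boxing"], AUGMENTED)]
```

The keyword condition is still Boolean in this sketch. A graded relevance score would be needed to rank results by how well they match the keywords.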

Finally, triple-pattern search, even when augmented with keywords, is still best suited to expert users or programming APIs. Average users are accustomed to keyword search, which is the prevailing paradigm for searching for information on the Web. It is thus beneficial to also support plain keyword search over RDF knowledge bases, at the cost of the expressiveness of triple-pattern queries. Although query expressiveness is sacrificed, keyword search over RDF knowledge bases still benefits from the conciseness of RDF data, which combines information from different sources: information that does not necessarily exist in any one source and thus could not be retrieved by traditional search engines.

Result Ranking. Large RDF knowledge bases may contain noisy or incorrect information, and thus queries may produce many results of highly varying quality, in particular when keyword conditions are allowed or approximate matching is used. It is thus highly desirable to present users with a ranked list of results rather than a mere list of unranked matches. For example, when asking for directors who have won an Academy Award and movies they directed and in which they also acted, it is essential to provide exact matches first, followed by any approximate matches. Also, if we add keyword conditions to such a query, say finding those movies that have something to do with “boxing”, the ranking of results should take into account how relevant they are to the keyword conditions. Finally, keyword search adds a level of ambiguity that is not present in triple-pattern search, and in that case result ranking is again crucial.
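
The ordering described here (exact matches first, then approximate matches, with keyword relevance as a secondary signal) can be sketched with a simple lexicographic sort key. This is an assumed toy scoring scheme, not the ranking model of the source; the Result fields and the result labels are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Result:
    label: str
    match_weight: float   # assumed: 1.0 for exact matches, below 1.0 for approximate
    keyword_score: float  # fraction of query keywords found in the result's text


def keyword_score(keywords: Sequence[str], text: str) -> float:
    """Fraction of keywords that occur as whole tokens in the text."""
    if not keywords:
        return 1.0
    tokens = set(text.lower().split())
    return sum(k.lower() in tokens for k in keywords) / len(keywords)


def rank(results: List[Result]) -> List[Result]:
    """Sort by match weight first (exact before approximate), then by keyword score."""
    return sorted(results, key=lambda r: (r.match_weight, r.keyword_score), reverse=True)


# Hypothetical results, named after how they were produced.
unranked = [
    Result("approx-high-kw", 0.7, keyword_score(["boxing"], "a film about boxing")),
    Result("exact-low-kw", 1.0, keyword_score(["boxing"], "a film about romance")),
    Result("exact-high-kw", 1.0, keyword_score(["boxing"], "a film about boxing")),
]
ranked_labels = [r.label for r in rank(unranked)]
```

A lexicographic key guarantees that no approximate match outranks an exact one. A weighted combination of the two signals would trade that guarantee for more flexibility.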

Efficient Query Processing. Triple-pattern search over RDF knowledge bases involves pattern matching, which becomes particularly expensive when keyword conditions are allowed and approximate matching is supported. Moreover, result ranking adds another level of complexity: in a naive approach, all matches for a given query must be identified, ranked according to some scoring function and then returned to the user in the order of their scores. Incremental retrieval and ranking of results is thus needed to improve the response time of such queries.
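
The idea of incremental retrieval can be illustrated with a lazy merge: if each source of results (say, the original query and each relaxed query) already yields its matches in decreasing score order, the top-k results can be produced without consuming every stream to the end. This is a simplified stand-in for illustration, not the processing algorithm of the source; the streams, scores and labels are hypothetical.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, Tuple

Scored = Tuple[float, str]  # (score, result label)


def scored_stream(results: Iterable[Scored], consumed: List[str]) -> Iterator[Scored]:
    """Yield results lazily and record which ones were actually read."""
    for item in results:
        consumed.append(item[1])
        yield item


def top_k(streams: List[Iterator[Scored]], k: int) -> List[Scored]:
    """Merge streams that are each sorted by decreasing score; stop after k results."""
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    return list(islice(merged, k))


# Hypothetical per-query result lists, each sorted by decreasing score.
exact = [(1.0, "e1"), (0.9, "e2")]
relaxed_a = [(0.8, "a1"), (0.7, "a2"), (0.6, "a3")]
relaxed_b = [(0.5, "b1"), (0.4, "b2")]

consumed: List[str] = []
best_two = top_k([scored_stream(s, consumed) for s in (exact, relaxed_a, relaxed_b)], k=2)
```

Only the head of each stream plus the items needed for the top two are read, so the remaining approximate matches are never materialized or scored.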

Result Diversity. While ranking ensures that the most relevant results appear on top, the top results often tend to be homogeneous, making it difficult for users interested in less popular aspects to find relevant results. For instance, for our example query we do not want the top results to be dominated by movies by the same director, by movies of the same genre or, if query reformulation is allowed, by people who have won the same award. Thus, result diversity can play a big role in giving users a broad view of the different aspects of the results matching their queries, and in ensuring that, on average, almost all users can find results relevant to their queries in the top ranks.
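
As a rough illustration of the trade-off between relevance and diversity, the sketch below uses a greedy re-ranking in the spirit of maximal marginal relevance (MMR). This technique is swapped in for illustration and is not claimed to be the method of the source. The similarity function (1.0 for the same director, 0.0 otherwise), the parameter lam and the candidates are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    movie: str
    director: str
    score: float  # relevance score from the ranking step


def similarity(a: Candidate, b: Candidate) -> float:
    """Assumed similarity: results by the same director count as redundant."""
    return 1.0 if a.director == b.director else 0.0


def diversify(candidates: List[Candidate], k: int, lam: float = 0.5) -> List[Candidate]:
    """Greedily pick k results, balancing relevance against redundancy."""
    selected: List[Candidate] = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def marginal(c: Candidate) -> float:
            redundancy = max((similarity(c, s) for s in selected), default=0.0)
            return lam * c.score - (1 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical ranked results: two of the top three share a director.
candidates = [
    Candidate("m1", "director A", 0.90),
    Candidate("m2", "director A", 0.85),
    Candidate("m3", "director B", 0.80),
    Candidate("m4", "director C", 0.60),
]
diverse_top3 = [c.movie for c in diversify(candidates, k=3)]
```

With pure relevance ranking the top three would be m1, m2 and m3. The redundancy penalty pushes m2 out in favor of a result by a third director.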

Knowledge Exploration. As mentioned earlier, results of queries over RDF knowledge bases are typically tuples of triples joined together. While this is a very concise representation of answers to users’ information needs, users often want to explore the knowledge base in order to learn more about a certain topic or subject. It is thus necessary to provide users with tools that allow them to interactively explore an RDF knowledge base.
