
We have explained how to construct a language model for each resource X, whether an entity or a relation. We now associate each resource X in our knowledge base with a list of candidate substitutions, defined as follows.

Definition 4.13: Substitution List

Given a resource X, a substitution list L is a list of resources Y, ordered by their similarity to X.

We first explain how to compute the similarity between two given resources X and Y, and then explain how we construct the substitution lists.

Similarity between Resources

The similarity between two resources X and Y is computed from the distance between their language models: the smaller the distance, the more similar the resources. Specifically, we use the square root of the Jensen-Shannon divergence (JS divergence) between the language models of the two resources X and Y, which is a metric, to measure the distance between the two resources. The JS divergence is defined as follows.

Definition 4.14: Jensen-Shannon Divergence

The Jensen-Shannon divergence between two probability distributions P and Q is a symmetric measure of the distance between them.

Given two probability distributions P and Q, the JS divergence between them is computed as follows:

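$$JS(P \| Q) = \tfrac{1}{2}\, KL(P \| M) + \tfrac{1}{2}\, KL(Q \| M) \tag{4.8}$$

where KL(R||S) is the Kullback-Leibler divergence (KL divergence) between two probability distributions R and S, which is computed as follows:

$$KL(R \| S) = \sum_{w} R(w) \log \frac{R(w)}{S(w)} \tag{4.9}$$

and M is the average of the two distributions:

$$M = \tfrac{1}{2}(P + Q) \tag{4.10}$$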

We use the square root of the JS divergence because it is a metric that takes values between 0 and 1. A value of 0 means that the two language models are identical, and larger values mean that the two resources are less similar.
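The following is a minimal sketch of this distance computation, not the implementation used in our framework. It assumes that a language model is given as a dictionary mapping words to probabilities, that logarithms are taken to base 2, and that the JS divergence weights each KL term by 1/2 (the standard definition). The function names `kl_divergence` and `js_distance` are ours, chosen for illustration.

```python
import math


def kl_divergence(r, s):
    """KL(R||S) = sum over w of R(w) * log2(R(w) / S(w)).

    Words with R(w) = 0 contribute nothing to the sum.
    Assumes S(w) > 0 wherever R(w) > 0.
    """
    return sum(rw * math.log2(rw / s[w]) for w, rw in r.items() if rw > 0)


def js_distance(p, q):
    """Square root of the JS divergence between two distributions (dicts)."""
    vocabulary = set(p) | set(q)
    # M is the average of P and Q, defined over the joint vocabulary.
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocabulary}
    js = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(js, 0.0))


# Hypothetical toy language models.
same = {"film": 0.5, "award": 0.5}
left = {"film": 1.0}
right = {"city": 1.0}

identical_distance = js_distance(same, same)  # identical models
disjoint_distance = js_distance(left, right)  # no shared words
```

Because M assigns non-zero probability to every word that P or Q uses, both KL terms are always finite, so no smoothing is needed at this step. Identical models yield a distance of 0, and models with no words in common yield the maximum distance of 1.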

Substitution Lists Construction

We have so far shown how to represent a resource and how to measure the similarity between two resources. To recap, for each resource X in the knowledge base KB, we construct a language model. The similarity between two resources X and Y is then computed from the distance between their language models, namely the square root of the Jensen-Shannon divergence (JS divergence) between the two language models. A substitution list for a resource X is then simply a list of the other resources, ranked in increasing order of the square root of the JS divergence between their language models and the language model of X, so that the most similar resources come first.
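The ranking step can be sketched as follows. This is an illustrative sketch under stated assumptions rather than our actual implementation: language models are dictionaries from words to probabilities, logarithms are base 2, each KL term in the JS divergence is weighted by 1/2, and the toy models and the function name `substitution_list` are hypothetical.

```python
import math


def js_distance(p, q):
    """Square root of the JS divergence between two distributions (dicts)."""
    js = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            js += 0.5 * qw * math.log2(qw / mw)
    return math.sqrt(max(js, 0.0))


def substitution_list(x, models):
    """Rank every resource other than x by its distance to x, closest first."""
    entries = [
        (y, js_distance(models[x], lm)) for y, lm in models.items() if y != x
    ]
    return sorted(entries, key=lambda entry: entry[1])


# Hypothetical toy language models for three relations.
models = {
    "directed": {"film": 0.6, "director": 0.4},
    "produced": {"film": 0.5, "producer": 0.5},
    "bornIn": {"city": 0.7, "birth": 0.3},
}

ranked = substitution_list("directed", models)
ranked_names = [name for name, _ in ranked]
```

In this toy example, produced shares the word "film" with directed and is therefore ranked ahead of bornIn, whose language model has no words in common with that of directed.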

Adding Variables to Substitution Lists

Recall that a triple-pattern query can be reformulated by replacing one of the resources that appear in it with a variable. We interpret replacing a resource with a variable as being equivalent to replacing that resource with any other resource in the knowledge base.

| Rank | Academy Award for Best Actor | Thriller | directed | bornIn |
|------|------------------------------|----------|----------|--------|
| 1 | BAFTA Award for Best Actor | Crime | actedIn | livesIn |
| 2 | Golden Globe Award for Best Actor Drama | Horror | created | originatesFrom |
| 3 | var | Action | produced | var |
| 4 | Golden Globe Award for Best Actor Musical or Comedy | Mystery | var | diedIn |
| 5 | New York Film Critics Circle Award for Best Actor | var | type | isCitizenOf |

Table 4.1: Example resources and their top-5 substitutions

To handle variable substitutions, we interpret replacing a resource X with a variable as replacing X with any other resource in the knowledge base. To carry this out, we construct a special language model that is a mixture of the language models of all resources in the knowledge base other than X. The similarity between the resource X and a variable is then computed using the square root of the JS divergence between the language model of X and this mixture model. With this technique, a variable is simply another entry in the substitution list of resource X.
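A minimal sketch of the variable entry is given below. The text above does not fix the mixture weights, so the sketch assumes a uniform mixture, which is our assumption and not necessarily the weighting used in the framework. As before, language models are assumed to be dictionaries from words to probabilities, logarithms are base 2, and the toy models and function names are hypothetical.

```python
import math


def js_distance(p, q):
    """Square root of the JS divergence between two distributions (dicts)."""
    js = 0.0
    for w in set(p) | set(q):
        pw, qw = p.get(w, 0.0), q.get(w, 0.0)
        mw = 0.5 * (pw + qw)
        if pw > 0:
            js += 0.5 * pw * math.log2(pw / mw)
        if qw > 0:
            js += 0.5 * qw * math.log2(qw / mw)
    return math.sqrt(max(js, 0.0))


def variable_model(x, models):
    """Uniform mixture (an assumption) of all language models except that of x."""
    others = [lm for y, lm in models.items() if y != x]
    mixture = {}
    for lm in others:
        for w, prob in lm.items():
            mixture[w] = mixture.get(w, 0.0) + prob / len(others)
    return mixture


def substitution_list_with_var(x, models):
    """Substitution list of x, with the variable ranked like any other entry."""
    entries = [
        (y, js_distance(models[x], lm)) for y, lm in models.items() if y != x
    ]
    entries.append(("var", js_distance(models[x], variable_model(x, models))))
    return sorted(entries, key=lambda entry: entry[1])


# Hypothetical toy language models for four relations.
models = {
    "directed": {"film": 0.6, "director": 0.4},
    "produced": {"film": 0.5, "producer": 0.5},
    "bornIn": {"city": 0.7, "birth": 0.3},
    "diedIn": {"city": 0.6, "death": 0.4},
}

ranked = substitution_list_with_var("directed", models)
```

Here the variable is ranked after produced but before bornIn and diedIn: the mixture model retains part of the probability mass of the shared word "film", so it is closer to directed than the two relations whose language models share no words with it.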

Table 4.1 shows example resources from an RDF knowledge base about movies. For each resource, it shows the top-5 substitutions from the substitution list of that resource. The entry var represents the variable substitution. Its position in the list indicates that, apart from the substitutions ranked above it, no other specific resource is more similar to the given resource than the variable.

4.2. Query Reformulation Framework

Pruning the Substitution Lists

Maintaining a substitution list for every resource in the knowledge base becomes impractical when these lists are long. Recall that the substitution list of a given resource contains all other resources in the knowledge base together with their similarities to that resource, so these lists can be extremely long in large knowledge bases. Pruning them is crucial to avoid a storage bottleneck, since the lists need to be maintained alongside the knowledge base. Pruning also benefits query processing: our query reformulation algorithm, described next, scans these lists to generate reformulated queries, which are then evaluated, and pruning keeps the number of such reformulated queries reasonable.

The most basic way to prune substitution lists is to use a pruning threshold: cut off the tail of the list as soon as the similarity score between a substitution and the resource the list belongs to falls below a predefined threshold. In our framework, we use the score of the variable substitution as this threshold. More precisely, any substitution ranked below the variable substitution is pruned and removed from the substitution list. As an example, consider the substitution list for the relation directed shown in Table 4.1. This substitution list can be pruned after the fourth entry. This is intuitive, since substitutions beyond the variable substitution are so dissimilar from the given resource that they may as well be ignored and represented by the variable substitution.
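The pruning rule can be sketched as follows, assuming a substitution list is stored as a Python list of resource names that is already ranked from most to least similar and that contains the entry var. The function name `prune_after_variable` is hypothetical, and the example ranking is the one given for directed in Table 4.1.

```python
def prune_after_variable(ranked, var="var"):
    """Keep all entries up to and including the variable; drop the rest.

    If the list holds no variable entry, it is returned unchanged.
    """
    if var not in ranked:
        return list(ranked)
    return list(ranked[: ranked.index(var) + 1])


# Top-5 substitutions for the relation directed, as listed in Table 4.1.
directed_substitutions = ["actedIn", "created", "produced", "var", "type"]

pruned = prune_after_variable(directed_substitutions)
# pruned == ["actedIn", "created", "produced", "var"]
```

Using the variable's own position as the cut-off means that no separate threshold needs to be tuned per resource, and each pruned list ends with the variable entry, which stands in for all the resources that were removed.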
