Algorithms 2 and 4 described in the previous section assume that there exists an instantiation list L_i for every triple pattern q_i. L_i contains all the triples t_i instantiating pattern q_i in descending order of their scores S(t_i, q_i), which are computed as follows:
S(t_i, q_i) = \frac{c(t_i)}{\sum_{t \in L_i} c(t)} \qquad (5.22)

where c(t_i) is the witness count of triple t_i.
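As a small worked illustration of Equation 5.22, the following sketch normalizes witness counts into scores and orders the triples by descending score. The function name `scores` and the example triples with their counts are hypothetical and chosen only for illustration; exact fractions are used so that the scores visibly sum to one.

```python
from fractions import Fraction


def scores(witness_counts):
    """Return (triple, score) pairs in descending score order.

    Each score is c(t_i) divided by the sum of c(t) over the whole list,
    as in Equation 5.22.
    """
    total = sum(witness_counts.values())
    if total == 0:
        raise ValueError("instantiation list has no witnesses")
    # Sorting by witness count is equivalent to sorting by score,
    # because every count is divided by the same total.
    ordered = sorted(witness_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(triple, Fraction(count, total)) for triple, count in ordered]


# Hypothetical witness counts for triples instantiating "?m hasGenre Thriller".
example = {
    ("Heat", "hasGenre", "Thriller"): 3,
    ("Se7en", "hasGenre", "Thriller"): 6,
    ("Alien", "hasGenre", "Thriller"): 1,
}
ranked = scores(example)
```

With these assumed counts the total is 10, so the three scores are 6/10, 3/10 and 1/10.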
Moreover, Algorithms 2 and 4 keep track of two values in order to compute score bounds: first(q_i), which is set to the maximum score of all the triples t_i ∈ L_i, and last(q_i), which is initialized to the maximum score of all the triples t_i ∈ L_i.
Thus, given a triple pattern q_i, we must carry out the following three tasks:
1. Compute the sum of witness counts of all triples t_i instantiating q_i.
2. Compute the maximum score of all triples t_i instantiating q_i.
3. Retrieve all the triples t_i instantiating triple pattern q_i, ordered by their scores S(t_i, q_i), which are computed according to Equation 5.22.
5.3. Data Store and Indices
We now explain how to carry out all these three tasks efficiently using our data store. We start with the third task and then follow with the first two tasks.
Efficient Retrieval of Triples. Given a triple pattern q_i, we can retrieve all the triples instantiating it by issuing a simple SQL select statement. For example, let q_i be:
?m hasGenre Thriller
The following select statement can be issued to retrieve all the triples instantiating this triple pattern, ordered by their witness counts:
SELECT SUBJECT, PREDICATE, OBJECT, WITNESSES FROM TRIPLES
WHERE PREDICATE = 'hasGenre' AND OBJECT = 'Thriller'
ORDER BY WITNESSES DESC
The ResultSet of the above SQL statement is then the instantiation list of triple pattern q_i. Each time the method q_i.next() is invoked from Algorithm 2 or 4, the next triple is retrieved from the ResultSet corresponding to q_i, and its score is computed by normalizing its witness count by the total witness count of all the triples in the ResultSet.
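The retrieval and normalization just described can be sketched as follows. This is only an illustration under stated assumptions: it uses Python's built-in `sqlite3` module with an in-memory database rather than the actual data store, the helper name `instantiation_list` is hypothetical, the example rows are invented, and only the case of a fixed predicate and object is handled.

```python
import sqlite3


def instantiation_list(conn, predicate, obj):
    """Yield (subject, predicate, object, score) in descending score order."""
    where = "PREDICATE = ? AND OBJECT = ?"
    # Total witness count of all triples instantiating the pattern.
    total = conn.execute(
        f"SELECT SUM(WITNESSES) FROM TRIPLES WHERE {where}", (predicate, obj)
    ).fetchone()[0]
    if not total:
        return  # no triple instantiates the pattern
    cursor = conn.execute(
        "SELECT SUBJECT, PREDICATE, OBJECT, WITNESSES FROM TRIPLES "
        f"WHERE {where} ORDER BY WITNESSES DESC",
        (predicate, obj),
    )
    for s, p, o, witnesses in cursor:
        # Normalize the witness count by the total (Equation 5.22).
        yield s, p, o, witnesses / total


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TRIPLES "
    "(SUBJECT TEXT, PREDICATE TEXT, OBJECT TEXT, WITNESSES INTEGER)"
)
# Invented example data; the last row does not match the pattern.
conn.executemany(
    "INSERT INTO TRIPLES VALUES (?, ?, ?, ?)",
    [
        ("Heat", "hasGenre", "Thriller", 3),
        ("Se7en", "hasGenre", "Thriller", 6),
        ("Alien", "hasGenre", "Thriller", 1),
        ("Alien", "hasGenre", "Horror", 5),
    ],
)
results = list(instantiation_list(conn, "hasGenre", "Thriller"))
```

Because the function is a generator, each call to `next()` on it plays the role of q_i.next(): triples are pulled from the cursor one at a time instead of being materialized up front.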
In order to be able to efficiently process SQL statements like the one above, we make use of the following observation. Given a triple pattern q = (s, p, o) with subject s, predicate p and object o, there are three different possibilities: 1) all three components are variables, 2) two of the three components are variables, or 3) one of the three components is a variable.
In case all three components of the triple pattern q = (s, p, o) are variables, all the triples in table TRIPLES would instantiate this pattern. We thus create an index INDEX000 over all the fields in table TRIPLES where the triples are sorted in descending order based on their witness counts WITNESSES.
In case two of the three components of the triple pattern q = (s, p, o) are variables, there are three distinct cases: 1) s and p are variables, in which case all the triples with object o would instantiate the triple pattern, 2) p and o are variables, in which case all the triples with subject s would instantiate the triple pattern or 3) s and o are variables, in which case all the triples with predicate p would instantiate the triple pattern.
We thus create three indices over all the fields in table TRIPLES. In the first index, INDEX100, the triples are sorted by their subjects and then in descending order of their witness counts. In the second index, INDEX010, the triples are sorted by their predicates and then in descending order of their witness counts. In the third index, INDEX001, the triples are sorted by their objects and then in descending order of their witness counts. Using indices INDEX100, INDEX010 and INDEX001, we can quickly retrieve the set of triples with a given subject, predicate or object, respectively, where the triples are sorted in descending order of their witness counts.
Finally, in case one of the three components of the triple pattern q = (s, p, o) is a variable, there are again three distinct cases: 1) s is a variable, in which case all the triples with predicate p and object o would instantiate the triple pattern, 2) p is a variable, in which case all the triples with subject s and object o would instantiate the triple pattern, or 3) o is a variable, in which case all the triples with subject s and predicate p would instantiate the triple pattern.
We thus create three indices over all the fields in table TRIPLES. In the first index, INDEX110, the triples are sorted by their subjects, then by their predicates, and then in descending order of their witness counts. In the second index, INDEX011, the triples are sorted by their predicates, then by their objects, and then in descending order of their witness counts. In the third index, INDEX101, the triples are sorted by their subjects, then by their objects, and then in descending order of their witness counts. Using indices INDEX110, INDEX011 and INDEX101, we can quickly retrieve the set of triples with a given subject and a given predicate, a given predicate and a given object, or a given subject and a given object, respectively, where the triples are sorted in descending order of their witness counts.
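The seven indices follow a regular naming scheme: each digit of the suffix states whether the subject, predicate or object, in that order, is fixed (1) or a variable (0). The sketch below shows one way this scheme could be realized. It assumes SQLite via Python's `sqlite3` module, which may differ from the actual data store; the helper names `index_name` and `create_indices` are hypothetical, and appending the remaining columns after WITNESSES is one assumed reading of "over all the fields".

```python
import sqlite3

COLUMNS = ("SUBJECT", "PREDICATE", "OBJECT")
MASKS = ("000", "100", "010", "001", "110", "011", "101")


def index_name(s, p, o):
    """Name the index serving a pattern; None marks a variable component."""
    mask = "".join("0" if c is None else "1" for c in (s, p, o))
    if mask not in MASKS:
        # The text defines no index for a pattern without variables.
        raise ValueError("pattern must contain at least one variable")
    return "INDEX" + mask


def create_indices(conn):
    """Create the seven indices, each ending its sort key with WITNESSES DESC."""
    for mask in MASKS:
        fixed = [col for col, bit in zip(COLUMNS, mask) if bit == "1"]
        free = [col for col, bit in zip(COLUMNS, mask) if bit == "0"]
        # Fixed components first, then descending witness counts; the free
        # columns are appended so that the index covers all fields.
        keys = fixed + ["WITNESSES DESC"] + free
        conn.execute(f"CREATE INDEX INDEX{mask} ON TRIPLES ({', '.join(keys)})")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TRIPLES "
    "(SUBJECT TEXT, PREDICATE TEXT, OBJECT TEXT, WITNESSES INTEGER)"
)
create_indices(conn)
names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
```

For the running example, `index_name(None, "hasGenre", "Thriller")` selects INDEX011, the index sorted by predicate, then object, then descending witness count.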
Efficient Computation of Sum of Witness Counts and Maximum Scores.
To compute the sum of witness counts of all the triples instantiating a triple pattern q_i, we can issue an aggregation SQL statement. For example, let q_i be:
?m hasGenre Thriller
The following aggregation statement can be issued to retrieve the sum of witness counts of all the triples instantiating this triple pattern:
SELECT SUM(WITNESSES) FROM TRIPLES
WHERE PREDICATE = 'hasGenre' AND OBJECT = 'Thriller'
Similarly, to compute the maximum score of all triples instantiating a triple pattern, an aggregation SQL statement over the table TRIPLES can be issued. For example, the following SQL statement can be issued to compute the maximum witness count of all the triples instantiating our example triple pattern q_i:
SELECT MAX(WITNESSES) FROM TRIPLES
WHERE PREDICATE = 'hasGenre' AND OBJECT = 'Thriller'
The maximum score would then be the result of the above statement divided by the result of the previous one. To be able to efficiently process SQL statements like the ones above, we make use of a set of materialized views. In particular, we create 7 materialized views that account for all the possible triple patterns that our triples can instantiate. In these 7 materialized views, we store the sum of witness counts of all the triples that instantiate each possible triple pattern and the maximum witness count of all triples instantiating each possible triple