Scheme 16. Synthetic procedures used to modify graphene with short PE chains.
4.2.5.1. Comparison of the different strategies
Many techniques exist for semantic data integration. The following describes the rule-based techniques used in the development of this work.
Datalog
One of the data integration techniques used in this work is Datalog, a declarative programming language for deductive databases. Since Datalog rules represent clauses in the function-free Horn fragment of first-order logic (FOL), Datalog has also proven relevant for Semantic Web applications such as ontological modeling and reasoning [54]. A Datalog rule can be expressed as follows:
$$L_0 \;\text{:-}\; L_1, \ldots, L_n, \qquad n \geq 0 \qquad (2.1)$$
hasRefSemantic(X, T) ∧ hasRefSemantic(Y, Z) ∧ sameRefSemantic(T, Z) ⇒ sameAttribute(X, Y)
Listing 2.3: Example of a Datalog rule. The rule expresses the semantic equivalence of two elements, here Attributes of the AML standard, by checking whether the values of their respective semantic references are equivalent.
The atom $L_0$ is the head, while the set of atoms $L_1, \ldots, L_n$ is called the body. In other words, a Datalog rule is a function-free Horn clause. Every variable in the head of a rule must also appear in its body. A Datalog program is a finite set of Datalog rules. Datalog and OWL can be employed jointly since they share the same interpretations:
• OWL individuals are constants;
• OWL classes are unary predicates; and
• OWL properties are binary predicates.
For example, the rule in Listing 2.3 states that two attributes, X and Y, are considered the same when the values of their semantic references, T and Z, given by the predicate hasRefSemantic, are the same. In this case, hasRefSemantic, which is a binary predicate, can be seen as an OWL object property connecting two constants, i.e., OWL individuals.
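The evaluation of the rule in Listing 2.3 can be sketched in Python as a join over sets of facts. This is only an illustrative sketch, not the Datalog engine used in this work: the constants `attrX`, `attrY`, `attrW`, `refT`, `refZ`, and `refQ` and the function name `same_attribute` are hypothetical.

```python
# Hypothetical facts; the attribute and reference names are illustrative only.
has_ref_semantic = {
    ("attrX", "refT"),
    ("attrY", "refZ"),
    ("attrW", "refQ"),
}
same_ref_semantic = {("refT", "refZ")}


def same_attribute(has_ref, same_ref):
    """Derive sameAttribute(X, Y) by joining the three body atoms of the rule."""
    derived = set()
    for x, t in has_ref:          # hasRefSemantic(X, T)
        for y, z in has_ref:      # hasRefSemantic(Y, Z)
            if (t, z) in same_ref:  # sameRefSemantic(T, Z)
                derived.add((x, y))
    return derived


result = same_attribute(has_ref_semantic, same_ref_semantic)
print(result)
```

Only the pair `("attrX", "attrY")` is derived here, because `sameRefSemantic` holds only for `(refT, refZ)` in these sample facts and is not assumed to be symmetric.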
Probabilistic Soft Logic
Probabilistic Soft Logic (PSL) [55, 56] is a framework for collective, probabilistic reasoning that allows defining probabilistic models over continuous variables. The basic building blocks of PSL are: (1) atoms, which model the continuous random variables; (2) predicates, which describe relations or properties; and (3) rules, which combine predicates and atoms to capture dependencies or constraints of the domain, and from which PSL builds a joint probabilistic model over all atoms. Each rule has an associated non-negative weight that captures the relevance of the rule for a given domain. PSL uses soft truth values in the interval [0, 1], which allows similarity functions to be incorporated into the logical model. A PSL model is defined by a set of weighted rules in first-order logic, such as:
Component(A, X) ∧ Component(B, Y) ∧ SimilarAttributes(X, Y) ⇒ Component(A, B) | 5.0 (2.2)
PSL uses the Łukasiewicz t-norm and co-norm to relax the logical connectives AND (∧), OR (∨), and NOT (¬) as follows:
$$p \,\tilde{\wedge}\, q = \max\{0,\ p + q - 1\}, \qquad p \,\tilde{\vee}\, q = \min\{p + q,\ 1\}, \qquad \tilde{\neg}\, p = 1 - p$$
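The three relaxed connectives can be written directly as Python functions. This is a minimal sketch for illustration; the function names `luk_and`, `luk_or`, and `luk_not` are assumptions and are not part of any PSL API.

```python
def luk_and(p, q):
    """Lukasiewicz t-norm: relaxed conjunction of two soft truth values."""
    return max(0.0, p + q - 1.0)


def luk_or(p, q):
    """Lukasiewicz t-co-norm: relaxed disjunction of two soft truth values."""
    return min(p + q, 1.0)


def luk_not(p):
    """Relaxed negation of a soft truth value."""
    return 1.0 - p


# Soft truth values chosen only for illustration.
print(luk_and(0.75, 0.5))
print(luk_or(0.75, 0.5))
print(luk_not(0.25))
```

For the crisp values 0 and 1, these functions coincide with the Boolean connectives; for example, `luk_and(1.0, 0.0)` is 0 and `luk_or(1.0, 0.0)` is 1.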
A rule is grounded by substituting constants for the variables in its atoms. A ground rule has the form $r \equiv r_{body} \rightarrow r_{head} \equiv \tilde{\neg}\, r_{body} \,\tilde{\vee}\, r_{head}$, where $r_{body}$ and $r_{head}$ are logical formulas composed of atoms and logical operators. An interpretation $I$ over the atoms in $r$ determines whether $r$ is satisfied and, if not, its distance to satisfaction. The rule $r$ is satisfied, i.e., $I(r) = 1$, iff $I(r_{body}) \leq I(r_{head})$. Under an interpretation $I$, the rule's distance to satisfaction is defined by the following equation:
$$\phi_r(I) = \max\{0,\ I(r_{body}) - I(r_{head})\} \qquad (2.3)$$
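Equation 2.3 can be sketched in Python as follows. This is only an illustration: it assumes that the rule body is a conjunction of atoms combined pairwise with the Łukasiewicz t-norm, and the function names and truth values are hypothetical.

```python
from functools import reduce


def luk_and(p, q):
    """Lukasiewicz t-norm for two soft truth values."""
    return max(0.0, p + q - 1.0)


def distance_to_satisfaction(body_values, head_value):
    """Distance to satisfaction of a ground rule body => head (Equation 2.3).

    body_values holds the soft truth values of the body atoms, which are
    assumed to be joined by conjunction.
    """
    # Starting from 1.0 is neutral, since luk_and(1.0, p) == p.
    body = reduce(luk_and, body_values, 1.0)
    return max(0.0, body - head_value)


# Hypothetical values: the body evaluates to 0.5.
violated = distance_to_satisfaction([0.75, 0.75], 0.25)
satisfied = distance_to_satisfaction([0.75, 0.75], 0.5)
print(violated, satisfied)
```

In the first call the head is less true than the body, so the rule is violated by the difference between the two values; in the second call the head is at least as true as the body, so the distance is zero.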
An interpretation $I$ of a set of ground atoms is a full assignment of soft truth values to that set. PSL defines the distance to satisfaction for each grounded instance of a rule. For example, assume the evidence $I$ = {Component(A, X) = 0.9, Component(B, Y) = 0.8, SimilarAttributes(X, Y) = 0.9}, and let $r$ be the grounding of Rule 2.2. Applying the Łukasiewicz conjunction pairwise to the three body atoms gives Component(A, X) ∧ Component(B, Y) ∧ SimilarAttributes(X, Y) = $\max\{0,\ 0.9 + 0.8 + 0.9 - 2\} = 0.6$. The value of the head is Component(A, B) = 0.8. Therefore, the distance to satisfaction is $\phi_r(I) = \max\{0,\ 0.6 - 0.8\} = 0$, i.e., $r$ is satisfied under $I$.
In general, a PSL program defines a probability distribution from a logical formulation expressing relationships between continuous random variables. The probability density function is as follows:
$$f(I) = \frac{1}{Z} \exp\left[ -\sum_{r \in R} w_r \, \phi_r(I)^p \right] \qquad (2.4)$$
where $R$ is the set of ground rules, $w_r$ is the weight of rule $r$, $p \in \{1, 2\}$ is a modeling parameter that determines whether the rules are linear ($p = 1$) or quadratic ($p = 2$), and $Z$ is the normalization constant. PSL performs most probable explanation (MPE) inference, which finds the overall interpretation with the maximum probability given a set of evidence. Because $f(I)$ decreases as the weighted distance to satisfaction grows, the most probable interpretation is the one with the lowest weighted distance to satisfaction; that is, PSL finds the interpretation that satisfies the rules as much as possible. In this setting, MPE inference amounts to finding the interpretation that minimizes $\sum_{r \in R} w_r \, \phi_r(I)^p$. As recognized in the literature [57], PSL can solve this optimization problem efficiently and scalably.
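The link between Equation 2.4 and MPE inference can be illustrated with a toy example in Python. This is a sketch under stated assumptions, not the optimization procedure used by PSL: it uses a brute-force grid search over a single unknown head atom, an observed body value of 0.5, a rule weight of 5.0 as in Rule 2.2, and a hypothetical weight-1.0 prior rule stating that the head is false (whose distance to satisfaction equals the head value).

```python
import math


def objective(head, body=0.5, w_rule=5.0, w_prior=1.0, p=1):
    """Weighted sum of distances to satisfaction for two ground rules."""
    phi_rule = max(0.0, body - head)  # Equation 2.3 for body => head
    phi_prior = head                  # hypothetical prior rule "not head"
    return w_rule * phi_rule ** p + w_prior * phi_prior ** p


def unnormalized_density(head, **kwargs):
    """Numerator of Equation 2.4; the constant Z is not needed for MPE."""
    return math.exp(-objective(head, **kwargs))


# Brute-force search over candidate truth values for the unknown head atom.
grid = [i / 100 for i in range(101)]
mpe_head = min(grid, key=objective)
print(mpe_head, objective(mpe_head), unnormalized_density(mpe_head))
```

In this toy setting the minimizer of the weighted distance is also the maximizer of the unnormalized density, which is why MPE inference can ignore the normalization constant $Z$.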
Expressive and Declarative Ontology Alignment Language
The Expressive and Declarative Ontology Alignment Language (EDOAL) [58] allows correspondences between entities of different ontologies to be represented. EDOAL uses class, relation, property, and instance constructs to represent ontological entities. Each correspondence models a relationship between the entities, namely equivalence, subsumption, disjointness, or membership of an individual in a class [59]. These correspondences are defined as rules. The rules are then executed in the Alignment API [60] to obtain the differences and perform the alignment between two ontologies.