RDF data validation has been studied using Description Logics under both the Open World and the Closed World Assumption. The Web Ontology Language (OWL) [65] is an expressive ontology language based on Description Logics (DL). The semantics of OWL targets distributed knowledge representation scenarios in which complete knowledge about the domain cannot be assumed. Motik et al. [66] proposed an extension of OWL that attempts to mimic the intuition behind integrity constraints in relational databases. They divided axioms into regular axioms and constraints. To address the mismatch between OWL's open-world semantics and validation, some approaches, such as Stardog ICV and Tao et al. [67], interpret OWL expressions under the Closed World Assumption and a weak Unique Name Assumption so that these expressions can be used for validation purposes.
Table 3.1 Summary of Linked Data Quality Assessment Approaches.

| Paper | Degree of Automation | Goal | Dataset Feature |
|---|---|---|---|
| Bizer et al. [48] | Manual | WIQA quality assessment framework enables information consumers to apply a wide range of policies to filter information. | Static |
| Acosta et al. [54] | Manual | A crowd-sourcing quality assessment approach for quality issues that are difficult to uncover automatically. | Static |
| Ruckhaus et al. [64] | Semi-Automatic | LiQuate, a tool based on probabilistic models to analyze the quality of data and links. | Static |
| Paulheim et al. [62] | Semi-Automatic | SDType approach using statistical analysis to predict classes of RDF resources, thus completing missing values of rdf:type properties. | Static |
| Furber and Hepp [59] | Semi-Automatic | Focus on the assessment of accuracy, which includes both syntactic and semantic accuracy, timeliness, completeness, and uniqueness. | Dynamic (Timeliness analysis) |
| Flemming [56] | Semi-Automatic | Focuses on a number of measures for assessing the quality of Linked Data, covering a wide range of dimensions such as availability, accessibility, scalability, licensing, vocabulary reuse, and multilingualism. | Static |
| Mendes et al. [49] | Semi-Automatic | Sieve framework that uses a user-configurable quality specification for quality assessment and a fusion method. | Dynamic (Timeliness analysis) |
| Knuth et al. [60] | Semi-Automatic | They outline validation which, in their opinion, has to be an integral part of the Linked Data lifecycle. | Static |
| Rula et al. [58] | Automatic | Start from the premise of dynamicity of Linked Data and focus on assessment of timeliness in order to reduce errors related to outdated data. | Dynamic (Timeliness analysis) |
| Kontokostas et al. [50] | Automatic | Propose a methodology for test-driven quality assessment of Linked Data. | Dynamic |
| Emburi et al. [61] | Automatic | They developed a framework for automatically crawling Linked Data datasets and improving dataset quality. | Dynamic (Temporal Analysis) |
| Li et al. [63] | Automatic | They proposed an automatic method to detect errors between multiple attributes that cannot be detected by considering a single attribute alone. | Dynamic |
| Assaf et al. [57] | Automatic | They propose a framework that handles issues related to incomplete and inconsistent metadata quality. | Static |
| Debattista et al. [35] | Automatic | They propose a conceptual methodology for assessing Linked Datasets, proposing Luzzu, a framework for Linked Data Quality Assessment. | |

Description Logics (DLs), in turn, have a first-order predicate logic semantics. DLs are monotonic and adhere to the Open World Assumption (OWA): a statement that cannot be proven from the knowledge base is treated as unknown rather than false, so both positive and negative conclusions must be based on information explicitly present in the knowledge base. Consequently, negative conclusions drawn merely from missing information can lead to logical issues. Under the Closed World Assumption (CWA), by contrast, all non-provable expressions are assumed to be false [68]. In [68], Patel-Schneider explored Description Logics as a means to provide the necessary framework for both checking constraints and providing CWA facilities. The approach uses inference, which is the core service provided by Description Logics, as a means of constraint checking. OWA makes certain validation tasks challenging. For example, a minimum cardinality constraint can never be violated under OWA, because the missing triple may always exist somewhere outside the knowledge base. Furthermore, under certain circumstances reasoners can detect some inconsistencies using the axioms present in an OWL model. This capability can create the mistaken impression that ontology languages are validation languages. However, the principles underpinning OWL, such as the Open World Assumption and the Non-Unique Name Assumption, can lead to unexpected and confusing results when OWL is used as a validator [68, 69].
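To make the minimum cardinality case concrete, the following is a minimal sketch, not the method of any cited work. It assumes a knowledge base represented as a plain set of (subject, predicate, object) tuples; the function names, the `ex:` identifiers, and the three-valued result are hypothetical and chosen only for illustration. It also treats differently named objects as distinct, which is a unique-name simplification that OWL itself does not make.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def count_values(triples: Iterable[Triple], subject: str, predicate: str) -> int:
    """Count the distinct objects explicitly stated for (subject, predicate)."""
    return len({o for s, p, o in triples if s == subject and p == predicate})


def check_min_cardinality(
    triples: Iterable[Triple],
    subject: str,
    predicate: str,
    minimum: int,
    closed_world: bool,
) -> str:
    """Return 'satisfied', 'violated', or 'unknown' for a minimum cardinality check.

    Assumption (illustrative only): differently named objects are distinct.
    """
    stated = count_values(triples, subject, predicate)
    if stated >= minimum:
        return "satisfied"
    # Closed world: what is not stated is false, so the constraint is violated.
    # Open world: the missing triples may exist elsewhere, so no violation
    # can be concluded.
    return "violated" if closed_world else "unknown"


# Hypothetical data: ex:bob is a person without a stated name.
data = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
}

print(check_min_cardinality(data, "ex:bob", "ex:name", 1, closed_world=True))
print(check_min_cardinality(data, "ex:bob", "ex:name", 1, closed_world=False))
```

Under the closed-world reading the missing name is reported as a violation, whereas under the open-world reading the same data only yields "unknown". This is why a validator needs closed-world semantics to flag missing values.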
Ontology-based learning is commonly defined as a field that comprises techniques for the automated acquisition of ontological knowledge from data. The paradigm has shifted such that many approaches no longer aim to generate a full-fledged, gold-standard ontology from data, but rather focus on acquiring axioms of certain shapes, such as concept definitions, atomic subsumptions, and disjointness axioms. Several works address the induction of Description Logic axioms using methods such as:
Association rule mining (ARM). Abedjan et al. [70] present rule-based approaches for predicate suggestion, data enrichment, ontology improvement, and query relaxation. They identified inconsistencies in the data through predicate suggestion, enrichment with missing facts, and alignment of the corresponding ontology. They also allow users to handle inconsistencies during query formulation through predicate expansion techniques.
Probabilistic graphical models (PGMs). Probabilistic graphical models make it possible to construct interpretable models that are then manipulated by reasoning algorithms [71]. These models can also be learned automatically from data, which allows the approach to be used in cases where building a model manually is difficult or even impossible.
Statistical Relational Learning (SRL). SRL is a branch of machine learning that tries to model a joint distribution over relational data [72]. It combines statistical learning, which addresses uncertainty in data, with relational learning, which deals with complex relational structures [73].
Inductive logic programming (ILP). Buhmann et al. [74] present an approach of inductive lexical learning of class expressions by combining an existing