
The Semantic Web technology stack proposed by Tim Berners-Lee et al. (2001) contains a number of layers. This section outlines the key elements that are used in the thesis, including RDF, Resource Description Framework Schema (RDFS), the Web Ontology Language (OWL) and SPARQL.

RDF is a framework for describing Web resources, built on top of Internationalized Resource Identifiers (IRI)2, which are a generalization of URIs3. RDF provides a data representation model and syntax for describing Web resources and their relationships. The underlying data structure of RDF is a labeled directed graph whose syntactic construct is the triple statement, consisting of three components: subject, predicate and object (Horrocks, 2008). A triple statement specifies a single edge (predicate) connecting two nodes (subject and object). For example, in the triple statement dbr:Don_Quixote, dbp:author, dbr:Miguel_de_Cervantes, dbr:Don_Quixote is the subject, dbp:author is the predicate, and dbr:Miguel_de_Cervantes is the object. Each RDF resource can be described by a number of predicates whose values are expressed by the objects. A predicate may be unary or binary. Specifically, unary predicates connect a resource with a value object (e.g. a number or literal string), while binary predicates point to other resources. As illustrated in Table 2.1, rdfs:label and rdf:type are unary predicates indicating the name and the characteristic of the entity dbr:Don_Quixote. The authorship relation between the entity dbr:Don_Quixote and the entity dbr:Miguel_de_Cervantes is a binary predicate. The advantage of RDF triple statements is interoperability across systems, since common RDF resources can be extended and integrated. Triple statements store the knowledge about semantic resources and can be perceived as a graph in which the subjects and objects of RDF statements represent nodes, while the predicates denote edges.
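The graph reading of triple statements can be made concrete with a minimal Python sketch. It is an illustration only: the triples are those of Table 2.1, plain strings stand in for IRIs and literals, and the helper names `build_graph` and `describe` are hypothetical, not part of any RDF library.

```python
from collections import defaultdict

# Triples taken from Table 2.1; plain strings stand in for IRIs and literals.
TRIPLES = [
    ("dbr:Don_Quixote", "rdfs:label", '"Don Quixote"'),
    ("dbr:Don_Quixote", "rdf:type", "dbo:Book"),
    ("dbr:Don_Quixote", "dbp:author", "dbr:Miguel_de_Cervantes"),
    ("dbr:Spain", "dbp:capital", "dbr:Madrid"),
    ("dbr:Madrid", "rdf:type", "yago:City108524735"),
    ("dbr:Spain", "rdf:type", "dbo:Country"),
]


def build_graph(triples):
    """Index triples as a labeled directed graph: subject -> [(predicate, object)]."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return graph


def describe(graph, resource):
    """Group the outgoing edges of a resource as predicate -> list of objects."""
    description = defaultdict(list)
    for predicate, obj in graph.get(resource, []):
        description[predicate].append(obj)
    return dict(description)


graph = build_graph(TRIPLES)
don_quixote = describe(graph, "dbr:Don_Quixote")
print(don_quixote)
```

Each subject becomes a node with labeled outgoing edges, so describing a resource amounts to listing the edges that leave its node.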

2 https://www.w3.org/International/
3 http://www.ietf.org/rfc/rfc3986

Subject            Predicate     Object
dbr:Don_Quixote    rdfs:label    “Don Quixote”
dbr:Don_Quixote    rdf:type      dbo:Book
dbr:Don_Quixote    dbp:author    dbr:Miguel_de_Cervantes
dbr:Spain          dbp:capital   dbr:Madrid
dbr:Madrid         rdf:type      yago:City108524735
dbr:Spain          rdf:type      dbo:Country

Table 2.1: DBpedia triples about Don Quixote, Madrid and Spain

A small subgraph of DBpedia related to dbr:Don_Quixote is shown in Figure 1.2, in which the blue resources and the green resources are connected by a special predicate, rdf:type. This predicate gives KGs the capability to assign meaning to resources, as in the triple dbr:Don_Quixote, rdf:type, dbo:Book. Specifically, rdf:type denotes the class-instance relationship and here represents the knowledge that dbr:Don_Quixote is an instance of book. The term “book” is a special word that expresses an abstraction of real-world entities. Such abstract terms are usually defined as concepts in ontologies in order to provide well-defined meanings for identifying and distinguishing entities. In computer science, an ontology is a model of the world that introduces vocabulary describing various aspects of the domain being modeled and provides an explicit specification of the intended meaning of that vocabulary (Horrocks, 2008). Moreover, the specification often includes a concept taxonomy to distinguish various conceptual features, such as the fact that singers are artists. RDFS is a basic RDF vocabulary description language that extends RDF and provides several resources for defining concepts in an ontology, such as rdfs:Class and rdfs:subClassOf. For example, the concepts singer and artist can each be defined as an rdfs:Class, while their hierarchical relation can be represented by the predicate rdfs:subClassOf in the RDF triple singer, rdfs:subClassOf, artist. Because concept taxonomies are so common in KGs, the Simple Knowledge Organization System (SKOS)4, which can represent both super-concept and sub-concept relations, is usually used to describe large-scale concept taxonomies, such as the Wikipedia categories in DBpedia.
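The effect of an rdfs:subClassOf hierarchy on class membership can be sketched in a few lines of Python. This is a toy illustration that follows the spirit of the RDFS subclass entailment rules, not a full RDFS reasoner. The singer/artist pair comes from the text, while the `ex:` prefix, `ex:Person`, `ex:SomeSinger` and both function names are hypothetical.

```python
# Toy schema: singer/artist is the example from the text; the "ex:" prefix,
# ex:Person and ex:SomeSinger are hypothetical placeholders.
TRIPLES = [
    ("ex:Singer", "rdfs:subClassOf", "ex:Artist"),
    ("ex:Artist", "rdfs:subClassOf", "ex:Person"),
    ("ex:SomeSinger", "rdf:type", "ex:Singer"),
]


def superclasses(triples, cls):
    """Return every class reachable from cls via rdfs:subClassOf edges."""
    parents = {}
    for s, p, o in triples:
        if p == "rdfs:subClassOf":
            parents.setdefault(s, set()).add(o)
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in found:  # also guards against cyclic hierarchies
                found.add(parent)
                stack.append(parent)
    return found


def inferred_types(triples, resource):
    """Asserted rdf:type classes of a resource plus all of their superclasses."""
    asserted = {o for s, p, o in triples if s == resource and p == "rdf:type"}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(triples, cls)
    return inferred


types = inferred_types(TRIPLES, "ex:SomeSinger")
print(sorted(types))
```

Although only the singer class is asserted for the instance, following the subclass edges also makes it an artist, which is the kind of knowledge a concept taxonomy is meant to capture.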

Moreover, as RDFS can only define ontologies with very limited elements, OWL (McGuinness et al., 2004) has become a de facto ontology language standard (Horrocks, 2008) for KGs, used to express relationships between semantic resources in more detail, such as dbp:author, dbp:capital, and other logical features. OWL is fundamentally built on top of Description Logics (DL) (Baader, 2003), a family of logic-based knowledge representation formalisms described in terms of instances, concepts and properties. Instances correspond to entities (such as dbr:Don_Quixote), concepts (also called “classes” in RDF, such as dbo:Book) describe sets of instances sharing similar characteristics, and properties specify relationships between concepts and instances. Consequently, OWL can provide logical expressions and local properties, and can define the domain and range of predicates. In addition, description capabilities such as constructs (e.g. union, intersection) and axioms (e.g. subclass and equivalent class) are also available in OWL.
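Since a concept describes a set of instances, the constructs and axioms listed above can be mimicked with ordinary Python sets. The sketch below only illustrates this set-based reading and is not an OWL reasoner (it ignores, for instance, the open-world assumption). The class extensions, the `ex:` identifiers and the function names are hypothetical.

```python
# Hypothetical class extensions: each concept is modeled as the set of its instances.
book = {"dbr:Don_Quixote", "ex:BookB"}
written_work = {"dbr:Don_Quixote", "ex:BookB", "ex:PoemC"}
novel = {"dbr:Don_Quixote"}
poem = {"ex:PoemC"}


def is_subclass(sub, sup):
    """Subclass axiom: every instance of sub is also an instance of sup."""
    return sub <= sup


def is_equivalent(a, b):
    """Equivalent-class axiom: both concepts have exactly the same instances."""
    return a == b


novel_or_poem = novel | poem    # union construct
novel_and_book = novel & book   # intersection construct

print(is_subclass(book, written_work))
print(sorted(novel_or_poem))
print(sorted(novel_and_book))
```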

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dbr: <http://dbpedia.org/resource/>
PREFIX dbp: <http://dbpedia.org/property/>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX yago: <http://dbpedia.org/class/yago/>
SELECT ?singer WHERE {
  # ?singer is a singer with nationality ?country
  ?singer rdf:type yago:Singer110599806 .
  ?singer dbp:nationality ?country .
  # ?writer is the author of Don Quixote and has the same nationality
  dbr:Don_Quixote dbp:author ?writer .
  ?writer dbp:nationality ?country .
}

Table 2.2: Example of using SPARQL to retrieve a list of singers.

In addition to the languages for describing resources and defining the meaning of metadata, SPARQL5 is a W3C recommendation for an RDF query language that can query and manipulate data stored in RDF. It is therefore also a semantic query language for retrieving entity-centric information from RDF-based KGs. SPARQL queries are formulated with triple patterns, which resemble the triple statements stored in an RDF database but may contain variables. A set of triple patterns can thus be viewed as a graph pattern, and a query is executed by matching that graph pattern against the database. Graph pattern matching can answer more complex queries by inferring information from the given triple patterns. For example, to answer the query “singers from the same country as the author of Don Quixote”, the SPARQL query shown in Table 2.2 can be used to return the proper singers. Although we do not specify who the author of the book Don Quixote is, the SPARQL query constructs a query graph in which the author is inferred as an intermediate node. The execution of the SPARQL query can therefore return a list of singers having the same nationality as the writer of Don Quixote.
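Graph pattern matching can be illustrated with a naive Python sketch that evaluates the four triple patterns of Table 2.2 against a toy triple set. It is not how a SPARQL engine is actually implemented. The data are assumptions made up for the example: `ex:SingerA`, `ex:SingerB` and `ex:OtherCountry` are hypothetical, and the nationality triple for dbr:Miguel_de_Cervantes is assumed rather than retrieved from DBpedia.

```python
def is_var(term):
    """Variables are written with a leading '?', as in SPARQL."""
    return term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Unify one triple pattern with one triple; return the extended binding or None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or reject if it is already bound to another value.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def evaluate(patterns, triples):
    """Evaluate a basic graph pattern by joining the triple patterns left to right."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for triple in triples
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return bindings


# Toy data (assumed for illustration only).
TRIPLES = [
    ("dbr:Don_Quixote", "dbp:author", "dbr:Miguel_de_Cervantes"),
    ("dbr:Miguel_de_Cervantes", "dbp:nationality", "dbr:Spain"),
    ("ex:SingerA", "rdf:type", "yago:Singer110599806"),
    ("ex:SingerA", "dbp:nationality", "dbr:Spain"),
    ("ex:SingerB", "rdf:type", "yago:Singer110599806"),
    ("ex:SingerB", "dbp:nationality", "ex:OtherCountry"),
]

# The four triple patterns of the query in Table 2.2.
QUERY = [
    ("?singer", "rdf:type", "yago:Singer110599806"),
    ("?singer", "dbp:nationality", "?country"),
    ("dbr:Don_Quixote", "dbp:author", "?writer"),
    ("?writer", "dbp:nationality", "?country"),
]

solutions = evaluate(QUERY, TRIPLES)
singers = sorted({solution["?singer"] for solution in solutions})
print(singers)
```

The variable ?writer is never given a value in the query itself. It is bound during matching, which is how the author becomes an intermediate node connecting the book to the singers through the shared ?country variable.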