3.4 TECHNIQUES OF PROCEDURE, ANALYSIS AND DISCUSSION OF
3.4.1 PROCESSING AND ANALYSIS OF THE CONDUCTED INTERVIEWS
Consider a book store (like the two book stores of Section 2.4.2) that provides an online catalogue containing the books it offers. Searching for a book usually means searching the book titles and perhaps abstracts of the content. If a customer wants to search by topic rather than by title (e.g. “history books”), this kind of search usually misses many of the relevant entries and yields a large number of false positives. For example, the book entitled “Folket i Birka” (Swedish: “The People of Birka”, a historical novel for children illustrating the life of people in a Viking Age town) is only found when searching for “Birka”, which already requires considerable knowledge of the domain of interest. A “semantic” query would be able to include the book “Folket i Birka” when searching for books about the “Viking Age” or “History Books for Children”, without requiring more specific search parameters.
The Book Ontology
Using the Semantic Web, such semantic queries become feasible. The online book store might provide an ontology describing the relations between different categories of books, and the properties of these categories. The following example uses the Web Ontology Language OWL [118] for describing a simple part of this book ontology:
<rdf:RDF
    xmlns:owl  = "http://www.w3.org/2002/07/owl#"
    xmlns:rdf  = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs = "http://www.w3.org/2000/01/rdf-schema#" >
  <owl:Class rdf:ID="Book"/>
  <owl:Class rdf:ID="Novel">
    <rdfs:label>Novel</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Book"/>
  </owl:Class>
  <owl:Class rdf:ID="History">
    <rdfs:label>History Book</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Book"/>
  </owl:Class>
  <owl:Class rdf:ID="Classic_History">
    <rdfs:label>Book about Classic History</rdfs:label>
    <rdfs:subClassOf rdf:resource="#History"/>
  </owl:Class>
  <owl:Class rdf:ID="Mediaeval_History">
    <rdfs:label>Book about Mediaeval History</rdfs:label>
    <rdfs:subClassOf rdf:resource="#History"/>
  </owl:Class>
  <owl:Class rdf:ID="Modern_History">
    <rdfs:label>Book about Modern History</rdfs:label>
    <rdfs:subClassOf rdf:resource="#History"/>
  </owl:Class>
  <owl:Class rdf:ID="Historical_Novel">
    <rdfs:label>Historical Novel</rdfs:label>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="#Novel"/>
      <owl:Class rdf:about="#History"/>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
CHAPTER 5. XCERPT USE CASES
Figure 5.3: Part of a book ontology for an online book store. Solid lines indicate subconcepts, dotted lines intersection of concepts. [Diagram nodes: Book, Novel, History, Historical Novel, Classic, Mediaeval, Modern.]
This ontology describes the following hierarchy of concepts (cf. Figure 5.3):
• a Novel and a History Book are each a Book.
• books about Classic History, Mediaeval History, and Modern History are History Books.
• Historical Novel is the intersection of History Book and Novel (both referenced by rdf:about). Note that intersection is stronger than simply being a subconcept of two concepts: every book that is both a History Book and a Novel is also a Historical Novel.
Note that OWL ontologies may be serialised in XML in many different ways, e.g. using nested owl:Class definitions.
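As a rough illustration outside of Xcerpt, the subclass declarations of such a serialisation could be extracted with Python's standard xml.etree.ElementTree module. The sketch below assumes the flat (non-nested) serialisation shown above and works on an abridged three-class copy of the ontology; the names ONTOLOGY and immediate_subclasses are hypothetical and introduced only for this example.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Abridged copy of the book ontology (assumption: flat serialisation only).
ONTOLOGY = """
<rdf:RDF xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:ID="Book"/>
  <owl:Class rdf:ID="History">
    <rdfs:label>History Book</rdfs:label>
    <rdfs:subClassOf rdf:resource="#Book"/>
  </owl:Class>
  <owl:Class rdf:ID="Mediaeval_History">
    <rdfs:label>Book about Mediaeval History</rdfs:label>
    <rdfs:subClassOf rdf:resource="#History"/>
  </owl:Class>
</rdf:RDF>
"""


def immediate_subclasses(xml_text):
    """Return the (child_id, parent_id) pairs declared via rdfs:subClassOf."""
    root = ET.fromstring(xml_text)
    pairs = set()
    for cls in root.findall(f"{{{OWL}}}Class"):
        child = cls.get(f"{{{RDF}}}ID")
        for sub in cls.findall(f"{{{RDFS}}}subClassOf"):
            ref = sub.get(f"{{{RDF}}}resource", "")
            # Only local references of the form "#Name" are resolved here.
            if child and ref.startswith("#"):
                pairs.add((child, ref[1:]))
    return pairs


EDGES = immediate_subclasses(ONTOLOGY)
```

A nested serialisation would need a recursive traversal instead of the single pass over top-level owl:Class elements used here.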
Subclass Checking with Xcerpt
Using the rules for transitive closure of the Clique of Friends, checking subclasses in this hierarchy of concepts is straightforward. The following Xcerpt program defines a relation subclass-of that relates concepts with all their parent concepts, based on the serialisation of the book ontology above. The first rule defines
5.3. SEMANTIC WEB REASONING
a relation immediate-subclass-of, which provides a simplified view of the ontology data and relates a concept (variable X) with its immediate parent concepts (variable Y). The second rule defines subclass-of as the transitive closure over immediate-subclass-of (note the similarity with the friend-of-friend rule in the previous section). Note also the use of XML namespaces in this program.
ns-prefix owl = "http://www.w3.org/2002/07/owl#"
ns-prefix rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ns-prefix rdfs = "http://www.w3.org/2000/01/rdf-schema#"

CONSTRUCT
  immediate-subclass-of [ var X, var Y ]
FROM
  in {
    resource [ "file:books.owl" ],
    rdf:RDF {{
      var X → owl:Class {{
        rdfs:subClassOf {{
          attributes {{ rdf:resource { /#(var YRef →.*)/ } }}
        }}
      }},
      var Y → owl:Class {{
        attributes {{ rdf:ID { var YRef } }}
      }}
    }}
  }
END

CONSTRUCT
  subclass-of [ var X, var Y ]
FROM
  or {
    immediate-subclass-of [ var X, var Y ],
    and {
      immediate-subclass-of [ var X, var Z ],
      subclass-of [ var Z, var Y ]
    }
  }
END
Checking for all parent concepts or child concepts of a specific concept is now easy. For example, the following goal retrieves all child concepts of the concept with rdfs:label “History Book” by chaining with the rules above:
ns-prefix owl = "http://www.w3.org/2002/07/owl#"
ns-prefix rdfs = "http://www.w3.org/2000/01/rdf-schema#"

GOAL
  subconcepts {
    all var Concept
  }
FROM
  subclass-of [
    var Concept,
    owl:Class {{
      rdfs:label { "History Book" }
    }}
  ]
END
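The effect of the two rules and the goal can also be sketched in Python as a fixpoint computation over pairs of concept names. This is only an illustration under simplifying assumptions: concepts are represented by their rdf:ID strings rather than by whole owl:Class terms, the immediate subclass pairs are copied by hand from the ontology, and the closure is computed bottom-up, whereas Xcerpt would chain from the goal. The names IMMEDIATE, subclass_of and subconcepts are hypothetical.

```python
# Immediate subclass pairs (child, parent), copied by hand from the book ontology.
IMMEDIATE = {
    ("Novel", "Book"),
    ("History", "Book"),
    ("Classic_History", "History"),
    ("Mediaeval_History", "History"),
    ("Modern_History", "History"),
}


def subclass_of(immediate):
    """Least fixpoint of the two alternatives of the subclass-of rule:
    subclass-of(X, Y) if immediate(X, Y)
    subclass-of(X, Y) if immediate(X, Z) and subclass-of(Z, Y)
    """
    closure = set(immediate)
    changed = True
    while changed:
        changed = False
        for x, z in immediate:
            for z2, y in list(closure):
                if z == z2 and (x, y) not in closure:
                    closure.add((x, y))
                    changed = True
    return closure


def subconcepts(parent, immediate):
    """All concepts X with subclass-of(X, parent), in sorted order."""
    return sorted(x for x, y in subclass_of(immediate) if y == parent)


HISTORY_CHILDREN = subconcepts("History", IMMEDIATE)
BOOK_CHILDREN = subconcepts("Book", IMMEDIATE)
```

Note that Historical_Novel does not appear among the results: it is defined by intersection, not by rdfs:subClassOf, so neither the Xcerpt rules nor this sketch relate it to History.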
Annotating Books with Meta-Data
To add “semantic” meta-data to the XML document used for representing the book store database, it is necessary to annotate the data as follows. Each book is given a unique rdf:ID, and relationships between books (identified by the value of rdf:ID) and concepts in the ontology are established (using rdf:type). Changes to the original document are indicated by red colour:
<bib xmlns:owl = "http://www.w3.org/2002/07/owl#"
     xmlns:rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <book year="1995" rdf:ID="vikinga_blot">
    <title>Vikinga Blot</title>
    <authors>
      <author>
        <last>Ingelman-Sundberg</last> <first>Catharina</first>
      </author>
    </authors>
    <publisher>Richters</publisher>
    <price>5.95</price>
  </book>
  <book year="1998" rdf:ID="boken_om_vikingarna">
    <title>Boken Om Vikingarna</title>
    <authors>
      <author>
        <last>Ingelman-Sundberg</last> <first>Catharina</first>
      </author>
    </authors>
    <publisher>Prisma</publisher>
    <price>22.95</price>
  </book>
  <book year="1999" rdf:ID="folket_i_birka">
    <title>Folket i Birka på Vikingarnas Tid</title>
    <authors>
      <author>
        <last>Wahl</last> <first>Mats</first>
      </author>
      <author>
        <last>Nordqvist</last> <first>Sven</first>
      </author>
      <author>
        <last>Ambrosiani</last> <first>Björn</first>
      </author>
    </authors>
    <publisher>BonnierCarlsen</publisher>
    <price>39.95</price>
  </book>
  <book year="1997" rdf:ID="vikingar_i_österled">
    <title>Vikingar i Österled</title>
    <editor>
      <last>Larsson</last> <first>Mats</first>
      <affiliation>Lunds universitet</affiliation>
    </editor>
    <publisher>Atlantis</publisher>
    <price>49.95</price>
  </book>
  <owl:Thing rdf:about="#vikinga_blot">
    <rdf:type rdf:resource="#Mediaeval_History"/>
    <rdf:type rdf:resource="#Novel"/>
  </owl:Thing>
  <owl:Thing rdf:about="#boken_om_vikingarna">
    <rdf:type rdf:resource="#Mediaeval_History"/>
  </owl:Thing>
  <owl:Thing rdf:about="#folket_i_birka">
    <rdf:type rdf:resource="#Mediaeval_History"/>
    <rdf:type rdf:resource="#Novel"/>
  </owl:Thing>
  <owl:Thing rdf:about="#vikingar_i_österled">
    <rdf:type rdf:resource="#Mediaeval_History"/>
  </owl:Thing>
</bib>
Although the construct owl:Thing looks strange, it is required by the OWL specification when describing resources defined elsewhere. Note that a book is only associated with the most specific concepts it belongs to: the book “Folket i Birka” has the rdf:ID folket_i_birka and belongs to the concepts Mediaeval_History and Novel, but not explicitly to the concepts History or Book.
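To make the structure of these annotations concrete, the following Python sketch reads the rdf:type annotations from an abridged copy of the document and groups them per book. It is an illustration only, assuming that every owl:Thing refers to a book by a local reference of the form "#id"; the names BIB and direct_types are hypothetical.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# Abridged copy of the annotated book document (two books, titles only).
BIB = """
<bib xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <book year="1995" rdf:ID="vikinga_blot"><title>Vikinga Blot</title></book>
  <book year="1998" rdf:ID="boken_om_vikingarna"><title>Boken Om Vikingarna</title></book>
  <owl:Thing rdf:about="#vikinga_blot">
    <rdf:type rdf:resource="#Mediaeval_History"/>
    <rdf:type rdf:resource="#Novel"/>
  </owl:Thing>
  <owl:Thing rdf:about="#boken_om_vikingarna">
    <rdf:type rdf:resource="#Mediaeval_History"/>
  </owl:Thing>
</bib>
"""


def direct_types(xml_text):
    """Map each book's rdf:ID to the set of concepts named in its rdf:type annotations."""
    root = ET.fromstring(xml_text)
    types = {book.get(f"{{{RDF}}}ID"): set() for book in root.findall("book")}
    for thing in root.findall(f"{{{OWL}}}Thing"):
        # rdf:about="#id" refers to the book whose rdf:ID is "id".
        book_id = thing.get(f"{{{RDF}}}about", "").lstrip("#")
        if book_id in types:
            for rdf_type in thing.findall(f"{{{RDF}}}type"):
                types[book_id].add(rdf_type.get(f"{{{RDF}}}resource", "").lstrip("#"))
    return types


DIRECT = direct_types(BIB)
```

Only the most specific concepts appear in the result; the more general concepts History and Book have to be derived from the ontology.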
Ontology Reasoning: Querying by Topic
Using the book ontology, the Xcerpt rules for checking subconcepts, and the extended book document, it is now possible to perform “semantic” queries as described in the introduction. Consider a customer who is interested in “History Books”, i.e. all books belonging either to the concept itself or to any of its subconcepts. In Xcerpt, a query for such books can be expressed by the following rule (namespace prefixes are omitted for brevity but are defined as above):
CONSTRUCT
  history_books {
    all var Book
  }
FROM
  and {
    in {
      resource [ "file:bib.xml" ],
      bib {{
        var Book → book {{
          attributes {{ rdf:ID { var ID } }}
        }},
        owl:Thing {{
          attributes {{ rdf:about { /#(var ID →.*)/ } }},
          rdf:type {{
            attributes {{ rdf:resource { /#(var Class →.*)/ } }}
          }}
        }}
      }}
    },
    subclass-of [
      owl:Class {{
        attributes {{ rdf:ID { var Class } }}
      }},
      owl:Class {{
        rdfs:label { "History Book" }
      }}
    ]
  }
END
This rule queries the extended bib.xml document for all books, and retrieves the respective classes they belong to (by querying the owl:Thing subterms describing the rdf:ID of the books). By querying the results of the rule defining the relation subclass-of, it is then verified whether the book is indeed a “History Book”.
The rule above can be generalised for arbitrary instance tests in a straightforward manner. The following Xcerpt rule defines a relation belongs-to that explicitly associates a book with all concepts (“classes”) it belongs to. Note that both variables Class and SuperClass are grouped inside the classes subterm, as some books might yield more than one binding for the variable Class.
CONSTRUCT
  belongs-to [
    var Book,
    classes {
      all var Class,
      all var SuperClass
    }
  ]
FROM
  and {
    in {
      resource [ "file:bib.xml" ],
      bib {{
        var Book → book {{
          attributes {{ rdf:ID { var ID } }}
        }},
        owl:Thing {{
          attributes {{ rdf:about { /#(var ID →.*)/ } }},
          rdf:type {{
            attributes {{ rdf:resource { /#(var Class →.*)/ } }}
          }}
        }}
      }}
    },
    subclass-of [
      owl:Class {{
        attributes {{ rdf:ID { var Class } }}
      }},
      var SuperClass
    ]
  }
END
Note that this rule is not capable of inferring that a concrete book that is a “Novel” about “Mediaeval History” is also a “Historical Novel”, as it lacks support for OWL’s intersection construct.
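The idea behind belongs-to can be sketched in Python as follows. The sketch assumes that the annotations and the immediate subclass pairs have already been extracted into plain Python sets (here copied by hand for two books), and it identifies concepts by their rdf:ID; the names DIRECT, IMMEDIATE, ancestors and belongs_to are hypothetical.

```python
# Immediate subclass pairs (child, parent), copied by hand from the book ontology.
IMMEDIATE = {
    ("Novel", "Book"),
    ("History", "Book"),
    ("Classic_History", "History"),
    ("Mediaeval_History", "History"),
    ("Modern_History", "History"),
}

# Most specific concepts per book, as given by the rdf:type annotations.
DIRECT = {
    "vikinga_blot": {"Mediaeval_History", "Novel"},
    "boken_om_vikingarna": {"Mediaeval_History"},
}


def ancestors(concept, immediate):
    """All direct and indirect parent concepts of `concept`."""
    found = set()
    frontier = [concept]
    while frontier:
        current = frontier.pop()
        for child, parent in immediate:
            if child == current and parent not in found:
                found.add(parent)
                frontier.append(parent)
    return found


def belongs_to(direct, immediate):
    """Associate each book with its annotated concepts and all their ancestors."""
    result = {}
    for book, classes in direct.items():
        all_classes = set(classes)
        for concept in classes:
            all_classes |= ancestors(concept, immediate)
        result[book] = all_classes
    return result


BELONGS_TO = belongs_to(DIRECT, IMMEDIATE)
HISTORY_BOOKS = sorted(b for b, cs in BELONGS_TO.items() if "History" in cs)
```

As in the Xcerpt rule, Historical_Novel is never derived here, because only subclass relationships are followed.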
Ontology Reasoning: Intersection
Recall that the ontology used in this section also contains the concept “Historical Novel”, which is defined as the intersection of “History Book” and “Novel”. Whereas querying for “History Book” is rather straightforward, querying for “Historical Novel” requires more complex rules that take the intersection of concepts into account.
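The condition that such rules have to check can be stated compactly: a book belongs to a concept defined by intersection exactly when there is no operand of the intersection that the book does not belong to. A minimal Python sketch of this test is given below; it assumes that the concepts of a book (including inherited ones) are already available as a set of rdf:ID strings, and the names INTERSECTIONS and extend_with_intersections are hypothetical.

```python
# Concepts defined by owl:intersectionOf, mapped to their operand concepts
# (copied by hand from the book ontology).
INTERSECTIONS = {
    "Historical_Novel": {"Novel", "History"},
}


def extend_with_intersections(classes, intersections):
    """Add every intersection concept whose operands are all contained in `classes`."""
    extended = set(classes)
    for concept, operands in intersections.items():
        # Double negation: no operand exists that is missing from the book's concepts.
        if not any(operand not in classes for operand in operands):
            extended.add(concept)
    return extended


NOVEL_ABOUT_VIKINGS = extend_with_intersections(
    {"Mediaeval_History", "Novel", "History", "Book"}, INTERSECTIONS
)
NON_FICTION = extend_with_intersections(
    {"Mediaeval_History", "History", "Book"}, INTERSECTIONS
)
```

The sketch performs a single pass and therefore does not handle intersections whose operands are themselves defined by intersection.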
The following Xcerpt rule builds upon the generic belongs-to relation to also include concepts that contain intersections. For each book, it creates a belongs-to-extended term containing the book and all concepts (contained in the subterm labelled classes), including those defined by the intersection of other concepts:
CONSTRUCT
  belongs-to-extended {
    var Book,
    classes { all var Class, var ISClass }
  }
FROM
  and {
    belongs-to [ var Book, classes {{ var Class }} ],
    in {
      resource [ "file:books.owl" ],
      rdf:RDF {{
        var ISClass → owl:Class {{ owl:intersectionOf {{ }} }}
      }}
    },
    not {
      and {
        in {
          resource [ "file:books.owl" ],
          rdf:RDF {{
            var ISClass → owl:Class {{
              owl:intersectionOf {{
                owl:Class {{
                  attributes {{ rdf:about { /#(var CRef →.*)/ } }}
                }}
              }}
            }},
            var SomeClass → owl:Class {{
              attributes {{ rdf:ID { var CRef } }}
            }}
          }}
        },
        belongs-to [ var Book, classes {{ without var SomeClass }} ]
      }
    }
  }
END
This rather complex rule is evaluated as follows. The first two query terms of the conjunction in the query part retrieve books with associated concepts (variables Book and Class), and possible candidate concepts defined by intersection (variable ISClass). The last part of this conjunction is a negated subquery that checks whether all of the concepts used in the definition of ISClass (bound to the variable SomeClass by dereferencing using the variable CRef) are associated with the book bound to Book, i.e. there does not exist a concept of ISClass bound to SomeClass that is not contained in the concepts associated with Book. Note that, due to range restrictedness, it is necessary to query both the ontology and the belongs-to
CHAPTER SIX
Range Restrictedness, Standardisation Apart, and Stratification
This chapter discusses syntactic restrictions to which Xcerpt programs in this thesis are assumed to conform. These restrictions either simplify the formal semantics in Chapter 7, or avoid programming mistakes, or both. All of them are purely syntactic properties, so they can be verified statically when programs are parsed.
Range restrictedness (Section 6.2) is a restriction on the kinds of admissible rules and goals. Rules and goals that are range restricted do not contain variables in the construct part that are not “justified” (i.e. bound by a non-negated query term) in the query part. Range restricted programs can be evaluated in both a backward chaining and a forward chaining fashion, whereas programs that are not range restricted are often difficult to treat in forward chaining algorithms, because the results of a rule are not necessarily ground.
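The core of this condition can be illustrated with a small static check. The Python sketch below is a deliberate simplification and not the definition given in Section 6.2: it assumes that the variables of a rule have already been collected into two sets (those occurring in the construct part and those bound by non-negated query terms) and it ignores the treatment of disjunctions. The Rule class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Simplified view of a rule: only the variable sets relevant to the check."""
    name: str
    construct_vars: frozenset        # variables occurring in the construct part
    positive_query_vars: frozenset   # variables bound by non-negated query terms


def unjustified(rule):
    """Construct variables that no non-negated query term binds."""
    return set(rule.construct_vars - rule.positive_query_vars)


def is_range_restricted(rule):
    return not unjustified(rule)


# The subclass-of rule: X and Y occur in the construct part, X, Y and Z in the query part.
SUBCLASS_RULE = Rule("subclass-of", frozenset({"X", "Y"}), frozenset({"X", "Y", "Z"}))

# A hypothetical rule whose construct part uses Y although only X is bound positively.
BROKEN_RULE = Rule("broken", frozenset({"X", "Y"}), frozenset({"X"}))
```

For BROKEN_RULE, a forward chaining evaluation would produce a result still containing the unbound variable Y, which is the non-ground result the paragraph above refers to.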
Stratification (Section 6.4) is a further restriction on programs that contain the grouping constructs all and some or the negation constructs not and without. In stratified programs, negation is only allowed if it does not affect recursive rule evaluations. Stratification avoids many of the problems that come with non-monotonic negation.
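One common way to test a condition of this kind is to build a dependency graph between rule heads and to reject programs in which a dependency through negation or grouping lies on a cycle. The Python sketch below follows this general idea under simplifying assumptions: rules are identified by the label of their head, grouping and negation are treated alike as "restricted" dependencies, and the dependencies are listed by hand. It is not the definition given in Section 6.4, and the names reachable, is_stratifiable, PROGRAM and CYCLIC are hypothetical.

```python
def reachable(graph, start):
    """All nodes reachable from `start` via one or more dependency edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for successor in graph.get(node, ()):
            if successor not in seen:
                seen.add(successor)
                stack.append(successor)
    return seen


def is_stratifiable(dependencies):
    """dependencies: (head, queried, restricted) triples, where `restricted` is True
    if the dependency passes through negation or grouping.
    The program is accepted if no restricted dependency lies on a cycle."""
    graph = {}
    for head, queried, _ in dependencies:
        graph.setdefault(head, set()).add(queried)
    for head, queried, restricted in dependencies:
        if restricted and head in reachable(graph, queried):
            return False
    return True


# Dependencies of the rules in Section 5.3, listed by hand (an assumption of this sketch).
PROGRAM = [
    ("subclass-of", "immediate-subclass-of", False),
    ("subclass-of", "subclass-of", False),        # recursion without negation
    ("belongs-to", "subclass-of", True),          # grouped with `all`
    ("belongs-to-extended", "belongs-to", True),  # queried inside `not`
]

# A hypothetical program in which p negates q while q depends on p.
CYCLIC = [("p", "q", True), ("q", "p", False)]
```

Under these assumptions, PROGRAM is accepted because the recursion in subclass-of involves neither negation nor grouping, whereas CYCLIC is rejected.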
As this thesis does not intend to cover the wide areas of non-monotonic negation and different approaches to forward chaining, stratification and range restrictedness are suitable assumptions for the formalisation of Xcerpt as it is presented here. Other, less rigid, approaches are feasible and not excluded by Xcerpt per se.