4.10.1 LIMITATION OF RA AND SQL
In the last section we mentioned the need for extending SQL. In fact, there are other limitations that make various extensions desirable. One particularly important limitation is the lack of recursion in both relational algebra (RA) and SQL. Indeed, the lack of recursion is an important reason why embedded SQL is needed.
Consider retrieving all ancestors of John from the following parent relation (Table 4.10). You cannot do this in RA or SQL, because the number of joins required depends on the data rather than on the query. Using formal languages extended from RA, or using embedded SQL, you can use a loop (or recursion) to handle this.
Table 4.10 Parent relation
Child   Parent
Tom     Mary
Dave    Tim
Mary    Tim
Tim     Bob
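The loop mentioned above can be sketched in a host language. The following is a minimal Python sketch, not a definitive implementation: it assumes the rows of Table 4.10 are held in memory as (child, parent) pairs, and the helper name `ancestors_of` is hypothetical. Because John does not appear in Table 4.10, the example looks up Tom instead.

```python
# Rows of Table 4.10 as (child, parent) pairs.
PARENT = {("Tom", "Mary"), ("Dave", "Tim"), ("Mary", "Tim"), ("Tim", "Bob")}


def ancestors_of(person, parent_rows):
    """Collect every ancestor of `person` by repeated lookups (hypothetical helper)."""
    found = set()
    frontier = {person}
    # The number of iterations depends on the data, not on the query text,
    # which is why a fixed RA expression cannot express this retrieval.
    while frontier:
        next_level = {p for (c, p) in parent_rows if c in frontier}
        frontier = next_level - found  # keep only people not seen before
        found |= frontier
    return found


print(sorted(ancestors_of("Tom", PARENT)))
```

Each pass of the loop corresponds to one more join with the parent relation; the loop stops when a pass finds nobody new.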
In the rest of this section, we briefly examine the issue of deductive databases. Particularly, we use Datalog to illustrate how relational algebra can be extended.
4.10.2 BASICS OF DATALOG
4.10.2.1 EDB and IDB
A database is a model of some set of integrity constraints, and a query is some formula to be evaluated with respect to this model [Reiter 1984]. From the viewpoint of logic, a DBMS can be seen as a query answering system that views facts (tuples) as axioms of a theorem and queries as the conclusion of a theorem. The inference mechanism provided with logic can be used to deduce the query on the basis of the set of facts and rules. In addition, logic can be used as a uniform language for expressing facts, rules, programs, queries, views, and integrity constraints.
Datalog is the simplest model of deductive databases, which are databases with inference power. Datalog is a version (or variation, not really subset) of Prolog (as discussed in Chapter 3) suitable for database systems. A Datalog program consists of two parts: an extensional database and an intensional database, as discussed below.
• Extensional database (or EDB): This part contains the actual instances in a conventional relational database. It consists of predicates whose relations (instances) are stored in the database as facts (tuples). As we learned earlier in this chapter, such facts are usually shown as rows of a table, but they can also be written as predicates:
parent (tom, mary).
parent (mary, bob).
parent (ron, john).
parent (ann, john).
• Intensional database (or IDB): This part contains rules involving predicates (namely, relations). They are defined by logical rules; they are actually views.
The following is an example of IDB:
ancestor (X, Y) :- parent (X, Y).
ancestor (X, Y) :- parent (X, Z), ancestor (Z, Y).
Note that the same predicate may be associated with both EDB and IDB. For example, we can define "grandfather" as:
grandfather(tom, john). % This is part of EDB
grandfather(X, Y) :- father(X, Z), father(Z, Y). % This is part of IDB
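One way to picture the EDB/IDB split is to treat the EDB as stored sets of tuples and the IDB as a view computed from them on demand. The Python sketch below illustrates this for the grandfather predicate. The `father` facts are made up for illustration (the text gives only the stored grandfather fact), and the function name is an assumption.

```python
# EDB: stored tuples. The father facts are hypothetical, for illustration only.
FATHER = {("al", "bo"), ("bo", "cy")}
GRANDFATHER_EDB = {("tom", "john")}  # the stored fact grandfather(tom, john)


def grandfather(father, stored):
    """All grandfather tuples: the stored facts plus those derived by the rule
    grandfather(X, Y) :- father(X, Z), father(Z, Y)."""
    derived = {(x, y) for (x, z1) in father for (z2, y) in father if z1 == z2}
    return stored | derived


print(sorted(grandfather(FATHER, GRANDFATHER_EDB)))
```

The derived tuples are never stored; like a relational view, they are recomputed from the current EDB whenever the predicate is queried.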
4.10.2.2 Recursion
Note that the definition of an intensional database can use recursion; for example, the predicate "ancestor" appears in both the head and the body of the same rule. This is a very important property of intensional databases.
Before we go on, let us briefly summarize what we have achieved so far. The use of IDB introduces recursion into a database program. Note also that our discussion has not mentioned negation (namely, we have not considered negating a predicate); in fact, at this point we have not included negation in our simple Datalog model. The relationship among RA, relational calculus (RC), and the simplest Datalog model can be described as follows:
• A query can be converted from RA to nonrecursive Datalog;
• A query can be converted from safe, nonrecursive Datalog possibly with negated subgoals to RA;
• A query can be converted from safe DRC to safe nonrecursive Datalog;
• A query can be converted from RA to safe nonrecursive Datalog to safe DRC.
Therefore, we have established the equivalence of the following: RA; safe, nonrecursive Datalog programs with negation; safe DRC; safe TRC. In summary, we have the following two formulae:
(Safe) Datalog = RA + recursion - negation
RA = (Safe) Datalog - recursion + negation
These two formulae indicate that what is lacking from the basic version of Datalog is negation. In the following, we take a look at this issue.
4.10.2.3 Recursive queries with negation in rule body: Using stratification
Negation can be added into Datalog by introducing new concepts related to stratification. A stratified program is a program whose tables can be classified into strata. A stratified program is evaluated stratum by stratum, starting with stratum 0.
In order to explain the basic idea of handling recursive queries with negation in the rule body, we informally introduce some terminology. We use the term strata to denote layers in a Datalog program. The intuition is to process the Datalog program in a stratum-by-stratum fashion. Furthermore, we can define stratified rules: rules are stratified if, whenever there is a rule with head predicate p and a negated subgoal with predicate q, there is no path from p to q in the dependency graph defined below (namely, q does not depend on p, directly or indirectly).
A simple example of a stratified program is taken from [Ullman 1989]. This program is recursive, because of rule (2) (although that rule actually does nothing).
(1) p(X) :- r(X).
(2) p(X) :- p(X).
(3) q(X) :- s(X), not p(X).
In this program, p depends on r, which does not involve negation. Predicate q depends on s and p, but the calculation of p is already complete when q is calculated (q is at a higher stratum than p). We can use a dependency graph to depict how the predicates depend on each other, as shown in Figure 4.2. In a dependency graph, nodes are predicates, and there is an arc from predicate p to predicate q if there is a rule with a subgoal whose predicate is p and a head whose predicate is q. Using the dependency graph, we can easily check whether a program is recursive: a program is recursive if its dependency graph has one or more cycles.
There are three nodes in Figure 4.2: p, q, and r. There is an arc from r to p, due to rule (1). There is also an arc from p to q, due to rule (3). Due to rule (2), there is a cycle local to node p, which indicates that the Datalog program is recursive. The good news is that although q depends on p (in rule (3), where p is negated), p does not depend on q. If the calculation of p involved q, we would be in trouble.
Figure 4.2 A dependency graph
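The two checks described above (is the program recursive, and are its rules stratified) can be sketched directly from the definition of the dependency graph. The Python below is a minimal sketch under stated assumptions: rules are encoded only by their predicate names, the encoding and all function names are hypothetical, and the graph built here also contains the EDB predicate s, which the description of Figure 4.2 leaves out.

```python
# Each rule is (head, positive_subgoals, negated_subgoals), predicate names only.
RULES = [
    ("p", ["r"], []),     # (1) p(X) :- r(X).
    ("p", ["p"], []),     # (2) p(X) :- p(X).
    ("q", ["s"], ["p"]),  # (3) q(X) :- s(X), not p(X).
]


def dependency_arcs(rules):
    """Arc (a, b) means: some rule has a subgoal with predicate a and head b."""
    return {(sub, head) for head, pos, neg in rules for sub in pos + neg}


def reachable(arcs, start):
    """Predicates reachable from `start` by following one or more arcs."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for a, b in arcs:
            if a == node and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def is_recursive(rules):
    """A program is recursive if its dependency graph has a cycle."""
    arcs = dependency_arcs(rules)
    nodes = {n for arc in arcs for n in arc}
    return any(n in reachable(arcs, n) for n in nodes)


def is_stratified(rules):
    """No rule head may reach one of that rule's negated subgoals."""
    arcs = dependency_arcs(rules)
    return all(q not in reachable(arcs, head)
               for head, _pos, neg in rules for q in neg)


print(is_recursive(RULES), is_stratified(RULES))
```

Adding a hypothetical rule such as p(X) :- r(X), not q(X) would create a path from q back to p through negation, and `is_stratified` would then report that the rules are no longer stratified.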
Stratification has an important role in deductive database reasoning. Recall that we have the following relationships:
(safe) Datalog = RA + recursion - negation
RA = (safe) Datalog - recursion + negation
Stratified Datalog with negation subsumes both Datalog and RA, and thus plays an important role in deductive databases. An in-depth discussion of Stratified Datalog can be found in [Ullman, 1989].
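To make stratum-by-stratum evaluation concrete, the following Python sketch evaluates rules (1) to (3) of the example program for unary relations r and s. It is an illustration rather than a general evaluator: the function name and the sample contents of r and s are assumptions, and the strata are hard-coded for this one program.

```python
def evaluate(r, s):
    """Evaluate rules (1)-(3) stratum by stratum for unary relations r and s."""
    # Stratum 0: p, defined by (1) p(X) :- r(X) and (2) p(X) :- p(X).
    p = set()
    while True:
        new_p = set(r) | p  # rule (1) contributes r; rule (2) re-derives p
        if new_p == p:
            break           # fixed point reached, p is complete
        p = new_p
    # Stratum 1: q, defined by (3) q(X) :- s(X), not p(X).
    # Because p is already complete, "not p(X)" is a plain membership test.
    q = {x for x in s if x not in p}
    return p, q


# Hypothetical contents for the EDB relations r and s.
p, q = evaluate(r={1, 2}, s={2, 3})
print(sorted(p), sorted(q))
```

The negation is only evaluated after the lower stratum has reached its fixed point, which is exactly what stratification guarantees is possible.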
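[Figure 4.2: dependency graph with nodes p, q, and r; arc r → p from rule (1), self-loop on p from rule (2), arc p → q from rule (3)]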
4.10.3 DEDUCTIVE QUERY EVALUATION
We now briefly discuss how to evaluate (or process) a deductive query (namely, how to get answer(s)). One important concern here is how these methods apply to recursive queries.
4.10.3.1 Bottom-up versus top-down
Top-down (query-driven, similar to Prolog): It starts from the query, finds a rule whose head matches, then propagates the variable bindings from head to body (from the first subgoal to the last subgoal). The problem of determining precisely the relevant facts is difficult to solve. "Pure" top-down processing of recursive queries has intrinsic problems (for example, it may fail to terminate on some recursive rules) and is therefore avoided.
Bottom-up (data-driven): It starts from the facts and does not consider the query (so the facts used may not be useful for answering the query at all). Rules are used in this manner: first, values of the variables are determined to satisfy the body (RHS); then the variable bindings are propagated to the head (LHS). We can use a bottom-up proof procedure to compute the set C of consequences of the knowledge base (KB, the facts and rules) until the result does not change. The final C generated by the procedure is called a fixed point, because any further application of the rules of derivation will not change C. So the fixed point is actually the solution of the given problem. A fixed point of the Datalog equations (with respect to EDB relations R1, …, Rk) is a solution for the relations corresponding to the IDB predicates of these equations.
In the following, we introduce two bottom-up query processing methods, using the following example. We assume that "parent(X, Y)" is in EDB.
ancestor(X,Y):- parent(X,Y).
ancestor(X,Y):- parent(X,Z), ancestor(Z,Y).
Query: ancestor(X, tom). (Namely, find all of Tom's ancestors.)
• Naive method: It can be performed using RA. The problem with this method is that it does redundant, useless work, because it does not take advantage of the actual query: it generates all the facts that can be derived, then selects those related to the query. As an example, we use the naive method for finding Tom's ancestors ("=" denotes assignment, and |×| denotes the join operator):
ancestor = ∅; % initialization
While ancestor changes do
    ancestor = parent ∪ (parent |×| ancestor); % calculating ancestor for all persons
select ancestors with "Tom" as a descendant
• Semi-naive method: It applies rules only to the new tuples produced at the previous step. That is, it focuses on the change. In this sense, it uses an "incremental" method for query processing. As an example, we use the semi-naive method for finding Tom's ancestors ("=" denotes assignment, ∆ stands for change, and |×| denotes the join operator defined earlier in this chapter):
∆ancestor = parent; ancestor = ∆ancestor;
While ∆ancestor is not empty do
    ∆ancestor = (parent |×| ∆ancestor) − ancestor; % calculating changes of ancestors
    ancestor = ancestor ∪ ∆ancestor; % note that the naive method does not record the actual change of ancestors
select ancestors with "Tom" as a descendant
4.10.3.2 Magic sets approach for recursive query processing
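As a baseline for the discussion that follows, the two bottom-up methods of Section 4.10.3.1 can be sketched in Python. This is a minimal sketch under stated assumptions: parent(X, Y) is read as "X is a parent of Y" (so that ancestor(X, tom) yields Tom's ancestors), the facts are adapted from Table 4.10, and the function names and the `derivations` counter are hypothetical additions used only to compare the amount of work.

```python
# parent(X, Y): X is a parent of Y. Facts adapted from Table 4.10.
PARENT = {("mary", "tom"), ("tim", "dave"), ("tim", "mary"), ("bob", "tim")}


def join(parent, anc):
    """parent(X, Z) joined with anc(Z, Y), projected onto (X, Y)."""
    return {(x, y) for (x, z1) in parent for (z2, y) in anc if z1 == z2}


def naive(parent):
    """Re-apply both rules to the whole relation until nothing changes."""
    ancestor, derivations = set(), 0
    while True:
        joined = join(parent, ancestor)  # re-joins every known tuple
        derivations += len(joined)
        new = parent | joined
        if new == ancestor:
            return ancestor, derivations
        ancestor = new


def semi_naive(parent):
    """Apply the recursive rule only to tuples that were new in the last round."""
    delta = set(parent)
    ancestor, derivations = set(parent), 0
    while delta:
        joined = join(parent, delta)     # joins only last round's new tuples
        derivations += len(joined)
        delta = joined - ancestor        # the actual change
        ancestor |= delta
    return ancestor, derivations


def toms_ancestors(ancestor):
    """Final selection step: ancestors with tom as a descendant."""
    return {x for (x, y) in ancestor if y == "tom"}


print(sorted(toms_ancestors(semi_naive(PARENT)[0])))
```

Both functions compute the same ancestor relation; the semi-naive version produces fewer tuples along the way because each round joins only against the change. In both cases the selection on tom happens only at the very end.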
The discussion in this section so far can be summarized as follows. The top-down approach has the advantage of being efficient (because it is query-driven) but is not a realistic method to use, while bottom-up approaches (even the semi-naive approach) are not efficient. In fact, in the above example, although the query is only concerned with Tom's ancestors, this fact is not considered until the last step. To overcome the problems of existing methods, the magic sets approach employs a rule-rewriting technique so that bottom-up processing is combined with a top-down flavor. The purpose is to discard irrelevant tuples early in bottom-up query processing. How do we tell the query processing system which information is relevant and which is not? The trick is to use the given (known) portion of the query to form a "pseudo-fact" (the "magic" thing!) so that bottom-up processing can take advantage of top-down processing (while avoiding the troubles of "pure" top-down processing). It is still bottom-up processing, but it only searches for paths related to the query. From the perspective of relational algebra, this can be considered as pushing selections to avoid irrelevant inferences.
The magic sets rule-rewriting algorithm given by [Ullman 1989] (Section 13.1 in Volume II) describes the detailed steps of rewriting. The result has five groups of rules. The following is revised from an example discussed there:
r1: same_gen(X,X) :- person(X).
r2: same_gen(X,Y) :- parent(X,Xp), same_gen(Xp,Yp), parent(Y,Yp).
Note that "same_gen" is a recursive predicate, and the program is thus a recursive one. The first rule says a person is always in the same generation as himself (or herself). This is the base case of the recursion. The second rule says X and Y are in the same generation if their parents are in the same generation. This is the general case of the recursion.
Note that only the IDB part and the query are shown there; EDB facts (tables) such as person or parent are not shown. Here we will not discuss how to rewrite the rules; we will only explain how the new (namely, rewritten) program is processed to answer the query. The query is "same_gen(john, W)", namely, find every W (the second argument, which is a variable) who is in the same generation as the person "john". The magic sets approach requires constructing (from the query) a magic predicate "m-same_gen(john)", as shown in Group V in the algorithm, which indicates that what is to be retrieved should be associated with that particular individual "john" only. This magic predicate is treated as a fact for further processing; in this sense, the magic sets approach is bottom-up. Take a look at the attachment (with the instructor's remarks in handwriting). The purpose of this algorithm is to convert the original program so that the rewritten program can be processed in the manner described here. The magic sets approach has been used for maintaining materialized views and for query optimization (e.g., [Staudt and Jarke 1996, Harinarayan, 1997]).
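The effect of the magic predicate can be illustrated with a simplified rewriting of the same_gen program. The Python sketch below is not the five-group output of the algorithm in [Ullman 1989]; it assumes a commonly used simplified form in which a magic set is seeded by the query constant and extended along parent links, and every same_gen rule is guarded by membership in that set. The EDB facts, the function names, and the `magic` parameter are all hypothetical.

```python
# parent(X, Xp): Xp is a parent of X, as in rule r2. All facts are made up.
PARENT = {("john", "pat"), ("mary", "pat"), ("pat", "gran"), ("lee", "gran"),
          ("kim", "lee"), ("ann", "zed"), ("ben", "zed")}
PERSON = {name for pair in PARENT for name in pair}


def magic_set(seed, parent):
    """Assumed magic rules: m(seed).  m(Xp) :- m(X), parent(X, Xp)."""
    magic, frontier = {seed}, {seed}
    while frontier:
        frontier = {xp for (x, xp) in parent if x in frontier} - magic
        magic |= frontier
    return magic


def same_gen(person, parent, magic=None):
    """Bottom-up evaluation of r1 and r2. If `magic` is given, only tuples
    whose first argument is in the magic set are ever derived."""
    relevant = (lambda x: True) if magic is None else (lambda x: x in magic)
    sg = {(x, x) for x in person if relevant(x)}        # rule r1
    while True:
        new = {(x, y)
               for (x, xp) in parent if relevant(x)
               for (xp2, yp) in sg if xp2 == xp
               for (y, yp2) in parent if yp2 == yp}     # rule r2
        if new <= sg:
            return sg
        sg |= new


full = same_gen(PERSON, PARENT)
restricted = same_gen(PERSON, PARENT, magic_set("john", PARENT))
print(sorted(w for (x, w) in restricted if x == "john"))
```

Both evaluations give the same answers for john, but the restricted one never derives tuples for the unrelated family (ann, ben, zed), because none of them is reachable from the seed fact. This is the sense in which the magic fact lets bottom-up processing discard irrelevant tuples early.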