In this section, we discuss some further connections between provenance annotations and bag semantics.
We note that by Theorem 4.5, N[X]-containment of UCQs implies bag-containment. Since
the former is decidable and the latter is not, it follows that there exist UCQs for which bag- containment holds butN[X]-containment does not. This justifies the “⇒
´
” betweenN[X] andN in Figure4.2(d). Also, we can show that:Proposition4.25. For containment of UCQs, we have
1. N6⇒B[X]andB[X]6⇒N 2. N6⇒Why(X)andWhy(X)6⇒N 3. N⇒
´
Lin(X)This justifies the “⇒
´
” betweenNand lineage in Figure4.2(c) and shows thatNis incomparablethere withB[X]andWhy(X).
Next, the “⇔” betweenNandN[X]in Figure4.2(d) follows from the following result: Theorem4.26. For UCQsP, ¯¯ Q we haveP¯≡NQ iff¯ P¯≡N[X] Q¯
Proof. N[X] ⇒Nfollows from Theorem4.5. We prove N⇒N[X]by contrapositive. Suppose
¯
P6≡N[X] Q. Then for some¯ N[X]-instanceIand tuplet, we have ¯P(I)(t) =Aand ¯Q(I)(t) =Band A6=B. Since AandBare non-identical polynomials, one can always find a valuationν :X→N
such thatEvalν(A)6=Evalν(B). By Proposition2.5, we have ¯P(ν(I)(t))6=Q¯(ν(I)(t)). Since ν(I)
is anN-instance, it follows that ¯P6≡NQ. Therefore¯ N[X]6⇒N, as required.
By Theorem4.23it follows from the above that bag equivalence of UCQs is also the same as
isomorphism. Prior to receiving the reviews of the paper upon which this chapter is based, it seemed to us that the community considers the decidability of equivalence of UCQs under bag semantics an open problem. However, one of the referees pointed out (as related work on bag-set semantics) the papers [30, 31]. [30] stated the result that bag-set equivalence of UCQs (called
disjunctive queriesthere) is the same as isomorphism, and added as an observation that this also holds for bag semantics. The outline of the proof of the bag-set semantics result is provided in [27]
and although bag semantics is not discussed further there we have observed that, in fact, results on bag-set semantics do correspond to results on bag semantics via the followingtransfer lemma:
Lemma4.27. There exists a mapping ϕ :CQ→CQ (which we extend to UCQs by applying it compo-
nentwise on CQs), a mapping f from bag instances to set instances, and a mapping g from set instances to bag instances, such that for any UCQQ, bag instance I, and set instance J, we have¯
1. JQ¯K I = JϕQ¯K f(I) 2. JϕQ¯K J= JP¯K g(J)
whereϕQ¯ is the result of ϕapplied toQ.¯
Proof. We first define ϕ: CQ→CQ, as follows. LetQbe a CQ over schemaΣ, and letΣ0 be the
schema obtained fromΣby replacing eachn-ary relational predicateRwith ann+1-ary relational predicateR0. Then ϕmaps Qto the CQ ϕQover schema Σ0 obtained from Qby replacing each join atomR(x1, . . . ,xk)with an atomR0(x1, . . . ,xk,u)whereuis a fresh variable. For example, if Qis the CQ
Q(x,y) :- R(x,y),R(y,z)
then ϕQis the CQ
ϕQ(x,y) :- R(x,y,u),R(y,z,v)
We extendϕto map a UCQ ¯Qto a UCQ ϕQ¯ by applying it componentwise on CQs.
Next we define an encoding f of bag-instances over schemaΣto bag-set instances over schema
• f maps a tuple t in relation R with multiplicity k to kdistinct tuples t01, . . . ,t0k in R0 each obtained fromtby appending a fresh constant in the last column
• gassigns to tupletinRthe multiplicitykwherekis the number of tuplest0inR0 such that tandt0agree on all columns oft
It is straightforward to verify thatϕ, f, andgsatisfy the conditions required by the Lemma.
Lemma4.27implies that bag-containment (bag-equivalence) of CQs/UCQs is polynomial time
reducible to bag-set containment (bag-set equivalence). Moreover, for UCQs ¯P, ¯Q, the transforma- tion ϕ defined in the proof of Lemma4.27satisfies ¯P ∼= Q¯ iff ϕ(P¯) ∼= ϕ(Q¯). Thus Lemma 4.27
transfers to bag semantics the isomorphism results for equivalence under bag-set semantics.
Trio
For CQs,Trio(X)-containment turns out to coincide withWhy(X)-containment:
Theorem4.28. For CQs P,Q we have PvTrio(X)Q iff PvWhy(X)Q.
Therefore, Theorem4.11(Theorem4.12) applies toTrio(X)-containment (equivalence) as well,
and we have a “⇔” betweenTrio(X)andWhy(X)in Figure4.2(a) and Figure4.2(b).
To establish the decidability ofTrio(X)-equivalence of UCQs, we note thatTrio(X)contains an embedded copy ofN, henceTrio(X)⇒Nfor UCQs. Combined with Theorem4.26this implies: Theorem4.29. For UCQsP, ¯¯ Q we haveP¯≡Trio(X)Q iff¯ P¯∼=Q.¯
This justifies the “⇔” betweenTrio(X)andNin Figure4.2(d).
To establish decidability in pspace of Trio(X)-containment of UCQs, we follow a similar ar-
gument as used forN[X]in Section4.4. However, this time the argument is complicated by the
fact that unlikeN[X],Trio(X)cannot easily “simulate its own computations” by factoring them through calculations involving abstractly-tagged databases. As a simple example, consider the CQs
Q1(x,y):-R(x,z),R(z,y) Q2(x,y):-R(x,y)
and consider the followingTrio(X)-relationRand its abstractly-tagged version:
R def= a a 2 b c pq abTrio(X)(R) = a a r b c s Note that JQ1K R(a, a) = 4 while JQ2K
R(a, a) = 2. On the other hand,
JQ1K
abTrio(X)(R)(a, a) =
JQ2K
abTrio(X)(R)(a, a) =p(recall thatTrio(X)“drops exponents” in its computations). Hence we do
To overcome this limitation, we introduce variation of abstractly-tagged instances called k- abstractly tagged instances. The tags in these instances aresumsofkfresh variables fromX. As we shall see, these turn out to suffice for Trio(X) to “simulate its own computations” for UCQs of degree≤k.
To make this precise, we order the variables ofX = {x1,x2, . . .}, and we define the set Sk ⊆
Trio(X)of sums ofkfresh variables fromX:
Sk
def
= {(x1+· · ·+xk),(xk+1+· · ·+x2k), . . .}
The k-abstractly tagged version abk(I) of a Trio(X)-instance I is the Trio(X)-instance obtained by replacing each tuple’s annotation inIwith a fresh element ofSk. For example, considering again theTrio(X)-relationRfrom earlier, we have
abk(R) =
a a r+s b c u+v
Now, we defineTriok(X)⊆Trio(X)to be the set of annotations fromTrio(X)obtained via calcula- tions of degree≤kinvolving elements ofSk:
Triok(X)def= {Evalν(a):a∈N[X],ν:X→Sks.t.ahas degree≤kandνis injective} Proposition4.30. Suppose valuationν :X →Sk is injective. Then for any a,b∈ N[X]of degree ≤k, a=b iffEvalν(a) = Evalν(b). As a consequence, any element c∈ Triok(X)decomposes uniquely into a sum of products of elements of Sk.
To illustrate, consider the previous example, and note thatJQ1K
abk(R)(a, a) =r+2rs+swhich
decomposes uniquely to the expression (r+s)(r+s) involving only tags fromabk(R). On the other hand,JQ2K
abk(R)(a, a) =r+sand now we have enough information to recover the results
ofJQ1K
Rand
JQ2K
R(by replacingr+swith 2 in the calculations). We are now ready to state the analog of Lemma4.20forTrio(X):
Lemma 4.31. Suppose P, ¯¯ Q ∈ UCQ have degree ≤ k, and suppose I is a Trio(X)-instance such that
JP¯K I 6≤ Trio(X)JQ¯K I. Then JP¯K abk(I)6≤ Trio(X)JQ¯K abk(I).
Proof. (Sketch) Let I0 = abk(I)and letν : Sk →Trio(X) be the valuation which records how the annotations from Iwere replaced by elements ofSk. (Thusν(I0) = I.) We claim thatνextends to
a mappinghν :Triok(X) →Trio(X) that behaves like a semiring homomorphism, so long as the computations stay withinTriok(X):
1. h
2. for alla,b∈Triok(X),h
ν(a+b) =hν(a) +hν(b)
3. for alla,b∈Triok(X), ifa·b∈Triok(X)thenh
ν(a·b) =hν(a)·hν(b)
Using Lemma4.31, we can show thath
νexists and is uniquely determined by the above properties.
Since ¯P, ¯Q have degree ≤ k, it is not hard to see that all annotations in JP¯K
I0 and
JQ¯K
I0 are in
Triok(X). Also, it is not hard to show thathν enjoys two relevant properties of proper semiring
homomorphisms with respect to query evaluation:
• hv is compatible with the natural order: ∀a,b∈ Triok(X), if a≤Trio(X) bthen hν(a) ≤Trio(X) hν(b)
• Query evaluation commutes withhν:hν(JP¯K
I0) = JP¯K hν(I0)= JP¯K Iandh ν(JQ¯K I0) = JQ¯K h(I0)= JQ¯K I
As a consequence, we have thatJP¯K
I0≤ Trio(X)JQ¯K I0implies JP¯K I≤ Trio(X)JQ¯K I.
Thus, we have established a bound on the size of annotations of the counterexamples that need to be considered. The last step is to additionally bound the number of tuples and establish our “small counterexample property”:
Theorem4.32. For UCQsP, ¯¯ Q of degree≤k, ifP¯6vTrio(X)Q, then¯ JP¯KI6≤
Trio(X)JQ¯K
Ifor a k-abstractly taggedTrio(X)-instance I containing at most`tuples, where`is the maximum number of atoms occurring in a CQ inQ.¯
Proof. (Sketch) Very similar to the proof for Theorem4.21.
Corollary4.33. Trio(X)-containment of UCQs is decidable inptime.
Finally, we note that one can find examples of UCQs showing that N[X]
´
⇒ Trio(X) and Trio(X)´
⇒N, as indicated in Figure4.2(d).4.5
Datalog
We have seen that for UCQs, containment under various provenance semantics turns out to be decidable, even for models such asN[X]andTrio(X)which are closely related to bag semantics. Is it possible that these decidability results extend to Datalog? We leave this question open for most of the provenance models considered in this dissertation, but give two results in this section suggesting that the answer is likely negative, even for the easier question of the two, equivalence.
Under set semantics, Shmueli [128] has shown that equivalence (and therefore also contain-
ment) of Datalog queries is undecidable. This result immediately extends toK-containment and K-equivalence of Datalog programs forKa distributive lattice by dint of the following observation:
Theorem 4.34. For any distributive lattice K and Datalog programs Q1,Q2, we have Q1 vK Q2 iff
Q1vBQ2.
This generalizes an earlier result [56] (later rediscovered in [62]) from UCQs to Datalog pro-
grams. Since PosBool(X) is a distributive lattice, this settles the question negatively for that provenance model.
For the other provenance models, it is possible that undecidability of Datalog-equivalence may be established by reduction from the same problem for set semantics, along the lines of the following:
Theorem4.35. N∞-equivalence of Datalog programs is undecidable.
Proof. By reduction from the same problem for set semantics. IfQis a Datalog program, letQ0be the Datalog obtained fromQby renaming the output predicate toQ0, and adding one more rule:
Q0(x1, . . . ,xn):-Q0(x1, . . . ,xn)
where nis the arity of the output predicate. This rule effectively “amplifies” the multiplicity of any output tuple in the query result to∞. Now given Datalog programsQ1,Q2, letQ01 and Q20
be the corresponding Datalog programs derived from them as above. We claim thatQ1≡BQ2iff
Q1≡N∞ Q2. To see this, consider an arbitrary inputN∞-instance I, and let J be the set instance
corresponding to its support. For any output tuplet, observe that
JQiK J(t) =0 ⇐⇒ JQ 0 iK I(t) =0 JQiK J(t) =1 ⇐⇒ JQ 0 iK I(t) =∞
Since any set instance J corresponds to the support of someN∞-instance I, this establishes the claim.
Appendix
Definition4.36(Semiring homomorphism). Let K1,K2be semirings. A mappingh : K1 →K2is
called asemiring homomorphism if h(0) = 0, h(1) = 1, and for all a,b ∈ K1, we have h(a+b) =
Proposition4.37. Let K1,K2be naturally-ordered commutative semirings. If h:K1→K2is a semiring
homomorphism then for all a,b ∈ K1, a≤K1 b =⇒ h(a) ≤K2 h(b). If h is also surjective, then for all
a,b∈K1, a≤K1 b ⇐⇒ h(a)≤K2 h(b).
Proof. Straightforward calculation.
Given a semiringKdefine † :K→Bas follows: †(0) def= false
†(a) def= truewhena6=0
Proposition4.38. The following are equivalent:
1. †is a semiring homomorphism
2. K satisfies
a) 06=1
b) a+b=0implies a=0or b=0 c) ab=0implies a=0or b=0
A semiring K is calledpositive if it satisfies either of the (equivalent) statements in Proposi- tion4.38.
Definition4.39(Congruence relation). If Kis a semiring and≈is an equivalence relation on K, then we say that≈ is acongruence relationonK if a≈ a0 and b ≈ b0 impliesa+b ≈ a0+b0 and a·b≈a0·b0.
Definition4.40(Quotient semiring). LetK be a semiring and let≈be a congruence relation on K. If a ∈ K then denote the equivalence class ofa in≈ by a/≈. Then the quotient of K by ≈is the semiring whose domain is the set K/≈ of equivalence classes of≈, 0def= 0K/≈, 1
def
= 1K/≈,