The Referee's Manual
Topic 9: Drawing up the match report
We now have the basic ingredients to implement range propagation, both from parent to child and from child to parent. Range propagation can start either from a SID range, as introduced, for example, by selection push-down using a min-max index, or from a RID range, as introduced by range partitioning on the current tuple count of a table during parallelization of query plans. The resulting parent and child ranges respect the original clustering, allowing us to employ highly efficient merge-based join plans.
We restrict our discussion to the most complex, and interesting, range propagation scenario: from a child-side RID range to the corresponding parent RID range. Our goal is to generate the virtual FRID column for a given child-side range, i.e. for each child tuple in the range, fill in the RID of the matching tuple in the parent, without performing a join. This requires range propagation to find the corresponding range in the parent, as that is where the JI column and its updates are stored. Furthermore, the start of the parent range gives us a JI count, but we likely have to start decompressing at a certain offset within that count, as the corresponding child tuple is not necessarily the first in its cluster. The process is outlined in Algorithm 15, where we restrict ourselves to the start offset of the RID range, ridC, as the end is simply a matter of counting.
Algorithm 15 takes as input a child RID, ridC, which it first converts to the SID associated with that tuple, sidC. At line 2, we then convert this child SID to a parent SID. For this, we use JIS.childLoToParentLo(sidC)⁵, which finds the partition sidC falls into and returns the current value of the MIN_P field in the corresponding JIS entry, a guaranteed lower bound for the parent-side SID we are searching for. We then use Algorithm 14 to convert this (potentially dirty) minP SID to the nearest stable sync point.
7.5. UPDATE OPERATORS 161
Algorithm 15 JI.initializeDecompressedScan(ridC)
Initialize a scan over a virtual FRID column, starting from an arbitrary LRID in the child, provided as ridC.
1: sidC ← child.RidToSid(ridC)
2: minP ← JIS.childLoToParentLo(sidC)
3: (sidPsync, sidCsync) ← JIS.findSyncPoint(minP)
4: ridPsync ← parent.SidToRidLo(sidPsync)
5: ridCsync ← child.SidToRidLo(sidCsync)
6: skipC ← ridC − ridCsync
7: this.initScan(ridPsync)
8: this.skipScan(skipC)
9: return
Now that we have a stable SID sync point, we need to convert it to a conservative RID sync point (lines 4 and 5) by including any PDT inserts that reside at sidPsync in the parent or at sidCsync in the child. Because the sync point is stable, none of the child inserts refers to an earlier partition, so the first of them is a cluster head, either for the original parent-side sync tuple or for a newly inserted one (at sidPsync).
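To make the notion of a conservative RID sync point concrete, the following minimal Python sketch mimics a SidToRidLo-style conversion. It assumes a heavily simplified PDT, reduced to a sorted list of the SIDs at which inserts reside and a sorted list of deleted SIDs, and it assumes that inserts at SID s are positioned just before the stable tuple with SID s. The function name sid_to_rid_lo is hypothetical. Because only inserts strictly before sid are counted, the returned RID points at the first insert residing at sid, which is what makes the sync point conservative.

```python
from bisect import bisect_left
from typing import List


def sid_to_rid_lo(sid: int, insert_sids: List[int], delete_sids: List[int]) -> int:
    """Return the lowest RID at which position `sid` starts in the RID image.

    Hypothetical helper over an assumed, simplified PDT:
    insert_sids: sorted SIDs at which inserts reside (duplicates allowed);
                 an insert at SID s is assumed to precede stable tuple s.
    delete_sids: sorted SIDs of deleted stable tuples.
    """
    # Inserts strictly before `sid` shift it to the right; inserts *at* `sid`
    # are deliberately not counted, so the result points at the first of them.
    inserts_before = bisect_left(insert_sids, sid)
    # Deleted stable tuples strictly before `sid` shift it to the left.
    deletes_before = bisect_left(delete_sids, sid)
    return sid + inserts_before - deletes_before


if __name__ == "__main__":
    inserts = [3, 3, 7]  # two inserts before stable tuple 3, one before 7
    deletes = [5]        # stable tuple 5 is deleted
    print(sid_to_rid_lo(3, inserts, deletes))  # prints 3: first insert at SID 3
    print(sid_to_rid_lo(7, inserts, deletes))  # prints 8: the insert at SID 7
```

A matching "Hi" variant would additionally add the number of inserts residing at sid itself, yielding the RID of the stable tuple.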
Given the pessimistic RID sync point, at line 6 we compute how many child tuples we must skip to reach our ridC of interest. We then initiate a (decompressing) join index scan from the safe sync RID (line 7). The skipScan routine (line 8) then performs a Merge of the JI column, producing up-to-date counts, and discards counts until their cumulative sum accounts for skipC child tuples; the last of these counts may be only partially consumed, as ridC need not be the first tuple of its cluster. The join index scan is now positioned at the destination ridC, and we are ready to produce uncompressed FRIDs.
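The effect of lines 6 to 8 can be illustrated with a small Python sketch of a decompressing FRID scan. It assumes the JI counts have already been merged with their PDT updates into a plain list, counts[p] holding the number of child tuples that reference parent RID p; that child tuples are clustered on their parent; and that the child tuple at rid_c_sync is the first child of parent rid_p_sync, as a stable sync point guarantees. The generator name frid_scan is hypothetical.

```python
from itertools import islice
from typing import Iterator, List


def frid_scan(counts: List[int], rid_p_sync: int, rid_c_sync: int,
              rid_c: int) -> Iterator[int]:
    """Yield the parent RID (FRID) of every child tuple from rid_c onwards."""
    skip_c = rid_c - rid_c_sync  # line 6: child tuples to skip
    p = rid_p_sync               # line 7: start at the parent sync RID
    # Line 8 (skipScan): discard whole counts that fit within skip_c ...
    while p < len(counts) and counts[p] <= skip_c:
        skip_c -= counts[p]
        p += 1
    # ... and start part-way through the count in which rid_c falls.
    offset = skip_c
    while p < len(counts):
        for _ in range(counts[p] - offset):
            yield p
        offset = 0
        p += 1


if __name__ == "__main__":
    counts = [2, 0, 3, 1]  # child RIDs 0-1 -> parent 0, 2-4 -> parent 2, 5 -> 3
    print(list(frid_scan(counts, 0, 0, 3)))            # [2, 2, 3]
    print(list(islice(frid_scan(counts, 2, 2, 4), 1)))  # [2]
```

The first loop plays the role of discarding cumulative counts; the offset handles a ridC that lies in the middle of its cluster. Since the scan is a generator, the end of the range is, as the text notes, simply a matter of counting how many FRIDs to take.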
7.5 Update Operators
Now that we know how to add updates to a PDT and what impact they may have on indexing structures, we are ready to provide a high-level outline of the full update operators: Insert, Delete and Modify.
7.5.1 Insert
The Insert operator adds a batch of tuples to a table. The batch should be sorted according to the sort key ordering of the destination table, and each tuple should be accompanied by the RID position at which it is to be inserted. The RID positions can be obtained by MergeFindInsertRID (see Section 6.4.6), whose output can be fed directly into Insert. If we insert into the child side of a join index association, each new tuple should furthermore be annotated with the parent-side RID (FRID) of the tuple it refers to. These FRIDs are obtained from a foreign-key join between the insert batch and the parent table.
⁵ The name of this routine indicates that we take the low end of the child-side range and convert it to a conservative (safe) lower bound in the parent. Similar routines exist for high ends, and also for the inverse direction, from parent to child.
162 CHAPTER 7. INDEX MAINTENANCE
Algorithm 16 Table.Insert(tuples, rids)
Inserts an ordered batch of tuples into a Table (this) at the RID positions given in rids. The batch should be ordered on the sort key of the destination table, which implies that rids is non-decreasing. If this acts as the referencing side in a join index relationship, each tuple must be annotated with FRID, the parent-side RID of the tuple being referenced.
1: i ← 0
2: for (tuple, rid) in (tuples, rids) do
3:   rid ← rid + i
4:   tpdt.AddInsert(rid, tuple)
5:   sid ← this.RidToSid(rid)
6:   minmax.updateAll(sid, tuple)
7:   if isJoinParent(this) then
8:     tpdt.InitJoinIndexCounts(rid)
9:   end if
10:  if isJoinChild(this) then
11:    frid ← tuple["FRID"]
12:    ji.parent.tpdt.IncrementJoinCount(frid, 1)
13:    fsid ← ji.parent.RidToSid(frid)
14:    lsid ← this.RidToSid(rid)
15:    ji.jis.TestAndSetMinForeignSid(fsid, lsid)
16:  end if
17:  i ← i + 1
18: end for
Algorithm 17 JIS.TestAndSetMinForeignSid(fsid, lsid)
Checks whether we should update the MIN_P field of the JIS partition that lsid falls into. If fsid is smaller than the current MIN_P value, we update MIN_P to fsid.
1: partitionIdx ← ⌊lsid / this.partitionSize⌋
2: mutex_lock(this.mutex)
3: if fsid < this.partition[partitionIdx].MIN_P then
4:   this.partition[partitionIdx].MIN_P ← fsid
5: end if
6: mutex_unlock(this.mutex)
The Insert operator itself is outlined in Algorithm 16, where we iterate over the tuples in the insert batch. We add each tuple to the trans-PDT, adjusting the insert RID by i to account for the shift introduced by the tuples inserted during earlier iterations. The next step finds the corresponding SID and uses it to update the global min-max index of the destination table (using a mutex to protect it from concurrent modifications).
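The RID adjustment at line 3 of Algorithm 16 can be checked with a minimal Python sketch. Here a plain, sorted Python list stands in for the table together with its trans-PDT, and bisect_right stands in for MergeFindInsertRID; both substitutions, and the name insert_batch, are assumptions made for illustration only.

```python
from bisect import bisect_right
from typing import List, Sequence


def insert_batch(table: List[int], tuples: Sequence[int],
                 rids: Sequence[int]) -> List[int]:
    """Insert a sorted batch; every rid refers to the image *before* the batch."""
    for i, (tup, rid) in enumerate(zip(tuples, rids)):
        # rids are non-decreasing, so the i tuples inserted in earlier
        # iterations all precede this one: its target shifted right by i.
        table.insert(rid + i, tup)
    return table


if __name__ == "__main__":
    table = [10, 20, 30, 40]   # sorted on its (single-column) sort key
    batch = [15, 25, 26]       # sorted on the same key
    rids = [bisect_right(table, t) for t in batch]  # [1, 2, 2]
    print(insert_batch(table, batch, rids))  # [10, 15, 20, 25, 26, 30, 40]
```

Without the "+ i", the tuple 25 would land at position 2, in front of 20, breaking the sort order; this is exactly the shift the adjustment compensates for.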
If the destination table participates as a parent in one or more join index associations, we initialize the join index (JI) counts to zero. For child-side inserts,
we increment the +JI field of the referenced parent tuple, at FRID, by one, to account for the new reference. Finally, we convert both FRID and the local insert RID, rid, to (FSID, LSID), which we pass to the JIS.TestAndSetMinForeignSid routine (Algorithm 17) to maintain MIN_P of the JIS partition that LSID falls into⁶. As with min-max, the JIS index is maintained “optimistically”, meaning that we directly manipulate the global data structure, accepting potential pollution in case a transaction happens to abort: the affected bounds may become looser than necessary, but they remain valid.
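A minimal Python sketch of the JIS maintenance and lookup follows. The class JoinIndexSummary and its snake_case methods are hypothetical stand-ins for JIS.TestAndSetMinForeignSid (Algorithm 17) and JIS.childLoToParentLo; MIN_P values are assumed to be kept in a flat list with one entry per partition of partition_size child SIDs, and threading.Lock plays the role of the mutex.

```python
import threading
from typing import List


class JoinIndexSummary:
    """Hypothetical JIS: one MIN_P lower bound per partition of child SIDs."""

    def __init__(self, min_p: List[int], partition_size: int) -> None:
        self.min_p = list(min_p)  # MIN_P per partition
        self.partition_size = partition_size
        self.mutex = threading.Lock()

    def test_and_set_min_foreign_sid(self, fsid: int, lsid: int) -> None:
        """Algorithm 17: lower MIN_P of lsid's partition if fsid is smaller."""
        idx = lsid // self.partition_size
        with self.mutex:
            if fsid < self.min_p[idx]:
                self.min_p[idx] = fsid

    def child_lo_to_parent_lo(self, sid_c: int) -> int:
        """Lower bound on the parent SID referenced from child SID sid_c."""
        with self.mutex:
            return self.min_p[sid_c // self.partition_size]


if __name__ == "__main__":
    jis = JoinIndexSummary([0, 4, 9], partition_size=100)
    jis.test_and_set_min_foreign_sid(fsid=2, lsid=150)  # lowers partition 1
    jis.test_and_set_min_foreign_sid(fsid=5, lsid=120)  # 5 >= 2: no change
    print(jis.child_lo_to_parent_lo(199))  # 2
```

Note that the sketch never raises MIN_P again. If the inserting transaction aborts, the lowered value simply stays behind; it is still a lower bound, which is why optimistic maintenance is safe here.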
7.5.2 Delete
Delete is similar to Insert in that we need to manipulate join index counts. However, we do not perform maintenance on the min-max and JIS indices, as removing tuples can only make their existing bounds less tight, never invalid. When deleting from the parent table in a join index association, we must ensure that referential integrity constraints are not violated; that is, a parent tuple may not have any child-side references at the time we try to delete it. Given the reference counts in the JI column, we can easily verify that the current count is 0, as is done for all incoming join indices in Algorithm 18.
Algorithm 18 Table.Delete(rids, frids)
Deletes the tuples at RID positions given in rids from a table (this). The optional frids argument must be provided in case we delete from the referencing (i.e. child) side in a join index association, and should contain, for each deleted tuple, the parent-side RID of the tuple being referenced.
1: qpdt ← pdt_create()
2: for (rid, frid) in (rids, frids) do
3:   if isJoinParent(this) then
4:     for jiColumn ← this.NextJoinIndexColumn() do
5:       if jiColumn.GetJoinCount(rid) ≠ 0 then
6:         return “ERROR: referential integrity violation”
7:       end if
8:     end for
9:   end if
10:  if isJoinChild(this) then
11:    ji.parent.tpdt.DecrementJoinCount(frid, 1)
12:  end if
13:  qpdt.AddDeleteBySid(rid)
14: end for
15: tpdt.Propagate(qpdt)
16: qpdt.destroy()
When deleting from a child-side table, for every deleted tuple we also need to know the foreign RID (FRID) that identifies the referenced tuple in the parent. Those FRIDs can be readily obtained by scanning along the up-to-date, decompressed join index in a Delete plan. They are used in the call to DecrementJoinCount to decrement the -JI field of the referenced tuple in the parent's trans-PDT.
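The join-count handling of Algorithm 18 can be sketched in Python as follows. Plain lists of already merged counts stand in for the JI columns and for the parent's trans-PDT, and all names (including the join index name "child_ji") are hypothetical. As a design choice, this sketch validates all parent RIDs before anything is modified, so a violation never leaves a partially applied delete behind.

```python
from typing import Dict, List


class ReferentialIntegrityError(Exception):
    """Raised when a parent tuple that is still referenced would be deleted."""


def check_parent_deletes(ji_columns: Dict[str, List[int]], rids: List[int]) -> None:
    """Lines 3-9: every incoming join index must count zero references."""
    for rid in rids:
        for name, counts in ji_columns.items():
            if counts[rid] != 0:
                raise ReferentialIntegrityError(
                    f"RID {rid} is still referenced through join index {name!r}")


def decrement_for_child_deletes(parent_counts: List[int], frids: List[int]) -> None:
    """Line 11: each deleted child tuple drops one reference to its parent."""
    for frid in frids:
        parent_counts[frid] -= 1


if __name__ == "__main__":
    counts = [2, 0, 1]                           # up-to-date JI counts per parent RID
    decrement_for_child_deletes(counts, [0, 2])  # delete two child tuples
    print(counts)                                # [1, 0, 0]
    check_parent_deletes({"child_ji": counts}, [1, 2])  # no references: passes
    try:
        check_parent_deletes({"child_ji": counts}, [0])
    except ReferentialIntegrityError as err:
        print(err)
```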
6 In a real-world “vectorized” implementation, we first gather a batch of (FSID, LSID)
Algorithm 18 also shows the use of a fourth PDT layer, the query-PDT, identified by qpdt. It starts out empty and contains updates with respect to the RID image produced by the current trans-PDT; i.e. the SIDs in qpdt refer to the RID enumeration generated by a merge of the trans-PDT. The purpose of the query-PDT is to provide a query-local isolation layer that effectively sorts an arbitrary (i.e. unordered) sequence of rids on the fly. Recall that for Insert, where new tuples either come in sort-key order or are appended to the end of a table, we had to adjust the destination RID to compensate for previously inserted tuples, allowing in-place modification of the trans-PDT. If such an ordering cannot be assumed, we avoid direct manipulation of the trans-PDT and treat input rids as SIDs of the query-PDT, as illustrated by our use of AddDeleteBySid. When all input RIDs are processed, we use Propagate to migrate the updates from the query-PDT into the trans-PDT.
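Why a query-local layer is needed for unordered input can be shown with a minimal Python sketch. Here a set of positions stands in for the query-PDT and a single filtering pass stands in for Propagate; a real PDT is a tree and Propagate merges it into the trans-PDT, so the name delete_unordered and this representation are simplifying assumptions.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def delete_unordered(image: Sequence[T], rids: Sequence[int]) -> List[T]:
    """Delete the tuples at positions `rids`, given in arbitrary order.

    All rids refer to the same, unmodified image (like SIDs of a query-PDT).
    """
    query_layer = set()
    for rid in rids:
        # AddDeleteBySid: record the delete without shifting any positions.
        query_layer.add(rid)
    # Propagate: apply all recorded deletes in a single ordered pass.
    return [t for pos, t in enumerate(image) if pos not in query_layer]


if __name__ == "__main__":
    image = list("abcdefg")
    print(delete_unordered(image, [4, 1, 2]))  # ['a', 'd', 'f', 'g']
    # Applying the same deletes directly, one by one, goes wrong:
    naive = list(image)
    for rid in [4, 1, 2]:
        naive.pop(rid)
    print(naive)  # ['a', 'c', 'f', 'g']: 'd' was deleted instead of 'c'
```

Because every position is interpreted against the same image, the order in which the rids arrive no longer matters.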
7.5.3 Modify
All we need to do in the case of Modify is to add a PDT update for each attribute being altered, and to inform the min-max index about changes to relevant columns, so that it can check for changes to the minimum or maximum attribute values in the relevant SID range. The process is summarized in Algorithm 19. Modify never changes the SID or RID enumeration of tuples; modifications of sort key attributes are therefore rewritten into a Delete followed by an Insert.
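A minimal Python sketch of Algorithm 19 follows, under stated assumptions: a dictionary keyed by (rid, column) stands in for tpdt.AddModify, the hypothetical MinMaxIndex keeps one [min, max] pair per (SID partition, column), and rid_to_sid is passed in as a function. Since Modify never shifts positions, the sketch computes the SID once per tuple rather than once per attribute.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class MinMaxIndex:
    """Hypothetical min-max index: [min, max] per (SID partition, column)."""

    def __init__(self, partition_size: int) -> None:
        self.partition_size = partition_size
        self.bounds: Dict[Tuple[int, int], List[int]] = {}

    def update_column(self, sid: int, col: int, value: int) -> None:
        # Bounds are only widened, never narrowed: a stale bound stays correct.
        key = (sid // self.partition_size, col)
        bound = self.bounds.setdefault(key, [value, value])
        bound[0] = min(bound[0], value)
        bound[1] = max(bound[1], value)


def modify(pending: Dict[Tuple[int, int], int], minmax: MinMaxIndex,
           rid_to_sid: Callable[[int], int], colnos: Sequence[int],
           value_lists: Sequence[Sequence[int]], rids: Sequence[int]) -> None:
    """Sketch of Algorithm 19; `pending` stands in for the trans-PDT."""
    for value_list, rid in zip(value_lists, rids):
        sid = rid_to_sid(rid)  # Modify never shifts positions: compute once
        for col, value in zip(colnos, value_list):
            pending[(rid, col)] = value            # tpdt.AddModify
            minmax.update_column(sid, col, value)  # minmax.updateColumn


if __name__ == "__main__":
    mm = MinMaxIndex(partition_size=4)
    mm.bounds[(0, 1)] = [10, 20]
    pending: Dict[Tuple[int, int], int] = {}
    # Identity RID-to-SID mapping: assumes no pending inserts or deletes.
    modify(pending, mm, lambda rid: rid, [1], [[5], [25]], [0, 2])
    print(mm.bounds[(0, 1)])  # [5, 25]
    print(pending)            # {(0, 1): 5, (2, 1): 25}
```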
Algorithm 19 Table.Modify(colnos, valueLists, rids)
Updates a list of attributes identified by colnos, for all tuples at positions in rids, with the corresponding attribute values from valueLists.
1: for (valueList, rid) in (valueLists, rids) do
2:   for i = 0; i < colnos.size(); i = i + 1 do
3:     tpdt.AddModify(rid, colnos[i], valueList[i])
4:     sid ← this.RidToSid(rid)
5:     minmax.updateColumn(sid, colnos[i], valueList[i])
6:   end for
7: end for
7.6 Concurrency Issues
In Section 7.5 we described optimistic maintenance of the global min-max and JIS indices belonging to a table, where we used simple mutual exclusion mechanisms to avoid corruption caused by concurrent updates. There are, however, more subtle concurrency issues that are semantic in nature, as they are caused by the inherent volatility of positional information under updates. Section 7.6.2 discusses the issue of maintaining indices in a second database image, as generated during a background checkpointing transaction. In Section 7.6.3 we discuss obstacles during serialization of trans-PDTs from the child side of a join index association. Solutions to both problems rely on a generic solution to the problem of matching child-side PDT inserts to the (volatile!) foreign RID of the parent tuple they reference, at any moment in time, without performing a join. Therefore, Section 7.6.1 first presents a solution to that problem.
7.6. CONCURRENCY ISSUES 165