3.9 Other Structures

In addition to the index structures already reviewed in this chapter, a few other structures are worth noting. For example, Google has built a customized database system for organizing the petabytes of data that its search engines need to search through. The system is called Bigtable [21], and it is designed to be a highly scalable database system that can index vast amounts of data in a distributed database environment. What is interesting from our point of view is that Bigtable also offers some basic versioning functionality: data items are stored with versions, and the versions can be used in queries. There is a catch, however, because Bigtable was not designed primarily as a multiversion database index. The data items are ordered by their keys, and all the versions of a data item are stored in the same page (or cell, in Bigtable terminology). A cell can have multiple versions, which are chained together with the most recent version at the front of the chain. Bigtable thus has problems similar to those of the versioned B+-tree described in Section 3.2 and of indexes that use reverse chaining (described in Section 3.5, p. 39): early versions cannot be accessed directly (because of the version chain), and key-range queries are not efficient. The problem is alleviated somewhat for key-range queries of the most recent version, because not all of the versions of a given data item need to be scanned. Key-range queries of earlier versions, however, must still traverse the version chains to locate the correct version. We therefore conclude that Bigtable is not optimal for indexing multiversion data.
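The cost of reaching an early version through a newest-first chain can be illustrated with a small sketch. The classes below are a hypothetical in-memory model of reverse chaining written for this illustration; they are not Bigtable's API or storage format, and the sketch assumes that versions are written in increasing order.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VersionNode:
    """One version of a data item; `older` links to the previous version."""
    version: int
    value: str
    older: Optional["VersionNode"] = None


class ChainedCell:
    """Hypothetical cell that keeps its versions in a newest-first chain."""

    def __init__(self) -> None:
        self.head: Optional[VersionNode] = None

    def write(self, version: int, value: str) -> None:
        # Assumption: versions arrive in increasing order, so the new
        # version always becomes the head of the chain.
        self.head = VersionNode(version, value, self.head)

    def read(self, version: int) -> Tuple[Optional[str], int]:
        """Return (value visible at `version`, number of chain nodes visited)."""
        steps = 0
        node = self.head
        while node is not None:
            steps += 1
            if node.version <= version:
                return node.value, steps
            node = node.older
        return None, steps


cell = ChainedCell()
for v in range(1, 6):
    cell.write(v, f"value@{v}")

latest_value, latest_steps = cell.read(5)   # found at the head of the chain
oldest_value, oldest_steps = cell.read(1)   # requires walking the whole chain
print(latest_steps, oldest_steps)
```

In this model the most recent version is found after visiting one node, whereas the oldest version is found only after visiting every node in the chain. A key-range query of an early version repeats this walk for each key in the range.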

Jouini and Jomier [43] have also recently published an article comparing three different approaches for indexing fully persistent transaction-time data (i.e., data with a possibly branched evolution). Their structures are called the B+V-tree, the OB+tree, and the BT-tree. Of these, the B+V-tree resembles the versioned B+-tree introduced in Section 3.2, but with support for branched history evolution. Like the versioned B+-tree, the B+V-tree is not efficient for range queries because the entries are clustered primarily by their keys, and only then by their versions. The second structure, the OB+tree, builds multiple B+-trees that are allowed to share unchanged branches. This approach is an example of path copying (see Section 3.5, p. 39): any update applied to a leaf page at database version v to create a new version v+ necessitates the creation of a new root-to-leaf path, thus requiring Ω(log m_v) space for each update, where m_v is the number of entries alive at version v. The final index structure, the BT-tree, indexes entries based on both the key and the version attributes, which makes this structure closer to the more efficient approaches described in the next section. Because the details of the index structure are not discussed, we cannot determine its characteristics. The other two structures are suboptimal for indexing partially persistent data because of the design choices they are based on, as explained above.
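The space cost of path copying can be seen in a minimal sketch. The code below is not the OB+tree: as a simplifying assumption it uses a plain (unbalanced) binary search tree with immutable nodes instead of B+-tree pages, and the names `Node`, `insert`, and `contains` are introduced only for this illustration. The point it shows is the same, however: each update allocates one new node per level of the root-to-leaf path and shares every untouched subtree with the previous version.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable tree node; old versions can never be modified in place."""
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: int) -> Tuple[Optional[Node], int]:
    """Return (root of the new version, number of nodes allocated)."""
    if root is None:
        return Node(key), 1
    if key < root.key:
        child, made = insert(root.left, key)
        if child is root.left:          # key already present: nothing copied
            return root, 0
        return Node(root.key, child, root.right), made + 1
    if key > root.key:
        child, made = insert(root.right, key)
        if child is root.right:
            return root, 0
        return Node(root.key, root.left, child), made + 1
    return root, 0


def contains(root: Optional[Node], key: int) -> bool:
    while root is not None:
        if key == root.key:
            return True
        root = root.left if key < root.key else root.right
    return False


# Build one version per insertion; roots[i] is the root of version i.
roots: List[Optional[Node]] = [None]
for k in [50, 30, 70, 20, 40]:
    new_root, _ = insert(roots[-1], k)
    roots.append(new_root)

old_root = roots[-1]
new_root, allocated = insert(old_root, 35)
print(allocated)                              # nodes on the copied path
print(new_root.right is old_root.right)       # untouched subtree is shared
print(contains(old_root, 35), contains(new_root, 35))
```

Inserting key 35 copies the path 50, 30, 40 and adds the new node, so four nodes are allocated while the subtree rooted at 70 is shared by both versions. In a balanced tree with m_v live entries the copied path has logarithmic length, which matches the Ω(log m_v) space per update stated above.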

CHAPTER 4

Time-Split and Multiversion B+-trees

The previous chapters have discussed the theory behind multiversion databases (that is, partially persistent transaction-time databases), and reviewed some approaches for indexing such data. So far, none of the structures introduced have been optimal according to our definition (Definition 3.3). In this chapter, we review three of the more recent multiversion index structures, one of which is optimal. In addition, we briefly discuss a multiversion database system that uses one of these structures and is built on top of the commercial Microsoft SQL Server.

We begin this chapter in Section 4.1 by listing some of the common design ideas shared by all the efficient structures reviewed in this chapter. Then, in Section 4.2, we review the first of these structures, the time-split B+-tree of Lomet and Salzberg (TSB-tree [58, 59]). After that, Section 4.3 describes Immortal DB [54–57], which is based on the TSB-tree. Immortal DB is a multiversion database management system that Microsoft is researching and developing on top of the Microsoft SQL Server. In Section 4.4, we review the multiversion B+-tree of Becker et al. (MVBT, [7, 8]), the first optimal multiversion index structure. Finally, Section 4.5 describes the multiversion access method of Varman and Verma (MVAS, [92]), which was developed at about the same time as the MVBT and shares many characteristics with it. Varman and Verma use a slightly more relaxed definition of optimality for multiversion indexes, and the MVAS is not optimal according to our definition.

4.1 Common Design Bases

All of the index structures presented in this chapter have a similar structure: the database pages cover regions in key-version space, the index structures form a directed acyclic graph of database pages (although these structures may still be called trees), and for each version v, there exists a search tree S_v (see definition below) that is used to locate the entries belonging to that version. Let us call these multiversion structures region-based multiversion index structures:

Definition 4.1. A region-based multiversion index is a multiversion database index structure in which all the pages cover regions in key-version space. The structure of the pages forms a directed acyclic graph. Let kvr(p) denote the key-version region of a page p. Each parent page p at level l contains links to child pages Q at level l − 1 so that q ∈ Q ⇔ kvr(p) ∩ kvr(q) ≠ ∅. The key-version regions of pages on the same level of the graph do not overlap. ◻
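The two conditions of Definition 4.1 can be checked mechanically once a concrete shape is chosen for the regions. The sketch below assumes that every key-version region is a half-open rectangle [key_lo, key_hi) × [ver_lo, ver_hi); the definition itself does not require this shape, and the names `Region`, `level_is_disjoint`, and `expected_children` are hypothetical helpers introduced for this illustration.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List

INF = float("inf")


@dataclass(frozen=True)
class Region:
    """Half-open rectangle [key_lo, key_hi) x [ver_lo, ver_hi)."""
    key_lo: float
    key_hi: float
    ver_lo: float
    ver_hi: float

    def overlaps(self, other: "Region") -> bool:
        # kvr(p) ∩ kvr(q) ≠ ∅ for two half-open rectangles.
        return (self.key_lo < other.key_hi and other.key_lo < self.key_hi
                and self.ver_lo < other.ver_hi and other.ver_lo < self.ver_hi)


def level_is_disjoint(regions: List[Region]) -> bool:
    """Regions of pages on the same level must not overlap."""
    return not any(a.overlaps(b) for a, b in combinations(regions, 2))


def expected_children(parent: Region, lower_level: List[Region]) -> List[Region]:
    """The set Q of the definition: q ∈ Q ⇔ kvr(parent) ∩ kvr(q) ≠ ∅."""
    return [q for q in lower_level if parent.overlaps(q)]


# Leaf level: page a was split by version at version 10, producing page c.
a = Region(0, 50, 1, 10)
b = Region(50, 100, 1, INF)
c = Region(0, 50, 10, INF)
leaves = [a, b, c]

# Parent level: two index pages that divide the version axis at version 10.
p1 = Region(0, 100, 1, 10)
p2 = Region(0, 100, 10, INF)

print(level_is_disjoint(leaves), level_is_disjoint([p1, p2]))
print(expected_children(p1, leaves) == [a, b])
print(expected_children(p2, leaves) == [b, c])
```

Page b overlaps the regions of both parents and is therefore a child of both, which is why the page structure is a directed acyclic graph rather than a tree.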

Definition 4.2. For each version v in a region-based multiversion index, there is a search tree S_v that is a subgraph of the entire multiversion index graph. The subgraph S_v is a tree and all the entries of the data items that are alive at version v are located in the pages of S_v. An example of a search tree S_10 is shown in Figure 3.1(b) on page 37. For each search tree S_v and all levels l of S_v, the pages that belong to S_v at the same level l partition the entire key space into disjoint regions. Each search tree thus covers the entire key space at each level of the search tree. ◻
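The partitioning property of Definition 4.2 can be tested for a single level and a single version. The sketch below again assumes half-open rectangular regions, here written as plain tuples (key_lo, key_hi, ver_lo, ver_hi), and a bounded key space [key_min, key_max); the function names are hypothetical.

```python
from typing import List, Tuple

# (key_lo, key_hi, ver_lo, ver_hi); both ranges are half-open.
Region = Tuple[float, float, float, float]
INF = float("inf")


def pages_of_version(level: List[Region], v: float) -> List[Region]:
    """Pages on this level that belong to the search tree S_v."""
    return [r for r in level if r[2] <= v < r[3]]


def partitions_key_space(level: List[Region], v: float,
                         key_min: float, key_max: float) -> bool:
    """True if the pages of S_v on this level tile [key_min, key_max) exactly."""
    pages = sorted(pages_of_version(level, v))  # sorted by key_lo first
    if not pages:
        return False
    cursor = key_min
    for key_lo, key_hi, _, _ in pages:
        if key_lo != cursor:        # a gap or an overlap in the key dimension
            return False
        cursor = key_hi
    return cursor == key_max


leaves: List[Region] = [
    (0, 50, 1, 10),      # alive for versions 1..9
    (50, 100, 1, INF),   # alive from version 1 onwards
    (0, 50, 10, INF),    # replaces the first page from version 10 onwards
]

print(partitions_key_space(leaves, 5, 0, 100))
print(partitions_key_space(leaves, 10, 0, 100))
print(partitions_key_space(leaves[:2], 10, 0, 100))
```

For versions 5 and 10 the live leaf pages cover the key range [0, 100) without gaps. Dropping the third page leaves the keys in [0, 50) uncovered at version 10, so the check fails.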

Definition 4.3. A region-based multiversion index structure is said to be structurally consistent, if all the index-specific invariants of the structure hold; and balanced, if (1) it is structurally consistent; (2) all the pages of the search tree S_v contain at least a minimum number of entries that are alive at version v, for each version v; and (3) for any search tree S_v, all the root-to-leaf paths of S_v are of the same length. ◻
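Conditions (2) and (3) of Definition 4.3 can be checked for one version at a time, as in the following sketch. It takes condition (1) for granted, because structural consistency depends on index-specific invariants. The `Page` model is a hypothetical simplification: lifespans are half-open version ranges, an index page has one live entry per live child, and the minimum is applied to every page including the root, as the definition is worded.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

INF = float("inf")
Life = Tuple[float, float]  # half-open version range [start, end)


def alive(life: Life, v: float) -> bool:
    return life[0] <= v < life[1]


@dataclass
class Page:
    life: Life
    items: List[Life] = field(default_factory=list)       # leaf: entry lifespans
    children: List["Page"] = field(default_factory=list)  # index: child pages

    def live_children(self, v: float) -> List["Page"]:
        return [c for c in self.children if alive(c.life, v)]

    def live_entries(self, v: float) -> int:
        if self.children:
            return len(self.live_children(v))
        return sum(alive(item, v) for item in self.items)


def _check(page: Page, v: float, min_live: int, depth: int,
           leaf_depths: Set[int]) -> bool:
    if page.live_entries(v) < min_live:       # condition (2)
        return False
    kids = page.live_children(v)
    if not kids:
        leaf_depths.add(depth)                # record the path length
        return True
    return all(_check(c, v, min_live, depth + 1, leaf_depths) for c in kids)


def balanced_at(root: Page, v: float, min_live: int) -> bool:
    """Conditions (2) and (3) for the search tree S_v rooted at `root`."""
    leaf_depths: Set[int] = set()
    return _check(root, v, min_live, 0, leaf_depths) and len(leaf_depths) == 1


leaf_a = Page(life=(1, 10), items=[(1, 10), (1, 4), (3, 10)])
leaf_b = Page(life=(1, INF), items=[(1, INF), (2, INF)])
leaf_c = Page(life=(10, INF), items=[(10, INF), (10, INF)])
root = Page(life=(1, INF), children=[leaf_a, leaf_b, leaf_c])

print(balanced_at(root, 5, 2), balanced_at(root, 10, 2), balanced_at(root, 5, 3))
```

With a minimum of two live entries the example is balanced at versions 5 and 10. Raising the minimum to three makes it fail, because page leaf_a holds only two entries that are alive at version 5.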

Note that our definition of a balanced index structure requires that all the search paths within any one version are of the same length. In practice, this is guaranteed by designing the structure-modification operations so that the index never becomes unbalanced.

Figure 3.1(b) in the previous chapter shows the general structure of these multiversion index structures, and Figure 4.1 shows the structure of a balanced region-based multiversion index that has an auxiliary root* structure for locating the roots of different search trees. Note that the search trees S_v1 and S_v2, rooted at pages p6 and p9, have different heights. Let us also define what we mean by live entries and live pages in the multiversion indexes:

Definition 4.4. For all versions v, an entry that represents a data item is alive at version v if the data item is alive at version v; and a database page is alive at version v if it is part of the search tree S_v. ◻
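The auxiliary root* structure mentioned before Definition 4.4 maps a version to the root page of its search tree. The text does not specify how root* is organized, so the sketch below is only one possible arrangement: a sorted list of starting versions searched with binary search. The class name `RootDirectory`, its methods, and the version numbers 1 and 20 are hypothetical; only the page names p6 and p9 come from the description of Figure 4.1.

```python
import bisect
from typing import List


class RootDirectory:
    """Hypothetical root* structure: each root is valid from its first
    version until the first version of the next root."""

    def __init__(self) -> None:
        self._first_versions: List[int] = []
        self._root_pages: List[str] = []

    def add_root(self, first_version: int, page_id: str) -> None:
        # Assumption: roots are registered in increasing version order.
        if self._first_versions and first_version <= self._first_versions[-1]:
            raise ValueError("roots must be added in increasing version order")
        self._first_versions.append(first_version)
        self._root_pages.append(page_id)

    def root_for(self, version: int) -> str:
        """Return the root page of the search tree S_v for v = version."""
        index = bisect.bisect_right(self._first_versions, version) - 1
        if index < 0:
            raise KeyError(f"no search tree exists for version {version}")
        return self._root_pages[index]


directory = RootDirectory()
directory.add_root(1, "p6")    # illustrative version numbers
directory.add_root(20, "p9")

print(directory.root_for(5), directory.root_for(20))
```

A lookup costs a logarithmic number of comparisons in the number of roots, after which the search continues inside the selected search tree. The two trees may have different heights, as noted above for pages p6 and p9.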

4.2 Time-split B+-tree
