INSTRUCTIVO DE USO DEL SISTEMA INTEGRAL DE GESTIÓN SOCIAL


PostgreSQL is a popular, fully open source⁴ relational DBMS implementation. It is written in C and provides several options for user extensions.

The query processing pipeline in PostgreSQL has several stages, as shown in Figure 2.5: 1) parser, 2) rewrite system, 3) planner/optimizer, and 4) executor. The parser tokenizes the query string and builds a parse tree. The rewrite system then rewrites the query by applying transformation rules from the system catalogue. For example, accesses to views are rewritten as queries on the underlying tables of the views. As a query might involve several indexed columns of one table, there is a choice of which index to use for that table. Furthermore, some operations have different implementations. For instance, joins can be implemented as nested loop joins, hash joins, or merge joins. The standard planner/optimizer is based on System R [17]; it creates all possible plans and compares them using a cost function. As generating all plans becomes computationally expensive for queries with many joins, a genetic query planner was introduced in 1997 [18]. For more complex queries, the genetic planner generates only a subset of the (join) plans based on a heuristic.

⁴ Other popular open source databases have some features reserved for their commercial branch.

2.2. DATABASE MANAGEMENT SYSTEMS

[Figure 2.5: PostgreSQL simplified backend flowchart. Stages: 1) Parser, 2) Rewrite, 3) Planner/Optimizer, 4) Executor. Example plan nodes: Aggregate, Sort, Nested Loop Semi Join, public.orders, public.lineitem_pkey.]
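The exhaustive strategy described above can be illustrated with a minimal sketch. It is not PostgreSQL's planner: the table sizes, the fixed join selectivity, and the cost formula are all assumptions chosen for illustration, and only left-deep nested loop plans are considered. The point is that the number of candidate join orders grows factorially with the number of tables, which is what motivates a heuristic planner for larger queries.

```python
from itertools import permutations

# Hypothetical row counts per table; the numbers are made up for this sketch.
ROWS = {"orders": 1_500, "lineitem": 6_000, "customer": 150}
# Assumed fraction of row pairs that satisfy each join predicate.
SELECTIVITY = 0.001


def plan_cost(join_order):
    """Toy cost of a left-deep nested loop plan: row pairs compared, summed over joins."""
    intermediate = ROWS[join_order[0]]
    cost = 0.0
    for table in join_order[1:]:
        cost += intermediate * ROWS[table]  # every pair of rows is compared
        intermediate = intermediate * ROWS[table] * SELECTIVITY
    return cost


def cheapest_plan(tables):
    """Enumerate every join order and keep the one with the lowest estimated cost."""
    candidates = list(permutations(tables))
    return min(candidates, key=plan_cost), len(candidates)


best_order, plans_considered = cheapest_plan(ROWS)
print(best_order, plans_considered)
```

With three tables there are only six join orders, but ten tables would already yield 3,628,800 of them under this simple enumeration.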

Some metrics are not known during this stage, for example, the number of rows returned by a subquery or a view, as these can be data dependent. Thus, the cost has to be estimated. After the cheapest plan is chosen, the planner stage extends it to an executable plan and passes it to the executor. The executor recurses through the plan tree, executes its nodes, and returns the results. Each node represents an operation (e.g., join, limit, sort). The executor starts at the root (the result node) of the tree and works towards the leaves (the input nodes). Each time a node is called, it has to either produce one row or report that it has finished. This process is repeated until the root node reports that it has finished.
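The row-at-a-time protocol of the executor can be sketched as follows. This is a simplified model, not PostgreSQL code: the class names, the `next_row` method, and the use of `None` as the "finished" signal are assumptions made for this illustration.

```python
class SeqScan:
    """Leaf (input) node: hands out the rows of an in-memory table one at a time."""

    def __init__(self, rows):
        self._rows = iter(rows)

    def next_row(self):
        return next(self._rows, None)  # None reports "finished"


class Sort:
    """Blocking node: has to drain its child before it can return its first row."""

    def __init__(self, child, key):
        self.child, self.key = child, key
        self._sorted = None

    def next_row(self):
        if self._sorted is None:
            rows = []
            while (row := self.child.next_row()) is not None:
                rows.append(row)
            self._sorted = iter(sorted(rows, key=self.key))
        return next(self._sorted, None)


class Limit:
    """Passes through at most `count` rows, then reports that it has finished."""

    def __init__(self, child, count):
        self.child, self.remaining = child, count

    def next_row(self):
        if self.remaining <= 0:
            return None
        self.remaining -= 1
        return self.child.next_row()


def execute(root):
    """Pull rows from the root node until it reports that it has finished."""
    results = []
    while (row := root.next_row()) is not None:
        results.append(row)
    return results


table = [(3, "c"), (1, "a"), (2, "b")]
plan = Limit(Sort(SeqScan(table), key=lambda row: row[0]), count=2)
result = execute(plan)
print(result)  # [(1, 'a'), (2, 'b')]
```

Each call on the root propagates down the tree only as far as needed, so the Limit node stops requesting rows as soon as its quota is reached.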

CHAPTER 2. FOUNDATIONS

PostgreSQL provides multiple extension paths. In open source software like PostgreSQL, one can always modify the code, but there are more elegant extension points. PostgreSQL’s main extension mechanism is hooking. Hooks are predefined points in the execution of PostgreSQL where user-defined functions can be called. These user functions can then modify the current processing state. For example, there are hooks between all stages of the query processing pipeline. The user function that hooks into the pipeline is passed the current query state and plan (depending on the stage of the hook) as arguments and can modify both. One advantage of this extension mechanism is that there is no need to modify the source of PostgreSQL; hence, PostgreSQL can be upgraded to a newer version independently of the customized hooks.
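The hooking pattern can be sketched in a few lines. The sketch below is a Python stand-in for what is a global function pointer in PostgreSQL's C source; the names `planner_hook` and `standard_planner` are borrowed from PostgreSQL for readability, while the dictionary-based query and plan structures and the `install_extension` helper are purely hypothetical.

```python
# Hook variable owned by the core system; None means no extension is installed.
planner_hook = None


def standard_planner(query):
    """Stand-in for the built-in planner; the plan format is hypothetical."""
    return {"query": query, "nodes": ["SeqScan"]}


def planner(query):
    """Core entry point: call the hook if one is set, else the built-in planner."""
    if planner_hook is not None:
        return planner_hook(query)
    return standard_planner(query)


def install_extension():
    """What an extension would do at load time: save the old hook, set its own."""
    global planner_hook
    previous_hook = planner_hook

    def my_planner(query):
        # Chain to an earlier extension if present, otherwise to the built-in planner.
        plan = previous_hook(query) if previous_hook else standard_planner(query)
        plan["nodes"].append("AnnotatedByExtension")  # modify the plan in place
        return plan

    planner_hook = my_planner


plan_before = planner("SELECT 1")
install_extension()
plan_after = planner("SELECT 1")
print(plan_before["nodes"], plan_after["nodes"])
```

Saving and calling the previous hook keeps several extensions composable, and none of them requires a change to the core `planner` function.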

Unfortunately, PostgreSQL offers no hooks to load user-defined operations (nodes), and adding them is virtually impossible because many core functions contain switch statements with hardcoded references to each node type. While it is technically possible to insert custom nodes into a plan and execute them solely through hooks, this is impractical, as large parts of PostgreSQL’s backend would have to be replaced by custom code in hooks. The code for EXPLAIN and the planner are just two examples where node types are hardcoded into the source code.
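Why hardcoded switch statements block custom nodes can be seen in a small model. The function below is a hypothetical Python analogue of a C switch over node types; the node names and the `explain_node` function are assumptions for illustration and do not reproduce PostgreSQL's actual code.

```python
def explain_node(node_type):
    """Closed dispatch over node types, analogous to a C switch statement."""
    if node_type == "SeqScan":
        return "Seq Scan"
    elif node_type == "Sort":
        return "Sort"
    elif node_type == "NestedLoop":
        return "Nested Loop"
    else:
        # No hook is consulted here, so an extension cannot register a new node type.
        raise ValueError(f"unrecognized node type: {node_type}")


print(explain_node("Sort"))

try:
    explain_node("FpgaScan")  # hypothetical custom node added by an extension
except ValueError as error:
    print(error)
```

Supporting the custom node would require editing every such function, which is equivalent to modifying the source of the core system.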

The SQL SELECT Command

Neglecting commands to manage schemas, SQL has four commands to manage the data inside a schema: 1) SELECT, 2) UPDATE, 3) DELETE, and 4) INSERT. Out of these, SELECT is the only command that does not (explicitly) modify data. This thesis focuses on read-only queries; therefore, SELECT is the only relevant command. The commands that modify data could be implemented as well but require additional measures to ensure correct handling of locks and MVCC. Furthermore, UPDATE and DELETE make the most sense if executed on a subset of the rows, which requires a previous selection. While the rows for an UPDATE or DELETE statement are not necessarily selected through an explicit SELECT statement, the methods presented here for the SELECT statement can be reused. While an INSERT statement might be ported to the FPGA, this does not seem sensible, as the main challenge of an insert is the serialization of the operation (for writing). While the FPGA excels at parallelization, it provides no benefits over CPUs for serial tasks. In fact, the higher operating frequencies of CPUs result in better serial performance; thus, moving INSERT to the FPGA is likely to have an adverse impact on performance.

Figure 2.6 shows a simplified diagram of the SELECT statement. The full (PostgreSQL) Backus-Naur form of SELECT is provided in Appendix A. Projection (the selection of the columns to return) is expressed by the column list (see Figure 2.6), and selection of the rows is mainly carried out through the WHERE clause. Selection based on grouped rows (if grouping is used) is handled by the HAVING clause. GROUP BY groups rows based on the columns in the GROUP BY clause and returns a single row for each group. Aggregate functions are applied to all rows that end up in the same group. Windowing, like grouping, also partitions the result set. However, it returns all the rows in a partition and allows for aggregate functions. The aggregate functions operate on all rows in the partition unless an ordering of the partition is defined, in which case they operate on all rows that come before the current row or that are tied with it. The ORDER BY clause is used to sort the results, potentially by multiple columns, and each column may be sorted in either ascending or descending order. LIMIT and OFFSET extract a partial result of LIMIT rows, starting OFFSET rows into the result of the base query.

[Figure 2.6 (simplified diagram of the SELECT statement). Elements in order: SELECT, DISTINCT, * or column list, FROM relation, WHERE condition, GROUP BY condition, HAVING condition, WINDOW condition, ORDER BY condition, LIMIT condition, OFFSET condition.]
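The difference between grouping, windowing, and LIMIT/OFFSET can be checked on a small example. As an assumption made to keep the sketch self-contained, it uses Python's built-in sqlite3 module instead of PostgreSQL (window functions require SQLite 3.25 or newer); the `sales` table and its contents are made up. The three queries use only standard SQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for PostgreSQL (assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 20), ("north", 20), ("south", 5), ("south", 7)],
)

# GROUP BY returns one row per group; HAVING then filters the groups.
grouped = conn.execute(
    "SELECT region, SUM(amount) FROM sales "
    "GROUP BY region HAVING SUM(amount) > 20 ORDER BY region"
).fetchall()

# A window keeps every row; with an ordering, the aggregate covers all rows
# up to and including the current row and the rows tied with it.
windowed = conn.execute(
    "SELECT region, amount, "
    "SUM(amount) OVER (PARTITION BY region ORDER BY amount) "
    "FROM sales ORDER BY region, amount"
).fetchall()

# LIMIT 2 OFFSET 1 skips the first row of the sorted result and returns two rows.
page = conn.execute(
    "SELECT amount FROM sales ORDER BY amount LIMIT 2 OFFSET 1"
).fetchall()

print(grouped)   # [('north', 50)]
print(windowed)  # both tied 'north' rows with amount 20 report a sum of 50
print(page)      # [(7,), (10,)]
```

The two tied rows in the north partition receive the same running sum, which matches the tie handling described above.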

Chapter 3 Partial Reconfiguration

I was a chameleon, the woman men wanted me to be.

Jane Fonda

Partial reconfiguration, while supported by the vendors, is not supported in a way that is suitable for creating pipelines on the fly. While many of the techniques described in the following chapter are conceptually known and proof-of-concept uses exist, they have rarely been applied at large scale. A noteworthy exception is GoAhead [12], an external tool for partial reconfiguration flows.

For this thesis, the techniques have been refined and implemented in Tcl, a scripting language that allows scripting nearly everything inside Vivado. While newly introduced constraints made some parts easier compared to ISE, other techniques failed due to changes in the tool suite.
