   if i ∈ Succ(i) then
      Succ(i) -= {i}
      Pred(i) -= {i}
   fi
   for each j ∈ Pred(i) do
      Succ(j) := (Succ(j) - {i}) ∪ Succ(i)
   od
   for each j ∈ Succ(i) do
      Pred(j) := (Pred(j) - {i}) ∪ Pred(i)
   od
   nblocks -= 1
   for j := i to nblocks do
      Block[j] := Block[j+1]
      Succ(j) := Succ(j+1)
      Pred(j) := Pred(j+1)
   od
   for j := 1 to nblocks do
      for each k ∈ Succ(j) do
         if k > i then
            Succ(j) := (Succ(j) - {k}) ∪ {k-1}
         fi
      od
      for each k ∈ Pred(j) do
         if k > i then
            Pred(j) := (Pred(j) - {k}) ∪ {k-1}
         fi
      od
   od
end      || delete_block

FIG. 4.17 The ICAN routine delete_block( ) that removes an empty basic block.
The procedure delete_block(i, nblocks, ninsts, Block, Succ, Pred) defined in Figure 4.17 deletes basic block i and adjusts the data structures that represent a program.
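As a rough illustration, the steps of Figure 4.17 can be sketched in Python. This is a sketch under stated assumptions, not the book's code: blocks are numbered from 1, `blocks` is a plain list, `succ` and `pred` are dictionaries from block number to a set of block numbers, and the `ninsts` parameter of the ICAN routine is omitted. The helper name `renumber` is hypothetical.

```python
def delete_block(i, blocks, succ, pred):
    """Delete block i (1-based) and renumber the blocks after it.

    blocks: list in which blocks[k - 1] holds block k (assumed layout)
    succ, pred: dict mapping block number -> set of block numbers
    Returns the new number of blocks.
    """
    nblocks = len(blocks)
    # Drop a self-loop on block i, if there is one.
    succ[i].discard(i)
    pred[i].discard(i)
    # Route every predecessor of i directly to i's successors, and vice versa.
    for j in pred[i]:
        succ[j] = (succ[j] - {i}) | succ[i]
    for j in succ[i]:
        pred[j] = (pred[j] - {i}) | pred[i]
    # Remove block i and shift the later blocks down by one position.
    del blocks[i - 1]
    nblocks -= 1
    for j in range(i, nblocks + 1):
        succ[j] = succ[j + 1]
        pred[j] = pred[j + 1]
    del succ[nblocks + 1], pred[nblocks + 1]

    # Every reference to a block numbered above i now points one lower.
    def renumber(s):
        return {k - 1 if k > i else k for k in s}

    for j in range(1, nblocks + 1):
        succ[j] = renumber(succ[j])
        pred[j] = renumber(pred[j])
    return nblocks


# A straight-line chain B1 -> B2 -> B3 -> B4 from which block 2 is deleted.
blocks = ["B1", "B2", "B3", "B4"]
succ = {1: {2}, 2: {3}, 3: {4}, 4: set()}
pred = {1: set(), 2: {1}, 3: {2}, 4: {3}}
n = delete_block(2, blocks, succ, pred)
```

The renumbering step builds each new set in one pass instead of editing a set while iterating over it, which is a deliberate departure from the element-by-element updates in the figure.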
96 Intermediate Representations
4.9 Other Intermediate-Language Forms
In this section, we describe several alternative representations of the instructions in a basic block of medium-level intermediate code (namely, triples; trees; directed acyclic graphs, or DAGs; and Polish prefix), how they are related to MIR, and their advantages and disadvantages relative to it. In the output of a compiler’s front end, the control structure connecting the basic blocks is most often represented in a form similar to the one we use in MIR, i.e., by simple explicit gotos, ifs, and labels. It remains for control-flow analysis (see Chapter 7) to provide more information about the nature of the control flow in a procedure, such as whether a set of blocks forms an if-then-else construct, a while loop, or some other structure.
Two further important intermediate-code forms are static single-assignment form and the program dependence graph, described in Sections 8.11 and 9.5.
First, note that the form we are using for MIR and its relatives is not the conventional one for quadruples. The conventional form is written with the operator first, followed by the three operands, usually with the result operand first, so that our
    t1 <- x + 3
would typically be written as
    + t1,x,3
We have chosen to use the infix form simply because it is easier to read. Also, recall that the form shown here is designed as an external or printable notation, while the corresponding ICAN form discussed above can be thought of as an internal representation, although even it is designed for reading convenience. If it were truly an internal form, it would be much more compact and the symbols would most likely be replaced by pointers to symbol-table entries.
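The two notations for a quadruple differ only in how the same four fields are printed. A minimal sketch, assuming a hypothetical `Quad` record that is not part of MIR or ICAN:

```python
from typing import NamedTuple, Union

Operand = Union[str, int]


class Quad(NamedTuple):
    """Hypothetical container for one quadruple: result <- left op right."""
    result: str
    op: str
    left: Operand
    right: Operand


def infix(q: Quad) -> str:
    # The readable form used in the text.
    return f"{q.result} <- {q.left} {q.op} {q.right}"


def prefix(q: Quad) -> str:
    # The conventional form: operator first, then the result, then the operands.
    return f"{q.op} {q.result},{q.left},{q.right}"


q = Quad("t1", "+", "x", 3)
print(infix(q))
print(prefix(q))
```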
It should also be noted that there is nothing inherently medium-level about any of the alternative representations in this section—they would function equally well as low-level representations.
Figure 4.18(a) gives an example MIR code fragment that we use in comparing MIR to the other code forms.
4.9.1 Triples
Triples are similar to quadruples, except that the results are not named explicitly in a triples representation. Instead, the results of the triples have implicit names that are used in other triples when they are needed as operands, and an explicit store operation must be provided, since there is no way to name a variable or storage location as the result of a triple. We might, for example, use “a sto b” to mean store b in location a and “a *sto b” for the corresponding indirect store through a pointer. In internal representations, triple numbers are usually either pointers to or index numbers of the triples they correspond to. This can significantly complicate insertion and deletion of triples, unless the targets of control transfers are nodes in a representation of the basic-block structure of the procedure, rather than references to specific triples.
    L1: i <- i + 1          (1) i + 1
                            (2) i sto (1)
        t1 <- i + 1         (3) i + 1
        t2 <- p + 4         (4) p + 4
        t3 <- *t2           (5) *(4)
        p <- t2             (6) p sto (4)
        t4 <- t1 < 10       (7) (3) < 10
        *r <- t3            (8) r *sto (5)
        if t4 goto L1       (9) if (7), (1)
            (a)                 (b)

FIG. 4.18 (a) A MIR code fragment for comparison to other intermediate-code forms, and (b) its translation to triples.

        <-                  i:add
       /  \                 /   \
      i    add             i     1
          /   \
         i     1
        (a)                  (b)

FIG. 4.19 Alternative forms of trees: (a) with an explicit assignment operator, and (b) with the result variable labeling the root node of its computation tree.
In external representations, the triple number is usually listed in parentheses at the beginning of each line and the same form is used to refer to it in the triples, providing a simple way to distinguish triple numbers from integer constants. Figure 4.18(b) shows a translation of the MIR code in Figure 4.18(a) to triples.
Translation back and forth between quadruples and triples is straightforward. Going from quadruples to triples requires replacing temporaries and labels by triple numbers and introducing explicit store triples. The reverse direction replaces triple numbers by temporaries and labels and may absorb store triples into quadruples that compute the result being stored.
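The quadruples-to-triples direction can be sketched as follows. This is only an illustration under several assumptions: each instruction is encoded as a tuple `(label, kind, ...)` in a made-up format, temporaries are recognized by names of the form `t<digits>`, and only backward branches are handled, because a label must already have been seen when a branch to it is translated.

```python
import re


def is_temp(name):
    # Assumption: temporaries are exactly the names t1, t2, ...
    return re.fullmatch(r"t\d+", str(name)) is not None


def to_triples(quads):
    triples = []    # triple k is stored at triples[k - 1]
    ref = {}        # temporary -> "(k)" of the triple that computes it
    label_at = {}   # label -> "(k)" of the first triple at that label

    def use(x):
        return ref.get(x, str(x))

    def emit(text):
        triples.append(text)
        return f"({len(triples)})"

    for label, kind, *f in quads:
        if label is not None:
            label_at[label] = f"({len(triples) + 1})"
        if kind == "bin":                 # dst <- a op b
            dst, op, a, b = f
            n = emit(f"{use(a)} {op} {use(b)}")
        elif kind == "un":                # dst <- op a
            dst, op, a = f
            n = emit(f"{op}{use(a)}")
        elif kind == "copy":              # dst <- src
            dst, src = f
            n = use(src)
        elif kind == "istore":            # *ptr <- src
            ptr, src = f
            emit(f"{use(ptr)} *sto {use(src)}")
            continue
        elif kind == "ifgoto":            # if cond goto target
            cond, target = f
            emit(f"if {use(cond)}, {label_at[target]}")
            continue
        else:
            raise ValueError(f"unknown instruction kind: {kind}")
        if is_temp(dst):
            ref[dst] = n                  # a temporary becomes a triple number
        else:
            emit(f"{dst} sto {n}")        # a variable needs an explicit store
    return triples


FIG_4_18A = [
    ("L1", "bin", "i", "+", "i", 1),
    (None, "bin", "t1", "+", "i", 1),
    (None, "bin", "t2", "+", "p", 4),
    (None, "un", "t3", "*", "t2"),
    (None, "copy", "p", "t2"),
    (None, "bin", "t4", "<", "t1", 10),
    (None, "istore", "r", "t3"),
    (None, "ifgoto", "t4", "L1"),
]

for k, t in enumerate(to_triples(FIG_4_18A), start=1):
    print(f"({k}) {t}")
```

Applied to the fragment in Figure 4.18(a), the sketch yields the nine triples of Figure 4.18(b). Copying a variable into a temporary is handled naively here (the temporary simply stands for the variable's name), which would be wrong if the variable were reassigned before the temporary is used.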
Using triples has no particular advantage in optimization, except that it simplifies somewhat the creation of the DAG for a basic block before code generation (see Section 4.9.3), performing local value numbering (see Section 12.4.1) in the process. The triples provide direct references to their operands and so simplify determining the descendants of a node.
4.9.2 Trees
To represent intermediate code by trees, we may choose either to have explicit assignment operators in the trees or to label the root node of an expression computation with the result variable (or variables), as shown by Figure 4.19(a) and (b), respectively, a choice somewhat analogous to using quadruples or triples. We choose to use the second form, since it corresponds more closely than the other form to the DAGs discussed in the following section. We label the interior nodes with the operation names given in Figure 4.6 that make up the ICAN type IROper.
Trees are almost always used in intermediate code to represent the portions of the code that do non-control-flow computation, and control flow is represented in a form that connects sequences of trees to each other. A simple translation of the (non-control-flow) MIR code in Figure 4.18(a) to tree form is shown in Figure 4.20. Note that this translation is, in itself, virtually useless: it provides one tree for each quadruple, and each tree contains no more or less information than the quadruple.
A more ambitious translation would determine that the t1 computed by the second tree is used only as an operand in the sixth tree and that, since t1 is a temporary, there is no need to store into it if the second tree is grafted into the sixth tree in place of the occurrence of t1 there. Similar observations apply to combining the third tree into the fifth. Notice, however, that the fourth tree cannot be grafted into the seventh, since the value of p is changed between them. Performing these transformations results in the sequence of trees shown in Figure 4.21.
This version of the tree representation has clear advantages over the quadruples: (1) it has eliminated two temporaries (t1 and t2) and the stores to them; (2) it provides the desired input form for the algebraic simplifications discussed in Section 12.3.1; (3) locally optimal code can be generated from it for many machine architectures by using Sethi-Ullman numbers, which prescribe the order in which instructions should be generated to minimize the number of registers used; and (4) it provides a form that is easy to translate to Polish-prefix code (see Section 4.9.4) for input to a syntax-directed code generator (see Section 6.2).
Translating from quadruples to trees can be done with varying degrees of effort, as exemplified by the sequences of trees in Figures 4.20 and 4.21. Translation to the first form should be obvious, and achieving the second form can be viewed as an optimization of the first. The only points about which we need to be careful are that, in grafting a tree into a later one in the sequence, we must make sure that there are no uses of any of the result variables that label nodes in the first one between its original location and the tree it is grafted into and that its operands are also not recomputed between the two locations.

    i:add    t1:add    t2:add    t3:ind    p:t2    t4:less    r:indasgn
    /  \     /   \     /   \       |               /    \         |
   i    1   i     1   p     4      t2            t1     10        t3

FIG. 4.20 Translation of the (non-control-flow) MIR code in Figure 4.18(a) to a sequence of simple trees.

[Figure 4.21: tree diagram not recoverable]

          b:add
         /     \
     a:add     a:add
     /   \     /   \
    a     1   a     1

FIG. 4.22 Result of trying to translate the MIR instructions a <- a + 1; b <- a + a to a single tree.

      t4:less              t5:add    t4:less
      /    \               /   \     /    \            t5 <- i + 1
    add     10      =>    i     1   t5     10    =>    t4 <- t5 < 10
    /  \
   i    1

FIG. 4.23 Example of translation from minimized tree form to MIR code.
Note that a sequence of MIR instructions may not correspond to a single tree for two distinct reasons: it may not be connected, or it may result in evaluating an instruction several times, rather than once. As an example of the latter situation, consider the code sequence
    a <- a + 1
    b <- a + a
This would result in the tree shown in Figure 4.22, which corresponds to evaluating the first instruction twice. We could, however, remove the label from the second “a:add” node.
Translation from trees to quadruples is simple. We may proceed by either a preorder traversal or a postorder one. In the first case, we perform a preorder traversal of each tree, in the order they appear. For each interior node (i.e., non-leaf node) with at least one descendant that is an interior node, we create a new temporary and divide the tree into two (call them the “upper tree” and the “lower tree”) along the edge connecting the two interior nodes. We label the root of the lower tree with the new temporary, and insert the pair of trees in sequence (with the lower tree first) in place of the one we are dividing. We repair the upper tree by putting the new temporary in place of the lower tree. An example from Figure 4.21 appears in Figure 4.23. When we no longer have any interior nodes with interior-node descendants, each tree corresponds to a single MIR instruction, and the remainder of the translation is obvious.
    t4:less      operands: t1:add, 10
    t1:add       operands: i:add, 1
    i:add        operands: i, 1
    r:indasgn    operand:  t3:ind
    t3:ind       operand:  p:add
    p:add        operands: p, 4

FIG. 4.24 DAG for the non-control-flow MIR code in Figure 4.18(a).
In the second approach to translating from trees to MIR, we perform a postorder traversal of the given tree, generating a MIR instruction for each subtree that contains only a single operator and replacing its root by the left-hand side of the MIR instruction.
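The postorder approach can be sketched as a short recursive routine. The encoding is an assumption made for illustration: a tree is either a leaf (a name or a constant) or a tuple `(op, left, right)`, only binary operators are handled, and `INFIX` covers just a few operator names. New temporaries are drawn from a caller-supplied iterator.

```python
from itertools import count

# Assumed mapping from a few tree operator names to MIR infix symbols.
INFIX = {"add": "+", "sub": "-", "mul": "*", "less": "<"}


def tree_to_mir(result, tree, fresh):
    """Translate one labeled tree to a list of MIR instruction strings.

    result: the variable labeling the root of the tree
    tree:   a leaf (str or int) or a tuple (op, left, right)
    fresh:  iterator yielding unused temporary names
    """
    if not isinstance(tree, tuple):
        return [f"{result} <- {tree}"]      # a bare copy such as p <- t2
    code = []

    def visit(node, target=None):
        if not isinstance(node, tuple):
            return str(node)
        op, left, right = node
        l = visit(left)                     # operands first (postorder)
        r = visit(right)
        name = target if target is not None else next(fresh)
        code.append(f"{name} <- {l} {INFIX[op]} {r}")
        return name                         # the subtree is replaced by its name

    visit(tree, result)
    return code


fresh = (f"t{n}" for n in count(5))         # t1..t4 are taken in Figure 4.18(a)
mir = tree_to_mir("t4", ("less", ("add", "i", 1), 10), fresh)
print(mir)
```

For the tree of Figure 4.23 this produces the two instructions shown there, `t5 <- i + 1` followed by `t4 <- t5 < 10`.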
4.9.3 Directed Acyclic Graphs (DAGs)
The DAG representation of a basic block can be thought of as compressing the minimal sequence of trees that represents it still further. The leaves of such a DAG represent the values of the variables and constants available on entry to the block that are used within it. The other nodes of the DAG all represent operations and may also be annotated with variable names, indicating values computed in the basic block. We draw DAG nodes like the tree nodes in the preceding section. As an example of a DAG for a basic block, see Figure 4.24, which corresponds to the first seven instructions in Figure 4.18(a). In the DAG, the lower left “add” node represents the MIR assignment “i <- i + 1”, while the “add” node above it represents the computation of “i + 1” that is compared to 10 to compute a value for t4. Note that the DAG reuses values, and so is generally a more compact representation than either trees or the linear notations.
To translate a sequence of MIR assignment instructions to a DAG, we process the instructions in order. For each one, we check whether each operand is already represented by a DAG node. If it is not, we make a DAG leaf for it. Then we check whether there is a parent of the operand node(s) that represents the current operation; if not, we create one. Then we label the node representing the result with the name of the result variable and remove that name as a label of any other node in the DAG.
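The procedure just described can be sketched in Python. The representation is an assumption chosen for brevity: each instruction is a tuple `(dst, op, operands)` with `op` set to `None` for a plain copy, nodes are identified by their position in a list, and operator commutativity is ignored, so `2 * a` and `a * 2` would get separate nodes.

```python
def build_dag(instrs):
    """Build a DAG for a list of assignments (dst, op, operands).

    Returns (nodes, labels): nodes[k] is ("leaf", (value,)) or (op, child ids),
    and labels[k] is the set of variable names currently attached to node k.
    """
    nodes = []      # node id -> node description
    labels = []     # node id -> set of variable names
    current = {}    # variable -> id of the node holding its current value
    index = {}      # node description -> node id, so nodes can be reused

    def node_for(key):
        if key not in index:
            index[key] = len(nodes)
            nodes.append(key)
            labels.append(set())
        return index[key]

    def operand(x):
        if x in current:                    # value computed earlier in the block
            return current[x]
        return node_for(("leaf", (x,)))     # value available on entry, or a constant

    for dst, op, args in instrs:
        ids = tuple(operand(a) for a in args)
        n = ids[0] if op is None else node_for((op, ids))
        if dst in current:                  # the name moves away from its old node
            labels[current[dst]].discard(dst)
        current[dst] = n
        labels[n].add(dst)
    return nodes, labels


# The basic block of Figure 4.25.
FIG_4_25 = [
    ("c", None, ("a",)),
    ("b", "add", ("a", 1)),
    ("c", "mul", (2, "a")),
    ("d", "neg", ("c",)),
    ("c", "add", ("a", 1)),
    ("c", "add", ("b", "a")),
    ("d", "mul", (2, "a")),
    ("b", None, ("c",)),
]

nodes, labels = build_dag(FIG_4_25)
for k, (node, names) in enumerate(zip(nodes, labels)):
    print(k, ",".join(sorted(names)) or "-", node)
```

On the block of Figure 4.25, the sketch leaves the `neg` node without a label and attaches both `b` and `c` to the final `add` node, in line with the discussion of Figure 4.26.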
Figure 4.25 shows a sequence of MIR instructions, and the graphic form of the corresponding DAG is shown in Figure 4.26. Note that, in the DAG, the neg node is an operator node that has no labels (it is created for instruction 4 and labeled d, but that label is then moved by instruction 7 to the mul node), so no code need be generated for it.
As mentioned above, the DAG form is useful for performing local value numbering, but it is a comparatively difficult form on which to perform most other optimizations. On the other hand, there are node-listing algorithms that guide code generation from DAGs to produce quite efficient code.
    1   c <- a
    2   b <- a + 1
    3   c <- 2 * a
    4   d <- -c
    5   c <- a + 1
    6   c <- b + a
    7   d <- 2 * a
    8   b <- c

FIG. 4.25 Example basic block of MIR code to be converted to a DAG.
    neg        operand:  d:mul
    d:mul      operands: 2, a
    b,c:add    operands: add, a
    add        operands: a, 1

FIG. 4.26 Graphic form of the DAG corresponding to the MIR code in Figure 4.25.
binasgn i add i 1