optional if it is “by 1”. The Boolean expression is optional and, if not supplied, the value true is used. The value of the variable may not be changed in the body of the loop.
An enumerative iterator, such as “each n ∈ N (n ≠ abc)” (see Figure 2.1, lines 11 through 13 and lines 16 through 25), or “each p,q ∈ T (p ≠ q)”, selects all and only the elements of its set operand that satisfy the parenthesized Boolean expression following the set operand, in an indeterminate order. If the parenthesized Boolean expression is missing, the value true is used. If the variable series has more than one element, they all must satisfy the same criteria. For example, “each m,n ∈ N (1 ≤ m & m ≤ n & n ≤ 2)” causes the pair of variables <m,n> to range over <1,1>, <1,2>, and <2,2>, in some order. For any set S that appears in the iterator, the body of the for statement must not change S’s value.
The body is a sequence of zero or more statements.
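As an illustration only, the pair-valued enumerative iterator above can be mimicked in Python with a set comprehension. The contents of N and the use of Python sets are assumptions of this sketch; it is not ICAN and not how the book implements iterators.

```python
from itertools import product

# Hypothetical set operand; in ICAN, N would be supplied by the algorithm.
N = {1, 2, 3}

# Mimics "each m,n in N (1 <= m & m <= n & n <= 2)": every variable in the
# series ranges over the same set, and only the combinations that satisfy
# the parenthesized predicate are selected.
pairs = {(m, n) for m, n in product(N, repeat=2) if 1 <= m <= n <= 2}

for m, n in pairs:
    # Set iteration order is unspecified, matching ICAN's "some order".
    print(m, n)
```

Because the loop body never modifies N, the sketch also respects the rule that the body must not change the value of a set that appears in the iterator.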
2.8.9
Repeat Statements
A repeat statement has the form
repeat body until condition
The body is a sequence of zero or more statements. The condition is a Boolean valued expression and is evaluated in the short-circuit manner.
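Python has no repeat statement, so the following is only a rough emulation under that assumption: the body runs once before the condition is first tested, and the condition relies on short-circuit evaluation. The helper names repeat_until and drain are hypothetical.

```python
def repeat_until(body, condition):
    """Run body() at least once, then stop as soon as condition() is true."""
    while True:
        body()
        if condition():
            return


def drain(values):
    """Pop items off the end of values until it is empty or a 0 is on top."""
    seen = []

    def take_one():
        seen.append(values.pop())

    # Short-circuit evaluation: values[-1] is inspected only when the list
    # is non-empty, so the test never indexes into an empty list.
    repeat_until(take_one, lambda: not values or values[-1] == 0)
    return seen
```

For example, drain([4, 7]) pops both elements before the emptiness test succeeds, and the second operand of the condition is never evaluated on the empty list.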
2.8.10
Keywords in ICAN
The keywords in ICAN are given in Table 2.8. They are all reserved and may not be used as identifiers.
TABLE 2.8 The keywords in ICAN.
array      begin      boolean    by
case       character  default    do
each       elif       else       end
enum       esac       false      fi
for        goto       if         in
inout      integer    nil        od
of         out        procedure  real
record     repeat     return     returns
sequence   set        to         true
2.9
Wrap-Up
This chapter is devoted to describing ICAN, the informal notation used to present algorithms in this book.
The language allows for a rich set of predefined and constructed types, including ones that are specific to compiler construction, and is an expressive notation for expressions and statements. Each compound statement has an ending delimiter, and some, such as while and case statements, have internal delimiters too.
The informality of the language lies primarily in its not being specific about the semantics of constructions that are syntactically valid but semantically ambiguous, undefined, or otherwise invalid.
2.10
Further Reading
There are no references for ICAN, as it was invented for use in this book.
2.11
Exercises
2.1 (a) Describe how to translate an XBNF syntax representation into a representation that uses only concatenation. (b) Apply your method to rewrite the XBNF description
E  → V | AE | ( E ) | ST | SE | TE
AE → [ { E | nil } ]
ST → " AC+ "
SE → ∅ | { E* }
TE → ⟨ E ⋈ , ⟩
2.2 Show that arrays, sets, records, tuples, products, and functions are all “syntactic sugar” in ICAN by describing how to implement them and their operations in a version of ICAN that does not include them (i.e., express the other type constructors in terms of the union and sequence constructors).
2.3 (a) Write an ICAN algorithm to run a maze. That is, given a set of nodes N ⊆ Node, a set of undirected arcs E ⊆ Node × Node, and start and finish nodes start, goal ∈ N, the algorithm should return a list of nodes that make up a path from start to goal, or nil if there is no such path. (b) What is the time complexity of your algorithm in terms of n = |N| and e = |E|?
2.4 Adapt your algorithm from the preceding exercise to solve the traveling salesman problem. That is, return a list of nodes beginning and ending with start that passes through every node other than start exactly once, or return nil if there is no such path.
2.5 Given a binary relation R on a set A, i.e., R ⊆ A × A, write an ICAN procedure RTC(R,x,y) to compute its reflexive transitive closure. The reflexive transitive closure of R, written R*, satisfies a R* b if and only if a = b or there exists a c such that a R c and c R* b, so RTC(R,x,y) returns true if x R* y and false otherwise.
ADV 2.6 We have purposely omitted pointers from ICAN because they present several serious issues that can be avoided entirely by excluding them. These include, for example, pointer aliasing, in which two or more pointers point to the same object, so that changing the referent of one of them affects the referents of the others also, and the possibility of creating circular structures, i.e., structures in which following a series of pointers can bring us back to where we started. On the other hand, excluding pointers may result in algorithms’ being less efficient than they would otherwise be. Suppose we were to decide to extend ICAN to create a language, call it PICAN, that includes pointers. (a) List advantages and disadvantages of doing so. (b) Discuss the needed additions to the language and the issues these additions would create for programmers and implementers of the language.
CHAPTER 3
Symbol-Table Structure
In this chapter we explore issues involved in structuring symbol tables to accommodate the features of modern programming languages and to make them efficient for compiled implementations of the languages. We begin with a discussion of the storage classes that symbols may belong to and the rules governing their visibility, or scope rules, in various parts of a program. Next we discuss symbol attributes and how to structure a local symbol table, i.e., one appropriate for a single scope. This is followed by a description of a representation for global symbol tables that includes importing and exporting of scopes, a programming interface to global and local symbol tables, and ICAN implementations of routines to generate loads and stores for variables according to their attributes.
3.1
Storage Classes, Visibility, and Lifetimes
Most programming languages allow the user to assign variables to storage classes that prescribe scope, visibility, and lifetime characteristics for them. The rules governing scoping also prescribe principles for structuring symbol tables and for representing variable access at run time, as discussed below.
A scope is a unit of static program structure that may have one or more variables declared within it. In many languages, scopes may be nested: procedures are scoping units in Pascal, as are blocks, functions, and files in C. The closely related concept of visibility of a variable indicates in what scopes the variable’s name refers to a particular instance of the name. For example, in Pascal, if a variable named a is declared in the outermost scope, it is visible everywhere in the program¹ except within functions that also declare a variable a and any functions nested within them, where the local a is visible (unless it is superseded by another declaration of a variable with the same name). If a variable in an inner scope makes a variable with the same name in a containing scope temporarily invisible, we say the inner one shadows the outer one.
1. In many languages, such as C, the scope of a variable begins at its declaration point in the code and extends to the end of the program unit, while in others, such as PL/I, it encompasses the entire relevant program unit.
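Shadowing can be sketched in Python, whose nested functions follow static scoping much as Pascal's nested procedures do. The function names below are hypothetical, and Python's rules differ from Pascal's in other respects, so this is only an analogy.

```python
def program():
    a = "outer a"                # declared in the outermost scope

    def uses_outer():
        return a                 # no local a, so the outer a is visible here

    def shadows():
        a = "local a"            # this declaration shadows the outer a

        def nested():
            return a             # functions nested inside see the local a

        return a, nested()

    # The outer a is unchanged after the calls; it was only hidden, not altered.
    return uses_outer(), shadows(), a


result = program()
```

The outer a is merely invisible inside shadows() and the function nested within it; it becomes visible again as soon as control returns to the enclosing scope.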
The extent or lifetime of a variable is the part of the execution period of the program in which it is declared from when it first becomes visible to when it is last visible. Thus, a variable declared in the outermost scope of a Pascal program has a lifetime that extends throughout the execution of the program, while one declared within a nested procedure may have multiple lifetimes, each extending from an entry to the procedure to the corresponding exit from it. A Fortran variable with the save attribute or a C static local variable has a noncontiguous lifetime: if it is declared within procedure f(), its lifetime consists of the periods during which f() is executing, and its value is preserved from each execution period to the next.
Almost all languages have a global storage class that gives variables assigned to it an extent that covers the entire execution of the program and global scope, i.e., it makes them visible throughout the program, or in languages in which the visibility rules allow one variable to shadow another, it makes them visible wherever they are not shadowed. Examples of global scope include variables declared extern in C and those declared in the outermost scope in Pascal.
Fortran has the common storage class, which differs from most scoping concepts in that an object may be visible in multiple program units that are not related by nesting and it may have different names in any or all of them. For example, given the common declarations
common /block1/i1,j1
and
common /block1/i2,j2
in routines f1() and f2(), respectively, variables i1 and i2 refer to the same storage in their respective routines, as do j1 and j2, and their extent is the whole execution of the program.
Some languages, such as C, have a file or module storage class that makes a variable visible within a particular file or module and makes its extent the whole period of execution of the program.
Most languages support an automatic or stack storage class that gives a variable a scope that is the program unit in which it is declared and an extent that lasts for a particular activation of that program unit. This may be associated with procedures, as in Pascal, or with both procedures and blocks within them, as in C and PL/I.
Some languages allow storage classes that are static modifications of those described above. In particular, C allows variables to be declared static, which causes them to be allocated storage for the duration of execution, even though they are declared within a particular function. They are accessible only within the function, and they retain their values from one execution of it to the next, like Fortran save variables.
Some languages allow data objects (and in a few languages, variable names) to have dynamic extent, i.e., to extend from their points of (implicit or explicit) allocation to their points of destruction. Some, particularly LISP, allow dynamic scoping, i.e., scopes may nest according to calling relationships, rather than static nesting. With dynamic scoping, if procedure f() calls g() and g() uses a variable x that it doesn’t declare, then the x declared in its caller f() is used, or if there is none, in the caller of f(), and so on, regardless of the static structure of the program.
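The lookup rule for dynamic scoping can be sketched by keeping one set of bindings per active call and searching them from the most recent call outward. Python itself is statically scoped, so the DynamicEnv class and its call and lookup methods are hypothetical helpers written for this illustration.

```python
class DynamicEnv:
    """Hypothetical helper: a stack holding one binding frame per active call."""

    def __init__(self):
        self.frames = []

    def call(self, proc, **declared):
        # Entering proc pushes the variables it declares; leaving pops them.
        self.frames.append(dict(declared))
        try:
            return proc(self)
        finally:
            self.frames.pop()

    def lookup(self, name):
        # Search the callee's frame first, then its caller's, and so on.
        for frame in reversed(self.frames):
            if name in frame:
                return frame[name]
        raise NameError(name)


def g(env):
    return env.lookup("x")       # g uses x without declaring it


def f(env):
    return env.call(g)           # f calls g


def main(env):
    return env.call(f)           # here f is entered without declaring x


env = DynamicEnv()
from_f = env.call(f, x="declared in f")
from_main = env.call(main, x="declared in caller of f")
```

In the first call g finds the x declared by its caller f; in the second, f declares no x, so the search continues to the caller of f, independent of where the functions appear in the source text.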
Some languages, such as C, have an explicit volatile storage class modifier that specifies that a variable declared volatile may be modified asynchronously, e.g., by an I/O device. This imposes restrictions on optimizations that may be performed on constructs that access the variable.
3.2
Symbol Attributes and Symbol-Table Entries
Each symbol in a program has associated with it a series of attributes that are derived both from the syntax and semantics of the source language and from the symbol’s declaration and use in the particular program. The typical attributes include a series of relatively obvious things, such as the symbol’s name, type, scope, and size. Others, such as its addressing method, may be less obvious.
Our purpose in this section is to enumerate possible attributes and to explain the less obvious ones. In this way, we provide a description of the constituents of a symbol table, namely, the symbol-table entries. A symbol-table entry collects together the attributes of a particular symbol in a way that allows them to be easily set and retrieved.
Table 3.1 lists a typical set of attributes for a symbol. The provision of both size and boundary on the one hand and bitsize and bitbdry on the other allows for both unpacked and packed data representations.
A type referent is either a pointer to or the name of a structure representing a constructed type (in ICAN, lacking pointers, we would use the latter). The provision of type, basetype, and machtype allows us to specify, for example, that the Pascal type
array [ 1 . . 3 , 1 . . 5 ] of char
has for its type field a type referent such as t2, whose associated value is <array,2,[<1,3>,<1,5>],char>, for its basetype simply char, and for its machtype the value byte. Also, the value of nelts for it is 15. The presence of the basereg and disp fields allows us to specify that, for example, to access the beginning of our Pascal array we should form the address [r7+8] if basereg is r7 and disp is 8 for it.
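The array example can be written out as a small symbol-table entry. The sketch below assumes a Python dataclass holding only the fields discussed here and a dictionary that maps type referents to tuples; the names SymbolEntry and type_table and the symbol name a are hypothetical, and nelts is derived from the bounds rather than stored as a field.

```python
from dataclasses import dataclass
from math import prod

# Hypothetical table of constructed types, keyed by type-referent name:
# (constructor, number of dimensions, bounds per dimension, element type).
type_table = {"t2": ("array", 2, [(1, 3), (1, 5)], "char")}


@dataclass
class SymbolEntry:
    name: str
    type: str        # type referent, e.g. "t2"
    basetype: str    # element type of the constructed type
    machtype: str    # machine-level type of each element
    basereg: str     # base register used to address the symbol
    disp: int        # displacement from the base register

    def nelts(self) -> int:
        # Element count: product of (upper - lower + 1) over all dimensions.
        _, _, bounds, _ = type_table[self.type]
        return prod(hi - lo + 1 for lo, hi in bounds)

    def address(self) -> str:
        # Address of the start of the object, written as [basereg+disp].
        return f"[{self.basereg}+{self.disp}]"


entry = SymbolEntry("a", "t2", "char", "byte", "r7", 8)
```

With these values the entry reports 15 elements and the starting address [r7+8], matching the figures in the text.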
The most complex aspect of a symbol-table record is usually the value of the type attribute. Source-language types typically consist of the predefined ones, such as integer, char, real, etc. in Pascal, and the constructed ones, such as Pascal’s enumerated, array, record, and set types. The predefined ones can be represented by an enumeration, and the constructed ones by tuples. Thus, the Pascal type template