
A translation error report is generated when a source fragment appears to be of the correct superficial form but for some reason can’t be translated. Some obvious causes are undeclared names, names whose descriptor types don’t match the required context, and so on. Figure 2.17 gives some examples. These errors are almost all discovered by checking the tree against the contents of the symbol table. They are all syntactic errors: semantic errors happen at run-time.

2.6. TRANSLATION ERROR HANDLING 33

Statement: ADD CORRESPONDING NORTH-OFFICE TO HEAD-OFFICE

Tree: add.corr(name NORTH-OFFICE, name HEAD-OFFICE)

Descriptors: NORTH-OFFICE (MANAGER, WAGES, PLANT, OFFICE); HEAD-OFFICE (OFFICE, MANAGING-DIRECTOR, SALARY, PLANT)

34 CHAPTER 2. INTRODUCTION TO TRANSLATION

1. Operands of a node are of the wrong type
   Statement: if a then x := b+1
   Tree: conditional(a, assignment(x, +(b, 1)))
   Symbol table: a is a real variable; b is a Boolean variable

2. Name not declared, so its descriptor is empty
   e.g. name ‘x’ not declared in the example above

3. Ambiguous data name (COBOL)
   Statement: MOVE PLANT TO PLANT
   Symbol table: as in figure 2.12 above

4. Invalid combination of phrases
   Statement: if u=v=0 then ....
   Tree: conditional(=(=(u, v), 0), ....)

Figure 2.17: Examples of translation errors

Case 4 of figure 2.17 can come about when the syntax analyser doesn’t check every syntactic rule as the tree is built. This form of error corresponds to syntactic rules of the form ‘a phrase X cannot be a subphrase of a phrase Y’. Such errors are easier to detect and to report upon during translation than during syntax analysis – see the discussion in chapters 5 and 17.

Translation error detection does not slow down the translator. Since the types of operands must be checked in order to select a code fragment, type combination errors will be detected automatically. Since the descriptor must be used to translate a leaf name, an empty descriptor will be detected automatically. Since the translator must search the COBOL hierarchy to distinguish between different data objects, an ambiguous name will be detected automatically. Since it must investigate the types of subphrase nodes in order to select the appropriate code fragment, invalid combinations will be detected automatically, and so on.

Error reporting in any phase of compilation should always be in source-language terms, and the translation phase is no exception. It is possible to use the tree and the symbol table to produce a version of the source statement, although without the particular arrangement of newlines and spaces which was used in the source program to lay the statement out on the page. This fact is the basis for the existence of ‘pretty printers’, which take the parse tree and from it produce a judiciously laid out version of the source program.
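The claim that error detection comes for free can be illustrated with a small tree walker. The sketch below is hypothetical throughout: trees are Python tuples `(op, left, right)`, `SYMBOLS` is an invented symbol table, and `LOAD`, `ADD.integer` and so on are made-up pseudo-instructions. Only `+` and `=` are handled. The walker has to look up each name’s descriptor and inspect operand types to choose a fragment, so undeclared names, type mismatches and invalid combinations such as `u=v=0` surface at exactly that point, and `unparse` regenerates source text for the message, much as a crude pretty printer would.

```python
class TranslationError(Exception):
    """A node of the right superficial form that cannot be translated."""

ARITHMETIC = {"integer", "real"}

# Hypothetical symbol table: name -> descriptor. A missing entry plays the
# role of an empty descriptor (the name was never declared).
SYMBOLS = {
    "a": {"type": "real", "addr": 100},
    "b": {"type": "Boolean", "addr": 101},
    "u": {"type": "integer", "addr": 102},
    "v": {"type": "integer", "addr": 103},
}

def unparse(node):
    """Regenerate source text from the tree (the original layout is lost)."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"{unparse(left)}{op}{unparse(right)}"
    return str(node)

def translate(node, code):
    """Append a code fragment for node to code; return the node's type."""
    if isinstance(node, int):
        code.append(f"LOADCONST {node}")
        return "integer"
    if isinstance(node, str):
        desc = SYMBOLS.get(node)
        if desc is None:
            raise TranslationError(f"name '{node}' not declared")
        code.append(f"LOAD {desc['addr']}")
        return desc["type"]
    op, left, right = node
    ltype, rtype = translate(left, code), translate(right, code)
    # The operand types are needed anyway to choose a fragment, so a bad
    # combination is detected as a side effect of that choice.
    if ltype == rtype and ltype in ARITHMETIC:
        if op == "+":
            code.append(f"ADD.{ltype}")
            return ltype
        if op == "=":
            code.append(f"COMPARE.{ltype}")
            return "Boolean"
    # Report in source-language terms, not in terms of the tree.
    raise TranslationError(
        f"'{op}' cannot combine {ltype} and {rtype} in {unparse(node)}")
```

For `u=v=0` the inner `=` yields a Boolean, so the outer `=` cannot find a fragment for a Boolean and an integer and reports the whole phrase as `u=v=0`.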

Error recovery during translation is simple: forget about the error, assume that the node was correctly translated and carry on. Of course a faulty program should not be run, and some action such as deleting the translator’s output should be taken to ensure that the program isn’t loaded and executed. Error correction during translation is more than usually dangerous and is rarely attempted.
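A minimal sketch of this recovery policy, assuming a hypothetical callback `translate_statement` that returns a list of instructions or raises `ValueError` for a translation error. Each error is recorded and translation carries on, so one run reports every faulty statement; if anything went wrong the object code is thrown away so that it cannot be loaded and executed.

```python
def translate_program(statements, translate_statement):
    """Translate every statement; suppress all object code if any failed."""
    object_code, errors = [], []
    for stmt in statements:
        try:
            object_code.extend(translate_statement(stmt))
        except ValueError as err:
            # Recovery: report, pretend the node was translated, carry on.
            errors.append(f"{stmt}: {err}")
    if errors:
        object_code = []  # a faulty program must never be run
    return object_code, errors
```

Note that no attempt is made to correct the faulty statement: it is simply skipped, in keeping with the warning that error correction at this stage is dangerous.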

Summary

Translation takes a source program, fragment by fragment, and produces corresponding fragments of object program. The relationships between fragments define a tree, and if the input to the translator is the tree itself then translation is particularly straightforward. Input to the translator can be in the form of a string, but this merely restricts translation options without giving any corresponding advantage in compiler efficiency.

In order to be able to generate object machine instructions which refer to the run-time addresses of objects manipulated by the object program, the translator must use a symbol table which contains information about those objects, including type, kind, run-time address, etc. The names are inserted in the table by the lexical analyser as they are encountered in the source program, and the descriptors are filled in by the object description phase before translation commences.
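The division of labour between phases can be sketched as a table with three operations. This is a hypothetical Python shape, not the book’s data structure: `enter` stands for the lexical analyser’s insertion of a name with an empty descriptor, `describe` for the object description phase, and `lookup` for the translator, where `None` means the name was seen but never declared.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Descriptor:
    kind: str          # e.g. "variable", "procedure"
    value_type: str    # e.g. "integer", "real", "Boolean"
    address: int       # run-time address (here just a word offset)

@dataclass
class SymbolTable:
    entries: dict[str, Optional[Descriptor]] = field(default_factory=dict)

    def enter(self, name: str) -> None:
        """Lexical analysis: record the name once, descriptor still empty."""
        self.entries.setdefault(name, None)

    def describe(self, name: str, descriptor: Descriptor) -> None:
        """Object description: fill in the descriptor before translation."""
        self.entries[name] = descriptor

    def lookup(self, name: str) -> Optional[Descriptor]:
        """Translation: None signals an undeclared (empty) name."""
        return self.entries.get(name)
```

A translator that finds `lookup(name)` returning `None` has, without any extra effort, discovered an undeclared name.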

Because the interfacing and relative positioning of code fragments in the linear sequence of object machine instructions can affect efficiency, a pure tree-walking translator will not generate entirely optimal code. Code optimisation attempts to overcome these deficiencies and is discussed in chapter 10. A tree-walking translator may, however, be designed so that the fragment it generates for a node depends on the immediate context of the node and the particular characteristics of the node itself. In this way a sophisticated ‘simple translation’ phase may reduce the need for optimisation, perhaps even to the point at which optimisation is no longer cost-effective.
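Context-dependent fragment selection can be sketched as follows. The register machine and its instructions (`LOADI`, `ADDI`, `ADDR`, `INC`, etc.) are invented for illustration, and only `+` is handled. The walker looks at each node’s operands before choosing a fragment: a constant uses an immediate form, a simple name a memory operand, and only a compound operand costs a second register. The assignment node spots `x := x + 1` and emits a single increment.

```python
def translate_expr(node, reg, code):
    """Compute node into register reg of a hypothetical register machine."""
    if isinstance(node, int):
        code.append(f"LOADI R{reg}, {node}")
    elif isinstance(node, str):
        code.append(f"LOAD R{reg}, {node}")
    else:
        op, left, right = node
        if op != "+":
            raise NotImplementedError(f"operator {op!r} not in this sketch")
        translate_expr(left, reg, code)
        if isinstance(right, int):       # constant: immediate-operand form
            code.append(f"ADDI R{reg}, {right}")
        elif isinstance(right, str):     # simple name: memory-operand form
            code.append(f"ADD R{reg}, {right}")
        else:                            # general case: needs another register
            translate_expr(right, reg + 1, code)
            code.append(f"ADDR R{reg}, R{reg + 1}")

def translate_assign(target, expr, code):
    """Translate target := expr, special-casing target := target + 1."""
    if expr == ("+", target, 1):
        code.append(f"INC {target}")
        return
    translate_expr(expr, 0, code)
    code.append(f"STORE R0, {target}")
```

Each special case is local to one node, which is why such improvements can be added incrementally without disturbing the rest of the translator.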

Sections II and III concentrate on the translation of particular source program fragments, giving attention to the actual machine instructions which a tree walker might generate for each of them. Certain nodes are ‘crucial’ in that the code fragments generated as their translation will have a major effect on the speed and size of the object program. Concentration on such crucial code fragments can be enormously productive and may largely make the difference between a good translator and one which is merely effective. The major advantage of tree walking as a translation technique, outweighing all others, is that incremental development and incremental improvement of the translator are possible. In this way both novice and expert can produce a good, working translator faster than with any other technique.

Chapter 3

Introduction to Syntax Analysis

If the task of translation is to go from the tree to the object code, then the task of analysis must be to go from source program to tree. Lexical analysis, as chapter 4 shows, discovers how the input characters are grouped into items, and the task of syntax analysis is to discover the way in which the items link together into phrases, the phrases link into larger phrases and so on. The output of the syntax analyser must be a tree or some equivalent representation such as a sequence of ‘triples’ or ‘quadruples’ or a postfix string.¹ It’s convenient to divide up the description of syntax analysis into two parts: first to describe how to recognise a phrase and second how to output a tree node which describes that phrase. It turns out that the recognition technique used, of which there are many, doesn’t affect and is not affected by considerations of how to produce analysis output.
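The equivalence of these output forms can be shown by flattening one tree three ways. In this sketch, trees are hypothetical `(op, left, right)` tuples; quadruples take their third operand to be a compiler-invented temporary `t1`, `t2`, … holding the result, while triples refer to an earlier result by its index, written `(0)`, `(1)`, …. In every form the operands of a node appear before the node itself, i.e. the tree is listed from underneath.

```python
from itertools import count

def postfix(node):
    """Postfix string: both operands first, then the operator."""
    if isinstance(node, tuple):
        op, left, right = node
        return postfix(left) + postfix(right) + [op]
    return [node]

def quadruples(node):
    """(op, arg1, arg2, result), with invented temporaries as results."""
    quads, temps = [], count(1)

    def walk(n):
        if not isinstance(n, tuple):
            return n
        op, left, right = n
        a, b = walk(left), walk(right)
        t = f"t{next(temps)}"
        quads.append((op, a, b, t))
        return t

    walk(node)
    return quads

def triples(node):
    """(op, arg1, arg2); a result is named by its triple's index."""
    trips = []

    def walk(n):
        if not isinstance(n, tuple):
            return n
        op, left, right = n
        a, b = walk(left), walk(right)
        trips.append((op, a, b))
        return f"({len(trips) - 1})"

    walk(node)
    return trips
```

For `a*b+c`, i.e. `("+", ("*", "a", "b"), "c")`, the postfix form is `a b * c +`, the quadruples are `(*, a, b, t1)` and `(+, t1, c, t2)`, and the triples are `(*, a, b)` and `(+, (0), c)`.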

Syntax analysis is a well-understood topic, by which I mean that serious theoretical analysis has proved that certain approaches must work given certain properties of the source language. Thus designing a syntax analyser is mostly a matter of picking a technique and following the rules which tell how to produce an analyser based on that technique. This chapter gives a brief introduction to the principles of syntax analysis so that the intensive treatment of translation which follows in sections II and III can rely on some background understanding. It illustrates two of the techniques which are dealt with in more detail in section IV: top-down one-track analysis, which is discussed in chapter 16, and operator-precedence analysis, which is discussed in chapter 17. Chapter 18 deals with the technique known as LR analysis, which isn’t introduced in this chapter.

¹ A ‘triple’ consists of an operator and two operands; a ‘quadruple’ is an operator and three operands. In effect each of these representations shows the tree viewed from underneath (see figure 3.10).

38 CHAPTER 3. INTRODUCTION TO SYNTAX ANALYSIS

FORTRAN

A logical IF statement is of the form IF (e) s

where ‘e’ is a logical expression and ‘s’ is any statement except a DO statement or another logical IF ...

A logical expression is a logical term or a construct of the form logical expression .OR. logical term

A logical term ...

COBOL

COMPUTE identifier-1 [ROUNDED] = { identifier-2 | literal | arithmetic-expression } [; ON SIZE ERROR imperative-statement]

Arithmetic expressions are data-names, identifiers or numeric literals or a sequence of ...

Algol 60

<for statement> ::= <for clause> <statement>
                  | <label>: <for statement>

<for clause> ::= for <variable> := <for list> do

<for list>::= ...

Pascal

<procedure heading> ::= procedure <identifier> ;
                      | procedure <identifier> ( <formal parameter section> {; <formal parameter section>} ) ;

3.1. LANGUAGE DESCRIPTIONS (GRAMMARS) 39

Building a syntax analyser which can process ‘correct’ programs is pretty trivial: it is just a matter of constructing an analyser based on the syntax description of a language by blindly following the rules associated with one of the well-known syntax analysis techniques. Real live users don’t always submit syntactically correct programs: they often make mistakes, and the true tasks of the analyser include recognising, reporting upon and recovering from the consequences of those mistakes. Error handling is by no means well understood, and therefore a major difficulty when building a syntax analyser is to provide reasonable behaviour in the face of all the source program errors which may eventually arise.

3.1 Language Descriptions (Grammars)

Programming language descriptions all fall naturally into two parts:

1. A description of the syntax of the language, which details the ‘superficial form’ of a program, laying down rules of punctuation and showing how a program may be built up out of items and characters.

2. A description of the semantics of the language, which defines the actions which a processor² will carry out when presented with a (correct) program in the language.

Part 1 of the description itself breaks into two further parts:

1a: Short-range syntax

How particular phrases must be written in terms of the sub-phrases they must contain and how these sub-phrases must be separated by punctuation items.

1b: Long-range syntax

How the various uses of a name (an identifier) in a program must be correlated in terms of consistency between declaration and use.

The breakdown into short- and long-range syntax can always be made, even in languages such as ALGOL 60, whose description categorises long-range syntax (wrongly, in my view) under the heading of ‘semantics’. The definition of ALGOL 68 mixes together short- and long-range syntax in a single formal description – see the discussion of ‘two-level grammars’ in chapter 15.

Syntax analysis is concerned simply with the rules of short-range syntax. Long-range syntax can be investigated only after the superficial form of the tree is known, when the context in which each name is used has been discovered. The object description phase can then process the declarative sections of the parse tree, writing symbol table descriptors which convey information about the kind of object denoted by each name. Finally the translator, in attempting to make use of information about the run-time object which a name has been declared to denote, can effectively check the long-range syntax rules.

1. <Boolean expression> ::= <Boolean term>
                          | <Boolean expression> or <Boolean term>

2. <Boolean expression> ::= <Boolean term>
                          | <Boolean term> or <Boolean expression>

3. A Boolean expression is a Boolean term or a sequence of Boolean terms separated by the or operator.

4. A Boolean expression is a Boolean variable, a Boolean constant, a Boolean function call, a relational expression or a construct of the form

       Boolean-expression Boolean-operator Boolean-expression

   The Boolean operators are, in increasing order of priority, or, ...

Figure 3.2: Alternative descriptions of expressions

<statement> ::= <number> : <statement>
              | begin <compound>
              | goto <number>
              | <identifier> := <expression>
              | <identifier> ( <expression> )
              | if <B-expression> then <statement>
<compound> ::= <statement> end | <statement> ; <compound>
<expression> ::= <B-expression> | <A-expression>
<A-expression> ::= <A1> | <A-expression> + <A1> | <A-expression> - <A1>
<A1> ::= <A2> | <A1> * <A2> | <A1> / <A2>
<A2> ::= <identifier> | <number>
<B-expression> ::= <B1> | <B-expression> or <B1>
<B1> ::= <B2> | <B1> and <B2>
<B2> ::= <identifier> | true | false | <A-expression> = <A-expression>

Figure 3.3: Grammar of a simple language
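Descriptions 1 and 2 of figure 3.2 accept exactly the same strings, but read as recipes for building a tree they group the terms differently: the left-recursive rule nests to the left, the right-recursive one to the right. The sketch below assumes, for simplicity, that each Boolean term is a single token and that the input is a hypothetical token list. For `or` the two groupings mean the same thing, but for an operator such as `-` in figure 3.3 the choice of description decides the meaning.

```python
def check(tokens):
    """Accept only term ('or' term)*, where a term is any token but 'or'."""
    if (len(tokens) % 2 == 0
            or any(t != "or" for t in tokens[1::2])
            or "or" in tokens[0::2]):
        raise ValueError("expected Boolean terms separated by 'or'")

def parse_left(tokens):
    """Description 1: <Boolean expression> or <Boolean term>."""
    check(tokens)
    tree = tokens[0]
    for term in tokens[2::2]:
        tree = ("or", tree, term)    # each new term attaches at the top
    return tree

def parse_right(tokens):
    """Description 2: <Boolean term> or <Boolean expression>."""
    check(tokens)
    if len(tokens) == 1:
        return tokens[0]
    return ("or", tokens[0], parse_right(tokens[2:]))
```

For `a or b or c` the first gives `("or", ("or", "a", "b"), "c")` and the second `("or", "a", ("or", "b", "c"))`.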

Figure 3.1 shows some fragments of the syntax of FORTRAN, COBOL, ALGOL 60 and PASCAL. Note that each of these syntax descriptions describes a tree structure like that used in chapter 2 – each gives the form of a source language phrase in terms of the sub-phrases it may contain, the form of the sub-phrases in terms of sub-sub-phrases, and so on. Syntax analysis merely consists of searching for patterns of phrases in the source text. Obviously it is necessary to find the small phrases (those made up of single items) first, then to find patterns involving small phrases which combine into a larger phrase, but different analysis mechanisms use different methods of searching for and detecting the phrases.
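Finding the small phrases first and then combining them can be sketched as a bottom-up analyser for the `<A-expression>` rules of figure 3.3. This is a simplified operator-priority scheme in the spirit of the operator-precedence technique, not the book’s algorithm; the `PRIORITY` table and token-list input are assumptions, and error handling for malformed input is reduced to a single final check. Each operand is a complete small phrase; whenever the operator already on the stack binds at least as tightly as the incoming one, the pattern ‘operand operator operand’ on top of the stacks is reduced to one larger phrase.

```python
PRIORITY = {"+": 1, "-": 1, "*": 2, "/": 2}

def analyse(tokens):
    """Build a tree for an <A-expression> given as a list of tokens."""
    operands, operators = [], []

    def reduce():
        # Replace 'left op right' on top of the stacks by a single phrase.
        right = operands.pop()
        left = operands.pop()
        operands.append((operators.pop(), left, right))

    for tok in tokens:
        if tok in PRIORITY:
            # Equal priority also reduces, giving left-to-right grouping
            # as the left-recursive rules of figure 3.3 require.
            while operators and PRIORITY[operators[-1]] >= PRIORITY[tok]:
                reduce()
            operators.append(tok)
        else:
            operands.append(tok)    # <A2>: an identifier or a number
    while operators:
        reduce()
    if len(operands) != 1:
        raise ValueError("not a single <A-expression>")
    return operands[0]
```

For `a - b * c + d` the phrase `b * c` is found first, then `a - (b * c)`, and finally the whole expression: `("+", ("-", "a", ("*", "b", "c")), "d")`.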

3.2. BOTTOM-UP ANALYSIS OF EXPRESSIONS 41
