The above hinges on the difference between operators, which are terminal symbols and between which precedence relations are defined, and operands, which are non-terminals. This distinction is captured in the following definition of an operator grammar:
• A CF grammar is an operator grammar if (and only if) each right-hand side contains at least one terminal or non-terminal and no right-hand side contains two consecutive non-terminals.
So each pair of non-terminals is separated by at least one terminal; all the terminals except those carrying values (n in our case) are called operators.
For such grammars, setting up the precedence table is relatively easy. First we calculate for each non-terminal A the set FIRSTOP(A), which is the set of all operators that can occur as the first operator in any sentential form deriving from A, and LASTOP(A), which is defined similarly. Note that this first operator in a sentential form can be preceded by at most one non-terminal in an operator grammar. The FIRSTOP’s of all non-terminals are constructed simultaneously as follows:
1. For each non-terminal A, find all right-hand sides of all rules for A; now for each right-hand side R we insert the first terminal in R (if any) into FIRSTOP(A). This gives us the initial values of all FIRSTOP’s.
2. For each non-terminal A, find all right-hand sides of all rules for A; now for each right-hand side R that starts with a non-terminal, say B, we add the elements of FIRSTOP(B) to FIRSTOP(A). This is reasonable, since a sentential form of A may start with B, so all terminals in FIRSTOP(B) should also be in FIRSTOP(A).
3. Repeat step 2 above until no FIRSTOP changes any more.
We have now found the FIRSTOP of all non-terminals. A similar algorithm, using the last terminal in R in step 1 and a non-terminal B with which R ends in step 2, provides the LASTOP’s. The sets for the grammar of Figure 9.2 are shown in Figure 9.8.
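The three steps can be sketched as a small fixed-point computation. This is an illustration under stated assumptions: the dictionary encoding of the grammar, the rules in `GRAMMAR` (assumed to be those of Figure 9.2) and the function name `op_sets` are all hypothetical. Value-carrying terminals such as n are passed in as `operands` and are never inserted, since the text does not count them as operators.

```python
# Assumed rules for the grammar of Figure 9.2 (not given in this section).
GRAMMAR = {
    "S": [["#", "E", "#"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "×", "F"], ["F"]],
    "F": [["n"], ["(", "E", ")"]],
}


def op_sets(grammar, operands=("n",)):
    """Return (FIRSTOP, LASTOP) as dictionaries of sets."""

    def compute(oriented):
        sets = {a: set() for a in grammar}
        # Step 1: insert the first terminal of each right-hand side,
        # provided it is an operator and not a value-carrying terminal.
        for a, right_hand_sides in grammar.items():
            for rhs in map(oriented, right_hand_sides):
                terminal = next((s for s in rhs if s not in grammar), None)
                if terminal is not None and terminal not in operands:
                    sets[a].add(terminal)
        # Steps 2 and 3: propagate from a leading non-terminal B to A
        # until nothing changes any more.
        changed = True
        while changed:
            changed = False
            for a, right_hand_sides in grammar.items():
                for rhs in map(oriented, right_hand_sides):
                    if rhs and rhs[0] in grammar:
                        b = rhs[0]
                        if not sets[b] <= sets[a]:
                            sets[a] |= sets[b]
                            changed = True
        return sets

    first_op = compute(lambda rhs: rhs)        # read right-hand sides left to right
    last_op = compute(lambda rhs: rhs[::-1])   # read them right to left
    return first_op, last_op
```

Reversing each right-hand side lets one routine serve for both sets. For the assumed grammar the result agrees with the sets listed in Figure 9.8.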
Now we can fill the precedence table using the following rules, in which q, q1 and
q2 are operators and A is a non-terminal.
FIRSTOP(S) = {#}          LASTOP(S) = {#}
FIRSTOP(E) = {+, ×, (}    LASTOP(E) = {+, ×, )}
FIRSTOP(T) = {×, (}       LASTOP(T) = {×, )}
FIRSTOP(F) = {(}          LASTOP(F) = {)}

Figure 9.8 FIRSTOP and LASTOP sets for the grammar of Figure 9.2

• For each occurrence in a right-hand side of the form q1q2 or q1Aq2, set q1 ≐ q2. This keeps operators from the same handle together.
• For each occurrence q1A, set q1 <· q2 for each q2 in FIRSTOP(A). This demarcates the left end of a handle.
• For each occurrence Aq1, set q2 ·> q1 for each q2 in LASTOP(A). This demarcates the right end of a handle.
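Filling the table from these rules is a single pass over all right-hand sides. The sketch below is hedged in several ways: the rules in `GRAMMAR` are an assumption about Figure 9.2, the FIRSTOP and LASTOP sets are copied from Figure 9.8, and the ≐ rule is taken to be the usual one (q1 ≐ q2 whenever q1q2 or q1Aq2 occurs in a right-hand side). The function name `precedence_table` and the one-character relation codes `"<"`, `"="`, `">"` are hypothetical.

```python
GRAMMAR = {  # assumed rules for the grammar of Figure 9.2
    "S": [["#", "E", "#"]],
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "×", "F"], ["F"]],
    "F": [["n"], ["(", "E", ")"]],
}
FIRSTOP = {"S": {"#"}, "E": {"+", "×", "("}, "T": {"×", "("}, "F": {"("}}
LASTOP = {"S": {"#"}, "E": {"+", "×", ")"}, "T": {"×", ")"}, "F": {")"}}


def precedence_table(grammar, firstop, lastop, operands=("n",)):
    """Map operator pairs (q1, q2) to "<", "=" or ">".
    Raise ValueError if two different relations are found for one pair."""
    table = {}

    def is_op(symbol):
        return symbol is not None and symbol not in grammar and symbol not in operands

    def relate(q1, relation, q2):
        previous = table.setdefault((q1, q2), relation)
        if previous != relation:
            raise ValueError(f"conflict for {q1} {q2}: {previous} and {relation}")

    for right_hand_sides in grammar.values():
        for rhs in right_hand_sides:
            padded = list(rhs) + [None, None]
            for i, symbol in enumerate(rhs):
                nxt, after = padded[i + 1], padded[i + 2]
                if is_op(symbol):
                    if is_op(nxt):
                        relate(symbol, "=", nxt)        # q1 q2
                    if nxt in grammar:
                        if is_op(after):
                            relate(symbol, "=", after)  # q1 A q2
                        for q2 in firstop[nxt]:
                            relate(symbol, "<", q2)     # q1 A
                elif symbol in grammar and is_op(nxt):
                    for q2 in lastop[symbol]:
                        relate(q2, ">", nxt)            # A q1
    return table
```

With these inputs the call `precedence_table(GRAMMAR, FIRSTOP, LASTOP)` finishes without raising, so the assumed grammar is operator-precedence in the sense of the text. A pair with no entry in the table stands for an error.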
If we obtain a table without conflicts this way, that is, if we never find two different relations between two operators, then we call the grammar operator-precedence. It will now be clear why ( ≐ ) and not ) ≐ (, and why + ·> + (because E+ occurs in E -> E + T and + is in LASTOP(E)).
In this way, the table can be derived from the grammar by a program and be passed on to the operator-precedence parser. A very efficient linear-time parser results. There is, however, one small problem we have glossed over: although the method properly identifies the handle, it often does not identify the non-terminal to which to reduce it. Also, it does not show any unit rule reductions; nowhere in the examples did we see reductions of the form E -> F or T -> F. In short, operator-precedence parsing generates only skeleton parse trees.
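To make the notion of a skeleton parse concrete, here is a minimal sketch of such a parser. It is an illustration, not the book's algorithm verbatim: the table `ROWS` is what the three rules yield for the expression grammar assumed for Figure 9.2, the name `skeleton_parse` is hypothetical, and operands are simply dropped before parsing, so only operators ever reach the stack.

```python
# ROWS[q1][q2] is the relation between q1 (on the stack) and q2 (next input).
# Assumed table for the grammar S -> # E #, E -> E + T | T,
# T -> T × F | F, F -> n | ( E ).
ROWS = {
    "#": {"+": "<", "×": "<", "(": "<", "#": "="},
    "+": {"+": ">", "×": "<", "(": "<", ")": ">", "#": ">"},
    "×": {"+": ">", "×": ">", "(": "<", ")": ">", "#": ">"},
    "(": {"+": "<", "×": "<", "(": "<", ")": "="},
    ")": {"+": ">", "×": ">", ")": ">", "#": ">"},
}


def skeleton_parse(tokens, rows=ROWS, operands=("n",)):
    """Return the handles in reduction order, each as a tuple of operators."""
    stack = ["#"]
    rest = [t for t in tokens if t not in operands] + ["#"]
    handles = []
    while True:
        top, look = stack[-1], rest[0]
        if top == "#" and look == "#":
            return handles                       # accept
        relation = rows.get(top, {}).get(look)
        if relation is None:
            raise SyntaxError(f"no relation between {top} and {look}")
        if relation in ("<", "="):
            stack.append(rest.pop(0))            # shift
        else:                                    # ">": right end of a handle
            handle = [stack.pop()]
            # Extend the handle leftwards until its left end (<·) is found.
            while rows[stack[-1]].get(handle[0]) != "<":
                handle.insert(0, stack.pop())
            handles.append(tuple(handle))
```

For the input n + n × n the sketch reports the handle × and then the handle +, and for ( n ) it reports the single handle ( ). No unit-rule reduction and no non-terminal name appears anywhere in the result.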
Operator-precedence parsers are very easy to construct (often even by hand) and very efficient to use; operator-precedence is the method of choice for all parsing problems that are simple enough to allow it. That only a skeleton parse tree is obtained is often not an obstacle, since operator grammars often have the property that the semantics is attached to the operators rather than to the right-hand sides; the operators are identified correctly.
It is surprising how many grammars are (almost) operator-precedence. Almost all formula-like computer input is operator-precedence. Also, large parts of the grammars of many computer languages are operator-precedence. An example is a construction like CONST total = head + tail; from a Pascal-like language, which is easily rendered as:

Stack                                      rest of input
# <· CONST <·       = <·      +      ·>    ; #
              total      head   tail
Ignoring the non-terminals has other bad consequences besides producing a skeleton parse tree. Since non-terminals are ignored, a missing non-terminal is not noticed. As a result, the parser will accept incorrect input without warning and will produce an incomplete parse tree for it. A parser using the table of Figure 9.5 will blithely accept the empty string, since it immediately leads to the stack configuration # ≐ #. It produces a parse tree consisting of one empty node.
The theoretical analysis of this phenomenon turns out to be inordinately difficult; see Levy [Precedence 1975], Williams [Precedence 1977, 1979, 1981] and many others
in Section 13.8. In practice it is less of a problem than one would expect; it is easy to check for the presence of required non-terminals, either while the parse tree is being constructed or afterwards.
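One way to perform such a check during parsing is to keep a placeholder on the stack for every operand and to compare each handle, placeholders included, with the shapes of the right-hand sides. The sketch below is only one possible arrangement. The table `ROWS` and the set `SHAPES` are assumptions derived from the expression grammar assumed for Figure 9.2, and the names `checked_parse` and `top_operator` and the placeholder `"N"` are hypothetical.

```python
ROWS = {  # assumed precedence table, as derived from the three rules
    "#": {"+": "<", "×": "<", "(": "<", "#": "="},
    "+": {"+": ">", "×": "<", "(": "<", ")": ">", "#": ">"},
    "×": {"+": ">", "×": ">", "(": "<", ")": ">", "#": ">"},
    "(": {"+": "<", "×": "<", "(": "<", ")": "="},
    ")": {"+": ">", "×": ">", ")": ">", "#": ">"},
}
# Right-hand sides with every operand or non-terminal replaced by "N".
SHAPES = {("N", "+", "N"), ("N", "×", "N"), ("(", "N", ")")}


def top_operator(stack):
    """Topmost stack entry that is not an operand placeholder."""
    return next(s for s in reversed(stack) if s != "N")


def checked_parse(tokens):
    """Return True for well-formed input, raise SyntaxError otherwise."""
    stack = ["#"]
    rest = list(tokens) + ["#"]
    while True:
        look = rest[0]
        if look == "n":                          # operand: push a placeholder
            if stack[-1] == "N":
                raise SyntaxError("two adjacent operands")
            stack.append("N")
            rest.pop(0)
            continue
        top = top_operator(stack)
        if top == "#" and look == "#":
            if stack != ["#", "N"]:
                raise SyntaxError("missing operand")
            return True
        relation = ROWS.get(top, {}).get(look)
        if relation is None:
            raise SyntaxError(f"no relation between {top} and {look}")
        if relation in ("<", "="):
            stack.append(rest.pop(0))            # shift an operator
            continue
        handle = []                              # reduce, keeping placeholders
        if stack[-1] == "N":
            handle.insert(0, stack.pop())
        while True:
            operator = stack.pop()
            handle.insert(0, operator)
            if stack[-1] == "N":
                handle.insert(0, stack.pop())
            if ROWS[top_operator(stack)].get(operator) == "<":
                break
        if tuple(handle) not in SHAPES:
            raise SyntaxError("malformed handle " + " ".join(handle))
        stack.append("N")                        # the reduced operand
```

With this check the empty input, a lone +, and ( ) are all rejected, while n + n × n is still accepted. The price is that the parser now needs the shapes of the right-hand sides in addition to the precedence table.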