
The breadth-first solution to the top-down parsing problem is to maintain a list of all possible predictions. Each of these predictions is then processed as described in Section 6.2 above, that is, if there is a non-terminal in front, the prediction stack is replaced by several new prediction stacks, as many as there are choices for this non-terminal. In each of these new prediction stacks, the non-terminal is replaced by the corresponding choice. This prediction step is repeated for all prediction stacks it applies to (including the new ones), until all prediction stacks have a terminal in front. Then, for each of the prediction stacks we match the terminal in front with the current input symbol, and strike out all prediction stacks that do not match. If there are no prediction stacks left, the sentence does not belong to the language. So, instead of one prediction stack/analysis stack pair, our automaton now maintains a list of prediction stack/analysis stack pairs, one for each possible choice, as depicted in Figure 6.7.

    matched input   | rest of input
    ----------------+----------------
    analysis1       | prediction1
    analysis2       | prediction2
    ...             | ...

Figure 6.7 An instantaneous description of our extended automaton

The method is suitable for on-line parsing, because it processes the input from left to right. Any parsing method that processes its input from left to right and results in a left-most derivation is called an LL parsing method. The first L stands for Left to right, and the second L for Left-most derivation.

Now, we almost know how to write a parser along these lines, but there is one detail that we have not properly dealt with yet: termination. Does the input sentence belong to the language defined by the grammar when, ultimately, we have an empty prediction stack? Only when the input is exhausted! To avoid this extra check, and to avoid problems about what to do when we arrive at the end of the sentence but haven't finished parsing yet, we introduce a special so-called end-marker #, which is appended at the end of the sentence. Also, a new grammar rule S' -> S# is added to the grammar, where S' is a new non-terminal that serves as a new start symbol. The end-marker behaves like an ordinary terminal symbol; when we have an empty prediction, we know that the last step taken was a match with the end-marker, and that this match succeeded. This also means that the input is exhausted, so it must be accepted.

6.3.1 An example

(a) matched input: (none); rest of input: aabc#
      (empty) | S'

(b) matched input: (none); rest of input: aabc#
      S'1 | S#

(c) matched input: (none); rest of input: aabc#
      S'1 S1 | DC#
      S'1 S2 | AB#

(d) matched input: (none); rest of input: aabc#
      S'1 S1 D1 | abC#
      S'1 S1 D2 | aDbC#
      S'1 S2 A1 | aB#
      S'1 S2 A2 | aAB#

(e) matched input: a; rest of input: abc#
      S'1 S1 D1 a | bC#
      S'1 S1 D2 a | DbC#
      S'1 S2 A1 a | B#
      S'1 S2 A2 a | AB#

(f) matched input: a; rest of input: abc#
      S'1 S1 D1 a | bC#
      S'1 S1 D2 a D1 | abbC#
      S'1 S1 D2 a D2 | aDbbC#
      S'1 S2 A1 a B1 | bc#
      S'1 S2 A1 a B2 | bBc#
      S'1 S2 A2 a A1 | aB#
      S'1 S2 A2 a A2 | aAB#

(g) matched input: aa; rest of input: bc#
      S'1 S1 D2 a D1 a | bbC#
      S'1 S1 D2 a D2 a | DbbC#
      S'1 S2 A2 a A1 a | B#
      S'1 S2 A2 a A2 a | AB#

(h) matched input: aa; rest of input: bc#
      S'1 S1 D2 a D1 a | bbC#
      S'1 S1 D2 a D2 a D1 | abbbC#
      S'1 S1 D2 a D2 a D2 | aDbbbC#
      S'1 S2 A2 a A1 a B1 | bc#
      S'1 S2 A2 a A1 a B2 | bBc#
      S'1 S2 A2 a A2 a A1 | aB#
      S'1 S2 A2 a A2 a A2 | aAB#

(i) matched input: aab; rest of input: c#
      S'1 S1 D2 a D1 a b | bC#
      S'1 S2 A2 a A1 a B1 b | c#
      S'1 S2 A2 a A1 a B2 b | Bc#

(j) matched input: aab; rest of input: c#
      S'1 S1 D2 a D1 a b | bC#
      S'1 S2 A2 a A1 a B1 b | c#
      S'1 S2 A2 a A1 a B2 b B1 | bcc#
      S'1 S2 A2 a A1 a B2 b B2 | bBcc#

(k) matched input: aabc; rest of input: #
      S'1 S2 A2 a A1 a B1 b c | #

(l) matched input: aabc#; rest of input: (none)
      S'1 S2 A2 a A1 a B1 b c # | (empty)

Figure 6.8 The breadth-first parsing of the sentence aabc# (each line shows an analysis stack, a bar, and the corresponding prediction stack)

Figure 6.8 presents a complete breadth-first parsing of the sentence aabc#. At first there is only one prediction stack: it contains the start symbol; no symbols have been accepted yet (a). The step leading to (b) is a simple predict step; there is no other right-hand side for S'. Another predict step leads us to (c), but this time there are two possible right-hand sides, so we obtain two prediction stacks; note that the difference between the prediction stacks is also reflected in the analysis stacks, where the different suffixes of S represent the different right-hand sides predicted. Another predict step with several right-hand sides leads to (d). Now, all prediction stacks have a terminal on top; all happen to match, resulting in (e). Next, we again have some predictions with a non-terminal in front, so another predict step leads us to (f). The next step is a match step, and fortunately, some matches fail; these are just dropped as they can never lead to a successful parse. From (g) to (h) is again a predict step. Another match where, again, some matches fail, leads us to (i). A further prediction results in (j) and then two matches result in (k) and (l), leading to a successful parse (the prediction stack is empty). The analysis is


S'1 S2 A2 a A1 a B1 b c #.

For now, we do not need the terminals in the analysis; discarding them gives

S'1 S2 A2 A1 B1.

This means that we get a left-most derivation by first applying rule S'1, then rule S2, then rule A2, etc., all the time replacing the left-most non-terminal. Check:

S' -> S# -> AB# -> aAB# -> aaB# -> aabc#.
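The predict and match steps of this example can be sketched in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the grammar table is inferred from the rule numbers that appear in Figure 6.8 (the two C rules, which the figure never expands, are an assumption), and the names GRAMMAR, predict, match, parse and rules_only are hypothetical.

```python
from collections import deque

# Grammar inferred from the rule numbering in Figure 6.8.
# The alternatives for C are an assumption; the figure never expands C.
GRAMMAR = {
    "S'": [["S", "#"]],
    "S": [["D", "C"], ["A", "B"]],
    "A": [["a"], ["a", "A"]],
    "B": [["b", "c"], ["b", "B", "c"]],
    "D": [["a", "b"], ["a", "D", "b"]],
    "C": [["c"], ["c", "C"]],
}

def predict(pairs, grammar):
    """Expand (analysis, prediction) pairs until no prediction starts with a non-terminal."""
    done, todo = [], deque(pairs)
    while todo:
        analysis, prediction = todo.popleft()
        if prediction and prediction[0] in grammar:
            head, rest = prediction[0], prediction[1:]
            # One new pair per alternative; the analysis records which one was chosen.
            for number, rhs in enumerate(grammar[head], start=1):
                todo.append((analysis + (f"{head}{number}",), tuple(rhs) + rest))
        else:
            done.append((analysis, prediction))
    return done

def match(pairs, symbol):
    """Keep only the pairs whose prediction starts with the current input symbol."""
    return [(analysis + (symbol,), prediction[1:])
            for analysis, prediction in pairs
            if prediction and prediction[0] == symbol]

def parse(sentence, grammar=GRAMMAR, start="S'"):
    """Return every analysis of `sentence`, which must already end in the end-marker #."""
    pairs = [((), (start,))]
    for symbol in sentence:
        pairs = match(predict(pairs, grammar), symbol)
    # An empty prediction means the last match was the end-marker: accept.
    return [analysis for analysis, prediction in pairs if not prediction]

def rules_only(analysis, grammar=GRAMMAR):
    """Discard the terminals from an analysis, keeping only the rule labels."""
    return [item for item in analysis if item.rstrip("0123456789") in grammar]
```

Under these assumptions, parse("aabc#") yields the single analysis S'1 S2 A2 a A1 a B1 b c #, and rules_only reduces it to S'1 S2 A2 A1 B1. Note that this sketch loops forever on a left-recursive grammar, because predict never runs out of non-terminals to expand.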

The breadth-first method described here was first presented by Greibach [CF 1964]. However, in that presentation, grammars are first transformed into Greibach Normal Form, and the steps taken are like the ones our initial pushdown automaton makes. The predict and match steps are combined.

6.3.2 A counterexample: left-recursion

The method discussed above clearly works for this grammar, and the question arises whether it works for all context-free grammars. One would think it does, because all possibilities are systematically tried, for all non-terminals, in any occurring prediction. Unfortunately, this reasoning has a serious flaw that is demonstrated by the following example: let us see if the sentence ab belongs to the language defined by the simple grammar

S -> Sb | a

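Our automaton starts off in the following state:

    matched input: (none); rest of input: ab#
      (empty) | S'

As we have a non-terminal at the beginning of the prediction, we use a predict step, resulting in:

    matched input: (none); rest of input: ab#
      S'1 | S#

Now, another predict step results in:

    matched input: (none); rest of input: ab#
      S'1 S1 | Sb#
      S'1 S2 | a#

As one prediction again starts with a non-terminal, we predict again:

    matched input: (none); rest of input: ab#
      S'1 S1 S1 | Sbb#
      S'1 S1 S2 | ab#
      S'1 S2 | a#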
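The runaway predict steps shown above can be reproduced with a small experiment. The following is a sketch only: the function name predict_rounds and the cap on the number of rounds are assumptions introduced for illustration, since an uncapped loop would not terminate.

```python
# The left-recursive example grammar, with the end-marker rule added.
LEFT_REC = {"S'": [["S", "#"]], "S": [["S", "b"], ["a"]]}

def predict_rounds(grammar, start, rounds):
    """Apply a fixed number of predict rounds and return the stacks after each round."""
    stacks = [(start,)]
    history = [stacks]
    for _ in range(rounds):
        new_stacks = []
        for stack in stacks:
            if stack and stack[0] in grammar:
                # Replace the non-terminal in front by each of its alternatives.
                new_stacks.extend(tuple(rhs) + stack[1:] for rhs in grammar[stack[0]])
            else:
                new_stacks.append(stack)
        stacks = new_stacks
        history.append(stacks)
    return history

history = predict_rounds(LEFT_REC, "S'", 5)
for stacks in history:
    print(["".join(stack) for stack in stacks])
```

After every round there is still a prediction stack with S in front, so the point at which a match can be attempted is never reached.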

By now, it is clear what is happening: we seem to have ended up in an infinite process leading us nowhere. The reason for this is that we keep trying the S -> Sb rule without ever coming to a state where a match can be attempted. This problem can occur whenever there is a non-terminal that derives an infinite sequence of sentential forms, all starting with a non-terminal, so no matches can take place. As all these sentential forms in this infinite sequence start with a non-terminal, and the number of non-terminals is finite, there is at least one non-terminal A occurring more than once at the start of those sentential forms. So, we have: A → ... → Aα. A non-terminal that derives a sentential form starting with itself is called left-recursive. Left recursion comes in two kinds: we speak of immediate left-recursion when there is a grammar rule A → Aα, like in the rule S -> Sb; we speak of indirect left-recursion when the recursion goes through other rules, for instance A → Bα, B → Aβ. Both forms of left-recursion can be concealed by ε-producing non-terminals. For instance in the grammar

S -> ABc
B -> Cd
B -> ABf
C -> Se
A -> ε

the non-terminals S, B, and C are all left-recursive. Grammars with left-recursive non-terminals are called left-recursive as well.
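Whether a non-terminal is left-recursive, including the case where the recursion is concealed by ε-producing non-terminals, can be checked mechanically. The sketch below is one possible approach, not a method taken from the text: it computes the ε-producing (nullable) non-terminals, then the non-terminals that can appear in front of a derived sentential form, and closes that relation transitively. The names nullable_set and left_recursive are hypothetical, and an ε-rule is represented by an empty list.

```python
def nullable_set(grammar):
    """Return the non-terminals that can derive ε."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, alternatives in grammar.items():
            if head in nullable:
                continue
            # A right-hand side is nullable if all its symbols are (true for an empty one).
            if any(all(s in nullable for s in rhs) for rhs in alternatives):
                nullable.add(head)
                changed = True
    return nullable

def left_recursive(grammar):
    """Return the non-terminals that derive a sentential form starting with themselves."""
    nullable = nullable_set(grammar)
    # front[A]: non-terminals that can be in front after one step from A,
    # looking past any leading ε-producing non-terminals.
    front = {head: set() for head in grammar}
    for head, alternatives in grammar.items():
        for rhs in alternatives:
            for symbol in rhs:
                if symbol in grammar:
                    front[head].add(symbol)
                if symbol not in nullable:
                    break
    # Transitive closure: extend to any number of steps.
    changed = True
    while changed:
        changed = False
        for head in grammar:
            reachable = set().union(*(front[s] for s in front[head]))
            if not reachable <= front[head]:
                front[head] |= reachable
                changed = True
    return {head for head in grammar if head in front[head]}

# The example grammar above; A -> ε is written as an empty right-hand side.
CONCEALED = {
    "S": [["A", "B", "c"]],
    "B": [["C", "d"], ["A", "B", "f"]],
    "C": [["S", "e"]],
    "A": [[]],
}
print(sorted(left_recursive(CONCEALED)))
```

For the grammar above this reports S, B and C, in agreement with the text; A is not left-recursive because it only produces ε.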

If a grammar has no ε-rules and no loops, we could still use our parsing scheme if we use one extra step: if a prediction stack has more symbols than the unmatched part of the input sentence, it can never derive the sentence (no ε-rules), so it can be dropped. However, this little trick has one big disadvantage: it requires us to know the length of the input sentence in advance, so the method is no longer suitable for on-line parsing. Fortunately, left-recursion can be eliminated: given a left-recursive grammar, we can transform it into a grammar without left-recursive non-terminals that defines the same language. As left-recursion poses a major problem for any top-down parsing method, we will now discuss this grammar transformation.
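The length check described above can be sketched as a small variation on the breadth-first scheme. This is an illustration under assumptions: the name parse_with_length_check is hypothetical, the grammar must have no ε-rules and no loops, and the whole sentence (end-marker included) must be available up front, which is exactly the disadvantage mentioned in the text.

```python
from collections import deque

# The left-recursive example grammar with the end-marker rule added (no ε-rules, no loops).
LEFT_REC = {"S'": [["S", "#"]], "S": [["S", "b"], ["a"]]}

def parse_with_length_check(sentence, grammar, start="S'"):
    """Breadth-first parse that drops predictions longer than the unmatched input."""
    pairs = [((), (start,))]
    for position, symbol in enumerate(sentence):
        remaining = len(sentence) - position  # unmatched symbols, end-marker included
        done, todo = [], deque(pairs)
        while todo:
            analysis, prediction = todo.popleft()
            if len(prediction) > remaining:
                continue  # without ε-rules a prediction can never shrink: drop it
            if prediction and prediction[0] in grammar:
                head, rest = prediction[0], prediction[1:]
                for number, rhs in enumerate(grammar[head], start=1):
                    todo.append((analysis + (f"{head}{number}",), tuple(rhs) + rest))
            else:
                done.append((analysis, prediction))
        # Match step: keep the predictions that start with the current input symbol.
        pairs = [(a + (symbol,), p[1:]) for a, p in done if p and p[0] == symbol]
    return [analysis for analysis, prediction in pairs if not prediction]

print(parse_with_length_check("ab#", LEFT_REC))
```

With the check in place the predict phase terminates even though S is left-recursive, because only finitely many prediction stacks fit within the remaining input.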