
An ideal tree transducer model should:

• capture a large class of transformations;
• enjoy decidable equivalence and type-checking;
• be closed under composition and regular look-ahead;
• operate over strings, ranked trees, and unranked trees.

The first three requirements ask us to strike the right balance between expressiveness and decidability: what is the largest class of tree transformations that enjoys decidable procedures and good closure properties? The most widely accepted answer to this question is the notion of monadic-second-order-logic-definable tree transductions [Cou94]. This formalism represents tree transductions as graph transformations expressed using monadic second-order (MSO) logic formulas over nodes and edges. The class of MSO-definable transductions enjoys decidable equivalence and type-checking, and is closed under sequential composition and regular look-ahead. Moreover, strings, ranked trees, and unranked trees are naturally expressible in this formalism. MSO-definable transformations include complex ones such as swapping subtrees and reversing the order of children in an unranked tree. However, due to their declarative nature, transformations expressed using MSO are hard to execute efficiently on a given input.

Several transducer models have been proposed to address this limitation, but these models typically sacrificed other properties. Executable models for tree transducers include bottom-up tree transducers, visibly pushdown transducers [RS09], and multi bottom-up tree transducers [ELM08]. Each of these models computes the output in a single left-to-right pass in linear time. However, none of these models can compute all MSO-definable transductions.

Finite-copying Macro Tree Transducers (MTTs) with regular look-ahead [EM99] can compute all MSO-definable ranked-tree-to-ranked-tree transductions. MTTs are top-down transducers enriched with parameters that can store intermediate computations. In this model, regular look-ahead cannot be eliminated without sacrificing expressiveness, and executing the model requires multiple passes [Man02].

Finally, most models of tree transducers do not naturally generalize to unranked trees. Exceptions include visibly pushdown transducers [RS09] and macro forest transducers [PS04], but these suffer from the other limitations discussed above.

5.1.2 Contributions

We propose the model of streaming tree transducers (STT), which has many desirable properties.

Expressiveness: STTs capture exactly the class of MSO-definable tree transductions.

Analyzability: Decision problems such as type-checking and checking whether two STTs are functionally equivalent are decidable.

Flexibility: STTs can operate over strings, ranked trees, and unranked trees.

Single-pass linear-time processing: An STT is a deterministic machine that computes the output using a single left-to-right pass through the linear encoding of the input tree, processing each symbol in constant time.

The transducer model integrates features of visibly pushdown automata, equivalently nested word automata [AM09], and streaming string transducers [AC10, AC11]. In our model, the input tree is encoded as a nested word, which is a string over alphabet symbols, tagged with open/close brackets (or equivalently, call/return types) to indicate the hierarchical structure. For example, the tree a(b, c(d, d)) is encoded by the nested word

⟨a ⟨b b⟩ ⟨c ⟨d d⟩ ⟨d d⟩ c⟩ a⟩.
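The encoding can be illustrated with a minimal Python sketch. The tuple representation of trees, the function names `encode` and `show`, and the use of ASCII `<` and `>` in place of the angle brackets are assumptions made for this illustration only; they are not part of the model's definition.

```python
# A tree is a pair (label, list_of_children); this representation is an
# assumption of the sketch, not something fixed by the text.

def encode(tree):
    """Return the nested word of `tree` as a list of (kind, label) tags."""
    label, children = tree
    word = [("call", label)]            # opening tag for this node
    for child in children:
        word.extend(encode(child))      # children in left-to-right order
    word.append(("return", label))      # matching closing tag
    return word

def show(word):
    """Render a nested word with ASCII brackets: <a for a call, a> for a return."""
    return " ".join(("<" + l) if k == "call" else (l + ">") for k, l in word)

# The tree a(b, c(d, d)) from the example.
example = ("a", [("b", []), ("c", [("d", []), ("d", [])])])
print(show(encode(example)))
```

Each node contributes exactly one call tag and one return tag, so the nested word of a tree with n nodes has length 2n.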

The streaming tree transducer reads the input nested word left-to-right in a single pass. It uses finitely many states, together with a stack, but the type of operation applied to the stack at each step is determined by the hierarchical structure of the tags in the input. The output is computed using a finite set of variables with values ranging over output nested words, possibly with holes that are used as place-holders for inserting subtrees. At each step, the transducer reads the next symbol of the input. If the symbol is an internal symbol, then the transducer updates its state and the output variables. If the symbol is a call symbol, then the transducer pushes a stack symbol along with updated values of variables, updates the state, and reinitializes the variables. While processing a return symbol, the stack is popped, and the new state and new values for the variables are determined using the current state, current variables, popped symbol, and popped values from the stack. In each type of transition, the variables are updated using expressions that allow adding new symbols, string concatenation, and tree insertion (simulated by replacing the hole with another expression). A key restriction is that variables are updated in a manner that ensures that each value can contribute at most once to the eventual output, without duplication. This single-use restriction is enforced via a binary conflict relation over variables: no output term combines conflicting variables, and variable occurrences in right-hand sides during each update are consistent with the conflict relation. The transformation computed by the model can be implemented as a single-pass linear-time algorithm.
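To make the call/return/internal discipline concrete, here is a hedged Python sketch of one particular transformation: reversing the order of children at every node of an unranked tree. It is not the formal definition of an STT. The function name `reverse_children`, the `(kind, label)` tag representation, and the choice of a single variable `x` with a trivial state are assumptions of this illustration. Each update uses every variable at most once, in the spirit of the single-use restriction, and only concatenation is needed (no holes).

```python
def reverse_children(word):
    """Single left-to-right pass over a nested word given as (kind, label) tags,
    with kind in {"call", "return", "internal"}."""
    stack = []   # holds (call label, saved value of x) pairs
    x = []       # the one output variable: reversed output of the current level
    for kind, label in word:
        if kind == "internal":
            # x := label . x  (x is used once on the right-hand side)
            x = [(kind, label)] + x
        elif kind == "call":
            # push the current value of x and reinitialize x to the empty word
            stack.append((label, x))
            x = []
        else:  # "return"
            # combine the current x with the popped value, each used exactly once
            call_label, x_saved = stack.pop()
            x = [("call", call_label)] + x + [("return", label)] + x_saved
    return x

# a(b, c(d, e)) as a nested word
word = [("call", "a"), ("call", "b"), ("return", "b"),
        ("call", "c"), ("call", "d"), ("return", "d"),
        ("call", "e"), ("return", "e"), ("return", "c"), ("return", "a")]
print(reverse_children(word))  # the nested word of a(c(e, d), b)
```

Python list concatenation copies its operands, so this sketch does not achieve constant time per input symbol. An implementation aiming for that bound would need a representation with constant-time concatenation, such as linked lists with head and tail pointers, which is possible precisely because no value is ever duplicated.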

We show that the model can be simplified in natural ways if we want to restrict either the input or the output to either strings or ranked trees. For example, to compute transformations that output strings it suffices to consider variable updates that allow only concatenation, and to compute transformations that output ranked trees it suffices to consider variable updates that allow only tree insertion. The restriction to the case of ranked trees as inputs gives the model of bottom-up ranked-tree transducers. As far as we know, this is the only transducer model that processes trees in a bottom-up manner and can compute all MSO-definable tree transformations.
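As a rough illustration of a tree-insertion update, the following Python sketch represents a ranked tree as a nested tuple `(label, child, ...)` and a hole as the marker string `"?"`. The representation and the name `insert` are hypothetical choices made for this example; the sketch also assumes that a value contains at most one hole.

```python
HOLE = "?"  # hypothetical marker for the single hole in a term

def insert(term, sub):
    """Return `term` with its hole replaced by `sub`.
    Terms are the marker HOLE or tuples of the form (label, child, ...)."""
    if term == HOLE:
        return sub
    label, *children = term
    return (label, *[insert(c, sub) for c in children])

# x holds a(b, ?) and y holds c(d, d); the update x := x[y] yields a(b, c(d, d)).
x = ("a", ("b",), HOLE)
y = ("c", ("d",), ("d",))
print(insert(x, y))
```

After such an update, `y` has been consumed: under the single-use restriction it may not contribute to any other variable in the same assignment.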

The main technical result in the chapter is that the class of transductions definable using streaming tree transducers is exactly the class of MSO-definable tree transductions. The starting point for our result is the known equivalence of MSO-definable tree transductions and Macro Tree Transducers with regular look-ahead and single-use restriction, over ranked trees [EM99]. Our proof proceeds by establishing two key properties of STTs: STTs are closed under regular look-ahead and under sequential composition. These proofs are challenging due to the requirement that a transducer can use only a fixed number of variables and such variables can only be updated by assignments which obey the single-use-restriction rules. We develop the proofs in a modular fashion by introducing intermediate results (for example, we establish that allowing variables to range over trees containing multiple parameters does not increase expressiveness). In this chapter we do not address the problem of representing infinite alphabets.

We show a variety of analysis questions to be decidable for STTs. We establish an EXPTIME upper bound for type-checking, and provide a NEXPTIME upper bound for checking functional inequivalence of two STTs. This is the first elementary upper bound for checking equivalence of a model that captures MSO-definable transformations. When the number of variables is bounded, the upper bound on the complexity becomes NP.
