We investigate a set of equivalence properties (or laws) p that state when two non-identical expressions E1 and E2 are equivalent, denoted by E1 ∼p E2. Given p, we can either reformulate E1 into E2 or E2 into E1. For example, given commutativity as a property of conjunction, the expressions A ∧ B and B ∧ A are equivalent, which we denote (A ∧ B) ∼comm (B ∧ A). Therefore, we can replace the former with the latter and vice versa.
Thus, we consider two different things. First, an equivalence property or law states when two expressions are equivalent, e.g. commutativity. Second, reformulations rewrite one expression into another representation by exploiting the equivalence property. Note that such reformulations need not be deterministic. As an example, the expression A ∧ B ∧ C can be rewritten into 5 different representations by exploiting commutativity, such as C ∧ B ∧ A or A ∧ C ∧ B. Furthermore, each such reformulation has an inverse. Note also that a reformulation is beneficial if the chosen representation is not worse than the representation that is replaced.
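The count of five alternative representations can be checked by enumerating operand orderings. The following is a minimal sketch; the helper name `commutative_variants` and the string encoding of expressions are illustrative assumptions, not part of the framework described here.

```python
from itertools import permutations

def commutative_variants(operands, op="∧"):
    """Return all distinct orderings of `operands` other than the given one.

    Hypothetical helper: expressions are modelled as flat operand lists.
    """
    original = tuple(operands)
    # The set removes duplicates that would arise if operands repeat.
    others = set(permutations(original)) - {original}
    return sorted(f" {op} ".join(p) for p in others)

variants = commutative_variants(["A", "B", "C"])
print(len(variants))  # 3! - 1 = 5 alternative representations
```

Since the number of orderings grows factorially with the number of operands, a practical reformulation would pick one representative ordering rather than enumerate all of them.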
To apply reformulations in a structured, efficient way, a detection step is required, in which two (or more) equivalent but not identical subtrees are spotted in an instance. Naturally, the effort required to detect equivalences can be arbitrarily large, especially for very powerful reformulations. For instance, detecting a maximal clique of disequalities to match the global constraint alldifferent is NP-complete. Therefore, we are interested in measures with low detection effort but high node-reduction potential.
In summary, we can increase the number of identical subexpressions in an instance by exploiting an equivalence property p between two distinct expressions E1 and E2 and rewriting one representation into the other by applying a corresponding reformulation r. Ideally, this approach has the following properties:
• The detection of two equivalent expressions is cheap and integrable into tailoring.
• The reformulation from E1 to E2 (or E2 to E1) is cheap.
• It is easy to determine which of the two equivalent representations E1 or E2 is preferable, so that the reformulation does not impair the instance.
In the following, we present an algorithm that applies a generic reformulation r (with respect to a generic equivalence property p) to rewrite subexpressions into identical representations. Subsequently, we discuss a set of equivalence properties and their applicability for increasing the number of identical subexpressions.
An Algorithm for Reformulation
We consider an algorithm to increase the number of identical subexpressions by reformulating expressions using a reformulation r based on an equivalence property p. The algorithm is embedded into flattening at the point where common subexpressions are detected. Whenever a to-be-flattened expression E has no common subexpression, the algorithm performs the following steps:
1. Reformulate E to E_r, using reformulation r
2. Generate the String representation String_{E_r} from expression E_r
3. Check the hash-table for an entry of String_{E_r}
4. If successful, return the corresponding auxiliary variable, aux_r, after adding another entry into the hash-table: E → aux_r
5. Otherwise continue flattening
Note that step 4 is necessary in case E has common subexpressions in the constraint instance: if another occurrence of E is flattened, the hash-table check will be positive during standard CSE and the reformulation need not be repeated.
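The five steps can be sketched for a single expression as follows. This is only an illustration under stated assumptions: expressions are encoded as nested tuples `(operator, operand, ...)`, `repr` stands in for the String representation, a plain dictionary stands in for the hash-table, and `canonical` (sorting the operands of a commutative operator) is a hypothetical choice of reformulation r.

```python
def canonical(expr):
    """Hypothetical reformulation r: sort the operands of an expression.

    An expression is a leaf (str) or a tuple (operator, operand, ...).
    """
    if isinstance(expr, str):
        return expr
    op, *args = expr
    return (op, *sorted(args, key=repr))

def lookup_via_reformulation(expr, table, r=canonical):
    """Steps 1-5 for an expression that has no entry in `table` yet."""
    reformulated = r(expr)           # step 1: E -> E_r
    key = repr(reformulated)         # step 2: String representation of E_r
    if key in table:                 # step 3: hash-table check
        aux = table[key]
        table[repr(expr)] = aux      # step 4: remember E -> aux_r
        return aux
    return None                      # step 5: caller continues flattening

# A ∧ B was flattened earlier and is represented by aux0.
table = {repr(("and", "A", "B")): "aux0"}
aux = lookup_via_reformulation(("and", "B", "A"), table)
print(aux)  # aux0
```

After the call, the table also maps the original form `("and", "B", "A")` to `aux0`, so a later occurrence is caught by the ordinary CSE lookup without reformulating again.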
More formally, the algorithm is summarised in Alg. 4.3, which illustrates (in red font) the extensions to the recursive flattening procedure that performs CSE, FLATTEN_CSE: whenever a non-leaf child e_i of E has no CS, e_i is reformulated using reformulation r (line 9), yielding the reformulated expression e_r. Next, e_r is converted to the String StringR (line 10) and the hash-table is checked for an entry of StringR (line 11). If the check is successful, the matching auxiliary variable aux is retrieved (line 12) and a hash-entry is added to the hash-table (line 13), linking the original expression to aux, i.e. String_{e_i} → aux. This ensures that if e_i appears again in the instance, the hash-table will have an entry and the whole reformulation process will not be repeated. Otherwise, if the hash-table has no entry of StringR, flattening proceeds as usual.
Algorithm 4.3 Reformulation for CS-Increase. The recursive procedure FLATTEN_REF(E, flatten2Aux) is based on the CSE-flattening procedure from Alg. 4.2 (FLATTEN_CSE) and performs a general reformulation r in order to increase the number of identical common subexpressions. Extensions are given in red font.
 1: if ¬(all of E’s children are leaves) then
 2:   for all e_i ∈ children(E) do
 3:     if ¬(e_i.isLeaf) then
 4:       String_{e_i} ← toString(e_i)
 5:       if hash-table.contains(String_{e_i}) then
 6:         aux ← hash-table.get(String_{e_i})
 7:       else
 8:         if r is applicable to e_i then
 9:           e_r ← r(e_i)
10:           StringR ← toString(e_r)
11:           if hash-table.contains(StringR) then
12:             aux ← hash-table.get(StringR)
13:             hash-table.add(String_{e_i}, aux)
14:           else
15:             aux ← FLATTEN_REF(e_i, true)
16:             hash-table.add(String_{e_i}, aux)
17:         else
18:           aux ← FLATTEN_REF(e_i, true)
19:           hash-table.add(String_{e_i}, aux)
20:       E.replaceChildWith(e_i, aux)
21: if flatten2Aux then
22:   Aux ← createNewVariable(E.lb, E.ub); auxVars.add(Aux)
23:   constraintBuffer.add(‘Aux = E’)
24:   return Aux
25: else
26:   return E
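The recursion of Alg. 4.3 can be sketched in Python as follows. This is a simplified illustration, not the original implementation: expressions are nested tuples `(operator, operand, ...)` with strings as leaves, `repr` plays the role of toString, variable bounds (E.lb, E.ub) are omitted, and the names `Flattener` and `sort_operands` are hypothetical. The reformulation signals "not applicable" by returning `None`.

```python
from dataclasses import dataclass, field
from typing import Callable

def sort_operands(expr):
    """Hypothetical reformulation r: canonical operand order for and/or."""
    op, *args = expr
    if op not in ("and", "or"):
        return None                      # r is not applicable
    return (op, *sorted(args, key=repr))

@dataclass
class Flattener:
    r: Callable                                       # reformulation r
    table: dict = field(default_factory=dict)         # String -> aux variable
    constraints: list = field(default_factory=list)   # buffer of (aux, E) pairs
    counter: int = 0

    def flatten_ref(self, expr, flatten2aux):
        if isinstance(expr, str):                     # leaves stay untouched
            return expr
        op, *children = expr
        new_children = []
        for e in children:
            if isinstance(e, str):
                new_children.append(e)
                continue
            key = repr(e)                             # line 4
            aux = self.table.get(key)                 # lines 5-6
            if aux is None:
                e_r = self.r(e)                       # lines 8-9
                if e_r is not None:
                    aux = self.table.get(repr(e_r))   # lines 10-12
                if aux is None:
                    aux = self.flatten_ref(e, True)   # lines 15 and 18
                self.table[key] = aux                 # lines 13, 16 and 19
            new_children.append(aux)                  # line 20
        flat = (op, *new_children)
        if not flatten2aux:
            return flat                               # line 26
        aux = f"aux{self.counter}"                    # line 22 (bounds omitted)
        self.counter += 1
        self.constraints.append((aux, flat))          # line 23
        return aux                                    # line 24

f = Flattener(sort_operands)
result = f.flatten_ref(("or", ("and", "A", "B"), ("and", "B", "A")), False)
print(result)         # ('or', 'aux0', 'aux0')
print(f.constraints)  # [('aux0', ('and', 'A', 'B'))]
```

In this run the second child `B ∧ A` is not a syntactic common subexpression of `A ∧ B`, yet it reuses `aux0` because its reformulated String matches the existing hash-table entry; only one auxiliary constraint is generated.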
Generic Time Complexity
Alg. 4.3 is very general and its complexity depends on two factors. First, the applicability of reformulation r matters: r is usually applicable only to certain kinds of expressions. For instance, De Morgan’s Law can only be applied to particular Boolean expressions that are composed of disjunction and conjunction. Hence the complexity also depends on how frequently the expression type to which r is applicable occurs. We denote by n_r the number of subexpressions to which r is applicable in an instance with n subexpressions, where n ≥ n_r ≥ 0. Furthermore, we denote by n_{r,u} the number of unique nodes to which r is applicable, i.e. if n_r − n_{r,u} > 0 then there exist common subexpressions amongst the nodes to which r is applicable.
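To make the two counts concrete, the sketch below computes them for a small expression tree. It assumes a tuple encoding `(operator, operand, ...)` with string leaves, counts only non-leaf nodes, and uses a hypothetical applicability test (the node is a conjunction); none of these choices come from the original text.

```python
def subexpressions(expr):
    """Yield every non-leaf node of a tuple-encoded expression tree."""
    if isinstance(expr, tuple):
        yield expr
        for child in expr[1:]:
            yield from subexpressions(child)

def applicability_counts(expr, applicable):
    """Return (n_r, n_ru): applicable nodes, and unique applicable nodes."""
    nodes = [e for e in subexpressions(expr) if applicable(e)]
    return len(nodes), len(set(nodes))

expr = ("or", ("and", "A", "B"), ("and", "A", "B"), ("and", "B", "C"))
n_r, n_ru = applicability_counts(expr, lambda e: e[0] == "and")
print(n_r, n_ru)  # 3 2
```

Here the difference between the two counts is 1, reflecting the one repeated occurrence of `A ∧ B` among the applicable nodes.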
Second, it depends on the cost of reformulating expression E to E_r, i.e. the cost of applying r on E, which we denote cost_r(k), where k is the number of nodes in the expression tree. Since Alg. 4.3 is based on CSE-flattening (Alg. 4.2), we analyse the corresponding extensions to derive the time complexity:
Applicability Check. First, the to-be-flattened expression e_i is tested for applicability, the cost of which we denote applic_r(k), where k represents the number of nodes in the tested expression. This test is performed on all n′ subexpressions that have no previously flattened common subexpression. In the worst case (if the instance contains no common subexpressions) n′ = n, hence performing the check lies in O(n) · applic_r(k̂), where k̂ is the maximum number of subexpressions of any expression in the instance (in the worst case, if the constraint instance has only one constraint, k̂ = n).
Reformulation. Second, the reformulation r is applied to those nodes that pass the check, which are all unique nodes to which r is applicable, i.e. n_{r,u} nodes. Note that the other n_r − n_{r,u} nodes to which r is applicable are common subexpressions and hence have a match in the hash-table, to which we add e_i and aux (line 13). We denote the cost of the reformulation cost_r(k), where k is again the number of subexpressions in the reformulated expression. Therefore, the reformulation step lies in O(n) · cost_r(k̂), since in the worst case n_{r,u} = n_r = n, and where k̂ is the maximum number of subexpressions of any expression in the instance (note that if the constraint instance has only one constraint, k̂ = n).
toString Operation. Third, the reformulated expression e_r is converted to the String format StringR, an operation that lies in O(k), where k is the number of subexpressions that the to-be-flattened expression E contains. This is performed for all n_{r,u} subexpressions, where in the worst case n_{r,u} = n, yielding a runtime of O(k̂n), where k̂ is the maximal number of subexpressions an expression contains.
Hash-Table Operations. Finally, hash-table operations are performed in order to retrieve a common subexpression. The first hash-table check (line 11) is performed on all n_{r,u} nodes, and the following two hash-table operations are performed on all those subexpressions that have a common subexpression. All hash-table operations are constant on average but require reading the String representation, so we summarise the complexity as O(ŝn), since in the worst case n_{r,u} = n, where ŝ denotes the maximal String length of a subexpression in the instance.
In summary, the additional runtime complexity of Alg. 4.3 compared to CSE-flattening (Alg. 4.2) is:
\[
O(n) \cdot \mathit{applic}_r(\hat{k}) + O(n) \cdot \mathit{cost}_r(\hat{k}) + O(\hat{k}n) + O(\hat{s}n) \tag{4.1}
\]
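As an illustration of how the four terms combine, consider a purely hypothetical reformulation whose applicability check takes constant time and whose cost is dominated by sorting the operands of an expression, i.e. applic_r(k) ∈ O(1) and cost_r(k) ∈ O(k log k). These two bounds are assumptions made for the example only. Under them, and for k̂ ≥ 2, the additional runtime simplifies to:

```latex
\[
O(n) \cdot O(1) + O(n) \cdot O(\hat{k}\log\hat{k}) + O(\hat{k}n) + O(\hat{s}n)
  \;=\; O\bigl(n\,(\hat{k}\log\hat{k} + \hat{s})\bigr)
\]
```

The reformulation cost and the String length are the dominating terms in this case; the applicability check and the toString conversion are absorbed.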
In the following, we investigate different equivalence properties on both integral and Boolean expressions: associativity, commutativity, negation, distributivity, Horn Clauses and De Morgan’s Law. In each case, we will apply the reformulation in Alg. 4.3 (if necessary) and analyse the corresponding runtime from Eq. 4.1.