After generating equational constraints, we also need to solve them. One of the lessons from the Zombie implementation is that rather simple syntactic unification (Section 6.2.1) suffices to typecheck most practical example programs.
However, one worry when doing so is that this would upset the programmer’s mental model of the system. The type system based on bidirectional typing and congruence closure described in Chapter 5 has the pleasant property that all types are checked only up to congruence closure; the exact syntactic form does not matter. In order to get a theoretically well-behaved type inference system, we would want the constraint solver to satisfy the same property, i.e. the result of the query to the solver should only depend on the equivalence class of the inputs.
Therefore we are led to study unification with respect to a context of equality assumptions. That is, given two terms A and B which contain unification variables, find a substitution σ such that
σΓ ⊢ σA = σB
This generalization of the unification problem is called rigid E-unification. It was first studied by Gallier et al. [53, 54], who proved that it is decidable and NP-complete. Because the problem is NP-hard it is difficult to find an efficient and complete algorithm. In Zombie we have experimented with two heuristic algorithms. The simplest is to treat unification and congruence closure completely separately (Section 6.2.1); the other is a backtracking search that interleaves unification and congruence reasoning (Section 6.2.2). Neither algorithm is complete, but they perform well on our example programs.
6.2.1 Simple syntactic unification
The simplest approach to unification modulo congruence is to completely separate unification and congruence. Whenever we have to solve a problem Γ ⊢ A = B, we first syntactically unify A and B “as much as possible”, and then check if the resulting terms σA and σB are in the congruence closure of σΓ.
“As much as possible” means that we proceed as in the usual recursive unification algorithm, except that we never return “not unifiable”. In cases where we are asked to unify two different term constructors, or when the occurs check fails, we just move on and hope that the two terms will later turn out to be provably equal to each other under the assumptions Γ. In the following pseudocode, X ranges over unification variables, and F ai is a label application in the sense of Section 5.5.1 (i.e. F is a syntactic constructor in the core language).
unify(X, a)       = assignVar(X, a)       when X ∉ fv(a)
unify(X, a)       = return ()             otherwise  // failed occurs check
unify(F ai, F bi) = zipWithM unify ai bi
unify(F ai, G bi) = return ()
We have implemented this algorithm as an option in Zombie (--cheap-unification), and we find that it suffices for almost all the uses of unification in our set of test cases. There are only four locations in the source code where the cheap unification fails and the more expensive default algorithm (Section 6.2.2) succeeds. These four cases fail in essentially the same way, so it suffices to look at one of them, e.g. the following definition of a function lookup which retrieves an element from a length-indexed list (Vector). In order to ensure that the index is in range, it is represented as a Fin n, i.e. a natural number strictly smaller than n. (The declaration of Fin ensures that n > 0, which forbids lookups into empty vectors.)
data Fin (n : Nat) : Type where
  FZ of [m:Nat] [n = Succ m]
  FS of [m:Nat] [n = Succ m] (Fin m)
head = ...
tail : [A : Type] ⇒ [n:Nat] ⇒ Vector A (Succ n) → Vector A n
tail = ...
lookup : [A: Type] ⇒ [n:Nat] → Fin n → Vector A n → A
lookup = λ [A] .
  ind recFin [n] = λ f v .
    case f [f_eq] of
      FZ [m][m_eq] → head v
      FS [m][m_eq] fm → recFin [m] [ord m_eq] fm (tail v)
The function lookup is written in terms of helper functions head and tail. But with the cheap unification method, Zombie will fail to infer the argument n for the call (tail v) on the last line. The constraint that we need to solve is
Vector A n = Vector (?X : Type 0) (Succ (?Y : Nat))
Syntactic unification will instantiate X := A. However, both n and Succ Y are headed by (distinct) constructors, so from only looking at these two terms there is no way to know how to instantiate Y. In order to typecheck this expression, we need to note that the context contains the variables
m : Nat
m_eq : n = Succ m
and use this to instantiate Y := m. In other words, we need to take equality assumptions in the context into account.
6.2.2 Unification on equivalence classes
To deal with such cases, the default unification algorithm in Zombie does unification after congruence closure. That is to say, to solve a goal Γ ⊢ a = b we begin by running the congruence closure algorithm described in Section 5.5, which partitions all subexpressions of Γ, a and b into equivalence classes. If after that, a and b are not in the same equivalence class, we try to unify them. The unification algorithm operates on classes, as follows:
• If the equivalence class of a contains a unification variable X, then assign X := b′, where b′ is some arbitrary member of the equivalence class of b (chosen to not contain X, in order to pass the occurs check). The two equivalence classes can now be merged.
• If neither class contains any unification variables, see if a is congruent to some expression F ai and b is congruent to some expression F bi, where both expressions are headed by the same label F. If so, recursively try to unify ai and bi for all i.
We can see how this works for the problem
Vector A n = Vector (?X : Type 0) (Succ (?Y : Nat))
that we mentioned above. Neither of the two expressions is provably equal to a unification variable, but they are headed by the same constructor Vector, so we go on to recursively unify A with ?X and n with (Succ (?Y : Nat)). The first subproblem can be solved by assigning ?X := A. In the second subproblem, we see that the equivalence class of n contains (Succ m), which is headed by the correct constructor. So we go on to unify m with ?Y as desired.
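The two rules and the worked example can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the Zombie code: the class Classes is a naive quadratic congruence closure standing in for the algorithm of Section 5.5, terms are encoded as tuples with "?"-prefixed strings for unification variables, and all names (Classes, merged, unify) are hypothetical. If a class contains a variable that cannot be bound, the sketch falls through to decomposition, which is a simplification.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def subterms(t):
    yield t
    if not is_var(t):
        for arg in t[1:]:
            yield from subterms(arg)

class Classes:
    """Naive congruence closure: union-find plus a fixpoint loop."""

    def __init__(self, terms=()):
        self.parent = {s: s for t in terms for s in subterms(t)}

    def find(self, t):
        while self.parent[t] != t:
            t = self.parent[t]
        return t

    def members(self, t):
        root = self.find(t)
        return [s for s in self.parent if self.find(s) == root]

    def merged(self, a, b):
        """Return a copy in which a = b holds, closed under congruence."""
        new = Classes()
        new.parent = dict(self.parent)
        new.parent[new.find(a)] = new.find(b)
        apps = [t for t in new.parent if not is_var(t) and len(t) > 1]
        changed = True
        while changed:  # equal arguments imply equal applications
            changed = False
            for s in apps:
                for t in apps:
                    if (new.find(s) != new.find(t) and s[0] == t[0]
                            and len(s) == len(t)
                            and all(new.find(x) == new.find(y)
                                    for x, y in zip(s[1:], t[1:]))):
                        new.parent[new.find(s)] = new.find(t)
                        changed = True
        return new

def unify(cc, a, b):
    """Return classes in which a and b are equal, or None if stuck."""
    if cc.find(a) == cc.find(b):
        return cc
    # Rule 1: a class contains a unification variable; bind it.
    for x, other in ((a, b), (b, a)):
        for v in filter(is_var, cc.members(x)):
            for cand in cc.members(other):
                if v not in subterms(cand):  # occurs check
                    return cc.merged(v, cand)
    # Rule 2: decompose same-label members, backtracking over the choices.
    for s in cc.members(a):
        for t in cc.members(b):
            if is_var(s) or is_var(t) or len(s) == 1:
                continue
            if s[0] == t[0] and len(s) == len(t):
                cur = cc
                for x, y in zip(s[1:], t[1:]):
                    cur = cur and unify(cur, x, y)
                if cur:
                    return cur
    return None

A, n, m = ("A",), ("n",), ("m",)
lhs = ("Vector", A, n)
rhs = ("Vector", "?X", ("Succ", "?Y"))
# Context assumption m_eq : n = Succ m
cc = Classes([lhs, rhs, ("Succ", m)]).merged(n, ("Succ", m))
solved = unify(cc, lhs, rhs)
print(solved.find("?X") == solved.find(A), solved.find("?Y") == solved.find(m))
```

In the resulting classes ?X is equated with A and ?Y with m. If the assumption n = Succ m is left out of the initial classes, the same call returns None, which matches the failure of the cheap method on this constraint.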
This algorithm satisfies the property that whether two terms are unifiable only depends on what equivalence classes they are in, and it can solve all the unification problems that occur in our test cases. However, one can raise two complaints against it.
First, the above description does not specify what happens if there is more than one choice of label applications. For example, if we know a = F ai = F′ a′i and b = F bi = F′ b′i, then we could choose to recursively try to unify either ai with bi or a′i with b′i. In this situation, the Zombie implementation will try all possibilities, in a backtracking search.
In the current set of example programs, this situation with multiple possible decompositions only happens a handful of times, which are all easy to resolve. However, in general there is no guarantee that the search will terminate quickly. In particular, the usual proof that rigid E-unification is NP-hard works by reducing Boolean satisfiability to rigid E-unification, as follows. We encode propositional formulas as syntactic formulas constructed from True, False, ∧, and ¬, and work with respect to the following context Γ, which contains equations specifying the logical connectives:
Γ ≡ True ∧ True = True, True ∧ False = False,
    False ∧ True = False, False ∧ False = False,
    ¬True = False, ¬False = True
Now a Boolean formula such as φ ≡ (X ∧ ¬Y) ∧ ¬(Y ∧ X) is satisfiable iff φ = True has a unifier with respect to Γ. The only way to unify them is to search for an assignment of True/False to the variables.
If we encode the above satisfaction problem as a Zombie program, the typechecker will indeed instantiate X := True and Y := False, but only after doing a backtracking search over different ways to match True against True ∧ True or against ¬False and so on. In general, this is quite an inefficient way to enumerate propositional assignments, and for large formulas it would be very slow.
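To make the reduction concrete, the following Python sketch enumerates candidate unifiers for φ = True by brute force. It is only an illustration: the tuple encoding of formulas and the names GAMMA and normalize are assumptions, the equations of Γ are read left to right as rewrite rules on ground terms, and only True/False assignments are tried, as the text suggests. It does not model the backtracking search performed by Zombie.

```python
from itertools import product

# The six equations of Γ, used as rewrite rules on ground terms.
GAMMA = {
    ("and", "True", "True"): "True",
    ("and", "True", "False"): "False",
    ("and", "False", "True"): "False",
    ("and", "False", "False"): "False",
    ("not", "True"): "False",
    ("not", "False"): "True",
}

def normalize(term, assignment):
    """Substitute the assignment, then rewrite bottom-up using GAMMA."""
    if isinstance(term, str):
        return assignment.get(term, term)
    head, *args = term
    return GAMMA[(head, *[normalize(arg, assignment) for arg in args])]

# φ ≡ (X ∧ ¬Y) ∧ ¬(Y ∧ X)
phi = ("and", ("and", "X", ("not", "Y")), ("not", ("and", "Y", "X")))

# Keep the assignments under which φ rewrites to True.
unifiers = []
for values in product(("True", "False"), repeat=2):
    assignment = dict(zip(("X", "Y"), values))
    if normalize(phi, assignment) == "True":
        unifiers.append(assignment)
print(unifiers)
```

The only assignment that survives is X := True, Y := False. The loop visits 2^k assignments for k variables, which illustrates why no efficient complete procedure should be expected.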
Second, this is not a complete algorithm for rigid E-unification. Tiwari et al. [136] give the example of unifying g f f f g f f X and f f f X, given the equations g X = X and X = a. Picking X := f a is a solution, but one cannot find it just by equating two function arguments from the input problem, and indeed the Zombie typechecker is not able to solve this problem instance. To handle cases like this, all known complete algorithms for rigid E-unification include a rule of last resort, which tries to find a binding for a unification variable X by exhaustively trying each subexpression of the input problem in turn.
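The claim that X := f a solves this instance can be checked mechanically. The Python sketch below reads the example exactly as printed above, substitutes a candidate term for X, and decides the resulting ground equation with a naive congruence closure. The function names (congruent, apply_all, solves) and the tuple encoding of terms are assumptions made for the illustration.

```python
def subterms(t):
    yield t
    for arg in t[1:]:
        yield from subterms(arg)

def congruent(equations, s, t):
    """Decide whether s = t follows from ground equations (naive closure)."""
    terms = {u for eq in equations for side in eq for u in subterms(side)}
    terms |= set(subterms(s)) | set(subterms(t))
    parent = {u: u for u in terms}

    def find(u):
        while parent[u] != u:
            u = parent[u]
        return u

    for lhs, rhs in equations:
        parent[find(lhs)] = find(rhs)
    changed = True
    while changed:  # equal arguments imply equal applications
        changed = False
        for u in terms:
            for v in terms:
                if (find(u) != find(v) and u[0] == v[0]
                        and len(u) == len(v) > 1
                        and all(find(x) == find(y)
                                for x, y in zip(u[1:], v[1:]))):
                    parent[find(u)] = find(v)
                    changed = True
    return find(s) == find(t)

def apply_all(symbols, base):
    """Build e.g. g(f(f(x))) from the string "gff" and the term x."""
    for name in reversed(symbols):
        base = (name, base)
    return base

a = ("a",)

def solves(x):
    """Does X := x unify g f f f g f f X with f f f X under g X = X, X = a?"""
    equations = [(("g", x), x), (x, a)]
    return congruent(equations, apply_all("gfffgff", x), apply_all("fff", x))

print(solves(("f", a)), solves(a))
```

Under these assumptions the candidate f a is accepted, while the more obvious candidate a is rejected.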