A critical property of any intermediate language used to compile Haskell is its ability to support type erasure. Haskell takes pride in erasing all of its complicated, helpful types before runtime, and the intermediate language must show that this is possible. Pico achieves this goal through its relevance annotations, where irrelevant abstractions and applications can be erased. In previous, non-dependent intermediate languages for Haskell, irrelevant abstractions and applications were also erased, but these were easier to spot, as they dealt with types instead of terms. In Pico, types and terms are indistinguishable, so we are required to use relevance annotations.
I prove the type erasure property by defining an untyped λ-calculus with an operational semantics, defining an erasure operation that translates from Pico to the untyped calculus, and proving a simulation property between the two languages.
5.11.1 The untyped λ-calculus
The definition of our erased calculus appears in Figure 5.19. It is an untyped λ-calculus with datatypes (allowing for default patterns) and fix. The language also contains two fixed constants, ’Π and Π̃, here only to have something for Π-types to erase to.
The calculus also supports “coercion abstraction” via its λ•.e and e • forms. The existence of these forms means that coercion abstractions are not fully erased. We can see why this must be so in the following example: let τ = λc:Int ∼ Bool. not (3 ▷ c). The type τ is a valid Pico type. We do not have to worry about the nonsense in the body of the abstraction because consistency guarantees that we will never be able to apply τ to a (closed) coercion. As an abstraction, τ is a value and a normal form. However, if our type erasure operation dropped coercion abstractions, then disaster would strike. The erased expression would be not 3, which is surely stuck. We thus retain coercion abstractions and applications, while dropping the coercions themselves by rewriting all coercions with the uninformative •.
What has now happened to our claim of type erasure? Coercions exist only to alter types, so have we kept some meddlesome vestige of types around? In a sense, yes, we have kept some type information around until runtime. However, two critical facts mean that this retention does not cause harm:
• Coercion applications contain no information, and therefore can be represented by precisely 0 bits. Indeed, this is how coercions are currently compiled in GHC, by using an unboxed representation that is 0 bits wide. Thus, no memory is taken up at runtime.
• The coercion abstractions are not, in fact, meddlesome. The way in which coercion abstractions could cause harm at runtime is by causing a program to be a value when the user is not expecting it. For example, if a compiler translated the Haskell program 1 + 2 into the expression λ•.1 + 2, then we would never get 3. I thus make this claim: no Haskell program ever evaluates to a coercion abstraction. This claim is properly a property of the type inference / elaboration algorithm and so is deferred until Section 6.10.2.

Figure 5.19 (the erased calculus and the erasure operation)

Grammar:
e ::= a | H | e y | Π | case e of ealt | λa.e | λ•.e | fix e    expression
y ::= e | •    argument
ealt ::= π → e    case alternative

e −→ e′    Single-step operational semantics of expressions
E_Beta: (λa.e1) e2 −→ e1[e2/a]
E_CBeta: (λ•.e) • −→ e
E_Match: if ealti = H → e, then case H y of ealt −→ e y •
E_Default: if ealti = _ → e and no alternative in ealt matches H, then case H y of ealt −→ e
E_Unroll: fix (λa.e) −→ e[fix (λa.e)/a]
E_App_Cong: if e −→ e′, then e y −→ e′ y
E_Case_Cong: if e −→ e′, then case e of ealt −→ case e′ of ealt
E_Fix_Cong: if e −→ e′, then fix e −→ fix e′

Erasure operation, e = TτU:
TaU = a
TH{τ}U = H
Tτ1 τ2U = Tτ1U Tτ2U
Tτ1 {τ2}U = Tτ1U
Tτ1 γU = Tτ1U •
TΠδ. τU = Π
Tτ ▷ γU = TτU
Tcaseκ τ of altU = case TτU of TaltU
Tλa:Rel κ. τU = λa.TτU
Tλa:Irrel κ. τU = TτU
Tλc:φ. τU = λ•.TτU
Tfix τU = fix TτU
Tabsurd γ τU = Π
Tπ → τU = π → TτU
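To make the role of λ•.e and e • concrete, here is a minimal sketch of the application and fix fragment of the erased calculus, written in Python purely for illustration. It is not taken from GHC or from the formal development. The class names, the `step` and `subst` helpers, and the use of a constant to stand for the literal 3 are all assumptions of this sketch; case expressions and the Π constants are omitted, and substitution is naive (it assumes the substituted term is closed).

```python
from dataclasses import dataclass
from typing import Optional, Union

# Erased expressions, following the grammar of the erased calculus.
# case and the Π constants are left out to keep the sketch short.

@dataclass(frozen=True)
class Var:            # a
    name: str

@dataclass(frozen=True)
class Con:            # H (also used here to stand for the literal 3)
    name: str

@dataclass(frozen=True)
class App:            # e y, where the argument y is an expression
    fn: "Expr"
    arg: "Expr"

@dataclass(frozen=True)
class CApp:           # e •, where the argument is the erased coercion
    fn: "Expr"

@dataclass(frozen=True)
class Lam:            # λa.e
    var: str
    body: "Expr"

@dataclass(frozen=True)
class CLam:           # λ•.e
    body: "Expr"

@dataclass(frozen=True)
class Fix:            # fix e
    body: "Expr"

Expr = Union[Var, Con, App, CApp, Lam, CLam, Fix]

def subst(x: str, s: Expr, e: Expr) -> Expr:
    """Naive e[s/x]; binders are not renamed, so s is assumed closed."""
    if isinstance(e, Var):
        return s if e.name == x else e
    if isinstance(e, Con):
        return e
    if isinstance(e, App):
        return App(subst(x, s, e.fn), subst(x, s, e.arg))
    if isinstance(e, CApp):
        return CApp(subst(x, s, e.fn))
    if isinstance(e, Lam):
        return e if e.var == x else Lam(e.var, subst(x, s, e.body))
    if isinstance(e, CLam):
        return CLam(subst(x, s, e.body))
    return Fix(subst(x, s, e.body))

def step(e: Expr) -> Optional[Expr]:
    """One step of the semantics, or None if e cannot step."""
    if isinstance(e, App):
        if isinstance(e.fn, Lam):                      # E_Beta
            return subst(e.fn.var, e.arg, e.fn.body)
        fn = step(e.fn)                                # E_App_Cong
        return None if fn is None else App(fn, e.arg)
    if isinstance(e, CApp):
        if isinstance(e.fn, CLam):                     # E_CBeta
            return e.fn.body
        fn = step(e.fn)                                # E_App_Cong with argument •
        return None if fn is None else CApp(fn)
    if isinstance(e, Fix):
        if isinstance(e.body, Lam):                    # E_Unroll
            return subst(e.body.var, e, e.body.body)
        body = step(e.body)                            # E_Fix_Cong
        return None if body is None else Fix(body)
    return None

# The erased body of the example τ: `not 3`, with `not` left as a free variable.
not_three = App(Var("not"), Con("3"))
```

In this sketch `step(CLam(not_three))` returns `None` because λ•.e has no reduction rule, so the nonsensical body stays protected under the binder. Only `CApp(CLam(not_three))` exposes it, and consistency rules out a closed coercion to supply at that application. Note that `step` returns `None` both for values and for stuck terms; telling the two apart would need a separate value predicate.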
One may wonder why Pico needs coercion abstractions at all. I can provide two reasons: to preserve the simplified treatment of case that does not bind variables, and to enable floating. An optimizer may decide to common up two branches of a case expression (i.e., float the branches out), both of which bind the same coercion variable. If there were no coercion abstraction form, this would be impossible. It is a correctness property of the optimizer (well beyond the scope of this dissertation) to make sure that the floated coercion abstraction does not halt evaluation prematurely.
5.11.2 Simulation
Here is the simulation property we seek:
Theorem (Type erasure [Theorem C.83]). If Σ; Γ ⊢s τ −→ τ′, then either TτU −→ Tτ′U or TτU = Tτ′U.
Note that the untyped language might step once or not at all. For example, when Pico steps by a push rule, the untyped language does not step. The proof of this theorem is very straightforward.
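The erasure operation that the theorem relies on can be sketched as a structural recursion. The following Python fragment is a hedged illustration covering only variables, applications, casts, and abstractions. The source term classes (`PVar`, `PApp`, `PCoApp`, `PCast`, `PLam`, `PCoLam`) are hypothetical simplifications of Pico: kinds and propositions are dropped, and coercions are opaque strings. The Pico step relation itself is not modeled.

```python
from dataclasses import dataclass
from typing import Union

# --- Erased expressions (only the forms needed here) ---
@dataclass(frozen=True)
class Var:      # a
    name: str

@dataclass(frozen=True)
class App:      # e1 e2
    fn: "Expr"
    arg: "Expr"

@dataclass(frozen=True)
class CApp:     # e •
    fn: "Expr"

@dataclass(frozen=True)
class Lam:      # λa.e
    var: str
    body: "Expr"

@dataclass(frozen=True)
class CLam:     # λ•.e
    body: "Expr"

Expr = Union[Var, App, CApp, Lam, CLam]

# --- Simplified Pico-like source terms (hypothetical names) ---
@dataclass(frozen=True)
class PVar:     # a
    name: str

@dataclass(frozen=True)
class PApp:     # τ1 τ2 (relevant) or τ1 {τ2} (irrelevant)
    fn: "Ty"
    arg: "Ty"
    relevant: bool

@dataclass(frozen=True)
class PCoApp:   # τ γ; the coercion is an opaque string
    fn: "Ty"
    co: str

@dataclass(frozen=True)
class PCast:    # τ ▷ γ
    ty: "Ty"
    co: str

@dataclass(frozen=True)
class PLam:     # λa:Rel κ. τ or λa:Irrel κ. τ (kind omitted)
    var: str
    relevant: bool
    body: "Ty"

@dataclass(frozen=True)
class PCoLam:   # λc:φ. τ (proposition omitted)
    var: str
    body: "Ty"

Ty = Union[PVar, PApp, PCoApp, PCast, PLam, PCoLam]

def erase(t: Ty) -> Expr:
    """Erase a source term, following the equations of the erasure operation."""
    if isinstance(t, PVar):
        return Var(t.name)
    if isinstance(t, PApp):
        fn = erase(t.fn)
        # Irrelevant arguments are dropped entirely.
        return App(fn, erase(t.arg)) if t.relevant else fn
    if isinstance(t, PCoApp):
        return CApp(erase(t.fn))          # the coercion becomes •
    if isinstance(t, PCast):
        return erase(t.ty)                # casts vanish
    if isinstance(t, PLam):
        body = erase(t.body)
        # Irrelevant binders are dropped; relevant ones are kept.
        return Lam(t.var, body) if t.relevant else body
    return CLam(erase(t.body))            # coercion binders become λ•
```

Because `erase(PCast(t, g)) == erase(t)` for any `t` and `g`, a source step that only moves or rearranges casts leaves the erased term unchanged. This corresponds to the second alternative of the theorem, TτU = Tτ′U.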
5.11.3 Types do not prevent evaluation
Proving only that the erased calculus simulates Pico is not quite enough, as it still might be possible that an expression in the erased calculus can step even though the Pico type from which it was derived is a normal form. The property we need is embodied in this theorem:
Theorem (Types do not prevent evaluation [Theorem C.86]). Suppose Σ; Γ ⊢ty τ : κ and Γ has only irrelevant variable bindings. If TτU −→ e′, then Σ; Γ ⊢s τ −→ τ′ and either Tτ′U = e′ or Tτ′U = TτU.
This theorem would be false if Pico did not step under irrelevant binders, for example.
The proof depends on both the progress theorem and the type erasure (simulation) theorem above, as well as this key lemma:
Lemma (Expression redexes [Lemma C.84]). If TτU is not an expression value, then τ is neither a value nor a coerced value.
This lemma is straightforward to prove inductively on the structure of τ, and then the proof of the theorem above simply stitches together the pieces.