O. DE CÁCERES Viernes 1 Junio 2012 N.º 105 Página 43 CÁCERES

Our development is based on the general theory of syntax with bindings, which was presented in Chapter3.The development has two parts.

The first part is the instantiation of the general theory to the two syntaxes, of λ-calculus and of λ-calculus with emphasized values, together with the transfer from a deep to a more shallow embedding (reported in Section 4.1). This is currently a completely routine, but very tedious process—it spans over more than 15000 lines of code (LOC) for each syntax. The reasons for this large size are the sheer number of stated theorems about constructors and substitution (more than 300 facts for the one-sorted syntax and more than 500 for the two-sorted syntax) and the many intermediate facts stated in the process of transferring the recursion theorems. Thanks to using a custom template for the instantiation, the whole process only took us two person-days. However, this is unreasonably long for a process that can be entirely automated.

The second part is the theory of CBN and CBV λ-calculus, culminating with the proofs of the Church-Rosser and Standardization theorems (reported in Sections4.2and4.3). This is where our routine effort from the first part fully paid off. Thanks to our comprehensive collection of facts about substitution and freshness, we were able to focus almost entirely on formalizing the high-level ideas present in the informal proofs—notably in Plotkin’s sketches of his elaborate proof development for the standardization theorem. On two occasions—for

defining Takahashi’s complete parallel reduction operator and the number of variable occur- rences needed in the Standardization proof development—our recursion principle allowed us to quickly get off the ground, in the second case also offering useful freshness and substitution lemmas needed later in the proof. Altogether, the second part consists of 5000 LOC (2000 for CBN and 3000 for CBV) and took us one person-month.

An exception to the above general phenomenon (of being able to focus on the high-level proof ideas) was the need to engage in the low-level task of proving custom constructor- directed inversion rules for our reduction relations—illustrated and motivated in the discus- sion leading to Lemma4. This lemma is just one example of the several similar inversion rules we proved, corresponding to the inductive rules involving λ-abstraction in the reduction relations’ definitions. These rules are essentially the binding-aware version of what Isabelle/HOL offers via the “inductive cases” command [78]. They seems to be generally useful in proof developments that involve inductively defined reductions but require structural induction over terms. The literature on formal reasoning seems to have overlooked the general usefulness of these rules; and state of the art definitional packages such as Nominal Isabelle [71,74] do not attempt to infer them automatically.

Finally, our case study illustrates another interesting and apparently not uncommon phenomenon: that fresh structural induction on terms may be too weak in proofs, whereas depth- based induction in conjunction with fresh cases may do the job while still enabling the use of Barendregt’s convention—as illustrated in our proof of Lemma1.

In conclusion, we believe that our approach did a very good job handling some formalization task from the “real world”. There are indeed some technical initialization issues, but we believe they can be resolved once and for all, at the “developers” level, and completely hidden from the users—especially in our second formalization, described starting from the next chapter. On the other hand, we believe the features of our framework (notably a rich built-in theory of substitution and operator-aware recursion) have enabled us to produce a fully formal, yet high-level presentation of these fundamental results.

Chapter 5

Intermezzo: More Bindings to be

Captured

Although the POPLmark Challenge and its recursively defined binding structures from parts 1B and 2B have been openly among our motivations, our first framework does not easily capture its syntax. In that framework binding structures are modelled via abstractions (Section

3.1.1, Chapter3) that bind one variable in one term at a time. We could have tried an encoding of the binding patterns from the challenge, by iterating the abstraction binder from our theory. Instead we decided to remain faithful to our generality purpose and changed some design features of the framework.

The inability of dealing gracefully with complex binders is not the only limitation of this first framework. Lack of modularity is also an issue: we have modelled the generic syntax by means of a binding signature, which fixes constructors and their arities, and precludes from any nesting or modularity features.

We have indeed captured infinitely branching terms and, in particular, these terms contain infinitely many free variables. On the other hand, we have limited ourselves to well-founded syntaxes.

In this chapter we change our design decisions for the framework, extending the theory to overcome the just mentioned limitations. In Section5.1we analyse and criticise what we have achieved up to now. Then in section5.2we build step by step our new formalization, based on functors. This way we obtain our second framework, which will be described in the next Chapter6.

5.1 Critique of the First Framework

In Chapter3we have presented our first framework aimed at capturing the essence of syntax involving binding mechanisms and formalizing this in a general theory. Then with Chapter4

we provided an instance of the theory, a formalized case study that could help understanding what we have achieved, what is good and what is missing in our development.

We believe that one of the major strengths of our theory is the exhaustive infrastructure it gives for its instances. The development is conducted at a general level, definitions are given and lemmas are proven independently of the particular syntax. We define alpha equivalence and the basic operators, the fundamental ones being freshness (or equivalently the set of

free variables of a term), swapping and substitution. We give results about these objects up to reasoning and definition principles suitable for dealing with (alpha-equated) terms with bindings. Then the theory can be instantiated to a particular syntax and all the infrastructure is made available and customized for it.

In the the framework from Chapter 3 we achieve this by defining an arbitrary syntax, via an instantiable binding signature—in the instantiation process constructors, binders and arities are specified.

We quotient the general terms to alpha equivalence so that, when deploying the formalization in concrete developments, there is no need to deal with it: as we have already pointed out, one of our aims is the usability of the formalized theory and we believe that this is better reached if two alpha-equivalent terms are treated as equal and still the variables, also when bound, are explicitly named.

In order to avoid complications arising from reasoning about alpha-equivalence classes, we need a collection of lemmas and properties that deal directly with these classes and do not involve their particular representatives; these are indeed provided in the formalization and we have dedicated the whole Subsection3.2.4of Chapter3to their description. These lemmas allow us to seal the abstraction barrier and the user to deal only with high level concepts and ignore alpha-equivalence. Moreover some of these properties are subtle and, when dealing with a particular syntax, the time spent to formulate and prove them would be wasted, since they can be proved once and for all, at a general level.

We have endowed our theory with useful reasoning and definition principles suitable for dealing with (alpha-equated) terms with bindings, described in Sections3.4 and3.3 from Chapter3: Nominal’s fresh induction [58, 76, 73], which allows the formal use of Baren- dregt’s variable convention, Micheal Norrish’s swapping-sensitive recursion principle [52], our own substitution-sensitive one and a combination of the two—overall a fair variety of them for the user to pick from.

Our theory can capture many-sorted binding syntaxes (Section3.5); so among its possible instances we have, e.g., λ-calculus (also distinguishing the separate syntactic category of values, see Setion4.1.2from Chapter4), FOL, process calculi, and System F.

Finally we have captured infinitely branching syntaxes, like CCS [51]: these allow for an infinite number of free variables in their terms; thus we were forced to go beyond finite support and exploit cardinality theory.

However this framework, also when confronted with what we declared as our aims (Chap- ter1, Section1.4), has some major drawbacks. For example, it deals with an infinite number of variables, but only if they are organized in recursively defined, infinitely-branching terms; namely, non-well-founded syntaxes with terms that have possibly-infinite depth, are not yet covered.

Our theory also has an overly rigid binding mechanism: our standalone abstractions (see Chapter3, Sections3.1 and3.2) allow to bind in a term just one variable at a time. If it is true that this framework can capture many syntaxes with bindings, we have an important left- out: the motivational example of System F<: from the influential POPLmark challenge [7].

The encoding in our first framework of complex bindings, such as the recursively specified binding patterns of System F<:, is not immediate, if possible at all.

Finally, it is important to remark that our theory is developed in the style of universal algebra, namely, it is based on binding signatures, see Subsection3.5.1. With the signature, we fix our syntactic objects (organized in sorts, operation symbols, variables etc.) once and for all, not allowing for any kind of modularity. When formalizing mathematics in a proof assistant, any object, once defined, is exactly that one: it is not easy to transfer the reasoning to isomorphic structures, as instead is commonly done in pen-on-paper algebraic practice. For this reason, it is useful to define flexible structures from the beginning, so that they can be reused in a wider context, without the need of defining isomorphic ones. In our case, as already stated in Subsection1.4, the correct property to go after is modularity. A syntax should be able to be arbitrarily nested in other syntaxes, just like it happens with free (i.e. with no bindings involved) datatypes: it can be the case that we have a list of trees and also a tree with lists at its nodes.

This kind of modularity and also infintely-deep structures are features that have been captured by the (co)datatype package of Isabelle/HOL [18], based on bounded natural functors (BNFs, see Subsections2.2and2.3from Chapter2), which follows a compositional design [70] and provides flexible ways to nest types [16] and mix recursion with corecursion [13, 17]. However this work does not cover bindings.

Motivated by its modularity features, we start from the (co)datatype package development and model syntax with bindings with an approach based on functors—in particular on a refined version of BNFs. This approach has a main advantage: it is based on an already existing mechanism from the logical foundation, namely the dependency of type constructors on type variables. This will give more flexibility to our structures, which will have generic parameters as inputs, instead of variables picked from a fixed universe, thus allowing for the modular nesting property discussed above.

In document B.O. DE CÁCERES Viernes 1 Junio 2012 N.º 105 ALCALDÍAS CÁCERES. Edicto (página 21-27)