A technique related to PCC is Typed Assembly Language (TAL) [6], developed in 1998 by Morrisett et al. The basic idea of a TAL system is that the compiler inserts type information, in the form of annotations on labels, into the assembly code. TAL systems preserve typing information throughout compilation, from the source code down to the assembly language. If the assembly code type-checks, then the code respects the rules previously given by the code consumer in the form of typing rules. This technique is less general than PCC, in the sense that it only tackles the problem of type safety.
Several compilers using this technique have been developed, each of which preserves typing information through intermediate stages such as closure conversion or lambda lifting. Figure 2.6 depicts a staged transformation from System F (λF) to typed assembly code (RISC instructions), as follows:
• The intermediate language λF is transformed to λK, which uses Continuation Passing Style (CPS). After this step the intermediate language has continuations: instead of returning values, functions apply a continuation to them.
• The transformation from λK to λH, via λC, takes two steps. The first is a simplified closure conversion: any variable that a function uses from its enclosing context (a free variable) is turned into an additional argument of that function. The second step is a lambda lifting (hoisting) process, in which all local function definitions, now closed, are moved to the top level.
λF → λK → λC → λH → λA → TAL (stages, in order: Continuation Passing Style, Closure Conversion, Hoisting, Allocation, Code Generation)
Figure 2.6: TAL transformation diagram.
• The transformation from λH to λA is called allocation. It produces an intermediate language in which "let" constructs represent memory allocations explicitly. Initialisation flags are added to every field of the tuples defined in the previous stage.
• The last transformation, from λA to TAL, generates the TAL code. A simple type-checking algorithm then checks the TAL code off-line.
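The first stages of this pipeline can be illustrated by hand on a small function. The following Python sketch is not the compiler described in [6] and is not produced by any tool; the names `make_adder`, `add_code`, `apply_closure` and `run_all` are invented for this illustration. It shows the same function in direct style, after a CPS step, and after closure conversion followed by hoisting. Types are not tracked here, so the sketch only conveys the shape of the code at each stage.

```python
# Hand-written illustration of three pipeline stages (not compiler output).

# Direct style: a nested function with a free variable `n`; results are returned.
def make_adder(n):
    def add(x):
        return x + n
    return add

# CPS: every function takes an extra continuation and passes its result
# to that continuation instead of returning it to the caller.
def make_adder_cps(n, k):
    def add(x, k2):
        k2(x + n)
    k(add)

# Closure conversion + hoisting: the free variable `n` becomes part of an
# explicit environment argument, so `add_code` is closed and lives at top level.
def add_code(env, x, k):
    (n,) = env
    k(x + n)

def make_adder_closed(n, k):
    k((add_code, (n,)))          # a closure is a pair: (code, environment)

def apply_closure(closure, x, k):
    code, env = closure
    code(env, x, k)

def run_all(n, x):
    """Run the three versions on the same input and collect their results."""
    results = [make_adder(n)(x)]
    make_adder_cps(n, lambda add: add(x, results.append))
    make_adder_closed(n, lambda clo: apply_closure(clo, x, results.append))
    return results
```

All three versions compute the same value; what changes is that control flow (the continuation) and data dependencies (the environment) become explicit, which is what later stages need in order to assign types to jumps and to allocated tuples.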
Another implementation, TALx86 [26], is a realistic typed assembly language, expressive enough to serve as the target for a typed subset of C called Popcorn. Many properties can be checked using typing information, including memory allocation, the types of variables, stack allocation, and basic type constructors (e.g., arrays and tagged unions).
The TALx86 instructions are a significant subset of the Intel IA32 (32-bit 80x86 flat model) assembly language, to be executed on Intel Pentium processors. Table 2.1 identifies the main components of the TALx86 system.
TALx86 uses MASM syntax for data and instructions, extended to handle the type annotations inserted in the code. Type preconditions, in the form of annotations on labels, specify the types that registers must have before control is passed from one address to another. Annotations of this kind are of
TALx86 tools
talc            Type-checker for TALx86 code.
link-verifier   Linker for TALx86. Verifies that the linking of TALx86 files is safe.
assembler       Assembles TALx86 code to produce an object file (COFF or ELF format).
popcorn         Subset of C that compiles to TALx86.
Table 2.1: Main components of the TALx86 implementation.
-- Popcorn code
    int i = n+1;
    int s = 0;
    while (--i > 0)
        s += i;

-- TALx86 code
        mov eax,ecx     ; i=n
        inc eax         ; ++i
        mov ebx,0       ; s=0
        jmp test
body:   {eax:B4, ebx:B4}
        add ebx,eax     ; s+=i
test:   {eax:B4, ebx:B4}
        dec eax         ; i--
        cmp eax,0       ; i>0
        jg body
Figure 2.7: An example of Popcorn and its TALx86 representation.
the form ∀α1:κ1 ... αm:κm. r1:τ1, ..., rn:τn, where α1, ..., αm are the bound type variables, which allow registers to have polymorphic types. The annotation r1:τ1, ..., rn:τn says that registers r1 through rn have types τ1 through τn, respectively. The kinds κ1, ..., κm allow different "kinds" of types in the TALx86 implementation. The type-checker talc verifies that the instructions in a given piece of assembly code respect these type annotations.
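To make the shape of such an annotation concrete, the following Python sketch models a label type as a list of bound type variables plus a register-to-type map, and shows how a polymorphic annotation could be instantiated. This is only an illustrative data structure: the class `LabelType`, its method `instantiate`, the type variable `a` and the kind name `K` are hypothetical and are not taken from the talc implementation. Kinds are stored but not checked.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LabelType:
    """Simplified model of  ∀α1:κ1 ... αm:κm. r1:τ1, ..., rn:τn."""
    type_vars: tuple   # ((name, kind), ...): the bound type variables
    registers: tuple   # ((register, type), ...): a type is just a string here

    def instantiate(self, substitution):
        """Replace every bound type variable by a concrete type."""
        bound = {name for name, _kind in self.type_vars}
        if set(substitution) != bound:
            raise ValueError("substitution must cover exactly the bound variables")
        regs = tuple((r, substitution.get(t, t)) for r, t in self.registers)
        return LabelType(type_vars=(), registers=regs)

# Hypothetical polymorphic label: eax may hold a value of any type `a`.
# The kind name "K" is a placeholder, not a TALx86 kind.
poly = LabelType(type_vars=(("a", "K"),),
                 registers=(("eax", "a"), ("ebx", "B4")))

# A caller that jumps to the label with a B4 value in eax picks a := B4.
mono = poly.instantiate({"a": "B4"})
```

After instantiation the annotation has no bound variables left and can be compared directly against the types of the registers at the point of the jump.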
2.4.2.1 An example: The sum of the first n natural numbers.
Figure 2.7 shows Popcorn code that computes the sum of the first n natural numbers, together with a fragment of the corresponding assembly code. The assembly code includes type annotations on the labels body and test. At the label body, the annotations eax:B4 and ebx:B4 mean that eax and ebx must have type B4 (an abbreviation of "Byte 4", a 4-byte value) whenever control reaches that label. The same holds at the label test, where eax and ebx are also required to have type B4. This simple example shows how types are represented in TALx86.
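The role of the label annotations in Figure 2.7 can be sketched with a very small checker. The Python code below is not the talc algorithm; it is a minimal model, under simplifying assumptions, of the idea that every jump (and every fall-through) into a label must establish that label's precondition. The `entry` label and its precondition `{ecx: B4}` are assumptions added for this sketch (the figure does not annotate the first fragment), and the typing rules cover only the handful of instructions used in the figure.

```python
B4 = "B4"  # 4-byte value, as in Figure 2.7

# Each block: (label, precondition on registers, instructions).
# The "entry" label and its precondition are assumptions of this sketch.
PROGRAM = [
    ("entry", {"ecx": B4}, [("mov", "eax", "ecx"), ("inc", "eax"),
                            ("mov", "ebx", 0), ("jmp", "test")]),
    ("body", {"eax": B4, "ebx": B4}, [("add", "ebx", "eax")]),
    ("test", {"eax": B4, "ebx": B4}, [("dec", "eax"), ("cmp", "eax", 0),
                                      ("jg", "body")]),
]

def satisfies(regs, precondition):
    """True if every register required by the label has the required type."""
    return all(regs.get(r) == t for r, t in precondition.items())

def check(program):
    """Return True if every block respects the label annotations."""
    pre = {label: p for label, p, _ in program}
    for index, (_label, precondition, body) in enumerate(program):
        regs = dict(precondition)          # assume the annotation on entry
        falls_through = True
        for op, *args in body:
            if op == "mov":
                dst, src = args
                if isinstance(src, int):
                    regs[dst] = B4         # immediate constant
                elif src in regs:
                    regs[dst] = regs[src]  # copy the source register's type
                else:
                    return False           # source register has no known type
            elif op in ("inc", "dec", "cmp"):
                if regs.get(args[0]) != B4:
                    return False
            elif op == "add":
                if regs.get(args[0]) != B4 or regs.get(args[1]) != B4:
                    return False
            elif op in ("jmp", "jg"):
                if args[0] not in pre or not satisfies(regs, pre[args[0]]):
                    return False
                if op == "jmp":            # unconditional: block ends here
                    falls_through = False
                    break
            else:
                return False               # unknown instruction
        # Falling off the end of a block enters the next label, if any.
        if falls_through and index + 1 < len(program):
            if not satisfies(regs, program[index + 1][1]):
                return False
    return True
```

With this model, `check(PROGRAM)` accepts the code of Figure 2.7, while removing the instruction `mov ebx,0` from the first block makes the check fail, because ebx would have no known type at the jump to test.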
Array-bounds checking is one of the most complicated aspects of TALx86, because the size of an array may be unknown until the code is executed. TALx86 introduces two type constructors for this purpose. The first is the singleton type S(s), where s is an integer expression. A register of type S(s) holds exactly the value of s; for example, if ecx has type S(4), then the value in ecx must be 4.
The second type constructor is array(s, τ), where τ is the type of the array elements and s is an integer expression representing the size of the array.
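The interplay of the two constructors can be sketched as follows. This Python fragment is a deliberately simplified model and not the TALx86 rule: it assumes that both the array size and the index are known integer constants, whereas in TALx86 s is an integer expression. The names `S`, `Array` and `subscript_type`, and the register contents, are chosen for this illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class S:
    """Singleton type S(s): the register holds exactly the integer `value`."""
    value: int

@dataclass(frozen=True)
class Array:
    """array(s, τ): an array of `size` elements, each of type `elem`."""
    size: int
    elem: str

def subscript_type(regs, array_reg, index_reg):
    """Type of array_reg[index_reg]; TypeError if not provably in bounds."""
    arr, idx = regs.get(array_reg), regs.get(index_reg)
    if not isinstance(arr, Array):
        raise TypeError(f"{array_reg} does not hold an array")
    if not isinstance(idx, S):
        raise TypeError(f"{index_reg} has no singleton type")
    if not 0 <= idx.value < arr.size:
        raise TypeError(f"index {idx.value} is outside array of size {arr.size}")
    return arr.elem

# Hypothetical register typing: ebx holds an 8-element array of B4 values.
regs = {"ebx": Array(8, "B4"), "ecx": S(4), "edx": S(8)}
```

Here an access through ecx is accepted with element type B4, because S(4) guarantees the index is 4 and 4 < 8, while an access through edx is rejected because 8 is out of bounds. The point is that the singleton type lets the checker reason about a specific integer value rather than just "some 4-byte integer".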
One main difference between TAL and PCC is that TAL applies a type-checking algorithm directly to the assembly code, avoiding the use of proofs in a separate logical framework (LF). There is no need for a separate theorem prover. As a result, the binary code is considerably smaller than that produced by PCC systems, since no proof accompanies it. Provided that types are preserved throughout the whole compilation process, this technique generates safe assembly code automatically. The main advantage of this technique is the reduction of the Trusted Computing Base (TCB): the verification process on the consumer side requires only a type-checker for the typed assembly code.
Advantages of TAL include:
• The size of the binary code is reduced. It is not necessary to have a proof in a separate logical framework. The typed assembly language itself serves as the input to the type-checking tools.
• A simple and fast type-checking algorithm is applied to the annotated assembly code. If the code type-checks, then it is safe to execute. In practice, the actual code is kept separate from the type information, so the assembly code can be executed in the standard way.
• The creation of typed assembly code can be automated by preserving the types during the compilation process.
The main disadvantages of TAL are:
• Any program that violates a type system’s invariants will not be typeable under that type system, even if the code is actually safe.
• It is less powerful and general than the PCC technique. PCC is more general because it can be applied to check other properties not related to the type system.
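The first disadvantage, that safe code may still be rejected, can be illustrated with a toy example. The Python sketch below uses a tiny expression language with integers, booleans, addition and conditionals rather than assembly code; it is an assumption-laden illustration of conservative type-checking in general, not of the TAL type system, and the names `type_of`, `evaluate` and `typeable` are invented here.

```python
def type_of(e):
    """Return "int" or "bool" for expression `e`, or raise TypeError."""
    if isinstance(e, bool):            # test bool first: bool is a subclass of int
        return "bool"
    if isinstance(e, int):
        return "int"
    op = e[0]
    if op == "add":
        if type_of(e[1]) == "int" and type_of(e[2]) == "int":
            return "int"
        raise TypeError("add expects two ints")
    if op == "if":
        if type_of(e[1]) != "bool":
            raise TypeError("condition must be bool")
        t_then, t_else = type_of(e[2]), type_of(e[3])
        if t_then != t_else:
            raise TypeError("branches must have the same type")
        return t_then
    raise TypeError(f"unknown expression {e!r}")

def evaluate(e):
    """Evaluate `e`; only the branch that is taken is ever executed."""
    if isinstance(e, (bool, int)):
        return e
    if e[0] == "add":
        return evaluate(e[1]) + evaluate(e[2])
    if e[0] == "if":
        return evaluate(e[2]) if evaluate(e[1]) else evaluate(e[3])
    raise ValueError(f"unknown expression {e!r}")

def typeable(e):
    """True if the checker accepts `e`."""
    try:
        type_of(e)
        return True
    except TypeError:
        return False

# The else-branch is ill-typed but can never run, so evaluation is safe.
safe_but_untypeable = ("if", True, 1, ("add", 1, True))
```

The expression `safe_but_untypeable` evaluates without ever touching the ill-typed branch, yet the checker rejects it, because it must approve both branches without knowing which one will run. A proof-based approach such as PCC could, in principle, justify the safety of such code with an argument that the branch is unreachable.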