
In this section we have defined a formal approach to analyzing type mismatch errors in a language that implements the formalization of a variant of Local Type Inference. Our approach integrates seamlessly with the existing TypeFocus-based analysis techniques by reducing the non-derivable elements to TypeFocus values. Real-world programs can exhibit other categories of errors, such as inferred type arguments that do not conform to the declared bounds of the type parameters, or failed overloading resolution. In practice, most of them involve some form of subtyping polymorphism, and we have shown examples where such errors can be translated to TypeFocus values with relatively little effort.

Debugging type errors relies on the fact that non-derivable type derivation trees preserve their structure and are not discarded immediately. This is an acceptable restriction for languages with local type inference: errors tend to be highly localized, which makes it possible to separate the erroneous branches of the type derivation tree and to continue type checking the other parts of the program.
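To make the localization argument concrete, the following Scala sketch models a derivation tree in which failed (non-derivable) nodes are kept rather than discarded. The `Derivation` class and the `erroneousBranches` helper are hypothetical illustrations, not part of the type debugger, and judgments are plain strings for brevity.

```scala
// Hypothetical model: each node records the judgment it tried to derive,
// whether the derivation succeeded, and its premises. Failed nodes are kept.
final case class Derivation(judgment: String, ok: Boolean, premises: List[Derivation])

object Localize {
  // Returns the deepest failed nodes, i.e. failed derivations whose premises
  // all succeeded. Successful subtrees are never entered, so they remain
  // available unchanged for the rest of the analysis.
  def erroneousBranches(d: Derivation): List[Derivation] =
    if (d.ok) Nil
    else {
      val failedPremises = d.premises.filterNot(_.ok)
      if (failedPremises.isEmpty) List(d)
      else failedPremises.flatMap(erroneousBranches)
    }
}
```

For a program whose only failing node is the argument `"a"` of an application, the search returns just that leaf, while a sibling derivation that succeeded is left untouched.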

In our approach we do not attempt to address programs that exhibit multiple type errors, or errors that depend on each other, as is done, for example, in Chen and Erwig [2014b]. This in turn implies that debugging may involve a separate type debugging session for each type error reported for the program. The limitation is acceptable since each type error can be explained autonomously, without any user interaction.

Chapter 5

Lightweight extraction of type checker decisions

The TypeFocus-based analysis presented in the previous sections navigates type derivation trees in order to explain the typing decisions of local type inference. The idea of using type derivation trees for type debugging is not novel; previous attempts include prototypes for the OCaml type checker (Tsushima and Asai [2013]) and for variants of the Simply Typed Lambda Calculus (McAdam [2002]). These earlier, straightforward approaches rarely apply to non-trivial examples, due to their incompatibility with the official type checker and a non-autonomous mode of operation that requires constant user feedback. In addition, none of these approaches has considered languages using Local Type Inference.

A novelty of our approach lies in obtaining low-level data from existing type checking runs without affecting the compiler's logic or restricting its features. The collected data is then sufficient to build a data structure that closely resembles the desired type derivation trees. Separating the data collection from the construction of the derivation trees has two main implications:

• The low-level representation can be collected using a minimal, non-intrusive instrumentation infrastructure that minimizes the performance impact on regular, non-debugging compiler runs and is easy to maintain during the usual development of the type checker.

• Expressions with a similar structure may lead to subtle type checking differences, which in turn lead to different type checker runs and different low-level data. By mapping these to a common high-level representation we establish a well-defined set of expected type checking rules supported by the process. Algorithms that navigate type checker runs based on the reconstructed high-level representation are statically checked for correctness.
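A minimal sketch of the second point, under our own assumptions: two hypothetical low-level traces for the same expression, one of which contains an extra implementation-specific step, are lifted to the same high-level judgment. Encoding the high-level judgments as a `sealed` hierarchy is one way to let the Scala compiler check that navigation code covers every supported rule; all event and judgment names below are illustrative, not the tool's actual classes.

```scala
// Hypothetical low-level events, loosely modelled on the instrumentation
// classes; trees and types are plain strings here.
sealed trait LowEvent
final case class TypecheckAst(tree: String, pt: String) extends LowEvent
final case class AdaptStep(note: String) extends LowEvent // implementation detail
final case class AstTyped(tree: String, tpe: String) extends LowEvent
final case class TypeMismatch(tree: String, found: String, pt: String) extends LowEvent

// Raw tree: an event plus the events recorded inside its block.
final case class RawNode(event: LowEvent, children: List[RawNode])

// High-level representation: a closed set of judgments. Because the trait is
// sealed, the compiler warns about non-exhaustive matches over it.
sealed trait Judgment
final case class Typed(tree: String, pt: String, tpe: String) extends Judgment
final case class Mismatch(tree: String, found: String, pt: String) extends Judgment

object Lift {
  // Lifts a TypecheckAst block to a judgment, skipping implementation-specific
  // children (such as AdaptStep), so traces that differ only in those details
  // produce the same judgment.
  def lift(raw: RawNode): Option[Judgment] = raw.event match {
    case TypecheckAst(tree, pt) =>
      raw.children.map(_.event).collectFirst {
        case AstTyped(`tree`, tpe)          => Typed(tree, pt, tpe)
        case TypeMismatch(`tree`, found, _) => Mismatch(tree, found, pt)
      }
    case _ => None
  }

  // Navigation code over the high-level representation; omitting a case here
  // would trigger a compile-time exhaustivity warning.
  def describe(j: Judgment): String = j match {
    case Typed(tree, _, tpe)       => s"$tree : $tpe"
    case Mismatch(tree, found, pt) => s"$tree : $found does not conform to $pt"
  }
}
```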


def typecheckAst(ast: Tree, pt: Type): Tree = {
  EV ≪ EV.TypecheckAst(ast, pt)
  ... // instrumented typing of ast
  EV ! EV.AstTyped(...)
  ...
  EV ≫ EV.TypecheckDone(...)
  ...
}

(a) An explicit instrumentation.

def typecheckAst(ast: Tree, pt: Type): Tree =
  EV.instrument(EV.TypecheckAst(ast, pt), EV.TypecheckDone(_)) {
    ... // instrumented typing of ast
    EV ! EV.AstTyped(...)
    ...
  }

(b) A compact version.

Figure 5.1: A brief look at the Instrumentation API.

In this chapter we step through the construction of this high-level representation. Section 5.1 discusses the API used for instrumenting the compiler, and Section 5.2 discusses how individual type inference rules, their premises and typing judgments are expressed in a high-level representation. Section 5.3 describes a one-to-many translation from low-level data traces to their high-level counterparts; the unambiguity of the mapping is ensured by imposing restrictions on the possible definitions of the high-level representation.

5.1 Compiler instrumentation

The type debugger collects low-level type checker information by manually instrumenting the existing Scala compiler using a minimal API, a set of low-level instrumentation classes, and a debugging infrastructure. The instrumentation primarily extracts raw type checking information, including abstract syntax trees and type or symbol references; depending on the fragment of type checking being instrumented, more specialized information is collected, e.g., type variable variance or an inferred type substitution. Listing 5.1 provides a small example of a manually instrumented method that type checks an AST (parameter ast of type Tree) using the expected type (pt of type Type).

In the example, the value EV is a reference to the instrumentation universe, which extends the main compiler class, DebuggerGlobal, that controls the execution of the compiler (the implementation-dependent DebuggerGlobal class will be discussed in the next chapter). The instrumentation universe defines an abstract base class Event, which all low-level instrumentation classes extend, and the instrumentation methods used for reporting them, i.e., ≪, ≫ and !. In the example, TypecheckAst, AstTyped and TypecheckDone, all of type Event, are defined in the instrumentation universe (for consistency, we explicitly write path-dependent types as EV.x, where x is a member of the universe).

The instrumentation API introduces the notion of an instrumentation block, which makes it possible to collect structural information during instrumentation. This additional structure is sufficient for recreating the traditional premise-conclusion relations of the typing rules, as opposed to the typically “flat” instrumentation data. Instrumentation blocks are delimited by the ≪ and ≫ operators and typically contain other (potentially nested) instrumentation blocks, delimited by the same operators, as well as single instrumentation points (defined using the ! operator). As a result, the framework can treat the instrumentation points located directly between the ≪ and ≫ calls as type checking dependencies, without the source code referring to them explicitly. The equivalent compact version of the instrumented code in the second part of the listing uses an instrument method, which takes the type checker code as a by-name argument and ensures that the instrumentation block is properly opened and closed.
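The following self-contained sketch illustrates how nesting alone records dependencies. It replaces ≪, ≫ and ! with hypothetical methods `open`, `close` and `point`, uses strings instead of Event instances, and writes entries to a flat log; `Recorder` and `Demo.typecheck` are our own stand-ins for the instrumentation universe and for typecheckAst, not the compiler's code.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical flat log entries: block openings, closings and single points.
sealed trait Entry
final case class Open(event: String) extends Entry
final case class Close(event: String) extends Entry
final case class Point(event: String) extends Entry

final class Recorder {
  val log: ListBuffer[Entry] = ListBuffer.empty[Entry]

  def open(e: String): Unit  = log += Open(e)
  def close(e: String): Unit = log += Close(e)
  def point(e: String): Unit = log += Point(e)

  // Opens a block, runs the wrapped type checker code (a by-name argument),
  // then closes the block with an event computed from the result.
  def instrument[T](start: String, done: T => String)(body: => T): T = {
    open(start)
    val result = body
    close(done(result))
    result
  }
}

object Demo {
  // Toy stand-in for typecheckAst: "types" a node after typing its children.
  def typecheck(rec: Recorder, ast: String, children: List[String]): String =
    rec.instrument(s"TypecheckAst($ast)", (t: String) => s"TypecheckDone($t)") {
      children.foreach(c => typecheck(rec, c, Nil))
      rec.point(s"AstTyped($ast)")
      s"$ast: T"
    }
}
```

After `Demo.typecheck(rec, "f(x)", List("x"))`, the block for `x` lies strictly between the opening and closing entries of `f(x)`, which is what allows it to be read as a premise of the outer judgment without any explicit reference in the code.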

Listing 5.2 presents a (simplified) fragment of the instrumentation universe definition available in the compiler. The listing provides an overview of the instrumentation classes and methods, including the convenient overloaded instrument method that ensures appropriate block handling (the two alternatives differ only in whether a default closing event is used). Because the Scala optimizer does not perform whole-program analysis (Dragos [2008]), most of the methods are marked as final and annotated with @inline. Inlining avoids the performance penalty of additional method calls during regular, non-debugging compiler runs.

The mode of operation of the type checker is determined by the isOn method. When it returns false, all instances of the instrumentation classes are discarded. When it returns true, the instances are used to construct a raw tree representation: the individual instances form the values of the tree's nodes, and the parent/child relationship between nodes is determined by the block opening/closing information.
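One possible way to build the raw tree while events are being reported is a stack of open blocks, sketched below under our own assumptions. `TreeRecorder` and `RawTree` are hypothetical names; events are strings, and the closing event of a block is stored next to its opening event.

```scala
import scala.collection.mutable.ListBuffer

// A raw tree node: the event that opened a block (or a single point), the
// nodes reported inside that block, and the event that closed it, if any.
final case class RawTree(event: String, children: List[RawTree], closedBy: Option[String])

// Hypothetical recorder that assembles the raw tree as events arrive.
final class TreeRecorder(val isOn: Boolean) {
  // Open blocks, innermost first; each frame pairs the opening event with the
  // children collected so far. The bottom frame is a synthetic root.
  private var stack: List[(String, ListBuffer[RawTree])] =
    List(("<root>", ListBuffer.empty[RawTree]))

  // Opening a block pushes a new frame.
  def open(e: String): Unit =
    if (isOn) stack = (e, ListBuffer.empty[RawTree]) :: stack

  // A single instrumentation point becomes a leaf of the innermost block.
  def point(e: String): Unit =
    if (isOn) stack.head._2 += RawTree(e, Nil, None)

  // Closing a block pops its frame and attaches it to the enclosing block.
  def close(e: String): Unit =
    if (isOn) {
      require(stack.tail.nonEmpty, "close without a matching open")
      val (opened, kids) = stack.head
      stack = stack.tail
      stack.head._2 += RawTree(opened, kids.toList, Some(e))
    }

  // Completed top-level nodes; always empty when instrumentation is off.
  def roots: List[RawTree] = stack.last._2.toList
}
```

With `isOn = false` every reporting call returns immediately, so the recorder only ever holds the empty synthetic root.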

The withNoEvents method ensures that the fragment of the type checker passed to it as an argument always executes with instrumentation turned off. This makes it possible to discard type checker executions that are implementation-dependent, unsupported, or irrelevant from the analysis point of view, and that would otherwise have to be needlessly exposed in the high-level representation.
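A sketch of how a withNoEvents-style method could be realized, assuming a mutable on/off flag; the actual implementation is elided in the listing, and `SwitchableRecorder` is a hypothetical name. Saving and restoring the previous mode in a `finally` clause keeps nested calls, and exceptions thrown by the wrapped type checker code, from leaving instrumentation permanently disabled.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical recorder whose instrumentation can be suspended temporarily.
final class SwitchableRecorder(initiallyOn: Boolean) {
  private var on: Boolean = initiallyOn
  val events: ListBuffer[String] = ListBuffer.empty[String]

  def isOn: Boolean = on

  // Events reported while instrumentation is off are dropped.
  def report(e: String): Unit = if (on) events += e

  // Runs body with instrumentation off and restores the previous mode
  // afterwards, also when body is nested or throws.
  def withNoEvents[T](body: => T): T = {
    val saved = on
    on = false
    try body
    finally on = saved
  }
}
```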

trait EventsUniverse {
  self: DebuggerGlobal =>

  val EV: EventModel

  abstract class EventModel {
    @inline
    final def >>>(x: Event): Unit = if (isOn) { /* ... */ }

    @inline
    final def <<<(x: Event): Unit = if (isOn) { /* ... */ }

    @inline
    final def <<(x: Event): Unit = if (isOn) { /* ... */ }

    @inline
    final def instrument[T](x: Event, y: T => Event)(body: => T): T = {
      this <<< x
      val result = body
      this >>> y(result)
      result
    }

    @inline
    final def instrument[T](x: Event)(body: => T): T = {
      this <<< x
      val result = body
      this >>> Done
      result
    }

    @inline
    final def isOn: Boolean = ??? // ...

    @inline
    final def withNoEvents[T](body: => T): T = ??? // ...

    abstract class Event { /* ... */ }
    case class TypecheckAst(tree: Tree, tpe: Type) extends Event
    case class AstTyped(tree: Tree) extends Event
    case class TypecheckDone(tree: Tree) extends Event
    case object Done extends Event
    // ...
  }
}

Figure 5.2: A brief look at the instrumentation universe.

We chose to manually instrument the Scala compiler because the alternative, modifying the bytecode (using e.g., http://eclipse.org/aspectj/), is too coarse-grained: the instrumentation blocks do not always align with the entry and exit of a type checker method. An automatic, or semi-automatic, approach to instrumenting the compiler would admittedly be less error-prone, but for practical reasons we chose manual instrumentation. At the same time, any bytecode manipulation library would have to be aware of the semantics of the language in which the compiler
