Erbis Scheme


Alejandro Forero Cuervo
July 10, 2003

ISC-2003-1-15

Abstract

This document presents the design of a Scheme interpreter for programming distributed applications. Special attention is paid to the algorithms that provide a coherent distributed memory model. The main goals in the design of the interpreter are making it portable, scalable, minimalist and extensible.

Keywords: Scheme, functional programming, distributed programming, parallelism.

This work is respectfully dedicated to Sergio García and Sebastián Gonzalez, for me the most important of the teachers I met at the Universidad de los Andes.

Contents

1 Introduction
2 Goals and scope
3 Background
    3.1 Functional programming
    3.2 Scheme
        3.2.1 Scheme and macros
    3.3 Processes and threads
4 Framework
    4.1 Programming interface
        4.1.1 Memory Environments
        4.1.2 Threads
        4.1.3 Domains
        4.1.4 Network
5 Programming methodology
6 Stand-alone interpreter
    6.1 Memory management
        6.1.1 Reference counting
    6.2 Environments
    6.3 Control loop
    6.4 Execution model
        6.4.1 Alternate model
    6.5 Input and output
7 Primitive types
    7.1 Characters
        7.1.1 Different charsets
    7.2 Symbols and strings
        7.2.1 Modifying strings
    7.3 Numbers
        7.3.1 Numbers of arbitrary precision
    7.4 Special objects
    7.5 Lists and vectors
8 Distributed interpreter
    8.1 Running a distributed application
    8.2 Distributed memory management
        8.2.1 Notation
        8.2.2 The coherence requirement
        8.2.3 References to objects
        8.2.4 Local references
        8.2.5 Variable names
        8.2.6 Lamport clocks
        8.2.7 Caches for variables
        8.2.8 The ownership table
        8.2.9 An order for events
        8.2.10 Invariants
        8.2.11 Read accesses
        8.2.12 Write accesses
        8.2.13 Satisfying the coherence requirement
        8.2.14 Actual structures
9 Conclusions

1 Introduction

“The mere formulation of a problem is far more essential than its solution.” – Albert Einstein

The purpose of this work is to specify and implement a distributed interpreter for the Scheme programming language, introducing small semantic changes to give the programmer more control over multiple processes and execution domains. The interpreter runs on multiple computers connected across a network, each executing multiple threads for one given application. A simplified version of this model is shown in figure 1.

[Figure 1: A single application running in the interpreter. Legend: environment, dependence, connection, computer, process.]

In figure 1, an environment represents a structure binding symbols to values. Each thread has one associated environment, which is used to look up the symbols in the expression it evaluates. Note that different threads can share the same environment or use different environments, and an environment need not be stored in the same domain as the thread using it. In practice the memory is partitioned at a much finer level and the same variable can be stored in multiple environments.

Semantics are added to Scheme for controlling execution environments and processes. Using Erbis, the interpreter implemented in this work, the Scheme programmer is able to do the following:

• Create and modify different execution environments.
• Create, stop, resume and cancel the execution of different processes.
• Share execution environments among processes running in different nodes (computers).
• Dynamically migrate threads across the network.

The interpreter implements a minimalist interface which is not meant to be used directly by applications. Functionality is provided to allow the syntax and semantics to be extended dynamically within the language itself through the use of syntactically clean macros (which are explained in section 3.2.1 and in [9]). The programmer is thus able to use new constructions as soon as they are defined.

This functionality allows the definition of high-level object-oriented as well as process-oriented models, implemented on top of the interpreter, that provide mechanisms for concurrent programming and reasonable interfaces for the application programmer. These frameworks can make the parallelism visible to the programmer, or hide it and rely on specific heuristics to distribute the evaluation of expressions and balance the load across a network of computers.

The author believes that the object-oriented paradigm, the most widely used these days, has serious limitations that disappear when the notion of a process is directly introduced into the semantics of the programming language. This belief has been central to the design of languages such as Erlang and Occam. Processes then become the central building block through which modularity is achieved. Complex applications are split into multiple activities that are carried on simultaneously and are connected to each other, in much the same way that computer hardware is usually modeled. The interpreter allows further exploration of this area, combining some of the best characteristics of functional programming with the ideas of process-oriented programming and other paradigms.

Section 2 lays out the purpose of this work. More importantly, it explains certain goals that fall outside of its scope and were left for future work.

A complete explanation of the concepts needed to understand this work is provided in section 3; readers with a strong background in programming might want to skip it.

Section 4 explains the programming interface that the interpreter provides to allow the implementation of distributed applications.

A brief explanation of the programming methodology used in creating the interpreter itself is provided in section 5.

The implementation of the interpreter is divided into two parts, for which two separate branches are kept. Section 6 explains the design and implementation of a stand-alone interpreter (that is, one that runs on a single computer, providing semantics for controlling multiple concurrent processes and execution environments). The model is then extended in section 8 to allow migration of processes across multiple computers running different instances of the interpreter (which requires functionality for dynamically migrating processes as well as significant modifications to the memory management algorithms).

The full code for the two branches of the interpreter (the one optimized for execution on a single machine and the one with the extensions required for running a single program across multiple computers) is provided and explained throughout this document.

2 Goals and scope

“Everything should be as simple as possible but not simpler.” – Albert Einstein

The following are the main goals of the interpreter:

Portable
    The interpreter should be easily portable to different platforms. Platform-specific interfaces have been avoided and no assembly code has been used.

Scalable
    The interpreter should scale appropriately as the number of processes increases.

Minimalist
    The interpreter should introduce as few modifications to the Scheme language as possible, while providing enough functionality for implementing reasonably complete frameworks.

Extensible
    It should be easy to add new functionality to the framework in the future. There should be reasonable mechanisms for giving the Scheme programmer access to native interfaces.

No higher-level frameworks for application programmers were designed as part of this work. This task is left for future projects based on the interpreter. Also, no high-level application programming interfaces (such as math libraries or sets of widgets) were created. This implies that the interpreter must perform relatively well in most situations while being easy to optimize in order to improve its efficiency for specific frameworks.

Finally, it is desirable that the network of computers over which a distributed application is run can be modified dynamically. This should make it possible to add processing units to the network (increasing its processing power) and to remove them, without losing the state of a given application.

3 Background

“In theory, there is no difference between theory and practice. But, in practice, there is.” – Jan L.A. van de Snepscheut

3.1 Functional programming

There are many different programming paradigms in use today. As its name implies, a programming paradigm reflects the way the developer approaches her problems and formulates her solutions. For instance, the object-oriented paradigm attempts to reduce programming problems to modeling the roles of different entities (the classes) and their relations. By programming this way, its proponents argue, software will be very modular, which makes it easier to understand, modify and reuse.

Functional programming is not easy to define. In general terms, this paradigm, in contrast with the imperative paradigm, tends to emphasize the evaluation of expressions rather than the sequential execution of commands.

To execute a program, a computer steps through its machine-level instructions, executing them one by one. This makes imperative languages closer to the way computers operate, as they force the programmer to formalize her algorithms as series of steps to be performed sequentially. Functional languages, on the other hand, are closer to the way human minds work, as they tend to allow the programmer to describe her algorithms as expressions that are evaluated.

Consider, for example, the factorial function. It is typically defined as follows:

$$f(x) = \begin{cases} 1 & x = 0 \\ x \times f(x - 1) & x > 0 \end{cases}$$

In an imperative language, such as C or Java, this function would typically be programmed as:

    int fact(int x)
    {
        int total = 1;

        /* Multiply total by x, x - 1, ..., 1. */
        while (x > 0) {
            total *= x;
            x--;
        }
        return total;
    }

While there are shorter possible implementations, this one makes it clear how the algorithm is formalized as a sequence of steps that are performed in order.

In contrast, a functional implementation would look as follows (as with the previous example, this version could be improved). The code is written in OCaml; a Haskell or Scheme version has the same shape:

    let rec fact x = if x = 0 then 1 else x * fact (x - 1);;

This emphasis on the evaluation of expressions has many consequences that are common to most, if not all, functional languages.

Procedures as first-class citizens
    Procedures are visible objects that can be stored inside complex structures, passed as arguments and manipulated like other types such as numbers or strings.

Automatic memory management
    The programmer doesn’t need to take care of the memory used by objects that are no longer required; it is reclaimed by the system automatically.

Functional languages have been around for a long time and have shown the strength of their expressiveness. By allowing the creation of new procedures at run time and giving them first-class status, they allow for a level of modularity not available in other programming paradigms.

3.2 Scheme

Scheme is one of the oldest languages still in use (its first description was written in 1975). A close cousin of LISP, Scheme is an exceptionally clean functional language that offers a very rich set of properties, not available in many commonly used languages, that make it ideal for formalizing algorithms.

First and foremost comes its simplicity and uniformity. With the exception of a few selected constructions, it is very easy to understand and use. It has been used for years at universities in introductory programming courses.

It is, like most functional languages, properly tail recursive. This allows iterative computations to be performed in constant space, even if they have been described using syntactically recursive procedures.

Scheme is a strict language. This means that, in a function invocation, arguments are evaluated before the function is called. Non-strict languages, such as Miranda, Hyper Builder and Haskell, evaluate the arguments only as their values are required.

Like most functional languages, Scheme allows the use of lazy evaluation, which delays the evaluation of an expression until some point in the future when its value is actually needed. An evaluation captured in this way is called a promise. Only when the promise is forced is the expression evaluated and its value returned. Besides simplifying the implementation of certain tasks, this allows the programmer to easily represent infinite lists (in which each node holds one value and a promise to evaluate the next node).

3.2.1 Scheme and macros

One of the main reasons why this work is based on Scheme is its macros. Just as functions are semantic abstractions over operations, macros are textual abstractions over syntax. Unlike many languages, Scheme allows the programmer to extend its syntax with macros in a consistent and reliable manner. As Dickey puts it,

“Macros have the advantage of expanding the syntax of the base language without making the native compiler more complex or introducing runtime penalties. [...] Because macros can be used to extend the base language, doing macro design can also be viewed as doing language design. [...] Scheme has perhaps the most sophisticated macro system of any programming language in wide use today” [1].

This feature is crucial to this work, as it allows frameworks to be built entirely on top of the interpreter using macros, imposing very few syntactic limitations.

For instance, the macro system of Scheme has been used to implement pattern matching [11], a feature commonly found in other programming languages. The implementation was written entirely in Scheme as a series of macros providing a clean interface; it requires no modifications to Scheme interpreters or compilers, is very efficient and, being implemented using macros, imposes no syntactic limitations whatsoever.

A complete description of the mechanism for constructing macros in Scheme can be found in [9].

3.3 Processes and threads

Most of the platforms currently in use offer different semantics to allow the “simultaneous” execution of multiple processes. At any given time only one process is being actively executed on each processor. The processing time is partitioned and different slices are used to execute each process. As the number of runnable processes grows, the time slices allocated to each process grow smaller and the performance of each process decreases.

A processor works by executing the instruction at the position in memory pointed to by the program counter. Each process has its own virtual address space in which both its code and data are kept.
The kernel loads and unloads virtual pages as they are required by the different running processes; the hardware keeps a table mapping virtual pages to real pages in memory. When a program accesses a given position in memory, the hardware uses this table to map the virtual address to one of the pages currently in memory (or, if the page is not present, possibly because it was sent from memory to disk or because the process had not used it before, passes control to the operating system, asking it to handle the situation).

To use the abstractions provided by the kernel (such as files), processes use system calls. Inside them, the kernel regains control of the processor and can decide to switch to a different process (especially if the system call will take time to complete). The kernel also uses certain interrupts, especially the one generated by the clock, to regain control from the user-space processes.

The scheduler is the part of the kernel that keeps track of the time slices given to each process and decides which one to run.

A real-time platform is one that, under normal load conditions and as long as there are available resources, will not undergo long periods of time without giving control to those processes that are ready to run. That is, one for which there exist ε and δ such that each process will run for at least δ seconds every ε seconds (for this to be useful, ε should be small). Real-time platforms tend to have a slightly smaller throughput but are required for certain applications, which tend to use little processing power but need to be given control as often as possible. One example of this is a music player: it needs to run very often (or else the sound buffers will empty and the audio will be interrupted) but it will not use much processing time.

It is possible to have multiple processes share the same virtual address space. This has the advantage of allowing the processes to share common information in memory, without requiring the use of character-oriented mechanisms such as sockets.
In this case, the memory of both processes (including their stacks) is stored in the same virtual address space, and careful programming is required to avoid the problems that can arise when the same structures are accessed simultaneously. When processes share the address space they are executed in (and some other properties, such as signal handlers or the table of open file descriptors), they are usually called threads.

There are two kinds of threads: user-space threads and kernel-space threads.

Kernel-space threads are those that are visible to the operating system; it knows about them and uses its scheduler to decide how to split the processing time among them. Basically, kernel threads are very much like processes, but the kernel keeps a record of the special relation among them. In Linux, for example, each process has a pointer to its memory address structure; the creation of kernel-space threads includes setting this pointer to the same value in multiple process structures.

An application using kernel-space threads is depicted in figure 2, where the black circle represents the address space and the white circles represent kernel-visible processes. The kernel recognizes the different threads and uses its own structures and algorithms to control them.

[Figure 2: An application with three kernel threads.]

User-space threads, on the other hand, are hidden from the operating system and are usually implemented through a library: the kernel only sees one process. The library (rather than the kernel) keeps information about the threads and decides how to partition the time and assign different slices to each. The library provides wrappers for most of the system calls (especially the ones that can block the process): when a system call needs to be made, the library checks whether it would block the process. If so, it performs an asynchronous version of the call and gives control to another thread; control is eventually returned to the calling thread (once the asynchronous call completes).

Figure 3 shows the same situation as figure 2 but with user threads. In this case, the kernel doesn’t see any of the threads.

[Figure 3: An application with three user threads.]

A common situation is for a thread to undertake big calculations that take long amounts of time without performing a single system call. With user threads, such a thread would block the execution of all others for an unbounded amount of time. To avoid this situation, the code for the calculation explicitly performs periodic calls to a certain function in the library to give it control and allow it to run other threads. This is sometimes called cooperative or non-preemptive threading.

Because the kernel structures must be general enough to handle many different situations, user-space threads, which avoid that complexity, tend to have a smaller overhead than kernel-space threads. Processes, on the other hand, tend to have the biggest overhead of the three models, as each requires the management of its individual address space (which is why threads are usually spoken of as “lightweight processes”).

One problem of user-space threads is that, since the kernel sees them as a single process, it is impossible for them to run simultaneously on different processors. On single-processor machines this is not a problem as the limitation already exists, but on multi-processor machines, which are becoming increasingly common, parallelism is severely affected.

Certain systems use a combination of kernel and user-space threads as a way to obtain the best of both alternatives. These systems are programmed using user threads, but at the beginning of their execution a relatively small number (usually smaller than the number of available processors) of kernel threads are created to run the former. As the number of kernel threads is small, the overhead associated with them is small, even under a high number of (user) threads. However, this model allows the simultaneous execution of multiple user threads on multiprocessor computers.
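The cooperative scheme described in this section can be illustrated with a minimal sketch. It is not taken from the interpreter: the names task, count_step and run_round_robin are hypothetical, and each "thread" is modelled as a step function that performs one bounded unit of work and then returns, the return playing the role of the explicit call that gives control back to the threading library.

```c
#include <assert.h>
#include <stddef.h>

/* A user-space "thread" (hypothetical model): a step function plus its
   private state. Returning from step() is the cooperative yield; a
   nonzero return value means the task still has work to do. */
typedef struct task {
    int (*step)(struct task *self);
    int remaining;  /* units of work left */
    int executed;   /* units of work done so far */
    int finished;   /* set by the scheduler once step() returns 0 */
} task;

/* Hypothetical work function: perform one unit of work, then yield. */
static int count_step(task *self)
{
    if (self->remaining > 0) {
        self->remaining--;
        self->executed++;
    }
    return self->remaining > 0;
}

/* Library-side scheduler: the kernel only ever sees this single loop.
   It runs every unfinished task in turn until all of them are done and
   returns the number of scheduling rounds in which some task ran. */
static int run_round_robin(task *tasks, size_t n)
{
    int rounds = 0;

    for (;;) {
        int alive = 0;

        for (size_t i = 0; i < n; i++) {
            if (tasks[i].finished)
                continue;
            if (!tasks[i].step(&tasks[i]))
                tasks[i].finished = 1;
            alive = 1;
        }
        if (!alive)
            break;
        rounds++;
    }
    return rounds;
}
```

A task whose step function never returned would starve all the others, which is exactly the hazard of long calculations mentioned above; a preemptive kernel scheduler does not depend on such cooperation.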

4 Framework

The interpreter presented here provides a minimalist interface on top of which rich frameworks for implementing distributed applications can be built.

Three layers can be identified. The lowest-level layer, implemented as part of this work, provides a programming interface with enough primitives to support distributed applications. The main goal for this layer is simplicity. The first layer is implemented mostly as C code, but some parts of its logic are specified using Scheme and executed by the interpreter in a manner that is completely transparent to all the other layers. This is explained in more detail in section 8.1.

A second layer, which clearly falls outside the scope of this work, provides a richer interface that is built on top of the first. The only purpose of this layer is to provide syntactical constructs that aid the development of distributed applications. The second layer should be implemented in Scheme code, running on top of the first layer. Macros, as explained in section 3.2.1, are very likely to play a crucial role in any implementation of this layer. The term “generic framework” is applied to this second-level layer throughout this document.

User applications can be seen as a third layer, running on top of the interface provided by the second layer.

It is worth noticing that a concrete interface was specified for the first layer. By designing it in a minimalist fashion, the creation of different implementations is made easier. However, no interface is specified for the second layer. By using different interfaces (and their corresponding implementations) at the second layer, one can use different programming paradigms for developing distributed applications. Future investigation should try to identify useful high-level features for dealing with the inherent complexity of distributed applications.

For instance, by modifying the implementation of the second layer, one can use a paradigm that makes it impossible to share memory across different threads and relies entirely on message passing, or one that provides a strong focus on locks, semaphores and similar structures for controlling access to objects shared across multiple threads.

4.1 Programming interface

The interpreter provides a full implementation of the interface specified by the Revised^5 Report on the Algorithmic Language Scheme. This section defines the extensions available to the main thread (which executes the read-eval-print loop) that it can make available directly or indirectly to the user applications.

Two threads are created at startup. The first provides a standard read-eval-print loop used to control the interpreter. The second thread waits for incoming connections from client interpreters, called execution domains. As soon as a new connection is established with one of those interpreters, it can be used for the execution of threads. More information about this setup is provided in section 8.1.

4.1.1 Memory Environments

the-environment
    This call returns the specifier for the current memory environment at the position in which it is executed. A top-level call to the-environment returns the same specifier as a call to interaction-environment, as specified in [9]. However, a call to the-environment inside uses of let and similar environment modifiers returns different specifiers.

4.1.2 Threads

A new top-level type is added to represent threads. The operations used to operate on it are described in this section.

thread-self
    Returns a thread object describing the calling thread.

make-thread expr env dom
    This expression causes dom, an execution domain (more information on execution domains is available below), to create a new thread and evaluate expr, an arbitrary expression, inside the environment specified by env. The continuation for this call is passed a thread object describing the new thread.

thread-stop thr
    Stops the execution of the thread specified by thr. If the thread was already suspended, this call does nothing. If the thread was waiting for IO, it continues to wait but becomes suspended as soon as the IO operation completes. The return value for this function is unspecified.

thread-exec thr
    Continues the execution of the thread specified by thr. If the thread was already running or is waiting for IO, this call does nothing; it only has effect when passed a thread that has been stopped by means of thread-stop. The return value for this function is unspecified.

thread-sync thr0 thr1
    Synchronizes the time (Lamport clock, see section 8.2.6) of the two threads given. Both threads are set to the maximum of their two times. This is the only part of our interface where the mechanism for managing distributed memory (see section 8.2) is directly exposed to the programmer. It should be useful for implementing generic frameworks (2nd layer) but should not be used by applications directly (3rd layer). The return value for this function is unspecified.

thread-migrate thr dom
    This call causes the thread thr to be migrated to the domain dom. The caller must stop the thread first (by means of thread-stop) or this call will fail. Like thread-sync, this function should be used in the implementation of the generic framework rather than called directly by applications.

    This function allows the generic framework to implement different policies for balancing the load of all the threads across the network, based on different hints and data, in a manner that is transparent to the application programmer. The return value for this function is unspecified.

Notice that there is no function to cancel the execution of a given thread. Thread execution is canceled automatically by the garbage collector for threads that become unreferenced. For a thread to become unreferenced, it must have been stopped (by means of thread-stop) first.

4.1.3 Domains

A new top-level type is added to represent execution domains in the network. A domain is one computer on which threads can be executed. The following operations can be used to operate on them:

domains
    Returns a list of domain objects with all the execution domains. Section 8.1 provides some information about how new domains can be added to the interpreter.

domain-threads dom
    Returns a list with all the threads executing inside the domain specified by dom that have been created by means of make-thread. Certain threads internal to the domains are only visible to the interpreter, never to the Scheme code.

domain-quit dom
    Causes the domain specified by dom to disconnect from the server and terminate. The domain must not hold any threads (that is, domain-threads must return the empty list when applied to it) or the call will fail. The return value for this procedure is unspecified.
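As a way to check one's reading of the thread operations in section 4.1.2, their rules can be modelled as a small state machine. The sketch below is written in C rather than Scheme and is not the interpreter's code: the type model_thread, the functions model_stop, model_exec, model_io_complete, model_sync and model_migrate, and the integer domain ids are all hypothetical. It also assumes that thread-exec leaves a pending stop in place for a thread that is still waiting for IO, which is a literal reading of the descriptions above.

```c
#include <assert.h>

typedef enum { T_RUNNING, T_STOPPED, T_WAITING_IO } thread_state;

/* Hypothetical model of a thread object. */
typedef struct {
    thread_state state;
    int stop_pending;     /* thread-stop arrived while waiting for IO */
    unsigned long clock;  /* Lamport clock of the thread */
    int domain;           /* hypothetical id of the hosting domain */
} model_thread;

/* thread-stop: no effect on a stopped thread; a thread waiting for IO
   keeps waiting and is suspended once the IO completes. */
static void model_stop(model_thread *t)
{
    if (t->state == T_RUNNING)
        t->state = T_STOPPED;
    else if (t->state == T_WAITING_IO)
        t->stop_pending = 1;
}

/* thread-exec: only affects threads stopped by thread-stop. */
static void model_exec(model_thread *t)
{
    if (t->state == T_STOPPED)
        t->state = T_RUNNING;
}

/* Called by the (modelled) interpreter when an IO operation finishes. */
static void model_io_complete(model_thread *t)
{
    if (t->state != T_WAITING_IO)
        return;
    t->state = t->stop_pending ? T_STOPPED : T_RUNNING;
    t->stop_pending = 0;
}

/* thread-sync: both threads take the maximum of the two clocks. */
static void model_sync(model_thread *a, model_thread *b)
{
    unsigned long m = a->clock > b->clock ? a->clock : b->clock;

    a->clock = m;
    b->clock = m;
}

/* thread-migrate: fails (returns 0) unless the thread is stopped. */
static int model_migrate(model_thread *t, int dom)
{
    if (t->state != T_STOPPED)
        return 0;
    t->domain = dom;
    return 1;
}
```

In this model a framework that wants to move a running thread would call model_stop, then model_migrate, then model_exec, mirroring the thread-stop, thread-migrate, thread-exec sequence that the interface requires.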

4.1.4 Network

The interpreter provides high-level access to the standard networking programming interface through the following functions. A new top-level primitive type is added to represent sockets on which a thread might wait for incoming connections.

connect server port
    Creates a new TCP connection to the specified host and port. server must be a string describing the host, such as www.google.com or 127.0.0.1. Host name lookup is performed if required. port must be a number describing the TCP port to connect to. An IO port is returned on which the standard IO procedures can be used. The calling thread is blocked until the connection is established. For the reasons described in section 6.5, the call to connect might temporarily block the execution of all the threads in the domain of the caller.

accept socket
    Given a socket created by a call to listen, blocks the calling thread until a new connection is available. A new IO port on which the standard IO procedures can be used is created and returned.

listen port
    Creates a new socket bound to the TCP port specified by port. The socket can then be passed to accept to actually receive new connections.

5 Programming methodology

This section provides a brief outline of the methodology and practices used in implementing the interpreter.

The interpreter has been written in the C programming language, for which a good number of mature compilers and debuggers exist on most relevant architectures, probably making it one of the most ubiquitous languages. Unfortunately, C does not provide many important features that can significantly ease development, so special constructions and practices proved very helpful.

One of the most important practices was the partitioning of the implementation into a large number of modules. The modules were created taking into account the different data structures used throughout the interpreter. Each data structure or, for complex data structures, each selected portion of a data structure has one associated module, and all accesses to its fields are performed by calling the functions of that module.

The information in each module is stored in two different files, the “C source file” and the “C header file” (which is designed to be included from the source files of other modules). Most of the time this information can be divided into four parts:

1. The function prototypes, stored in the C header file, contain all the prototypes for the symbols exported by the module. Not all symbols inside a module are exported, only those that are useful to the rest of the program.

2. The inline code portion of the module includes a few relatively simple functions which are called very frequently. The inline keyword is added before the function definition to cause calls to these functions to be inlined. The implementations of these functions are stored in the header file, but a special preprocessor macro must be defined before the header is included to make them visible. This makes it possible for a C source file to first obtain all the prototypes for the functions in all the modules and then obtain the definitions of all the inline functions, which solves circular dependencies that could arise between different header files.

3. The standard code portion of the module includes the implementation of all the functions (and sometimes variables) in the module that are not inlined. This code is stored in the C source file.

4. The type definitions for the data structures associated with the module. If there are inline functions that depend on the type definitions, the latter are stored in the header file. Otherwise the actual structures are defined in the source file and only weak (incomplete) types are declared in the header file, so it becomes impossible to access the fields in the structure from any place other than the module’s source file.

This approach makes the implementation of the interpreter fairly object oriented; the modules in the interpreter are designed around the different data structures and a primitive mechanism of encapsulation is provided.

Special care was taken to keep each module relatively simple. It should thus be possible to perform automated testing of each module. Perhaps more importantly, each module becomes relatively simple to audit.

Also worth noticing is that no direct access is allowed to the fields in the structures other than through simple functions designed specifically for that purpose. This is, in some circumstances, even enforced by the language, as the actual definitions of the structures are sometimes only provided to the C source file of the relevant module.

Explicit checks for invariants, preconditions and postconditions were added to most of the functions in the interpreter. To do this, an ASSERT macro was created that checks that a condition is met and, if it isn’t, prints its location in the source code and terminates the program’s execution. By changing a parameter in the Makefile, it is possible to build the interpreter with all these ASSERT checks disabled.
When this is done, all the condition checks are left in the code (thus ASSERT(x); becomes (void) x;); the compiler removes them, emitting a “statement with no effect” warning for each of them. By leaving the conditions in the code, we avoid the creation of Heisenbugs [1] from expressions that have side effects.

The heavy use of assertion checks turned out to be invaluable in catching flaws in the implementation. Instead of random crashes, most of the bugs produced detailed error messages pointing directly to the culprit line. Running the interpreter with the debugger set to stop its execution whenever an assertion failed, in order to analyze its state, was also very helpful.

Benchmarks showed that the usage of inline functions to access the information in data structures (as opposed to direct access) does not have any measurable performance penalty, as the calls are optimized away by the compiler. An alternative to the use of inline functions would be the use of macros, but it introduces a lot of difficulties (see [6]) and does not provide any advantage.

The approach described turned out to be a good compromise, as it showed the following advantages:

Fast
    No measurable performance penalties were incurred by the pervasive use of simple functions to access data structures, as they were inlined by the compiler. Neither were performance penalties introduced by the use of assertion checks throughout the code, as it was possible to compile the interpreter without them.

Modular
    It was possible to focus on specific modules and forget the rest of the interpreter, and thus the amount of complexity that the programmer has to deal with at any given time is kept rather low. Encapsulation was in a few circumstances enforced by the language.

Safe
    The heavy use of assertion checks made it very easy to detect and fix defects in the implementation.

[1] A Heisenbug is a bug whose presence is affected by the act of observing it. For example, a bug which disappears in debug mode.
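The ASSERT macro described in this section could look roughly like the following. This is only a minimal sketch: the switch name ERBIS_NO_ASSERT, the message format and the helpers pop and demo are assumptions made for illustration and are not taken from the interpreter's sources.

```c
#include <stdio.h>
#include <stdlib.h>

/* ERBIS_NO_ASSERT is a hypothetical name for the build parameter;
 * the text only says that a Makefile parameter disables the checks. */
#ifdef ERBIS_NO_ASSERT
/* The condition stays in the code, so its side effects still happen. */
#define ASSERT(cond) ((void) (cond))
#else
#define ASSERT(cond) \
    do { \
        if (!(cond)) { \
            fprintf(stderr, "%s:%d: assertion failed: %s\n", \
                    __FILE__, __LINE__, #cond); \
            abort(); \
        } \
    } while (0)
#endif

/* Hypothetical helper with a side effect, to show why the condition
 * is kept even when the checks are disabled. */
static int pop_count = 0;

static int pop(void)
{
    return ++pop_count;
}

/* Returns how many times pop() ran: 1 in both build modes. */
static int demo(void)
{
    ASSERT(pop() > 0);
    return pop_count;
}
```

Because the disabled form still evaluates its argument, demo behaves identically with and without the checks, which is the property that avoids the Heisenbugs mentioned above.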

6 Stand-alone interpreter

“The devil is in the details.” – Anonymous.

In this section the design and implementation of an initial branch for the interpreter are explained. The model allows the execution of concurrent applications but is restricted to run on a single computer. The model laid out in this section is extended in section 8 to allow execution across multiple machines.

6.1 Memory management

Scheme implementations are expected to provide automatic memory management:

    All objects created in the course of a Scheme computation, including procedures and continuations, have unlimited extent. No Scheme object is ever destroyed. The reason that implementations of Scheme do not (usually!) run out of storage is that they are permitted to reclaim the storage occupied by an object if they can prove that the object cannot possibly matter to any future computation [9].

The implementation for the stand-alone version of the interpreter uses a variation of the mark-and-sweep algorithm. Three C functions are provided:

scm_malloc
    Returns new Scheme objects (represented by the C structure SCM) that can be initialized to specific Scheme types (such as strings, characters, numbers, pairs, vectors, etc.) and values, and then used.

scm_gc
    Checks which objects (returned by scm_malloc) have become unreachable and reclaims any allocation space previously held by their representation.

scm_gc_protect
    This function is used to register pointers to pointers to Scheme objects that are referenced from C code. These pointers are known as the “protected registers” and a list is kept with all of them. For instance, if the interpreter code keeps a variable for the list of all its threads, the address of the pointer to the list will be registered.

This implementation does not provide a real-time garbage collector (one in which the overhead of reclaiming unused memory is distributed through the execution of the program). There is no bound on the amount of time that the execution of the program can be interrupted when scm_gc is called and garbage collection is performed. This prevents the interpreter from being used in certain real-time environments but increases its performance while decreasing its memory consumption (as using a real-time implementation would require more complex structures and algorithms).

Profiling of the interpreter has shown that, during a normal run, scm_malloc is one of the most called functions: its implementation must be very fast. Adding simple arithmetic additions or memory indirections to its operations decreases the overall performance of the interpreter in measurable ways.

A doubly linked list of Scheme objects is kept. There is a mark in the list, the “free objects mark”, after which objects that are known to be free are kept.

Figure 4: The doubly linked list with Scheme objects.

Figure 4 shows the state of the list at a given moment during the execution of the interpreter. Each of the squares, a node in the list, represents one given Scheme object with all its associated memory. Those in black represent objects that are being used while those in white represent objects that could be reclaimed. As can be seen, all objects after the free objects mark are free. The interpreter does not keep track of which of the objects before the mark are actually used and which are free.

The implementation of scm_malloc advances the free objects mark, decreases a count of free objects, frees any memory (obtained by calling malloc) still associated with the object the mark was on, and returns that object. The caller can proceed to initialize it and use it.

When scm_gc is called, it checks the counter for the number of free objects (those after the mark). The garbage collection is performed only when it is lower than a given constant (otherwise the function does nothing).

To collect the memory associated with unused objects, the list of protected registers (a list of references to Scheme objects) is traversed and references in those objects are recursively navigated. Each object reached is tagged to avoid navigating it multiple times. Once this process is done, the memory associated with all the objects that were not reached can be reclaimed; it is easy to see that, since they can’t be reached from protected registers, they “cannot possibly matter to any future computation”.

This makes it important to include any C variables that point to Scheme objects in the list of protected registers. Otherwise the collector might not be able to reach objects (and thus save them from having their memory reclaimed) that are actually reachable from the C code.

Also, since the C stack is not examined in any way, garbage collection can only be initiated from specific portions of the interpreter. This is not an important issue, as we will see in section 6.3, but in other contexts (especially if the interpreter were not based on continuations) it could become a serious limitation for the collector.
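The traversal from the protected registers described above can be sketched as follows. This is an illustrative sketch under simplifying assumptions, not the interpreter's code: the type obj, its fields, the fixed-size register table and the names gc_protect, mark and mark_from_registers are all hypothetical, and only pairs are treated as objects that hold references.

```c
#include <stddef.h>

enum obj_type { T_NUMBER, T_PAIR };

/* Simplified stand-in for a Scheme object (hypothetical layout). */
typedef struct obj {
    enum obj_type type;
    int reached;            /* tag set when the collector reaches the object */
    struct obj *car, *cdr;  /* only meaningful for pairs */
} obj;

/* Protected registers: addresses of C variables that point to objects. */
#define MAX_REGISTERS 64
static obj **registers[MAX_REGISTERS];
static size_t n_registers = 0;

/* Plays the role that the text assigns to scm_gc_protect.
 * Returns 0 on success, -1 if the table is full. */
static int gc_protect(obj **slot)
{
    if (n_registers == MAX_REGISTERS)
        return -1;
    registers[n_registers++] = slot;
    return 0;
}

/* Tag every object reachable from o. The tag stops the traversal on
 * objects already visited, so cycles terminate. */
static void mark(obj *o)
{
    while (o != NULL && !o->reached) {
        o->reached = 1;
        if (o->type != T_PAIR)
            return;
        mark(o->car);
        o = o->cdr;  /* loop on the cdr so long lists do not recurse deeply */
    }
}

/* Traverse the list of protected registers. */
static void mark_from_registers(void)
{
    size_t i;
    for (i = 0; i < n_registers; i++)
        mark(*registers[i]);
}
```

Objects whose tag is still clear after mark_from_registers returns are exactly those that cannot be reached from any protected register.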
Before traversing the list of protected registers, the free objects mark is set to the beginning of the list of objects (which would mark all objects as free). When an object is reached during collection, it is moved from its current position in the list of objects to the beginning (which is why a doubly linked list is needed), automatically advancing the free objects mark. All of this is done keeping the counters consistent. At the end of the collection, objects will be stored before the mark if and only if they can be reached (and thus can matter for future computations).

At this point it is possible to reclaim memory associated with objects that have been left after the mark. However, doing so could incur many page faults that would seriously slow down the interpreter. To avoid this problem, memory associated with the objects is freed the next time they are returned by scm_malloc: if this causes a page fault, it is very likely that it would have been caused by usage of the object anyway. That is, right before returning any object, scm_malloc reclaims any storage space previously associated with it.

If the number of free objects after the collection is lower than a given percentage of the total number of objects, new unused objects are added after the mark. The list of objects is thus dynamically expanded to accommodate the needs of the interpreted program.

Note that the process of collecting garbage only needs one iteration over the reachable objects. The number of unreachable objects does not affect its running time in any way.

The actual implementation for the interpreter is based on this design but extended slightly, in order to support a distributed memory.

6.1.1 Reference counting

One interesting question is how counting references to objects could improve the performance of the memory management module of the interpreter.

Clearly, reference counting is not a replacement for the current algorithm. There might be cycles of mutually referenced objects. As long as an object references itself, its reference counter will never be zero (even though it can become unreachable).
The interpreter could thus lose all its memory to mutually referenced objects that are unreachable. In practice, however, this is a fairly unusual situation. Reference counters will thus decrease the need for full-fledged (and slow) garbage collection.

The author verified this empirically. The interpreter was extended to use reference counting in addition to the previously explained algorithm. Objects whose reference count decreased to zero were automatically moved after the free objects mark in the global list of objects. This greatly decreased the number of times that the interpreter ran out of objects and had to perform the mark-and-sweep algorithm in the scm_gc function. It was found that, by freeing objects when their reference count got to zero, many programs would run for unlimited amounts of time without requiring other mechanisms for collecting garbage.

However, the time associated with updating the references for each object decreased the performance in a significant way. Small operations to update the counters for referenced objects had to be performed in many frequently called functions. Overall, benchmarks showed that the performance of the interpreter decreased by 10% to 20%: performing mark-and-sweep was shown to be less expensive than keeping the reference counters up to date.

Reference counting also made the code of the interpreter more complex and prone to errors, as it required the modification of many functions in many different modules.

As there was no reason for keeping them, the modifications performed to implement reference counting were removed from the interpreter.

6.2 Environments

Each thread in the interpreter has an associated environment. The environment represents the bindings of symbols to values.

When a thread needs to evaluate the expression (+ 1 x), the interpreter must look up the symbols + and x in the thread's environment in order to obtain their values (the actual function and number to which they are bound). If, by modifying the environment, the programmer binds + to a function other than the default, different results will come up.

When the programmer defines a new procedure (by using the lambda construction, for instance), both the list of argument names and the body are saved. When the procedure is called, the environment is extended, temporarily binding the symbols in the list of arguments to the values passed, and then the body is executed.

The environment is implemented as a list of lists of pairs of a symbol and its value. Usually the outer list will only contain a list with the definitions for all the standard symbols such as +, and, if, etc., as in

    (((+ . function) (and . function) (if . function)))

Symbols defined by using define are added to this list, so (define x 5) leaves it as

    (((x . 5) (+ . function) (and . function) (if . function)))

If f is defined as

    (define (f x y) (if x (+ y y y) y))

then to evaluate (f #t 4) the environment will be extended to

    (((x . #t) (y . 4)) ((x . 5) (+ . function) (and . function) (if . function)))

and the body of f will then be evaluated. Note that at this point the outer list contains two lists of pairs; the first list of pairs, the one holding the arguments of f, will be removed right after the evaluation of the body of f.

The environment is also modified by calls to functions and macros such as set! or let. For instance, if the environment is

    (((x . #t) (y . 4)) ((x . 5) (+ . function) (and . function) (if . function)))

and the interpreter must execute the code (let ((i 6) (j 9)) ...), it will set the environment to

    (((i . 6) (j . 9)) ((x . #t) (y . 4)) ((x . 5) (+ . function) (and . function) (if . function)))

during the execution of the code inside the let statement.

When a thread is created, the programmer will specify the environment it should be launched in. By using an environment other than the default, one can restrict the functions available to different threads and add additional security checks in a manner that is completely transparent to the thread.

Multiple threads might share some portion of their environments; modifications to those parts will be visible to all of them. However, each thread might extend its environment in a manner that is not visible to the rest (for instance, to execute calls to procedures, as explained above).

6.3 Control loop

The Scheme interpreter uses a loop to execute the multiple Scheme-level threads until all have terminated. Keep in mind that the interpreter only uses one C-level thread, so special care must be taken not to block its execution when executing operations that block the Scheme-level threads on their behalf.

Each iteration in the loop performs the following operations:

1. Call scm_input_get, which checks for any IO events that have been completed and modifies threads that were waiting for them. There is more information about this in section 6.5.

2. Iterate through the list of runnable threads executing one instruction for each. This is done by calling scm_thr_exec. More information about how the threads are actually executed is available in section 6.4.

3. Call scm_gc to see if a memory collection should be performed. Consult section 6.1 to see when this is so and how it is done.

6.4 Execution model

Erbis Scheme is strongly based on continuations. The state of each thread is represented by a triplet consisting of a function, which is herein referred to as “the t-callback”, some internal data, known as “the t-data”, and a value to be passed to the t-callback, called “the t-value”.

The t-data is represented as a void pointer. Its actual representation depends on the specific t-callback. Different functions will use different C structures to hold the information.

The actual prototype for t-callback functions is the following:

    void (*) ( SCM *thr, void *data, SCM *value );

thr should point to the thread being executed, data is the t-data and value is the t-value.

The t-callback functions can be thought of as consumers of the t-values. When called they perform a little computation and, most of the time, modify the thread’s state (the triplet described above).

The function scm_thr_exec calls the t-callback for a given thread, passing it its t-data and t-value. To continuously execute a thread it suffices to iterate scm_thr_exec on it until its t-callback becomes NULL. This happens only if the thread has finished evaluating its associated expression (which might be impossible for some).

The pair consisting of the t-callback and the t-data is what is known as the continuation of the thread. The implementation of the call-with-current-continuation primitive thus becomes trivial: it merely creates a new procedure that resets the t-callback and the t-data for the thread to the values they had when it was created and sets the t-value to its first argument.

Notice that there is no explicit place for the stack in the threads’ state. The stack, if available, must be stored inside the t-data, which is stored in memory allocated using malloc. As a consequence, one might view the stack of any given thread as a singly linked list, although there is no easy way to traverse it (as the actual types of each t-data depend on the t-callback).

This representation for the stack of the Scheme threads has three important consequences.

First, this makes the interpreter very slow. Most modern machines are optimized so that the standard usage of a stack (as, for instance, is done by C programs) is fast. For Scheme-level function calls, Erbis must allocate memory by means of malloc to hold the t-data for the new frame.

Second, the garbage collector does not need to analyze the C stack to check for references to objects in local variables or function arguments. All the global variables that can point to Scheme objects are protected and collection is performed at one specific point in the control loop (see section 6.3). It thus suffices to recursively mark all the registered global variables to save all the objects that can possibly matter in future computations. Note, in particular, that the lists of threads in the different valid states are protected.

Finally, no low-level code is required to stop the execution of one thread and pass control to another. To execute multiple threads Erbis iterates through the list of runnable threads calling the t-callback of each. As described above, the t-callback performs a small computation and modifies the thread’s state.
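The way a thread advances one t-callback at a time can be sketched as follows. This is a simplified illustration, not the interpreter's code: the t-value is a plain long instead of a Scheme object, and the names thread, thr_exec, sum_frame, sum_step, thr_start_sum and run_all are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct thread thread;

/* Simplified t-callback: the t-value is a long rather than an SCM pointer. */
typedef void (*t_callback)(thread *thr, void *data, long value);

struct thread {
    t_callback callback;  /* NULL once the thread has terminated */
    void *data;           /* t-data: a frame allocated with malloc */
    long value;           /* t-value handed to the next step */
};

/* Plays the role the text assigns to scm_thr_exec: run exactly one step. */
static void thr_exec(thread *t)
{
    t->callback(t, t->data, t->value);
}

/* t-data of a toy computation that adds n + (n-1) + ... + 1. */
struct sum_frame {
    long remaining;
};

static void sum_step(thread *thr, void *data, long value)
{
    struct sum_frame *f = data;
    if (f->remaining == 0) {      /* done: release the frame and terminate */
        free(f);
        thr->callback = NULL;
        thr->data = NULL;
        thr->value = value;
        return;
    }
    thr->value = value + f->remaining;  /* small computation, new state */
    f->remaining--;
}

/* Returns 0 on success, -1 if the frame cannot be allocated. */
static int thr_start_sum(thread *t, long n)
{
    struct sum_frame *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->remaining = n;
    t->callback = sum_step;
    t->data = f;
    t->value = 0;
    return 0;
}

/* Round-robin over the threads, one step each, until all have finished. */
static void run_all(thread *threads, size_t n)
{
    size_t i, alive;
    do {
        alive = 0;
        for (i = 0; i < n; i++) {
            if (threads[i].callback == NULL)
                continue;
            thr_exec(&threads[i]);
            if (threads[i].callback != NULL)
                alive++;
        }
    } while (alive > 0);
}
```

No C stack frame survives between two steps of the same thread, which is why interleaving threads needs nothing more than an ordinary loop.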
This iteration also takes place at a specific position in the control loop (and special care is taken to remove suspended threads from the list and add newly created threads to it).

6.4.1 Alternate model

An alternative execution model for the interpreter would be to directly map the Scheme-level function calls to the C stack.

This model was quickly discarded in favor of the current approach, because the latter was found to be more consistent with the goals of our work, especially in helping make our interpreter portable and scalable.

It might significantly improve the speed of the interpreter to modify it to map the Scheme stack to the C stack directly, as most interpreters and virtual machines do. To do so, one should keep an eye on the following complications:

Manage multiple C-level threads
    As the interpreter has multiple Scheme-level threads, mapping each to a standard stack would imply having multiple stacks. Some mechanism is required for managing the execution of multiple low-level threads. Usage of a library implementing a threading interface such as POSIX threads might be very helpful.

Keeping the interpreter scalable
    Special care should be taken to keep the interpreter as scalable as it currently is. Although slow, the approach used by Erbis is very scalable, as very little overhead is added by each thread. When there are many threads running and each is associated with a normal stack, problems appear. For instance, it might be hard to decide how much space to allocate for each stack.

Modifying garbage collection
    The garbage collector would need to be modified. First, some code would need to be added to scan the stacks looking for local variables that reference objects, to avoid reclaiming their storage while they are still in use. A library such as [4] could be of great use for this. This would not be enough, however, as one would need to find a way to perform collection while many threads are modifying their stacks (or stop them all during garbage collection).

6.5 Input and output

Multiple input/output (IO) operations can be performed by the threads in our interpreter. Since the interpreter uses only one internal thread to execute all the higher-level threads, special care must be taken so that, whenever an IO operation must be performed for one of the threads, the interpreter does not become blocked.

Asynchronous (also known as non-blocking) IO is used for all operations. The select system call is then used to check for IO events that have completed. When an IO event is found to be completed, it is processed and the information for any threads that were waiting for it is modified. All of this is done in the C function scm_input_get.

The exact list of events recognized is:

1. Input is available on a file descriptor (which could represent a pipe, a network connection, an actual file, a device or some other entity) that a thread is reading.

2. Output is now possible on a file descriptor to which a thread is waiting to write. This allows the interpreter to control the execution of its threads based on the operating system’s facilities for flow control.

3. A new connection is available on a socket on which a thread is waiting for connections. Based on this it becomes very easy to create internet daemons using the primitives described in section 4.1.4.

The select system call receives a timeout parameter that is used either to have the interpreter’s process wait until an event is completed, which is done when all threads are waiting for events, or to return immediately (even if no events have been completed). As a consequence, the operating system process associated with the interpreter will be blocked if and only if all the Scheme-level threads are blocked.

An exception to the above rule is that during host name lookups the entire interpreter can become blocked. There is no easy and portable way to avoid this problem, as the standard programming interface for performing these actions does not provide semantics for non-blocking lookups, and designing one falls outside of the scope of this work. This could be improved in future versions using a portable implementation.
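The use of the select timeout described in this section can be sketched as follows. The function name poll_readable and its parameters are hypothetical, and only the first kind of event (input available) is handled; the real scm_input_get also deals with output and incoming connections.

```c
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Check which of the n descriptors in fds have input available.
 * ready[i] is set to 1 when fds[i] is readable and to 0 otherwise.
 * If all_blocked is nonzero (every thread is waiting for IO) the call
 * waits until some event completes; otherwise it returns immediately.
 * Returns the number of ready descriptors, or -1 on error. */
static int poll_readable(const int *fds, int *ready, size_t n, int all_blocked)
{
    fd_set set;
    struct timeval zero = { 0, 0 };  /* zero timeout: do not wait */
    int maxfd = -1;
    int count;
    size_t i;

    FD_ZERO(&set);
    for (i = 0; i < n; i++) {
        FD_SET(fds[i], &set);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    /* A NULL timeout makes select block until an event is available. */
    count = select(maxfd + 1, &set, NULL, NULL, all_blocked ? NULL : &zero);
    if (count < 0)
        return -1;

    for (i = 0; i < n; i++)
        ready[i] = FD_ISSET(fds[i], &set) ? 1 : 0;
    return count;
}
```

Passing a zero timeout while some thread is still runnable keeps the control loop moving; passing NULL when every thread is blocked lets the operating system suspend the process until there is work to do.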

7 Primitive types

This section provides an outline of the internal representation for the standard types managed by Erbis Scheme. Special attention is paid to the differences in the representation for the standalone and the distributed interpreter.

The central criterion for deciding how to represent the different types was simplicity. The designs are presented along with their advantages and shortcomings. Alternative representations are explained for some structures, including information on how they could be implemented and the way in which they would affect the overall performance of the interpreter.

A single structure is used to represent all visible Scheme values. This structure, along with information for the garbage collector, has a field identifying the actual type of the object and a union of structures for the representation of each of the possible types.

7.1 Characters

To represent characters, the C low-level representation for characters (bytes in the ASCII encoding) was used. A table is kept with one object for each of the 127 valid characters. Otherwise, each occurrence of a given character would require its own allocated storage. This keeps the amount of memory required by each character relatively constant regardless of the number of times it is used, helping the interpreter scale up.

7.1.1 Different charsets

It is possible to extend the interpreter using a higher-level representation for characters to allow different character sets such as ISO-8859-1, UTF-8 or UTF-16 to be used.

For character sets with a larger number of possible characters, keeping a Scheme object for each of them could require a lot of memory. Some technique to dynamically allocate and reclaim space only for the characters in use could help greatly.

7.2 Symbols and strings

As Erbis is not designed with special attention to internationalization, strings and symbols were represented as strings terminated by the zero byte. However, to optimize some operations, string lengths are also stored as an integer. This adds a small overhead (the size of an int) to each string.

Notice that this imposes a limit on the length of the strings handled by Erbis Scheme. The limit is, however, too big to matter in practice (although time tends to make these statements concerning size limitations laughable).

As future work, one might want to extend the representation of strings to allow strings in different encodings for proper internationalization.

7.2.1 Modifying strings

It is worth noticing that it is not possible to modify a string; one has to create a copy. This can have important performance consequences for some programs that are not written with this limitation in mind, but it becomes very important for implementing a distributed memory (see section 8.2.3).

To solve this problem one could modify all the string-related operations to use vectors of characters and thus remove strings from the list of native types. This could create some overhead, as the amount of memory required for each character grows from one byte to the size of a reference. In the standalone interpreter, this will be 4 bytes on most machines, but in the case of the distributed interpreter this becomes around 20 bytes (as references are not merely C pointers but structures).

7.3 Numbers

For the sake of simplicity, C doubles are used to represent numbers throughout Erbis. The distinction between exact and inexact numbers (see section 6.2.2 of [9]) is thus irrelevant, as all numbers are inexact. The portion of the R5RS concerning exactness of a number is ignored.

As an optimization, Erbis Scheme keeps a table with all the integer numbers from zero to a given value (512 by default) and tries to use its entries, rather than create new structures, whenever possible. This is done at the parser level; whenever an integer number in the range of the table is read from the Scheme code, rather than create a new structure to hold its value, the table is used. As a consequence, no matter how many times a given number is used in the Scheme code loaded and executed by Erbis, all its instances will be represented by pointers to the same memory location.

7.3.1 Numbers of arbitrary precision

It should be relatively straightforward to modify Erbis to use an arbitrary precision arithmetic library for its operations. To do this, a programmer would need to:

1. Modify the structure for numbers to use a pointer to the data type used by the library to represent numbers.

2. Reimplement the arithmetic operations (in number.c) to act as wrappers to the operations provided by the library.

3. For the distributed interpreter, implement a fast way to serialize and transmit numbers (modifying functions scm_protocol_encode and scm_protocol_decode).

4. Optionally, implement support for functions concerning exactness of numbers (such as inexact? or inexact->exact).
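The table of pre-allocated integers described in section 7.3 can be sketched as follows. This is a minimal illustration: the names number, small_ints, init_small_ints and make_number are hypothetical, the table is assumed to include both zero and the upper limit, and the real interpreter performs the lookup inside its parser.

```c
#include <stddef.h>
#include <stdlib.h>

#define SMALL_INT_MAX 512  /* upper limit of the table (the default in the text) */

/* Simplified number object: just the C double. */
typedef struct {
    double value;
} number;

static number small_ints[SMALL_INT_MAX + 1];
static int small_ints_ready = 0;

static void init_small_ints(void)
{
    int i;
    for (i = 0; i <= SMALL_INT_MAX; i++)
        small_ints[i].value = (double) i;
    small_ints_ready = 1;
}

/* Return the shared table entry when v is an integer in the table's
 * range; otherwise allocate a new structure (NULL if malloc fails). */
static number *make_number(double v)
{
    number *n;

    if (!small_ints_ready)
        init_small_ints();

    if (v >= 0.0 && v <= (double) SMALL_INT_MAX) {
        int i = (int) v;
        if ((double) i == v)  /* no fractional part */
            return &small_ints[i];
    }

    n = malloc(sizeof *n);
    if (n != NULL)
        n->value = v;
    return n;
}
```

With this arrangement every occurrence of, say, 7 in the source code is represented by the same pointer, while 2.5 or 1000 still get their own structure.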

It is worth noticing that certain libraries (such as GMP) can greatly benefit from modifying the garbage collector to keep a dynamically growing pool of structures representing numbers for reuse, to avoid the overhead incurred by constantly destroying and creating them.

Also note that if this approach is implemented, Erbis will still benefit from the table with pre-allocated integers.

7.4 Special objects

There is an object type for objects that have no particular information associated with them but that must still be identified. Each object of this type is given a value from an enumeration identifying it. The following are some of the objects in this category:

1. The EOF object.

2. The true and false values for boolean operations.

3. The “unspecified” object (returned by some functions to help the programmer find mistakes in his code).

4. The “dot” object, used to represent occurrences of “.” in the source code.

5. Objects for the quote operators.

There is only one instance of each of these objects (kept as global variables, protected to keep the garbage collector from reclaiming their storage).

7.5 Lists and vectors

Lists, as in most other Scheme systems, are represented internally using pairs. Each pair, as you can guess, contains nothing other than two references to other objects, the car and the cdr. The references, as sections 8.2.3 and 8.2.14 explain, are not just C pointers but complex structures (which is required to implement a coherent distributed memory).

Vectors are represented as C-level arrays of references to objects. This makes it possible to modify their values in a manner that keeps the memory coherent when the interpreter is run distributed (see section 8.2).
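The single structure with a type field and a union that this section describes might be laid out roughly as follows for the standalone interpreter. Every name here is hypothetical (the real structure is called SCM and its fields are not shown in the text), references are plain C pointers as in the standalone case, and the end of a list is assumed to be a NULL pointer purely for the sake of the example.

```c
#include <stddef.h>

typedef enum {
    OBJ_CHAR, OBJ_STRING, OBJ_NUMBER, OBJ_SPECIAL, OBJ_PAIR, OBJ_VECTOR
} obj_kind;

/* Enumeration identifying the special objects of section 7.4. */
typedef enum {
    SP_EOF, SP_TRUE, SP_FALSE, SP_UNSPECIFIED, SP_DOT, SP_QUOTE
} special_kind;

typedef struct scm_obj scm_obj;

struct scm_obj {
    /* Information for the garbage collector (assumed fields). */
    scm_obj *prev, *next;  /* position in the doubly linked list of objects */
    int reached;           /* tag used during collection */

    obj_kind kind;         /* actual type of the object */
    union {
        char character;                                  /* 7.1 */
        struct { char *bytes; int length; } string;      /* 7.2 */
        double number;                                   /* 7.3 */
        special_kind special;                            /* 7.4 */
        struct { scm_obj *car, *cdr; } pair;             /* 7.5 */
        struct { scm_obj **items; size_t length; } vector;
    } as;
};

/* Count the pairs in a list by following the cdr references. */
static size_t list_length(const scm_obj *o)
{
    size_t n = 0;
    while (o != NULL && o->kind == OBJ_PAIR) {
        n++;
        o = o->as.pair.cdr;
    }
    return n;
}
```

In the distributed interpreter the car, cdr and vector items would hold the richer reference structures mentioned above instead of bare pointers.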

(42) 8 Distributed interpreter The interpreter provides a uniform platform allowing the execution of multiple threads across different computers, as shown in figure 1. This section is presented as an extension to the model discussed in section 6, explaining the changes and new modules that are necessary for the execution of distributed applications.. 8.1 Running a distributed application Different instances of the interpreter can be run over different computers. Each of those instances is called an execution domain. All execution domains are connected through TCP/IP to provide a common platform over which the application is ran in distributed fashion. To execute a distributed application an initial thread is spawned on one specific domain (which is called the “server”). Only the functionality provided in the lowest-level layer (see section 4) is available to this thread. This thread first evaluates some Scheme code with the purpose of enriching its environment (for instance, many of the standard Scheme functions are implemented in Scheme rather than C so this thread needs to load their definitions). Once all the important symbols have been added to the environment, a secondary thread, called the “accept thread”, loads a few more definitions, opens a TCP port and waits for incoming connections of new “client” instances of the interpreter. At this point, control of the original thread is passed to the code associated with the second of the three layers in our system. The code should define all the functions and macros implementing the high-level interface which will be provided to the application. After this is done, the code should pass control to the application (a process which will very likely involve spanning more threads). Initially there is only one execution domain, the server. The user can then run instances of Erbis in different computers setting up the environment to cause.

them to connect to the server. These new instances are called clients.

When a client instance is started, a thread is created. After evaluating the same Scheme code as the server’s thread to enrich its environment, the client opens a new connection to the server. Whenever a new connection is received by the server, a new thread, called a “protocol thread”, is created to handle it. Initial authentication is performed and, if successful, the client is added to the list of execution domains (variable *domains*, described on page 16), making it possible to migrate threads to the client.

Only the first of all the threads described is directly visible to the second layer (and thus to the application).

8.2 Distributed memory management

One of the most difficult problems in the implementation of the distributed interpreter is the design of a mechanism that allows threads running on different computers to share information in memory.

A naive implementation could store each object on only one computer at any given time. Whenever the value of a given variable is required, a request is sent through the network to all computers and the one holding it replies (probably assigning it to the computer where the request originated). Although such an implementation could be optimized slightly, it would require tremendous amounts of bandwidth to keep up with standard processor speeds. Clearly, a more elaborate mechanism is required to avoid the delays that low bandwidth and high latency would introduce when the interpreter is run over standard equipment.

This section describes the algorithms used by Erbis Scheme to provide a distributed memory. First, in section 8.2.1, formal notation is introduced to represent the sets and relations used throughout this chapter. Section 8.2.2 then gives a formal description of what it means for a distributed memory to be coherent.
This description is defined in a manner independent of Erbis Scheme for the sake of generality. The next sections provide the actual description of the algorithms used. For the sake of clarity, an outline of the design of the memory manager is presented first, stating what information is kept with little regard for the data structures used to store it. Only at the end, after showing in section 8.2.13 that the coherence requirement holds for this design, are the actual structures (designed for speed and low memory consumption) described.

8.2.1 Notation

This section introduces some notation that will simplify the definitions presented in the next sections.

A variable is any “object” that might somehow change its value. During a portion of the execution of a given application, let $V$ be the set of all variables used, $D$ the set of all the execution domains, $P_d$ the set of all the threads in domain $d \in D$, and $R_v$ and $W_v$ the sets of all read and write accesses to variable $v \in V$ respectively.

Define $S_{p,q}$ as the set of all the invocations of primitive functions that can be used to synchronize $p \in P$ and $q \in P$. $S_{p,q}$ includes all events that can cause the execution of either $p$ or $q$ to be suspended until another event is performed by the other thread. This includes all disk and network IO performed by either $p$ or $q$.

For convenience, define $P$, $R$, $W$ and $S$ as the sets of all threads, all read accesses, all write accesses and all synchronization events respectively. That is, $P = \bigcup_{d \in D} P_d$, $R = \bigcup_{v \in V} R_v$, $W = \bigcup_{v \in V} W_v$ and $S = \bigcup_{p,q \in P} S_{p,q}$.

Let $E$ be the set of all interesting events, $E = R \cup W \cup S$, and let $\mathrm{time}(e)$ represent the exact time at which $e \in E$ took place (time can be represented, for instance, using real numbers for the number of seconds elapsed since the beginning of the execution).

Define a binary relation in $P \times E$ such that for $p \in P$, $e \in E$, we say that “$p$ performed $e$” and write $p : e$ if either $e$ is a memory access caused by the execution of $p$ or $e \in \bigcup_{q \in P} S_{p,q}$. Call $E_p$ the set of all events performed by thread $p \in P$ (that is, $E_p = \{e \in E \mid p : e\}$).

It is worth noticing that the second condition in the above definition makes it possible for one event to be performed by more than one thread. In particular, for distinct threads $p \neq q$, $e \in S_{p,q}$ if and only if $p : e$ and $q : e$ and, since it is impossible for one given memory access to have been performed by more than one thread, $E_p \cap E_q = S_{p,q}$.

Given $v \in V$, define an equivalence relation $\sim$ on the set of its accesses ($R_v \cup W_v$) such that $a \sim b$ if $v$ held exactly the same value (or state) immediately after both accesses were completed. In the case where $r \in R_v$ and $w \in W_v$, $w \sim r$ means that the value returned by $r$ is the same as the value assigned by $w$.

Notice that all the above sets are defined with regard to some portion of the execution of an application. For the rest of this document it is assumed to be its execution from the beginning until a given instant (or its termination).

8.2.2 The coherence requirement

The implementation of the interpreter should be coherent in that once a variable is modified, further accesses must return the new value. This will be called “the coherence requirement for the implementation of the distributed memory” (or “the coherence requirement”, for short).

Define an order relation $<_s$ on $E$ such that $x <_s y$ if it can be shown that $x$ took place before $y$. This means that either both events were performed by the same thread and $\mathrm{time}(x) < \mathrm{time}(y)$, or there is some other event which can be shown to have taken place after $x$ and before $y$. Formally, $x <_s y$ if at least one of the following conditions holds:

1. $\mathrm{time}(x) < \mathrm{time}(y)$ and $x, y \in E_p$ for some $p \in P$.
2. $x <_s e$ and $e <_s y$ for some $e \in E$.

Clearly, this is a partial order: given two events performed by different threads, it will not always be possible to show which one took place first. Also note that, if restricted to $E_p$ for some $p \in P$, the order becomes total.
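As an illustration, the order relation defined above can be computed for a small, made-up set of events. The Python sketch below is not part of Erbis: the names Event and provable_order and the sample events are assumptions introduced here. It applies condition 1 to every pair of events that share a thread and then closes the result under condition 2.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; not an Erbis data structure."""
    name: str
    time: float
    threads: frozenset  # threads that performed it; sync events have two


def provable_order(events):
    """Return the set of pairs (x, y) such that x can be shown to precede y."""
    # Condition 1: both events were performed by a common thread, x earlier.
    order = {
        (x, y)
        for x, y in product(events, repeat=2)
        if x.time < y.time and x.threads & y.threads
    }
    # Condition 2: close the relation under transitivity.
    changed = True
    while changed:
        changed = False
        for (x, e1), (e2, y) in product(tuple(order), repeat=2):
            if e1 == e2 and (x, y) not in order:
                order.add((x, y))
                changed = True
    return order


# Sample run: p writes, p and q synchronize, q reads; t works on its own.
w = Event("w", 1.0, frozenset({"p"}))
s = Event("s", 2.0, frozenset({"p", "q"}))
u = Event("u", 2.5, frozenset({"t"}))
r = Event("r", 3.0, frozenset({"q"}))
order = provable_order([w, s, u, r])

print((w, r) in order)                      # True: w precedes s, s precedes r
print((u, r) in order or (r, u) in order)   # False: u is unordered w.r.t. r
```

The write w and the read r belong to different threads, so they are ordered only because the synchronization event s is performed by both. The event u shares no thread with the others and remains incomparable, which is why the order is only partial.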
We say that the memory is coherent (with regard to a portion of the execution of a distributed application) if it is possible to build a linear order $<$ extending $<_s$ (that is, one in which $x <_s y$ implies $x < y$) such that for every $r \in R_v$ there exists some $w \in W_v$ with $w < r$ such that $w \sim r$ ($r$ returned the value stored by $w$) and for no $x \in W_v$ does the condition $w < x < r$ hold.

The next sections describe the mechanisms used in Erbis Scheme to make sure that its memory management is always coherent (i.e. it never fails to satisfy the coherence requirement).

8.2.3 References to objects

Erbis Scheme uses a rich representation for references to objects. In the stand-alone interpreter the references are standard C pointers to the memory area where the objects’ information is stored. For instance, a pair contains two pointers to other objects, the car and the cdr. On the other hand, the extension that makes Erbis distributed uses a secondary structure (which, from now on, is referred to simply as “a reference to an object”) containing special information about the reference.

Other than the value of references, all objects are made immutable. In particular, it is impossible to modify strings (see page 33). As a consequence, the only elements in $V$ are the references. $R$ and $W$ represent accesses to references, and thus it suffices to design a mechanism that brings coherence to accesses to references. This simplifies the implementation significantly, as it avoids having to manage the coherence of accesses to different structures separately.

This representation of references will be very important for the implementation of a distributed memory satisfying the coherence requirement designed in this work. For the rest of this chapter the terms “variable” and “reference” are used with exactly the same meaning.

The creation of a reference is considered to imply a write access to memory. It follows that every read access to a variable $v$ is necessarily preceded by a write access.
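Because references are the only variables, the coherence requirement of section 8.2.2 can be checked over a plain log of reference accesses. The following Python sketch is only an illustration under stated assumptions: Access and is_coherent are hypothetical names, and the caller is assumed to supply a candidate linear order that already respects the provable order between events. The function then verifies that every read returns the value of the latest preceding write to the same reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Access:
    """Hypothetical log entry for one access to a reference."""
    kind: str       # "read" or "write"
    variable: str   # identifies the reference being accessed
    value: object   # value written, or value returned by the read


def is_coherent(linear_order):
    """Check one candidate linear order of accesses.

    Every read must return the value of the most recent earlier write
    to the same reference, so no other write may fall in between.
    """
    last_written = {}
    for access in linear_order:
        if access.kind == "write":
            last_written[access.variable] = access.value
        elif access.variable not in last_written:
            # Creating a reference counts as a write, so a read with no
            # earlier write cannot occur in a valid log.
            return False
        elif last_written[access.variable] != access.value:
            # The read returned a stale (or never written) value.
            return False
    return True


good_log = [
    Access("write", "x", 1),
    Access("read", "x", 1),
    Access("write", "x", 2),
    Access("read", "x", 2),
]
stale_log = [
    Access("write", "x", 1),
    Access("write", "x", 2),
    Access("read", "x", 1),   # returns the overwritten value
]

print(is_coherent(good_log))   # True
print(is_coherent(stale_log))  # False
```

Note that the requirement only asks for some such linear order to exist. A rejected candidate therefore does not by itself prove incoherence unless every admissible order has been tried.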

8.2.4 Local references

It is worth noticing that, during standard execution of a thread, many of the memory accesses will be to local references which are not reachable by other threads.

We say that “variable $v$ is local to thread $p$” or, alternatively, that “$p$ owns $v$” if $p$ was the last thread to modify its value and no other thread has read it afterwards. Note that this statement refers to a specific instant at which the condition holds.

A variable $v$ can be local to one given thread or to none at all (if it has been read by a thread other than the last one to modify it). A variable that is not local to any thread is called global. Evidently, in no case can a variable be local to more than one thread.

A variable local to a given thread becomes global as soon as it is accessed by a different thread. The converse is also true: a variable becomes local at the moment it is modified by any given thread (and stays local until its value is read by a different thread).

Whenever a new variable $v$ is created by thread $p$ (for instance, by passing arguments in a function call, which modifies the thread’s environment, or by creating a new pair), it is marked as local to $p$.

For as long as $v$ is local to $p$, it suffices to store the value of the latest assignment in the domain executing $p$ and notify all other domains that the value was modified. In practice, Erbis does not explicitly notify the other domains but uses Lamport clocks (section 8.2.6).

8.2.5 Variable names

References can be either named or unnamed. All new references in any domain are unnamed until the value of an object that includes them is sent to another domain. At this point a new name is created and given to the variable, uniquely identifying it across the whole network.

For example, when a new pair is created by thread $p$, its car and cdr will both be local and unnamed. Suppose that thread $q$, running on a different domain, requests the value of the pair (that is, the value of a variable containing