The JVM is able not only to load classes, but also to load objects (instances of classes) from a storage device or from the network via the Java object serialization mechanism [50, 119]. Object serialization stores an object (or, more generally, a graph of objects) in a binary stream in such a way that a Java program can reconstruct the object’s state at a later time. The binary stream can then be saved to persistent storage or sent over the wire⁷.
NOTE
Along with instance data, the object serialization mechanism writes a special object to the stream to represent the serializable object’s class. This object is of type ObjectStreamClass, and is essentially a descriptor for the Class object associated with the serialized object. It contains the class’s name, its unique version number (serialVersionUID), and the class fields.
The serialization run-time calculates a default serialVersionUID value for serializable classes that do not explicitly declare it. The Java Object Serialization Specification [119] strongly recommends that all serializable classes explicitly declare serialVersionUID values, since their computation is highly sensitive to class details and may vary between different Java compiler implementations.
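The round trip and the class descriptor described above can be sketched as follows. This is a minimal illustration, not code from the cited specification: the class names SerializationSketch and Point are hypothetical, and the stream is kept in memory rather than written to storage or to the network.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class SerializationSketch {

    /** Hypothetical serializable class with an explicitly declared version number. */
    public static class Point implements Serializable {
        private static final long serialVersionUID = 1L;
        public final int x;
        public final int y;

        public Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    /** Writes the object, preceded by its class descriptor, to an in-memory binary stream. */
    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reconstructs the object's state; the class itself must already be loadable here. */
    public static Object deserialize(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Point copy = (Point) deserialize(serialize(new Point(3, 4)));
        System.out.println("restored state: " + copy.x + ", " + copy.y);

        // The descriptor carries the class name, the version number and the serializable fields.
        ObjectStreamClass descriptor = ObjectStreamClass.lookup(Point.class);
        System.out.println("class name:       " + descriptor.getName());
        System.out.println("serialVersionUID: " + descriptor.getSerialVersionUID());
        System.out.println("field count:      " + descriptor.getFields().length);
    }
}
```

Declaring serialVersionUID explicitly, as Point does, keeps the version number stable when the class is recompiled with a different compiler.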
Object serialization supports encryption, both by allowing classes to define their own methods for serialization and deserialization (inside which encryption can be used), and by adhering to the composable stream abstraction (the output of a serialization stream can be channelled into another filter stream which encrypts the data).
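The second option, channelling the serialization stream into an encrypting filter stream, might look like the sketch below. The class name EncryptedSerializationSketch, the choice of AES in CBC mode, and the in-memory sink are assumptions made for illustration only; CBC provides no integrity protection, so a real deployment would need an authenticated scheme and proper key management.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.CipherInputStream;
import javax.crypto.CipherOutputStream;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class EncryptedSerializationSketch {

    // Assumed transformation for this sketch; not prescribed by the serialization mechanism.
    private static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    /** Serializes obj and pipes the resulting bytes through an encrypting filter stream. */
    public static byte[] writeEncrypted(Serializable obj, SecretKey key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(new CipherOutputStream(sink, cipher))) {
            out.writeObject(obj);
        }
        return sink.toByteArray();
    }

    /** Decrypts the stream on the fly and reconstructs the object from it. */
    public static Object readEncrypted(byte[] data, SecretKey key, byte[] iv) throws Exception {
        Cipher cipher = Cipher.getInstance(TRANSFORMATION);
        cipher.init(Cipher.DECRYPT_MODE, key, new IvParameterSpec(iv));
        try (ObjectInputStream in = new ObjectInputStream(
                new CipherInputStream(new ByteArrayInputStream(data), cipher))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        SecretKey key = generator.generateKey();
        byte[] iv = new byte[16];
        new SecureRandom().nextBytes(iv);

        byte[] encrypted = writeEncrypted("confidential state", key, iv);
        System.out.println(readEncrypted(encrypted, key, iv));
    }
}
```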
NOTE
The Java run-time restricts access to fields declared to be private, package-protected, or protected. No such restriction can be made on an object once it has been serialized; the stream of bytes resulting from object serialization can be read and altered by any object that has access to that stream. Consequently, Java developers who declare a class to be Serializable must first give some thought to the possible consequences of that declaration.
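The consequence described in this note can be made concrete with a small experiment. In the sketch below (the names ExposedStateSketch and Account are hypothetical), the value of a private field is found verbatim in the serialized bytes, whereas a field marked transient is left out of the stream. Marking sensitive fields transient is only one possible precaution and is shown here as an illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

public class ExposedStateSketch {

    /** Hypothetical class: both fields are private, but only one is kept out of the stream. */
    public static class Account implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String owner;          // private, yet written to the stream
        private final transient String pin;  // transient: skipped by default serialization

        public Account(String owner, String pin) {
            this.owner = owner;
            this.pin = pin;
        }
    }

    public static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Searches the raw bytes for an ASCII string, as any reader of the stream could do. */
    public static boolean streamContains(byte[] stream, String text) {
        return new String(stream, StandardCharsets.ISO_8859_1).contains(text);
    }

    public static void main(String[] args) throws IOException {
        byte[] stream = serialize(new Account("alice", "s3cr3t-pin"));
        System.out.println("owner visible in stream: " + streamContains(stream, "alice"));
        System.out.println("pin visible in stream:   " + streamContains(stream, "s3cr3t-pin"));
    }
}
```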
⁷ Only the object’s state is saved; the object’s class file and methods are not saved but must be accessible from the system in which the restoration occurs.
Codebase Annotation
Given a class’s name from the descriptor found in the serialized stream, the JVM still needs to know where it should load the actual class from. Once it has that information, the JVM can create a class loader with the codebase pointing to that location, load the class, link it to the current execution environment, and initialize its class data.
While the JVM could use the data contained in the serialized object stream to create an instance of the class and initialize the object’s instance data, the best way is to stamp the code location for the class onto the serialized object stream – in other words, to annotate the serialized object with the codebase URL. This method facilitates dynamic code mobility, because the JVM can decide at run-time where it should download the classes from. This is also the fundamental technique used by Java RMI.
The Java RMI run-time mechanism provides a special output stream which serializes the stub of a remote object (i.e. an object which implements the java.rmi.Remote interface) instead of the object itself and annotates the serialized stream with the location of the object’s class.
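The annotation technique can be imitated with the two hooks that the standard object streams expose for this purpose, annotateClass and resolveClass. The sketch below is not the RMI implementation; the class names AnnotatingOutputStream, ResolvingInputStream and Payload are hypothetical, and the fallback to a URLClassLoader is an assumption about how a receiver might use the annotation.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.OutputStream;
import java.io.Serializable;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;

public class CodebaseAnnotationSketch {

    /** Hypothetical payload class used to exercise the streams. */
    public static class Payload implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String text;

        public Payload(String text) {
            this.text = text;
        }
    }

    /** Stamps a codebase URL onto the stream after every class descriptor. */
    public static class AnnotatingOutputStream extends ObjectOutputStream {
        private final String codebase;

        public AnnotatingOutputStream(OutputStream out, String codebase) throws IOException {
            super(out);
            this.codebase = codebase;
        }

        @Override
        protected void annotateClass(Class<?> cl) throws IOException {
            writeObject(codebase);
        }
    }

    /** Reads the annotation back and uses it only if the class is not available locally. */
    public static class ResolvingInputStream extends ObjectInputStream {
        public String lastCodebase; // exposed so that callers can inspect the annotation

        public ResolvingInputStream(InputStream in) throws IOException {
            super(in);
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            String codebase = (String) readObject(); // must mirror annotateClass
            lastCodebase = codebase;
            try {
                return super.resolveClass(desc); // class already known locally
            } catch (ClassNotFoundException e) {
                // Assumed fallback: fetch the class from the annotated location.
                URLClassLoader loader =
                        new URLClassLoader(new URL[] { URI.create(codebase).toURL() });
                return Class.forName(desc.getName(), false, loader);
            }
        }
    }
}
```

A sender would wrap its transport stream in AnnotatingOutputStream and the receiver in ResolvingInputStream; because resolveClass tries the local class path first, no download is attempted when the class is already present.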
NOTE
The codebase is specified to the JVM using the Java RMI property java.rmi.server.codebase. The codebase property is a space-separated list of URLs. When deserializing an RMI stub annotated with the codebase, the RMI run-time (i.e. java.rmi.server.RMIClassLoader) will create a class loader for the codebase URLs specified in this list.
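As an illustration of the property format only, the sketch below splits a space-separated codebase value into URLs and hands them to a URLClassLoader. It does not reproduce the internals of the RMI run-time; the class name CodebasePropertySketch, the helper parseCodebase and the example.org URLs are hypothetical.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class CodebasePropertySketch {

    /** Splits a space-separated codebase value into its individual URLs. */
    public static List<URL> parseCodebase(String codebase) throws MalformedURLException {
        List<URL> urls = new ArrayList<>();
        if (codebase == null || codebase.trim().isEmpty()) {
            return urls;
        }
        for (String part : codebase.trim().split("\\s+")) {
            urls.add(URI.create(part).toURL());
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Normally supplied on the command line with -Djava.rmi.server.codebase=...
        System.setProperty("java.rmi.server.codebase",
                "http://example.org/classes/ http://example.org/lib/app.jar");

        List<URL> urls = parseCodebase(System.getProperty("java.rmi.server.codebase"));
        // No connection is opened until a class is actually requested from the loader.
        try (URLClassLoader loader = new URLClassLoader(urls.toArray(new URL[0]))) {
            for (URL url : loader.getURLs()) {
                System.out.println("codebase entry: " + url);
            }
        }
    }
}
```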
3.3.4 Security
The Java security framework is organized into three layers, each one addressing different needs:
• At the platform level, the Java security infrastructure provides several security mechanisms such as the bytecode verifier or the sandbox mechanism for Java applets. At run-time Java applications can download classes on demand; classes can thus be loaded either from the local file system (built-in classes) or from a network. The main security concern of the bytecode loader is to prevent built-in classes from being "spoofed" by other classes. This is accomplished by partitioning classes into separate namespaces (see Section 3.3.2).
• At the language level, the Java compiler performs extensive static checking to detect as many errors as possible at the compilation stage. In particular, the compiler guarantees that in the generated bytecode all references to objects, methods, and variables are of the appropriate type, that Java’s access control mechanism is not violated, and so on.
• At the application level, the Java security framework provides a broad set of security mechanisms available to applications for implementing a requested security policy. For example, Java RMI requires the installation of a security manager in order to use dynamic class loading; without it, servers could easily attack their clients by sending malicious code that masquerades as a remote stub.
Java’s security primitives are largely based on where code originates. Thus, security policy files grant permissions based on where code was loaded from, and location is specified using URLs. The Java class loaders have a crucial responsibility here; for example, a vulnerability⁸ in the AppletClassLoader would allow a remote user to connect to local sockets on the target system.
Executable code is categorized by its URL of origin and by the private keys used to sign it. The security policy maps a set of access permissions to code characterized by particular origin/signature information. Protection domains can be created on demand and are tied to code with particular Codebase and SignedBy properties.
NOTE
The class java.security.CodeSource extends the concept of a codebase to encapsulate not only the location (URL) but also the certificate chains that were used to verify signed code originating from that location.
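The relationship between code origin, permissions and protection domains can be sketched with the java.security classes directly. The example below is an illustration under stated assumptions: the class name CodeSourceSketch, the helper names, the file: locations and the granted PropertyPermission are hypothetical, and the code sources are unsigned (no certificates), so only the location is compared.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.security.CodeSource;
import java.security.Permissions;
import java.security.ProtectionDomain;
import java.security.cert.Certificate;
import java.util.PropertyPermission;

public class CodeSourceSketch {

    /** Builds a code source that carries a location but no signer certificates. */
    public static CodeSource unsigned(String location) throws MalformedURLException {
        return new CodeSource(URI.create(location).toURL(), (Certificate[]) null);
    }

    /** Ties a fixed set of permissions to code characterized by the given origin. */
    public static ProtectionDomain domainFor(CodeSource source) {
        Permissions granted = new Permissions();
        granted.add(new PropertyPermission("user.*", "read")); // assumed example grant
        return new ProtectionDomain(source, granted);
    }

    public static void main(String[] args) throws MalformedURLException {
        // A trailing "/-" matches everything below the directory, recursively.
        CodeSource policyEntry = unsigned("file:/opt/app/-");
        CodeSource loadedClass = unsigned("file:/opt/app/lib/tool.jar");
        CodeSource foreignClass = unsigned("file:/srv/other/tool.jar");

        System.out.println("entry covers loaded class:  " + policyEntry.implies(loadedClass));
        System.out.println("entry covers foreign class: " + policyEntry.implies(foreignClass));

        ProtectionDomain domain = domainFor(policyEntry);
        System.out.println("may read user.home: "
                + domain.implies(new PropertyPermission("user.home", "read")));
        System.out.println("may read os.name:   "
                + domain.implies(new PropertyPermission("os.name", "read")));
    }
}
```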
3.4 Discussion
RPC-based technologies are connection-oriented: the connection must first be established (by requesting a reference to a remote object via the name server) and then used (throughout the interaction with the remote object). Code that uses remote objects must therefore be cluttered with many checks in order to deal with possible network communication failures.
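The kind of failure handling that remote invocations impose on client code can be sketched as a small retry helper. This is only one possible way to factor out such checks; the class name RemoteCallSketch and the method callWithRetry are hypothetical, and the remote call is represented by a plain Callable so that the sketch runs without a registry or a network.

```java
import java.rmi.RemoteException;
import java.util.concurrent.Callable;

public class RemoteCallSketch {

    /**
     * Invokes the given call, retrying when it fails with a RemoteException.
     * The last failure is rethrown once all attempts are exhausted.
     */
    public static <T> T callWithRetry(Callable<T> remoteCall, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RemoteException lastFailure = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return remoteCall.call();
            } catch (RemoteException e) {
                lastFailure = e; // communication failure: try again
            }
        }
        throw lastFailure;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated remote method that fails twice before answering.
        String answer = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new RemoteException("simulated network failure");
            }
            return "pong";
        }, 5);
        System.out.println(answer + " after " + calls[0] + " attempts");
    }
}
```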
Nevertheless, the benefits of mobile code include fault tolerance, service customization, code deployment, and maintenance.
8
Chapter 4
State of the Art
You have to design distributed systems with the expectation of failure.
Ken Arnold¹
While much research effort has been devoted to the field of distributed programming, most published studies focus on aspects related to inter-process communication, whereas the present work primarily considers aspects related to the dynamic relocation of code fragments.
In this chapter we concentrate our analysis on studies and projects dealing specifically with the dynamic rebinding mechanism in a distributed environment. First, we present two language calculi and sketch relevant aspects of their formalization; second, we look at four distributed languages and evaluate their respective solutions in relation to the approach presented in the introduction (see Section 1.1).
Most research related to this project has focused on functional programming languages, in particular on the ML language family.
The key ideas presented in this thesis have been influenced by the lambda calculi proposed by Bierman et al. [9] and by the language Obliq designed by Luca Cardelli [28]; they share the same objective, namely to improve the design of programming languages for distributed computation.
1