Several studies have applied passive learning to discover some of the temporal and data dependence relationships of a software system’s protocol.
Ammons et al., for example, have applied this technique to X11 programs [10]. Their input sequences consist of function calls and their attributes (i.e. parameters). Because the interface of such a program is typically open source and well documented, no message format reverse engineering is applied here. Instead, method calls are instrumented, and several preprocessing steps are taken to make the domain of the message attributes finite and to group similar sequences. The resulting input sequences are called scenarios. These scenarios are in turn used to learn a probabilistic automaton using the BEAM algorithm [118].
The approach described above was applied with mixed success to a handful of X11 programs. Scenarios were obtained from the X library calls and callbacks from and to these programs. These scenarios were then checked for compliance with the Inter-Client Communication Conventions Manual (ICCCM), which is a standard for interoperability between X Window System clients of the same X server. Out of the 16 programs analysed, five violated a rule in the ICCCM. Two of these violations of the standard were caused by bugs in the implementation.
In 2011, Lee et al. observed that sequences of method calls recorded from a system often deal with multiple independent receivers [91]. One such example is a sequence of interleaved calls to a library from two concurrent threads in a program. It is hard to learn something useful from interleaved sequences, because many interleavings are possible. Moreover, there might be no semantic relation between the different recipients. The authors observe that different recipients can be distinguished based on the values of certain message parameters. They propose jMiner, an effective and general-purpose passive learning approach for inferring automata from interleaved message traces.
jMiner works by first partitioning the sequence into a set of independent ones, based on the parameters of the messages. For this, it uses a message format specification that it infers from the source code, package name, and unit test cases. The independent sessions are then used to learn an FSM using the same off-the-shelf passive learner used by Ammons et al. The authors have successfully applied jMiner to a set of interleaved message sequences from four packages of OpenJDK.
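The partitioning step can be illustrated with a minimal sketch. It is not jMiner's implementation: it assumes that each message is a hypothetical (method, receiver identifier) pair and that the identifying parameter is already known, whereas jMiner infers this from a message format specification.

```python
from collections import defaultdict


def partition_trace(trace, key):
    """Split an interleaved trace into per-receiver sequences.

    The relative order of messages within each receiver is preserved.
    """
    sessions = defaultdict(list)
    for message in trace:
        sessions[key(message)].append(message)
    return dict(sessions)


# Hypothetical interleaved trace: (method name, receiver identifier).
trace = [
    ("open", "f1"), ("open", "f2"), ("read", "f1"),
    ("write", "f2"), ("close", "f1"), ("close", "f2"),
]

# Assumption: the second field identifies the receiver.
sessions = partition_trace(trace, key=lambda message: message[1])

# Keep only the method names; these sequences would be fed to a passive learner.
method_sequences = {
    receiver: [name for name, _ in messages]
    for receiver, messages in sessions.items()
}
```

Each resulting sequence describes a single receiver, so the learner no longer has to account for the many possible interleavings.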
Yang et al. have applied passive learning to dynamically infer functional specifications from method calls [151, 152]. These specifications can be seen as invariants for how the interaction with the interface behaves. A functional specification for a mutex, for example, might be that “mutex.acquire(X) is always followed by mutex.release(X)”. The inference engine of their Perracotta tool uses heuristics to generalize these rules into regular expressions. These regular expressions can be represented by a DFA.
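As an illustration of how such a rule can be checked against a trace, the sketch below tests whether two events strictly alternate, i.e. whether the trace, restricted to those two events, matches the regular expression (PS)*. This is an assumed simplification and not Perracotta's inference engine; the event names are hypothetical and stand for calls on one fixed mutex X.

```python
import re


def satisfies_alternation(trace, first, second):
    """Check the trace against (first second)*, ignoring all other events."""
    # Project the trace onto the two events of interest and encode them
    # as single characters so a regular expression can be applied.
    projected = "".join(
        "P" if event == first else "S"
        for event in trace
        if event in (first, second)
    )
    return re.fullmatch(r"(PS)*", projected) is not None


# Hypothetical traces for a single mutex.
good = ["acquire", "work", "release", "acquire", "release"]
bad = ["acquire", "acquire", "release"]

good_ok = satisfies_alternation(good, "acquire", "release")
bad_ok = satisfies_alternation(bad, "acquire", "release")
```

Here `good_ok` is true and `bad_ok` is false, because the second trace acquires the mutex twice without an intervening release.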
Although the previously described tools can often be used to accurately describe the behaviour of a specific system, they sometimes fail to capture crucial dependencies between input parameters. As we have outlined in this introduction, automata that model the control flow of a system typically provide only a partial view of that system's behaviour. In practice, behaviour is often the result of interplay between the input sequences (as described by the automaton) and the values of the parameters for these messages. Therefore, most recent work focusses on learning both these facets of behaviour.
Indeed, these facets require the automaton to operate on an underlying memory, and have its transitions annotated by guards on the memory. We have already seen a formalism that can do this: the register automaton.
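To make the role of memory and guards concrete, the following sketch implements a deliberately restricted register automaton with a single register. The class, its transition encoding, and the mutex example are illustrative assumptions and do not reproduce the formal definition used elsewhere in this text.

```python
class RegisterAutomaton:
    """Minimal single-register automaton over (symbol, value) inputs."""

    def __init__(self, initial, transitions, accepting):
        self.initial = initial
        # Each transition: (source, symbol, guard, store, target), where
        # guard(register, value) -> bool and store says whether the input
        # value is written to the register.
        self.transitions = transitions
        self.accepting = accepting

    def accepts(self, word):
        state, register = self.initial, None
        for symbol, value in word:
            for source, sym, guard, store, target in self.transitions:
                if source == state and sym == symbol and guard(register, value):
                    if store:
                        register = value
                    state = target
                    break
            else:
                return False  # no enabled transition: reject the word
        return state in self.accepting


# Hypothetical mutex protocol: release(y) is only allowed if y equals
# the value stored by the preceding acquire(x).
transitions = [
    ("idle", "acquire", lambda reg, val: True, True, "held"),
    ("held", "release", lambda reg, val: val == reg, False, "idle"),
]
mutex = RegisterAutomaton("idle", transitions, accepting={"idle"})

matching = mutex.accepts([("acquire", 1), ("release", 1)])
mismatched = mutex.accepts([("acquire", 1), ("release", 2)])
```

A plain DFA over the symbols acquire and release cannot distinguish these two words; the guard on the register is what rejects the second one.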
Walkinshaw et al. have recently proposed a passive technique for learning a flavour of register automata from method calls [146]. The technique works by combining previously mentioned passive learning techniques, which infer the control flow from inputs, with a component that relates the input sequences to the data state of the system. The latter process is known in general as data classifier inference, and refers to a range of techniques that map possible values for parameters to a particular class. In the case of model learning, we are interested in the next input that will follow. Therefore this is the class that the authors try to predict in their application of data classifier inference.
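The idea of predicting the next input from parameter values can be sketched with a very simple classifier. The example below uses a 1-nearest-neighbour rule as a stand-in; it is not one of the classifiers used by the authors, and the observations (parameter values of a hypothetical withdraw(amount, balance) input, labelled with the input that followed) are invented for illustration.

```python
def predict_next_input(observations, params):
    """Return the label of the observation closest to `params` (1-NN)."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(
        observations,
        key=lambda observation: squared_distance(observation[0], params),
    )
    return label


# Hypothetical training data: ((amount, balance), next input observed).
observations = [
    ((10, 100), "confirm"),
    ((50, 100), "confirm"),
    ((150, 100), "cancel"),
    ((200, 100), "cancel"),
]

small = predict_next_input(observations, (20, 100))
large = predict_next_input(observations, (180, 100))
```

In this sketch `small` is "confirm" and `large` is "cancel". In an inferred model, such predictions take the place of guards: they decide which transition is taken for a given combination of parameter values.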
A large number of classifiers can solve this task. In their experiments, Walkinshaw et al. observe that the choice of classifier ultimately depends on the application area and the context in which the automaton is used. Therefore, the authors have made an effort to enable the use of an arbitrary classifier. In their reference implementation, called Mint, they use the (fifty or so) classifiers that are in the WEKA library [66].
The approach described above was applied with mixed success to a communications protocol that allocates frequencies to mobile phones, an implementation of a ‘resource locker’, and three Java SDK classes. The authors have identified two characteristics for an ideal application scenario of the technique.
– From a data dependency point of view, an input should contain only parameters that have a direct bearing on the subsequent behaviour, and the number of these parameters would ideally be low for each input.
– From a control flow point of view, the number of elements that could possibly succeed a given input should ideally be low (to prevent incorrect data-based classification); however, all of these possible sequences should be represented in the set of observations.