
2.2.3 Features of Processor Languages

Although each processor architecture has a unique language, all processor languages have two basic features. The first is that every processor instruction is a state transformer: a state is produced by changing the values of one or more variables, possibly using the initial values of the variables to determine the changes to be made. Every change made to a state can be described as the simultaneous assignment of values to variables: if state $s$ differs from state $t$ in the values of variables $x_1, \ldots, x_n$ then $s$ can be transformed to $t$ by a simultaneous replacement of the values of $x_1, \ldots, x_n$ in $s$ with the values of $x_1, \ldots, x_n$ in $t$. Every processor instruction is therefore an instance of conditional and simultaneous assignment commands. Processor languages also include the equivalent of pointers (the expressions implementing addressing modes); unexecutable assignments are detected when an instruction is executed, allowing variables identified by pointers to be determined before they are compared. The second feature of processor languages is that the selection of instructions for execution is an action carried out by the instructions. In the execution model of object code programs, the flow of control is determined when instructions are executed rather than by the text of the program.
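The first feature can be illustrated with a minimal sketch, assuming a state is a mapping from variable names to integers. The names `simultaneous_assign`, `conditional` and `branch_if_zero` are hypothetical and are introduced here only for illustration.

```python
from typing import Callable, Dict

State = Dict[str, int]
Expr = Callable[[State], int]
Updates = Dict[str, Expr]


def simultaneous_assign(state: State, updates: Updates) -> State:
    """Return the state produced by assigning to all variables at once.

    Every right-hand side is evaluated in the initial state, so no
    assignment observes the effect of another.
    """
    new_values = {var: expr(state) for var, expr in updates.items()}
    return {**state, **new_values}


def conditional(test, if_true: Updates, if_false: Updates):
    """Build an instruction that chooses between two simultaneous assignments."""
    def instruction(state: State) -> State:
        return simultaneous_assign(state, if_true if test(state) else if_false)
    return instruction


# x, y := y, x exchanges the two values without a temporary variable.
s = {"x": 1, "y": 2, "pc": 0}
t = simultaneous_assign(s, {"x": lambda st: st["y"], "y": lambda st: st["x"]})

# A conditional branch: the instruction itself selects its successor via pc.
branch_if_zero = conditional(
    lambda st: st["x"] == 0,
    {"pc": lambda st: 8},
    {"pc": lambda st: st["pc"] + 1},
)
```

The branch also illustrates the second feature: the next instruction is chosen by the assignment to `pc`, not by the position of the instruction in the program text.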

2.3 Modelling Object Code

To verify an object code program, the program and each instruction must be described in a form which can be reasoned about in a logic. To support the definition of program transformations, it must also be possible to manipulate programs and instructions. Methods of describing object code vary in the approach taken to the processor language in which the program is written. This affects the difficulty of verifying and manipulating a program.

A common approach is to consider processor languages and their instructions individually (Gordon, 1988; Yuan Yu, 1992; Necula & Lee, 1996): each instruction is considered to be distinct from any other instruction. To verify object code using this approach requires a proof rule for each instruction of a processor language. Because processors have a large number of instructions, treating each instruction as distinct from any other instruction leads to the repetition of a large amount of work. For example, proof rules defined for the instructions of one processor cannot be applied to the instructions of another processor, even if the instructions have similar behaviours. This approach also makes program transformations dependent on the individual processor language and can unnecessarily limit the possible transformations. For example, if a program has a sequence of two instructions which carry out the assignments $x := 1$ and $y := 2$ then the program could be simplified by replacing the instructions with the single assignment $x, y := 1, 2$. This transformation is not possible if the processor language does not include an instruction which carries out the multiple assignment to $x$ and $y$.
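The transformation in this example can be sketched as follows, under the simplifying assumption that all variables are plain names (no pointers) and that an assignment is a mapping from variable names to functions of the state. The helpers `run` and `merge` are hypothetical names used only for this illustration.

```python
from typing import Callable, Dict

State = Dict[str, int]
Assignment = Dict[str, Callable[[State], int]]


def run(state: State, assignment: Assignment) -> State:
    """Execute one simultaneous assignment; all right-hand sides read `state`."""
    values = {var: rhs(state) for var, rhs in assignment.items()}
    return {**state, **values}


def merge(first: Assignment, second: Assignment) -> Assignment:
    """Compose `first; second` into a single simultaneous assignment.

    Each right-hand side of `second` is re-expressed over the initial state
    by evaluating it in the state that `first` would produce.
    """
    merged: Assignment = dict(first)
    for var, rhs in second.items():
        # Bind rhs as a default argument so each closure keeps its own expression.
        merged[var] = lambda state, rhs=rhs: rhs(run(state, first))
    return merged


set_x: Assignment = {"x": lambda s: 1}   # x := 1
set_y: Assignment = {"y": lambda s: 2}   # y := 2
both = merge(set_x, set_y)               # x, y := 1, 2

start = {"x": 7, "y": 9}
sequential = run(run(start, set_x), set_y)
combined = run(start, both)
```

Here `sequential` and `combined` denote the same final state, which is the sense in which the two instructions can be replaced by one command of an abstract language even when no single processor instruction performs both assignments.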

An alternative approach is to describe processor instructions as commands of an abstract language. This allows the similarities in the semantics of instructions to be exploited, describing a specific action, such as an assignment to a register, as an instance of a more general action, such as a simultaneous assignment to variables. This approach also supports the definition and application of program transformations. Since the abstract language is not limited by a processor architecture, it can be defined to include any construct useful for the verification or transformation of programs. The use of an abstract language is similar to the use of intermediate languages to implement code generation and optimisation techniques (Aho et al., 1986). However, the commands of these intermediate languages only describe the actions which must be implemented by instructions. To describe an instruction in an intermediate language would need a sequence of such commands. This is not expressive enough for verification, where (ideally) a single command of the abstract language will be enough to describe the behaviour of any instruction.

2.3.1 Instructions and Data Operations

To describe processor instructions, an abstract language must describe both the data operations used by the instruction and the action performed with the results of the operations. The data operations of a processor calculate values, the labels of computed jumps, the results of tests or identify program variables. All but the last can be described in terms of the expressions which occur in any programming language (see for example Wakerly, 1989). Operations to identify program variables implement the addressing modes of a processor. These are equivalent to pointers and can be modelled in terms of arrays. Formal models of arrays and the operators (including substitution) needed to reason about arrays in programs are described by Dijkstra (1976) and Cartwright & Oppen (1981), among others.
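As an illustration of addressing modes modelled in terms of arrays, the following sketch treats memory as a Python list and each addressing mode as a function from the register values to an index into that list. The mode names, register names and memory size are assumptions chosen for the example and are not taken from any particular processor.

```python
from typing import Dict, List

Registers = Dict[str, int]
Memory = List[int]


def register_indirect(regs: Registers, reg: str) -> int:
    """Address held in a register: the variable identified is Mem(reg)."""
    return regs[reg]


def indexed(regs: Registers, base: str, offset: int) -> int:
    """Address computed from a base register plus a constant offset."""
    return regs[base] + offset


regs = {"r0": 4, "r1": 2}
mem: Memory = [0] * 8

# Mem(r0) := 10, a store through a register-indirect operand.
mem[register_indirect(regs, "r0")] = 10

# Mem(r1 + 2) := Mem(r0) + 1, an indexed operand which, in this state,
# identifies the same memory cell as Mem(r0).
mem[indexed(regs, "r1", 2)] = mem[register_indirect(regs, "r0")] + 1
```

The two operands are written differently but identify the same variable in this state, which is why the variable named by an addressing mode can only be determined from the state and not from the text of the instruction.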

To describe the action performed by an arbitrary processor instruction, an abstract language need only include a conditional and a simultaneous assignment command. However, a simultaneous assignment in the presence of pointers (or arrays) means that the abstract language will contain unexecutable commands, which assign many values to a single variable. It is necessary to be able to detect these commands since otherwise it is possible to verify a program which is incorrect; e.g. for partial correctness, an unexecutable command satisfies any specification. The aliasing problem means that it is impossible to distinguish syntactically between executable and unexecutable commands. If the abstract language includes both simultaneous assignments and pointers then the abstract language must also include unexecutable commands.

Unexecutable commands can be excluded if the assignment commands are restricted to a single assignment. An instruction would then be described as a sequence of (possibly conditional) single assignments to variables, which may be pointers or arrays. This approach is used in processor reference manuals (e.g. see Weaver and Germond, 1994 or Motorola, 1986) to describe the semantics of instructions. For example, the Motorola 68000 instruction move.w #258, r@ would be described as the sequence of three assignment commands $\mathrm{Mem}(r) := 1$, $\mathrm{Mem}(r+1) := 2$ and $\mathrm{pc} := l$. Proof rules for the necessary commands and syntactic constructs forming sequences of commands are given by Hoare (1969) and Dijkstra (1976) and others. However, this approach makes the task of verifying the program more difficult: the number of commands needed to describe an instruction will be proportional to the number of assignments made by the instruction. If each instruction of an object code program makes an average of two assignments to variables then the abstract program describing the object code will have twice as many commands as the object code. The work needed to verify the program will be correspondingly increased.
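The 68000 example can be sketched as a sequence of three single assignments. The state layout (a register `r`, a program counter `pc` and a byte-addressed `mem` mapping), the helper `store` and the value of `NEXT_LABEL` are assumptions made for illustration; the byte values follow from 258 = 0x0102 stored most significant byte first.

```python
from functools import reduce

NEXT_LABEL = 4  # hypothetical label l of the following instruction


def store(state, address, byte):
    """Single assignment Mem(address) := byte, returning a new state."""
    return {**state, "mem": {**state["mem"], address: byte}}


# 258 = 0x0102: the high byte is 1 and the low byte is 2.
high, low = (258).to_bytes(2, "big")

commands = [
    lambda s: store(s, s["r"], high),       # Mem(r) := 1
    lambda s: store(s, s["r"] + 1, low),    # Mem(r+1) := 2
    lambda s: {**s, "pc": NEXT_LABEL},      # pc := l
]

initial = {"r": 100, "pc": 0, "mem": {}}
final = reduce(lambda s, command: command(s), commands, initial)
```

One instruction has become three commands, each of which would need its own proof step, which is the increase in verification work described above.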

To describe an instruction using a single command, the abstract language must include a simultaneous assignment command which takes into account the presence of pointers. The assignment command will be executable only if all variables assigned to by the command are distinct. The variables identified by pointers must be determined when the command begins execution. The distinctness of the variables will therefore be a precondition on the state in which the assignment command is executed. This approach is used by Cartwright & Oppen (1981) to define a multiple assignment command for arrays. However, this assignment command is interpreted as a sequence of single assignments, each of which must be considered individually in a proof of correctness. Generalising the approach to simultaneous assignments will make it possible for an abstract language to describe arbitrary instructions as a single command and to detect the unexecutable commands during the course of a proof.
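A minimal sketch of such a command is given below, assuming a state with separate `regs` and `mem` mappings and locations written as `("reg", name)` or `("mem", address)`. The names `is_executable`, `assign` and `Unexecutable` are hypothetical; this is one possible reading of the distinctness precondition rather than the definition used by any of the cited authors.

```python
class Unexecutable(Exception):
    """Raised when two targets of one command identify the same variable."""


def resolve(state, pairs):
    """Determine every target location and value in the initial state."""
    return [(target(state), value(state)) for target, value in pairs]


def is_executable(state, pairs):
    """Precondition: all variables assigned to by the command are distinct."""
    locations = [location for location, _ in resolve(state, pairs)]
    return len(set(locations)) == len(locations)


def assign(state, pairs):
    """Simultaneous assignment that fails when its targets are aliased."""
    if not is_executable(state, pairs):
        raise Unexecutable("targets of the assignment are not distinct")
    regs = dict(state["regs"])
    mem = dict(state["mem"])
    for (kind, key), val in resolve(state, pairs):
        if kind == "reg":
            regs[key] = val
        else:
            mem[key] = val
    return {"regs": regs, "mem": mem}


# Mem(a), Mem(b) := 1, 2 expressed as a single command.
command = [
    (lambda s: ("mem", s["regs"]["a"]), lambda s: 1),
    (lambda s: ("mem", s["regs"]["b"]), lambda s: 2),
]

distinct = {"regs": {"a": 10, "b": 11}, "mem": {}}
aliased = {"regs": {"a": 10, "b": 10}, "mem": {}}
result = assign(distinct, command)
```

The same command text is executable in `distinct` and unexecutable in `aliased`, so the check has to be made against the state (or, in a proof, stated as a precondition) and cannot be made from the syntax of the command alone.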

2.3.2 Program Execution

The description of an object code program must model the selection and execution of the commands representing the program instructions. Two approaches are commonly used: the first embeds the object code in an iteration command, which repeatedly selects and executes commands. The second describes object code as a program of a language with a similar execution model. Both methods are intended to overcome the difficulties of reasoning about a program in which commands can be arbitrarily selected for execution. However, the choice of method can affect the ease with which programs are transformed and verified.

Embedding in an Iteration Command

The execution model of an object code program can be described in terms of an iteration command which repeatedly executes instructions selected by the value of a program counter (Back et al., 1994; Fidge, 1997). Assume $\mathbf{do}\ b_1 \to c_1 \mid \ldots \mid b_n \to c_n\ \mathbf{od}$ is an iteration command which repeatedly selects and executes the commands $c_1, \ldots, c_n$, until every test $b_i$, $1 \le i \le n$, is false. A command $c_i$ is selected if its test $b_i$ is true; if more than one test is true then the choice is arbitrary (Gries, 1981). Also assume that every instruction $C_i$ of the object code program, with label $l_i$, is described by a command $c_i$ of the abstract language with test $\mathrm{pc} = l_i$. Since each instruction is stored in a unique location, and has a unique label, no two tests can be true simultaneously. The object code program can then be modelled as the command $\mathbf{do}\ \mathrm{pc} = l_1 \to c_1 \mid \ldots \mid \mathrm{pc} = l_n \to c_n\ \mathbf{od}$. This executes a command $c_i$ only if it is selected, that is, only if the precondition $\mathrm{pc} = l_i$ is true. Proof rules for reasoning about the selection and execution of instructions and for the behaviour of a program can be derived from the rules for the iteration command, given by Hoare (1969) and Dijkstra (1976).
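The iteration command can be sketched as a small interpreter, assuming a program is a mapping from labels to commands and a command is a function from states to states. The function `run_object_code`, the step limit and the three-instruction program are hypothetical and serve only as an illustration.

```python
def run_object_code(program, state, max_steps=1000):
    """Model do pc = l1 -> c1 | ... | pc = ln -> cn od.

    `program` maps each label l_i to its command c_i, so at most one guard
    pc = l_i can hold in any state. The loop ends when no guard holds.
    """
    steps = 0
    while state["pc"] in program:
        if steps == max_steps:
            raise RuntimeError("step limit reached")
        state = program[state["pc"]](state)
        steps += 1
    return state


# A three-instruction program which adds n + (n-1) + ... + 1 to acc.
program = {
    # Label 0: branch to label 3 when n = 0, otherwise continue at label 1.
    0: lambda s: {**s, "pc": 3 if s["n"] == 0 else 1},
    # Label 1: acc, n, pc := acc + n, n - 1, 2 as one simultaneous assignment.
    1: lambda s: {**s, "acc": s["acc"] + s["n"], "n": s["n"] - 1, "pc": 2},
    # Label 2: jump back to the test at label 0.
    2: lambda s: {**s, "pc": 0},
}

final = run_object_code(program, {"pc": 0, "n": 4, "acc": 0})
```

Execution stops when `pc` holds a label (here 3) for which the program has no command, which corresponds to every test of the iteration command being false.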

Embedding the object code in an iteration command models object code in two distinct parts. Instructions are modelled directly, as commands of the language, while object code programs are modelled indirectly, by the iteration command. To consider a single instruction of the object code program it is necessary to consider the whole of the iteration command, which is made up of all instructions of the object code. This makes it difficult to concentrate on a subset of a program. For example, to consider a loop in an object code program, which may be made up of a few instructions, it is necessary to consider the entire iteration command describing the object code. The number of commands in a typical object code program makes this approach too unwieldy to be practical.

Object Code as Flow-Graph Programs

A more natural model for object code is as a program of a flow-graph language (Loeckx & Sieber, 1987). A flow-graph program is made up of a set of commands and is executed by the repeated selection and execution of the commands. Since the execution model of a processor language is that of a flow-graph language, a flow-graph program can model object code directly. Program logics for flow-graph programs are often based on a temporal logic (Manna & Pnueli, 1981) although simpler logical systems have also been used (e.g. see Loeckx & Sieber, 1987; Gordon, 1994a). However, the flow-graph languages which have been considered in verification are less expressive than processor languages. This causes problems when defining proof rules to specify the execution model of processor languages. These rules must permit reasoning about the transfer of control between commands and are characterised by the treatment of a jump command, written $\mathbf{goto}$, where $\mathbf{goto}\ l$ does nothing except pass control to the command labelled $l$.

Proof rules for jump commands generally follow those of Clint & Hoare (1972) and de Bruin (1981). These interpret the jump as a construct which passes control to a target but which does not terminate: the jump to label $l$, $\mathbf{goto}\ l$, satisfies the specification $\{P\}\ \mathbf{goto}\ l\ \{\mathit{false}\}$ for any $P$. This leads to proof rules for partial correctness only, since total correctness requires the goto to terminate. The interpretation is false for processor languages, in which all instructions terminate: if $\mathbf{goto}\ l$ describes a processor instruction then it must satisfy $\{P\}\ \mathbf{goto}\ l\ \{\mathrm{pc} = l\}$. An alternative interpretation is given by Jifeng He (1983), based on a generalised wp function gwp. This requires that a command both terminates, establishing a postcondition $Q$, and passes control to a specified label. However, the gwp function separates the flow of control from the program variables, which include the program counter pc. As a consequence the goto command satisfies the specification $\mathrm{pc} \neq l \Rightarrow \mathrm{gwp}(\mathbf{goto}\ l,\ \mathrm{pc} \neq l)$. This specification of a jump is false for processor languages: a jump instruction must assign the label of the target to the program counter, satisfying $\mathrm{gwp}(\mathbf{goto}\ l,\ \mathrm{pc} = l)$.
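The requirement on a processor jump can be checked with a short calculation. As an assumption made only for this illustration, read the jump as the assignment $\mathrm{pc} := l$ and use the ordinary wp function with the usual assignment rule $\mathrm{wp}(x := e,\ Q) \equiv Q[e/x]$ (Dijkstra, 1976), not the gwp function discussed above.

```latex
\begin{align*}
\mathrm{wp}(\mathrm{pc} := l,\ \mathrm{pc} = l)
  &\equiv (\mathrm{pc} = l)[l/\mathrm{pc}]
   \equiv (l = l)
   \equiv \mathit{true} \\
\mathrm{wp}(\mathrm{pc} := l,\ \mathrm{pc} \neq l)
  &\equiv (\mathrm{pc} \neq l)[l/\mathrm{pc}]
   \equiv (l \neq l)
   \equiv \mathit{false}
\end{align*}
```

Under this reading $\{P\}\ \mathrm{pc} := l\ \{\mathrm{pc} = l\}$ holds for every $P$, while no state guarantees the postcondition $\mathrm{pc} \neq l$, which is consistent with the two objections raised above.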

A simple approach to modelling and specifying object code programs can be based on the use of the program counter pc to select instructions. Assume that command $c$ describes the action of an instruction at label $l$. The instruction is selected when $\mathrm{pc} = l$; the command $c$ must therefore be associated with label $l$. Assume that $(l : c)$ labels command $c$ with $l$ and that the weakest precondition of $(l : c)$ satisfies $\mathrm{wp}(l : c,\ Q) \Rightarrow \mathrm{pc} = l$. This is enough to model the selection of the instruction. Because the program counter pc is a variable, a jump instruction only needs to make an assignment to pc. A jump command is therefore an instance of an assignment