We use the following notation to describe our technique. V is the set of all values (including special values represented by D-Models). C is the set of all control codes that represent the returned values of statements (e.g., ‘RETURN’ or ‘BREAK’). S, E, and N are the sets of all statements, expressions, and identifiers, respectively. Π is the set of all path constraints; each constraint is a propositional formula. P(X) is the powerset of a set X. We use small letters for elements of a set (e.g., s ∈ S is a statement).
Figure 3.3 shows the key evaluation rules. During symbolic execution, we maintain a program state (𝒱, π), where the value store 𝒱 : N ↦ V is a (total) function mapping a variable/function name to its value (uninitialized variables have the value ⊥), and the path constraint π encodes the branch decisions taken to reach the current state. For a statement s, a rule ⟨⟨s, 𝒱, π⟩⟩ → ⟨⟨c, 𝒱′, π′⟩⟩ denotes that the execution of s changes the program state from (𝒱, π) to (𝒱′, π′). The returned value c is a control code: it is ‘OK’ if there was no control-flow breaking instruction in s (i.e., the next sequential statement can be executed) and another control code (e.g., ‘RETURN’) otherwise. For an expression e, a rule ⟨e, 𝒱, π⟩ → ⟨v, 𝒱′, π′⟩ denotes that the evaluation of e results in a new program state and returns a (non-control) value v. We use addOutput to record a string or symbolic value in the output (under a path constraint). Other notation and auxiliary functions are listed at the end of Figure 3.3. PhpSync processes PHP statements and expressions as follows.
Variable access (rule 1). When a variable is accessed for a computation, PhpSync looks up its value in the value store 𝒱.
Assignments (rule 2). PhpSync computes the value of the right-hand-side expression and updates the value store 𝒱 with this new value for the variable on the left-hand side of the assignment. The computed value is represented by a D-Model (or a Literal value).
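Initialization:
\[
\mathcal{V}(x) = \bot \qquad \pi = \mathrm{TRUE}
\]

1. Variable Access:
\[
\frac{v = \mathcal{V}(n)}
     {\langle \$n, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}, \pi \rangle}
\]

2. Assignment:
\[
\frac{\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}', \pi \rangle}
     {\langle \$n = e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}'[n \mapsto v], \pi \rangle}
\]

3. If Statement:
\[
\frac{
\begin{array}{c}
\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}', \pi \rangle \qquad
\pi' = \mathit{whenEqual}(v, \mathrm{TRUE}) \qquad
\mathit{isSat}(\pi \wedge \pi') \qquad
\mathit{isSat}(\pi \wedge \neg\pi') \\
\langle\!\langle s_1, \mathcal{V}', \pi \wedge \pi' \rangle\!\rangle \rightarrow \langle\!\langle c_1, \mathcal{V}_1, \pi \wedge \pi' \rangle\!\rangle \qquad
\langle\!\langle s_2, \mathcal{V}', \pi \wedge \neg\pi' \rangle\!\rangle \rightarrow \langle\!\langle c_2, \mathcal{V}_2, \pi \wedge \neg\pi' \rangle\!\rangle \\
\mathcal{V}_3(x) = \mathit{select}(\pi', \mathcal{V}_1(x), \mathcal{V}_2(x))
\end{array}
}{
\langle\!\langle \texttt{if}\ (e)\ s_1\ \texttt{else}\ s_2, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
\langle\!\langle \mathit{select}(\pi', c_1, c_2), \mathcal{V}_3, \pi \rangle\!\rangle
}
\]
\[
\frac{
\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}', \pi \rangle \qquad
\pi' = \mathit{whenEqual}(v, \mathrm{TRUE}) \qquad
\neg\mathit{isSat}(\pi \wedge \neg\pi') \qquad
\langle\!\langle s_1, \mathcal{V}', \pi \rangle\!\rangle \rightarrow \langle\!\langle c_1, \mathcal{V}_1, \pi \rangle\!\rangle
}{
\langle\!\langle \texttt{if}\ (e)\ s_1\ \texttt{else}\ s_2, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
\langle\!\langle c_1, \mathcal{V}_1, \pi \rangle\!\rangle
}
\]
\[
\frac{
\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}', \pi \rangle \qquad
\pi' = \mathit{whenEqual}(v, \mathrm{TRUE}) \qquad
\neg\mathit{isSat}(\pi \wedge \pi') \qquad
\langle\!\langle s_2, \mathcal{V}', \pi \rangle\!\rangle \rightarrow \langle\!\langle c_2, \mathcal{V}_2, \pi \rangle\!\rangle
}{
\langle\!\langle \texttt{if}\ (e)\ s_1\ \texttt{else}\ s_2, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
\langle\!\langle c_2, \mathcal{V}_2, \pi \rangle\!\rangle
}
\]

4. Function Declaration:
\[
\frac{\lambda \text{ is a pointer to } \texttt{function}\ n(\$n_1, \ldots, \$n_m)\{s\}}
     {\langle\!\langle \texttt{function}\ n(\$n_1, \ldots, \$n_m)\{s\}, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
      \langle\!\langle \mathrm{OK}, \mathcal{V}[n \mapsto \lambda], \pi \rangle\!\rangle}
\]

5. Function Invocation:
\[
\frac{
\begin{array}{c}
\lambda = \mathcal{V}_0(n) \qquad
\lambda \text{ is a pointer to } \texttt{function}\ n(\$n_1, \ldots, \$n_m)\{s\} \\
\langle e_i, \mathcal{V}_{i-1}, \pi \rangle \rightarrow \langle v_i, \mathcal{V}_i, \pi \rangle,\ \forall i \in [1..m] \qquad
\mathcal{V}_f(x) =
\begin{cases}
v_i & \text{if } x = n_i \\
\bot & \text{otherwise}
\end{cases} \\
\langle\!\langle s, \mathcal{V}_f, \pi \rangle\!\rangle \rightarrow \langle\!\langle c, \mathcal{V}_f', \pi \rangle\!\rangle
\end{array}
}{
\langle n(e_1, \ldots, e_m), \mathcal{V}_0, \pi \rangle \rightarrow
\langle \mathcal{V}_f'(\text{`RET'}), \mathcal{V}_m, \pi \rangle
}
\]

6. Return Statement:
\[
\frac{\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}', \pi \rangle}
     {\langle\!\langle \texttt{return}\ e, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
      \langle\!\langle \mathrm{RETURN}, \mathcal{V}'[\text{`RET'} \mapsto v], \pi \rangle\!\rangle}
\]

7. Block of Statements:
\[
\frac{
\begin{array}{c}
\langle\!\langle s_1, \mathcal{V}, \pi \rangle\!\rangle \rightarrow \langle\!\langle c_1, \mathcal{V}_1, \pi \rangle\!\rangle \qquad
\pi' = \mathit{whenEqual}(c_1, \mathrm{OK}) \\
\mathit{isSat}(\pi \wedge \pi') \qquad
\langle\!\langle s_2, \mathcal{V}_1, \pi \wedge \pi' \rangle\!\rangle \rightarrow \langle\!\langle c_2, \mathcal{V}_2, \pi \wedge \pi' \rangle\!\rangle \\
\mathcal{V}_3(x) = \mathit{select}(\pi', \mathcal{V}_2(x), \mathcal{V}_1(x))
\end{array}
}{
\langle\!\langle s_1\ s_2, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
\langle\!\langle \mathit{select}(\pi', c_2, c_1), \mathcal{V}_3, \pi \rangle\!\rangle
}
\]

8. While Statement:
\[
\frac{\langle\!\langle \texttt{if}\ (e)\ \{s\ \ \texttt{while}\ (e)\ s\}, \mathcal{V}, \pi \rangle\!\rangle \rightarrow \langle\!\langle c, \mathcal{V}', \pi \rangle\!\rangle}
     {\langle\!\langle \texttt{while}\ (e)\ s, \mathcal{V}, \pi \rangle\!\rangle \rightarrow \langle\!\langle c, \mathcal{V}', \pi \rangle\!\rangle}
\]

9. Include Expression:
\[
\frac{
\begin{array}{c}
\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}_1, \pi \rangle \qquad
\langle\!\langle s, \mathcal{V}_1, \pi \rangle\!\rangle \rightarrow \langle\!\langle c, \mathcal{V}_2, \pi \rangle\!\rangle \\
s =
\begin{cases}
\mathit{parseFile}(v) & \text{if } v \text{ is a concrete value} \\
\text{empty statement} & \text{otherwise}
\end{cases}
\end{array}
}{
\langle\!\langle \texttt{include}\ e, \mathcal{V}, \pi \rangle\!\rangle \rightarrow
\langle\!\langle \mathcal{V}_2(\text{`RET'}), \mathcal{V}_2, \pi \rangle\!\rangle
}
\]

10. Infix Expression:
\[
\frac{
\begin{array}{c}
\langle e_1, \mathcal{V}, \pi \rangle \rightarrow \langle v_1, \mathcal{V}_1, \pi \rangle \qquad
\langle e_2, \mathcal{V}_1, \pi \rangle \rightarrow \langle v_2, \mathcal{V}_2, \pi \rangle \\
v =
\begin{cases}
\mathit{concat}(v_1, v_2) & \text{if } \mathit{op} \text{ is concatenation} \\
\mathit{symbolic}(e_1\ \mathit{op}\ e_2) & \text{otherwise}
\end{cases}
\end{array}
}{
\langle e_1\ \mathit{op}\ e_2, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}_2, \pi \rangle
}
\]

11. Echo Statement:
\[
\frac{\langle e, \mathcal{V}, \pi \rangle \rightarrow \langle v, \mathcal{V}', \pi \rangle \qquad \mathit{addOutput}(v, \pi)}
     {\langle\!\langle \texttt{echo}\ e, \mathcal{V}, \pi \rangle\!\rangle \rightarrow \langle\!\langle \mathrm{OK}, \mathcal{V}', \pi \rangle\!\rangle}
\]

Notation and auxiliary functions:
- ↦ denotes total functions.
- g = f[x ↦ y] denotes a function that is the same as f except that g(x) = y.
- symbolic(e) returns a fresh symbolic value mapped to an expression e.
- select(π, v1, v2) returns an alternative value of v1 or v2 depending on π.
- concat(v1, v2) returns a concatenation of v1 and v2.
- isSat(π) returns TRUE if π is satisfiable and FALSE otherwise.
- whenEqual(v, v′) returns the constraint under which v equals v′, e.g., whenEqual(select(α, TRUE, FALSE), TRUE) returns α, and whenEqual(select(α > 1, TRUE, FALSE), TRUE) returns (fresh) β.
- parseFile(v) parses a PHP file v and returns the parsed program.
- addOutput(v, π) records value v under constraint π in the output.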
Figure 3.3 (Continued)
    1  $message = '';                        𝒱(message) = ''
    2  if ($_GET['user'] == 'admin') {       𝒱(message) = ''
    3      $message = 'Welcome admin!';      𝒱(message) = 'Welcome admin!'
    4  } else {                              𝒱(message) = ''
    5      $message = 'Access denied.';      𝒱(message) = 'Access denied.'
    6  }                                     𝒱(message) = select(μ, 'Welcome admin!', 'Access denied.')
    7  ...

    μ: $_GET['user'] == 'admin'
Figure 3.4 Executing conditional statements
Conditional statements (rule 3). If the path constraints of both branches of an if statement are satisfiable, we explore both branches. The function whenEqual(v, TRUE) is used to compute the constraint under which a value v (evaluated from the if condition) evaluates to TRUE. For example, the conditions on lines 2 and 7 of Figure 3.4 are both resolved into α == ‘admin’, where α is the symbolic value for $_GET[’user’]; thus, we evaluate both conditions into the same (fresh) symbolic value µ (to simplify constraint checking). Modifications to the value store 𝒱 take effect in the corresponding branch only. After executing the branches, we update the value store with the combined values from the two branches together with their corresponding constraints. To represent that a variable may have multiple values depending on a path constraint, we use a D-Model Select node (select(π, v1, v2)) to represent a selection between value v1 if the path constraint π evaluates to TRUE and v2 otherwise (we also use Select for control codes). Note that if the path constraint of one of the branches is unsatisfiable, we execute the other (satisfiable) branch only. As an illustration, in Figure 3.4, the variable $message after line 6 has two alternative values, one from each branch.
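The merge step of rule 3 can be sketched in Python as follows. This is only an illustration under simplifying assumptions, not PhpSync's implementation: the names `Select`, `select`, and `merge_stores` are hypothetical, path constraints are reduced to a condition name, and satisfiability checking is omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Select:
    """select(cond, if_true, if_false): if_true when cond holds, else if_false."""
    cond: str
    if_true: Any
    if_false: Any


def select(cond: str, v1: Any, v2: Any) -> Any:
    # Simplification assumed for this sketch: no Select node is needed
    # when both branches agree on the value.
    return v1 if v1 == v2 else Select(cond, v1, v2)


def merge_stores(cond: str, store_true: Dict[str, Any],
                 store_false: Dict[str, Any]) -> Dict[str, Any]:
    """V3(x) = select(cond, V1(x), V2(x)); a missing name stands for ⊥ (None)."""
    names = set(store_true) | set(store_false)
    return {x: select(cond, store_true.get(x), store_false.get(x)) for x in names}


# Figure 3.4: both branches start from the store after line 1.
before = {"message": ""}
then_store = {**before, "message": "Welcome admin!"}   # after line 3
else_store = {**before, "message": "Access denied."}   # after line 5
merged = merge_stores("mu", then_store, else_store)    # after line 6
print(merged["message"])
```

Each branch works on its own copy of the store, so an assignment in one branch never leaks into the other before the merge.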
Functions (rules 4–6). Similar to concrete execution, PhpSync evaluates a function call in three steps: First, it sets up a new context/call stack for the function and passes the actual parameters to the formal parameters of the function. Second, it executes the function body and records all returned values (here represented by a special variable named ‘RET’) encountered when exploring different paths in the function. Third, the returned value(s) are propagated to the call site of the function. The details are shown in rules 4–6. Note that for scalability, PhpSync does not execute a recursive function call.
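The three steps can be illustrated with a small Python sketch. It makes strong simplifying assumptions: values are concrete, path constraints are left out, a function body is modeled as a Python callable from store to store, and the helper names (`declare`, `invoke`, `return_stmt`) are hypothetical.

```python
from typing import Any, Callable, Dict, List

Store = Dict[str, Any]
Body = Callable[[Store], Store]   # runs the body, returns the store after it


def declare(store: Store, name: str, params: List[str], body: Body) -> Store:
    # Rule 4: bind the function name to a "pointer" (here a (params, body) pair).
    return {**store, name: (params, body)}


def return_stmt(frame: Store, value: Any) -> Store:
    # Rule 6: record the returned value under the special name 'RET'.
    return {**frame, "RET": value}


def invoke(store: Store, name: str, args: List[Any]) -> Any:
    params, body = store[name]
    # Rule 5: fresh store in which formals map to actuals; every other
    # name is ⊥ (absent from the dict).
    frame = dict(zip(params, args))
    frame_after = body(frame)
    # Propagate the recorded return value to the call site (None if no return).
    return frame_after.get("RET")


def greet_body(frame: Store) -> Store:
    return return_stmt(frame, "Welcome " + frame["user"] + "!")


store = declare({}, "greet", ["user"], greet_body)
result = invoke(store, "greet", ["admin"])
print(result)
```

The caller's store is left untouched by the body, which mirrors the fresh store V_f in rule 5.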
Block of statements (rule 7). In a block of statements, the control code returned after executing a statement can be ‘OK’, indicating that the next statement can be executed, or another control code (e.g., ‘RETURN’ for a return statement). Note that the returned control code can also be represented by a D-Model Select node (e.g., the returned code of an if statement). Therefore, after each statement, we compute the path constraint under which the next statements can be executed (i.e., the constraint under which the returned control code equals ‘OK’) and execute them under that restricted constraint. After executing the block, we update the value store similarly to the case of an if statement. If the computed constraint is not satisfiable, we simply stop the execution for that block (not shown). Rule 7 shows the algorithm for a block of two statements; the rule for a block with more statements generalizes from it.
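A minimal Python sketch of rule 7 for two statements is given below. It is an illustration only: constraints are modeled as booleans, atom names, or nested tuples; a literal False stands in for an unsatisfiable constraint (no solver is called); the ambient path constraint π is not threaded through; and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Select:
    cond: Any
    if_true: Any
    if_false: Any


# Constraints are True, False, an atom name (str), or a nested tuple.
def neg(p):
    return (not p) if isinstance(p, bool) else ("not", p)


def conj(p, q):
    if p is False or q is False:
        return False
    if p is True:
        return q
    if q is True:
        return p
    return ("and", p, q)


def disj(p, q):
    if p is True or q is True:
        return True
    if p is False:
        return q
    if q is False:
        return p
    return ("or", p, q)


def when_equal(value, target):
    """Constraint under which value equals target."""
    if isinstance(value, Select):
        return disj(conj(value.cond, when_equal(value.if_true, target)),
                    conj(neg(value.cond), when_equal(value.if_false, target)))
    return value == target


def select(cond, a, b):
    if cond is True:
        return a
    return a if a == b else Select(cond, a, b)


def run_pair(s1, s2, store):
    """Rule 7 for two statements; each statement maps a store to (code, store)."""
    c1, v1 = s1(store)
    ok = when_equal(c1, "OK")
    if ok is False:            # stand-in for "not isSat": s2 is unreachable
        return c1, v1
    c2, v2 = s2(v1)
    names = set(v1) | set(v2)
    v3 = {x: select(ok, v2.get(x), v1.get(x)) for x in names}
    return select(ok, c2, c1), v3


# s1 behaves like an if statement that returns under mu; s2 assigns x = 1.
s1 = lambda st: (Select("mu", "RETURN", "OK"), {**st, "x": 0})
s2 = lambda st: ("OK", {**st, "x": 1})
code, final = run_pair(s1, s2, {})
print(code, final)
```

In this example the second statement only takes effect under ¬µ, so `x` ends up as a Select between the value written by `s2` and the value left by `s1`.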
Loops (rule 8). We execute a loop by modeling it as recursively nested if statements with the same condition and body code. We first apply the rule for an if statement to execute the first iteration and then recursively execute the next iterations in the same manner. If the loop contains control-flow breaking instructions (such as break, continue, or exit), we either abort the loop (for break, return, and exit) or continue with the next iteration (for continue) under their respective constraints (not shown). For scalability, we typically limit the number of iterations to one (i.e., the loop terminates after at most one iteration).
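The unrolling in rule 8, together with the iteration bound, can be pictured as a rewrite on a toy syntax tree. The Python sketch below is an assumption-laden illustration: the `If` node, the string representation of statements, and the `max_iterations` parameter are hypothetical and not taken from PhpSync.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class If:
    cond: str
    then: Tuple   # statements of the then-branch (no else-branch)


def unroll_while(cond: str, body: Tuple, max_iterations: int = 1) -> Tuple:
    """Rewrite `while (cond) body` as nested if statements (rule 8),
    cutting the recursion off after max_iterations iterations."""
    if max_iterations <= 0:
        return ()                      # remaining iterations are dropped
    rest = unroll_while(cond, body, max_iterations - 1)
    return (If(cond, tuple(body) + rest),)


once = unroll_while("$i < $n", ("echo $i;",))
twice = unroll_while("$i < $n", ("echo $i;",), max_iterations=2)
print(once)
print(twice)
```

With the default bound of one, the loop reduces to a single if statement, so the body is explored zero times or once depending on the condition.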
Dynamically included files (rule 9). A PHP program can dynamically include other files. During symbolic execution, we execute these files if the file names can be
resolved to concrete values. Since include is an expression in PHP, we treat the returned
value of include similarly to the returned value of a function call.
Computing the output (rules 10–11). The output of a PHP program is usually a concatenation of multiple string values and is printed out through echo/print statements or inline HTML code. To keep track of concatenations, we use a D-Model Concat node (concat(v1, v2)) to represent a concatenation of two (possibly symbolic) values v1 and v2 (rule 10). At echo/print statements or inline HTML code, we simply record the computed value v of expression e for the output under the corresponding path constraint (rule 11). (An echo e statement is a concatenation of the value of e with the current output, i.e., it can be treated as an equivalent assignment $OUTPUT = $OUTPUT . e, where $OUTPUT is a special variable representing the current output.) The use of Concat values and Select values allows us to track the symbolic output with conditional fragments efficiently.
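To make the role of Concat and Select values concrete, the following Python sketch builds a symbolic output and resolves it for a given truth assignment. It is a simplified illustration: the `echo` and `render` helpers are hypothetical, and wrapping a fragment in a Select with an empty alternative is an assumption about how a path constraint could be attached to recorded output.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass(frozen=True)
class Concat:
    left: Any
    right: Any


@dataclass(frozen=True)
class Select:
    cond: str
    if_true: Any
    if_false: Any


def echo(output: Any, value: Any, path_constraint: Any = True) -> Any:
    """Treat `echo e` as $OUTPUT = $OUTPUT . e. Under a non-trivial path
    constraint the fragment is wrapped in a Select (assumption of this sketch)."""
    fragment = value if path_constraint is True else Select(path_constraint, value, "")
    return Concat(output, fragment)


def render(value: Any, assignment: Dict[str, bool]) -> str:
    """Resolve a symbolic output to one concrete string for a given
    truth assignment of the condition names."""
    if isinstance(value, Concat):
        return render(value.left, assignment) + render(value.right, assignment)
    if isinstance(value, Select):
        chosen = value.if_true if assignment[value.cond] else value.if_false
        return render(chosen, assignment)
    return str(value)


message = Select("mu", "Welcome admin!", "Access denied.")
out = echo("", "<p>")
out = echo(out, message)
out = echo(out, "</p>")
out = echo(out, "<a>Logout</a>", path_constraint="mu")
print(render(out, {"mu": True}))
print(render(out, {"mu": False}))
```

A single tree thus stands for every concrete page the program can produce, and shared fragments such as `<p>` are stored once rather than once per path.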
Limitations. We have adopted several design strategies in our symbolic-execution engine so that it can scale when computing all possible outputs of a PHP web application. Because of these simplifications, the engine has several limitations. Currently, we handle common PHP constructs and functions only since the PHP APIs are large. Specifically, (1) we have limited support for control-breaking statements (e.g., break, return, exit) and for operations with objects and arrays in the presence of symbolic or conditional values, and (2) we implement infix expressions with the concatenation operator only since we are interested in the string output of a program (for other operators, we create fresh symbolic values to represent the results; for instance, we track α > 1 as a new symbolic value β). Because of the conservative approximations with symbolic values and the limitations of external constraint solvers (especially with strings and objects/arrays), the engine may explore some infeasible paths. It runs at most one iteration of each loop and skips recursive function calls. Our approach also does not handle library function calls well in the presence of symbolic and conditional values since their source code is unavailable. Similarly, if the currently executed file invokes another file (by using the PHP expression include) and the expression for the file is resolved to a symbolic value, the engine is unable to execute that file. Because of these limitations, the symbolic-execution engine is unsound and incomplete. Nonetheless, our design strategies allow the engine to scale to real-world web applications.