
To put it in a broader context, our research can be viewed as an instance of providing runtime safety guarantees for meta-programming [88]. Macros are a very old and established meta-programming technique; this was perhaps the first setting where the issue of correctness of generated code arose. Powerful macro languages comprise a complete programming facility, which enables macro programmers to create complex meta-programs that control macro expansion and generate code in the target language. Here, basic syntactic correctness, let alone semantic properties, of the generated code cannot be taken for granted, and only limited static checking of such meta-programs is available. The levels of static checking available include none, syntactic, hygienic, and type checking. The widely used cpp macro pre-processor allows programmers to manipulate and generate arbitrary textual strings, and it provides no syntactic or semantic checking. The programmable syntax macros of Weise and Crew [97] work at the level of correct abstract-syntax tree (AST) fragments, and


guarantee that generated code is syntactically correct with respect (specifically) to the C language. Weise and Crew macros are validated via standard type checking: static type checking guarantees that AST fragments (e.g., Expressions, Statements, etc.) are used appropriately in macro meta-programs. Because macros insert program fragments into new locations, they risk “capturing” variable names unexpectedly. Preventing variable capture is called hygiene. Hygienic macro expansion algorithms, beginning with Kohlbecker et al. [51], provide hygiene guarantees. Recent work, such as that of Taha & Sheard [88], focuses on designing type checking of object-programs into functional meta-programming languages. A number of other proposals provide type-safe APIs for dynamic SQL, including, for example, Safe Query Objects [14], SQL DOM [60], and Xen [5, 62]. These proposals suggest better programming models, but require programmers to learn a new API. In contrast, our approach does not introduce a new API, and it is suited to address the problems in the enormous number of programs that use existing database APIs. Other research efforts focus on type-checking polylingual systems [25, 29], but they do not deal with applications that interface with databases, such as web applications.
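As a minimal sketch of the variable-capture problem described above, the following Python fragment imitates a purely textual macro expander. The names swap_macro, hygienic_swap_macro, and run are hypothetical, and the fresh-name renaming shown here is only a simplified stand-in for hygienic expansion, not the algorithm of Kohlbecker et al. [51].

```python
def swap_macro(x: str, y: str) -> str:
    """Textually expand a 'swap' macro that hard-codes the temporary `tmp`."""
    return f"tmp = {x}\n{x} = {y}\n{y} = tmp\n"


def hygienic_swap_macro(x: str, y: str, used_names: set) -> str:
    """Expand the same macro, but pick a temporary name not already in use."""
    i = 0
    while f"tmp_{i}" in used_names:
        i += 1
    fresh = f"tmp_{i}"
    return f"{fresh} = {x}\n{x} = {y}\n{y} = {fresh}\n"


def run(code: str, env: dict) -> dict:
    """Execute generated code against a copy of env and return the result."""
    result = dict(env)
    exec(code, {}, result)
    return result


env = {"tmp": 1, "other": 2}

# The caller's variable happens to be named `tmp`, so the macro's own
# temporary captures it and the swap silently fails.
captured = run(swap_macro("tmp", "other"), env)

# With a fresh temporary, the two values are exchanged as intended.
renamed = run(hygienic_swap_macro("tmp", "other", set(env)), env)
```

In the first expansion both variables end up holding the same value, which is the kind of silent error that hygiene guarantees are meant to rule out.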


Chapter 3

Static Analysis for SQL Injection

In Chapter 1 we argued that SQL injection attacks are a common and significant problem, and in Chapter 2 we presented a formal definition of SQL injection attacks and a runtime technique to prevent them. This technique is effective at each place where it is used, but it does not guarantee that every query site is protected. Static analysis can guarantee that all query sites are protected, and it has other advantages (discussed below). This chapter presents a sound static analysis for finding SQL injection vulnerabilities based on the definition presented in the last chapter. This static analysis scales to large, real-world code bases and has a low false-positive rate. An evaluation of our implementation revealed previously unknown SQL injection vulnerabilities in real-world code.

3.1 Introduction

Our approach for runtime enforcement prevents SQL injection effectively in deployed software, but static approaches are desirable during software development and testing for three reasons. First, a single programming error often manifests itself as multiple different bugs, so statically verifying code to be free from one kind of error (e.g., static type checking) helps to reduce the risk of other errors. Second, the overhead that general techniques incur significantly exceeds the overhead of appropriate, well-placed checks on untrusted input.


Even if the network latency dominates the overhead of a runtime check for a single user, the added overhead can prevent a server from functioning effectively under a heavy load of requests. Finally, some runtime techniques [70, 73] require a modified runtime system, which constitutes a practical limitation in terms of deployment and upgrading.

Static analyses to find SQL command injection vulnerabilities (SQLCIVs) have also been proposed, but none of them both runs without user intervention and guarantees the absence of SQLCIVs. String analysis-based techniques [12, 64] use formal languages to characterize conservatively the set of values a string variable may assume at runtime. They do not track the source of string values, so they require a specification, in the form of a regular expression, for each query-generating point or hotspot in the program, a tedious and error-prone task that few programmers are willing to do. Static taint analyses [43, 59, 99] track the flow of tainted (i.e., untrusted) values through a program and require that no tainted values flow into hotspots. Because they use a binary classification for data (tainted or untainted), they classify functions as either being sanitizers (i.e., all return values are untainted) or being security irrelevant. Because the policy that these techniques check is context-agnostic, it cannot guarantee the absence of SQLCIVs without being overly conservative. For example, suppose the escape_quotes function (which precedes quotes with an “escaping” character so that they will be interpreted as character literals and not as string delimiters) is considered a sanitizer. An application that uses escaped input to supply an expected numeric value, which need not be delimited by quotes, then contains an SQLCIV that the analysis would not find. Additionally, static taint analyses for PHP, a language that is widely used for web applications and is ranked fourth on the TIOBE programming community index [9], typically require user assistance to resolve dynamic includes (a construct in which the name of the included file is generated dynamically).
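The escaping example above can be made concrete with a small sketch. It models the single taint bit that a binary taint analysis tracks and shows a query that such a policy accepts even though the attacker controls its structure. The Value class, the table and column names, and this particular escape_quotes implementation are illustrative assumptions and are not taken from any of the cited tools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A string paired with the single taint bit of a binary taint analysis."""
    text: str
    tainted: bool


def escape_quotes(v: Value) -> Value:
    """Hypothetical sanitizer: backslash-escape backslashes and quotes.

    A binary policy treats every return value of a sanitizer as untainted.
    """
    escaped = (
        v.text.replace("\\", "\\\\").replace("'", "\\'").replace('"', '\\"')
    )
    return Value(escaped, tainted=False)


def concat(a: Value, b: Value) -> Value:
    """A concatenation is tainted if either operand is tainted."""
    return Value(a.text + b.text, a.tainted or b.tainted)


def build_query(user_input: str) -> Value:
    """Build a query whose id is expected to be numeric, hence unquoted."""
    prefix = Value("SELECT * FROM users WHERE id = ", tainted=False)
    return concat(prefix, escape_quotes(Value(user_input, tainted=True)))


def passes_binary_taint_check(query: Value) -> bool:
    """Context-agnostic hotspot policy: accept any untainted query."""
    return not query.tainted


# The input contains no quotes, so escaping leaves it unchanged, yet it
# alters the WHERE clause because the numeric context is not delimited.
attack = build_query("1 OR 1=1")
```

Here passes_binary_taint_check(attack) holds although attack.text contains the injected condition, which is why a policy that ignores the syntactic context of the input must either miss such cases or reject escaping as a sanitizer altogether.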

This chapter proposes a sound, automated static analysis algorithm to overcome the limitations described above. It is grammar-based; we model sets of string values as context-free grammars (CFGs) and string operations as language transducers following Minamide [64]. This string analysis-based approach tracks the effects of string operations and retains the