The literature has thoroughly studied the computation and use of alias and mod/ref information.
See, e.g., [73], for a survey of some of the available work in the field. In this section, we describe the context for this work and the assumptions we make. All of the alias analysis implementations described in this chapter are built within, and follow the conventions of, the LLVM Alias Analysis Framework [85].
Note that, in the LLVM compiler, all automatic (stack) scalar variables that do not have their address taken are promoted to SSA values and thus are not candidates for alias analysis (it is not possible to take the address of an SSA register). In LLVM, there are four operations that access memory: load, store, call, and invoke. See Section 2.2 for more details.
Alias analysis and mod/ref information are typically used by two very different kinds of clients:
optimizations and safety checking/program understanding tools. The two kinds of clients are characterized by how they use the resulting information and by their tolerance for errors. An optimizing compiler requires the pointer analysis to be safe (i.e., to return conservative information), while checking and program understanding tools generally do not. Because the primary focus of this thesis is program optimization, all analyses described and evaluated here (including DSA) are conservatively correct: if they cannot determine that a statement is true for all executions of the program, they do not assert it. For example, if an analysis cannot prove that two pointers will never alias, it must return “MayAlias” (defined below).
4.1.1 Alias Analysis Assumptions and Applications
Alias analysis, in this context, is a static compiler analysis which performs some amount of up-front inspection of the program, builds data structures to summarize its results, then answers queries of the form “alias(P1, S1, P2, S2)”, where P1 and P2 are pointers in the program and S1 and S2 are constant integers, which represent the size in bytes of the target of each pointer. This query can return one of three results:
• MustAlias: P1 is always exactly equal to P2.
• NoAlias: The two ranges [P1...P1+S1) and [P2...P2+S2) never overlap.
• MayAlias: The analysis cannot prove that the result is either MustAlias or NoAlias (i.e., the ranges might overlap).
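To make the three results concrete, here is a minimal Python sketch of an alias query over a hypothetical static model in which each pointer is described by the memory object it points into and, when known, a constant byte offset. The `Pointer` class, the object names, and the assumption that each object name denotes exactly one run-time object are illustrative inventions, not part of LLVM or DSA. Note how the query falls back to MayAlias whenever it cannot prove anything, as a conservatively correct analysis must.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class AliasResult(Enum):
    NO_ALIAS = "NoAlias"
    MAY_ALIAS = "MayAlias"
    MUST_ALIAS = "MustAlias"


@dataclass(frozen=True)
class Pointer:
    # Hypothetical static description of a pointer. Assumption: each `obj`
    # name stands for exactly one run-time memory object.
    obj: str
    offset: Optional[int]  # None means "unknown offset within obj"


def alias(p1: Pointer, s1: int, p2: Pointer, s2: int) -> AliasResult:
    # Pointers into distinct objects can never overlap.
    if p1.obj != p2.obj:
        return AliasResult.NO_ALIAS
    # Same object but an unknown offset: nothing can be proven.
    if p1.offset is None or p2.offset is None:
        return AliasResult.MAY_ALIAS
    # Same object and same constant offset: the pointers are always equal.
    if p1.offset == p2.offset:
        return AliasResult.MUST_ALIAS
    # Half-open ranges are disjoint iff one ends before the other begins.
    if p1.offset + s1 <= p2.offset or p2.offset + s2 <= p1.offset:
        return AliasResult.NO_ALIAS
    # The ranges overlap but the pointers differ: neither MustAlias nor NoAlias.
    return AliasResult.MAY_ALIAS
```

Partially overlapping ranges with different start addresses yield MayAlias here, matching the definition above: they are neither always equal nor always disjoint.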
Alias analysis can support a wide variety of clients, including devirtualization, common subexpression elimination, scalar promotion, etc. (even optimizations as simple as transforming memmove calls into memcpy calls when the source and destination ranges can never overlap). Figure 4.1 gives two examples to demonstrate how alias analysis can be used to prove the safety of redundant load elimination (a form of common subexpression elimination) and load hoisting (a form of loop-invariant code motion). In Figure 4.1 (a) and (c), if an alias analysis can guarantee that P1 and P2 never alias, CSE and LICM can transform the examples into the code in Figure 4.1 (b) and (d), respectively, which executes fewer dynamic loads from P1.
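The following sketch shows, under simplifying assumptions, how a redundant load elimination client might consume alias information. It uses a hypothetical IR of straight-line loads and stores over named pointers, treats every access as having the same size, and receives the alias analysis as a `may_alias` callback; none of these names come from LLVM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical straight-line IR, for illustration only:
#   ("load", dest, ptr)    dest = *ptr
#   ("store", ptr, value)  *ptr = value
#   ("copy", dest, src)    dest = src   (emitted by this pass)
Instr = Tuple[str, str, str]


def eliminate_redundant_loads(
    code: List[Instr], may_alias: Callable[[str, str], bool]
) -> List[Instr]:
    available: Dict[str, str] = {}  # pointer -> temporary already holding *pointer
    out: List[Instr] = []
    for instr in code:
        op, a, b = instr
        if op == "load":
            dest, ptr = a, b
            if ptr in available:
                # No store that may alias ptr has executed since the earlier
                # load, so reuse its value instead of loading again.
                out.append(("copy", dest, available[ptr]))
                continue
            available[ptr] = dest
        elif op == "store":
            ptr = a
            # A pointer always aliases itself; otherwise ask the alias analysis.
            for p in list(available):
                if p == ptr or may_alias(p, ptr):
                    del available[p]
        out.append(instr)
    return out
```

With an oracle that proves P1 and P2 never alias, the second load of P1 in `[("load", "t1", "P1"), ("store", "P2", "x"), ("load", "t2", "P1")]` becomes `("copy", "t2", "t1")`. With a maximally conservative oracle that always answers MayAlias, the code is left unchanged.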
Figure 4.1: Results of Example Pointer Analysis Clients
4.1.2 Mod/Ref Analysis Assumptions and Applications
Like alias analysis, mod/ref analysis is a well-studied static compiler analysis that performs an up-front analysis and then answers queries from client analyses. Our implementation supports two forms of mod/ref query. The first query is of the form “modref(I1, I2)”, where I1 and I2 are two primitive operations in the program. This query can return one of several forms of dependence between the two operations and supports general call/call mod/ref information, but it is not described in detail in this work.
The second query is of the form “modref(I, P, S)”, where I is a primitive operation, P is a pointer in the program, and S is a constant integer size. This query can return one of four possible results:
• NoModRef: I does not access the memory in the range [P...P + S).
• Ref: I might read the range [P...P + S), but is guaranteed not to modify it.
• Mod: I might modify the range [P...P + S), but is guaranteed not to read it.
• ModRef: I might modify or read the range [P...P + S).
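One way to model the four results is as a two-bit set in which Mod and Ref are independent flags and ModRef is their union; the effect of a call is then the union of the effects of the accesses it may perform. The sketch below assumes a hypothetical representation in which pointers are (object, constant offset) pairs and a call carries a precomputed list of the loads and stores it may perform, a stand-in for whatever interprocedural summary the analysis actually computes. An invoke would be handled exactly like a call.

```python
from enum import IntFlag
from typing import Tuple

# Hypothetical location model: (memory object name, constant byte offset).
Loc = Tuple[str, int]


class ModRef(IntFlag):
    NO_MOD_REF = 0
    REF = 1
    MOD = 2
    MOD_REF = 3  # REF | MOD


def overlaps(p1: Loc, s1: int, p2: Loc, s2: int) -> bool:
    """True if [p1, p1+s1) and [p2, p2+s2) share at least one byte."""
    (obj1, off1), (obj2, off2) = p1, p2
    return obj1 == obj2 and off1 < off2 + s2 and off2 < off1 + s1


def modref(inst, p: Loc, s: int) -> ModRef:
    """Effect of instruction `inst` on the range [p, p+s).

    Hypothetical instruction encodings:
      ("load", loc, size), ("store", loc, size),
      ("call", [list of loads/stores the callee may perform])
    """
    kind = inst[0]
    if kind in ("load", "store"):
        _, loc, size = inst
        if not overlaps(loc, size, p, s):
            return ModRef.NO_MOD_REF
        return ModRef.REF if kind == "load" else ModRef.MOD
    if kind == "call":
        result = ModRef.NO_MOD_REF
        for access in inst[1]:
            result |= modref(access, p, s)  # union of the callee's effects
        return result
    raise ValueError(f"unknown instruction kind: {kind!r}")
```

Representing the results as flags makes combining effects a bitwise OR, and lets a client test "might modify" with a single `& ModRef.MOD`.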
Mod/ref information can be used for a variety of purposes, such as dead store elimination, program slicing, and redundancy elimination. When used for redundancy elimination, mod/ref information is strictly more general than alias analysis information, because it allows the client to query the mod/ref effect of function calls. Figure 4.2 gives two examples where mod/ref information for function calls allows the elimination of a potentially redundant load and the hoisting of a potentially loop-invariant load out of a loop. If the mod/ref analysis can prove that ‘func’ never modifies P1 (i.e., the modref query returns NoModRef or Ref), it is legal for CSE to optimize (a) to (c) and LICM to optimize (b) to (d).
Figure 4.2: Example clients of mod/ref results
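The legality check used by both clients in this example reduces to one question: can any instruction between the two loads (for CSE), or anywhere else in the loop body (for LICM), modify [P...P + S)? Here is a minimal sketch, assuming the mod/ref analysis is supplied as a `modref(inst, ptr, size)` callback; the function and parameter names are hypothetical. It checks only memory dependences; a real LICM pass must also ensure that hoisting the load cannot introduce a fault, which this sketch ignores.

```python
from enum import IntFlag
from typing import Any, Callable, Iterable


class ModRef(IntFlag):
    NO_MOD_REF = 0
    REF = 1
    MOD = 2
    MOD_REF = 3  # REF | MOD


def no_intervening_mod(
    insts: Iterable[Any],
    ptr: Any,
    size: int,
    modref: Callable[[Any, Any, int], ModRef],
) -> bool:
    """True if no instruction in `insts` may modify [ptr, ptr+size).

    For CSE, `insts` are the instructions between the two loads; for LICM,
    they are the other instructions in the loop body. NoModRef and Ref both
    pass; Mod and ModRef block the transformation.
    """
    return not any(modref(inst, ptr, size) & ModRef.MOD for inst in insts)
```

If the only intervening instruction is a call to `func` whose query result is Ref, the check passes and the load may be reused or hoisted; a result of Mod or ModRef blocks both transformations.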
While the computation and use of mod/ref information have been investigated in the literature, context-sensitive analyses tend either to be limited to cases with very simple aliasing [11, 36, 35] or to be too slow for practical use [83, 32, 130, 107, 97]. As a result, context-sensitive mod/ref analyses that handle general aliasing have largely been unattractive for inclusion in a commercial-grade compiler. Because DSA is very efficient and can directly provide context-sensitive mod/ref information, we feel it is very important to consider it.
Note that mod/ref information nicely encompasses several ad-hoc optimizations performed by many compilers (e.g., optimizing calls to “const” functions, which do not access memory, and to “pure” functions, which only read memory), simplifies the implementation of many clients, and is more general than traditional alias queries for many clients (such as redundancy elimination).
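As a sketch of how such attributes fit into the mod/ref framework, a “const” attribute bounds a call's effect to NoModRef and “pure” bounds it to Ref. Because both the attribute bound and the analysis answer are conservative over-approximations of what the call can do, a client can safely intersect them. The function name, its default argument, and the string encoding of attributes below are assumptions made for illustration.

```python
from enum import IntFlag
from typing import AbstractSet


class ModRef(IntFlag):
    NO_MOD_REF = 0
    REF = 1
    MOD = 2
    MOD_REF = 3  # REF | MOD


def call_modref(
    attrs: AbstractSet[str], analysis_result: ModRef = ModRef.MOD_REF
) -> ModRef:
    # Upper bound implied by the (hypothetically string-encoded) attributes.
    if "const" in attrs:
        bound = ModRef.NO_MOD_REF  # reads and writes no memory
    elif "pure" in attrs:
        bound = ModRef.REF  # may read memory, never writes it
    else:
        bound = ModRef.MOD_REF  # no information from attributes
    # Both values over-approximate the call's real effect, so their
    # intersection is also safe and possibly sharper.
    return analysis_result & bound
```

With no analysis result, the attributes alone reproduce the usual ad-hoc treatment; with a per-pointer answer from the mod/ref analysis, the same query can be sharper than either source alone.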
Note that it is possible to use a context-sensitive interprocedural data flow analysis post-pass to construct context-sensitive mod/ref information from a non-context-sensitive alias analysis [116], but we have not implemented and do not evaluate this option here.