As part of its basic design, DSA includes several features that are required for scalability. For example, the use of unification avoids, in practice, the exponential explosion inherent in cloning.

Additionally, processing SCCs in the call graph eliminates the need for iteration within SCCs.

Other factors are less obvious. In particular, because the local phase is the only part of DSA that uses the compiler IR (all other phases perform graph transformations on DS Graphs), DSA has better cache behavior than analyses that need to keep the pointer representation and the compiler IR in cache.

These design choices are some of the keys to achieving practical analyses, and can reduce analysis times by several orders of magnitude. In addition to these key design choices, this section lists several engineering issues that can further improve analysis times in important cases, primarily by improving the handling of global variables and by reducing N² behavior.

3.3.1 The Globals Graph

One reason the DS graph representation is so compact is that each function graph need only contain the memory reachable from that function. However, Figures 3.7(c) and 3.10 illustrate a fundamental violation of this strength. In both of these graphs, the global variable G makes an appearance even though it is not directly referenced and no edges target it. Such nodes cannot simply be deleted because they may have to be merged with other nodes in callers or callees of each function. If left untreated, all global variables defined in the program would propagate bottom-up to main, then top-down to all functions in the program. This trivially balloons the size of each graph to include every global variable in the program, a potential O(N²) size explosion.

In order to prevent this unacceptable behavior, our implementation uses a separate “Globals Graph” to hold information about global nodes and all nodes reachable from global nodes. This allows us to remove global variables from a function’s graph if they are not used in the current function (even though they may be used in callers or callees of that function). For example, this eliminates the two G nodes in the example graphs (see footnote 7).

For the steps below, all nodes reachable from virtual registers (which includes formal parameters and return values of the current function, and call node arguments within the current function, but not globals) are considered to be locally used. Call nodes are also considered to be locally used, unless they contain a callee that is an external function (and thus will never be resolved).

More specifically, we make the following changes to the algorithm:

• In the BU phase (respectively, TD phase), after all known callees (respectively, callers) have been incorporated in step 4, we copy and merge in the nodes from the globals graph for every global G that has a node in the current graph, plus any nodes reachable from such nodes.

This ensures that the current graph reflects all known information about such globals from other functions.

• After step 5 in the BU phase, we copy all global nodes and nodes reachable from such nodes into the globals graph, merging the global nodes with the corresponding nodes already in the Globals Graph, if any (which will cause other “corresponding” nodes to be merged as well). We clear the Stack markers on nodes being copied into the Globals Graph, for the same reason as in ResolveCallee. We also clear the Complete markers since those markers will be re-computed correctly within the context of each function.

By the end of the BU phase, all the known behavior about globals will be reflected in the Globals Graph. Therefore, globals do not need to be copied from the TD graph to the Globals graph in the TD phase.

Footnote 7: Liang and Harrold [92] use a somewhat similar technique.

• In step 6 of the BU phase, we identify global nodes that are not reachable from any locally used nodes and do not reach any such nodes. The latter requirement is necessary because we may revisit the current function later, resolving previously unresolved call sites, which can bring in additional globals. Merging such globals will not correctly merge other reachable nodes in the graph if a global that can reach a locally reachable node is removed from the graph. The latter requirement is not needed for the TD phase since no further inlining needs to happen after reaching step 6. We simply drop all these identified nodes from the BU or TD graph for the function.

In practice, we find that the Globals Graph makes a remarkable difference in running time for global-intensive programs, speeding up the top-down phase by an order of magnitude or more.
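The pruning rule in the last bullet above can be sketched as a pair of reachability queries. This is a minimal illustration rather than the implementation described here: the graph is reduced to a plain adjacency dictionary, and the names `reachable`, `removable_globals`, `roots`, and `bottom_up` are assumptions made for the example.

```python
from collections import deque


def reachable(starts, edges):
    """Return every node reachable from `starts` by following `edges`."""
    seen = set(starts)
    work = deque(starts)
    while work:
        node = work.popleft()
        for succ in edges.get(node, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen


def removable_globals(edges, global_nodes, roots, bottom_up=True):
    """Global nodes that may be dropped from one function's graph.

    `roots` stands for the virtual registers and resolvable call nodes
    (an assumption of this sketch). Everything they reach is locally used.
    """
    locally_used = reachable(roots, edges)
    candidates = {g for g in global_nodes if g not in locally_used}
    if bottom_up:
        # BU only: also keep globals that can reach a locally used node,
        # because the function may be revisited later.
        candidates = {g for g in candidates
                      if not (reachable([g], edges) & locally_used)}
    return candidates


# Register p points to node A; global G1 also points to A; G2 points to B.
edges = {"p": ["A"], "G1": ["A"], "G2": ["B"]}
bu_drop = removable_globals(edges, {"G1", "G2"}, ["p"], bottom_up=True)
td_drop = removable_globals(edges, {"G1", "G2"}, ["p"], bottom_up=False)
```

In this toy graph the BU rule keeps G1 because it reaches the locally used node A, while the TD rule can drop both globals.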

3.3.2 Efficient Graph Inlining

Our first implementation of DSA used a very simple implementation of the graph inlining operation described in Section 3.2.1. To inline a callee graph into a caller graph (for example), it literally made a copy of the callee graph into the caller graph, then used unification to perform the merge (this algorithm is listed as the cloneGraphInto operation in Figure 3.4). The merge simply unifies each of the linked nodes between the caller and callee: this includes the formal/actual argument bindings as well as any global variables that are common to the two graphs.

This implementation is inefficient for several reasons. First, this operation copies nodes that are not reachable in the caller graph (e.g. for stack allocations in the callee or local data structures), requiring an “unreachable node elimination” cleanup pass to get rid of them. Second, copying nodes only to unify them away is a gross waste of time. Third, unification uses a union-find approach which does not immediately free a node when it is unified. In particular, all nodes referring to a unified node need to have their references updated (lazily), which means the nodes that are copied may last far longer than we would like (consuming memory).

To solve these problems, our implementation uses a parallel recursive traversal of the caller and callee graphs starting from each matching pair of callee and caller nodes. For each pair of nodes traversed, we merge information from the callee node into the caller node (which may involve merging or collapsing nodes in the caller graph). If no caller node corresponds to the callee node, nodes are lazily (recursively) created. Nodes that exist in the caller but not the callee do not require recursive traversal.

This approach solves all of the problems with the naive implementation: (1) only reachable nodes are copied; (2) the only new nodes created are those that exist in the callee graph but not in the caller graph; and (3) dead nodes are never created, so they consume neither memory nor time.
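A rough sketch of the parallel traversal follows. The `Node` class and `merge_into` function are hypothetical stand-ins, and the sketch makes one loud simplification: where full DSA would unify or collapse two caller nodes, it simply keeps the existing caller edge.

```python
class Node:
    """Hypothetical stand-in for a DS node: flags plus labelled out-edges."""

    def __init__(self, flags=()):
        self.flags = set(flags)
        self.edges = {}  # field label -> Node


def merge_into(callee, caller, mapping, created):
    """Merge `callee` and everything reachable from it into `caller`.

    `mapping` records which caller node each visited callee node maps to;
    `created` collects caller nodes that had to be created lazily.
    """
    if callee in mapping:
        return  # already visited (also terminates cycles)
    mapping[callee] = caller
    caller.flags |= callee.flags
    for field, callee_succ in callee.edges.items():
        if callee_succ in mapping:
            # Simplification: real DSA would unify here if the caller
            # already had a different node on this field.
            caller.edges.setdefault(field, mapping[callee_succ])
            continue
        caller_succ = caller.edges.get(field)
        if caller_succ is None:
            caller_succ = Node()  # exists in the callee only
            created.append(caller_succ)
            caller.edges[field] = caller_succ
        merge_into(callee_succ, caller_succ, mapping, created)


# Callee: formal -> heap list node that points to itself, plus a stack
# local that the formal cannot reach.
heap = Node({"Heap"})
heap.edges["next"] = heap
formal = Node()
formal.edges["next"] = heap
stack_local = Node({"Stack"})

actual = Node()  # the caller's matching actual argument
created = []
merge_into(formal, actual, {}, created)
```

Only one caller node is created (for the heap node), and the unreachable stack local is never copied, so no cleanup pass is needed in this example.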

3.3.3 Partitioning EV for Efficient Global Variable Iteration

The EV mapping described in Section 3.1 contains all of the scalar pointers in the graph as well as the addresses of all globals. This mapping is used primarily by clients of the analysis (e.g. to find out which node a pointer points to), but is also used by various phases of the analysis (e.g. to find the formal arguments for a function when inlining a graph). In programs with large SCCs (and thus many functions merged into the same DS graph), this mapping can be very large.

Several portions of the DSA algorithm need access to all of the global variables that exist in a DSGraph (e.g. updating the globals graph, and performing graph inlining operations). Our initial implementation iterated through the EV to find the globals used in a graph, which suffered due to the large size of EV (while clients use constant-time hash-table lookups, iteration takes linear time).

Our solution is to partition EV into two mappings, one for scalar pointers and one to represent the address of globals. This allows direct iteration over just the information needed, yielding a large speedup on big codes with large call graph SCCs or many pointer variables.
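As an illustration of the partitioning, the sketch below keeps two dictionaries instead of one. The class and method names are invented for the example, and nodes are represented by plain strings.

```python
class PartitionedEV:
    """Hypothetical EV split into a scalar map and a globals map."""

    def __init__(self):
        self.scalar_map = {}  # scalar pointer name -> node
        self.global_map = {}  # global name -> node

    def bind_scalar(self, name, node):
        self.scalar_map[name] = node

    def bind_global(self, name, node):
        self.global_map[name] = node

    def lookup(self, name):
        """Constant-time client query; tries scalars, then globals."""
        if name in self.scalar_map:
            return self.scalar_map[name]
        return self.global_map[name]

    def globals_in_graph(self):
        """Iterate over globals only, without touching scalar entries."""
        return iter(self.global_map)


ev = PartitionedEV()
for i in range(10_000):
    ev.bind_scalar(f"%tmp{i}", f"node{i}")
ev.bind_global("@G", "nodeG")
ev.bind_global("@H", "nodeH")
globals_seen = sorted(ev.globals_in_graph())
```

Iterating over the globals here touches two entries rather than ten thousand, while client lookups remain hash-table queries.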

3.3.4 Shrinking EV with Global Value Equivalence Classes

Even with the refinements described in Section 3.3.1 and Section 3.3.3, programs that use extremely large tables of global variable pointers can cause a problem. In particular, consider a program that contains the (very reasonable and not uncommon) C code shown in Figure 3.11. The figure also shows the LLVM code it expands into.

At the LLVM level, each constant string is lowered to a different global variable which is initialized with the string constant, “strGV n” in our example (see Section 2.4.1). The “StringArr” global is an array that points to all of these globals, and DSA will represent this configuration with the graph shown on the right side of Figure 3.11.

[Figure 3.11: C Source, DSGraph, and LLVM code for Global Value Equivalence Class Example. The C source declares the array as “const char *const StringArr[]”.]

Given the operation of the Globals Graph, many functions that either directly or indirectly use StringArr will have a copy of this graph in their per-function graphs. Unfortunately, this means that each of those graphs must also have EV entries for each of the (potentially thousands of) globals that are merged into the string constant node. These extra entries slow down any analyses that need to iterate over globals in the graph and require extra memory to represent. Finally, note that DSA will never be able to distinguish between the strGV * nodes in the graph.

The solution we use for this problem is to maintain equivalence classes of global value addresses, merging these equivalence classes (maintained with Tarjan’s union-find algorithm) when DSA merges nodes corresponding to multiple globals. With this refinement, DSA need only keep the leader of an equivalence class in the graphs. The interface used to query the DSGraphs automatically returns the full set of globals in the equivalence class, permitting clients to be unaware of this implementation detail. In practice, we find that this straightforward refinement can cut DSA runtimes by 30% and reduce memory usage by 50% for large programs (such as 176.gcc and 253.perlbmk).
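The idea can be sketched with a small union-find structure. The class `GlobalECs` and its methods are illustrative names only, and the per-leader member sets are a convenience of this sketch, not a claim about how the real interface expands a leader.

```python
class GlobalECs:
    """Hypothetical equivalence classes of globals, kept with union-find."""

    def __init__(self):
        self.parent = {}   # global -> parent in the union-find forest
        self.members = {}  # leader -> all globals in its class

    def find(self, g):
        if g not in self.parent:
            self.parent[g] = g
            self.members[g] = {g}
            return g
        root = g
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[g] != root:  # path compression
            self.parent[g], g = root, self.parent[g]
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if len(self.members[ra]) < len(self.members[rb]):
            ra, rb = rb, ra  # union by size
        self.parent[rb] = ra
        self.members[ra] |= self.members.pop(rb)
        return ra

    def expand(self, g):
        """What a client query would see: the whole class of `g`."""
        return frozenset(self.members[self.find(g)])


ecs = GlobalECs()
leader = ecs.find("strGV0")
for i in range(1, 1000):
    leader = ecs.union(leader, f"strGV{i}")  # DSA merged these nodes

graph_globals = {leader}  # a per-function graph stores only the leader
all_strings = ecs.expand(leader)
```

Each per-function graph then carries one entry for the string constant node instead of a thousand, while `expand` still recovers every global for clients that ask.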

3.3.5 Avoiding N² Inlining for Function Pointers

With a straightforward implementation, large tables of function pointers cause an efficiency problem for both the bottom-up and top-down analysis phases. The problem is that any call through the table can reach N callees, and programs with tables often have a large number of calls through them. Because of this, the BU and TD passes have to inline all N graphs M times (once for each call through the table), which takes N ∗ M time. In practice, this time can be unacceptably large for programs with hundreds of function pointers in a table.

Our solution to this problem is to keep a graph cache of all sets of function pointers inlined.

For example, in the BU phase, every time a call site with more than one callee needs to be inlined, the cache is queried. If there is no entry for this set of callees, a new DSGraph is allocated, all of the callee graphs are inlined into it, all formals and globals are merged, and the new graph is added to the cache. Finally, whether the graph was in the cache or not, the graph (which now represents the effects of all callees) is inlined into the caller graph. This makes the first inline operation for a set of callees slightly more expensive for the benefit of subsequent inline operations with the same set of callees.

In the best case, instead of performing N∗M graph inlining operations, the BU-pass now needs to perform N + M + 1 graph inlining operations, a substantial improvement. In the worst case, entries in the cache are never reused, which adds one extra graph inline operation to a call site with many callees. In practice, this refinement is extremely important for certain classes of large programs.
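A toy model of the cache is sketched below. Graphs are reduced to the set of function names folded into them, and `Graph`, `inline_from`, and `inline_call_site` are hypothetical names. The counter reflects this sketch's own accounting of inline operations, which may differ from the count above by a constant.

```python
class Graph:
    """Hypothetical DS graph reduced to the set of functions folded in."""

    inline_ops = 0  # counts every inline operation performed

    def __init__(self, functions=()):
        self.functions = set(functions)

    def inline_from(self, other):
        Graph.inline_ops += 1
        self.functions |= other.functions


def inline_call_site(caller, callee_names, graphs, cache):
    """Inline a multi-callee call site through a cache keyed on the callee set."""
    key = frozenset(callee_names)
    merged = cache.get(key)
    if merged is None:
        merged = Graph()  # built once per distinct callee set
        for name in callee_names:
            merged.inline_from(graphs[name])
        cache[key] = merged
    caller.inline_from(merged)  # one operation per call site


N, M = 100, 50  # N callees in the table, M calls through it
graphs = {f"f{i}": Graph([f"f{i}"]) for i in range(N)}
table = list(graphs)
caller = Graph(["main"])
cache = {}
for _ in range(M):
    inline_call_site(caller, table, graphs, cache)
```

With these numbers the sketch performs 150 inline operations instead of the 5000 a naive approach would need, since the merged graph is built once and then reused for every later call site with the same callee set.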

3.3.6 Merge Call Nodes for External Functions

One simple observation is that any nodes reachable from an unrecognized external function call will always be marked incomplete. Because of this, no DSA client will be able to do any substantial analysis or transformation of these nodes. There are several ways to exploit this to shrink graphs; for example, the compiler could simply merge all nodes reachable from any external function call.

For our implementation, we considered this too drastic: it eliminates the possibility of performing modular analysis (e.g. analyzing a library, generating DS Graphs for it, then using these precomputed graphs when compiling the main application). As a compromise, our implementation merges call nodes for external calls to the same function: this discards some amount of context sensitivity, but does not grossly pessimize the points-to information for external function calls (see footnote 8).

In practice, we find that this can greatly reduce the number of nodes associated with calls to common functions like printf, which often have globals (constant strings) passed as arguments. With this refinement, there is at most one node for printf format strings (per function), which contains all of the format strings in that context.
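The effect on printf can be illustrated with a small sketch in which each call node is a pair of an external callee name and a list of argument nodes, and each argument node is modelled as the set of globals it contains. The function `merge_external_calls` and this representation are assumptions of the example, not the actual data structures.

```python
from itertools import zip_longest


def merge_external_calls(calls):
    """Merge call nodes that target the same external function.

    `calls` is a list of (callee_name, [argument nodes]); arguments are
    merged position by position, so argument lists may differ in length.
    """
    merged = {}
    for callee, args in calls:
        old = merged.get(callee, [])
        merged[callee] = [set(a) | set(b)
                          for a, b in zip_longest(old, args, fillvalue=())]
    return merged


calls = [
    ("printf", [{"fmt1"}, {"x"}]),
    ("printf", [{"fmt2"}]),
    ("puts", [{"msg"}]),
]
merged = merge_external_calls(calls)
```

After merging, the function's graph has a single printf call node whose first argument holds both format strings, at the cost of no longer distinguishing the two call sites.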

3.3.7 Direct Call Nodes

The final, and simplest, refinement is based on the observation that direct function calls are far more common than indirect function calls. As such, our representation of call nodes allows either a callee node (as described above) or a callee function to be specified for the call. For direct function calls, this eliminates the need to allocate a DSNode to represent the callee. For indirect calls, a node is used to allow lazy resolution and multiple callees to be represented.
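One way to picture this representation is a call-site record that holds exactly one of the two callee forms. The classes `CalleeNode` and `CallSite` and their field names are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CalleeNode:
    """Stand-in for the DS node used to represent indirect callees."""
    functions: set = field(default_factory=set)


@dataclass
class CallSite:
    args: list
    direct_callee: Optional[str] = None       # set for direct calls
    callee_node: Optional[CalleeNode] = None  # set for indirect calls

    def __post_init__(self):
        # Exactly one of the two callee forms must be present.
        if (self.direct_callee is None) == (self.callee_node is None):
            raise ValueError("need exactly one of direct_callee or callee_node")

    def known_callees(self):
        if self.direct_callee is not None:
            return {self.direct_callee}
        return set(self.callee_node.functions)


direct = CallSite(args=["p"], direct_callee="foo")  # no node allocated
fp_node = CalleeNode()
indirect = CallSite(args=["p"], callee_node=fp_node)
fp_node.functions |= {"bar", "baz"}  # callees resolved lazily, later on
```

The direct call never allocates a callee node, while the indirect call sees new callees as soon as they are added to its node.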