
In order to evaluate alias analysis precision, the AA-EVAL client iterates over each function in the program. Within each function, AA-EVAL builds a set of pointers that are used by the various memory accesses in the body of the function (e.g., by load and store instructions). Given this set of pointers, it performs a simple O(N^2) alias query of every pointer against all of the others (see footnote 4) and counts the alias responses. Because MayAlias is the only response that indicates a lack of information, an analysis with a lower percentage of MayAlias responses is more precise than one with a higher percentage.
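The counting step described above can be sketched as follows. This is a minimal illustration, not the AA-EVAL implementation: the function `may_alias_percentage`, the string response names, and the toy oracle `toy_alias` (which models a pointer as a hypothetical `(base_object, offset)` pair) are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def may_alias_percentage(pointers, alias):
    """Query each unordered pair of distinct pointers once and return the
    percentage of "MayAlias" responses (0.0 when there are no pairs)."""
    # combinations() skips (p, p) and visits (p, q) only once, which matches
    # the observation that alias relations are symmetric and reflexive.
    responses = Counter(alias(p, q) for p, q in combinations(pointers, 2))
    total = sum(responses.values())
    return 100.0 * responses["MayAlias"] / total if total else 0.0


def toy_alias(p, q):
    """Hypothetical oracle: a pointer is (base_object, offset); a base of
    None means the analysis knows nothing about the pointer."""
    if p[0] is None or q[0] is None:
        return "MayAlias"
    if p[0] != q[0]:
        return "NoAlias"
    return "MustAlias" if p[1] == q[1] else "NoAlias"


pointers = [("a", 0), ("a", 4), ("b", 0), (None, 0)]
percentage = may_alias_percentage(pointers, toy_alias)
print(f"{percentage:.1f}% of queries returned MayAlias")
```

With four pointers there are six unordered pairs; the three pairs that involve the unknown pointer are the only ones the toy oracle cannot disambiguate.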

This portion of the AA-EVAL client produces a metric that is very similar to the “alias frequencies” described in [42]. The primary difference between that work and this evaluation is that they

Footnote 4: Because alias relations are symmetric [alias(X, Y) = alias(Y, X)] and a pointer always must-aliases itself [alias(Z, Z) = MustAlias], AA-EVAL only performs N^2/2 queries.

Figure 4.3: Percent of AA-EVAL Alias Queries Returned “May Alias”
[Chart not reproduced. X-axis (benchmarks, in order): 181.mcf, 256.bzip2, 164.gzip, 175.vpr, 197.parser, 186.crafty, 300.twolf, 255.vortex, 254.gap, 252.eon, 253.perlbmk, 176.gcc, 179.art, 183.equake, 171.swim, 172.mgrid, 168.wupwise, 173.applu, 188.ammp, 177.mesa, 129.compress, 130.li, 124.m88ksim, 132.ijpeg, 099.go, 134.perl, 147.vortex, 126.gcc, 102.swim, 101.tomcatv, 107.mgrid, 145.fpppp, 104.hydro2d, 110.applu, 103.su2cor, 146.wave5, fpgrowth, boxed-sim, NAMD, povray31. Series: local, steens-fi, steens-fs, anders, ds-aa.]

Figure 4.4: AA-EVAL Mod/Ref Query Responses of “May Mod or Ref”
[Chart not reproduced. X-axis: benchmark programs. Series: local, steens-fi, steens-fs, anders, ds-aa.]

Figure 4.5: AA-EVAL Mod/Ref Query Responses of “No Mod or Ref”
[Chart not reproduced. X-axis: benchmark programs. Series: local, steens-fi, steens-fs, anders, ds-aa.]

Figure 4.6: AA-EVAL Mod/Ref Query Responses of “May Only Ref”
[Chart not reproduced. X-axis: benchmark programs. Series: local, steens-fi, steens-fs, anders, ds-aa.]

Figure 4.7: AA-EVAL Mod/Ref Query Responses of “May Mod Only”
[Chart not reproduced. X-axis: benchmark programs. Series: local, steens-fi, steens-fs, anders, ds-aa.]

Figure 4.8: AA-EVAL Mod/Ref Query Responses for ds-aa
[Chart not reproduced. X-axis: benchmark programs. Series: Mod & Ref, Mod Only, Ref Only, No Mod/Ref.]

only consider one level of pointer dereference, whereas we consider all levels. For example, for the statement “*p = **q”, we would count all alias pairs <*p,*q>, <*q,**q>, and <*p,**q>, whereas Das et al. count only the last. A secondary difference is that we consider must-alias information to be accurate, whereas they count only no-alias as a precise response. We believe this second difference to be very minor, as the only analyses capable of returning must-alias information in this evaluation are the local and anders analyses, which should not impact the evaluation of the DSA-based analyses.

Figure 4.3 shows the percentage of AA-EVAL queries that return a MayAlias response for each of the benchmarks in our suite and for each alias analysis implementation. All of the charts in this section are grouped by benchmark suite and ordered according to the number of memory instructions in the program (to match the tables in Section 3.4). Close inspection of this figure confirms several properties of pointer analyses that have been previously discussed in the literature, and shows that DSA provides very accurate points-to information in addition to being able to support the macroscopic techniques described in this thesis.

• Trivial local analysis can successfully resolve a large number of queries, particularly in simple array-based programs that do not pass values heavily by reference [62]. In particular, three FORTRAN programs have over 75% of their alias queries disambiguated without any interprocedural analysis at all, and 10 programs across the suite have over 50% of their alias queries resolved by the local algorithm. We believe that this shows the importance of evaluating interprocedural analyses together with a local algorithm, to avoid overstating the contribution of the interprocedural technique.

• Any interprocedural analysis is far better than none in many cases (e.g., 256.bzip2, 186.crafty, 175.vpr, 179.art, and 129.compress), even if it is as simple as Steensgaard’s imprecise (but very fast) analysis. This argues for every compiler implementing some form of interprocedural pointer analysis if possible. Because Steensgaard’s algorithm is the most straightforward to implement and has an excellent worst-case complexity in its simplest form, it is probably the best candidate for an implementor who does not want to invest much time in pointer analysis.

• Field sensitivity can substantially improve the precision of unification-based analysis in programs that use multiple instances of structures with different types. While it makes no precision difference for a large number of programs, steens-fs is reasonably more precise than steens-fi for 188.ammp, fpgrowth, 175.vpr, 300.twolf, 176.gcc, 179.art, NAMD, povray, and for a large number of smaller programs that are not included in this data set (e.g., the Olden suite). If implementing a unification-based approach, adding field sensitivity should be considered. Note that steens-fs is more precise than anders for 188.ammp, due to the large contribution of field sensitivity.

• All other factors being equal, subset-based analysis is far more precise than unification-based analysis. While it is clear from the formulation that subset-based analysis is at least as precise as unification-based analysis, the numbers show that in many cases a subset-based analysis (such as anders) is far superior in practice. Given a choice between implementing basic Steensgaard’s algorithm and Andersen’s algorithm, and given the resources to implement all of the refinements that make Andersen’s algorithm scalable in practice, Andersen’s should be far preferred.

• Adding context sensitivity to a unification-based pointer analysis can allow it to meet or exceed the precision of a subset-based analysis in most cases. Others have shown that either limited (e.g., [41]) or full (e.g., [92]) context sensitivity can be used to achieve this added precision.

Our experience (matching other researchers [41, 53]) is that bidirectional argument binding is the leading cause of precision loss in a unification-based analysis. This problem can be solved either by using context sensitivity or by using a subset-based analysis. Note that adding context sensitivity to a subset-based analysis has been shown to provide only a marginal increase in precision [55] and can be impractically expensive [103, 102].

• Using a cloning-based context-sensitive analysis can yield far more accurate points-to results than using a static naming scheme for heap and stack objects [92, 103, 140]. The effect is most pronounced in programs that use a large amount of heap-allocated data and have few static allocation sites. For example, the 175.vpr, 300.twolf, and 252.eon programs have simple wrapper functions around malloc that prevent the context-insensitive algorithms from detecting the independence of any memory allocated from these wrappers. While special-purpose tricks [62] can be used to address this problem in limited cases, only full context sensitivity can address the problem in its full generality. Note that context-sensitive algorithms that name heap objects by their static allocation site will suffer the same precision problems as context-insensitive algorithms for such programs.
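The malloc-wrapper problem from the last point above can be illustrated with a small sketch. It is a deliberate simplification and not how DSA or any of the evaluated analyses is implemented: the allocation records, the call-path labels, and the helper names (`points_to`, `by_static_site`, `by_calling_context`, `may_alias`) are all hypothetical.

```python
# Each record: (pointer variable, call path that reached the allocation,
#               static allocation site). Both allocations go through the same
# hypothetical wrapper, so they share one static malloc site.
allocations = [
    ("a", ("main:call1",), "wrapper:malloc"),
    ("b", ("main:call2",), "wrapper:malloc"),
]


def points_to(allocations, name_object):
    """Build points-to sets using the given heap-object naming scheme."""
    pts = {}
    for var, call_path, site in allocations:
        pts.setdefault(var, set()).add(name_object(call_path, site))
    return pts


def by_static_site(call_path, site):
    """Static naming: one abstract object per allocation site."""
    return site


def by_calling_context(call_path, site):
    """Context-based naming: a distinct abstract object per call path."""
    return (call_path, site)


def may_alias(pts, x, y):
    """Two pointers may alias if their points-to sets overlap."""
    return bool(pts[x] & pts[y])


static_pts = points_to(allocations, by_static_site)
context_pts = points_to(allocations, by_calling_context)
print("static site naming:", may_alias(static_pts, "a", "b"))
print("context naming:    ", may_alias(context_pts, "a", "b"))
```

Under static naming both pointers refer to the single object named after the wrapper's malloc call, so the two allocations cannot be told apart; naming objects by calling context keeps them distinct.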

Overall, these numbers show that the raw alias disambiguation precision of DSA is comparable to Andersen’s algorithm in many cases (256.bzip2, 164.gzip, 183.equake, 176.gcc, 129.compress, etc.), only occasionally slightly worse (197.parser, 255.vortex), and far better in several (181.mcf, 175.vpr, 186.crafty, 300.twolf, 172.mgrid, fpgrowth, NAMD, etc.). Cases where Andersen’s algorithm is more precise than DSA are those in which the precision advantage of a subset-based (instead of unification-based) approach outweighs the precision advantage of a context-sensitive (instead of context-insensitive) approach.
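The precision gap between subset-based and unification-based analysis can be seen on a four-statement program: p = &a; q = &b; r = p; r = q. The sketch below is a toy model under stated assumptions, not the anders or steens implementations evaluated here: it handles only address-of and copy statements on single-level pointers, and the names `andersen`, `steensgaard`, and `UnionFind` are illustrative.

```python
class UnionFind:
    """Minimal union-find with path halving."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        self.parent[self.find(x)] = self.find(y)


def andersen(addr_of, copies):
    """Subset-based: a copy dst = src only requires pts(src) <= pts(dst)."""
    pts = {}
    for ptr, obj in addr_of:
        pts.setdefault(ptr, set()).add(obj)
    changed = True
    while changed:  # iterate to a fixed point
        changed = False
        for dst, src in copies:
            new = pts.get(src, set()) - pts.setdefault(dst, set())
            if new:
                pts[dst] |= new
                changed = True
    return pts


def steensgaard(addr_of, copies):
    """Unification-based: a copy dst = src merges the pointees of both."""
    uf = UnionFind()
    for ptr, obj in addr_of:
        uf.union(("pointee", ptr), ("obj", obj))
    for dst, src in copies:
        uf.union(("pointee", dst), ("pointee", src))
    objects = {obj for _, obj in addr_of}
    variables = {p for p, _ in addr_of} | {v for pair in copies for v in pair}
    return {
        v: {o for o in objects if uf.find(("obj", o)) == uf.find(("pointee", v))}
        for v in variables
    }


addr_of = [("p", "a"), ("q", "b")]   # p = &a; q = &b
copies = [("r", "p"), ("r", "q")]    # r = p;  r = q
subset_pts = andersen(addr_of, copies)
unify_pts = steensgaard(addr_of, copies)
print("subset-based:     ", subset_pts)
print("unification-based:", unify_pts)
```

In the subset-based result only r points to both objects, so p and q are still disambiguated. The unification-based result merges a and b into one class, so p and q appear to alias, which is the same effect that binding several actual arguments to one formal parameter has.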