Figure 5.24: Trend Lines Generated for Average Solution Birth Generation for Successful Experiments in Scalability Studies. (a) 2011 Study; (b) 2012 Study
Since the original study, a number of high-impact changes have been made to CASC that affect the search space being navigated. Most notably (in the context of this study), the system now represents a much larger portion of the C++ language and supports many more possible code modifications. With these enhancements the system can handle a much wider variety of programs and bugs, but at the cost of a much larger search space.
The CASC system used in the 2011 study supported 32 node types for program tree construction. In the 2011 version, variable names were discovered by the system when they were used in code (as opposed to the declarations being specifically sought out). The code modifications possible were similar to what is used in the current version, but were less guided.
The version of the CASC system described in this dissertation supports 59 node types for program tree construction. Even though the programs used in the study are the same (i.e., contain the same base code elements), code elements not already present in the source code can be introduced during code modification. With more node types that can be generated (as described in Section 5.2.4.4), this represents a significant increase in search space over the version used in the 2011 study. Additionally, the CASC system now identifies all names available in the ES through complete source code parsing. Each variable name identified is essentially another node type available for use in code generation; as such, each name identified further increases the search space. These increases in search space are offset by additional guidance in the search. The major sources of search guidance are the identification and exploitation of apparent code intention, variable type and qualifier sensitivity, and finer granularity in performance assessment through MOOP.
The goal of reproducing this study is to provide grounds for comparison between the version of the CASC system used in the original study and the current version. Using the previous study as a benchmark, this comparison provides insight into how well the current version of the CASC system handles the increase in problem space.
The same parameter values used in the original study (shown in Table 5.9 on page 125) were used when appropriate; otherwise, the parameter values used in the general experiments were used (shown in Table 5.6 on page 109). MOOP was used in the new study, since it was shown to be generally at least as good as SOOP, and better in many cases.
Table 5.11 summarizes the results from the new study. When compared to the previous results summary in Table 5.10 on page 126, it is clear from the observed success rates that the system was able to find solutions much more consistently, despite the increased problem space.
Table 5.11: Results Summary for 2012 Scalability Study

              Success Rate    Average Birth Gen. of Solution
OneLine       100%            10
TwoLines      98%             35
ThreeLines    98%             47
FourLines     94%             40
FiveLines     92%             66
SixLines      88%             80
SevenLines    94%             89
EightLines    92%             75
NineLines     100%            89
Figure 5.23b provides a summary of the birth generations for solution programs found in successful runs of the new experiments. The generation count used to create this figure only includes generations in the Testing and Correction module, since that is the location where the program search space is being navigated in the CASC system.
In these plots, the cross symbols indicate the median of the recorded birth generations and the bottom and top of the boxes are the first and third quartiles.
In general, the boxes for the new study are smaller and lower in the plot. Smaller boxes indicate more consistency in the data, while lower boxes indicate that solutions are found earlier in the run.
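The box statistics described above can be sketched as follows. This is a minimal illustration, not the procedure used to produce the figures: the function name `box_summary` is hypothetical, the two samples are made-up values rather than data from either study, and the choice of the "inclusive" quartile method is an assumption.

```python
import statistics

def box_summary(birth_generations):
    """Return the quartiles that define one box in a box plot."""
    # quantiles(..., n=4) returns the three cut points: Q1, median, Q3.
    q1, median, q3 = statistics.quantiles(
        birth_generations, n=4, method="inclusive"
    )
    # The box height is the interquartile range (Q3 - Q1).
    return {"q1": q1, "median": median, "q3": q3, "iqr": q3 - q1}

# Hypothetical birth generations for two sets of successful runs
# (illustrative values only, NOT taken from the 2011 or 2012 study).
spread_out_runs = [20, 35, 50, 80, 120, 160, 200]
consistent_runs = [30, 34, 38, 40, 43, 47, 52]

wide_box = box_summary(spread_out_runs)
narrow_box = box_summary(consistent_runs)

print("spread-out runs:", wide_box)
print("consistent runs:", narrow_box)
```

A smaller `iqr` corresponds to a shorter box (more consistent birth generations), and a smaller `median` corresponds to a box that sits lower in the plot (solutions found earlier).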
As discussed in the previous section, the addition of the fourth line introduced a great deal of interdependence between the lines being evolved, resulting in a significantly more complex search space. This increase in complexity is reflected in the solution birth generations for the 2011 study. The plots for the 2012 study, however, do not indicate that the system has any issue dealing with this increase in search space complexity. This can likely be attributed to the addition of context sensitivity to the code modifications supported. With this addition, the system is able to identify apparent intent in the code and perform modifications accordingly. In the context of this study, this results in increased sensitivity to the fact that the fourth line is a branch statement, and that the condition for the branch should be modified with that in mind.
Figure 5.24b shows a plot of the average birth generation for solutions in this study. In the same manner as the 2011 study, trend lines have been added to this plot to approximate the system's rate of convergence on a solution. Again, it is clear that the system performed much more consistently in the 2012 study, as the line plot of average solution birth generation is much smoother than in the 2011 study. Both linear and logarithmic trend lines are shown in this plot. The R² value is 0.8671 for the linear trend line and 0.9055 for the logarithmic trend line. The higher R² values (relative to the 2011 study) indicate a tighter fit of the trend lines to the data. This indicates that the current CASC system holds more tightly to the hypothesis that the convergence rate for the system is at worst linear, and very possibly sub-linear, with program size.
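The trend-line comparison can be sketched from the averages in Table 5.11. The sketch rests on assumptions not stated in the text: that the trend lines are ordinary least-squares fits, that program size is the number of lines (1 through 9), and that the logarithmic trend has the form y = a ln(x) + b. The function name `least_squares_fit` is illustrative only.

```python
import math

def least_squares_fit(xs, ys):
    """Fit y = slope * x + intercept; return (slope, intercept, r_squared)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-predictor least-squares fit, R^2 equals the squared
    # correlation between x and y.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Average solution birth generations from Table 5.11 (OneLine .. NineLines).
avg_birth_gen = [10, 35, 47, 40, 66, 80, 89, 75, 89]
# Assumption: program size is the number of evolved lines, 1 through 9.
program_size = list(range(1, len(avg_birth_gen) + 1))

# Linear trend: fit against x directly.
linear = least_squares_fit(program_size, avg_birth_gen)
# Logarithmic trend: fit against ln(x), i.e. y = a * ln(x) + b.
logarithmic = least_squares_fit(
    [math.log(x) for x in program_size], avg_birth_gen
)

print(f"linear:      R^2 = {linear[2]:.4f}")
print(f"logarithmic: R^2 = {logarithmic[2]:.4f}")
```

Under these assumptions, the two fits should land close to the R² values reported for the 2012 study, with the logarithmic form fitting slightly better than the linear one.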