

In the simulation phase of the experiment (subsection 7.1), we always mined the same data set with Apriori, which eliminated the effect of changes in data set features on the constructed behavior model. In a real-life situation, however, the data set to be mined may grow or shrink, either through the addition or deletion of instances or through changes to the attribute set. Variations in the features of the mining data set may necessitate refreshing the behavior model that is used to recommend algorithm configurations for mining this data set. We therefore performed a series of experiments to confirm that the size of the mining data set may affect the parameter setting recommendations, and to explore how to detect that the behavior model has decayed. In these experiments, we used the quality measurements collected during the execution of Apriori to assess whether the recommended parameter settings provide the requested quality. The experiments rely on a behavior model that we call the basis behavior model, generated as explained in subsection 7.2.2 from the execution records collected by running Apriori with the input data set DSx1 in a simulated ubiquitous computing environment similar to the one explained in subsection 7.1. The experiments evaluating the effect of data set size variations are briefly described as follows:

• Verify the recommendations. In this experiment, Apriori was configured with the recommended settings obtained from the basis behavior model and run with input DSx1 in the simulated ubiquitous computing environment for every possible recommendation. Afterwards, we determined the appropriateness of each recommendation by comparing the relevant quality measurement value collected during Apriori's execution against the requested quality used when deriving the recommendation from the Bayesian network. For example, if an Apriori configuration is recommended to minimize the memory usage of Apriori, we assess whether the objective is met by comparing the memory usage of Apriori's execution with this configuration against the lowest memory usage figures in the behavior model. The percentage of Apriori executions that achieve the objective of the parameter settings, grouped by the relevant quality measurement, is given in Figure 7.5. The percentage of deviation from the requested quality is also analyzed for each quality measurement group. The maximum deviation is ten percent of the requested quality, whereas the average deviation does not exceed five percent of the requested quality for any of the groups (Figure 7.5). These results are satisfactory to verify the appropriateness of the recommendations.

[Figure 7.5 plot: series "% success" and "average % deviation" for each quality measurement (Avg. mem, Max. mem, Total CPU, CPU cycles, Duration, Model supp); axis range 0 to 120]

Figure 7.5: Assessment of recommendations derived from Bayesian network

• Demonstrate the data set size effect. In this experiment, we examine whether the behavior models extracted from executions of the same data mining algorithm with the same configuration settings, but with different data set sizes, differ. For this purpose, we generated another behavior model, behavior model 10, in the same way as the basis behavior model, except that the data set used as input to Apriori (DSx10) is ten times larger than DSx1. We then compared behavior model 10 against the basis behavior model and found that half of the recommendation decisions had changed. This shows that the size of a data mining algorithm's input data set may affect certain parameter setting decisions made to achieve certain quality objectives.

• Estimate behavior model decay. In the final experiment, we gradually increased the size of the mining data set, mimicking a possible real-life situation in which a data set grows over time. Our purpose is to analyze how the recommendations deteriorate, in terms of achieving the requested quality, as the data set grows. We iteratively increased the size of the mining data set by twenty percent, ran Apriori with all the possible recommended parameter settings extracted from the basis behavior model while simulating the relevant circumstance, recorded the requested quality for each recommendation, and compared it against the achieved quality. During this process, we used the same behavior model (the basis behavior model) without populating it with new execution data or refreshing it completely. Figure 7.6 shows, for different mining data set sizes, the percentage of Apriori executions in which the objective of the parameter setting is achieved in terms of the quality obtained. The results show that the correctness of the configuration decisions derived from the behavior model is affected by the data set size change for all types of requested quality except the model minimum support. Furthermore, the basis behavior model decays and needs a refresh before DSx1 grows by forty percent. This experiment revealed that a change in mining data set size does not affect every parameter setting decision, but that when a parameter setting decision fails to provide the requested quality, the failure can be detected.

[Figure 7.6 plot: data set sizes DSx1, DSx20%, DSx40%, DSx60%, DSx80%; quality measurements Avg. mem, Max. mem, Total CPU, CPU cycles, Duration, Model supp; axis range 0 to 120]

Figure 7.6: Behavior model decay
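The bookkeeping behind the three experiments above can be sketched in a few lines of Python. This is only an illustrative sketch, not the implementation used in the experiments. The `Run` record, the function names, the `tolerance_pct` and `min_success_pct` parameters, and the representation of a behavior model's recommendations as a plain dictionary are all hypothetical. The sketch also assumes that every quality measurement is "lower is better", so that deviation is the overshoot above the requested value; this may not hold for every measurement in the study.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One Apriori execution under a recommended setting (hypothetical record)."""
    measure: str      # quality measurement group, e.g. "Avg. mem"
    requested: float  # quality requested when the recommendation was derived
    achieved: float   # quality measured during the execution


def deviation_pct(run: Run) -> float:
    # Assumption: lower is better, so only overshoot counts as deviation.
    overshoot = max(0.0, run.achieved - run.requested)
    return overshoot * 100.0 / run.requested


def summarize(runs, tolerance_pct=0.0):
    """Map each measure to (% of runs meeting the request, average % deviation)."""
    groups = defaultdict(list)
    for run in runs:
        groups[run.measure].append(deviation_pct(run))
    return {
        measure: (
            100.0 * sum(d <= tolerance_pct for d in devs) / len(devs),
            sum(devs) / len(devs),
        )
        for measure, devs in groups.items()
    }


def changed_fraction(basis, other):
    """Fraction of shared recommendation keys whose recommended setting differs."""
    shared = basis.keys() & other.keys()
    if not shared:
        return 0.0
    return sum(basis[key] != other[key] for key in shared) / len(shared)


def first_decayed_step(runs_by_size, min_success_pct, tolerance_pct=0.0):
    """Return the first data set size label at which any measure's success
    rate drops below min_success_pct, or None if none does."""
    for label, runs in runs_by_size:
        summary = summarize(runs, tolerance_pct)
        if any(success < min_success_pct for success, _ in summary.values()):
            return label
    return None
```

Under these assumptions, `summarize` corresponds to the per-measurement success and deviation figures of the verification experiment, `changed_fraction` to the comparison of behavior model 10 against the basis behavior model, and `first_decayed_step` to a simple refresh trigger for the decay experiment. The threshold at which a model counts as decayed is a design choice that the text does not fix, which is why it is left as a parameter here.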
