of 0.34 nm. Rotation about the axis of symmetry is what yields the MWNT. Top right: cone-shaped ends of MWCNTs, symmetric and asymmetric.
4.5 Modulated Flow Differential Calorimeter.
4.2.3 Experimental procedure.
The strength of association between the two sets of binary variables, the response variables and the explanatory variables, is assessed using the phi coefficient. The phi coefficient is commonly used to measure the strength of association between two binary variables (Chedzoy, 2006). Specifically, it expresses the amount of consistency (both variables share the same value) and inconsistency (the variables differ in value) that exists between the two binary variables (Chedzoy, 2006). To calculate the phi coefficient (see Equation 4.1), the number of observations that fall into each of the four categories formed by the two sets of binary variables is determined, as depicted in the contingency table in figure 4.7. A chi-square statistic with one degree of freedom can then be calculated from the phi coefficient, as shown in Equation 4.2, using the counts in the contingency table in figure 4.7. From this chi-square statistic, a p-value can be calculated to determine the significance of the strength of association between the two sets of binary variables. When cell sizes in the contingency table are small, Fisher's exact test is more appropriate, to avoid failing to identify a significant association. This should not be an issue for truly consistent subsets, because the minimum support threshold should be large enough that cell a overshadows small cell sizes in cells b-d.

When performing multiple tests of strength of association for different binary variables based upon the same set of observations, it is necessary to adjust the p-values to account for the increase in type-I error that occurs due to multiple testing. The p-values used to determine significant results have been adjusted for multiple testing using the False Discovery Rate (FDR) correction; details can be found in Benjamini and Hochberg (1995).
$$\Phi = \frac{(a \cdot d) - (b \cdot c)}{\sqrt{(a+b)(b+d)(a+c)(c+d)}} \tag{4.1}$$

$$\chi^2 = (a+b+c+d)\,\Phi^2 \tag{4.2}$$
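As a minimal sketch of how Equations 4.1 and 4.2 might be evaluated, the Python below computes the phi coefficient, the chi-square statistic, and a p-value from the four cell counts. The function names and the example counts are hypothetical and do not come from the original analysis. The p-value uses the identity that, for one degree of freedom, the upper tail of the chi-square distribution equals erfc(sqrt(x / 2)), so only the standard library is needed.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi coefficient for 2x2 cell counts a, b, c, d (Equation 4.1)."""
    denom = (a + b) * (b + d) * (a + c) * (c + d)
    if denom == 0:
        # A zero row or column total leaves phi undefined.
        raise ValueError("phi is undefined when a row or column total is zero")
    return (a * d - b * c) / math.sqrt(denom)


def chi_square_from_phi(a, b, c, d):
    """Chi-square statistic with 1 degree of freedom (Equation 4.2)."""
    n = a + b + c + d
    return n * phi_coefficient(a, b, c, d) ** 2


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Hypothetical counts, chosen only for illustration.
a, b, c, d = 40, 10, 5, 45
phi = phi_coefficient(a, b, c, d)
chi2 = chi_square_from_phi(a, b, c, d)
p_value = chi_square_p_value_1df(chi2)
print(f"phi={phi:.4f} chi2={chi2:.4f} p={p_value:.3e}")
```

This sketch does not switch to Fisher's exact test for small cell counts; that check would have to be added separately.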
Our use of the phi coefficient differs slightly from its traditional use because we are looking at the association between two sets of binary variables, as opposed to two individual binary variables. These sets of binary variables represent distinct patterns in which all variables in a set are either active (ones) or inactive (zeros). Because we are looking at sets of variables, certain observations may remain undefined, since they contain a mixture of activity (ones and zeros) for a given set of variables. In this sense, we subset the observations under consideration down to the data that can be defined as active or inactive given a variable set definition, and we test for strength of association between the two variable sets. Using the phi coefficient, we are attempting to judge the consistency between two variable sets, here response and explanatory variables, over a subset of data in which the phi coefficient can be clearly defined. Because the four categories of the contingency table are so closely related to each other, we only have to mine for one of the four cells of the contingency table to define the others when calculating strength of association between two variable sets. In our case, we use closed frequent itemset mining to define all combinations of response and explanatory variable sets that are all active (ones). The only criteria placed upon the combinations discovered are that they contain at least one response variable and at least one explanatory variable and that they meet the minimum support threshold for active observations. The cartoon in figure 4.4 depicts the closed frequent itemsets where response and explanatory variable sets are all active (a, pink box) and the associated subset that contains all four categories (a-d, yellow box).

[Figure 4.7 (image not reproduced): a binary data matrix with axes labeled "Observations/Subjects" and "Response & Explanatory Variables", showing variables R1, R2, E1, E7, E9, E11, E14 with regions a, b, c, and d marked, above a 2x2 contingency table that crosses the response state (1 or 0) with the explanatory state (111... or 000...). Cells in which both states agree are marked Consistent and the other two Inconsistent. The figure restates Equations 4.1 and 4.2, noting that the chi-square statistic has 1 degree of freedom.]

Figure 4.7: Statistic of Association. The binary matrix shows the mined closed frequent itemset R1, R2, E1, E7, E9, E11, E14, as indicated by the pink box labeled a. The other three categories of data, denoted by b, c, and d, define the remaining three combinations of response and explanatory variables in the contingency table below the matrix. This contingency table depicts the data partitioning used to determine strength of association between the two binary sets, response variables and explanatory variables. Observations that cannot be classified into the four categories are ignored.
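The partitioning of observations into the four categories can be sketched as a single pass over a binary matrix. The Python below is only an illustration under stated assumptions: the function name, the row-per-observation layout, and the toy data are hypothetical, and the assignment of the letters b and c to the two inconsistent cells is a guess at the figure's labeling. That guess does not affect the result, because Equation 4.1 is unchanged when b and c are swapped.

```python
def contingency_counts(rows, response_cols, explanatory_cols):
    """Count cells a-d for one (response set, explanatory set) pair.

    rows: iterable of 0/1 sequences, one per observation (assumed layout).
    Returns (counts, ignored), where counts maps "a".."d" to integers and
    ignored is the number of observations with mixed activity in either set.
    """
    def state(row, cols):
        values = [row[j] for j in cols]
        if all(v == 1 for v in values):
            return 1  # every variable in the set is active
        if all(v == 0 for v in values):
            return 0  # every variable in the set is inactive
        return None   # mixture of ones and zeros: undefined

    # (response state, explanatory state) -> cell label.
    # The b/c assignment is an assumption about the figure's labeling.
    cell = {(1, 1): "a", (1, 0): "b", (0, 1): "c", (0, 0): "d"}
    counts = {"a": 0, "b": 0, "c": 0, "d": 0}
    ignored = 0
    for row in rows:
        r = state(row, response_cols)
        e = state(row, explanatory_cols)
        if r is None or e is None:
            ignored += 1
        else:
            counts[cell[(r, e)]] += 1
    return counts, ignored


# Toy data: columns 0-1 are response variables, columns 2-3 explanatory.
data = [
    [1, 1, 1, 1],  # a: both sets active
    [1, 1, 1, 1],  # a
    [1, 1, 0, 0],  # b: response active, explanatory inactive
    [0, 0, 1, 1],  # c: response inactive, explanatory active
    [0, 0, 0, 0],  # d: both sets inactive
    [1, 0, 1, 1],  # ignored: mixed response set
    [1, 1, 1, 0],  # ignored: mixed explanatory set
]
counts, ignored = contingency_counts(data, [0, 1], [2, 3])
print(counts, ignored)
```

In practice the count for cell a would already be known from the support of the mined closed frequent itemset, so the pass is only needed for the remaining three cells.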
The data matrix at the top of figure 4.7 depicts an example closed frequent itemset, R1, R2, E1, E7, E9, E11, E14, as indicated by the pink box labeled a and its similarly labeled cell in the contingency table below the matrix. The remaining three cells of the contingency table, b-d, are indicated by the blue lines and letter labels in the data matrix. Once the mined closed itemsets have been defined, in our case where both response and explanatory variables are all active, the remaining three categories can be calculated by a single pass through the data. This determines the strength of association for each of the variable sets defined by the mined closed itemsets and adjusts this value to determine its significance with regard to the entire dataset. Therefore, given a support threshold, in our case the number of observations that must be all active, we can rank the subsets of data and determine which are most statistically significant, in terms of strength of association, given the entire distribution of the data. This provides a powerful way to determine which subsets of data most likely show a true association between explanatory and response variables when considering all possible enumerations. Given noisy and inconsistent data, this methodology can provide insight into novel associations between explanatory and response variables that could not be efficiently discovered without prior knowledge of a likely relationship between them.

[Figure 4.8 (image not reproduced): a binary data matrix with axes labeled "Observations/Subjects" and "Response & Explanatory Variables", showing variables R1, R2, E1, E7, E9, E11, E14 with regions a, b, c, and d marked, above a 2x2 contingency table that crosses the response state (1 or 0) with approximate explanatory patterns (shown as 110... and 010...). Cells are marked Consistent or Inconsistent, and the figure restates Equations 4.1 and 4.2, noting that the chi-square statistic has 1 degree of freedom.]

Figure 4.8: Statistic of Association for Approximate Itemsets. The primary difference between calculating the phi coefficient for approximate itemsets and for closed itemsets is that approximate itemsets allow some proportion of zeros in the observations that define the four categories of the contingency table, for the explanatory variables only.
explanatory variable set could now contain a mixture of ones and zeros, as long as they did not exceed the row and column constraints placed upon approximate sets as defined by the user. The count for the contingency cell representing the case in which the response and explanatory variables are considered active, indicated by the pink box and the letter a in figure 4.8, is given by the number of observations (support) that an approximate frequent itemset has. The more challenging issue is to define the remaining three categories while allowing the same user-defined approximation for the explanatory variables. The algorithm first uses the results from the approximate sets to define the sets of response and explanatory variables under consideration. Then, in a single pass through the data, the algorithm defines three sets of observations that meet the row and column constraints placed upon approximate sets. Next, for each of the three sets, the algorithm combines the observations, in a fashion similar to finding an approximate set based upon closed frequent itemsets, to determine the maximum number of observations that can be combined without breaking the column and row constraints. Whichever combination gives the maximum count supplies the observations used for the corresponding cell of the contingency table when calculating the phi coefficient. The chi-square statistic and the FDR adjustment of the p-value are then calculated using these counts. The approximate frequent itemsets represent the union of closed frequent itemsets; thus, results that include approximate frequent itemsets also include closed frequent itemsets that could not be made approximate, and the FDR adjustment of the p-values accounts for all of these results.
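The FDR adjustment mentioned above can be illustrated with the standard Benjamini-Hochberg step-up procedure. The Python below is a sketch of that textbook procedure, not the code used for the reported results. The function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values, one per tested (response set, explanatory set) pair.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
print(adj)
```

Because the adjustment depends on the total number of tests, the list passed in should contain the p-values for all results under consideration, closed and approximate itemsets alike, as the text describes.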