CHAPTER IV: IMPLEMENTATION
4.2. Personnel Induction Process
Following data preparation, including the elimination of variables with missing weights and the imputation of missing values, cluster analyses were conducted to determine whether there were any naturally occurring groups within the data on several variables previously thought to be pertinent to child maltreatment. The purpose of this phase of the study was to identify varying profiles of parental characteristics as a way to understand differences across parenting age groups as well as across different forms of child maltreatment. Because parental characteristics were the primary focus of this phase of the study, only these variables were used in the cluster analyses. Other variables, including living arrangement, receipt of financial support, child characteristics, more distal ecological factors, and socio-demographic characteristics, were used in later analyses to further describe the clusters of parental characteristics once they had been identified.
As previously mentioned, cluster analyses were performed using Stata 11. Prior to using the cluster command, the variables of interest were standardized using range standardization; that is, each variable was divided by its range, so that the range-standardized variables all had ranges less than or equal to 1. According to Wilson, Kuebli, and Hughes (2005), “Cluster analyses are affected by highly correlated variables because such variables are implicitly weighted more heavily (Hair & Black, 2000)” (p. 990). Wilson et al. (2005) further report that intercorrelations greater than .80 are considered problematic. Intercorrelations were therefore calculated for all of the variables used in the cluster analyses. All of the intercorrelations fell below the .80 threshold, and the variables were thus deemed appropriate for cluster analyses.
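The preparation steps described above can be sketched in a few lines. The study itself used Stata 11; the Python below is only an illustrative sketch, and the variable names and toy values are hypothetical rather than taken from the study data. It divides each variable by its range and then flags any pair of variables whose absolute Pearson correlation exceeds the .80 threshold.

```python
from itertools import combinations
from math import sqrt


def range_standardize(values):
    """Divide every value by the variable's range (maximum minus minimum)."""
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("a constant variable has zero range")
    return [v / span for v in values]


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_high_correlations(columns, threshold=0.80):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical toy data, not values from the study.
raw = {
    "depressive_symptoms": [2.0, 10.0, 6.0, 4.0, 8.0],
    "emotional_closeness": [5.0, 1.0, 4.0, 2.0, 3.0],
}
standardized = {name: range_standardize(col) for name, col in raw.items()}
problem_pairs = flag_high_correlations(standardized)
```

An empty `problem_pairs` list corresponds to the situation reported here, in which no intercorrelation exceeded the threshold. Because correlation is unaffected by rescaling, the check gives the same answer before and after standardization.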
For the first step in the cluster analyses, a hierarchical agglomerative method using Ward’s minimum variance technique and the squared Euclidean distance measure was employed to identify patterns of parents’ characteristics. Ward’s method “joins the two groups that result in the minimum increase in the error sum of squares” (Hamilton, 2009, p. 352). The squared Euclidean distance measure is the recommended distance measure when using Ward’s clustering algorithm (Hair & Black, 2000). There are few hard-and-fast rules for determining the most appropriate number of clusters. Stata offers several postclustering commands to
method is called the Duda, Hart, and Stork (2001) Je(2)/Je(1) index with associated pseudo-T squared. For the Calinski Harabasz pseudo-F index and Duda-Hart index, larger values are indicative of more distinct clustering. Pseudo-T squared values are also presented with the Duda-Hart index, however, and smaller pseudo-T squared values indicate more distinct clustering. Thus, the Stata Multivariate Statistics Reference Manual Release 11 (2009) suggests, when examining the Duda-Hart Je(2)/Je(1) index, “to find one of the largest Je(2)/Je(1) values that corresponds to a low pseudo-T-squared value that has much larger T-squared values next to it” (p. 164). The Calinski Harabasz pseudo-F index and the Duda-Hart index are only suitable for continuous data (Everitt et al., 2001). Lastly, a step-size stopping rule (Milligan & Cooper, 1985) was implemented, which “computes the difference in fusion values between levels in a hierarchical cluster analysis” (StataCorp, 2009, p. 144). It is reported that “large values of the step-size stopping rule indicate groupings with more distinct cluster structure” (p. 144). Each step-size value represents the difference between the matching coefficient at one level of the hierarchy and that at the level with one more group. These methods, in addition to inspection of the cluster dendrogram, were used to determine which solutions were worth following up with a nonhierarchical K-means cluster analysis.
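To make the two index-based stopping rules concrete, the sketch below computes the Calinski Harabasz pseudo-F for a given partition, and the Duda-Hart Je(2)/Je(1) ratio with its pseudo-T squared for a candidate split of one group into two. It is a minimal Python illustration using the standard textbook formulas, not the Stata routine used in the study, and the toy points are hypothetical.

```python
def centroid(points):
    """Coordinate-wise mean of a list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def sum_sq_error(points):
    """Sum of squared Euclidean distances from each point to the centroid."""
    c = centroid(points)
    return sum(sum((x - m) ** 2 for x, m in zip(p, c)) for p in points)


def calinski_harabasz(clusters):
    """Pseudo-F: (between SS / (g - 1)) / (within SS / (N - g))."""
    all_points = [p for cluster in clusters for p in cluster]
    n, g = len(all_points), len(clusters)
    within = sum(sum_sq_error(c) for c in clusters)
    between = sum_sq_error(all_points) - within
    return (between / (g - 1)) / (within / (n - g))


def duda_hart(left, right):
    """Je(2)/Je(1) and pseudo-T squared for splitting one group in two.

    Je(1) is the error sum of squares of the undivided group;
    Je(2) is the summed error sum of squares of the two subgroups.
    """
    je1 = sum_sq_error(left + right)
    je2 = sum_sq_error(left) + sum_sq_error(right)
    pseudo_t2 = (je1 - je2) / (je2 / (len(left) + len(right) - 2))
    return je2 / je1, pseudo_t2


# Hypothetical toy points: two tight, well-separated groups ...
far_a = [[0.0, 0.0], [0.0, 2.0]]
far_b = [[10.0, 0.0], [10.0, 2.0]]
pseudo_f = calinski_harabasz([far_a, far_b])
split_far = duda_hart(far_a, far_b)

# ... and a split of one fairly homogeneous group.
near_a = [[0.0, 0.0], [0.0, 2.0]]
near_b = [[1.0, 0.0], [1.0, 2.0]]
split_near = duda_hart(near_a, near_b)
```

In this sketch a Je(2)/Je(1) ratio close to 1 with a small pseudo-T squared (as in `split_near`) means that dividing the group gains little, which is consistent with reading large Je(2)/Je(1) and small pseudo-T squared values as signs that the current number of groups is already distinct.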
K-means cluster analysis is a partitioning method, which breaks “the observations into a pre-set number of nonoverlapping groups” (Hamilton, 2009, p. 351). The number of clusters most consistently found across clustering methods in step one was used as a starting point for forming clusters in step two. Thus, the cluster centers were set using the mean scores of each cluster derived in step one using Ward’s method. This allowed cluster members to be reassigned to the cluster with the nearest centroid, when appropriate. When several K-means cluster solutions were explored, the Calinski Harabasz pseudo-F index was again inspected for each cluster solution to determine the one most appropriate for the data. In addition, the sum of the within-group sums of squares over all of the variables was used to assess cluster solution fit, with minimization of the error sum of squares considered optimal.
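The two-step logic described above, in which the hierarchical solution supplies the starting centers for K-means, can be sketched as follows. This is a plain Lloyd-style K-means written in Python for illustration only; it is not the Stata procedure used in the study, and the points and step-one labels are hypothetical.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def centers_from_labels(points, labels):
    """Starting centers: the mean of each group found in step one."""
    groups = sorted(set(labels))
    return [mean_point([p for p, lab in zip(points, labels) if lab == g])
            for g in groups]


def kmeans_from_centers(points, centers, max_iter=100):
    """Reassign points to the nearest center until assignments stop changing."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda k: squared_distance(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous center if a cluster empties
                centers[k] = mean_point(members)
    # Sum of the within-group sums of squares over all variables.
    wss = sum(squared_distance(p, centers[lab])
              for p, lab in zip(points, labels))
    return labels, centers, wss


# Hypothetical data; the step-one labels deliberately misplace the point [5, 5].
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
step_one_labels = [0, 0, 0, 0, 1, 1]
start = centers_from_labels(points, step_one_labels)
labels, centers, wss = kmeans_from_centers(points, start)
```

In this toy case the misplaced point is moved to the cluster with the nearest centroid, which is the kind of reassignment the second step is meant to permit, and `wss` is the quantity whose minimization was treated as a sign of better fit.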
A second hierarchical agglomerative method, weighted-average linkage using the squared Euclidean distance measure, was used as an alternative clustering algorithm with which to compare the clustering results derived from Ward’s method. Hamilton (2009) reports that for weighted-average linkage, “two groups are given equal weighting regardless of how many observations there are in each group” (p. 352). This method was chosen to counterbalance the drawbacks posed by Ward’s method, as Hamilton maintains that Ward’s linkage “Does well with groups that are multivariate normal and of similar size, but poorly when clusters have unequal numbers of observations” (p. 352). Cluster solutions were explored by inspecting the cluster dendrogram and examining the Calinski Harabasz pseudo-F index, the Duda-Hart index and associated pseudo-T squared values, and the step-size table. As with Ward’s method, follow-up K-means cluster analyses were then conducted using the means of the most stable cluster solutions suggested by the aforementioned rules. Where multiple K-means cluster solutions were examined for how well they described the data, the Calinski Harabasz pseudo-F index and the minimization of the error sum of squares were inspected to help determine the most appropriate K-means cluster solution.
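The contrast between the two hierarchical methods can be illustrated with their merge rules. The Python sketch below uses the standard textbook update formulas, not code from the study: weighted-average linkage averages the two old dissimilarities with equal weights, ordinary (unweighted) average linkage is shown only for contrast, and Ward’s criterion is written as the increase in the error sum of squares caused by a merge. All numbers are hypothetical.

```python
def weighted_average_update(d_ki, d_kj):
    """Dissimilarity from group k to the merged group (i, j).

    The two merged groups get equal weight, whatever their sizes.
    """
    return 0.5 * d_ki + 0.5 * d_kj


def average_update(d_ki, d_kj, n_i, n_j):
    """Unweighted average linkage, for contrast: weights follow group sizes."""
    return (n_i * d_ki + n_j * d_kj) / (n_i + n_j)


def ward_increase(center_a, n_a, center_b, n_b):
    """Increase in the error sum of squares if groups a and b are merged."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(center_a, center_b))
    return (n_a * n_b) / (n_a + n_b) * sq_dist


# Hypothetical case: group i has 9 observations, group j has only 1.
d_ki, d_kj = 4.0, 10.0
wpgma = weighted_average_update(d_ki, d_kj)
upgma = average_update(d_ki, d_kj, n_i=9, n_j=1)

# Ward's merge cost for two hypothetical groups of two observations each.
cost = ward_increase([0.0, 0.0], 2, [3.0, 4.0], 2)
```

With these toy values the single-observation group pulls the weighted-average result as strongly as the nine-observation group does, whereas in the size-weighted version it has little influence. This is the sense in which the two groups receive equal weighting regardless of their size.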
Ward’s linkage followed by a K-means cluster analysis and weighted-average linkage, also followed by a K-means cluster analysis, were used to cluster the three ordered categorical maltreatment variables and the continuous depressive symptoms and emotional closeness with parents variables. Alternative methods for clustering the observations were also attempted by using the continuous summary maltreatment index along with the depressive symptomatology and emotional closeness with parents variables. The continuous summary maltreatment index could not be clustered in conjunction with the ordered categorical maltreatment variables due to high collinearity. This combination of clustering variables was employed in both Ward’s linkage and weighted-average linkage clustering algorithms. In addition, a binary measure of each of the three forms of parent maltreatment victimization was utilized in Ward’s linkage and weighted-average linkage cluster analyses. The combination of binary and continuous variables required the use of a dissimilarity measure suited to mixed data. The Gower dissimilarity coefficient was selected as the appropriate measure for a mixture of binary and continuous variables. Each of these alternative methods for examining and clustering the maltreatment variables was followed up with K-means cluster analyses. When non-continuous variables were included, step-size values were inspected to indicate the number of clusters to examine in the follow-up K-means cluster analyses. Following K-means cluster analyses of these continuous and binary variables, minimization of the error sum of squares was examined. These results are presented and described in Section 1 of Chapter 3.
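For readers unfamiliar with the Gower coefficient, the following Python sketch shows how a single dissimilarity can be computed from a mixture of binary and continuous variables. It assumes simple matching for binary variables (0 if the two values agree, 1 otherwise) and a range-scaled absolute difference for continuous variables, with no missing values; these are assumptions of the sketch, and the variable layout and values are hypothetical rather than drawn from the study.

```python
def gower_dissimilarity(row_a, row_b, kinds, ranges):
    """Average per-variable dissimilarity across mixed variable types.

    kinds  : "binary" or "continuous" for each variable
    ranges : observed range of each continuous variable (ignored for binary)
    """
    total = 0.0
    for a, b, kind, span in zip(row_a, row_b, kinds, ranges):
        if kind == "binary":
            total += 0.0 if a == b else 1.0   # simple matching (assumed)
        else:
            total += abs(a - b) / span        # difference scaled by the range
    return total / len(kinds)


# Hypothetical layout: three binary maltreatment indicators followed by
# two continuous scores (depressive symptoms, emotional closeness).
kinds = ["binary", "binary", "binary", "continuous", "continuous"]
ranges = [None, None, None, 24.0, 4.0]

parent_a = [1, 0, 0, 12.0, 3.0]
parent_b = [1, 1, 0, 6.0, 4.0]
d_ab = gower_dissimilarity(parent_a, parent_b, kinds, ranges)
```

Each variable contributes a value between 0 and 1, so the binary and continuous variables enter the dissimilarity on a comparable footing. Here the two hypothetical parents differ on one binary indicator and by a quarter of the range on each continuous score.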