Once we have a model for conceptualizing leadership and adequate response data for implementing the analysis, the third issue concerns the structure of the dataset and the models that can be used to investigate it correctly. Hierarchical data are very common in the social and behavioral sciences. Such data involve measurement at multiple levels, such as individuals and groups (for example, classes and schools). In general, hierarchical data are obtained by measuring units grouped at different levels. Groups can be defined as “natural” clusters of individuals. In particular, hierarchical data can be obtained by multistage sampling: for instance, one might sample schools within school districts, and then sample students within the sampled schools. Hierarchical data also arise in experimental contexts when the experimental plan has a nested treatment structure. In all these cases the individuals in the same group share some common characteristics, which may be observed or unobserved.
In hierarchical data analysis the lowest level of observation (commonly the individual level) is called the first level; moving up the hierarchy of the data one defines the second level, the third level, and so on. The variables introduced in a multilevel analysis can be observed at different levels, but the dependent variable must be collected at the first level.
In the social research context one generally considers the individuals as interacting with the social context to which they belong. This means that individuals are influenced by the characteristics of the groups and that the observations coming from the same group cannot be considered independent.
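The dependence among observations from the same group can be quantified with the intraclass correlation. The following is a minimal sketch, assuming balanced groups and simulated data; the function name `intraclass_correlation`, the one-way ANOVA estimator and the variance values are illustrative choices, not part of the original analysis.

```python
import numpy as np

def intraclass_correlation(groups):
    """One-way ANOVA estimator of the intraclass correlation (balanced groups)."""
    data = np.asarray(groups, dtype=float)      # shape: (n_groups, group_size)
    n_groups, size = data.shape
    group_means = data.mean(axis=1)
    grand_mean = data.mean()
    # Mean squares between and within groups
    ms_between = size * ((group_means - grand_mean) ** 2).sum() / (n_groups - 1)
    ms_within = ((data - group_means[:, None]) ** 2).sum() / (n_groups * (size - 1))
    return (ms_between - ms_within) / (ms_between + (size - 1) * ms_within)

# Simulated data (assumption): a shared group effect plus individual noise,
# both with unit variance, so the population intraclass correlation is 0.5.
rng = np.random.default_rng(0)
n_groups, size = 200, 10
group_effect = rng.normal(0.0, 1.0, size=(n_groups, 1))
noise = rng.normal(0.0, 1.0, size=(n_groups, size))

icc_clustered = intraclass_correlation(group_effect + noise)  # clearly above zero
icc_independent = intraclass_correlation(noise)               # close to zero
print(icc_clustered, icc_independent)
```

An intraclass correlation clearly above zero signals that treating the observations as independent would be inappropriate.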
The analysis of hierarchical data (involving characteristics measured at different levels of aggregation) can be approached either by aggregating the disaggregated measures or by disaggregating the aggregated ones. Both solutions have statistical drawbacks. Disaggregation produces spurious statistical significance in regression model estimation, because the number of observations is inflated without introducing any new information into the model specification; some authors have called this practice “the miraculous multiplication of the number of units” (most of the time the information used for the disaggregation procedure is connected with one or more individual-level variables). Aggregation, on the other hand, causes a loss of statistical “power” (aggregation lowers the number of observations) and exposes the analysis to the well-known ecological fallacy (or “Robinson effect”, Robinson, 1950) and to “Simpson’s Paradox” (see Lindley and Novick, 1981).
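The risk of drawing individual-level conclusions from aggregated data can be illustrated with a small constructed example. This is a sketch on toy data invented for illustration (three groups of three individuals); the helper `ols_slope` is a hypothetical name. The relationship between x and y is negative inside every group, yet a regression on the group means yields a positive slope.

```python
import numpy as np

def ols_slope(x, y):
    """Slope of the simple least-squares regression of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    return float((xc * (y - y.mean())).sum() / (xc ** 2).sum())

# Toy data (assumption): group centers rise together, but within each group
# y decreases as x increases.
centers = np.array([0.0, 5.0, 10.0])
offsets = np.array([-1.0, 0.0, 1.0])
x = centers[:, None] + offsets[None, :]   # rows are groups
y = centers[:, None] - offsets[None, :]

within_slopes = [ols_slope(x[g], y[g]) for g in range(len(centers))]
aggregated_slope = ols_slope(x.mean(axis=1), y.mean(axis=1))  # group means only
pooled_slope = ols_slope(x.ravel(), y.ravel())                # grouping ignored

print(within_slopes, aggregated_slope, pooled_slope)
```

In this toy example both the aggregated regression and the pooled regression that ignores the grouping have a sign opposite to the within-group relationship, which is the kind of reversal the ecological fallacy and Simpson's Paradox describe.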
For these reasons, models involving grouped individuals cannot be specified without considering the dependence structure of the individual observations. Moreover, the multilevel structure of the data makes it possible to consider variables measured at different levels of the hierarchy. Models that contain this kind of variable are known as multilevel models. Specifying a multilevel model means defining functional forms that allow for group-specific coefficients. To specify these models one can follow either of two approaches:
- the fixed effects approach, which considers only fixed factors and optional covariates as predictors;
- the random effects approach, which considers one or more random factors and optional covariates as predictors (mixed effects model).
The choice of the more suitable approach can be based on theoretical and practical considerations.
In experimental research a factor defining different treatments is said to have a fixed effect if all possible treatments are included in the experimental plan. A random effect is attributed to a factor whose treatments can be considered a sample of all possible treatments. From a statistical point of view, a random effect can also be attributed to a factor that defines a large number of groups: in this case the fixed effects approach is not parsimonious, and one can consider the random effects specification instead. Under the fixed effects approach, one interprets the group-specific coefficients as direct effects of the grouping factor. The model specification must then include a factor (the grouping variable) in the matrix of explanatory variables. It is important to note that including this factor makes it impossible to use group-level variables, which would cause perfect collinearity. This specification also implies the estimation of one coefficient for each macro-unit. Alternatively, the group-specific effects can be considered as residuals from an average regression function. In this context, the residuals can be assumed to be randomly drawn from a population with zero mean and unknown variance. Treating these group-specific effects under this hypothesis leads to the specification of the random coefficients model, which requires only the estimation of the variance of the group effect.
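The collinearity between a grouping factor and a group-level variable can be checked numerically. The sketch below builds a small design matrix on made-up data (four groups of three individuals; all variable names and values are assumptions for illustration) and compares its rank with and without a group-level covariate.

```python
import numpy as np

# Made-up design (assumption): 4 groups with 3 individuals each.
n_groups, size = 4, 3
group = np.repeat(np.arange(n_groups), size)

# Grouping factor coded as one indicator column per group (no separate intercept).
dummies = (group[:, None] == np.arange(n_groups)).astype(float)

# An individual-level covariate that varies inside each group.
x_indiv = np.arange(n_groups * size) % size + 0.1 * group

# A group-level covariate: constant inside each group.
z_group = np.array([2.0, 5.0, 1.0, 7.0])[group]

X_fixed = np.column_stack([dummies, x_indiv])
X_with_z = np.column_stack([dummies, x_indiv, z_group])

rank_fixed = np.linalg.matrix_rank(X_fixed)
rank_with_z = np.linalg.matrix_rank(X_with_z)

# The extra column adds no rank: z_group is a linear combination of the dummies.
print(X_fixed.shape[1], rank_fixed, X_with_z.shape[1], rank_with_z)
```

Because the group-level covariate is an exact linear combination of the group indicators, its coefficient is not identified once the grouping factor is in the model, whereas a specification that treats the group effects as random does not include the indicators and can therefore accommodate such a covariate.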
Summarizing, the choice between the two approaches can be influenced by the focus of statistical inference, the nature of the observed set of groups, the magnitude of group sample sizes and the population distributions involved. In particular:
1. if the groups are regarded as a sample from a larger population, the random coefficients model is appropriate;
2. if the researcher wishes to test effects of group-level variables, the random coefficients model should be used;
3. if the average group size is relatively small (some authors suggest a range from 2 to 50 or 100 observations), the random coefficients model has important advantages from an inferential point of view.
Finally, the random coefficients model is mostly used with the additional assumption that the random coefficients are normally distributed. If this assumption is a poor approximation of the real conditions, the model results may be unreliable. Further discussion of the choice between fixed and random coefficients can be found in Searle et al. (1992) and in Hsiao (1995). Based on a review of the literature and on simulation studies, Ita G. G. Kreft (1996) concluded, "for researchers specifically interested in variance components, and posterior means, random coefficients modeling provides them with separate estimates for separate contexts, and the iteration procedure improves the estimates of the variance components." Compared with classical regression, multilevel modeling is more helpful in revealing differences in variance among the units of analysis in the different groups that make up the levels.