In the MoveM8 dataset, two types of missing data were identified: missing data due to item non-response and missing data due to attrition (Little & Rubin, 1987, 1989). The first type affected each survey dataset cross-sectionally: some respondents did not answer some of the questions in the baseline and follow-up surveys (Time 1 and Time 2). The second type arose from attrition, that is, not all respondents provided answers at each time point. Missing value analysis of the full dataset was conducted with the Missing Value Analysis (MVA) and Multiple Imputation modules of IBM SPSS Statistics v.19.
4.3.3.1 Item non-response at baseline, Time 1 and Time 2
Missing values were found in the age variable (only at baseline) and in the IPAQ time variables (days and minutes), which were used to compute the main outcome variables: the physical activity sub-scores in the four domains (WPA, LTPA, DGPA, ATPA) and total physical activity (TOTPA). There was only one missing value (.2%) in the age variable, because one person did not provide their date of birth correctly. That value was replaced with the mean of the variable.
Missing data in the IPAQ variables required more attention, because they affected almost every time variable needed to compute total vigorous, moderate and walking activity in the four domains (WPA, LTPA, DGPA, ATPA). Missing values affected the 11 total time variables (minutes spent in physical activity multiplied by the number of days) for vigorous, moderate and walking activities in the four domains mentioned above. The IPAQ guidelines recommend excluding cases with any missing value in either the days or the minutes (or hours) spent in physical activity (IPAQ Research Committee, 2005, p. 10). Although listwise deletion is popular and commonly used, this approach can be problematic because it represents “a threat to statistical power and also to the validity of statistical inference” (Fichman & Cummings, 2003, p. 7). In this study, listwise deletion would have excluded 108 cases (27.5%) from the initial sample (N = 393), 46 cases (28.4%) from the post-test follow-up sample (n = 162), and 32 cases (22.9%) from the 4-month follow-up sample (n = 140). Hence, alternative strategies were used in order to preserve as much information as possible.
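The cost of listwise deletion reported above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the counts are those given in the text, while the dictionary keys and the function name are hypothetical labels chosen for this example.

```python
# Counts reported in the text: (cases with any missing IPAQ value, sample size).
surveys = {
    "baseline": (108, 393),
    "time_1": (46, 162),
    "time_2": (32, 140),
}


def percent_excluded(n_incomplete: int, n_total: int) -> float:
    """Share of cases (in %) that listwise deletion would drop."""
    return round(100 * n_incomplete / n_total, 1)


loss = {name: percent_excluded(k, n) for name, (k, n) in surveys.items()}
for name, pct in loss.items():
    print(f"{name}: {pct}% of cases excluded")
```

In each survey, roughly a quarter of the sample would be discarded even though only a small share of individual values is missing.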
In the baseline dataset, missing values in these variables averaged 2.6% (range 0% to 5.6%), and Little’s MCAR test was statistically non-significant (χ2 = 186.866, df = 199, p = .722). In the post-test follow-up survey (Time 1), missing data averaged 1.5% (range 0% to 4.9%) and Little’s MCAR test was again non-significant (χ2 = 70.857, df = 91, p = .942). In the 4-month follow-up survey (Time 2), missing data averaged 2.1% (range 0% to 5.7%) and Little’s MCAR test was also non-significant (χ2 = 43.356, df = 85, p = 1.000). Because these results were consistent with the item-level data being missing completely at random, and the proportion of missing values was small, mean substitution was applied to the 11 time variables (minutes or hours), so that the composite outcome variables for the four physical activity subdomains and total physical activity could be computed.
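Mean substitution itself is a simple operation. The following is a minimal sketch, not the SPSS procedure used in the study: the function name and the sample values are hypothetical, and missing entries are assumed to be coded as `None`.

```python
from statistics import mean


def mean_substitute(values):
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical minutes-per-week values for one IPAQ time variable.
walking_minutes = [120, None, 60, 90, None, 30]
imputed = mean_substitute(walking_minutes)
print(imputed)
```

Mean substitution leaves the mean of the variable unchanged but reduces its variance, which is why it is usually considered acceptable only when very few values are missing.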
4.3.3.2 Missing data due to attrition
When each dataset was analysed separately, attrition caused 58.8% of the missingness at Time 1 and 64.4% of the missingness at Time 2. When the full dataset was considered, attrition accounted for 73.8% of the missingness. Of the participants who completed the baseline survey (N = 393), only 103 (26.3%) completed both follow-up surveys at Time 1 and Time 2.
4.3.3.3 Missing value patterns
Four missing value patterns were found in the full dataset, which included baseline, Time 1 and Time 2 data. The first pattern represents cases with no missing data, the second represents cases with missing data at Time 1, the third represents cases with missing data at Time 2, and the fourth represents cases with missing data at both Time 1 and Time 2. Almost 50% of the cases in the dataset had pattern 4, that is, missing data at both Time 1 and Time 2. The second most frequent pattern was pattern 1, with no missing values in any survey; it accounted for 26.3% of cases, those who completed all surveys (103 participants out of 393). Almost 15% of the cases did not complete the survey only at Time 1 (pattern 3) and about 10% did not respond only at Time 2 (pattern 2). The missing value patterns graph showed that the pattern was arbitrary (Schafer & Graham, 2002) and non-monotone: almost half of the sample completed neither follow-up survey, while some participants missed only the Time 1 survey or only the Time 2 survey. A monotone pattern of missing data can be identified by first ordering the variables according to the amount of missing data, and then checking whether missingness on one variable implies missingness on all subsequent variables: “if Yj is missing for a unit, then Yj+1, . . . , Yp are missing as well” (Schafer &
Graham, 2002, p. 150). Some authors have argued that monotone patterns can occur in longitudinal studies; however, Horton and Kleinman noted that “a monotone pattern is uncommon in most realistic settings” (Horton & Kleinman, 2007, p. 80). A monotone pattern would have implied that people who did not complete the Time 1 survey were forced to drop out of the study and therefore could not complete the survey at Time 2. This was not the case, as participants were considered part of the study unless they actively asked to be excluded from the intervention.
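The distinction between monotone and arbitrary patterns can be illustrated with a short sketch. It assumes the variables are ordered by wave (baseline, Time 1, Time 2) and that a missed wave is coded as `None`; the records and function names are hypothetical and do not come from the MoveM8 data.

```python
from collections import Counter


def pattern(row):
    """Return a tuple of flags, True where a wave is missing (None)."""
    return tuple(v is None for v in row)


def is_monotone(rows):
    """True if, in every row, nothing is observed after the first missing wave."""
    for row in rows:
        flags = pattern(row)
        if True in flags:
            first_missing = flags.index(True)
            if not all(flags[first_missing:]):
                return False
    return True


# Hypothetical records: (baseline, Time 1, Time 2); None = wave not completed.
records = [
    (3.1, 2.9, 3.4),    # complete
    (2.5, None, None),  # missing at both Time 1 and Time 2
    (4.0, None, 3.8),   # missed Time 1 but returned at Time 2
    (3.3, 3.0, None),   # missing only at Time 2
]

counts = Counter(pattern(r) for r in records)
print(counts)
print("monotone:", is_monotone(records))
```

A single participant who skips Time 1 and returns at Time 2, as in the third record, is enough to make the pattern non-monotone.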
Attrition bias was assessed by creating a series of dummy variables indicating missing data only at Time 1, at Time 2, and at both Time 1 and Time 2. Point-biserial correlations were used to investigate possible associations between missingness at Time 1 or Time 2 and all continuous variables in the dataset, namely age, the physical activity variables and the TPB variables. Almost all correlations with the dummy variables were not statistically significant, and those that were significant had a small effect size. For example, a small positive correlation was found between the subjective norm score and missingness at Time 1 (rpb = .168, p = .001) and at Time 2 (rpb = .167, p =
participants who did not respond to the post-test follow-up survey (Time 1) scored .36 units higher on the overall direct social norm scale at baseline (t(391) = -3.148, p = .002). Those who did not respond to the 4-month follow-up survey (Time 2) scored .40 units higher on the overall direct social norm scale at baseline (t(391) = -2.959, p = .003).
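A point-biserial correlation is simply a Pearson correlation in which one variable is a 0/1 indicator, here the missingness dummy. The sketch below shows the computation on hypothetical data; the variable names and values are assumptions for illustration, not study data.

```python
from math import sqrt


def point_biserial(dummy, scores):
    """Pearson correlation between a 0/1 indicator and a continuous variable."""
    n = len(dummy)
    mean_d = sum(dummy) / n
    mean_s = sum(scores) / n
    cov = sum((d - mean_d) * (s - mean_s) for d, s in zip(dummy, scores))
    var_d = sum((d - mean_d) ** 2 for d in dummy)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / sqrt(var_d * var_s)


# Hypothetical data: 1 = did not respond at Time 1, 0 = responded.
missing_t1 = [0, 0, 1, 1, 0, 1]
subjective_norm = [3.0, 4.0, 5.0, 6.0, 3.5, 5.5]

r_pb = point_biserial(missing_t1, subjective_norm)
print(f"r_pb = {r_pb:.3f}")
```

A positive coefficient means that cases flagged as missing tend to have higher baseline scores, which is the direction of the association described above.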
A chi-square test for independence (with Yates’ continuity correction) was used to assess the associations between the dichotomous variables (i.e., gender and group) and missingness at Time 1 and Time 2. No significant associations were found between gender or group and missingness at Time 1 or Time 2. The relationships between missingness at Time 1 and Time 2 and other categorical variables, such as enrolment wave, cluster, education, BMI (categorical), work status, health status, family status, age group and baseline physical activity (categorical), were also inspected.
A significant association with missingness at Time 1 was found for baseline perceived health status (χ2 = 10.356, p = .035, Cramér’s V = .16) and for baseline physical activity category (χ2 = 6.846, p = .033, Cramér’s V = .13), indicating that the proportion of non-respondents differed across health status categories and across physical activity categories. In particular, the majority of those in good health and the majority of those classified as ‘highly active’ did not complete the Time 1 questionnaire. However, these differences disappeared at Time 2. Consequently, it can be concluded that, overall, participants who did not complete all the follow-up surveys did not differ significantly from those who completed all surveys.
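The two statistics used in this part of the analysis can be sketched as follows. The 2x2 counts are hypothetical. The Cramér’s V calls use the chi-square values reported above and assume that N = 393 and that missingness is a two-level variable; the category counts passed as `n_rows` are assumptions that do not affect the result here, because the smaller table dimension determines V.

```python
from math import sqrt


def chi_square_yates(table):
    """Yates-corrected chi-square statistic for a 2x2 table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Continuity correction: shrink each |O - E| by 0.5, never below zero.
            diff = max(0.0, abs(observed - expected) - 0.5)
            chi2 += diff ** 2 / expected
    return chi2


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V effect size for an n_rows x n_cols contingency table."""
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))


# Hypothetical 2x2 counts: rows = gender, columns = responded / missing at Time 1.
gender_by_missing = [[30, 20], [20, 30]]
chi2_gender = chi_square_yates(gender_by_missing)

# Reported chi-square values, assuming N = 393 and a two-level missingness flag.
v_health = cramers_v(10.356, 393, n_rows=5, n_cols=2)   # 5 categories is an assumption
v_activity = cramers_v(6.846, 393, n_rows=3, n_cols=2)  # 3 categories is an assumption
print(round(chi2_gender, 2), round(v_health, 2), round(v_activity, 2))
```

Under these assumptions, the computed effect sizes round to .16 and .13, in line with the values reported above.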
4.3.3.4 Strategies to deal with missing data
Although the differences between participants who responded and did not respond to the surveys were not substantial, the attrition rates were high and a large amount of data was missing from follow-up surveys. When dealing with large amounts of missing data two ‘state of the art’ strategies are available: multiple imputation and maximum likelihood estimation (e.g., Elobeid et al., 2009; Graham, 2009; Schafer & Graham, 2002). Multiple imputation is a Monte Carlo technique in which missing values are
imputed, that is, replaced by a set of m > 1 simulated versions (datasets) generated from the observed data. In other words, multiple imputation provides multiple sets of plausible values. Maximum likelihood estimation is a technique in which missing values are not imputed; instead, model parameters are estimated directly from all available data by maximising the log-likelihood. Multiple imputation is implemented in several statistical software packages, including IBM SPSS Statistics and Mplus, whereas maximum likelihood (based on a full-information maximum likelihood algorithm) is implemented in specialised SEM software packages, such as AMOS and Mplus. The methodological literature indicates that both maximum likelihood and multiple imputation outperform traditional methods for dealing with missing data (Newman, 2003), and both approaches can be safely used with longitudinal data containing missing values (Graham, 2009, p. 562). Traditional methods, such as pairwise or listwise deletion, are considered more prone to biasing estimates (e.g., Baraldi & Enders, 2010; Blankers, Koeter, & Schippers, 2010; Graham, 2009; Honaker & King, 2010; Kristman, Manno, & Côté, 2005; Peng, Harwell, Liou, & Ehman, 2006; Peugh & Enders, 2004; Raghunathan, 2004; Rubin, Witkiewitz, St. Andre, & Reilly, 2007; Scheffer, 2002; Twisk & de Vente, 2002). For instance, in this study, traditional procedures (such as the default listwise deletion) would produce biased estimates, since the missing data were not missing completely at random (MCAR).
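The text does not describe how results from the m imputed datasets are combined. The standard approach is Rubin's rules, sketched here as an assumption rather than as the procedure used in this study: the pooled estimate is the mean of the m estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by (1 + 1/m). All numbers below are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need m > 1 estimates with matching variances")
    pooled = mean(estimates)
    within = mean(variances)       # average sampling variance
    between = variance(estimates)  # sample variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical regression coefficient estimated on m = 5 imputed datasets.
coefs = [1.0, 1.2, 1.1, 0.9, 1.3]
squared_ses = [0.04, 0.04, 0.04, 0.04, 0.04]

pooled_coef, total_var = pool_rubin(coefs, squared_ses)
print(f"pooled estimate = {pooled_coef:.2f}, total variance = {total_var:.3f}")
```

The between-imputation term is what distinguishes multiple imputation from single imputation methods such as mean substitution: it carries the uncertainty about the missing values into the standard errors.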
Many authors encourage the adoption of maximum likelihood methods (e.g., full-information maximum likelihood) as alternatives to multiple imputation in SEM (Olinsky, Chen, & Harlow, 2003), and specifically in the context of longitudinal studies (Raykov, 2005; Raykov & Marcoulides, 2010). Full-information maximum likelihood (FIML) estimation is also recommended because it can be used in combination with modern robust algorithms that deal with non-normality (Shin, Davison, & Long, 2009). Additionally, in a recent simulation study that compared the performance of FIML and MI in the presence of a second-level dependency in a multilevel setting (Larsen, 2011), FIML outperformed MI. Therefore, in the current study, missing data were handled with FIML, as implemented in AMOS and Mplus, when the tested models involved longitudinal comparisons.