The following model diagnostics were used to fit and adjust the logistic models and to transform variables as described above. Models were assessed for goodness of fit, distribution of deviance residuals, influential observations, linearity between the predicted log OR and the explanatory variables, and collinearity. Graphical plots showing individual observations, such as the deviance residual plots and the GAM plots, cannot be released from the Research Data Centre due to Statistics Canada’s regulations to protect respondent confidentiality.
4.7.1. Goodness of fit
The Hosmer-Lemeshow goodness-of-fit test was used to assess the overall fit of the logistic models. The observations were first sorted in increasing order of their predicted probability of the outcome and then divided into groups according to the quantiles of the estimated probability. The observed and expected numbers of events were tabulated for each group. The test statistic was obtained by calculating the chi-square statistic from the table of observed and expected frequencies and comparing it to a chi-square distribution. The null hypothesis is that the estimated and observed frequencies agree, so rejection of the null hypothesis indicates a lack of fit. The test statistics for the above models were not significant, indicating no gross lack of fit. The Hosmer-Lemeshow test has some important limitations: it is sensitive to the cut-points used to form the groups, and it is not sensitive to misspecification of the model [242].
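The grouping and tabulation steps described above can be sketched in a few lines of Python. This is a minimal illustration only, not the procedure run for the thesis models: the function name `hosmer_lemeshow`, the default of 10 groups, and the example values are all assumptions made for the sketch. It returns the statistic and the conventional degrees of freedom (number of groups minus 2); the p-value would then be read from a chi-square distribution.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees_of_freedom) for a Hosmer-Lemeshow style test.

    y: observed 0/1 outcomes; p: predicted probabilities from a fitted model.
    Hypothetical helper for illustration. Assumes every group is non-empty
    and has an expected count strictly between 0 and the group size.
    """
    # Sort observations by increasing predicted probability.
    pairs = sorted(zip(p, y))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        # Split into roughly equal-sized groups (quantiles of predicted risk).
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        observed = sum(yi for _, yi in chunk)
        expected = sum(pi for pi, _ in chunk)
        # Event and non-event cells of the group combined into one term:
        # (O - E)^2 / (n_g * pbar * (1 - pbar)), where pbar = E / n_g.
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    return stat, groups - 2


# Made-up outcomes and predicted probabilities, two groups for brevity.
example_stat, example_df = hosmer_lemeshow(
    [0, 1, 0, 1], [0.2, 0.3, 0.7, 0.8], groups=2
)
```

Changing `groups` changes how observations are pooled and therefore the statistic, which is the cut-point sensitivity noted above.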
4.7.2. Likelihood ratio tests for nested models
It is important to assess whether the addition of the exposure parameter contributes significantly to the model for the OR of T2D. The likelihood ratio test was used to assess whether the addition of the POP measure contributed significantly to the initial base models. These tests were performed in SAS using the GENMOD procedure to compare models 1 and 3, and similarly to compare models 2 and 4. Model 1 is nested within model 3, and model 2 is nested within model 4. The addition of the POP measure as an explanatory variable significantly improved the log-likelihood of the models (results not shown).
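The arithmetic behind a likelihood ratio test for nested models is short enough to sketch. The Python below is illustrative only (the thesis analyses were run in SAS GENMOD). It assumes the larger model adds exactly one parameter, such as a single log-transformed POP term, and the function name and the log-likelihood values in the usage line are hypothetical.

```python
import math


def likelihood_ratio_test_1df(loglik_reduced, loglik_full):
    """Likelihood ratio test for nested models differing by ONE parameter.

    Hypothetical helper: takes the maximized log-likelihoods of the reduced
    (base) model and the full model, and returns (statistic, p_value).
    """
    # Twice the gain in log-likelihood from adding the extra parameter.
    stat = 2.0 * (loglik_full - loglik_reduced)
    # For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# Made-up log-likelihoods for a base model and the same model plus one term.
example_stat, example_p = likelihood_ratio_test_1df(-105.0, -103.0)
```

With more than one added parameter the statistic is computed the same way, but the p-value must come from a chi-square distribution with degrees of freedom equal to the number of added parameters.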
4.7.3. Linearity
The linearity between the predicted log OR of having T2D and the explanatory variables was assessed using the non-parametric method of generalized additive models (GAM). The GAM plots include a smoothed non-parametric line between the outcome log OR and an explanatory variable, adjusted for the other explanatory variables in the model. The log transformation of the POP measures substantially improved the linearity of these variables with the predicted log OR of T2D.
Each predictor in the models was plotted against the deviance residuals to assess whether a systematic pattern could be identified in the residuals across values of the explanatory variable for each model. This is a method for identifying deficiencies in how the shape of an association is specified (for example, an assumption of linearity). The plots of deviance residuals versus the predicted probabilities did not show a systematic pattern in the residuals.
4.7.4. Influential observations
Cook’s D was used to test for influential points [243]. A further assessment for influential observations was performed by examining graphical plots of deviance residuals, DFBETA, and DIFCHISQ. As well, the deviance residuals were plotted against the identifier number of respondents to assess whether any observations appeared to be obvious outliers.
The models were assessed by plotting the difference in betas (DFBETA) against the predicted outcome probabilities. Such a plot shows the standardized difference in each regression parameter estimate when a specific individual observation is excluded from the analysis, and so assesses the effect of individual observations on the estimated regression parameters in the fitted model. The DFBETA statistic is calculated for each regression coefficient and each individual observation by excluding each observation in turn. Influential observations can then be identified by sorting all observations by DFBETA value and printing the observation IDs with the largest values. The DFBETAs did not indicate any observations as being highly influential.
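The leave-one-out idea behind DFBETA can be illustrated with a small sketch. This is not the statistic reported by SAS, which is standardized and based on a one-step approximation; the sketch below refits the model exactly with each observation removed and reports the raw, unstandardized change in the slope. It assumes a single predictor plus an intercept, data without complete separation, and hypothetical function names and example values.

```python
import math


def fit_logistic(x, y, iterations=25):
    """Fit logit P(y=1) = b0 + b1*x by Newton-Raphson; return (b0, b1).

    Hypothetical helper for illustration; no convergence checks.
    """
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            pi = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = pi * (1.0 - pi)
            # Score (gradient) of the log-likelihood.
            g0 += yi - pi
            g1 += (yi - pi) * xi
            # Information matrix entries (negative Hessian).
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: beta += inverse(information) * score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def leave_one_out_slope_change(x, y):
    """Raw change in the slope when each observation is excluded in turn."""
    full_b1 = fit_logistic(x, y)[1]
    changes = []
    for i in range(len(x)):
        b1_without_i = fit_logistic(x[:i] + x[i + 1:], y[:i] + y[i + 1:])[1]
        changes.append(full_b1 - b1_without_i)
    return changes


# Made-up data; rank observations by the absolute size of the change.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]
changes = leave_one_out_slope_change(x, y)
ranked_ids = sorted(range(len(x)), key=lambda i: abs(changes[i]), reverse=True)
```

Sorting by absolute change mirrors the step of listing the observation IDs with the largest DFBETA values.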
Graphical plots were also assessed for the change in the chi-square goodness-of-fit statistic (DIFCHISQ) against the predicted outcome probabilities. A plot of DIFCHISQ against the predicted outcome probabilities shows the change in the overall chi-square goodness-of-fit statistic when each individual observation is excluded in turn, so DIFCHISQ measures the effect of individual observations on the overall fit. These plots did not indicate any observations to be highly influential.
4.7.5. Collinearity
Pearson correlations were produced to identify possible collinearity among the outcome and explanatory variables in the models. The only elevated correlation coefficients were between age in years and the plasma concentrations of some of the POP compounds, but none of these correlations approached 1.0. Age and the plasma concentrations of POP compounds are expected to be correlated to some extent since these compounds bioaccumulate with age.
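For reference, the Pearson correlation used in this screening step can be computed as in the sketch below. The function name and the age and log-concentration values are hypothetical and serve only to show the calculation; they are not study data.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of cross-products and sums of squares about the means.
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Made-up ages (years) and log plasma concentrations for illustration.
age = [25.0, 34.0, 41.0, 52.0, 60.0, 68.0]
log_pop = [0.9, 1.4, 1.2, 2.0, 1.8, 2.5]
r_age_pop = pearson_r(age, log_pop)
```

A coefficient near 1.0 or -1.0 for a pair of explanatory variables would flag that pair for closer inspection.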
One indication of collinearity in a model is that adding a potentially collinear parameter substantially increases the standard error of the parameter with which it may be collinear. Age was included in the base models, which had no POP exposure parameter. When the POP parameter was added, there was no substantial change in the standard error of the age coefficient.