
Assessing model fit is an important step in any modelling procedure, and many statistical tools are available for it, both graphical and numerical. However, graphical analysis, such as the residual plot in Figure 3.1, is usually hard to interpret for logistic regression with binary response data. In this case, numerical methods such as the Pearson chi-squared test and the deviance test offer an alternative way to assess the fitted model. Tests of this type measure the discrepancy between the observed data and the outcomes predicted or expected under the fitted model. For logistic regression with binary response data, Hosmer et al. (1997) compared different goodness-of-fit tests and concluded that the Pearson chi-squared test and the deviance test perform best when the logistic regression involves only categorical predictor variables.

Chapter 3. Logistic regression models with application to anglerfish

[Figure 3.1: residuals (vertical axis) plotted against fitted values (horizontal axis); panel title: linear logistic with length as the only predictor.]

FIGURE 3.1. Plot of the residuals versus fitted values for the fitted linear logistic model with length as the only predictor, i.e. the model described by (3.32) in Section 3.2.1.

However, for logistic regression with continuous predictors and a binary response, the statistics of the commonly used Pearson chi-squared and deviance tests do not have approximate chi-squared distributions under the null hypothesis that the fitted model is the correct model. This is because the expected cell sizes are very small: the underlying contingency table has as many rows as there are individual subjects in the observed data. Taking the Pearson chi-squared test for binary response data as an example, the test statistic is

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{p}_i)^2}{\hat{p}_i (1 - \hat{p}_i)},$$

and the square root of the contribution from the $i$th observation (i.e., the Pearson residual) is

$$\frac{y_i - \hat{p}_i}{\sqrt{\hat{p}_i (1 - \hat{p}_i)}}.$$

The distribution of this residual cannot be approximated by a standard normal distribution, because the normal approximation to a binomial distribution works only when the number of trials for the $i$th observation, $n_i$, is large (the rule of thumb is $\min\{n_i p_i, n_i (1 - p_i)\} > 5$). This is not the case for binary response data, where $n_i = 1$.
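To make the argument concrete, the following minimal sketch computes the Pearson residuals and the Pearson chi-squared statistic for a small set of binary responses, and checks the rule of thumb for the normal approximation. The data and the function names (`pearson_residuals`, `pearson_chi2`, `normal_approx_ok`) are hypothetical and chosen only for illustration; they are not taken from the anglerfish analysis.

```python
import math


def pearson_residuals(y, p_hat):
    """Pearson residual (y_i - p_i) / sqrt(p_i * (1 - p_i)) for each observation."""
    return [(yi - pi) / math.sqrt(pi * (1.0 - pi)) for yi, pi in zip(y, p_hat)]


def pearson_chi2(y, p_hat):
    """Pearson chi-squared statistic: the sum of squared Pearson residuals."""
    return sum(r * r for r in pearson_residuals(y, p_hat))


def normal_approx_ok(n_trials, p):
    """Rule of thumb from the text: min{n p, n (1 - p)} > 5."""
    return min(n_trials * p, n_trials * (1.0 - p)) > 5


# Hypothetical binary responses and fitted probabilities (illustration only).
y = [1, 0, 1, 0]
p_hat = [0.8, 0.8, 0.2, 0.2]

res = pearson_residuals(y, p_hat)
print(res)                        # one residual per observation
print(pearson_chi2(y, p_hat))     # sum of the squared residuals
print(normal_approx_ok(1, 0.5))   # binary data: n_i = 1
```

For a given fitted probability, a binary response can produce only two residual values, one for each outcome, and with $n_i = 1$ the quantity $\min\{p_i, 1 - p_i\}$ never exceeds 0.5, so the rule of thumb cannot be satisfied.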

3.1 Fixed-effects logistic regression and its extended forms

To compensate for this, Hosmer & Lemeshow (1980) introduced a goodness-of-fit test (HL-GOF test) which groups the data by the predicted success probabilities from the fitted model, and then compares the observed with the expected counts for both successes and failures of the Bernoulli response. The cut-points of predicted probability for each cell are chosen so that the total number of observations in each cell is about the same. This grouping strategy gives cells large enough to perform a chi-squared goodness-of-fit test, which is reviewed in Appendix 3.E. The total number of cells, $k$, lies between 6 and 10 in most cases. Table 3.1 is a contingency table for performing an HL-GOF test, with the $i$th row consisting of the cut-points of the cell $(\hat{p}_{i-1}, \hat{p}_i]$, its total number of observations ($N_i$), the observed counts of failure and of success (denoted $O_{i0}$ and $O_{i1}$ respectively), and the predicted counts of failure and success (denoted $E_{i0}$ and $E_{i1}$ respectively).

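TABLE 3.1. Partition for the Hosmer-Lemeshow GOF test.

| Cell | $\hat{p}$ | Total | Observed ($\delta = 0$) | Expected ($\delta = 0$) | Observed ($\delta = 1$) | Expected ($\delta = 1$) |
|---|---|---|---|---|---|---|
| 1 | $(0, \hat{p}_1]$ | $N_1$ | $O_{10}$ | $E_{10}$ | $O_{11}$ | $E_{11}$ |
| 2 | $(\hat{p}_1, \hat{p}_2]$ | $N_2$ | $O_{20}$ | $E_{20}$ | $O_{21}$ | $E_{21}$ |
| ... | ... | ... | ... | ... | ... | ... |
| $i$ | $(\hat{p}_{i-1}, \hat{p}_i]$ | $N_i$ | $O_{i0}$ | $E_{i0}$ | $O_{i1}$ | $E_{i1}$ |
| ... | ... | ... | ... | ... | ... | ... |
| $k$ | $(\hat{p}_{k-1}, 1]$ | $N_k$ | $O_{k0}$ | $E_{k0}$ | $O_{k1}$ | $E_{k1}$ |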
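The partition in Table 3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the anglerfish data: the function name `hl_partition` and the example data are hypothetical, the expected count of successes in a cell is taken to be the sum of the fitted probabilities in that cell (a common choice that the text does not spell out), the upper cut-point of a cell is reported as its largest fitted value, and tied fitted values may be split across neighbouring cells.

```python
def hl_partition(y, p_hat, k=10):
    """Group observations into k cells of roughly equal size by fitted probability.

    Returns one dict per cell with the upper cut-point, the cell total N,
    and the observed/expected counts of failures (0) and successes (1).
    Assumes len(y) >= k so that no cell is empty.
    """
    n = len(y)
    # Order the observations by their fitted values.
    order = sorted(range(n), key=lambda i: p_hat[i])
    cells = []
    for c in range(k):
        # Consecutive slices of the ordering give cells of about n / k observations.
        idx = order[c * n // k:(c + 1) * n // k]
        total = len(idx)
        obs1 = sum(y[i] for i in idx)        # observed successes
        exp1 = sum(p_hat[i] for i in idx)    # expected successes (assumed definition)
        cells.append({
            "upper": p_hat[idx[-1]],
            "N": total,
            "O0": total - obs1, "E0": total - exp1,
            "O1": obs1, "E1": exp1,
        })
    return cells


# Hypothetical fitted values and binary responses (illustration only).
p_hat = [0.05, 0.10, 0.15, 0.20, 0.40, 0.45, 0.50, 0.55, 0.80, 0.85, 0.90, 0.95]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 1]

table = hl_partition(y, p_hat, k=3)
for row in table:
    print(row)
```

Within each cell the observed counts sum to $N_i$ and so do the expected counts, which is the constraint of a fixed cell total.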

In summary, the HL-GOF test is conducted in the following steps:

1. order the fitted values $\hat{p}$ for all individual subjects in the data;

2. group the fitted values into $k$ cells (most often 10, and usually between 6 and 10) so that the size of each cell is roughly the same;

3. calculate the observed and expected numbers for each cell, for both success and failure of the binary response; and

4. perform a chi-squared goodness-of-fit test on the observed and expected counts.

The test statistic of the chi-squared test in the above step 4 is calculated as

$$\chi^2 = \sum_{i=1}^{k} \sum_{\delta=0}^{1} \frac{(O_{i\delta} - E_{i\delta})^2}{E_{i\delta}}, \tag{3.30}$$

where $\delta$ stands for the binary response, with $\delta = 0$ if the observation is a failure and $\delta = 1$ otherwise. For the anglerfish experimental survey data, $\delta = 1$ means that the fish was retained in the main cod-end and $\delta = 0$ that the fish escaped beneath the footrope.
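Equation (3.30) can be evaluated directly from the cell counts. The sketch below assumes each cell is given as a tuple `(O0, E0, O1, E1)` of observed and expected counts of failures and successes; the function name `hl_statistic` and the counts are hypothetical.

```python
def hl_statistic(cells):
    """Equation (3.30): sum over cells and over delta in {0, 1} of (O - E)^2 / E."""
    stat = 0.0
    for o0, e0, o1, e1 in cells:
        stat += (o0 - e0) ** 2 / e0   # failures, delta = 0
        stat += (o1 - e1) ** 2 / e1   # successes, delta = 1
    return stat


# Hypothetical counts for k = 3 cells (illustration only).
cells = [
    (3, 3.5, 1, 0.5),
    (2, 2.1, 2, 1.9),
    (1, 0.5, 3, 3.5),
]

stat = hl_statistic(cells)
print(stat)
```

Every expected count must be strictly positive for the statistic to be defined, which is one practical reason for grouping the data into reasonably large cells.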

Unlike a chi-squared GOF test with $k$ cells, for which the degrees of freedom equal $k - 1$ (see Appendix 3.E for details), the chi-squared distribution of an HL-GOF test under the null hypothesis has $k - 2$ degrees of freedom. The intuitive explanation for this decrease in the degrees of freedom is the constraint of a fixed total number of observations in each cell.

Finally, for a given significance level $\alpha$, the test statistic $\chi^2$ obtained by (3.30) for an HL-GOF test is compared to a critical value, $\chi^2_{k-2,\alpha}$, which is the $(1 - \alpha) \times 100\%$ percentile of a chi-squared distribution with $k - 2$ degrees of freedom. If $\chi^2$ is larger than the upper critical value $\chi^2_{k-2,\alpha}$, then the null hypothesis of no lack of fit is rejected at significance level $\alpha$.
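The decision rule can be sketched with the standard library alone. Rather than looking up the critical value, the sketch computes the upper-tail probability of the statistic under a chi-squared distribution with $k - 2$ degrees of freedom; rejecting when this p-value is below $\alpha$ is equivalent to rejecting when the statistic exceeds the critical value. The function names `chi2_sf` and `hl_decision` and the example statistics are hypothetical, and the tail probability uses the usual series expansion of the regularised lower incomplete gamma function, which is adequate for moderate values of the statistic.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with df degrees of freedom."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series: P(a, z) = z^a e^(-z) / Gamma(a) * sum_{n >= 0} z^n / (a (a+1) ... (a+n))
    term = 1.0 / a
    total = term
    n = 1
    while term > 1e-15 * total:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)


def hl_decision(stat, k, alpha=0.05):
    """Return (p_value, reject) for an HL-GOF statistic computed from k cells."""
    p_value = chi2_sf(stat, k - 2)
    return p_value, p_value < alpha


# Hypothetical statistics for k = 10 cells (illustration only).
print(hl_decision(12.3, 10))
print(hl_decision(20.0, 10))
```

Where SciPy is available, `scipy.stats.chi2.sf` and `scipy.stats.chi2.ppf` provide the tail probability and the critical value directly.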
