Choosing between the linear regression model and the generalized linear regression model for count data sets can be seen as a model selection problem. McCullagh and Nelder (1989) also describe it as part of selecting the right scale for analysis, taking the research purpose into account. But should we use the original scale or the transformed scale of the target variable? Jeffreys (1961) states, “It is sometimes considered a paradox that the answer depends not only on the observations but on the question; it should be a platitude.” In linear regression models it is crucial to check model assumptions such as normality, homoscedasticity, and linearity. In particular, fitting Poisson-distributed data with this kind of model usually leads to several possible “good” scales. For instance, the square root transformation of the target variable often stabilizes the variance, whereas the cube root of the squared variable gives approximate symmetry or normality. In practice, finding a common scale means choosing a transformation that improves these model assumptions simultaneously. A single transformation rarely achieves this for right-skewed distributions with variance equal to the mean, as in the Poisson case: the most suitable transformation to achieve homoscedasticity frequently differs from the best transformation to achieve symmetry or normality (Agresti, 2015). Furthermore, transformation involves a trade-off between accuracy and interpretability (O’Hara and Kotze, 2010; Ives, 2015). Interpretation on the original scale of measurement is often preferable to interpretation on the transformed scale. If a transformed scale such as the logarithmic scale is chosen, conclusions can in some cases also be presented on this scale, and the usual inference for linear regression models can be applied. However, applying a logarithmic transformation to count data often leads to results that are not defined on the original scale, particularly if the data are highly skewed and contain many outliers.
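The conflict between the variance-stabilizing scale and the symmetrizing scale can be checked with a small simulation. The following is a minimal sketch, not part of the original analysis: the Poisson means, the sample size, the seed, and the helper `skewness` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed, chosen arbitrarily

def skewness(v):
    """Sample skewness: the third standardized moment."""
    c = v - v.mean()
    return (c ** 3).mean() / (c ** 2).mean() ** 1.5

means = [10, 40, 160]  # illustrative Poisson means (assumption, not from the text)
results = {}
for lam in means:
    y = rng.poisson(lam, size=200_000).astype(float)
    results[lam] = {
        "var_raw": y.var(),                             # grows with the mean
        "var_sqrt": np.sqrt(y).var(),                   # roughly constant across means
        "skew_raw": skewness(y),                        # right-skewed
        "skew_sqrt": skewness(np.sqrt(y)),              # square root over-corrects the skew
        "skew_two_thirds": skewness(y ** (2.0 / 3.0)),  # cube root of the squared variable
    }

for lam, r in results.items():
    print(lam, {k: round(float(v), 3) for k, v in r.items()})
```

Under these assumptions, the variance on the square-root scale should stay close to one quarter for every mean, while the skewness is closest to zero on the two-thirds power scale, so no single power serves both purposes.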
Additionally, in Poisson log-linear models, the model parameters express the effects of the covariates on log[E(y|x)]. To obtain information on E(y|x), these effects can be translated into an exponential model for the mean by using the inverse link function. In contrast, if a logarithmic transformation is applied in the linear regression model, the model parameters describe effects on E[log(y)|x], but not exactly on E(y|x) (Agresti, 2015).
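The distinction between the two scales can be written out as a short worked comparison. It uses the notation of the text and assumes y > 0 wherever log(y) appears; the inequality in the second line is Jensen's inequality for the concave logarithm.

```latex
\begin{aligned}
\text{Poisson log-linear model:}\quad
  & \log E(y \mid x) = x\beta
    \;\Longrightarrow\; E(y \mid x) = \exp(x\beta), \\
\text{log-transformed linear model:}\quad
  & E[\log(y) \mid x] = x\beta,
    \qquad \log E(y \mid x) \;\ge\; E[\log(y) \mid x].
\end{aligned}
```

In the first model, a one-unit increase in the j-th covariate multiplies E(y|x) by exp(β_j). In the second model, exp(xβ) is in general not E(y|x), because the logarithm of the mean is at least as large as the mean of the logarithm.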
One of the biggest challenges that researchers face when working with the linear regression model under transformations is the bias problem. After fitting such a model, it is common to want to return to the untransformed scale. The bias arises in the back-transformation step when the transformation is non-linear. In general, a non-linear function has a non-linear inverse. In fact, E[t(y)|x] is not equal to t[E(y|x)] for most functions t(·) applied to the response variable. Expressing this issue for the linear regression model leads to:
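\[
\begin{aligned}
t^{-1}\left[E(x\beta + e)\right] &= t^{-1}\left[E(x\beta) + E(e)\right] = t^{-1}\left[E(x\beta)\right] \\
&= t^{-1}\left[E(y \mid x)\right] \neq E\left[t^{-1}(x\beta + e)\right]
\end{aligned}
\]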
Therefore, determining the magnitude of the bias caused by a specific transformation becomes a common problem in practice. If no attention is paid to this problem, grossly misleading conclusions can result. In contrast, GLMs directly model the conditional expectation of the target variable on the original scale, so using GLMs does not produce this kind of bias. This is naturally one of the reasons why researchers prefer fitting this model instead of analyzing the bias problem inherent in non-linear transformations under the linear regression model.
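The back-transformation bias can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the procedure of any cited author: the two-group design, the true means 4 and 12, the square-root transformation, and the helper `fit_poisson_irls` (a plain iteratively reweighted least squares routine) are all chosen here for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily

# Hypothetical two-group design: x is 0 or 1, true conditional means 4 and 12.
n = 50_000
x = rng.integers(0, 2, size=n)
y = rng.poisson(np.where(x == 1, 12.0, 4.0))
X = np.column_stack([np.ones(n), x])  # intercept + group indicator

# Linear regression on the transformed scale, t(y) = sqrt(y).
beta_ols, *_ = np.linalg.lstsq(X, np.sqrt(y), rcond=None)

def fit_poisson_irls(X, y, n_iter=25):
    """Poisson log-linear fit by iteratively reweighted least squares.

    Assumes the first column of X is an intercept.
    """
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())           # start at the overall mean
    for _ in range(n_iter):
        eta = X @ beta                   # linear predictor
        mu = np.exp(eta)                 # inverse log link
        z = eta + (y - mu) / mu          # working response
        XtW = X.T * mu                   # working weights equal mu for the log link
        beta = np.linalg.solve(XtW @ X, XtW @ z)
    return beta

beta_glm = fit_poisson_irls(X, y)

design = np.array([[1.0, 0.0], [1.0, 1.0]])    # one row per group
naive_back = (design @ beta_ols) ** 2           # t^{-1} applied to the fitted values
glm_mean = np.exp(design @ beta_glm)            # GLM estimate of E(y|x)
sample_mean = np.array([y[x == 0].mean(), y[x == 1].mean()])

print("sample mean        :", sample_mean.round(3))
print("back-transformed LM:", naive_back.round(3))
print("Poisson GLM        :", glm_mean.round(3))
```

In this setup the squared fitted values from the square-root regression fall below the group means by the within-group variance of sqrt(y), whereas the GLM estimates reproduce the group means. The size of the bias depends on the transformation and on the error distribution, so the figures from this sketch should not be generalized.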
GLMs are considered a unified theory for modeling some prominent continuous and non-continuous response variables, in which the random component is chosen separately from the link function. This means that the probability distribution and the variance structure can be specified by the researcher when they are known. However, Gelman and Hill (2006) and Ives (2015) point out that these distributional assumptions are not carefully analyzed in common practice. In contrast, transformations may be useful when the exact probability distribution underlying the counts is unknown, as with Poisson-like processes. Furthermore, when the counts in the data set are large, the linear regression model can be a useful alternative to GLMs (Warton et al., 2016). Additionally, GLMs and GLMMs have some mathematical limitations and computational complications when other correlation structures or other types of covariates are needed.
Finally, a challenge regarding the research purpose also arises when choosing between the linear regression model and the generalized linear regression model for count data. Is the focus on prediction or on inference? If the research is only concerned with statements about the likely values of the target variable, under a question of the form “What is the predicted value of the response under the selected model?”, prediction should be the key point. The analysis should be accompanied by some measures of precision, such as the root-mean-squared error and bias deviation, and by some measures of goodness of fit.
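The measures mentioned above can be computed as follows. This is a minimal sketch: the text does not define "bias deviation", so it is taken here, as an assumption, to be the mean of predicted minus observed values, and the Poisson deviance stands in as one possible goodness-of-fit measure. The function names and the example values are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-squared error of the predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))

def mean_bias(y_true, y_pred):
    """Mean of predicted minus observed (assumed reading of 'bias deviation')."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(y_pred - y_true))

def poisson_deviance(y_true, mu):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)), with y * log(y) = 0 at y = 0."""
    y, mu = np.asarray(y_true, float), np.asarray(mu, float)
    safe_y = np.where(y > 0, y, 1.0)                      # avoids log(0) for zero counts
    log_term = np.where(y > 0, y * np.log(safe_y / mu), 0.0)
    return float(2.0 * np.sum(log_term - (y - mu)))

# Made-up observed counts and predictions on the original scale.
y_obs = [2, 0, 5, 3]
y_hat = [1.5, 0.5, 4.0, 3.0]

print("RMSE     :", rmse(y_obs, y_hat))
print("mean bias:", mean_bias(y_obs, y_hat))
print("deviance :", poisson_deviance(y_obs, y_hat))
```

All three measures are evaluated on the original scale, so predictions from a transformed linear model need to be back-transformed before they are compared with GLM predictions.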