Data with an Ordinal Response
Longitudinal data analysis plays an essential role in analyzing clustered and correlated data from a variety of fields. Substantial effort has been devoted to developing statistical models and inferential procedures for explaining the relationship between independent and dependent variables. However, little work has been done on variable selection procedures for longitudinal data, especially for longitudinal data with high-dimensional features (N << p), largely because of the immense cost of generating and collecting longitudinal genomic data. Since the emergence of genomic technologies and the 'omics' revolution in the 1990s, the cost of these technologies has decreased tremendously, and with their increased use it has become imperative to develop data mining algorithms that identify and select a few key drivers of a complex disease from the tens of thousands of features measured at the molecular level. However, because of the confounding and multivariate dependencies in longitudinal high-dimensional data, it is very challenging to select effective classifiers that yield a parsimonious model and enhance predictability.
Several researchers have addressed the variable selection problem in the linear mixed-effects model and generalized linear mixed model frameworks. For the linear mixed-effects model, Chen and Dunson [2003] proposed a hierarchical Bayesian model to identify any random effect having zero variance and thus performed random effects selection in linear mixed models. Vaida and Blanchard [2005] derived and compared different forms of the Akaike information criterion (AIC) used for model selection in the marginal and conditional representations of linear mixed-effects models, where the population and cluster-specific parameters are of concern, respectively. Bondell et al. [2010] implemented the adaptive LASSO [Zou, 2006] as a shrinkage penalty on the reparameterized linear mixed-effects model for selecting fixed and random effects simultaneously. The model is fitted using a constrained EM algorithm [Laird et al., 1987], in which the random effects are treated as unobserved in the conditional expectation and the penalized likelihood is maximized at each iteration. For the generalized linear mixed model, Pan [2001] proposed a modified AIC based on quasi-likelihood, with an adjusted penalty, that is suitable for model selection in Generalized Estimating Equations (GEE). Fu [2003] incorporated the bridge penalty into the GEE model for variable selection when collinearity is present, with the tuning parameter in the penalty determined by quasi-generalized cross-validation.
None of the algorithms mentioned above provides a flexible structure that extends to longitudinal high-dimensional data. Here, we extend the Generalized Monotone Incremental Forward Stagewise (GMIFS) algorithm discussed in Section 5.2 to analyze longitudinal high-dimensional data. We propose a two-step algorithm: first, we concentrate on the fixed effects and apply the forward stagewise algorithm to perform variable selection among the high-dimensional features; second, we use the resulting set of biased estimates to fit a random coefficient ordinal response model for classification and prediction purposes. It is worth mentioning that the variable selection procedure conducted by the forward stagewise method has the same computational complexity for traditional and longitudinal data because, in the ordinal random coefficient model, the normal distribution of the random component does not depend on the coefficient β. Recall that the likelihood L(α, β, ui, Gi; xi) for the ordinal random coefficient model discussed in Section 4.2 has an explicit form:
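\[
L(\alpha, \beta, u_i, G_i; x) = \prod_{i} \frac{1}{\sqrt{2\pi\sigma^2_{int}}} \exp\left(-\frac{u_i^2}{2\sigma^2_{int}}\right) \prod_{j=1}^{n_i} \prod_{c=1}^{C} \pi_c(x_{ij}, u_i)^{y_{ijc}} \tag{5.3.1}
\]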
if assuming a random intercept model and
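\[
L_i(\alpha, \beta, G_i, u_{1i}, u_{2i}; x_i) = \frac{1}{2\pi\sigma_{u1}\sigma_{u2}\sqrt{1-\rho^2}} \exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{u_{1i}^2}{\sigma_{u1}^2} + \frac{u_{2i}^2}{\sigma_{u2}^2} - \frac{2\rho\, u_{1i}u_{2i}}{\sigma_{u1}\sigma_{u2}}\right]\right\} \times \prod_{j=1}^{n_i} \prod_{c=1}^{C} \pi_c(u_{1i}, u_{2i})^{y_{ijc}} \tag{5.3.2}
\]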
if assuming a random coefficient model. In both scenarios, the log-likelihood log L(α, β, ui, Gi; xi) can be expressed as:
\[
\log L(\alpha, \beta, u_i, G_i; x_i) = \log g(u, G) + \log \prod_{j=1}^{n_i} \prod_{c=1}^{C} \pi_c(x_{ij}, u_i)^{y_{ijc}} \tag{5.3.3}
\]
where g(u, G) is the distribution of the random effects u. Since only the second term on the right-hand side of equation (5.3.3) depends on the coefficient β, the gradient of the log-likelihood for the random coefficient ordinal model has the same form as that of the traditional ordinal model, that is,
\[
-\frac{\partial \log L(\alpha, \beta; x)}{\partial \beta_j} = -\frac{\partial \log L(\alpha, \beta, u_i, G_i; x)}{\partial \beta_j}.
\]
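The step behind this equality can be written out explicitly. The following short derivation uses the notation of (5.3.3) and only restates the argument above; the index j' for the repeated measurements is introduced here purely to avoid a clash with the predictor index j, and is not part of the original notation.

```latex
\begin{align*}
\frac{\partial}{\partial \beta_j} \log L(\alpha, \beta, u_i, G_i; x_i)
  &= \underbrace{\frac{\partial}{\partial \beta_j} \log g(u, G)}_{=\,0}
   + \frac{\partial}{\partial \beta_j} \sum_{j'=1}^{n_i} \sum_{c=1}^{C}
     y_{ij'c} \log \pi_c(x_{ij'}, u_i) \\
  &= \sum_{j'=1}^{n_i} \sum_{c=1}^{C}
     \frac{y_{ij'c}}{\pi_c(x_{ij'}, u_i)}
     \frac{\partial \pi_c(x_{ij'}, u_i)}{\partial \beta_j}.
\end{align*}
```

The first term vanishes because the random-effects density g(u, G) does not involve β, so only the category probabilities contribute to the gradient used by the stagewise updates.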
We now present our Generalized Monotone Incremental Forward Stagewise method for modeling a longitudinal ordinal response in the presence of high-dimensional data in Algorithm 2.
Algorithm 2
1. Create a negative version $-x_j$ of each predictor $x_j$ and expand the predictor space to $\tilde{X} = (X, -X)$. Set the initial values of the coefficients to $\beta = (\beta_1, \cdots, \beta_{2p}) = 0$. Obtain the estimates of the intercepts under the null hypothesis, $\alpha_c = \log \frac{P(Y \le c)}{1 - P(Y \le c)}$ for $c = 1, \cdots, C - 1$, where $P(Y \le c) = \sum_{k=1}^{c} P(Y = k)$, with $\alpha_0 = -\infty$ and $\alpha_C = \infty$.
2. Find the predictor $x_j$, $j = 1, \cdots, 2p$, with the largest negative gradient of the log-likelihood, $-\partial \log L(\alpha, \beta, u_i, G_i; x) / \partial \beta_j$, evaluated at the current estimate $\beta^{(s)}$.
3. Update the coefficient estimate of the predictor $x_j$ selected in step 2 with $\beta_j^{(s+1)} \leftarrow \beta_j^{(s)} + \epsilon$, where $\epsilon$ is a small positive amount; a reasonable choice is $\epsilon = 1 \times 10^{-4}$.
4. Repeat steps 2 and 3 until convergence.
5. Update the intercept estimates $\hat{\alpha}$ by fitting a series of ordinal models, each using the last set of features with nonzero coefficients before a new feature enters the model. Calculate the corresponding model fitting criteria, AIC and BIC, to select the optimal set of features with nonzero coefficients.
6. A parsimonious ordinal random intercept/coefficient model
\[
\log \frac{P(Y_i \le c \mid x_i, u_i)}{P(Y_i > c \mid x_i, u_i)} = \alpha_c + x_i^T \hat{\beta}_{biased} + z_i u_i \tag{5.3.4}
\]
is fitted using the optimal penalized estimate $\hat{\beta}_{biased}$ to further update the intercept estimate $\hat{\alpha}$ and obtain the empirical Bayes estimator of the random effect $u_i$, which are used for prediction and classification purposes along with $\hat{\beta}_{biased}$.
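Steps 1 to 4 of the algorithm above can be sketched in Python as follows. This is a minimal illustration rather than the implementation used in this work, and it rests on several assumptions: the response is coded 0, ..., C − 1 with every category observed, the intercepts are held at their null values throughout, the fixed-effects cumulative logit model of (5.3.4) without the random term is used, and step 2 is read as picking the column of the expanded matrix that maximizes ∂ log L/∂β_j. All function names (`null_intercepts`, `cumulative_probs`, `log_lik`, `gmifs_fixed`) are hypothetical.

```python
import numpy as np

def null_intercepts(y, n_classes):
    """Step 1: alpha_c = logit of the empirical P(Y <= c), c = 1..C-1.
    Assumes every class appears at least once in y (coded 0..C-1)."""
    counts = np.bincount(y, minlength=n_classes)
    cum = np.cumsum(counts)[:-1] / len(y)
    return np.log(cum / (1.0 - cum))

def cumulative_probs(alpha, eta):
    """gamma[i, c] = P(Y_i <= c); column 0 is all 0 and column C is all 1."""
    inner = 1.0 / (1.0 + np.exp(-(alpha[None, :] + eta[:, None])))
    n = eta.shape[0]
    return np.hstack([np.zeros((n, 1)), inner, np.ones((n, 1))])

def log_lik(alpha, eta, y):
    """Fixed-effects ordinal log-likelihood: sum_i log pi_{y_i}."""
    g = cumulative_probs(alpha, eta)
    idx = np.arange(len(y))
    return float(np.sum(np.log(g[idx, y + 1] - g[idx, y])))

def gmifs_fixed(X, y, n_classes, eps=1e-4, max_steps=5000):
    """Steps 1-4 on the fixed effects only (illustrative sketch)."""
    X_tilde = np.hstack([X, -X])            # expanded predictor space
    beta = np.zeros(X_tilde.shape[1])       # nonnegative coefficients
    alpha = null_intercepts(y, n_classes)   # held fixed in this sketch
    idx = np.arange(len(y))
    for _ in range(max_steps):
        g = cumulative_probs(alpha, X_tilde @ beta)
        # d log pi_y / d eta simplifies to 1 - gamma_y - gamma_{y-1}
        w = 1.0 - g[idx, y + 1] - g[idx, y]
        grad = X_tilde.T @ w                # d log L / d beta_j
        j = int(np.argmax(grad))            # step 2: steepest ascent column
        if grad[j] <= 0.0:
            break
        beta[j] += eps                      # step 3: small increment
    p = X.shape[1]
    return alpha, beta[:p] - beta[p:]       # collapse back to p coefficients
```

The fixed `max_steps` loop stands in for the convergence check of step 4. Because the columns of −X mirror those of X, only one of each pair is expected to grow, and the final coefficient vector is recovered as the difference of the two halves.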
Note that the first five steps of the forward stagewise algorithm for an ordinal response with longitudinal high-dimensional data are exactly the same as those for traditional high-dimensional data, because the first-order derivative of the log-likelihood with respect to βj has exactly the same form for the two types of data. The only modification is to the convergence criteria in step 4, which are adapted to the dynamic and less stable behavior of longitudinal high-dimensional data so that convergence is guaranteed. We implement double convergence criteria: 1) the difference between two successive log-likelihoods is smaller than a given value; 2) the number of features having nonzero coefficient estimates is less than a specified value.
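One possible reading of the double convergence criteria is sketched below. The text does not say how the two criteria are combined, so this sketch assumes the iteration stops as soon as either the log-likelihood change falls below a tolerance or the number of nonzero coefficients reaches a cap. The function name `should_stop` and the defaults `tol` and `max_nonzero` are hypothetical.

```python
import numpy as np

def should_stop(loglik_prev, loglik_curr, beta, tol=1e-5, max_nonzero=50):
    """Return True when either assumed stopping condition is met."""
    # Criterion 1: successive log-likelihoods differ by less than tol.
    small_gain = abs(loglik_curr - loglik_prev) < tol
    # Criterion 2 (assumed reading): keep iterating only while the number
    # of nonzero coefficients is below the cap.
    n_nonzero = int(np.count_nonzero(beta))
    return small_gain or n_nonzero >= max_nonzero
```

Capping the number of nonzero coefficients bounds the number of iterations even when the log-likelihood keeps improving slowly, which is one way such a rule can ensure termination.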
In addition, there is a trade-off between computational complexity and accuracy at step 5 when selecting the optimal model based on the AIC or BIC. In the method as currently described, we treat observations as independent and fit a fixed-effects-only ordinal model using the last set of features with nonzero coefficients before a new feature enters the model, which yields approximate model fitting criteria. Given the longitudinal framework, step 5 should ideally be carried out by fitting a series of penalized ordinal random intercept/coefficient models of the form (5.3.4) using the different sets of penalized estimates. However, this computation can be extremely burdensome without parallel computing, so we currently sacrifice a small amount of accuracy in return for a faster solution. We adjust for the bias introduced by ignoring the within-subject correlation in step 6, by fitting a smaller number of penalized ordinal random intercept/coefficient models to further update the intercept estimate $\hat{\alpha}$ as well as the model fitting criteria. The best parsimonious model can then be selected, and its classification and prediction performance evaluated accordingly.
We now illustrate step 6 in more detail. Suppose the model with the optimal model fitting criterion selected in step 5 has q features with nonzero coefficients. In step 6 we fit (2k + 1) penalized ordinal random intercept/coefficient models with the number of features with nonzero coefficients ranging from q − k to q + k, where k specifies the range. For example, when k = 0, only one penalized ordinal random intercept/coefficient model with q nonzero coefficients is fitted; when k = 1, three penalized ordinal random intercept/coefficient models with q − 1, q and q + 1 nonzero coefficients are fitted, respectively.
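The bookkeeping in steps 5 and 6 can be sketched with three small helpers. These are illustrative only and all names are hypothetical: `checkpoints` finds the last step before a new feature enters along a stored coefficient path, `aic_bic` applies the standard definitions AIC = −2 log L + 2d and BIC = −2 log L + d log n, and `candidate_sizes` lists the model sizes q − k, ..., q + k. Dropping sizes below 1 and the choice of n in the BIC (observations versus subjects) are assumptions that the text does not settle.

```python
import numpy as np

def checkpoints(beta_path):
    """Indices of the last step before a new feature enters the model,
    plus the final step. beta_path is a sequence of coefficient vectors."""
    active = [frozenset(np.flatnonzero(b)) for b in beta_path]
    keep = [s for s in range(len(active) - 1)
            if active[s] and (active[s + 1] - active[s])]
    return keep + [len(active) - 1]

def aic_bic(loglik, n_params, n_obs):
    """Standard AIC and BIC from a log-likelihood value."""
    aic = -2.0 * loglik + 2.0 * n_params
    bic = -2.0 * loglik + n_params * np.log(n_obs)
    return aic, bic

def candidate_sizes(q, k):
    """Model sizes q-k, ..., q+k for step 6 (sizes below 1 are dropped,
    which is an assumption made for this sketch)."""
    return [m for m in range(q - k, q + k + 1) if m >= 1]
```

For instance, `candidate_sizes(5, 1)` gives the three sizes 4, 5 and 6, matching the k = 1 case described above.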