
Although subset selection methods simplify the model and reduce the variance, they can be unstable: small changes in the data may result in drastic changes in the fitted regression equation, so the prediction error is strongly affected by slight variations in the data. Moreover, these methods do not scale to high-dimensional data because of their computational cost (Breiman, 1995). Because each variable is either selected or discarded, subset selection is a discrete process and therefore often suffers from high variability. Shrinkage, or penalisation, methods have been proposed to tackle this problem. These methods are continuous and reduce the variance by placing constraints on the coefficient estimates (Hastie et al., 2009). The literature on penalisation methods is very rich, and covering all of these methods is beyond the scope of this thesis; the following sections therefore introduce some of the recent well-known methods with which our proposed method is later compared.

Ridge

Ridge regression (Hoerl and Kennard, 1970) is an improvement on ordinary least squares (OLS) in which the model is fitted by minimising the residual sum of squares while limiting the $\ell_2$-norm of the coefficients. Consider the regression model 2.1.1 introduced earlier; the optimisation problem in ridge regression then has the form

\[ \hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda\|\beta\|_2^2, \]

where $\|\beta\|_2^2 = \sum_{j=1}^{p} \beta_j^2$ and $\lambda \ge 0$ is a tuning parameter that controls the amount of shrinkage. As $\lambda$ increases, the amount of shrinkage increases, which reduces the variance at the cost of some bias and can therefore improve prediction accuracy for a suitably chosen $\lambda$. Although ridge regression is a stable method, it shrinks coefficients towards zero without setting any of them exactly to zero, so all predictors are retained in the model. Variable selection therefore cannot be performed through ridge regression. A useful feature of the ridge penalty is its ability to shrink the coefficients of correlated variables towards each other. This property is referred to as the grouping effect. The technique introduced in the next section was proposed with the aim of improving on ridge regression.
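The shrinkage and grouping behaviour described above can be illustrated with a minimal sketch. It assumes the closed-form ridge solution $\hat{\beta} = (X^\top X + \lambda I)^{-1} X^\top y$, which follows from setting the gradient of the ridge objective to zero. The function name `ridge_fit` and the toy data with two identical predictors are hypothetical and chosen only for illustration.

```python
def ridge_fit(X, y, lam):
    """Solve (X^T X + lam * I) beta = X^T y by Gaussian elimination."""
    n, p = len(X), len(X[0])
    # Augmented system [X^T X + lam * I | X^T y].
    A = [
        [sum(X[i][j] * X[i][k] for i in range(n)) + (lam if j == k else 0.0)
         for k in range(p)]
        + [sum(X[i][j] * y[i] for i in range(n))]
        for j in range(p)
    ]
    # Forward elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution.
    beta = [0.0] * p
    for j in reversed(range(p)):
        rhs = A[j][p] - sum(A[j][k] * beta[k] for k in range(j + 1, p))
        beta[j] = rhs / A[j][j]
    return beta


# Hypothetical toy data: two identical predictors, so OLS alone is not unique.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

beta_small = ridge_fit(X, y, lam=1.0)    # mild shrinkage
beta_large = ridge_fit(X, y, lam=100.0)  # strong shrinkage
print(beta_small, beta_large)
```

On this toy data the two identical predictors receive equal coefficients for any $\lambda > 0$, and the larger $\lambda$ shrinks both towards zero without making either exactly zero.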

Lasso

The lasso, proposed by Tibshirani (1996), is an alternative to ridge regression that imposes an $\ell_1$-norm penalty on the coefficients. The residual sum of squares (RSS) is therefore minimised as follows:

\[ \hat{\beta} = \arg\min_{\beta} \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1, \]

where $\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$. Because of the geometry of the lasso constraint, this method can set some of the coefficients exactly to zero. Unlike the strictly convex $\ell_2$ constraint region of ridge regression, the $\ell_1$ constraint region has corners on the axes, so the RSS contours, which are defined by

\[ \sum_{i=1}^{n} \Big( y_i - \sum_{j} \beta_j x_{ij} \Big)^2, \]

can touch it on an axis, in which case the corresponding coefficient is set to zero.

Figure 2.1.1: RSS contours (red ellipses) and penalty constraint regions (green areas) in the $(\beta_1, \beta_2)$ plane, with $\hat{\beta}$ marked in each panel, for the lasso (left), with constraint region $|\beta_1| + |\beta_2| \le \lambda$, and ridge regression (right), with constraint region $\beta_1^2 + \beta_2^2 \le \lambda$ (James et al., 2013).

This leads to a sparse model, which is more interpretable (Tibshirani, 1996). However, the lasso does not have the grouping property of ridge regression. As a result, in the presence of correlated variables, the lasso tends to select one variable from a group of correlated variables and discard the others. This situation arises in biological data analysis: for example, gene expressions are highly correlated when the genes belong to the same pathway, and the lasso cannot exploit this structure because it lacks the grouping effect. Another shortcoming of the lasso is that when $p \gg n$ it can select at most $n$ predictors before it saturates. The lasso may also not be the ideal approach when the aim is to build a predictive model (Zou and Hastie, 2005).
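How the $\ell_1$ penalty produces exact zeros can be sketched with cyclic coordinate descent and soft-thresholding. This is a standard way to solve the lasso objective with the $\frac{1}{2}$ factor used above, and is not necessarily the solver used in this thesis. The names `soft_threshold` and `lasso_cd` and the small orthogonal design are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Minimise 0.5 * ||y - X beta||^2 + lam * ||beta||_1 by coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            norm_sq = sum(X[i][j] ** 2 for i in range(n))
            beta[j] = soft_threshold(rho, lam) / norm_sq
    return beta


# Hypothetical toy data with orthogonal columns: a strong and a weak predictor.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
y = [3.0, 0.2, 3.0, 0.4]

beta_ols = lasso_cd(X, y, lam=0.0)    # no penalty: ordinary least squares
beta_lasso = lasso_cd(X, y, lam=1.0)  # weak predictor is removed from the model
print(beta_ols, beta_lasso)
```

With $\lambda = 1$ the coefficient of the weak predictor is thresholded to exactly zero, while the coefficient of the strong predictor is only shrunk; ridge regression on the same data would shrink both without removing either.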

Elastic net

Zou and Hastie (2005) proposed a new penalty called the elastic net. This penalty is a convex combination of the ridge and lasso penalties; as a result, it possesses the attractive features of both while improving on the prediction accuracy of the lasso. The elastic net solves the following optimisation problem:

\[ \hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 + \lambda_2\|\beta\|_2^2 + \lambda_1\|\beta\|_1. \tag{2.1.3} \]

Let $\alpha = \lambda_2 / (\lambda_1 + \lambda_2)$; then (2.1.3) can be equivalently written as

\[ \hat{\beta} = \arg\min_{\beta} \|y - X\beta\|_2^2 \quad \text{s.t.} \quad \alpha\|\beta\|_2^2 + (1 - \alpha)\|\beta\|_1 \le t \quad \text{for some } t. \]

The resulting elastic net estimates can be regarded as a compromise between the lasso and ridge solutions, since the penalty is a weighted average of the two penalties. The method performs variable selection and shrinkage simultaneously and is capable of selecting grouped variables.
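The grouping behaviour of the elastic net can be sketched by adapting coordinate descent to objective (2.1.3). Under that scaling, with no $\frac{1}{2}$ factor on the RSS, each coordinate update soft-thresholds at $\lambda_1 / 2$ and divides by $x_j^\top x_j + \lambda_2$. The function name `elastic_net_cd` and the toy data are hypothetical, and this is an illustrative solver rather than the algorithm of Zou and Hastie (2005).

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam, returning exactly 0.0 when |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def elastic_net_cd(X, y, lam1, lam2, n_iter=1000):
    """Minimise ||y - X beta||^2 + lam2 * ||beta||_2^2 + lam1 * ||beta||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Inner product of column j with the residual that excludes column j.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            )
            norm_sq = sum(X[i][j] ** 2 for i in range(n))
            # Stationarity of the one-dimensional problem gives this update.
            beta[j] = soft_threshold(rho, lam1 / 2.0) / (norm_sq + lam2)
    return beta


# Hypothetical toy data: two identical (perfectly correlated) predictors.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]

beta_en = elastic_net_cd(X, y, lam1=2.0, lam2=1.0)
print(beta_en)
```

Because the ridge term makes the objective strictly convex when $\lambda_2 > 0$, the two identical predictors end up with the same coefficient, which is the grouping effect. Setting `lam2=0.0` reduces the update to a lasso step.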

Group lasso

An extension of the lasso was introduced by Yuan and Lin (2006) in which selection is performed at the group level. Unlike the elastic net, in this method the covariates are partitioned into non-overlapping groups prior to penalisation. In other words, the solution consists of non-zero groups of coefficient estimates rather than non-zero individual estimates. When the covariates are assumed to come from $m$ non-overlapping groups, this method solves

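\[ \hat{\beta} = \arg\min_{\beta} \frac{1}{2}\Big\|y - \sum_{l=1}^{m} X^{(l)}\beta^{(l)}\Big\|_2^2 + \lambda \sum_{l} \sqrt{p_l}\,\|\beta^{(l)}\|_2, \]

where $X^{(l)}$ is the submatrix of $X$ whose columns are the predictors in the $l$-th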
