
Dimension reduction is an alternative approach to excluding irrelevant information and noisy features in the data. Such approaches reduce the dimension of the data by projecting it onto a lower dimensional space while preserving the informative and interesting structure in the data.

Definition 1. A linear projection $\mathbb{R}^p \to \mathbb{R}^k$ is a linear map $A$, or $k \times p$ matrix of rank $k$:

$$w = Ax, \qquad x \in \mathbb{R}^p, \; w \in \mathbb{R}^k. \tag{2.1.4}$$

The projection is orthogonal if the row vectors of $A$ are orthogonal to each other and have length one. If $k = 1$, then $A$ reduces to a row vector $a^T$, which is called a direction vector. A direction vector is a vector of norm one (Rao and Rao, 1998).
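A minimal NumPy sketch of Definition 1, assuming a projection is stored as a $k \times p$ array. The helper names `project`, `is_linear_projection`, `is_orthogonal_projection` and `is_direction_vector` are hypothetical. The orthogonality test checks $AA^T = I_k$, which is equivalent to the rows of $A$ being mutually orthogonal with length one.

```python
import numpy as np

def project(A: np.ndarray, x: np.ndarray) -> np.ndarray:
    """Apply the linear projection w = A x, mapping x in R^p to w in R^k."""
    return A @ x

def is_linear_projection(A: np.ndarray) -> bool:
    """A k x p matrix is a linear projection in the sense of Definition 1 if it has rank k."""
    k = A.shape[0]
    return bool(np.linalg.matrix_rank(A) == k)

def is_orthogonal_projection(A: np.ndarray, tol: float = 1e-10) -> bool:
    """Rows mutually orthogonal with unit length  <=>  A A^T equals the k x k identity."""
    k = A.shape[0]
    return bool(np.allclose(A @ A.T, np.eye(k), atol=tol))

def is_direction_vector(a: np.ndarray, tol: float = 1e-10) -> bool:
    """A direction vector (the case k = 1) has Euclidean norm one."""
    return bool(abs(np.linalg.norm(a) - 1.0) < tol)

# Example with p = 3, k = 2: keep the first two coordinates of x.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
x = np.array([2.0, -1.0, 5.0])
w = project(A, x)
```

A matrix such as `[[1, 1, 0], [0, 1, 0]]` has rank 2, so it is a linear projection, but its rows are neither orthogonal nor of unit length, so it is not an orthogonal one.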

Projection pursuit, introduced by Friedman and Tukey (1974), is a dimension reduction approach that pursues interesting low dimensional orthogonal projections of the data. Koch (2013) describes projection pursuit as the search for projections worth pursuing. The method associates an index with each projection to measure how interesting that projection is.

Definition 2. Let $x$ be a $p$-dimensional random vector, and let $a \in \mathbb{R}^p$ be a direction vector. A projection index $Q$ is a function which assigns a real number to pairs $(x, a)$ (Koch, 2013).
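To make Definition 2 concrete, the sketch below maximises a projection index with a deliberately crude optimiser: it draws random direction vectors and keeps the one with the largest index value. The function names, the random-search strategy and the kurtosis-based index are illustrative assumptions, not the procedure of Friedman and Tukey (1974), and the index is evaluated on an $n \times p$ data matrix rather than on a random vector.

```python
import numpy as np
from typing import Callable, Tuple

# A (sample) projection index: maps a data matrix X and a direction vector a to a real number.
Index = Callable[[np.ndarray, np.ndarray], float]

def random_direction(p: int, rng: np.random.Generator) -> np.ndarray:
    """Draw a direction vector uniformly on the unit sphere in R^p."""
    a = rng.standard_normal(p)
    return a / np.linalg.norm(a)

def projection_pursuit(X: np.ndarray, index: Index,
                       n_trials: int = 2000, seed: int = 0) -> Tuple[np.ndarray, float]:
    """Random search for the direction vector a that maximises index(X, a)."""
    rng = np.random.default_rng(seed)
    best_a, best_q = None, -np.inf
    for _ in range(n_trials):
        a = random_direction(X.shape[1], rng)
        q = index(X, a)
        if q > best_q:
            best_a, best_q = a, q
    return best_a, best_q

def variance_index(X: np.ndarray, a: np.ndarray) -> float:
    """Sample variance of the projected data; maximising it leads to PCA."""
    return float(np.var(X @ a))

def abs_excess_kurtosis(X: np.ndarray, a: np.ndarray) -> float:
    """|kurtosis - 3| of the standardised projection, one possible measure of non-normality."""
    z = X @ a
    z = (z - z.mean()) / z.std()
    return float(abs(np.mean(z ** 4) - 3.0))

# Simulated data whose first coordinate has by far the largest spread.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3)) * np.array([5.0, 1.0, 1.0])
a_hat, q_hat = projection_pursuit(X, variance_index)
```

Swapping `variance_index` for `abs_excess_kurtosis` changes what counts as "interesting" without changing the search; in practice a gradient-based or dedicated optimiser would replace the random search.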

In projection pursuit, the data are projected onto a lower dimensional space and each low dimensional projection is scored by the projection index. This index is then maximised to obtain interesting projections. Here we introduce a special case of projection pursuit in which the projection index is the variance of the projected data, so that variance is the quantity to be maximised. This technique is called Principal Component Analysis (PCA): the principal components capture the directions of highest variance in the data. They are calculated as follows. Let $x \sim (\mu, \Sigma)$ be a $d$-dimensional random vector, and let $a \in \mathbb{R}^d$ be a direction vector. The projection index for $x$ and $a$ is the variance of the projected data:

$$Q(x, a) = \operatorname{Var}(a^T x).$$

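Since $\Sigma$ denotes the covariance matrix of $x$, we have $\operatorname{Var}(a^T x) = a^T \Sigma a$, and the first principal component is found by solving the optimisation problem

$$\max_{a_1} \operatorname{Var}(a_1^T x) = \max_{a_1} a_1^T \Sigma a_1 \quad \text{s.t.} \quad a_1^T a_1 = 1. \tag{2.1.5}$$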

Applying the method of Lagrange multipliers to $a_1^T \Sigma a_1 - \lambda_1 (a_1^T a_1 - 1)$ and differentiating with respect to $a_1$ gives $(\Sigma - \lambda_1 I_d) a_1 = 0$, where $I_d$ is the $d \times d$ identity matrix.

Thus $\lambda_1$ is an eigenvalue of $\Sigma$ and $a_1$ is the corresponding eigenvector. Since $a_1^T \Sigma a_1 = a_1^T \lambda_1 a_1 = \lambda_1$ is to be maximised, $\lambda_1$ should be as large as possible, so $a_1$ is the eigenvector corresponding to the largest eigenvalue. Thus the maximiser of this projection index over direction vectors $a_1$ is the eigenvector of $\Sigma$ with eigenvalue

$$\lambda_1 = \max_{\{a : \|a\| = 1\}} Q(x, a).$$
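Numerically, the first principal component can be read off an eigendecomposition of $\Sigma$. The sketch below assumes a known $2 \times 2$ covariance matrix chosen purely for illustration and uses `numpy.linalg.eigh`, which returns the eigenvalues of a symmetric matrix in ascending order. It also checks that no randomly drawn direction vector attains a larger value of $a^T \Sigma a$ than $\lambda_1$.

```python
import numpy as np

def first_principal_component(Sigma: np.ndarray):
    """Return (lambda_1, a_1): the largest eigenvalue of Sigma and a unit eigenvector for it."""
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # ascending order for symmetric matrices
    return eigvals[-1], eigvecs[:, -1]

# Illustrative covariance matrix (an assumption, not taken from the text).
Sigma = np.array([[4.0, 2.0],
                  [2.0, 3.0]])
lam1, a1 = first_principal_component(Sigma)

# Q(x, a) = a^T Sigma a evaluated at 1000 random direction vectors never exceeds lambda_1.
rng = np.random.default_rng(0)
directions = rng.standard_normal((1000, 2))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
index_values = np.einsum("ij,jk,ik->i", directions, Sigma, directions)
assert index_values.max() <= lam1 + 1e-12
```

For this matrix the characteristic polynomial is $\lambda^2 - 7\lambda + 8$, so $\lambda_1 = (7 + \sqrt{17})/2$.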

The second principal component is derived by solving the optimisation problem (2.1.5) with the additional orthogonality constraint $a_2^T a_1 = 0$, which guarantees that the two principal components are uncorrelated, since $\operatorname{Cov}(a_2^T x, a_1^T x) = a_2^T \Sigma a_1 = \lambda_1 a_2^T a_1 = 0$. Consequently, the second principal component is derived by constructing the Lagrangian function

$$a_2^T \Sigma a_2 - \lambda_2 (a_2^T a_2 - 1) - \lambda_3 (a_2^T a_1),$$

differentiating it with respect to $a_2$ and setting the result equal to zero gives $2\Sigma a_2 - 2\lambda_2 a_2 - \lambda_3 a_1 = 0$. Premultiplying by $a_1^T$ and using $a_1^T a_2 = 0$ and $\Sigma a_1 = \lambda_1 a_1$ shows that $\lambda_3 = 0$, hence $(\Sigma - \lambda_2 I_d) a_2 = 0$. Similarly, $\lambda_2$ is an eigenvalue of $\Sigma$ with corresponding eigenvector $a_2$; moreover, $\lambda_2$ is the second largest eigenvalue of $\Sigma$. In the same way, the $m$-th principal component of $x$ is $a_m^T x$, where $a_m$ is the eigenvector corresponding to the $m$-th largest eigenvalue (Jolliffe, 1986). It is common to compute only the first few principal components in order to reduce the dimension of the data. Indeed, PCA represents the data in a new orthogonal coordinate system which optimally accounts for the variation in the data. The proportion of the variance explained by each eigenvector is the corresponding eigenvalue divided by the sum of all eigenvalues.
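The sketch below computes all principal components of a sample, assuming the covariance matrix $\Sigma$ is replaced by the sample covariance of the centred data (divided by $n$). It sorts the eigenpairs in decreasing order, forms the component scores $a_m^T x$, and returns the proportion of variance explained by each eigenvector. The function name `principal_components` and the simulated data are illustrative.

```python
import numpy as np

def principal_components(X: np.ndarray):
    """PCA of an n x d data matrix X.

    Returns the eigenvalues (descending), the eigenvectors as columns a_1, ..., a_d,
    the scores a_m^T (x_i - x_bar), and the proportion of variance explained.
    """
    Xc = X - X.mean(axis=0)                  # centre the data
    Sigma_hat = Xc.T @ Xc / Xc.shape[0]      # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
    order = np.argsort(eigvals)[::-1]        # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs                    # column m holds the m-th principal component
    prop_var = eigvals / eigvals.sum()
    return eigvals, eigvecs, scores, prop_var

# Simulated correlated data (illustrative only).
rng = np.random.default_rng(0)
mixing = np.array([[3.0, 0.5, 0.0, 0.0],
                   [0.0, 2.0, 0.3, 0.0],
                   [0.0, 0.0, 1.0, 0.2],
                   [0.0, 0.0, 0.0, 0.5]])
X = rng.standard_normal((200, 4)) @ mixing
eigvals, eigvecs, scores, prop_var = principal_components(X)
print(np.round(prop_var, 3))
```

Because the eigenvectors are orthonormal, the sample covariance of the scores is diagonal with the eigenvalues on the diagonal, which is the sample analogue of the principal components being uncorrelated.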

Dimension reduction for the regression model (2.1.1) can also be performed by finding the first $q$ principal components. These projections lie in the lower dimensional space spanned by the first $q$ eigenvectors of the covariance matrix of the predictors. Consider the $p$-dimensional vector $x_i$, whose transpose $x_i^T$ is the $i$-th row of the design matrix $X_{n \times p}$ in the univariate regression model (2.1.2). We drop the index $i$ for the rest of this section and write $x$ instead. Suppose $R(x)$ is a function of dimension less than $p$ that carries all the information $x$ has about the response variable $Y$, so that $E(Y \mid x) = E(Y \mid R(x))$. Cook (2007) defines sufficient dimension reduction as follows.

Definition 3. Replacing $x$ with a lower dimensional $R(x)$, provided that $R(x)$ captures all the information that $x$ contains about $Y$ so that $E(Y \mid x) = E(Y \mid R(x))$, is called sufficient dimension reduction.

Dimension reduction is applied to the regression model (2.1.1) in two steps. In the first step, the reduction step, $x$ is reduced linearly to $G^T x$ using some methodology that produces $G \in \mathbb{R}^{p \times q}$, $q \le p$. The second step is estimating the mean function $E(Y \mid G^T x)$ for the reduced predictors. In the following we show that this reduction can be performed through principal components.

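Suppose $Y$ is the $n \times 1$ vector of centred responses and $X_{n \times p}$ is the centred design matrix with rows $(x_i - \bar{x})^T$, $i = 1, \dots, n$, where $\bar{x} = \sum_{i=1}^{n} x_i / n$ is the sample mean. Let $\hat{\Sigma} = X^T X / n$ denote the sample covariance matrix and $\hat{S} = X^T Y / n$. If we denote the OLS estimator by $\hat{\beta}_{ols} = \hat{\Sigma}^{-1} \hat{S}$, then (Cook and Forzani, 2009)

$$\hat{\beta}_G = P_{G(\hat{\Sigma})} \hat{\beta}_{ols} = G (G^T \hat{\Sigma} G)^{-1} G^T \hat{S}, \tag{2.1.6}$$

where $P_{G(\hat{\Sigma})} = G (G^T \hat{\Sigma} G)^{-1} G^T \hat{\Sigma}$ is the projection onto the span of $G$ relative to the $\hat{\Sigma}$ inner product. If $G = I_p$, then $\hat{\beta}_G = \hat{\beta}_{ols}$. If the columns of $G$ are chosen to be the first $q$ eigenvectors of $\hat{\Sigma}$, then $G^T x$ consists of the first $q$ principal components and $\hat{\beta}_G$ is the principal component regression estimator (Cook, 2007).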
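A sketch of (2.1.6) in NumPy, assuming the data are held in arrays `X` ($n \times p$) and `Y` (length $n$); the function names `beta_G` and `pcr_estimator` and the simulated data are hypothetical. With $G = I_p$ it reproduces the OLS estimator, and with $G$ equal to the first $q$ eigenvectors of $\hat{\Sigma}$ it gives principal component regression. Algebraically this equals regressing $Y$ on the scores $Z = XG$ and mapping the coefficients back through $G$, since $G (G^T \hat{\Sigma} G)^{-1} G^T \hat{S} = G (Z^T Z)^{-1} Z^T Y$.

```python
import numpy as np

def _centre(X: np.ndarray, Y: np.ndarray):
    """Centre the predictors column-wise and the response."""
    return X - X.mean(axis=0), Y - Y.mean()

def beta_G(X: np.ndarray, Y: np.ndarray, G: np.ndarray) -> np.ndarray:
    """Estimator G (G^T Sigma_hat G)^{-1} G^T S_hat from (2.1.6)."""
    Xc, Yc = _centre(X, Y)
    n = Xc.shape[0]
    Sigma_hat = Xc.T @ Xc / n   # sample covariance of the predictors
    S_hat = Xc.T @ Yc / n       # sample covariance between predictors and response
    # Solve the q x q linear system instead of forming an explicit inverse.
    return G @ np.linalg.solve(G.T @ Sigma_hat @ G, G.T @ S_hat)

def pcr_estimator(X: np.ndarray, Y: np.ndarray, q: int) -> np.ndarray:
    """Principal component regression: G holds the first q eigenvectors of Sigma_hat."""
    Xc, _ = _centre(X, Y)
    Sigma_hat = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Sigma_hat)
    G = eigvecs[:, np.argsort(eigvals)[::-1][:q]]
    return beta_G(X, Y, G)

# Simulated regression data (illustrative only).
rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.standard_normal((n, p))
Y = X @ np.array([1.0, -2.0, 0.5, 0.0, 0.0]) + 0.1 * rng.standard_normal(n)
beta_pcr = pcr_estimator(X, Y, q=2)
```

Choosing $q = p$ keeps every eigenvector, so $G$ is orthogonal and $\hat{\beta}_G$ collapses back to $\hat{\beta}_{ols}$.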
