
Interpolation is the process of finding the position of a sample in the biplot space by relating its given values to a set of biplot axes referred to as interpolative biplot axes. Accordingly, the joint plot in which the measured variables are represented by the interpolative biplot axes, and the samples by points whose positions are found using those axes, is called the interpolative PCA biplot.

For convenience, the interpolative biplot axis corresponding to the $k$th measured variable will be referred to as the $k$th interpolative biplot axis. Let $X^*$ denote the original observed data matrix and $X$ the matrix upon which the construction of the PCA biplot is based. Let $\bar{x}^*_j$ and $\hat{\sigma}^*_{jj}$ respectively denote the sample mean and sample variance of the $j$th measured variable calculated from the matrix $X^*$, $j \in [1:p]$. Recall that when the measured variables have widely differing standard deviations, the PCA biplot should be based on the centred and standardised data matrix, i.e. $X = (I - \frac{1}{n}\mathbf{1}\mathbf{1}')X^*A^{-1}$, where $A$ denotes the $p \times p$ diagonal matrix with $j$th diagonal element equal to $\sqrt{\hat{\sigma}^*_{jj}}$, $j \in [1:p]$. On the other hand, when the measured variables have very similar standard deviations, the PCA biplot can be constructed from the centred but unstandardised measurements, i.e. $X = (I - \frac{1}{n}\mathbf{1}\mathbf{1}')X^*$.
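The two choices of $X$ described above can be sketched in NumPy as follows. This is only an illustrative sketch: the function name `centre_and_scale` and the small data matrix are hypothetical, and the sample variance is assumed to use the $n-1$ divisor, which the text does not specify.

```python
import numpy as np

def centre_and_scale(X_star, standardise=True):
    """Return (X, means, sds) for an n x p data matrix X_star.

    X is (I - (1/n) 1 1') X_star, optionally right-multiplied by A^{-1},
    where A is diagonal with the sample standard deviations.
    """
    X_star = np.asarray(X_star, dtype=float)
    n = X_star.shape[0]
    centring = np.eye(n) - np.ones((n, n)) / n   # I - (1/n) 1 1'
    means = X_star.mean(axis=0)
    sds = X_star.std(axis=0, ddof=1)             # assumption: n - 1 divisor
    X = centring @ X_star                        # column-centred data
    if standardise:
        X = X / sds                              # same as X @ inv(A)
    return X, means, sds

# Hypothetical data: two variables with very different spreads.
X_star = np.array([[1.0, 200.0],
                   [2.0, 180.0],
                   [4.0, 260.0],
                   [5.0, 240.0]])
X, means, sds = centre_and_scale(X_star, standardise=True)
```

Dividing the centred columns by `sds` is equivalent to the matrix product with $A^{-1}$ and avoids forming the diagonal matrix explicitly.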

Consider the singular value decomposition (SVD) of $X$:

$$X = UDV'.$$

Recall that the point that represents the $i$th sample in the $r$-dimensional PCA biplot space, $\mathcal{L} = \mathcal{V}(V_r)$, is the orthogonal projection of $x_i$ onto the biplot space. This point in $\mathcal{L}$ is referred to as the interpolant of $x_i$. The coordinate vector of the interpolant of $x_i$ in terms of the basis of the $p$-dimensional measurement space given by the $p$ $p$-dimensional unit vectors $\{e_k\}_{k=1}^{p}$ is given by

$$x_i' V_r V_r'.$$

The coordinate vector of this point in terms of the basis of $\mathcal{L}$ given by the column vectors of $V_r$ is given by

$$x_i' V_r.$$
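The two coordinate vectors above can be computed directly from the SVD. The sketch below assumes that $V_r$ consists of the first $r$ columns of $V$ (the usual PCA convention, not restated in this passage) and uses randomly generated, hypothetical data.

```python
import numpy as np

rng = np.random.default_rng(0)
X_star = rng.normal(size=(8, 4))       # hypothetical data: n = 8, p = 4
X = X_star - X_star.mean(axis=0)       # centred data matrix

# SVD X = U D V'; NumPy returns V' (here Vt) and the diagonal of D (here d).
U, d, Vt = np.linalg.svd(X, full_matrices=False)

r = 2
V_r = Vt[:r].T                         # assumption: first r columns of V, p x r

Z = X @ V_r                            # row i is x_i' V_r (biplot coordinates)
X_hat = Z @ V_r.T                      # row i is x_i' V_r V_r' (coordinates in R^p)
```

Each row of `Z` is what would be plotted in the two-dimensional biplot, while the matching row of `X_hat` is the same point expressed in the original $p$-dimensional coordinate system.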

A method called the 'vector sum method' can be used to find the position of the interpolant of a sample $x$ in the biplot space, $\mathcal{L} = \mathcal{V}(V_r)$. This method relies on the following expression of the interpolant $x'V_r$:

$$x'V_r = p \cdot \frac{1}{p} \sum_{k=1}^{p} x_k (e_k' V_r).$$

The 'vector sum method' entails locating the centroid of the $p$ points $x_1 e_1' V_r$, $x_2 e_2' V_r$, ..., $x_p e_p' V_r$, and then extending the vector stretching from the origin to this centroid $p$ times; the endpoint of this extended vector gives the position of the interpolant $x'V_r$.
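As a numerical check of the vector sum identity, the following sketch builds the $p$ points $x_k e_k' V_r$, takes their centroid and stretches it by a factor of $p$. The function name `vector_sum_interpolant`, the random sample and the matrix standing in for $V_r$ are hypothetical; any $p \times r$ matrix satisfies the identity.

```python
import numpy as np

def vector_sum_interpolant(x, V_r):
    """Locate the interpolant x'V_r via the vector sum method."""
    x = np.asarray(x, dtype=float)
    p = x.shape[0]
    # Row k of V_r is e_k' V_r, so row k of `points` is x_k e_k' V_r.
    points = x[:, None] * V_r
    centroid = points.mean(axis=0)     # (1/p) * sum_k x_k e_k' V_r
    return p * centroid                # extend the centroid vector p times

rng = np.random.default_rng(1)
# Stand-in for V_r: a 5 x 2 matrix with orthonormal columns.
V_r, _ = np.linalg.qr(rng.normal(size=(5, 2)))
x = rng.normal(size=5)

interpolant = vector_sum_interpolant(x, V_r)
```

The result agrees with the direct matrix product `x @ V_r`; the value of the vector sum construction is that it can be carried out graphically on the biplot itself.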

Note that $e_k'V_r$ is the interpolant of the unit point $e_k$ on the $k$th Cartesian axis. The interpolant $e_k'V_r$ therefore represents one unit of the $k$th variable, $\tilde{x}_k$. Remember that this one unit is in terms of the scale of the elements of the $k$th column of the matrix $X$. Similarly, the interpolant $\mu e_k'V_r$, where $\mu$ is an arbitrary constant, represents $\mu$ units of the $k$th variable, $\tilde{x}_k$. The $k$th interpolative axis is defined by points of the form $\mu e_k'V_r$. This means that the $k$th interpolative axis is linear and collinear with the $k$th row vector of $V_r$, which stretches from the origin. It follows that if $e_k \in \mathcal{V}(V_r)$, the $k$th interpolative biplot axis of the $r$-dimensional PCA biplot and the Cartesian axis that represents the $k$th variable in the $p$-dimensional measurement space will be collinear.

If the interpolative biplot axes are calibrated in the same scales as the elements of $X$, then the point $\mu e_k'V_r$ is calibrated with the value $\mu$. The position of the interpolant of $x$ can then be found by locating the point on the $i$th interpolative biplot axis that is calibrated with the value $x_i$, the $i$th element of $x$, for all $i \in [1:p]$, finding the centroid of these $p$ points, and then extending the vector emanating from the origin to this centroid $p$ times; the endpoint of this extended vector gives the position of the point representing $x$ in the PCA biplot. It is evident that the calibrations on the $k$th interpolative biplot axis of the $r$-dimensional PCA biplot increase in the direction of the $k$th row vector of $V_r$ and decrease in the opposite direction.

If, on the other hand, the interpolative biplot axes are calibrated in the same scales as the elements of $X^*$, the point $\mu e_i'V_r$ will be calibrated with the value $\mu^*_i$, where $\mu^*_i = \mu + \bar{x}^*_i$ if the PCA biplot was constructed from the unstandardised measurements and $\mu^*_i = \mu\sqrt{\hat{\sigma}^*_{ii}} + \bar{x}^*_i$ if the PCA biplot was constructed from the standardised measurements. It follows that if the interpolative biplot axes are calibrated in terms of the scales of the elements of $X^*$, then the position of $x$ can be found by locating the point on the $i$th biplot axis that is calibrated with the value $x^*_i$ for all $i \in [1:p]$ and then extending the vector emanating from the origin to the centroid of these $p$ points $p$ times; the endpoint of this extended vector gives the position of the interpolant of $x$.
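The calibration rules above can be illustrated with a short sketch that places markers labelled in the scale of $X^*$ and then applies the vector sum method to them. The helper names `calibration_marker` and `calibration_label` and the data are hypothetical, and the $n-1$ divisor for the sample variance is an assumption.

```python
import numpy as np

def calibration_marker(value_star, k, V_r, means, sds=None):
    """Biplot position of the marker on axis k labelled value_star (X* scale)."""
    scale = 1.0 if sds is None else sds[k]     # sds=None: unstandardised biplot
    mu = (value_star - means[k]) / scale       # the same value in the scale of X
    return mu * V_r[k]                         # mu * e_k' V_r

def calibration_label(mu, k, means, sds=None):
    """Label mu* (X* scale) attached to the point mu * e_k' V_r."""
    scale = 1.0 if sds is None else sds[k]
    return mu * scale + means[k]

# Hypothetical data with three variables on very different scales.
rng = np.random.default_rng(2)
X_star = rng.normal(loc=[50.0, 10.0, 3.0], scale=[20.0, 2.0, 0.5], size=(10, 3))
means = X_star.mean(axis=0)
sds = X_star.std(axis=0, ddof=1)               # assumption: n - 1 divisor
X = (X_star - means) / sds                     # standardised measurements
_, _, Vt = np.linalg.svd(X, full_matrices=False)
V_r = Vt[:2].T

# Vector sum method applied to the markers of the first sample.
x_star = X_star[0]
p = x_star.shape[0]
markers = np.array([calibration_marker(x_star[k], k, V_r, means, sds)
                    for k in range(p)])
position = p * markers.mean(axis=0)
```

With these definitions the marker labelled with the sample mean of a variable sits at the origin, which matches $\mu^*_i = \bar{x}^*_i$ when $\mu = 0$.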

The two-dimensional interpolative PCA biplot constructed from the standardised measurements of the University data set is provided in Figure 2.6. This biplot illustrates how the position of the point representing Purdue University (Purdue) is found using the vector sum approach. The points on the six biplot axes that are calibrated with the appropriate $x^*_i$ values are indicated with solid triangles, while the centroid of these six points is indicated with a solid square.

Samples other than those upon which the construction of the PCA biplot was based can be interpolated onto the existing PCA biplot in order to visualise their positions relative to the other samples. In order to interpolate a new sample $x^*$ onto the PCA biplot constructed from the matrix $X$, each element of $x^*$ first needs to be transformed such that it is measured in the same scale as the elements of the corresponding column of the matrix $X$. If the existing PCA biplot was constructed from the unstandardised measurements, i.e. $X = (I - \frac{1}{n}\mathbf{1}\mathbf{1}')X^*$, the new sample $x^*$ must be transformed to $x = x^* - \bar{x}^*$, where $\bar{x}^{*\prime} = \frac{1}{n}\mathbf{1}'X^*$ is the vector of column means of $X^*$. If the existing PCA biplot was constructed from the standardised measurements, i.e. $X = (I - \frac{1}{n}\mathbf{1}\mathbf{1}')X^*A^{-1}$, $x^*$ must be transformed to $x = A^{-1}(x^* - \bar{x}^*)$. The point representing the new sample in the $r$-dimensional PCA biplot space, $\mathcal{L} = \mathcal{V}(V_r)$, is the orthogonal projection of $x$ onto $\mathcal{L}$. The position of this point can be obtained using the interpolative biplot axes in exactly the same way as explained above.
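Interpolating a new sample therefore amounts to reusing the means, standard deviations and $V_r$ of the existing biplot. The sketch below assumes the $n-1$ divisor for the sample variance and takes $V_r$ to be the first $r$ columns of $V$; the function names `fit_pca_biplot` and `interpolate_new_sample` and the data are hypothetical.

```python
import numpy as np

def fit_pca_biplot(X_star, r=2, standardise=True):
    """Return the column means, scale factors and V_r of the existing biplot."""
    X_star = np.asarray(X_star, dtype=float)
    means = X_star.mean(axis=0)
    if standardise:
        sds = X_star.std(axis=0, ddof=1)       # assumption: n - 1 divisor
    else:
        sds = np.ones(X_star.shape[1])         # unstandardised: A = I
    X = (X_star - means) / sds
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return means, sds, Vt[:r].T                # assumption: first r columns of V

def interpolate_new_sample(x_new_star, means, sds, V_r):
    """Biplot coordinates of a new sample given in the scale of X*."""
    x = (np.asarray(x_new_star, dtype=float) - means) / sds   # A^{-1}(x* - mean)
    return x @ V_r                                            # x' V_r

# Hypothetical data: 12 samples on 4 variables with differing spreads.
rng = np.random.default_rng(3)
X_star = rng.normal(size=(12, 4)) * np.array([1.0, 5.0, 0.2, 30.0])
means, sds, V_r = fit_pca_biplot(X_star, r=2, standardise=True)

new_point = interpolate_new_sample([0.5, -2.0, 0.1, 12.0], means, sds, V_r)
```

Note that the new sample does not influence `means`, `sds` or `V_r`; it is only positioned relative to the biplot already fitted to the original samples.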

Figure 2.6: The two-dimensional interpolative biplot constructed from the standardised measurements of the University data set, illustrating the vector-sum approach for Purdue University. [Figure not reproduced; the six calibrated biplot axes are SAT, Top10, Accept, SFRatio, Expenses and Grad, and the plotted points are the universities of the data set.]
