Consideraciones generales - Delitos Contra la fe Pública

1.3.3. Delitos Contra la fe Pública

1.3.3.1. Consideraciones generales

Given the optimal reference set for process monitoring of the multiple parallel production processes, the PCA biplot methodology as described by Gower

et al. (2011) is applied to the multivariate monitoring of each production pro-

cess. Aldrich et al. (2004) and Sparkset al.(1997) discussed numerous advan-

tageous of biplots for process monitoring. The biplot provides the ability to detect deviations from expected performance, as well as which variables are responsible for the deviation. Therefore, the implementation of monitoring biplots in real time provides the engineer with information on process variables for bringing the process back to expected or target performance. However, in this study, the application of monitoring biplots has been extended to multiple parallel production processes.

To demonstrate the implementation of the selected reference set for multivariate monitoring of multiple production processes, the optimal month was selected (Section 3.2) as the reference set for the monitoring biplots. In this section only the process monitoring of the Eastern factory is considered as the implementation will be similar for the Western factory. All the production processes on all the trains are monitored in real time using the M20 data from train one (i.e., all processes on train one) as the reference set for expected performance of the processes. The real time monitoring is currently performed on 15 minute aggregate data, and therefore the reference set will consist of 15 minute aggregate data for the ten production processes (gasifiers) on train one for all the days in M20. The data were filtered according to specifications provided by the production engineer (the filtering and ranges of the data will be discussed in more detail in Chapter 4).

In the next section the underlying algebra of the PCA biplot as well as some measures of the predictive power of the plots will be briefly reviewed.

3.3.1 PCA Biplot

Before presenting the underlying algebra for the PCA biplot, some necessary notation will be specified. LetU be ann×northogonal matrix withUT_U ₌_I

and U UT =I. Therefore, the sum of squares of each column of U is equal to 1, and the columns ofU are orthogonal (or perpendicular) to one another i.e., their scalar products are 0. Similarly, the sum of squares of each row of U is equal to 1, and the rows of U are orthogonal to one another. A rectangular

n ×p matrix U with p < n is orthonormal if UT_U ₌ _I

p, but U UT 6= In.

Therefore, the sum of squares of each of the columns of an orthonormal matrix U is equal to 1, and its columns are orthogonal to one another.

The singular value decomposition (SVD) plays a central role in multivariate statistics (Gower et al., 2011; Gower and Hand, 1996; Greenacre, 2010;

Greenacre and Primicerio, 2014). The SVD of a n×p matrix X with n ≥p

is defined as

Xn×p =Un×n∗ Σ∗n×p(V∗)Tp×p (3.3.1)

whereU∗ is an×northogonal matrix with columns known as the left singular

values of X, andV∗ is a p×porthogonal matrix with columns known as the

right singular values of X. The matrix Σ∗ is of the form

Σ∗_n×p = Σr×r 0r×(p−r) 0(n−r)×p 0(n−r)×(p−r) (3.3.2) whererdenotes the rank ofX, and anr×rdiagonal matrix with the diagonal

elements the non-zero singular values of X. The singular values are non- negative, and specified in decreasing order i.e., σ1 ≥ σ2 ≥ · · · ≥ σr ≥ 0.

Therefore, (3.3.1) can be written as

X =Un×rΣr×rVr×pT (3.3.3)

where U is a n×r orthonormal matrix consisting of the firstr columns of U∗ and V is ap×r orthonormal matrix consisting of the first r columns of V∗. In this study, all references to SVD refer to the reduced form in (3.3.3).

The non-zero singular values ofX are the positive square roots of the non- zero eigenvalues of the eigenvalue (spectral) decomposition of the p×psquare

symmetric matrix XTX. Therefore,

XTX =VΛVT =VΣ2VT (3.3.4) where V is identical toV in (3.3.3).

In this study the optimal approximation of an n ×p data matrix X of rank r in fewer (say r∗) dimensions is of interest. Let Xˆr∗ of rank r∗ be the

approximation of X in r∗ (r∗ < r) dimensions. Then, Gower et al. (2011)

calculated _Xˆ

r∗ by the minimisation of

trace[(X −Xˆr∗)T(X−Xˆ_r∗)] = k(X −Xˆ_r∗)k2. (3.3.5)

The solution to this least-squares problem is given by the Eckart-Young theorem (Eckart and Young, 1936). According to this theorem, the least squares problem is minimised in r∗ dimensions by the r∗ dimensional approximation

for X

Xr∗ =U_r∗Σ_r∗V_rT∗ (3.3.6)

where Ur∗ is a n×r∗ orthonormal matrix with r∗ the first r∗ columns of U, Vr∗ is a p×r∗ orthonormal matrix (for r_∗ < p) with r∗ the first r∗ columns of V, and Σr∗ is a r∗×r∗ diagonal matrix.

The coordinates of the r∗ dimensional approximation of X are given by Ur∗Σ_r∗ = XV_r∗. In the PCA literature XV_r∗ is known as the scores of the

PCA. The directions of the PCA biplot axes are given by the rows of Vr∗, also

known as the loadings.

Gower and Hand (1996) extended the biplot methodology to include cali- bration of the axes. Calibrations are added to the axes as follows (Alves, 2012; Goweret al., 2011; Gower and Hand, 1996): Given a vectorµi ofk scale values

for the ith biplot axis and a vector ei of length p containing 0’s except for a

1 as the ith value, the coordinates of the scale values in the r∗ dimensional

representation of X can be obtained by

µijeTi Vr∗ eT

i Vr∗V_rT∗e_i (3.3.7)

for i= 1,· · · , p and j = 1,· · ·, k.

Biplots are discussed in detail in Greenacre (2010), Gower and Hand (1996) and Gower et al. (2011).

3.3.2 PCA Biplot Measures of fit

According to Gower et al.(2011) the overall quality of the PCA biplot display

is defined as: Pr∗ i=1σ 2 i Pp i=1σi2 . (3.3.8)

However, the overall quality of the biplot display does not contain any information about the quality of the representation of the variables in r∗ dimensions.

The projection of unit points I on the coordinate axes can be written as IVr∗, which gives ppoints that may be regarded as representing the variables. V is orthogonal, and therefore its rows have unit sums of squares. Thus, the sums of squares of the rows ofVr∗ measures the adequacy of the representation

for each variable (Gower et al., 2011). Therefore, the adequacy for the ith

variable is given by adequacy= r∗ X j=1 v_ij2 (3.3.9)

which is equal to the ith diagonal element of diag(Vr∗VT

r∗). The maximum

adequacy is equal to 1 when r∗ = p. Adequacy measures how closely the

endpoint of the ith vector vi lies to the circumference of the unit circle in the r∗ dimensional plane of approximation (Gardner-Lubbe et al., 2008).

Gardner-Lubbe et al. (2008) argued that the adequacy, although popular,

has limitations. An example is presented where the sample points lie on a plane in a three dimensional space. Therefore, the samples will be represented perfectly in a two dimensional approximation, whereas the three variables can not be represented perfectly in two dimensions. They stated that it is generally of more interest to know how close the points represented by the rows of X are to an approximating plane than how close the axes are.

As an alternative to the adequacy measure Gardner-Lubbeet al.(2008) pro-

posed the axis predictivity, which measures the degree to which the columns of the approximation _Xˆ_r∗ agree with the columns of the exact X. The name

axis predictivity is derived from the fact that it minimises the squared dif- ferences between the actual measurements of the variable and the predicted values of that variable as read off from the biplot axis. The axis predictivities are expressed as the ratio of the sums of squares (_XˆT

r∗Xˆ_r∗) to the total sums

of squares (XTX). Therefore, axis predictivities are defined as the diagonal elements of the matrix Πp×p given by

Π=diag( ˆX_rT∗Xˆ_r∗)[diag(XTX)]−1 (3.3.10) =diag(Vr∗Σ2

r∗V_rT∗)[diag(VΣ2VT)]−1 (3.3.11)

If X is standardised then (3.3.10) and (3.3.11) simplify to give

Π=diag( ˆX_rT∗Xˆ_r∗) (3.3.12) =diag(Vr∗Σ2

r∗V_rT∗) (3.3.13)

As the axis predictivity values are expressed in terms of the fitted sums of squares over the total sums of squares, implicit reference is made to a residual sum of squares. Note, without the orthogonal decomposition the total sum of squares would not be equal to the sum of the fitted sum of squares and the residual sum of squares. Gardner-Lubbe et al. (2008) provided algebraic

results to prove what they term Type B orthogonality:

XTX = ˆX_rT∗Xˆ_r∗+ (X−Xˆ_r∗)T(X−Xˆ_r∗) (3.3.14)

Type B orthogonality shows that increasing the fitted sum of squares of a variable (the axis predictivity of the variable) reduces the residual sum of squares of the variable. Specifically, for standardised X the axis predictivity is

Π=diag( ˆX_rT∗Xˆ_r∗) (3.3.15) =I −diag((X −Xˆr∗)T(X−Xˆ_r∗)) (3.3.16)

Alves (2012) proposed the mean standard predictive error (mspe) criterion as a measure of predictive power of the biplot axes, given by:

mspe= 1 T x(1)−xˆ(1),x(2)−xˆ(2),· · · ,x(p)−xˆ(p) n (3.3.17)

where1T is a1×nvector of 1’s, and x(i) is theith column ofX and similarly ˆ

x(i) is the ith column of Xˆr∗. The mspe criterion assumes the initial X was

standardized to mean 0 and unit variance for each column. The mspe value for each axis (or variable) is the mean absolute residual value for the actual

against fitted values for each variable in X. Therefore, a low mspe value represents good predictive power of the axis. Given that X is assumed to be standardised to mean 0 and unit variance the maximum mspe value will be equal to 1.

Comparing the axis predictivity values with the mspe values the following observations can be made:

• The mspe value for each axis (or variable) is the mean absolute residual value for the actual against fitted values for each variable in X.

• The axis predictivity value for a variable for a standardisedXis the fitted sum of squares value, or alternatively I−diag((X−Xˆr∗)T(X−Xˆ_r∗)).

Therefore, both the mspe and the axis predictivity are based on the residual values X −Xˆr∗, and it is expected that comparable results will be obtained

by applying these two criteria to PCA biplots.

3.3.3 PCA Biplots for Monitoring

The PCA biplot can be extended to function as a multivariate monitoring graphic (Aldrich et al., 2004; Coetzer et al., 2014; Sparks et al., 1997). If it

can be assumed that the rows of then×preference set (X) are approximately

multivariate normally distributed with mean equal tox¯ and covariance matrix

S, it follows that the multivariate normal density is constant on surfaces where

the Mahalanobis distance(x−x¯)T S−1₍_x₋_¯_x₎_(with_x_{a row of}_X_{) is constant.} That is, the rows of X lie on the surface of an ellipsoid centered at x¯.

It can then be shown that

(x−x¯)T S−1(x−x¯)∼χ2_p, (3.3.18)

whereχ2_pdenotes the chi-square distribution withpdegrees of freedom, which is

closely related to the T2_{-statistic discussed in detail in Section 4.2.2. According} to Gower et al. (2011) the (1−α) % concentration ellipse is defined by all observations that satisfy

(x−x¯)T S−1(x−x¯)≤χ2(α). (3.3.19)

Any observation that does not satisfy the above inequality is flagged for further inspection.

3.3.4 Results

3.3.4.1 Number of principal components to include

Alves (2012) discussed various strategies to determine the number of principal components. Specifically,

• Exclude the components for which the eigenvalues of the correlation matrix are less than one. This method was first proposed by Kaiser (1958), and is a special case of the more general methodology to exclude all the eigenvalues that are less than the average of all the eigenvalues. i.e., if

λi is defined as the ith eigenvalue of S the average eigenvalue will be

i λi

p. Note that the

i λi =trace(S) and therefore the average of the

eigenvalues is equal to the average variance (Everitt and Hothorn, 2011). • Retain the number of components that correspond to some percentage

of the total information (typically 80%).

• Choose the number of components based on the scree test. The scree test entails plotting the eigenvalues λi against i and visually identifying

the point where the slope change.

Greenacre and Primicerio (2014) page 223-224 discussed a formal methodology to specify the number of significant principal components using a permutation test. A brief outline of their methodology is provided here. In PCA the structure in the data is dependent on the correlation of the variables i.e., if there is no correlation between the variables there will be no structure in the data for PCA to capture. To test the null hypothesis that there is no correlation between the variables random data sets are generated by permuting the values in the columns of the data set. As an example, one random n×p data

set (sayZ) is generated from the reference setX by sampling without replace- ment (permute) the data in columnXito populate columnZi fori= 1,· · · , p.

Two criteria i.e., percentage variance explained and eigenvalues of the correlation matrix, are then calculated on Z, and the process is repeated j times.

This will generate j results. The significance (p values) for each dimension is

now calculated by taking the number of the generated criterion that is higher than the original criterion for each dimension. For example, if j = 9999 and one value is larger than the original criterion for dimension one the p value

equals 0.0001. The p-value thus obtained is termed the achieved significance level (ASL).

For the reference set the eigenvalues of the correlation matrix were calculated. The results are shown in Figure 3.12. According to criterion 1, four principal components should be retained. The cumulative percentage variance explained was also calculated (Figure 3.13), and according to this criterion five principal components should be retained for more than 80% variability explained.

The permutation test was performed for j = 9999 for both the eigenvalue and the cumulative percentage variance explained, and equivalent results were obtained. All the values of the permuted data sets were smaller than those for the original data set up to and including four dimensions. For five and higher dimensions all the criterion values of the permuted data set were higher than the original data set. Box plots for both the eigenvalues of the correlation

matrix and the percentage variance explained are provided in Figures 3.14 and 3.15 respectively. The permuted values are depicted in the box plot, and the original values are highlighted in red. It was therefore concluded that four principal components should be retained for specifying the reference data set.

1 2 3 4 5 6 7 8 9 10 11 Scree plot Dimension Eigen v alue 0.0 0.5 1.0 1.5 2.0 2.5 3.0

1 2 3 4 5 6 7 8 9 10 11

Cumulative % Variance Explained

Dimension Cum ulativ e % V ar iance Explained 0 20 40 60 80 100

Figure 3.13: Cumulative percentage variance explained

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●_●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●_●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●_●_●●●●●●●●●●●_●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●_●_●_●●●●●●●●●●●●●●●●●●●_●●●●●●●●●●●_●●●●●●●●● 1 2 3 4 5 1.0 1.5 2.0 2.5 3.0 Dimension Eigen v alue ● ● ● ● ●

Figure 3.14: Permutation versus original eigenvalues (The permuted values are depicted in the box plot, and the original values are highlighted in red)

● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●●●●●●●●●●●●●●●●●●●●●●●_●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●_●●●●●●●●●●●●●●●●●●●●●● ●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●_●●●●●●●●●●●●●●●●●●●● 1 2 3 4 5 5 10 15 20 25 30 Dimension % V ar iance Explained ● ● ● ● ●

Figure 3.15: Permutation versus original variance explained (The permuted values are depicted in the box plot, and the original values are highlighted in red)

3.3.4.2 Axis predictivity and interpretation

Figures 3.16a to 3.16f depict the PCA biplots of the reference set of the Eastern factory as discussed in Section 3.2.4.1. Note that for all the biplots generated in this section the smaller numbered principal component will be represented on thexaxis and the higher numbered principal component on theyaxis. For

example, in Figure 3.16a PC1 is represented along the x axis and PC2 along

the yaxis. The PCA biplot qualities are provided in Table 3.6 for the different

In document La fe pública notarial como garantía de seguridad jurídica en la legislación penal Peruana (página 133-136)