Suppose the random vector $(X_1, \ldots, X_p)$ has mean vector $0$ and non-singular covariance matrix $\Sigma = (\sigma_{jk})$. Form the positive definite matrix $\Omega$ with $(j, k)$th element equal to $\sigma_{jk}/(\sigma_j \sigma_k l_j l_k)$, where $l_1, \ldots, l_p$ are positive constants and $\sigma_{jj} = \sigma_j^2$ for $j = 1, \ldots, p$. Then the multivariate Chebychev inequality suggested by Olkin and Pratt (1958) is
$$\operatorname{pr}\left(|X_j| \ge l_j \sigma_j \text{ for some } j\right) \le \operatorname{tr}\left(T^{-1} \Omega T^{-1}\right), \tag{2.38}$$
where $T$ is the unique positive definite correlation matrix such that $T \Omega^{-1} T$ is a diagonal matrix. For $p = 1$, inequality (2.38) reduces to the univariate Chebychev inequality, $\operatorname{pr}(|X_1| \ge l_1 \sigma_1) \le 1/l_1^2$.
According to Olkin and Pratt (1958, p. 233), $T$ cannot be obtained from $\Omega$ by standard matrix operations except in special cases. Garthwaite et al. (2012) suggested using the cos-square transformation to determine $T$. Suppose $X^\top X$ is set equal to $\Omega$, and the cos-square transformation is applied to $X^\top X$, giving the diagonal matrix $C$ (cf. Algorithm 1). Garthwaite et al. (2012) show that the upper bound of inequality (2.38) is $\operatorname{tr}(C^2)$. For a more detailed description of the multivariate Chebychev inequality and of how to determine its upper bound, see Olkin and Pratt (1958) and Garthwaite et al. (2012).
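As a sanity check on inequality (2.38), the sketch below builds $\Omega$ from $\Sigma$ and the constants $l_j$ and evaluates the bound in the one special case where $T$ is immediate: when $\Sigma$ is diagonal, $\Omega^{-1}$ is already diagonal, so $T = I$ satisfies the defining condition and the bound is $\sum_j 1/l_j^2$. The function names `build_omega` and `chebychev_bound` and the numerical values are illustrative assumptions, not part of the source, and the code does not implement the cos-square transformation needed for a general $\Sigma$.

```python
import numpy as np

def build_omega(sigma, l):
    """Omega with (j, k)th element sigma_jk / (sigma_j sigma_k l_j l_k)."""
    sigma = np.asarray(sigma, dtype=float)
    l = np.asarray(l, dtype=float)
    scale = np.sqrt(np.diag(sigma)) * l      # sigma_j * l_j for each j
    return sigma / np.outer(scale, scale)

def chebychev_bound(omega, T):
    """Right-hand side of (2.38), tr(T^{-1} Omega T^{-1}), for a supplied T."""
    T_inv = np.linalg.inv(T)
    return np.trace(T_inv @ omega @ T_inv)

# Illustrative uncorrelated case: T = I is valid only because Sigma is diagonal.
sigma = np.diag([4.0, 9.0, 1.0])
l = np.array([2.0, 3.0, 4.0])

omega = build_omega(sigma, l)
bound = chebychev_bound(omega, np.eye(3))
print(bound, np.sum(1.0 / l**2))
```

In this uncorrelated case the bound coincides with the sum of the $p$ univariate Chebychev bounds; for correlated variables $T$ has to be found by other means, such as the cos-square transformation described above.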
2.5.5 Partition of Hotelling’s $T^2$, Mahalanobis distance and discriminant function
The corr-max transformation can be used to partition the contribution of individual variables to a quadratic form such as Hotelling’s one-sample $T^2$, Hotelling’s two-sample $T^2$, Mahalanobis distance and a discriminant function. Suppose the statistic of interest is
$$\Theta = \delta\, (X - \mu)^\top \widehat{\Sigma}^{-1} (X - \mu), \tag{2.39}$$
where $\delta$ is a positive scalar and $\widehat{\Sigma}$ is an estimate of $\Sigma$, with $\operatorname{var}(X) \propto \Sigma$.
When $\Sigma$ in Theorem 2 is unknown, we replace it by a sample estimate $\widehat{\Sigma}$. If the sample variance of $X$ is proportional to $\widehat{\Sigma}$, then the sample estimate of $W$, denoted by $\widehat{W} = (\widehat{W}_1, \ldots, \widehat{W}_p)^\top$, is obtained by (Garthwaite and Koch, 2016)
$$\widehat{W} = \left(\widehat{D} \widehat{\Sigma} \widehat{D}\right)^{-1/2} \widehat{D}\, (X - \mu), \tag{2.40}$$
where $\widehat{D}$ is a diagonal matrix obtained from $\widehat{\Sigma}$ such that $\widehat{D} \widehat{\Sigma} \widehat{D}$ has diagonal elements equal to 1. The contribution of the $j$th $X$ variable to $\Theta$ is then evaluated as $\delta \widehat{W}_j^2$. Partitioning Hotelling’s one-sample and two-sample $T^2$ statistics and the Mahalanobis distance is straightforward, as they have precisely the same form as equation (2.39), while the discriminant function is closely related.
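The partition in equations (2.39) and (2.40) can be sketched numerically as follows. This is a minimal illustration under two assumptions that are not stated in the text above: $(\widehat{D}\widehat{\Sigma}\widehat{D})^{-1/2}$ is taken to be the symmetric inverse square root (computed from an eigendecomposition), and the helper names `inv_sqrt_sym` and `partition_contributions`, together with the numerical inputs, are hypothetical.

```python
import numpy as np

def inv_sqrt_sym(a):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(a)
    # V diag(1/sqrt(lambda)) V^T
    return (eigvec / np.sqrt(eigval)) @ eigvec.T

def partition_contributions(x, mu, sigma_hat, delta=1.0):
    """Return (W_hat, delta * W_hat**2) following equation (2.40)."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma_hat = np.asarray(sigma_hat, dtype=float)
    d = 1.0 / np.sqrt(np.diag(sigma_hat))   # diagonal entries of D_hat
    corr = sigma_hat * np.outer(d, d)       # D_hat Sigma_hat D_hat, unit diagonal
    w_hat = inv_sqrt_sym(corr) @ (d * (x - mu))
    return w_hat, delta * w_hat**2

# Illustrative inputs (made up for this sketch).
sigma_hat = np.array([[4.0, 1.2, 0.5],
                      [1.2, 1.0, 0.3],
                      [0.5, 0.3, 2.0]])
x = np.array([1.0, -0.5, 2.0])
mu = np.zeros(3)
delta = 5.0

w_hat, contrib = partition_contributions(x, mu, sigma_hat, delta)
theta = delta * (x - mu) @ np.linalg.solve(sigma_hat, x - mu)
print(contrib, contrib.sum(), theta)
```

The individual contributions $\delta \widehat{W}_j^2$ are non-negative and add up to $\Theta$, because $\widehat{W}^\top \widehat{W} = (X - \mu)^\top \widehat{\Sigma}^{-1} (X - \mu)$.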
(a) Hotelling’s one-sample $T^2$ statistic. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the multivariate normal population $N_p(\mu, \Sigma)$. Suppose the sample mean vector is $\bar{X}$ and $\widehat{\Sigma}_1$ is the sample covariance matrix. Then Hotelling’s one-sample $T^2$ statistic for testing $\mu = \mu_0$ is
$$T_1^2 = n\, (\bar{X} - \mu_0)^\top \widehat{\Sigma}_1^{-1} (\bar{X} - \mu_0). \tag{2.41}$$
Let $X = \bar{X}$; then $\operatorname{var}(X) = \Sigma/n$, so the requirement $\operatorname{var}(X) \propto \Sigma$ is met. Hence, putting $\widehat{\Sigma} = \widehat{\Sigma}_1$, $\delta = n$ and $\mu = \mu_0$ gives the contribution of individual variables to $T_1^2$.
(b) Hotelling’s two-sample $T^2$ statistic. Suppose $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ are two random samples of sizes $n_1$ and $n_2$ from two multivariate normal populations $N_p(\mu_1, \Sigma)$ and $N_p(\mu_2, \Sigma)$, respectively, having a common covariance matrix $\Sigma$. Let $\bar{X}_1$ and $\bar{X}_2$ be the sample mean vectors and $S_1$ and $S_2$ be the sample covariance matrices. Then Hotelling’s two-sample $T^2$ statistic for testing $\mu_1 = \mu_2$ is
$$T_2^2 = \frac{n_1 n_2}{n_1 + n_2}\, (\bar{X}_1 - \bar{X}_2)^\top \widehat{\Sigma}_p^{-1} (\bar{X}_1 - \bar{X}_2), \tag{2.42}$$
where $\widehat{\Sigma}_p = \{(n_1 - 1) S_1 + (n_2 - 1) S_2\}/(n_1 + n_2 - 2)$ is the pooled estimate of $\Sigma$. Let $X = \bar{X}_1 - \bar{X}_2$, so $\operatorname{var}(X) = \Sigma/n_1 + \Sigma/n_2 = (1/n_1 + 1/n_2)\, \Sigma \propto \Sigma$. The contribution of individual variables to $T_2^2$ is obtained by putting $\widehat{\Sigma} = \widehat{\Sigma}_p$, $\delta = n_1 n_2/(n_1 + n_2)$ and $\mu = 0$.
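(c) Mahalanobis distance. The Mahalanobis distance between two random vectors $X_{[1]}$ and $X_{[2]}$ is
$$(X_{[1]} - X_{[2]})^\top \widehat{\Sigma}_M^{-1} (X_{[1]} - X_{[2]}). \tag{2.43}$$
Let $X = X_{[1]} - X_{[2]}$. Then $\operatorname{var}(X) = k_1 \Sigma + k_2 \Sigma = (k_1 + k_2)\, \Sigma$, where $k_1$ and $k_2$ are the proportionality constants of $\operatorname{var}(X_{[1]})$ and $\operatorname{var}(X_{[2]})$, respectively. The partition of the Mahalanobis distance is obtained by putting $\delta = 1$, $\mu = 0$ and $\widehat{\Sigma} = \widehat{\Sigma}_M$, where $\widehat{\Sigma}_M$ is an unbiased estimate of $\Sigma$.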
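(d) Fisher’s linear discriminant function. Suppose $X_{11}, \ldots, X_{1n_1}$ and $X_{21}, \ldots, X_{2n_2}$ are two random samples of sizes $n_1$ and $n_2$ from two multivariate normal populations $\pi_1$ and $\pi_2$ having the common covariance matrix $\Sigma$. Let $\bar{X}_1$ and $\bar{X}_2$ be the sample mean vectors and $S_1$ and $S_2$ be the sample covariance matrices of the two samples. A new observation $X_0$ will be allocated to $\pi_1$ if
$$\tau(X_0) = (\bar{X}_1 - \bar{X}_2)^\top \widehat{\Sigma}_p^{-1} \left\{X_0 - \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2)\right\} > 0. \tag{2.44}$$
Now $\operatorname{var}(\bar{X}_1 - \bar{X}_2) \propto \Sigma$ and $\operatorname{var}\{X_0 - \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2)\} \propto \Sigma$, hence the Garthwaite-Koch partition is valid. The transformations are of the form
$$\widehat{W}^{0} = \left(\widehat{D} \widehat{\Sigma}_p \widehat{D}\right)^{-1/2} \widehat{D}\, (\bar{X}_1 - \bar{X}_2) \tag{2.45}$$
and
$$\widehat{W}^{*} = \left(\widehat{D} \widehat{\Sigma}_p \widehat{D}\right)^{-1/2} \widehat{D} \left\{X_0 - \tfrac{1}{2}(\bar{X}_1 + \bar{X}_2)\right\}. \tag{2.46}$$
Let $\widehat{W}_j^{0}$ and $\widehat{W}_j^{*}$ denote the $j$th components of $\widehat{W}^{0}$ and $\widehat{W}^{*}$, respectively. Then, as $\tau(X_0) = \sum_{j=1}^{p} \widehat{W}_j^{0} \widehat{W}_j^{*}$, the contribution of $X_j$ to $\tau(X_0)$ is $\widehat{W}_j^{0} \widehat{W}_j^{*}$.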
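Cases (a) to (d) differ only in the choice of $X$, $\mu$, $\delta$ and $\widehat{\Sigma}$, which the following sketch makes concrete on simulated data. It is an illustration under stated assumptions rather than a reference implementation: the symmetric inverse square root is assumed for $(\widehat{D}\widehat{\Sigma}\widehat{D})^{-1/2}$, the helper names `inv_sqrt_sym` and `corr_max_matrix` are hypothetical, the data and the new observation `x0` are made up, and the pooled estimate is used as $\widehat{\Sigma}_M$ in case (c).

```python
import numpy as np

def inv_sqrt_sym(a):
    """Symmetric inverse square root of a symmetric positive definite matrix."""
    eigval, eigvec = np.linalg.eigh(a)
    return (eigvec / np.sqrt(eigval)) @ eigvec.T

def corr_max_matrix(sigma_hat):
    """Matrix A = (D Sigma D)^{-1/2} D, so that W_hat = A @ (x - mu)."""
    d = 1.0 / np.sqrt(np.diag(sigma_hat))
    corr = sigma_hat * np.outer(d, d)
    return inv_sqrt_sym(corr) * d            # scales column j by d[j]

# Simulated samples with a common covariance structure (illustrative only).
rng = np.random.default_rng(0)
p, n1, n2 = 3, 30, 40
mix = np.array([[1.0, 0.0, 0.0], [0.5, 1.0, 0.0], [0.2, 0.3, 1.0]])
x1 = rng.normal(size=(n1, p)) @ mix.T
x2 = rng.normal(size=(n2, p)) @ mix.T + np.array([0.5, 0.0, -0.5])
xbar1, xbar2 = x1.mean(axis=0), x2.mean(axis=0)
s1 = np.cov(x1, rowvar=False)                # divisor n1 - 1
s2 = np.cov(x2, rowvar=False)

# (a) One-sample T^2 for mu = mu0, using sample 1: delta = n1, Sigma_hat = S1.
mu0 = np.zeros(p)
contrib_one = n1 * (corr_max_matrix(s1) @ (xbar1 - mu0)) ** 2
t2_one = n1 * (xbar1 - mu0) @ np.linalg.solve(s1, xbar1 - mu0)

# (b) Two-sample T^2: delta = n1 n2 / (n1 + n2), Sigma_hat = pooled estimate.
s_pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
delta = n1 * n2 / (n1 + n2)
a_p = corr_max_matrix(s_pooled)
w0 = a_p @ (xbar1 - xbar2)
contrib_two = delta * w0 ** 2
t2_two = delta * (xbar1 - xbar2) @ np.linalg.solve(s_pooled, xbar1 - xbar2)

# (c) Mahalanobis distance between two observations: delta = 1, mu = 0.
diff = x1[0] - x2[0]
contrib_maha = (a_p @ diff) ** 2
maha = diff @ np.linalg.solve(s_pooled, diff)

# (d) Discriminant function: contributions are products W0_j * Wstar_j.
x0 = np.array([0.3, -0.2, 0.1])
centred = x0 - 0.5 * (xbar1 + xbar2)
w_star = a_p @ centred
contrib_tau = w0 * w_star
tau = (xbar1 - xbar2) @ np.linalg.solve(s_pooled, centred)

print(contrib_one.sum(), t2_one)
print(contrib_two.sum(), t2_two)
print(contrib_maha.sum(), maha)
print(contrib_tau.sum(), tau)
```

In each case the contributions sum to the statistic being partitioned. Unlike the squared terms in (a) to (c), the products in (d) can be negative, so an individual variable may pull the allocation towards either population.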