Using properties of the transelliptical family of distributions, we will define a robust estimator for high-dimensional mCCA. In order to define the transelliptical family, it is necessary to first define the elliptical family of distributions:
Definition 5.5 (Elliptical distributions). A $p \times 1$ random vector $Y$ is considered to be elliptical if, for some $p \times 1$ vector $\mu_Y$, some $p \times p$ positive semi-definite matrix $\Sigma_Y$, and a function $\psi_Y : [0, \infty) \to \mathbb{R}$, the characteristic function $\Phi$ satisfies $\Phi_{Y - \mu_Y}(t) = \psi_Y(t^T \Sigma_Y t)$ for all $p \times 1$ vectors $t$. In this case we say that $Y$ is a $p \times 1$ dimensional elliptically distributed random variable, denoted $Y \sim ED_p(\mu_Y, \Sigma_Y, \psi_Y)$.
Common elliptical distributions include the multivariate normal, multivariate t, and multivariate logistic distributions. This family of distributions is useful for CCA because all linear combinations of elliptically distributed random variables are still elliptically distributed, and because $\Sigma_Y$ is equal to the covariance matrix of $Y$ up to a scalar when second moments exist. Even when moments do not exist, $\Sigma_Y$ defines the linear associations between the elements of $Y$. A useful
(Embrechts et al., 2002; Klüppelberg and Kuhn, 2009; Liu et al., 2012). A definition of the transelliptical family of distributions is given below:
Definition 5.6 (Transelliptical distributions). A $p \times 1$ dimensional random vector $Z$ has a transelliptical distribution if there exist a positive semi-definite matrix $\Sigma_{h_Z}$ with all ones along the diagonal, a function $\psi_{h_Z} : [0, \infty) \to \mathbb{R}$, and a set of functions $h_{Z1}, \dots, h_{Zp}$, where $h_{Zi} : \mathbb{R} \to \mathbb{R}$ is a monotone increasing function for $i = 1, 2, \dots, p$, such that $h_Z(Z) = [h_{Z1}(Z_1), \dots, h_{Zp}(Z_p)]^T \sim ED_p(0, \Sigma_{h_Z}, \psi_{h_Z})$. The random variable $Z$ is then a $p \times 1$ dimensional transelliptically distributed random variable, denoted $Z \sim TE_p(h_Z, 0, \Sigma_{h_Z}, \psi_{h_Z})$.
An equivalent definition is any multivariate distribution with continuous marginal distributions and a copula that comes from a multivariate elliptical distribution. Because transelliptical distributions allow for monotone marginal transformations of elliptical distributions, the family includes distributions with heavily skewed marginals. When $Z$ is transelliptically distributed, it can be more useful for methods such as CCA to consider the elliptically distributed $W = h_Z(Z)$ rather than $Z$ itself. This is because, as mentioned above, linear combinations of elliptically distributed random variables are still elliptically distributed. Further, if the transformation functions $h_Z$ are nonlinear, the parameter $\Sigma_{h_Z}$ describes the linear associations among the elements of $W$, not those of $Z$. In fact, the correlation or covariance matrix of $Z$ itself may not be fully informative of the relationship between the elements of $Z$ if the marginal distributions are heavily skewed or otherwise non-elliptical.
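The point about skewed marginals can be illustrated with a small simulation. The following is a minimal sketch, assuming a bivariate Gaussian latent vector $W$ (one member of the elliptical family) and two arbitrarily chosen monotone increasing transformations; the helper name `kendall_tau`, the latent correlation of 0.8, and the transformations are illustrative assumptions, not part of the original text. It compares the Pearson correlation and Kendall's tau before and after the transformation.

```python
import numpy as np

def kendall_tau(x, y):
    """Sample Kendall's tau from pairwise sign concordance (O(n^2) memory)."""
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    # The full n x n matrix counts each unordered pair twice (diagonal is 0),
    # so dividing by n * (n - 1) equals dividing the pair sum by C(n, 2).
    return (dx * dy).sum() / (n * (n - 1))

rng = np.random.default_rng(0)
rho = 0.8  # assumed latent correlation, chosen for illustration
cov = np.array([[1.0, rho], [rho, 1.0]])

# Latent elliptical (here Gaussian) vector W.
w = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=1000)

# Observed Z: strictly increasing, strongly nonlinear marginal transformations.
z = np.column_stack([np.exp(3.0 * w[:, 0]), w[:, 1] ** 3])

pearson_w = np.corrcoef(w, rowvar=False)[0, 1]
pearson_z = np.corrcoef(z, rowvar=False)[0, 1]
tau_w = kendall_tau(w[:, 0], w[:, 1])
tau_z = kendall_tau(z[:, 0], z[:, 1])

print(f"Pearson correlation: W = {pearson_w:.3f}, Z = {pearson_z:.3f}")
print(f"Kendall's tau:       W = {tau_w:.3f}, Z = {tau_z:.3f}")
```

In this sketch the Pearson correlation of $Z$ is distorted by the skewed marginals, while Kendall's tau depends only on the ordering of the observations and is therefore the same for $Z$ and $W$.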
For this reason it is desirable to estimate $\Sigma_{h_Z}$ rather than the correlation matrix of $Z$. As shown in Liu et al. (2012), a consistent estimate of every element of $\Sigma_{h_Z}$ can be obtained through transformations of consistent estimates of Kendall's tau for all pairs of variables in $Z$. Assume that $Z = [Z_1^T, \dots, Z_d^T]^T$ is a $\sum_{i=1}^{d} p_i \times 1$ dimensional transelliptically distributed random vector, where $Z_i = [Z_{i1}, \dots, Z_{ip_i}]^T$ is a $p_i \times 1$ random vector, and $\tilde{Z}$ is an identically distributed independent copy of $Z$. Kendall's tau between $Z_{ij}$ and $Z_{kl}$, for $i, k = 1, \dots, d$, $j = 1, \dots, p_i$, and $l = 1, \dots, p_k$, is equal to
$$\tau(Z_{ij}, Z_{kl}) = E\left[\operatorname{sign}\left((Z_{ij} - \tilde{Z}_{ij})(Z_{kl} - \tilde{Z}_{kl})\right)\right].$$
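For $n$ iid copies of $Z$, $z_1, \dots, z_n$, a consistent and asymptotically normal estimate of $\tau(Z_{ij}, Z_{kl})$ is
$$\hat{\tau}_n(Z_{ij}, Z_{kl}) = \frac{1}{\binom{n}{2}} \mathop{\sum\sum}_{1 \le r < s \le n} \operatorname{sign}(z_{rij} - z_{sij}) \operatorname{sign}(z_{rkl} - z_{skl}),$$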
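where $z_{rij}$ is the $r$th copy of $Z_{ij}$. Within the transelliptical family there is a known correspondence between $\sigma_{ij,kl}$, the entry of $\Sigma_{h_Z}$ corresponding to $Z_{ij}$ and $Z_{kl}$, and the Kendall's tau coefficient between $Z_{ij}$ and $Z_{kl}$. Specifically, $\sigma_{ij,kl} = \sin\left(\frac{\pi}{2} \tau(Z_{ij}, Z_{kl})\right)$, or equivalently $\tau(Z_{ij}, Z_{kl}) = \frac{2}{\pi} \arcsin(\sigma_{ij,kl})$. This is a straightforward extension of the same result from Lindskog et al. (2003) for elliptical distributions. Based on this, a consistent estimate of $\sigma_{ij,kl}$ is
$$\hat{\sigma}_{ij,kl} = \sin\left(\frac{\pi}{2} \hat{\tau}_n(Z_{ij}, Z_{kl})\right).$$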
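As a hedged illustration of this rank-based estimator, the sketch below computes the pairwise sample Kendall's tau for all columns of a data matrix and applies the sine map $\sin(\frac{\pi}{2}\hat{\tau})$, which is the standard correspondence between Kendall's tau and the correlation parameter for elliptical copulas. The function names `kendall_tau_matrix` and `latent_correlation`, the simulated Gaussian latent vector, the matrix `sigma_true`, and the particular monotone transformations are all assumptions made for illustration.

```python
import numpy as np

def kendall_tau_matrix(z):
    """Pairwise sample Kendall's tau for the columns of an (n, p) array."""
    n, _ = z.shape
    # signs[r, s, j] = sign(z[r, j] - z[s, j]) for every pair of observations.
    signs = np.sign(z[:, None, :] - z[None, :, :])
    # Summing over all ordered pairs (r, s) counts each unordered pair twice,
    # so dividing by n * (n - 1) matches the 1 / C(n, 2) normalisation.
    return np.einsum("rsj,rsk->jk", signs, signs) / (n * (n - 1))

def latent_correlation(z):
    """Rank-based estimate of the latent correlation matrix via the sine map."""
    return np.sin(0.5 * np.pi * kendall_tau_matrix(z))

rng = np.random.default_rng(1)

# Assumed latent correlation matrix (illustrative values only).
sigma_true = np.array([[1.0, 0.6, 0.3],
                       [0.6, 1.0, -0.4],
                       [0.3, -0.4, 1.0]])

# Latent Gaussian W and observed Z = h(W) with monotone increasing h.
w = rng.multivariate_normal(mean=np.zeros(3), cov=sigma_true, size=800)
z = np.column_stack([np.exp(w[:, 0]), w[:, 1] ** 3, np.arctan(w[:, 2])])

sigma_hat = latent_correlation(z)
print(np.round(sigma_hat, 3))
```

Because only the signs of pairwise differences enter the computation, `latent_correlation(z)` and `latent_correlation(w)` agree, so the transformations never need to be estimated. The pairwise sign array uses memory of order $n^2 p$, so this form is only suitable for moderate $n$ and $p$.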
As shown by Liu et al. (2012), an estimate of $\Sigma_{h_Z}$, denoted $\hat{\Sigma}_{h_Z}$, can be obtained by estimating all the elements in this fashion. Importantly, this estimator does not require the estimation of $h_Z$ or $\psi_{h_Z}$, because Kendall's tau is invariant to monotone increasing transformations of the data. Other techniques for estimating $\Sigma_{h_Z}$ require estimation of, or assumptions on the form of, $h_Z$ and $\psi_{h_Z}$. Because $\hat{\Sigma}_{h_Z}$ is not guaranteed to be positive semidefinite, it is sometimes necessary to map $\hat{\Sigma}_{h_Z}$ to a positive semidefinite matrix, which we will denote as $\tilde{\Sigma}_{h_Z}$. In the case where $d > 2$ and $p_i$ is large for some $i$ in $1, \dots, d$, we can use $\tilde{\Sigma}_{h_Z}$ to get robust estimates of high-dimensional mCCA directions. This can be considered to be a latent version of high-dimensional mCCA where we look for the most meaningful relationships between the elements of $h_Z(Z) = W$, rather than $Z$ itself.
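The text does not specify how the estimate is mapped to a positive semidefinite matrix. One simple possibility, assumed here purely for illustration, is to clip negative eigenvalues and then rescale so that the diagonal is again all ones. The function name `project_to_psd_correlation`, the floor `eps`, and the example matrix are hypothetical; other projections (for example, nearest correlation matrix methods) may be preferable in practice.

```python
import numpy as np

def project_to_psd_correlation(sigma_hat, eps=1e-8):
    """Map a symmetric matrix to a PSD matrix with unit diagonal.

    Assumed approach: eigenvalue clipping followed by diagonal rescaling.
    """
    sym = 0.5 * (sigma_hat + sigma_hat.T)       # enforce exact symmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    clipped = np.clip(eigvals, eps, None)       # floor eigenvalues at eps
    psd = (eigvecs * clipped) @ eigvecs.T       # V diag(clipped) V^T
    d = np.sqrt(np.diag(psd))
    # Rescaling by D^{-1} on both sides preserves positive semidefiniteness
    # and restores ones on the diagonal.
    return psd / np.outer(d, d)

# A symmetric matrix with unit diagonal that is not positive semidefinite
# (its eigenvalues are 1.9, 1.9 and -0.8).
sigma_bad = np.array([[1.0, 0.9, -0.9],
                      [0.9, 1.0, 0.9],
                      [-0.9, 0.9, 1.0]])

sigma_tilde = project_to_psd_correlation(sigma_bad)
print(np.round(sigma_tilde, 3))
print("smallest eigenvalue:", np.linalg.eigvalsh(sigma_tilde).min())
```

Clipping at a small positive `eps` rather than at zero keeps the diagonal strictly positive, so the rescaling step is well defined.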
5.3.3 Latent high-dimensional mCCA in the transelliptical family