La virtud de la TEMPLANZA - Vivir las Virtudes Un camino de santidad

The content of Chapter 2, in which we provide an extension of the classical Mahalanobis distance for functional data, is summarized in Berrendero et al. (2018b). In Chapter 3 we analyze a variable selection technique for linear regression with functional predictors. The first part of the chapter is focused on scalar response problems and it can be found in Berrendero et al. (2018a). The second part includes an extension to stationary functional time series and it is available in Bueno-Larraz and Klepsch (2018). The results of Chapter 4 about functional logistic regression are included in a preliminary draft currently in progress.

Chapter 2 Functional Mahalanobis distance

2.1 Introduction

The classical (finite-dimensional) Mahalanobis distance and its applications

Let X be a random variable taking values in Rdwith non-singular covariance matrix Σ. In many practical situations it is required to measure the distance between two points x1, x2 ∈ Rd when considered as two possible observations drawn from X. Clearly, the

usual (square) Euclidean distance kx1− x2k2= (x1− x2)0(x1− x2) := hx1− x2, x1− x2i

is not a suitable choice since it disregards the standard deviations and the covariances of the components of x_i_{(given a column vector x ∈ R}d_{we denote by x}0 _{the transpose of}

x). Instead, the most popular alternative is perhaps the classical Mahalanobis distance, M (x1, x2), defined as

M (x1, x2) = (x1− x2)0Σ−1(x1− x2)

1/2

. (2.1)

Very often the interest is focused on studying “how extreme” a point x is within the distribution of X; this is typically evaluated in terms of M (x, m), where m stands for the vector of means of X.

This distance is named after the Indian statistician P. C. Mahalanobis (1893-1972) who first proposed and analyzed this concept (Mahalanobis, 1936) in the setting of Gaussian distributions. Nowadays, some popular applications of the Mahalanobis distance arise in the following fields (the list of references is far from exhaustive):

Supervised classification: this use of M is mostly linked to Gaussian models. It is easy to show that in a supervised classification problem of discriminating among k Gaussian ho- moscedastic populations with (mean, covariance) parameters (m₁, Σ), . . . , (mk, Σ) and

prior probabilities π1, . . . , πk, the optimal (Bayes) rule to classify a coming observation

x is just to assign x to population j defined by M2(x, mj) − 2 log πj = max 1≤i≤k M 2_{(x, m} i) − 2 log πi (2.2)

Outliers detection: Robust estimators of M (x, mj) are proposed in Rousseeuw and

van Zomeren (1990) in order to identify outliers and leverage points. In Penny (1996) Mahalanobis distance, combined with jackknife methods, is also used as a tool for outliers detection.

Multivariate depth measures: the function D(x) = 1/(1 + M2(x, m)) has been used as an assessment of “how deep” is the point x into the a population with mean and covariance parameters m and Σ, respectively. See, for example, Zuo and Serfling (2000) for a discussion on the relative merits of this proposal when compared with other popular depth measures.

Hypothesis testing : It is a well-known result in multivariate analysis that, whenever X has a Gaussian distribution in Rd, that is X ∼ N_d(m0, Σ), with a non-singular

covariance matrix Σ, we have

nM2(¯x, m0) ∼ χ2d, (2.3)

where ¯x denotes the vector of sample means of a random sample x1, . . . , xn from X,

and χ2_dstands for the distribution chi-square with d degrees of freedom. Of course this result can be used to get a significance test for H0 : m = m0 versus H1: m 6= m0.

When Σ is unknown we can estimate it with the usual empirical estimator from the sample x₁, . . . , xn. In that case we have

n(¯x − m0)0S−1(¯x − m0) ∼ Td,n−12 , (2.4)

where T_p,n−12 denotes the Hotelling distribution with d and n − 1 degrees of freedom. A two-sample version of this statistic (and the corresponding test) is also well-known; see, e.g., Rencher (2012, Ch. 5) for additional details on Hotelling’s statistic.

Goodness of fit : while this application is perhaps less relevant than those mentioned in the previous paragraphs, the paper by Mardia (1975) shows some interesting connec- tions between the moments of Hotelling’s statistic and some statistics commonly used in testing multivariate normality.

On the difficulties of defining a Mahalanobis-type distance for functional data

The framework of this thesis is Functional Data Analysis (FDA). In other words, we deal with statistical problems involving functional data. Thus our sample is made of trajectories x₁(t), . . . , xn(t) in L2[0, 1] drawn from a second order stochastic process

X(t), t ∈ [0, 1] with m(t) = E[X(t)]. The inner product and the norm in L2[0, 1] are denoted by h·, ·i₂and k · k₂, respectively. We will henceforth assume that the covariance function K(s, t) = cov(X(s), X(t)) is continuous and positive definite. The function

2.1 Introduction

K defines a linear operator K : L2[0, 1] → L2[0, 1], called covariance operator (already defined in Equation (1.4)), given by

Kf (t) = Z 1

K(t, s)f (s)ds.

The aim of this chapter is to extend the notion of the multivariate (finite-dimensional) Mahalanobis distance (2.1) to the functional case when x1, x2 ∈ L2[0, 1]. Clearly, in

view of (2.1), the inverse K−1of the functional operator K should play some role in this extension if we want to keep a close analogy with the multivariate case. Unfortunately, such a direct approach utterly fails since, typically, K is not invertible in general as an operator, in the sense that there is no linear continuous operator K−1 such that K−1_{K = KK}−1 _{= I, the identity operator.}

To see the reason for this crucial difference between the finite and the infinite-dimensional cases, let us recall that some elementary linear algebra yields the following representa- tions for Σx and Σ−1y,

Σx = d X i=1 λi(ei0x)ei, Σ−1y = d X i=1 1 λi (ei0y)ei (2.5)

where λ₁, . . . , λd are the, strictly positive, eigenvalues of Σ and {ei, . . . , ed} the corre-

sponding orthonormal basis of eigenvectors.

In the functional case, the classical Karhunen-Loève Theorem (see, e.g., Ash and Gard- ner (2014)) provides X(t) =P

jZjej(t) (in L2 uniformly on t) where {ej} is the basis

of orthonormal eigenfunctions of K and the Zj = hX, eji2 are uncorrelated random

variables with var(Z_j) = λj, the eigenvalue of K corresponding to ej. Then, we have

Kx = K ∞ X i=1 hx, eii2ei

Note that the continuity of K(s, t) impliesR1

0 K(s, t)

2_{dsdt < ∞, thus K is in fact a}

compact, Hilbert-Schmidt operator. In addition, it is easy to check thatR₀1R₀1K(s, t)2dsdt = P∞

i=1λ2i so that, in particular, the sequence {λi} converges to zero very quickly. As a

consequence, there is no hope of keeping a direct analogy with (2.5) since K−1x = ∞ X i=1 1 λi hx, e_ii₂ei (2.6)

will not define in general a continuous operator with a finite norm. Still, for some particular functions x = x(t) the series in (2.6) might be convergent. Hence we could use it formally to define the following template which, suitably modified, could lead

to a general, valid definition for a Mahalanobis-type distance between two functions x and m, f M (x, m) = ∞ X i=1 hx − m, eii22 λi !1/2 , (2.7)

for all x, m ∈ L2[0, 1] such that the series in (2.7) is finite. We are especially con- cerned with the case where x is a trajectory from a stochastic process X(t) and m is the corresponding mean function. As we will see later on, this entails some especial difficulties.

The organization of this chapter

In the next section the connection of the Mahalanobis distance with RKHS theory is introduced, together with the proposed definition. In Section 2.3 some properties of the proposed distance are presented and compared with those of the original multivariate definition. Then, a consistent estimator is analyzed in Section 2.4. Finally, some numerical outputs corresponding to different statistical applications can be found in Section 2.5.

2.2 A new definition of Mahalanobis distance for functional

In document Vivir las Virtudes Un camino de santidad (página 24-28)