

The optimality criteria which we consider for the selection of a set of univariate random projections that may be suitable for the location of a low-density separator are:

1. Maximum relative depth in the estimated density of the univariate projections. This optimality criterion retains projections exhibiting a strong multimodal structure in their estimated density, with a low minimiser between two large modes. Hence, this is consistent with the objective of locating low-density separators that assign observations in high-density regions around the modes of $\hat{p}_x$ to different clusters. Therefore, this optimality criterion is related to the objective of MDH.

2. Maximum dip statistic (Hartigan and Hartigan, 1985) in the estimated density of the univariate projections. Like the relative depth, this criterion also considers the modality of the estimated projected density, and favours projections with a strongly multimodal structure. This was applied by Krause and Liebscher (2005) as an objective for projection pursuit clustering. Unlike the maximum relative depth criterion, the dip statistic only considers the extent to which the estimated density of the univariate projections is multimodal. Therefore, this criterion can permit projections that have a strongly multimodal distribution, but do not necessarily have a low minimiser between these modes.

3. Maximum variance in the univariate projections. Although there is no guarantee that directions of high variability are suitable for cluster detection (Kriegel et al., 2009), if the clusters are not heavily elongated, it is likely that projections which are highly dispersed are separable by a region of low density (Boley, 1998; Tasoulis et al., 2010). This optimality criterion is consistent with the objective of PCA.

4. Minimum kurtosis, which retains univariate projections with minimal Gaussianity, so as to avoid projections with a clear unimodal Gaussian density. Peña and Prieto (2001) show that locating univariate projections with minimal kurtosis corresponds to maximising the bi-modality in the estimated density of the projections. As such, this optimality criterion should permit a cluster separator that separates regions of high density in $\hat{p}_x$. This is associated with the objective of ICA. However, since ICA

a projection direction with a very slender non-Gaussian distribution, while selecting projections with the most negative excess kurtosis will always favour projections with a highly dispersed uniform-type or bi-modal distribution. Therefore, projections which minimise the kurtosis are arguably more consistent with locating cluster separators than projections that maximise the absolute excess kurtosis. We expect cases where RP with this optimality criterion and ICA locate drastically different projections to be rare in datasets with a clear clustering structure.
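As an illustration of the two moment-based criteria (maximum variance and minimum kurtosis), the following minimal sketch evaluates both on synthetic univariate projections. The data, the seed, and the helper name `excess_kurtosis` are assumptions made for this example only; the relative depth and dip statistic are not covered here.

```python
import random


def variance(values):
    """Population variance of a univariate projection, cost O(n)."""
    mean = sum(values) / len(values)
    return sum((u - mean) ** 2 for u in values) / len(values)


def excess_kurtosis(values):
    """Sample excess kurtosis: fourth standardised moment minus 3, cost O(n)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((u - mean) ** 2 for u in values) / n
    m4 = sum((u - mean) ** 4 for u in values) / n
    return m4 / (m2 ** 2) - 3.0


rng = random.Random(0)
# Hypothetical projections: two well-separated clusters versus one Gaussian cluster.
bimodal = [rng.gauss(-3.0, 1.0) for _ in range(1000)] + [rng.gauss(3.0, 1.0) for _ in range(1000)]
unimodal = [rng.gauss(0.0, 1.0) for _ in range(2000)]

var_bimodal, var_unimodal = variance(bimodal), variance(unimodal)
kurt_bimodal, kurt_unimodal = excess_kurtosis(bimodal), excess_kurtosis(unimodal)
print(f"bimodal:  variance={var_bimodal:.2f}, excess kurtosis={kurt_bimodal:.2f}")
print(f"unimodal: variance={var_unimodal:.2f}, excess kurtosis={kurt_unimodal:.2f}")
```

In this synthetic setting the bimodal projection has both the larger variance and the more negative excess kurtosis, so either criterion would retain it in preference to the unimodal projection.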

6.2.7 Computational Complexity

In this section, we discuss the computational complexity of locating hierarchies of low-density separators by the proposed RP approaches, and the alternative projection techniques considered. In our experiments, when extending these techniques to locate appropriate projections of feature vectors, we use the $n$-dimensional projections of the feature vectors onto an orthonormal basis. This requires the construction of the kernel matrix (for which we use the Gaussian kernel), with cost $O(n^2 d)$. For the construction of an $n$-dimensional orthonormal basis of the feature vectors, we use KPCA, with computational cost $O(n^3)$. Finally, projecting the feature vectors onto the kernel principal components incurs a cost of $O(n^3)$. These steps are the same for all the projection techniques considered, and are only computed once.
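The $O(n^2 d)$ cost of the kernel matrix can be seen directly from a naive construction: there are $n^2$ entries, each requiring a squared distance over $d$ coordinates. The sketch below assumes the common parameterisation $\exp(-\lVert x - y \rVert^2 / (2\sigma^2))$ with a hypothetical bandwidth argument `sigma`; the KPCA step is omitted.

```python
import math


def gaussian_kernel_matrix(X, sigma):
    """n x n Gaussian kernel matrix; each entry costs O(d), so O(n^2 d) overall."""
    n = len(X)
    K = [[1.0] * n for _ in range(n)]  # k(x, x) = 1 on the diagonal
    for i in range(n):
        for j in range(i + 1, n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(X[i], X[j]))
            K[i][j] = K[j][i] = math.exp(-sq_dist / (2.0 * sigma ** 2))
    return K


# Hypothetical toy data: n = 3 observations in d = 2 dimensions.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = gaussian_kernel_matrix(X, sigma=1.0)
```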

Hereafter, we consider the cost of locating the optimal projections of $X$ with $n$ observations and $d$ dimensions, either as the original $d$-dimensional set of observations or the $n$-dimensional projections of the feature vectors. First we consider the computational complexity of locating an optimal (or approximately optimal) univariate projection of $X$. The first principal component of $X$ may be located by an iterative procedure such as the power method (Kuczyński and Woźniakowski, 1992), avoiding the computation and complete eigen-decomposition of the covariance matrix. The power method has a cost of $O(nd^2)$ per iteration. The JADE algorithm for ICA iteratively computes the projection vector with maximal absolute excess kurtosis, with a computational cost of $O(d^2 + n)$ per iteration. The location of the MDH is also an iterative procedure. For each iteration, $X$ is projected onto $v$ with computational cost $O(nd)$, and then $\hat{p}_{vx}$ is constructed at $m$ points using the fast Gauss transform (Morariu et al., 2009), at a cost of $O(m + n)$. Locating the minimiser of $\hat{p}_{vx}$ to accuracy $\epsilon$ requires $O(\log_2 \epsilon)$ iterations. The subsequent update of $v$ by BFGS requires a single gradient evaluation, with a cost of $O(d^2 + nd)$. Therefore, the overall computational complexity of locating the MDH is $O(d^2 + nd)$ per iteration.

For an approximately optimal projection, located using RP, it is necessary to compute the projections of $X$ onto the matrix of $r$ random vectors, with cost $O(ndr)$. This is the most significant cost for the proposed RP approach, so it dominates the computational complexity. It is worth noting that this is a single multiplication, and not an iterative procedure, such as those required for the optimal projection techniques considered. For each of the $r$ random univariate projections, it is necessary to compute the value of the optimality criterion of interest. For the maximum relative depth and maximum dip statistic criteria, this requires the construction of the estimated density of the projections at $m$ points, and this has cost $O(n + m)$. The maxima and minima in these densities can then be located to accuracy $\epsilon$ with cost $O(\log_2 \epsilon)$. Meanwhile, the maximum variance and minimum kurtosis criteria have a cost of $O(n)$.
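The selection step described above can be sketched as follows, using the maximum variance criterion for simplicity. The function names, the way the random directions are drawn (normalised Gaussian vectors), and the synthetic data are assumptions made for this illustration. The projections are computed one direction at a time here, which has the same $O(ndr)$ cost as the single matrix multiplication.

```python
import math
import random


def random_unit_vectors(r, d, rng):
    """Draw r directions by normalising standard Gaussian vectors in R^d."""
    vectors = []
    for _ in range(r):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        vectors.append([c / norm for c in v])
    return vectors


def project(X, v):
    """Univariate projection of every observation onto v, cost O(nd)."""
    return [sum(x_k * v_k for x_k, v_k in zip(x, v)) for x in X]


def variance(values):
    """Maximum variance criterion, cost O(n)."""
    mean = sum(values) / len(values)
    return sum((u - mean) ** 2 for u in values) / len(values)


def best_random_projection(X, r, criterion, rng):
    """Score r random projections (O(ndr) in total) and keep the highest-scoring one."""
    d = len(X[0])
    best_v, best_score = None, -math.inf
    for v in random_unit_vectors(r, d, rng):
        score = criterion(project(X, v))
        if score > best_score:
            best_v, best_score = v, score
    return best_v, best_score


rng = random.Random(1)
# Hypothetical data: two clusters separated along the first coordinate.
X = [[rng.gauss(c, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]
     for c in (-5.0, 5.0) for _ in range(200)]
v, score = best_random_projection(X, r=200, criterion=variance, rng=rng)
```

A criterion that should be minimised, such as kurtosis, can be passed to the same function by negating it.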

Once the projections of $X$ onto the selected projection vector have been computed, the subsequent bi-partition of the projections requires the construction and minimisation of a single univariate density estimate with cost $O(n + m)$ and $O(\log_2 \epsilon)$ respectively. Except for the computation of the projections of $X$ onto the random vectors in the RP approach, all the above operations are performed at each level of the hierarchy. Locating optimal projections by the iterative procedures required for PCA, ICA and MDH becomes computationally expensive for very large and high-dimensional datasets, and re-computing this at each level of the hierarchy further increases the computational time required, making the proposed RP approach very attractive. We investigate the computational times to locate bi-partitions and divisive clusterings using the projection techniques considered in Section 6.3.2. For a representative real-world dataset, the location of a cluster hierarchy took approximately 30, 15 and 20 minutes for MDH, PCA and ICA respectively. Meanwhile, locating a cluster hierarchy with 1,000 random projections took approximately 5 minutes.
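The bi-partition of a set of univariate projections can be sketched as below. This is a simplified stand-in rather than the procedure used in our experiments: the density is evaluated naively on a grid at cost $O(nm)$ instead of by the fast Gauss transform, the minimiser is taken from the grid rather than refined to accuracy $\epsilon$, the bandwidth follows a Silverman-style rule of thumb, and the relative depth is assumed to be the height of the lower adjacent mode above the minimiser, divided by the density at the minimiser. All names are hypothetical.

```python
import math
import random


def kde_on_grid(values, grid, h):
    """Gaussian kernel density estimate evaluated naively at each grid point (O(nm))."""
    norm = 1.0 / (len(values) * h * math.sqrt(2.0 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((g - u) / h) ** 2) for u in values) for g in grid]


def split_at_deepest_minimum(values, m=200, h=None):
    """Return (split point, relative depth) of the deepest interior minimum, or None."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((u - mean) ** 2 for u in values) / n)
    if h is None:
        h = 0.9 * sd * n ** (-0.2)  # rule-of-thumb bandwidth (an assumption)
    lo, hi = min(values), max(values)
    grid = [lo + (hi - lo) * k / (m - 1) for k in range(m)]
    dens = kde_on_grid(values, grid, h)
    best = None
    for k in range(1, m - 1):
        if dens[k] < dens[k - 1] and dens[k] <= dens[k + 1]:  # local minimum on the grid
            lower_mode = min(max(dens[:k]), max(dens[k + 1:]))
            depth = (lower_mode - dens[k]) / max(dens[k], 1e-300)  # guard against underflow
            if best is None or depth > best[1]:
                best = (grid[k], depth)
    return best


rng = random.Random(2)
# Hypothetical projections of two clusters onto the selected projection vector.
projections = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]
split, depth = split_at_deepest_minimum(projections)
left = [u for u in projections if u < split]
right = [u for u in projections if u >= split]
```

The function returns `None` when the estimated density has no interior minimum, in which case the projection offers no low-density separator and the corresponding node of the hierarchy would not be split.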
