The mean shift algorithm is a mode-seeking algorithm. It clusters a dataset by assigning each data point to its corresponding mode, which is reached by repeatedly shifting the data point a small step toward the mode. For this purpose, a neighborhood is considered around the data point and the mean of the points within this neighborhood is determined. In the next step, the neighborhood is re-centered at the newly calculated mean. In this way, the centers of high-density areas, the so-called modes, are found. The number of clusters does not have to be defined a priori; it is controlled by the radius of the neighborhood, the so-called kernel size.
The idea of the algorithm is strongly connected to and inspired by Parzen windows [Comaniciu and Meer, 2002, Fukunaga and Hostetler, 1975]. The Parzen windows technique
3.4 Unsupervised clustering methods without predefined number of clusters65
is a well-known and popular density estimation method [Parzen, 1962]. In this technique, a neighborhood is defined around each data point. In 2D space, for example, this neighborhood can be a rectangular window. Data points with a large number of neighbors in the surrounding window are selected as modes, or centers of densities. In the original implementation of Parzen windows, the space of the dataset was divided into a grid of subareas, and each grid cell was considered a Parzen window. While this kind of implementation is easy and practical in one- or two-dimensional space, it is quite difficult to implement in higher dimensions. The result of Parzen windows varies strongly with the window size and with the decision of what counts as a 'large' number of points in a neighborhood. The modes found by Parzen windows can be considered the centers of clusters.
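The grid-based variant described above can be illustrated with a minimal Python sketch for the 2D case. The function name `grid_parzen_modes`, the threshold parameter `min_count` and the choice to report the geometric center of each dense cell are assumptions made for this illustration, not part of the original method description.

```python
from collections import Counter

def grid_parzen_modes(points, window, min_count):
    """Count points per square grid cell and return the centers of dense cells.

    points    : list of (x, y) tuples
    window    : side length of each grid cell (the window size)
    min_count : assumed threshold for what counts as a 'large' number of points
    """
    counts = Counter()
    for x, y in points:
        # Integer index of the grid cell (window) that contains the point.
        cell = (int(x // window), int(y // window))
        counts[cell] += 1
    modes = []
    for (cx, cy), count in counts.items():
        if count >= min_count:
            # Simplification: report the geometric center of the dense cell.
            modes.append(((cx + 0.5) * window, (cy + 0.5) * window))
    return sorted(modes)

points = [(0.1, 0.2), (0.3, 0.4), (0.2, 0.1),
          (5.1, 5.2), (5.3, 5.4), (5.2, 5.3), (9.0, 1.0)]
print(grid_parzen_modes(points, window=1.0, min_count=3))   # [(0.5, 0.5), (5.5, 5.5)]
print(grid_parzen_modes(points, window=10.0, min_count=3))  # [(5.0, 5.0)]
```

The two calls show the sensitivity mentioned in the text: with a window of 1.0 two dense cells are reported, whereas a window of 10.0 merges all points into a single cell.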
In the mean shift algorithm, the centers of gravity are found simultaneously with the path of each point to its corresponding mode. Data points are tracked to their modes, that is, to the centers of gravity, by repeatedly shifting each data point to the current center of its neighborhood. The shifting continues until it converges. Comaniciu and Meer [2002] have shown that the mean shift algorithm converges if the neighborhood function, the so-called kernel, has a convex and monotonically decreasing profile.
A proper kernel for the mean shift algorithm can be described as follows. For a dataset consisting of $n$ data points $x_i$, $i = 1, \dots, n$, in the $m$-dimensional vector space $\mathbb{R}^m$, a multivariate kernel density estimate is defined by

$$\tilde{f}(x) = \frac{1}{n} \sum_{i=1}^{n} K_H(x - x_i) \tag{3.32}$$

with

$$K_H(x) = |H|^{-0.5}\, K(H^{-0.5} x), \tag{3.33}$$

where $H$ is the bandwidth matrix. The function $K(x)$ is the $m$-variate kernel function and satisfies the regularity constraints

$$\int_{\mathbb{R}^m} K(x)\,dx = 1, \qquad \lim_{\|x\| \to \infty} \|x\|^m K(x) = 0, \qquad \int_{\mathbb{R}^m} x\,K(x)\,dx = 0, \qquad \int_{\mathbb{R}^m} x x^T K(x)\,dx = c_K I. \tag{3.34}$$

The kernel can be, for example, a Gaussian kernel $c_t e^{-x/2}$ or an Epanechnikov kernel $c_t \max(1 - x, 0)$ with the normalizing constant $c_t$.

Figure 3.12: Mean shift clustering with different bandwidths is performed on the same simulated dataset from section 3.3.2.
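As an illustration of Eqns. 3.32 and 3.33, the following minimal Python sketch evaluates the density estimate with a Gaussian kernel. It assumes the simplest bandwidth matrix $H = h^2 I$, so that $|H|^{-0.5} = h^{-m}$ and $H^{-0.5}x = x/h$. The function names `gaussian_kernel` and `kde` and the scalar bandwidth `h` are hypothetical choices for this example.

```python
import math

def gaussian_kernel(u):
    """Standard m-variate Gaussian kernel K(u); it integrates to 1."""
    m = len(u)
    sq_norm = sum(c * c for c in u)
    return (2.0 * math.pi) ** (-m / 2.0) * math.exp(-0.5 * sq_norm)

def kde(x, data, h):
    """Kernel density estimate at x (Eqn. 3.32), assuming H = h^2 * I."""
    m = len(x)
    total = 0.0
    for xi in data:
        # H^{-0.5} (x - x_i) reduces to (x - x_i) / h for H = h^2 * I.
        u = [(a - b) / h for a, b in zip(x, xi)]
        total += gaussian_kernel(u)
    # The factor |H|^{-0.5} equals h^{-m}; 1/n averages over the data points.
    return total / (len(data) * h ** m)

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0)]
print(kde((0.0, 0.0), data, h=0.5))  # estimate inside the dense group
print(kde((2.5, 2.5), data, h=0.5))  # estimate in the empty region between groups
```

The estimate is larger inside the group of three nearby points than in the empty region, which is the property that mode seeking exploits.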
The mean shift algorithm is performed in the following steps:

1. Select a data point $x$ from the dataset.

2. Find all data points $x_i$ in the neighborhood of $x$:

$$N(x) = \{\, x_i \mid \|x_i - x\| < \lambda \,\}, \tag{3.35}$$

where $\lambda$ is the bandwidth of the algorithm.

3. Shift the data point $x$ to the center $m(x)$ of the neighborhood $N(x)$:

$$m(x) = \frac{\sum_{x_i \in N(x)} K(x_i - x)\, x_i}{\sum_{x_i \in N(x)} K(x_i - x)}, \tag{3.36}$$

where $K(u)$ is the kernel from Eqn. 3.34.

4. Repeat from step 2 until the algorithm converges.
The result of steps 1 to 4 is the mode that controls the data point $x$. These steps can be applied to each data point in the dataset. At the end, points are merged into one cluster if their modes lie closer to each other than a given threshold.
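Steps 1 to 4 and the final merging can be sketched in Python as follows. This is a minimal illustration, not the implementation used for the figures. It assumes a Gaussian profile evaluated on the distance scaled by $\lambda$ (the normalizing constant cancels in Eqn. 3.36), and the names `shift_to_mode`, `mean_shift`, `tol`, `max_iter` and the default merging threshold `merge_dist` are hypothetical.

```python
import math

def shift_to_mode(x, data, lam, tol=1e-6, max_iter=500):
    """Steps 2 to 4: shift x to the weighted mean of N(x) until it stops moving."""
    x = list(x)
    for _ in range(max_iter):
        num = [0.0] * len(x)
        den = 0.0
        for xi in data:
            d = math.dist(xi, x)
            if d < lam:                               # x_i lies in N(x), Eqn. 3.35
                w = math.exp(-0.5 * (d / lam) ** 2)   # assumed Gaussian profile
                den += w
                for j, a in enumerate(xi):
                    num[j] += w * a
        if den == 0.0:                                # empty neighborhood: stop
            break
        new_x = [v / den for v in num]                # m(x), Eqn. 3.36
        moved = math.dist(new_x, x)
        x = new_x
        if moved < tol:                               # convergence test
            break
    return x

def mean_shift(data, lam, merge_dist=None):
    """Run steps 1 to 4 for every point, then merge modes closer than merge_dist."""
    if merge_dist is None:
        merge_dist = 0.5 * lam                        # assumed default threshold
    modes, labels = [], []
    for x in data:                                    # step 1 for each data point
        mode = shift_to_mode(x, data, lam)
        for k, c in enumerate(modes):
            if math.dist(mode, c) < merge_dist:       # same cluster as mode k
                labels.append(k)
                break
        else:
            modes.append(mode)                        # a new cluster is found
            labels.append(len(modes) - 1)
    return modes, labels

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
modes, labels = mean_shift(data, lam=1.5)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Every data point is shifted independently and each shift scans the whole dataset, so the cost of the sketch grows quadratically with the number of points.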
Figure 3.12 demonstrates the results obtained from different runs of the mean shift algorithm with different bandwidths on the simulated dataset from section 3.3.2. The centers of the three main density distributions are discovered successfully in all executions. The result of the mean shift algorithm strongly depends on the size of the chosen bandwidth. In general, decreasing the bandwidth increases the number of clusters. In some situations, however, the number of clusters decreases when the bandwidth is decreased. An example is shown in Figure 3.13. In the image on the left, the bandwidth is smaller and the center point is shifted toward the upper six points. In the image on the right, the bandwidth is larger, and therefore the three points on the bottom left fall into the neighborhood. As a result, the center point of the neighborhood converges and forms a new cluster.

Figure 3.13: A schematic illustration of the effect that new clusters can appear when the bandwidth is increased.

Figure 3.14: The randomized version of mean shift clustering with different bandwidths is performed on the same simulated dataset from section 3.3.2.
The computational complexity of the mean shift algorithm, in the given implementation, is $O(rn^2)$, with $r$ being the number of iterations and $n$ the number of data points. To accelerate the algorithm, an alternative implementation is normally used. In this implementation, steps 1 to 4 are performed on a randomly selected data point $x$. During the shifting phase, all data points in the neighborhood of $x$ are marked as visited and are assigned to the same mode as $x$. The next random point is selected among the unvisited points. The computational complexity is thereby reduced to $O(rkn)$, where $k$ is the number of modes. The drawback of this implementation is that, depending on the selected random points, the algorithm can produce different clusters for the same bandwidth.

Figure 3.15: The randomized version of mean shift clustering with a fixed bandwidth is executed several times on the same simulated dataset from section 3.3.2.
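The randomized variant can be sketched in Python as follows. This is a simplified illustration under stated assumptions: a flat kernel is used, so that $m(x)$ is the plain mean of the neighborhood; converged modes that lie closer than a hypothetical threshold `merge_dist` are treated as one mode; and the name `randomized_mean_shift` and the `seed` parameter are chosen only for this example.

```python
import math
import random

def randomized_mean_shift(data, lam, seed=None, merge_dist=None,
                          tol=1e-6, max_iter=500):
    """Shift randomly chosen unvisited points; visited neighbors share the mode."""
    if merge_dist is None:
        merge_dist = 0.5 * lam                 # assumed default threshold
    rng = random.Random(seed)
    labels = [None] * len(data)
    modes = []
    unvisited = set(range(len(data)))
    while unvisited:
        start = rng.choice(sorted(unvisited))  # random unvisited starting point
        x = list(data[start])
        claimed = []                           # points visited along this path
        for _ in range(max_iter):
            near = [i for i, xi in enumerate(data) if math.dist(xi, x) < lam]
            if not near:                       # empty neighborhood: stop
                break
            for i in near:
                if i in unvisited:             # the first assignment is kept
                    unvisited.discard(i)
                    claimed.append(i)
            # Flat kernel: the new center is the plain mean of the neighborhood.
            new_x = [sum(data[i][j] for i in near) / len(near)
                     for j in range(len(x))]
            moved = math.dist(new_x, x)
            x = new_x
            if moved < tol:
                break
        for k, c in enumerate(modes):
            if math.dist(x, c) < merge_dist:   # path ended at a known mode
                break
        else:
            k = len(modes)
            modes.append(x)
        for i in claimed:
            labels[i] = k
    return modes, labels

data = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 5.1), (5.1, 4.8)]
for seed in range(3):
    print(randomized_mean_shift(data, lam=1.5, seed=seed)[1])
```

Only one path is followed per group of visited points, which is where the saving comes from. On this well-separated toy dataset every seed yields the same two groups, although the numbering of the clusters depends on which point is drawn first. On overlapping data the assignment of individual points can differ between runs.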
Figure 3.14 shows the results obtained with the new implementation on the simulated data for different values of the bandwidth. Again, the three centers of the original densities are found successfully. Figure 3.15 depicts five executions of the randomized implementation for a fixed bandwidth of 1.75. The result of the clustering varies in each execution, but the three centers of the original densities are still found correctly.
The mean shift algorithm is a powerful technique for finding clusters of arbitrary shape. The original implementation is computationally very expensive. In the randomized version, the correct assignment of all points to their corresponding modes is not guaranteed. However, the main drawback of the algorithm, in both implementations, remains the difficulty of selecting a proper bandwidth.