7. CONCEPTUAL FRAMEWORK
7.1. Statement of the Concept
7.1.5. The Period: The Baroque
ExpandCluster(SetOfObjects DB, Object start, Integer cid, Real ε, Integer k) → boolean
    SetOfObjects seeds := Nε(start);
    if |seeds| < k then
        start.clusterID := NOISE;
        return false;
    end if
    for each o ∈ seeds do
        o.clusterID := cid;
    end for
    remove start from seeds;
    while seeds ≠ ∅ do
        o := first point in seeds;
        neighbors := Nε(o);
        if |neighbors| ≥ k then
            for each p ∈ neighbors do
                if p.clusterID ∈ {UNCLASSIFIED, NOISE} then
                    if p.clusterID = UNCLASSIFIED then
                        insert p into seeds;
                    end if
                    p.clusterID := cid;
                end if
            end for
        end if
        remove o from seeds;
    end while
    return true;
Figure 2.5: The method ExpandCluster.
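The pseudocode of ExpandCluster could be translated to Python roughly as follows. This is a minimal sketch under assumptions that are not part of the original: points are coordinate tuples, Nε is computed by a brute-force Euclidean scan (the hypothetical helper `region_query`), cluster IDs are kept in a parallel list `labels`, and the surrounding `dbscan` driver loop is added only to make the example runnable.

```python
from math import dist

UNCLASSIFIED = None  # assumed marker for points not yet visited
NOISE = -1           # assumed marker for noise points


def region_query(points, i, eps):
    """N_eps(points[i]): indices of all points within eps, including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def expand_cluster(points, labels, start, cid, eps, k):
    """Try to grow cluster `cid` from points[start]; return True on success."""
    seeds = region_query(points, start, eps)
    if len(seeds) < k:
        labels[start] = NOISE
        return False
    for o in seeds:
        labels[o] = cid
    seeds.remove(start)
    while seeds:
        o = seeds.pop(0)  # take the first seed and remove it from the list
        neighbors = region_query(points, o, eps)
        if len(neighbors) >= k:  # o is a core point
            for p in neighbors:
                if labels[p] in (UNCLASSIFIED, NOISE):
                    if labels[p] is UNCLASSIFIED:
                        seeds.append(p)  # unvisited points are expanded later
                    labels[p] = cid
    return True


def dbscan(points, eps, k):
    """Assumed driver: call expand_cluster for every unclassified point."""
    labels = [UNCLASSIFIED] * len(points)
    cid = 0
    for i in range(len(points)):
        if labels[i] is UNCLASSIFIED and expand_cluster(points, labels, i, cid, eps, k):
            cid += 1
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
print(dbscan(points, eps=1.5, k=3))
```

Removing the seed at the start of each loop iteration rather than at the end, as the pseudocode does, gives the same result, because a point is inserted into the seed list only while it is still UNCLASSIFIED and is labeled immediately afterwards.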
2.3 Extensions of Density-Based Clustering
Hierarchical Density-Based Clustering
DBSCAN computes a flat density-based decomposition of a database w.r.t. a global density parameter, specified by ε and k. However, there may be clusters of different density and/or nested clusters in the database (see Figure 2.6 for an illustration). In this case, the globally chosen density threshold determines which clusters will be found, and DBSCAN is not able to detect all the clustering information contained in such data.
Figure 2.6: Clusters with different density (left) and nested clusters (right).
To overcome this limitation, the density-based clustering notion has been extended by hierarchical concepts [ABKS99]. Based on these concepts, the algorithm OPTICS is presented. The key idea is that (for a constant k-value) density-based clusters w.r.t. a higher density (i.e. a lower value for ε) are completely contained in density-based clusters w.r.t. a lower density (i.e. a higher value for ε). Figure 2.7 illustrates this observation: C1 and C2 are density-based clusters w.r.t. eps1 < eps2, and C is a density-based cluster w.r.t. eps2, completely containing C1 and C2.
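The containment property can be checked on a toy example. The sketch below is illustrative only and makes several assumptions: the data set is made up, neighborhoods are computed by brute force, and only core points are grouped (border points are left out, since their assignment in DBSCAN can depend on processing order). The helper names `core_clusters` and `is_nested` are hypothetical.

```python
from math import dist


def core_clusters(points, eps, k):
    """Group the core points w.r.t. (eps, k) into density-connected sets."""
    n = len(points)
    nbrs = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [i for i in range(n) if len(nbrs[i]) >= k]
    core_set = set(core)
    seen, clusters = set(), []
    for s in core:
        if s in seen:
            continue
        seen.add(s)
        stack, comp = [s], set()
        while stack:  # depth-first search over core points only
            u = stack.pop()
            comp.add(u)
            for v in nbrs[u]:
                if v in core_set and v not in seen:
                    seen.add(v)
                    stack.append(v)
        clusters.append(comp)
    return clusters


def is_nested(fine, coarse):
    """True if every cluster in `fine` lies inside some cluster in `coarse`."""
    return all(any(c1 <= c2 for c2 in coarse) for c1 in fine)


points = [(0, 0), (1, 0), (2, 0), (5, 0), (6, 0), (7, 0)]
fine = core_clusters(points, eps=1.0, k=3)    # plays the role of C1 and C2
coarse = core_clusters(points, eps=3.5, k=3)  # plays the role of C
print(fine, coarse, is_nested(fine, coarse))
```

With the smaller radius, two separate groups of core points appear; with the larger radius, both are absorbed into a single cluster, mirroring C1 and C2 inside C.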
The algorithm OPTICS works like an extended DBSCAN algorithm, computing the density-connected clusters w.r.t. all parameters εi that are smaller than a generic value ε. In contrast to DBSCAN, OPTICS does not assign cluster memberships, but stores the order in which the data objects are processed and the information which would be used by an extended DBSCAN algorithm to assign cluster memberships. This information consists of only two values for each object, the core distance and the reachability distance. The core distance of a point q is the smallest threshold ε̂ ≤ ε such that q is a core point w.r.t. ε̂ and k. The reachability distance of a point p w.r.t. another point q is the smallest threshold ε̂ ≤ ε such that p is directly density-reachable from q.
[Figure 2.7: cluster C (w.r.t. eps2) containing the clusters C1 and C2 (w.r.t. eps1).]
Figure 2.8: Reachability plot (right) computed by OPTICS for a sample 2D data set (left).
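The two definitions can be written down directly. The following is a minimal sketch, assuming Euclidean distance, brute-force neighbor search, a neighborhood that counts the point itself (as in the ExpandCluster pseudocode, where |Nε| is compared with k), and `inf` as the marker for an undefined distance; the function names are hypothetical.

```python
from math import dist, inf


def core_distance(points, q, eps, k):
    """Smallest radius <= eps at which points[q] has at least k neighbors
    (counting itself); inf if no such radius exists."""
    d = sorted(dist(points[q], x) for x in points)
    if len(d) < k or d[k - 1] > eps:
        return inf
    return d[k - 1]  # distance to the k-th nearest neighbor


def reachability_distance(points, p, q, eps, k):
    """Smallest radius <= eps at which points[p] is directly density-reachable
    from points[q]; inf if no such radius exists."""
    r = max(core_distance(points, q, eps, k), dist(points[p], points[q]))
    return r if r <= eps else inf


points = [(0, 0), (1, 0), (2, 0), (10, 0)]
print(core_distance(points, 0, eps=5, k=3))
print(reachability_distance(points, 1, 0, eps=5, k=3))
print(reachability_distance(points, 3, 0, eps=5, k=3))
```

The reachability distance is never smaller than the core distance of q, because p can only be directly density-reachable from q once q itself is a core point.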
A great advantage of OPTICS is that the resulting cluster ordering can be visualized very intuitively and clearly by means of a so-called reachability plot. A reachability plot is a two-dimensional visualization of a cluster ordering, where the points are plotted according to the sequence specified in the cluster ordering along the x-axis, and for each point, the reachability distance along the y-axis. Figure 2.8 (right) depicts the reachability plot based on the cluster ordering computed by OPTICS for the sample two-dimensional data set in Figure 2.8 (left). Intuitively, clusters are “valleys” or “dents” in the plot, because sets of consecutive points with a lower reachability value are packed more densely. In particular, to manually obtain a density-based clustering w.r.t. any ε0 ≤ ε by visual analysis, one simply has to cut the reachability plot at y-level ε0 (i.e. parallel to the x-axis). The consecutive valleys in the plot below this cutting line contain the respective clusters. An example is presented in Figure 2.8 (right): For a cut at the level ε1, we find two clusters denoted as A and B. Compared to this clustering, a cut at level ε2 would yield three clusters. The cluster A is split into two smaller clusters denoted by A1 and A2, and cluster B decreases in size. This illustrates how the hierarchical cluster structure of a database is revealed at a glance and can be easily explored by visual inspection.
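The horizontal cut can be sketched in a few lines. This is a simplified illustration, not the extraction procedure of [ABKS99]: it only collects maximal runs of consecutive positions whose reachability lies at or below the cut level, whereas a full extraction would also consult the core distances to decide whether the point that opens a valley belongs to the cluster. The reachability values are made up for the example, and `cut_reachability_plot` is a hypothetical name.

```python
from math import inf


def cut_reachability_plot(reachability, level):
    """Return the valleys below `level` as lists of positions in the ordering."""
    clusters, current = [], []
    for pos, r in enumerate(reachability):
        if r <= level:
            current.append(pos)       # still inside a valley
        else:
            if current:
                clusters.append(current)  # a peak closes the current valley
            current = []
    if current:
        clusters.append(current)
    return clusters


# Hypothetical reachability values in cluster order (first point is undefined).
reach = [inf, 1, 1, 3, 1, 1, 9, 2, 1, 1]

print(cut_reachability_plot(reach, level=4))    # two valleys, like A and B
print(cut_reachability_plot(reach, level=1.5))  # three valleys, like A1, A2 and a smaller B
```

Lowering the cut level splits the first valley at its inner peak and shrinks the second one, which is the behavior described for the cuts at ε1 and ε2.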
Figure 2.9: Browsing through cluster hierarchies.
Visually Mining through Cluster Hierarchies
In [BKKP04] the authors show how visualizing the hierarchical clustering structure of a database of objects can aid users in the time-consuming task of finding similar objects. Based on reachability plots produced by OPTICS, approaches are proposed which automatically extract the significant clusters in a hierarchical cluster representation along with suitable cluster representatives. These techniques can be used as a basis for visual data mining. The resulting interactive browsing tool is called BOSS (Browsing OPTICS-Plots for Similarity Search), which utilizes automatic cluster recognition and extraction of cluster representatives in order to provide the user with significant and quick information (see Figure 2.9 for an illustration). The effectiveness and efficiency of this approach are shown, for example, for CAD objects from a German car manufacturer.