To provide additional information about clusters and individual points, hills in the landscape as well as data glyphs located on them can be labeled interactively.
Hill-based labeling. Labels above the hills provide more information about the clusters. For example, they can present exact cluster sizes, which helps to compare similarly sized hills and avoids having to look at the landscape from above to evaluate base areas. If classification information is available, more complex labels such as pie charts could also summarize the class distribution inside the clusters. Figure 4.5 gives two examples based on the Reuters data set. Depending on the application domain, labels could also display the result of more sophisticated data analysis. For example, in the case of text document data, the label above a hill could present statistics on the documents in that cluster, or a topic determined from their content. Labels can also trigger an action when clicked, e.g. by linking the corresponding points to other views.
From an implementation point of view, determining labels for the hills is just an operator on the merge tree. This operator may require only the merge tree itself, e.g. to determine cluster sizes based on implicitly stored regular nodes, or it may process additional information, e.g. to summarize the properties or content of the clusters. A label’s position is simply that of the center vertex of its hill in the landscape. This position is stored together with each maximum during the landscape construction. Furthermore, labels always face the viewer, i.e. their orientation is updated whenever the landscape is rotated during data inspection.
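As a rough illustration of such an operator, the following sketch computes cluster sizes by accumulating point counts bottom-up over a merge tree. The `Node` class, its `arc_points` field (the number of data points on the arc between a node and its parent, including the node itself) and the function name `subtree_sizes` are hypothetical and chosen only for this example. They are not the data structures used in the actual implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical merge tree node: a maximum (leaf) or a saddle (inner node)."""
    name: str
    arc_points: int  # assumed: points on the arc towards the parent, node included
    children: List["Node"] = field(default_factory=list)


def subtree_sizes(root: Node) -> Dict[str, int]:
    """Return, for every node, the number of points in its subtree."""
    sizes: Dict[str, int] = {}

    def visit(node: Node) -> int:
        # A subtree contains the points on the node's own arc
        # plus all points in the subtrees of its children.
        total = node.arc_points + sum(visit(child) for child in node.children)
        sizes[node.name] = total
        return total

    visit(root)
    return sizes


if __name__ == "__main__":
    # Two hills (maxima) that merge at one saddle.
    tree = Node("saddle", 3, [Node("max_a", 5), Node("max_b", 2)])
    for name, size in subtree_sizes(tree).items():
        print(name, size)
```

Under these assumptions, the size stored for a maximum could serve as the label text of its hill, while the size stored for a saddle summarizes all hills merging there.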
Figure 4.6: Glyph-based labeling in the landscape of the 9-D Reuters data set: Using a movable focus lens, only those glyphs inside the lens are labeled. Depending on their glyph positions inside the lens, labels are displayed in two scrollable lists. For classified data, labels are colored by class. To facilitate interactive inspection, the focus lens is connected to the position of the mouse cursor.
Glyph-based labeling. Typically, the analyst also needs information about single points to make sense of the data, e.g. to locate interesting data items or to find out why several points do or do not belong to the same cluster. For this purpose, we annotate data glyphs with meta-information such as an entity’s name, id, or class. However, for large data sets, showing labels for all glyphs at the same time quickly occludes other labels and the landscape. Therefore, we implemented excentric labeling [56] to annotate only those glyphs inside a movable focus lens. In the simplest case, the lens has a rectangular or circular shape and follows the position of the mouse cursor on the screen. Two scrollable lists to the left and to the right of the lens contain the labels, sorted by the vertical order of the points inside the lens. For classified data, labels are colored by class to match the glyphs. Figure 4.6 shows an example based on the Reuters data set using the class names.
To determine the glyphs inside the movable focus lens in real time, the labeling is implemented with a two-dimensional k-d tree. Whenever the view on the landscape changes, a 2-D tree is constructed for the new, fixed viewing direction from the screen-space coordinates of all visible glyphs. While the focus lens is moved, label candidates are then identified by a range query on the k-d tree using the range defined by the lens.
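The range query described above can be sketched as follows for a rectangular lens. This is a minimal pure-Python 2-D k-d tree, not the implementation used for the figures. The names `KDNode`, `build_kdtree` and `range_query` are illustrative assumptions, and the points stand for screen-space glyph positions.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class KDNode:
    """Node of a 2-D k-d tree storing the index of one glyph position."""

    def __init__(self, index: int, axis: int,
                 left: Optional["KDNode"], right: Optional["KDNode"]):
        self.index, self.axis, self.left, self.right = index, axis, left, right


def build_kdtree(points: List[Point], indices: Optional[List[int]] = None,
                 depth: int = 0) -> Optional[KDNode]:
    """Build the tree by splitting at the median, alternating x and y."""
    if indices is None:
        indices = list(range(len(points)))
    if not indices:
        return None
    axis = depth % 2
    indices = sorted(indices, key=lambda i: points[i][axis])
    mid = len(indices) // 2
    return KDNode(indices[mid], axis,
                  build_kdtree(points, indices[:mid], depth + 1),
                  build_kdtree(points, indices[mid + 1:], depth + 1))


def range_query(node: Optional[KDNode], points: List[Point],
                lo: Point, hi: Point, out: List[int]) -> List[int]:
    """Collect indices of all points inside the rectangle [lo, hi]."""
    if node is None:
        return out
    p, a = points[node.index], node.axis
    if lo[0] <= p[0] <= hi[0] and lo[1] <= p[1] <= hi[1]:
        out.append(node.index)
    # Descend only into subtrees that can overlap the query range.
    if lo[a] <= p[a]:
        range_query(node.left, points, lo, hi, out)
    if p[a] <= hi[a]:
        range_query(node.right, points, lo, hi, out)
    return out


if __name__ == "__main__":
    glyphs = [(10, 10), (50, 50), (55, 45), (90, 20), (52, 80)]
    tree = build_kdtree(glyphs)          # rebuilt after each view change
    inside = range_query(tree, glyphs, (40, 40), (60, 60), [])
    # Sort label candidates by vertical position, as in the label lists.
    print(sorted(inside, key=lambda i: glyphs[i][1]))
```

For a circular lens, one option would be to query the bounding square of the circle and then filter the candidates by their distance to the lens center.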
Figure 4.7: 2-D topological atoll of the 9-D Reuters data set: (a) Using a hypsometric tint, the landscape is colored by height values. This already helps compare height values irrespective of perspective distortion. (b) Because density information is now conveyed by colors, height information is redundant and can be discarded. The result is a 2.5-D visualization. (c) Inspecting the visualization from above gives a 2-D atoll-like visualization in which absolute densities, persistence, and feature sizes (base areas) are visible at the same time. Isolines help the analyst compare point densities inside the clusters.
4.3 2-D Topological Atoll
Compared to the occlusion problems that projections and axis-based techniques have for high-dimensional data, the 3-D topological landscape is already free of structural occlusion in the sense that all clusters in the original domain truly appear as separated hills. However, the landscape suffers from typical problems of 3-D visualizations, such as perspective distortion and view-dependent occlusion of geometry. Depending on the camera position, hills in the background appear smaller and are often occluded. Glyphs residing on a hill’s back side are also invisible. If large hills are positioned at the landscape’s border, these drawbacks have a negative effect on the visual analysis. It is also impossible to compare height values and base areas at the same time because the two properties require inspecting the landscape from different angles.
These problems occur because the landscape’s geometry conveys more than one feature property and requires more than two dimensions for this purpose. Distributing the required information more efficiently over the available information channels can mitigate these issues. One information channel that is still unused is the landscape’s color. Since colors have so far only served to discriminate hills visually, this attribute can instead be used to display a feature property in order to simplify the landscape’s geometry.
By using a hypsometric tint, which indicates elevation by colors as commonly used in geographic maps, we transfer information about densities from a triangle’s height attribute to its color. To preserve the expressive power of the landscape metaphor, we use a transfer function that maps naturally occurring colors to different height levels: going from blue (water) and yellow (beach) through green (grass) into brown (mountains) and finally to white (snowy mountain top). Black isolines added to the landscape at various height levels help compare density values (cf. Figure 4.7a). Because height information is now redundant, the terrain can be flattened by setting the z-coordinates of all vertices to zero (cf. Figure 4.7b).
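A transfer function of this kind can be sketched as a piecewise-linear interpolation between color stops, followed by the flattening step. The stop positions and RGB values in `STOPS`, as well as the function names, are assumptions made for illustration. The text above only fixes the order of the colors, not their exact values or height levels.

```python
from bisect import bisect_right
from typing import List, Tuple

Color = Tuple[float, float, float]

# Assumed color stops: (normalized height, RGB). Only the order is from the text.
STOPS: List[Tuple[float, Color]] = [
    (0.00, (0.10, 0.30, 0.80)),  # blue   (water)
    (0.10, (0.90, 0.85, 0.50)),  # yellow (beach)
    (0.40, (0.20, 0.60, 0.20)),  # green  (grass)
    (0.75, (0.50, 0.35, 0.20)),  # brown  (mountains)
    (1.00, (1.00, 1.00, 1.00)),  # white  (snowy mountain top)
]


def hypsometric_color(h: float) -> Color:
    """Map a normalized height in [0, 1] to an interpolated RGB color."""
    h = min(max(h, 0.0), 1.0)
    positions = [pos for pos, _ in STOPS]
    i = bisect_right(positions, h) - 1
    if i >= len(STOPS) - 1:
        return STOPS[-1][1]
    (h0, c0), (h1, c1) = STOPS[i], STOPS[i + 1]
    t = (h - h0) / (h1 - h0)
    return tuple(a + t * (b - a) for a, b in zip(c0, c1))


def flatten_and_color(vertices: List[Tuple[float, float, float]], z_max: float):
    """Move height into color: return flat vertices (z = 0) and vertex colors."""
    flat, colors = [], []
    for x, y, z in vertices:
        colors.append(hypsometric_color(z / z_max if z_max > 0 else 0.0))
        flat.append((x, y, 0.0))
    return flat, colors


if __name__ == "__main__":
    verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0), (0.0, 1.0, 4.0)]
    print(flatten_and_color(verts, z_max=4.0))
```

Assigning colors per vertex and letting the renderer interpolate them across each triangle is one possible design. A 1-D color texture indexed by height would be an alternative.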
While these explanations illustrate how the 3-D landscape is transformed into a 2-D visualization, a more efficient implementation would directly create a flat triangulation. The former z-information is then understood as a 2-D scalar field on this triangulation. Extracting isolines with marching triangles, a specialization of marching cubes [115] for isosurfaces, and coloring the triangulation with the transfer function above directly leads to the same result.
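The per-triangle step of marching triangles can be sketched as follows: for a given isovalue, the two edges whose endpoints lie on different sides of the level are located, and the crossing points are found by linear interpolation. The function name `isoline_segment` and its signature are illustrative assumptions. Running it over all triangles and all chosen height levels would yield the isolines.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def isoline_segment(tri: List[Point], values: List[float],
                    level: float) -> Optional[Segment]:
    """Return the isoline segment inside one triangle, or None if not crossed.

    tri    -- the three 2-D vertex positions of the flat triangle
    values -- the scalar (former height) value at each vertex
    level  -- the isovalue to extract
    """
    crossings: List[Point] = []
    for i in range(3):
        j = (i + 1) % 3
        vi, vj = values[i], values[j]
        # An edge is crossed when its endpoints lie on different sides of the level.
        if (vi < level) != (vj < level):
            t = (level - vi) / (vj - vi)  # vi != vj is guaranteed here
            (xi, yi), (xj, yj) = tri[i], tri[j]
            crossings.append((xi + t * (xj - xi), yi + t * (yj - yi)))
    # With this classification, either zero or exactly two edges are crossed.
    if len(crossings) == 2:
        return (crossings[0], crossings[1])
    return None


if __name__ == "__main__":
    triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    heights = [0.0, 1.0, 0.0]
    print(isoline_segment(triangle, heights, 0.5))
```

Classifying vertices with a strict comparison on one side avoids degenerate cases in which the isoline passes exactly through a vertex.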
The landmasses surrounded by water remind the viewer of a bird’s-eye view of an atoll rich in islands. This metaphor is useful to depict clusterings whose density function contains saddles of low density. These saddles are represented by the blue area around the islands and are therefore easy to perceive. Figure 4.7c gives an example based on the Reuters data set. Compared to the 3-D landscape, all clusters and data glyphs are visible in the same view, and cluster sizes and their persistence as well as absolute densities of clusters and points can be identified and compared at the same time.
4.4 2-D Topological Landscape Profile
The original topological landscape requires three dimensions to visualize a contour tree because ambiguities need to be resolved when representing local minima by sinks and merge saddles by valleys (cf. Figure 4.1b,d). However, these ambiguities do not arise for the less complex merge tree, which only captures the appearance and merging behavior of superlevel sets. Consequently, its simpler structure allows the merge tree to be visualized as a 1-D function having the same topology. This 1-D function can be imagined as a cut through a 3-D density height field; hence the name 2-D topological landscape profile. In this section, we present a novel 2-D landscape metaphor specifically designed for a merge tree. While visualizing a merge tree as a 2-D landscape profile is also valuable for other research fields, the metaphor is developed here particularly for visual cluster analysis.
Figure 4.8: Transforming a merge tree’s 3-D topological landscape into a 2-D landscape profile with the same topology: (a) Schematic illustration of the 3-D landscape seen from above. Spiral-shaped arrows, alternating colors and numbers indicate the spiral layout of the hills and the hierarchy of the child branches. (b) By unrolling the spiral layout, all child patches are placed next to each other. A lateral cut through the 3-D landscape provides a 2-D view on the hills.
Conceptually, the 2-D landscape profile could be obtained from a variation of the 3-D topological landscape for merge trees. Although the profile will not be constructed this way later on, it is still interesting to see the relation between both visualizations. We make the following observations (cf. Figure 4.8): The 3-D landscape of a merge tree only consists of hills and valleys belonging to the maxima and saddles, respectively. Because there are no sinks and hence no ambiguities, the spiral layout of the child branches can be unrolled by placing them next to each other. The topology of this 3-D landscape still reflects that of the input merge tree. However, the analyst still has to inspect the landscape from different angles to compare heights and base areas. This is avoided by depicting a feature’s volume/size by its hill’s width instead of its base area, i.e. by using only one dimension instead of two. Furthermore, the volume/size that was previously assigned to all triangles around the hills is now distributed only to the triangles between the hills. When looking at this landscape from the side, all hills are visible at the same time and, after the metric-based distortion, their widths and the gaps between them accurately reflect the volumes/sizes of all individual merge tree arcs. Because depth information is now redundant, it can be discarded by considering only a lateral cut, i.e. a profile, of the landscape.
Figure 4.9: 2-D topological landscape profile of an artificial 2-D data set: (a) 2-D point cloud with clusters of varying size (number of points) and compactness. The merge tree encodes the clustering structure. (b) 2-D profile augmented with the merge tree. In the profile, separated clusters are represented by separated hills. A hill’s height, width and area reflect that cluster’s persistence, size, and stability, respectively. The subtrees of saddle nodes are sorted by persistence to place similarly persistent/high hills next to each other. Histograms or stacked bar charts (colored by class) on the hills indicate the point density distribution inside the clusters.