OBJETIVOS DE ESTE CAPITULO CAPITULO
CALIFICACIÓN MEDICIÓN EVALUACIÓN
Two-dimensional partitioning is a more flexible alternative to one-dimensional partitioning, in which there is no specific part assigned to a given row or column. Thus, we must specify the part for particular sets of nonzeros. A recent paper by Catalyurek et al. [16] gives an overview of several of these methods described in this subsection. In this paper, the authors also develop a recipe to choose between several two-dimensional partitioning methods for different sparse ma-
trix types. This recipe produces partitions that are typically slightly worse (in terms of communication volume) than the fine-grain hypergraph method (dis- cussed below) but often require less runtime to calculate.
Two-dimensional Cartesian partitioning is a simple method of partitioning in which a part is assigned to the nonzeros that lie in both a particular set of rows and a particular set of columns. Such a partitioning is obtained by partitioning the matrix into r parts in one dimension, say row-wise, and then partitioning each of the row parts into q column parts for a total of r × q parts. Figure 2.18 shows a block version of this method where the parts consist of nonzeros in a set of contiguous rows and columns. Although Cartesian block partitioning is a good method for dense matrices, it suffers from potential poor load-balancing for most sparse matrices.
Figure 2.18: Two-dimensional Cartesian partitioning.
Two-dimensional coarse-grain hypergraph
Catalyurek and Aykanat showed how the two-dimensional block Cartesian par- titioning method can be improved by using one-dimensional hypergraph parti- tioning in both directions to obtain a more scattered Cartesian partitioning [14]. As with the two-dimensional block Cartesian method, their coarse-grain hyper- graph method partitions the sparse matrix into r parts in one dimension, say row-wise, and then partitions each of the row parts into q column parts for a total of r × q parts. This results in an irregular checkerboard partitioning that limits the number of messages in the sparse matrix-vector product that any process must send to r − 1 in the expand phase of communication and q − 1 in the fold phase (as described in Section 2.2). Unlike the two-dimensional block Cartesian partitioning method, this coarse-grain partitioning method allows a part to be assigned noncontiguous blocks of nonzeros, which allows for an im- provement in the load-balancing. This method also attempts to exploit the sparsity of the matrix to reduce the communication volume.
The coarse-grain hypergraph partitioning method consists of two stages of one-dimensional hypergraph partitioning [14]. There are two basic variants of this method, one that partitions row-wise then column-wise and another that partitions column-wise then row-wise. Without loss of generality, we describe only the former variant. In the first stage, the rows are partitioned into r parts using one-dimensional hypergraph partitioning (as described in Subsec- tion 2.5.1). This stage balances the work across process rows and attempts to minimize communication in the expand communication phase (for this 2D process mesh topology). Figure 2.19(a) shows an example of stage one for a simple 6 × 6 matrix. The nonzeros in rows 1-3 will be assigned to either part 1 or 2. The nonzeros in rows 4-6 will be assigned to either part 3 or 4. In the second stage, the columns are partitioned using a special one-dimensional multi-constraint hypergraph partitioning algorithm. This algorithm attempts to minimize the communication in the fold communication phase (for this 2D process mesh topology). However, it not only balances the work across process columns, it satisfies multiple constraints to ensure that given the row partition- ing in the first stage, this column partitioning yields a partition such that each part has approximately the same number of nonzeros. Figure 2.19(b) shows the second stage for the same 6 × 6 matrix. The nonzeros in columns 1, 2, 6 are assigned to either part 1 or 3. The nonzeros in columns 3-5 are assigned to either part 2 or 4. These two stages result in each part containing four nonze- ros and produces the partition shown in Figure 2.19(b) (with different colors representing different parts).
(a) Stage 1. (b) Stage 2.
Figure 2.19: Two-dimensional coarse-grain hypergraph partitioning.
Two-dimensional Mondriaan method
A slightly more general and flexible two-dimensional partitioning method is the Mondriaan method [69]. Mondriaan uses recursive bisection such that at each level of the algorithm the partitions from the previous level can be partitioned by
either rows or columns. As shown in Figure 2.20, this yields a rectangular tiled partitioning, where each part tile can have varied dimensions. In Figure 2.20, we see that the first level partitioning was made row-wise (division shown by the horizontal cyan line). The second level partitioning was made column-wise for the top portion but row-wise for the lower partition (orange lines). As with the Cartesian method, the parts need not consist of consecutive rows/columns but are shown this way for easier illustration.
Figure 2.20: Two-dimensional Mondriaan partitioning.
Two-dimensional fine-grain hypergraph
The most flexible partitioning method is the fine-grain hypergraph partitioning method in which each nonzero can be partitioned separately from the others [13]. In the fine-grain hypergraph model, each nonzero is assigned a partition separately and thus is represented by a vertex in the hypergraph. Each row is represented by a hyperedge in the hypergraph (magenta horizontal hyperedges in Figure 2.21). Likewise, each column is represented by a hyperedge in the hypergraph (orange vertical hyperedges in Figure 2.21). Thus, for an n × n matrix, the fine-grain hypergraph model has 2n hyperedges.
As with the one-dimensional hypergraph model, we partition the vertices into k equal sets (k = 2 in Figure 2.21) such that the hypergraph cut metric described in Subsection 2.5.1 is minimized. Again, the communication volume is equivalent to this hyperedge cut metric. Catalyurek and Aykanat proved that this fine-grain hypergraph model yields a minimum volume partitioning when solved optimally [13]. In Figure 2.21, we see the fine-graph hypergraph parti- tioning of the 8 × 8 arrowhead matrix. The resulting communication volume is seen to be three, which is a significant improvement over the communica- tion volume of six from the optimal one-dimensional partitioning. As with the one-dimensional hypergraph model, solving the fine-grain hypergraph model op- timally is NP-hard, but there are heuristics that can solve it close to optimally in polynomial time. Unfortunately, the resulting fine-grain hypergraph problem
h7 h8 h16 h1 h2 h3 h4 h5 h6 h9 h10 h11 h12 h13 h14 h15
Figure 2.21: Fine-grain hypegraph partitioning of arrowhead matrix with k = 2 partitions. Cut hyperedges are shaded. The hyperedge cut, and thus the communication volume, is three.
is a larger NP-hard problem and thus may be too expensive to solve quickly for large matrices.
Two-dimensional bipartite graph
Another generalization of a symmetric model to the nonsymmetric case is to use a bipartite graph. This generalization is equivalent to a model recently proposed by Trifunovic and Knottenbelt [68].
We start with the bipartite graph G = (R, C, E) of the matrix, where R and C correspond to rows and columns, respectively. In the one-dimensional distribution, we partition either the rows (R) or the columns (C). For the fine- grain distribution, we partition both R, C, and E into k sets. Note that we explicitly partition the edges E, which distinguishes our approach from previous work. To balance computation and memory, our primary objective is to balance the edges (matrix nonzeros). Vertex balance is a secondary objective.
We wish to analyze the communication requirements, so suppose that the vertices and edges have already been partitioned. Cut edges, that is, edges with endpoints in different parts, may incur communication but not necessar- ily. The edge cut approximates but does not represent communication volume exactly [12, 35]. Instead we assign communication cost to vertices. Clearly, a vertex where all incident edges belong to the same part incurs no communi- cation because all operations are local. Conversely, a vertex with at least one incident edge in a different part must incur communication because there is a vector element xi or yj residing on a different processor than ai,j. The com-
munication volume will depend on the number of different parts to which these edges belong. We have:
E(v) denote the set of edges incident to vertex v. Let π(v) and π(e) denote the parts to which vertex v and edge e belong, respectively. For a set of edges E = {e1, e2, . . . , en}, let π(E) = π(e1) ∪ π(e2) ∪ · · · ∪ π(en). Then the communication
volume for matrix-vector multiplication is P
v∈R∪C(|π(v) ∪ π(E(v))| − 1).
In the bisection case, the volume is simply equal to the number of vertices that have at least one incident edge in a different part (boundary vertices). A crucial point is that by assigning edges to processes independently of the vertices, we can reduce the number of boundary vertices compared to the traditional one-dimensional distribution, where only vertices are partitioned and the edge assignments are induced from the vertices.
Our model is a variation of the bipartite graph model proposed in [68]. One difference is that while [68] partitions only edges, we partition both edges and vertices. Second, [68] uses the term edge coloring, which we find confusing since “coloring” has a specific (and different) meaning in graph theory and algorithms.
Equivalence of bipartite model to fine-grain hypergraph model Since our bipartite graph model is exact, it must be equivalent to the fine-grain hypergraph model. In fact, there is a simple and elegant relation between the two models, independently discovered by Trifunovic and Knottenbelt [68]. Let the dual of a hypergraph H = (V, E) be another hypergraph H∗ = (V∗, E∗), where |V∗| = |E|, |E∗| = |V |, and hyperedge i in E∗ contains vertex j iff
hyperedge j in E contains vertex i. Now let H be the hypergraph for the fine-grain model.
Theorem 2.5.2 (Trifunovic (2006)) The dual hypergraph H∗ of the fine- grain hypergraph is the bipartite graph.
In other words, vertices in the fine-grain hypergraph H are edges in the bipartite graph G and vice versa. Hence partitioning the vertices of H corresponds to partitioning the edges in G. In our bipartite graph model, we also explicitly partition the vertices in G, while the hyperedges in H are partitioned only implicitly in the hypergraph algorithm.
Given a matrix with z nonzeros, the bipartite graph has z edges while the fine-grain hypergraph has 2z vertices. Thus, algorithms based on the bipar- tite graph model may potentially use less memory, though using the standard adjacency list data structure each edge is stored twice, so there is no savings.