Here we assume that an h-refinement approach has been used and that a hierarchy is maintained. In order to illustrate the simple idea behind diffusion algorithms it is convenient to introduce a weighted graph which, following Vidwans et al. ([104]), we call a Weighted Partition Communication Graph (WPCG). This represents the face adjacency of the $\|P\|$ processors being used (processors that share at least one edge of a root element with a given processor are said to be face adjacent to that processor). A WPCG is obtained by having one vertex for every processor and an edge between two vertices if and only if they are face adjacent to each other. The weight $w^N_i$ of the $i$th vertex is equal to the number of elements of the mesh which reside on the $i$th processor, and the weight $w^E_{ij}$ of the edge connecting the $i$th and $j$th processors is equal to the number of leaf-level edges which lie on the interpartition boundary between the two processors. Diffusion methods correspond closely to simple iterative methods for the solution of diffusion problems; indeed, the surplus load can be interpreted as diffusing through the WPCG towards a steady balanced state.
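As a rough illustration of the graph just described, the following Python sketch builds the vertex and edge weights of a WPCG from a mesh partition. It is a simplification under stated assumptions: the vertex weight is taken to be the number of elements held by each processor, adjacency is inferred from the shared leaf-level edges rather than from root elements, and the input format (`owner`, `leaf_edges`) and the function name `build_wpcg` are hypothetical.

```python
from collections import defaultdict


def build_wpcg(owner, leaf_edges):
    """Return (vertex_weight, edge_weight) for a WPCG.

    owner      -- dict mapping element id -> processor id (hypothetical format)
    leaf_edges -- iterable of (elem_a, elem_b) pairs, one pair per leaf-level
                  edge shared by two elements (hypothetical format)
    """
    vertex_weight = defaultdict(int)  # w^N_i: elements residing on processor i
    edge_weight = defaultdict(int)    # w^E_ij: shared edges on the i/j boundary

    for proc in owner.values():
        vertex_weight[proc] += 1

    for a, b in leaf_edges:
        i, j = owner[a], owner[b]
        if i != j:  # the edge lies on an interpartition boundary
            edge_weight[(min(i, j), max(i, j))] += 1

    return dict(vertex_weight), dict(edge_weight)
```

Edges of the WPCG are stored once per unordered processor pair, so `(i, j)` always has `i < j`.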
2.4.1 Basic Diffusion Method
This iterative approach, which is described in [12] for example, is a very simple and intuitive parallel method for dynamic load balancing. Here, for each vertex in the WPCG, we transfer an amount of work to each of its neighbours which is proportional to the load difference between them. In general this approach will not provide a balanced solution immediately, so the process has to be iterated a number of times until the load difference between any two processors is smaller than a specified value. In effect this method diffuses the load gradually amongst neighbours. If we denote by $l_i$ the load of the processor $p_i$ then the above basic diffusion method can be described algorithmically by the procedure given in Figure 2.1.
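The procedure of Figure 2.1 can be sketched in Python as follows. This is a sequential simulation only, not a parallel implementation: processors are visited one after another and loads are updated in place. The function name `diffuse`, the adjacency-dictionary input and the stopping rule (stop when a full sweep moves no load) are assumptions made for the illustration.

```python
def diffuse(load, neighbours, max_sweeps=1000):
    """Sequentially simulate the basic diffusion method with integer loads.

    load       -- list of integer loads, one per processor
    neighbours -- dict mapping processor index -> list of neighbour indices
    Returns (final_loads, number_of_sweeps_that_moved_load).
    """
    load = list(load)  # work on a copy
    for sweep in range(max_sweeps):
        moved = 0
        for i in range(len(load)):
            for j in neighbours[i]:
                if load[i] > load[j]:
                    # transfer floor((l_i - l_j) / 2) from p_i to p_j
                    t = (load[i] - load[j]) // 2
                    load[i] -= t
                    load[j] += t
                    moved += t
        if moved == 0:  # no transfer possible any more
            return load, sweep
    return load, max_sweeps


# Four processors in a chain, all load initially on processor 0.
chain = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
final, sweeps = diffuse([8, 0, 0, 0], chain)
```

In this small example the simulation stops with loads that differ by at most one between neighbours, yet the two ends of the chain still differ by more than one, which illustrates why purely local exchanges cannot remove a global imbalance.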
The main advantage of this method is that it only needs communications among neighbours (which may also be asynchronous). The main disadvantage is that the convergence can be slow (in the worst case the number of iterations needed to reach a given tolerance is $O(\|P\|^2)$, where $\|P\|$ is the total number of processors ([52])) and the method is neither able to detect a global imbalance nor able to remedy it (see [52] for an example). It may also be noted that a processor $p_i$ essentially acts simultaneously on all its interprocessor communication channels. Even though a machine may have parallel hardware for communication, the communication will often have to be serialised with respect to an individual processor.
In order to avoid these shortcomings we consider another diffusion method, to be called the multi-level diffusion method ([52]).
2.4.2 A Multi-Level Diffusion Method
This is basically a divide-and-conquer type of approach. Let P be the WPCG (see
begin
    while (not converged) do
        for all processors $p_i$ do
            for all $N_i$ neighbours $p_j$ of $p_i$ do
                if $l_i > l_j$ then
                    transfer $\lfloor (l_i - l_j)/2 \rfloor$ load from $p_i$ to $p_j$
                end if
            end for
        end for
    end while
end.
Figure 2.1: Diffusion method.
in the set $P$ at that stage. The change in computational load on processor $p_i$ is denoted by $\Delta l_i$. The sum of the load increments $\Delta l_i$ of all subproblems $p_i$ in the subset $P_j$ of $P$ is denoted by $L_j$. The procedure balance shown in Figure 2.2 achieves the desired load balance. It is important to note that the bisection step in Figure 2.2 means the following:
- $P_1 \cap P_2 = \emptyset$,
- $P_1 \cup P_2 = P$,
- $\bigl|\, \|P_1\| - \|P_2\| \,\bigr| \le 1$.
It is also important to note that no assumptions on the processor topology are made by the algorithm. Hence the user has the freedom to orient the bisection of the processor sets towards his/her processor topology if this is appropriate. It can easily be seen that the average case time complexity of this algorithm is $O(\log \|P\|)$.
The principal drawback of this algorithm is that it is not always possible to bisect a connected graph into two connected subgraphs. Also the condition $\bigl|\, \|P_1\| - \|P_2\| \,\bigr| \le 1$ is too restrictive, in the sense that relaxing this condition may improve the quality of the load balancer.
As a matter of fact the dynamic load balancing algorithm presented in forthcoming chapters relaxes this condition in addition to choosing the sorted version of the Fiedler vector for the purpose of bisections.
begin balance($P$)
    if $\|P\| = 1$ then return
    bisect $P$ into $P_1$ and $P_2$
    calculate $L_1$ and $L_2$
    transfer $\lfloor (L_2 \|P_1\| - L_1 \|P_2\|)/(\|P_1\| + \|P_2\|) \rfloor$ load from $P_2$ to $P_1$
    balance($P_1$)
    balance($P_2$)
end balance.
Figure 2.2: Multi-level diffusion method.
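The recursive procedure of Figure 2.2 can be sketched in Python as below. The transfer amount $t$ is the one that equalises the average load of the two halves, i.e. it solves $(L_1 + t)/\|P_1\| = (L_2 - t)/\|P_2\|$, rounded down. Several simplifications are assumed for the illustration: the bisection simply splits a list of processor ids in the middle and ignores graph connectivity, the transfer between the two sets is modelled by adjusting one representative processor in each half, and the name `balance` and its dictionary argument are hypothetical.

```python
def balance(procs, load):
    """Recursively balance `load` over the processors listed in `procs`.

    procs -- list of processor ids
    load  -- dict mapping processor id -> integer load, updated in place
    """
    if len(procs) == 1:
        return

    # Bisection by list position: the two halves differ in size by at most 1.
    mid = len(procs) // 2
    p1, p2 = procs[:mid], procs[mid:]

    l1 = sum(load[p] for p in p1)
    l2 = sum(load[p] for p in p2)

    # floor((L2*|P1| - L1*|P2|) / (|P1| + |P2|)); a negative value means
    # that load effectively flows from P1 to P2.
    t = (l2 * len(p1) - l1 * len(p2)) // (len(p1) + len(p2))

    # Model the set-to-set transfer on one representative of each half;
    # the recursive calls redistribute it inside the halves.
    load[p2[0]] -= t
    load[p1[0]] += t

    balance(p1, load)
    balance(p2, load)


loads = {0: 8, 1: 0, 2: 0, 3: 0}
balance([0, 1, 2, 3], loads)
```

Because each level only moves load between the two halves of the current set, the recursion depth for this positional bisection grows logarithmically with the number of processors.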
2.4.3 Dimension Exchange Method
In [23] Cybenko shows that the basic diffusion algorithm is very slow to converge and therefore proposes an alternative version of the algorithm known as the dimension exchange method. This method is designed specifically with a hypercube architecture in mind.
Let us first define the edge-colouring of a graph $G = (V, E)$. By this we mean that the edges of $G$ are coloured with some minimum number of colours (say $k$) such that no two adjoining edges are of the same colour. A dimension is then defined to be the collection of all edges of the same colour. Let us assume that we have an edge-colouring of the WPCG. Then the dimension exchange method can be described in terms of the procedure shown in Figure 2.3.
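For a hypercube the edge-colouring is immediate: colouring every edge by the bit position in which its two endpoints differ gives one colour per hypercube dimension, and each processor has exactly one incident edge of each colour. The following Python sketch performs one sweep of dimension exchange under that assumption. It is a sequential simulation with integer loads; the function name `dimension_exchange` and the choice to give the extra unit to the lower-numbered processor when a pair total is odd are illustrative assumptions.

```python
def dimension_exchange(load, dims):
    """One sweep of dimension exchange on a hypercube with 2**dims processors.

    load -- list of integer loads indexed by processor number
    dims -- hypercube dimension; colour c pairs i with i XOR 2**c
    """
    load = list(load)  # work on a copy
    if len(load) != 1 << dims:
        raise ValueError("expected 2**dims processors")

    for c in range(dims):              # one colour per dimension
        for i in range(len(load)):
            j = i ^ (1 << c)           # neighbour across the edge coloured c
            if i < j:                  # treat every edge exactly once
                total = load[i] + load[j]
                load[j] = total // 2           # floor half
                load[i] = total - total // 2   # ceiling half
    return load


balanced = dimension_exchange([8, 0, 0, 0, 0, 0, 0, 0], dims=3)
```

Within one colour the pairs are disjoint, so on real hardware all exchanges of the same colour could proceed at the same time, and each processor uses only one communication channel per step.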
Xu and Lau (see [117, 118]) have generalised the dimension exchange method by introducing an exchange parameter and called the new method the generalised dimension exchange method. In their paper they have also analysed its properties and potential efficiency.
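The text does not state the update rule of the generalised method. The sketch below assumes the commonly quoted form, in which two processors joined by an edge of the current colour replace their loads $l_i$ and $l_j$ by $(1-\lambda) l_i + \lambda l_j$ and $(1-\lambda) l_j + \lambda l_i$ respectively, so that $\lambda = 1/2$ gives plain pairwise averaging. The names `gde_sweep` and `lam`, and the list-of-lists representation of the edge-colouring, are hypothetical.

```python
def gde_sweep(load, coloured_edges, lam):
    """One sweep of a generalised dimension exchange (assumed update rule).

    load           -- list of (real-valued) loads, one per processor
    coloured_edges -- list of edge lists; all edges in one inner list share
                      a colour, so no processor appears twice within it
    lam            -- exchange parameter, assumed to lie in (0, 1)
    """
    load = list(load)  # work on a copy
    for edges in coloured_edges:       # process one colour at a time
        for i, j in edges:
            li, lj = load[i], load[j]
            load[i] = (1 - lam) * li + lam * lj
            load[j] = (1 - lam) * lj + lam * li
    return load


# Chain 0-1-2-3 coloured with two colours.
colouring = [[(0, 1), (2, 3)], [(1, 2)]]
after = gde_sweep([8.0, 0.0, 0.0, 0.0], colouring, lam=0.5)
```

Each pairwise update leaves the sum of the two loads unchanged, so the total load is conserved for any value of the parameter.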
Unfortunately none of the above-mentioned algorithms takes into account one important factor, namely that the data movement resulting from the load balancing schedule should be kept to a minimum. Also no information is given about
Procedure for processor $i$ ($0 \le i < \|P\|$)
begin
    while (not Terminate)
        for ($c = 1$; $c \le k$; $c$++)
            if there is an incident edge coloured $c$
                load balance the two connected processors
            end if
        end for
    end while
end procedure.
Figure 2.3: Dimension exchange method.
culates the total weight to be transferred.