
The pre-computation of paths is a time-consuming operation in large graphs such as the Internet AS-level graph. Even though the slow-changing path records change infrequently (as a result of permanent changes in connectivity), it is still important to complete the pre-computations in a timely manner (i.e., within a few hours at most). To that end, we use a parallel breadth-first search method, which is effective in reducing the computation time. However, parallel breadth-first search requires a significant amount of memory when computing paths, because the frontier (i.e., incomplete) paths are kept in main memory. We use two strategies to significantly reduce the memory usage: i) divide the problem of computing paths from a single source domain to all destination access domains into subproblems of computing paths from an egress channel Se of the source domain to an ingress channel Di of a destination access domain, and ii) pre-compute distances from all the channels to the destination ingress channel in order to identify and filter out, as they are generated, frontier paths that would exceed the permissible length of the path from Se to Di (i.e., L + 2, where L is the distance from Se to Di).

To further reduce the memory usage, the path generation method first generates paths on an “abstract” AS-level topology that contains a single “virtual channel” between each pair of neighboring domains that are connected by one or more physical channels in the actual topology. In the second step, the set of paths generated using the abstract topology is duplicated using the virtual-to-physical channel mappings. Generating paths using the two-step approach does not miss any paths that a breadth-first search method would compute on the actual topology.
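The duplication step can be illustrated with a minimal sketch. It assumes an abstract path is a sequence of domains, a virtual channel is an unordered pair of neighboring domains, and the mapping `physical_channels` lists the physical channel identifiers behind each virtual channel. The function name, the mapping layout, and the channel identifiers are hypothetical and are not taken from the implementation described here.

```python
from itertools import product


def expand_path(abstract_path, physical_channels):
    """Yield every physical path that corresponds to one abstract path.

    abstract_path: sequence of domains, e.g. ["S", "X", "D"] (assumed layout).
    physical_channels: maps frozenset({a, b}) to the list of physical
    channels between neighboring domains a and b (assumed layout).
    """
    hops = zip(abstract_path, abstract_path[1:])
    options = [physical_channels[frozenset(hop)] for hop in hops]
    # One physical path per combination of per-hop channel choices.
    for choice in product(*options):
        yield list(choice)


# Hypothetical example: two physical channels between S and X, one between X and D.
channels = {
    frozenset({"S", "X"}): ["sx1", "sx2"],
    frozenset({"X", "D"}): ["xd1"],
}
expanded = list(expand_path(["S", "X", "D"], channels))
```

Under these assumptions, one abstract path expands into as many physical paths as the product of the channel counts along its hops, which is why enumerating on the abstract topology first keeps the frontier small.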

The procedure to compute paths from a single source domain S to all destination domains is given in Listing 6.3. In this approach, we first generate an initial set of paths (Lines 4–11) using the abstract topology, named Tabstract. The path generation procedure is divided into smaller sub-problems of computing paths from an egress channel Se of the source domain to an ingress channel Di of the destination domain. The procedure computes all the paths from Se to Di with length up to L + 2, where L is the length of the shortest path (i.e., distance) from Se to Di. Then, the paths are duplicated with the multiple channels of the actual topology (Line 12).

The parallel breadth-first search method (Line 8) computes paths from an egress channel Se of the source domain to an ingress channel Di of the destination domain D. The method uses the pre-computed distances from the channel Di to all other channels in the topology, which are computed using the Dijkstra algorithm (Line 7). In particular, the pre-computed distances are used to eliminate frontier paths as they are generated: a frontier path is discarded when its current length plus the pre-computed distance from its last channel to Di exceeds the length constraint (i.e., the distance between Se and Di plus two).

 1  def Path_Generation(Topology_Graph T, Domain S)
 2    Tabstract = Extract_Connectivity_Graph(T)
 3    Paths = {}
 4    for each egress channel Se of S
 5      for each Stub Domain D in T
 6        for each ingress channel Di of D
 7          DistDi = Dijkstra(Di, Tabstract)
 8          Paths += BFS_Parallel(Se, Di, Tabstract, DistDi)
 9        end
10      end
11    end
12    Final_Paths = Duplicate_Paths_Parallel(Paths, Tabstract, T)
13  end

Listing 6.3: Procedure to compute paths from domain S to all access domains.
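The pruning rule applied in Line 8 of Listing 6.3 can be sketched sequentially as follows. This is a simplified illustration, not the parallel implementation: graph nodes stand in for channels, every hop has unit cost (so a plain breadth-first search gives the same distances Dijkstra would), and the names `hop_distances`, `bounded_paths`, and `slack` are hypothetical.

```python
from collections import deque


def hop_distances(graph, target):
    """Hop count from every reachable node to `target` (unit weights assumed)."""
    dist = {target: 0}
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def bounded_paths(graph, source, target, slack=2):
    """Enumerate loop-free paths from source to target of length <= L + slack,
    where L is the shortest distance from source to target."""
    dist = hop_distances(graph, target)
    if source not in dist:
        return []
    limit = dist[source] + slack
    complete = []
    frontier = deque([(source,)])
    while frontier:
        path = frontier.popleft()
        last = path[-1]
        if last == target:
            complete.append(path)
            continue
        for nbr in graph[last]:
            if nbr in path:  # keep paths loop-free
                continue
            # len(path) is the hop count after adding nbr; dist[nbr] is the
            # fewest hops still needed. Prune if the bound cannot be met.
            if nbr in dist and len(path) + dist[nbr] <= limit:
                frontier.append(path + (nbr,))
    return complete


# Small undirected example graph (hypothetical).
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
paths = bounded_paths(graph, "A", "D")
```

Because a frontier path is dropped as soon as it can no longer reach the destination within the bound, it is never stored, which is the source of the memory saving described above.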

Using the topology described in Section 6.1.2, we computed the paths from a randomly selected access domain to all other access domains using the above procedure. The computation resulted in 1.65 billion paths. The path computation procedure required around five hours to complete on a single server with four Intel Xeon X5650 processors (six cores per processor) and 128 MB RAM, using 24 threads working in parallel.

The main bottleneck in the computation is input and output, since the computation is data intensive. Because the MapReduce programming model is a good fit for data-intensive computations, we developed and used a parallel breadth-first search implementation in Hadoop [80], an open source MapReduce implementation, to generate paths. The MapReduce implementation generates the same set of paths from a single source to all destinations with the length constraint in just an hour using 8 quad-core computers, each with 500 GB storage.
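How one breadth-first round maps onto the MapReduce model can be sketched in plain Python. This is only a single-process simulation under assumed record formats, not the Hadoop implementation referred to above: the function names are hypothetical, graph nodes stand in for channels, and the length constraint is omitted for brevity.

```python
from collections import defaultdict


def map_extend(path, graph):
    """Map step: emit one (last_node, path) record per loop-free one-hop extension."""
    for nbr in graph[path[-1]]:
        if nbr not in path:
            yield nbr, path + (nbr,)


def reduce_collect(node, paths, destinations):
    """Reduce step: paths that end at a destination are complete;
    all others form the frontier of the next round."""
    status = "complete" if node in destinations else "frontier"
    return status, sorted(paths)


def bfs_round(frontier, graph, destinations):
    """Run one map / shuffle / reduce round over the current frontier."""
    grouped = defaultdict(list)  # stands in for the shuffle phase
    for path in frontier:
        for key, value in map_extend(path, graph):
            grouped[key].append(value)
    complete, next_frontier = [], []
    for node, paths in grouped.items():
        status, paths = reduce_collect(node, paths, destinations)
        (complete if status == "complete" else next_frontier).extend(paths)
    return complete, next_frontier


# Hypothetical example: two rounds starting from node "A" with destination "D".
graph = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C"],
}
done1, frontier1 = bfs_round([("A",)], graph, {"D"})
done2, frontier2 = bfs_round(frontier1, graph, {"D"})
```

In a distributed setting, each round would typically read the frontier from storage and write the next one back, which is consistent with input and output dominating the running time.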

The distribution of the number of paths to each destination domain is shown in Figure 6.2. It is important to note that a large percentage of the destinations (32,067 of 37,986 domains or 86% of all domains) have 60,000 or fewer paths. Also, a very small percentage of domains (517 domains or around 1% of all domains) have 300,000 or more paths. The domains with a large number of paths have high connectivity and, as a result, our channel-oriented path generation method computes more paths to such highly connected access domains. These domains include popular destinations such as content distribution networks (e.g., Akamai) and content producers like Google. The popularity of the domains with a large number of paths makes it more challenging to scale the path computations. However, by caching recently computed paths, the path service can respond to queries that request paths to popular destinations directly from its cache.
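A cache of recently computed paths can be sketched as follows. The least-recently-used eviction policy, the class name `PathCache`, and the `compute_paths` callback are assumptions made for illustration; the text above only states that recently computed paths are cached.

```python
from collections import OrderedDict


class PathCache:
    """Keep the most recently used path sets, evicting the least recently used."""

    def __init__(self, capacity, compute_paths):
        self._capacity = capacity
        self._compute = compute_paths  # hypothetical callback: (src, dst) -> paths
        self._entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, source, destination):
        key = (source, destination)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        paths = self._compute(source, destination)
        self._entries[key] = paths
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # drop the least recently used entry
        return paths


# Hypothetical usage with a stand-in path computation.
cache = PathCache(capacity=2, compute_paths=lambda s, d: [(s, d)])
cache.get("S", "Google")
cache.get("S", "Google")  # served from the cache
```

With such a policy, repeated queries for popular destinations are answered without recomputing their large path sets.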

[Plot omitted. Axes: Number of Paths (logarithmic scale, 100 to 1e+07) against Destination Domains (0 to 40,000).]

Figure 6.2: Distribution of paths to destination domains.

In the experiments, we assume that the slow-changing routing information is static. Therefore, the slow-join operation is only performed once and no incremental changes are made thereafter. Instead, we focus on the more challenging problem of scaling path computations with periodic changes in the fast-changing routing information. Next, we describe the path service implementation, which only deals with the fast-changing routing information.
