
Trajectory forest traversal is an embarrassingly parallel problem. Branching any two leaf nodes in the forest involves independent operations, so the two can be carried out in parallel without coordination. To begin with, the launch dates to be considered are distributed evenly amongst all available processing cores.

Each core is responsible for branching each launch date node that is assigned to it. Once the first generation of leaf nodes has been produced (representing departure legs), the leaf nodes that are selected for further branching are again evenly distributed amongst the available processors and the procedure repeats. In this manner, the entire problem is executed in parallel except for a few select operations such as sorting and file I/O. Even these operations can be parallelized, as algorithms exist for parallel sorting and for parallel file read/write.
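The distribute, branch, and redistribute cycle described above can be sketched as follows. This is a minimal illustration rather than the implementation used in this work: `split_evenly`, `branch`, and `expand_generation` are hypothetical names, `branch` is a placeholder that spawns two children per node instead of solving Lambert's problem, and the selection (pruning) step between generations is omitted. The chunks are processed in turn here; in a parallel run each chunk would be handed to its own core.

```python
def split_evenly(items, n_workers):
    """Deal items round-robin into n_workers chunks (sizes differ by at most one)."""
    return [items[i::n_workers] for i in range(n_workers)]


def branch(node):
    """Hypothetical placeholder for branching one leaf node.

    A real implementation would evaluate candidate trajectory legs;
    here every node simply spawns two children.
    """
    return [node + (k,) for k in range(2)]


def expand_generation(leaves, n_workers):
    """Distribute the current leaves, branch them, and gather the children."""
    chunks = split_evenly(leaves, n_workers)
    children = []
    for chunk in chunks:  # each chunk would run on its own core
        for node in chunk:
            children.extend(branch(node))
    return chunks, children


launch_dates = [(day,) for day in range(10)]  # ten hypothetical launch dates
chunks, generation_1 = expand_generation(launch_dates, n_workers=4)
_, generation_2 = expand_generation(generation_1, n_workers=4)
print([len(c) for c in chunks], len(generation_1), len(generation_2))
```

Because the leaves are dealt out again at every generation, the load stays balanced even when some launch dates produce far more surviving branches than others.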

This parallelization of the tree search problem is realizable in both distributed and shared memory configurations. It is worth noting that a distributed implementation of this algorithm would not incur substantial inter-processor communication overhead, and is therefore especially conducive to processor grid scaling. For this work, a shared memory parallelization scheme is considered, primarily due to the compute hardware available, but the analyses and observations are equally valid for a distributed version of the tree search algorithm.

6.7.1 Multithreading Using OpenMP

OpenMP is an API for shared memory multithreading that follows the fork-join operational concept and is available for the C, C++ and Fortran programming languages. OpenMP is attractive as it is relatively straightforward to implement, being controlled almost exclusively by way of compiler directives in the source code. Essentially, a programmer need only designate areas of the code that are to be executed in parallel (e.g. a for loop whose iterations are independent of one another) and the OpenMP runtime carries out the necessary low-level actions required to perform the parallel execution.

In the OpenMP paradigm, the runtime environment designates a master thread that subsequently forks a team of worker threads, amongst which the system divides the computational work to be performed.

Individual threads in this thread pool carry out their allotted instructions in the designated parallel region of code and then idle at the end of the parallel region until all threads in the pool have finished their execution (i.e. they join). Being a shared memory scheme, all threads executing in an OpenMP parallel region exist on the same compute node and therefore draw from the same pool of RAM.
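The fork-join pattern can be illustrated with a small Python analogue. This is not OpenMP: in C or C++ the same loop would typically be marked with a `#pragma omp parallel for` directive and the runtime would manage the threads. The function name `parallel_for` and the strided division of iterations are assumptions made for illustration, and Python threads do not speed up CPU-bound work in the way OpenMP threads do, so the sketch shows structure only.

```python
import threading


def parallel_for(func, n_items, n_threads):
    """Fork n_threads workers over independent iterations, then join them."""
    results = [None] * n_items  # shared memory visible to every thread

    def worker(thread_id):
        # Each thread handles iterations thread_id, thread_id + n_threads, ...
        for i in range(thread_id, n_items, n_threads):
            results[i] = func(i)

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()  # fork: the workers begin executing the parallel region
    for t in threads:
        t.join()   # join: wait until every worker has finished
    return results


squares = parallel_for(lambda i: i * i, n_items=8, n_threads=4)
print(squares)
```

Each worker writes only to its own slots of the shared `results` list, so no locking is needed; this mirrors a loop whose iterations are independent of one another.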

6.7.2 Intel Xeon Cluster vs. Intel i7 Quad-Core

The parallel tree search framework described in this chapter is demonstrated by deployment on a Linux computing cluster furnished with four Intel Xeon E7-4890 processors. Each Xeon chip houses 15 cores, each of which supports two logical threads, for a total of 120 compute threads across the system. The cluster also provides 512 GB of RAM, which is an attractive feature for the RAM-limited breadth-first tree traversal algorithm.

In order to quantify the execution speed increases gained from parallelization of the flyby tree traversal algorithm, an example MGA interplanetary mission to Saturn is considered. A fixed flyby sequence of Earth-Earth-Venus-Venus-Earth-Saturn is assumed. Only ballistic trajectories are considered, and the search is performed from Saturn backwards through the flyby sequence to Earth. We have found this backwards search to be beneficial when considering trajectories to the outer planets, as the tree search is only performed on trajectory legs that successfully connect from the inner solar system to the outer solar system. The backward search also filters out any Lambert solutions with excessively long times-of-flight between the inner and outer planets, thus eliminating early in the search many trajectories that would violate a total flight time constraint imposed by the mission designer.
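The early flight-time filtering can be sketched as a simple budget check. This is an illustrative assumption about how such a filter might look, not the code used in this work: the function names, the dictionary layout of a leg, and all numerical values are hypothetical.

```python
def within_budget(accumulated_tof_days, leg_tof_days, max_total_tof_days):
    """True if adding this leg keeps the partial trajectory under the limit."""
    return accumulated_tof_days + leg_tof_days <= max_total_tof_days


def filter_legs(candidate_legs, accumulated_tof_days, max_total_tof_days):
    """Keep only the legs that cannot already violate the total flight time."""
    return [
        leg
        for leg in candidate_legs
        if within_budget(accumulated_tof_days, leg["tof_days"], max_total_tof_days)
    ]


# Hypothetical final legs into Saturn, evaluated first in a backward search.
final_legs = [{"tof_days": 900.0}, {"tof_days": 2500.0}, {"tof_days": 4200.0}]
kept = filter_legs(final_legs, accumulated_tof_days=0.0, max_total_tof_days=3650.0)
print(len(kept))
```

Since the longest leg is examined first in a backward search, any leg that exceeds the budget on its own is discarded before any inner-planet flybys are branched from it.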

For this problem, a 1000-day Saturn arrival window is scanned in one-day increments. The orbits of Venus and Earth are discretized into 225 and 365 grid points of true anomaly, respectively, achieving a one-day flight time resolution for the Lambert flight time grid. The breadth-first tree search algorithm retained the top 30% of all partial solutions at each level (flyby) in the search. Tree nodes were ranked according to the following partial cost function:

$$C = \frac{1}{\sqrt{2}} \sqrt{\mathrm{TOF}^2 + v_f^2} \tag{6.10}$$

where TOF is the time-of-flight of the current partial solution and $v_f$ is the arrival hyperbolic velocity at Saturn. This is a partial cost function in that it is applied against each partial flyby sequence upon completion of each level in the tree search. More sophisticated partial cost functions and Pareto sorting algorithms, which seek to mitigate the greediness associated with pruning the flyby tree as the search is executed, have been described by both Johnson [108] and Lantukh [9]; however, those strategies are not a focus of this work.
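As a minimal sketch of how Eq. 6.10 and the 30% retention rule might be applied at one level of the search, consider the following. The function names, the dictionary layout of a node, the rounding-down of the retained count, and the sample values are assumptions made for illustration. The source does not state the units or scaling of TOF and $v_f$, so the inputs are assumed here to be already scaled to comparable magnitudes.

```python
import math


def partial_cost(tof, v_f):
    """Partial cost of Eq. 6.10: sqrt(TOF^2 + v_f^2) / sqrt(2)."""
    return math.sqrt(tof**2 + v_f**2) / math.sqrt(2.0)


def prune(nodes, keep_percent=30):
    """Rank nodes by partial cost and keep the best keep_percent of them.

    Assumption: the retained count is rounded down, with at least one
    node always kept.
    """
    ranked = sorted(nodes, key=lambda n: partial_cost(n["tof"], n["v_f"]))
    n_keep = max(1, len(ranked) * keep_percent // 100)
    return ranked[:n_keep]


# Ten hypothetical partial solutions with pre-scaled TOF and v_f values.
nodes = [{"tof": float(i), "v_f": float(10 - i)} for i in range(10)]
survivors = prune(nodes)
print(len(survivors), [n["tof"] for n in survivors])
```

Lower cost is treated as better, so the nodes that survive are those combining short flight time with low arrival velocity at Saturn.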

The tree search was executed on three separate hardware configurations in order to capture the benefits of the parallel search:

1. Serial search on an Intel i7-7820 @ 2.90 GHz

2. Parallel OpenMP search on an Intel i7-7820 @ 2.90 GHz

3. Parallel OpenMP search on the Linux cluster with Intel Xeon E7-4890 @ 2.80 GHz

The Intel i7 runs were executed on a laptop with four cores (for a total of eight logical threads). The solution with the lowest Saturn encounter velocity is shown in Fig. 6.10.

Figure 6.10: Ballistic Lambert MGA trajectory from Earth to Saturn, using an E-EVVE-S flyby sequence.

The execution times associated with each hardware setup are summarized in Table 6.2.

Table 6.2: Lambert tree search execution times for the E-EVVE-S problem.

Architecture            Runtime
Serial i7               3.06 hr
OpenMP i7               0.54 hr
OpenMP Linux cluster    4.97 min

The embarrassingly parallel nature of the breadth-first flyby tree search affords substantial speed-up when OpenMP multithreading is exploited: relative to the serial i7 run, the runtimes in Table 6.2 correspond to speed-ups of roughly 5.7 on the i7 and 37 on the Linux cluster. All three cases computed 1 153 641 feasible Lambert trajectory legs and identified 579 full E-EVVE-S sequences (with the lowest-ranked 70% of partial sequences pruned at each level in the tree search).
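The speed-ups implied by Table 6.2 can be checked with a few lines of arithmetic. The sketch below uses only the runtimes reported in the table; the efficiency figure for the i7 assumes that all eight logical threads were used, which the text suggests but does not state explicitly. Note that the cluster figure compares different processors, so it reflects architecture and clock differences as well as thread count.

```python
# Runtimes copied from Table 6.2, converted to minutes.
runtimes_min = {
    "Serial i7": 3.06 * 60.0,
    "OpenMP i7": 0.54 * 60.0,
    "OpenMP Linux cluster": 4.97,
}


def speedup(serial_time, parallel_time):
    """Ratio of the serial runtime to the parallel runtime."""
    return serial_time / parallel_time


serial = runtimes_min["Serial i7"]
speedups = {name: speedup(serial, t) for name, t in runtimes_min.items()}

# Parallel efficiency on the i7, assuming eight logical threads were used.
i7_efficiency = speedups["OpenMP i7"] / 8

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
print(f"i7 efficiency (assumed 8 threads): {i7_efficiency:.2f}")
```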
