Universidad Complutense - Las TIC en la Enseñanza: Experiencias en la UCM

First, we consider how each of the test problems introduced earlier is represented within the model: the dependencies required by the current iteration of the wavefront, and other details such as how test data is stored. We also define maximum theoretical problem instance sizes, assuming the code is deployed on a GPU with 6GB of memory.

4.1.1 Longest Common Subsequence Problem

The dependency structure for the longest common subsequence (LCS) problem is given in Fig. 4.1, with dark grey cells denoting the wavefront and light grey cells showing the dependencies. White cells in the top left, behind the dependencies, are cells which can be transferred back to the host if required. In the case of this problem, therefore, only 3 vectors need to be stored on the GPU, and the memory complexity of the algorithm on the GPU is O(n), where n is the length of the longest input string. Due to this low memory complexity, we believe that if the scoring grid were not stored on the host, and assuming the scoring grid used 8-byte long integers, strings of length roughly 150,000,000 could be solved before exceeding the 6GB limit of our GPU hardware. This estimate assumes that both test strings are stored on the device, and that the scoring grid must store long integers rather than regular integers because of the possible value of the longest common subsequence. Finally, both input strings will be stored in constant memory on the GPU, as vectors of characters.

Fig. 4.2 Dependencies of the wavefront when solving the Manhattan tourist problem
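As a rough check of the LCS size estimate, the device footprint at the quoted string length can be computed directly. This is a back-of-envelope sketch only: the function name `lcs_device_bytes` is hypothetical, 6GB is taken as 6 × 10⁹ bytes, and characters are assumed to occupy 1 byte each.

```python
# Back-of-envelope device memory estimate for the LCS wavefront.
# Assumptions: three anti-diagonal vectors of 8-byte integers and two
# input strings of 1-byte characters, all of length n.
GPU_BYTES = 6 * 10**9  # 6GB taken as 6e9 bytes (assumption of this sketch)


def lcs_device_bytes(n, cell_bytes=8, char_bytes=1, vectors=3, strings=2):
    """Bytes needed on the device for strings of length n."""
    return vectors * cell_bytes * n + strings * char_bytes * n


n = 150_000_000
used = lcs_device_bytes(n)
print(used, used <= GPU_BYTES)
```

Under these assumptions, the quoted length stays within the 6GB budget.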

The dependency structure of the edit distance problem is identical to that of the LCS problem. The above approach can therefore also be used for the edit distance problem by changing only the dynamic programming case statement.
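The three-vector scheme and the shared dependency structure can be sketched sequentially in Python. This is a minimal illustration rather than the GPU implementation: the names `wavefront_dp`, `lcs_length` and `edit_distance` are hypothetical, the inner loop over a diagonal stands in for the parallel kernel, and the recurrences are the standard textbook ones.

```python
def wavefront_dp(a, b, boundary, cell):
    """Sweep the (len(a)+1) x (len(b)+1) grid by anti-diagonals.

    Only three vectors are kept: the current diagonal and the two
    before it. Each vector is indexed by the row number i.
    """
    m, n = len(a), len(b)
    prev2 = [0] * (m + 1)  # diagonal d-2
    prev1 = [0] * (m + 1)  # diagonal d-1
    for d in range(m + n + 1):
        cur = [0] * (m + 1)
        # Cells on one diagonal are independent of each other, so a GPU
        # could compute them in parallel; here they are computed in turn.
        for i in range(max(0, d - n), min(m, d) + 1):
            j = d - i
            if i == 0 or j == 0:
                cur[i] = boundary(i, j)
            else:
                diag = prev2[i - 1]  # cell (i-1, j-1)
                up = prev1[i - 1]    # cell (i-1, j)
                left = prev1[i]      # cell (i, j-1)
                cur[i] = cell(diag, up, left, a[i - 1] == b[j - 1])
        prev2, prev1 = prev1, cur
    return prev1[m]  # cell (m, n) lies on the last diagonal


def lcs_length(a, b):
    return wavefront_dp(
        a, b,
        boundary=lambda i, j: 0,
        cell=lambda diag, up, left, match: diag + 1 if match else max(up, left),
    )


def edit_distance(a, b):
    return wavefront_dp(
        a, b,
        boundary=lambda i, j: i + j,  # one index is zero on the boundary
        cell=lambda diag, up, left, match: diag if match else 1 + min(diag, up, left),
    )


print(lcs_length("AGGTAB", "GXTXAYB"))
print(edit_distance("kitten", "sitting"))
```

Only the `boundary` and `cell` functions differ between the two problems, which mirrors the change of case statement described above.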

4.1.2 The Manhattan Tourist Problem

The Manhattan tourist problem follows a similar structure to the LCS problem, but requires fewer previous dependencies. Figure 4.2 shows the dependency structure of this problem, again with the wavefront denoted in dark grey and the dependencies in lighter grey. Due to the dependencies of this algorithm, the memory complexity is still O(n), where n is the width of the scoring grid. The test data requirement for this model is higher, however. We store the input test data in two matrices, where the first matrix stores the vertical edge weights and the second matrix stores the horizontal edge weights. Each row of the input matrices stores a complete column's or row's worth of edge weights.

Fig. 4.3 Dependencies of the wavefront when solving the knapsack problem

Based on these increased test data requirements, we now estimate the theoretical maximum problem instance size of the model. Assuming two square input matrices of edge weights storing 4-byte integers, and a scoring grid also storing 4-byte integers, we believe the maximum theoretical city dimension our model can solve is 25,000². When running the Manhattan tourist problem, we elect to store the test data in texture memory, as requests may be spatially close within the 2D space.
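The two-matrix representation and the shorter dependency chain can be illustrated with a sequential Python sketch. The function name `manhattan_tourist` and the matrix layout are assumptions of this example: `down[i][j]` is taken to be the weight of the edge from intersection (i, j) to (i+1, j), and `right[i][j]` the weight of the edge from (i, j) to (i, j+1). The layout used on the GPU may differ.

```python
def manhattan_tourist(down, right):
    """Longest path from the top-left to the bottom-right intersection.

    down  has rows x (cols + 1) entries (vertical edge weights).
    right has (rows + 1) x cols entries (horizontal edge weights).
    Each cell depends only on the previous anti-diagonal, so two
    vectors, indexed by row number, are enough.
    """
    rows = len(down)      # number of vertical steps
    cols = len(right[0])  # number of horizontal steps
    prev = [0] * (rows + 1)  # diagonal 0 holds only the origin, score 0
    for d in range(1, rows + cols + 1):
        cur = [0] * (rows + 1)
        for i in range(max(0, d - cols), min(rows, d) + 1):
            j = d - i
            best = None
            if i > 0:  # arrive from above, cell (i-1, j)
                best = prev[i - 1] + down[i - 1][j]
            if j > 0:  # arrive from the left, cell (i, j-1)
                cand = prev[i] + right[i][j - 1]
                best = cand if best is None else max(best, cand)
            cur[i] = best
        prev = cur
    return prev[rows]


# A small 2 x 2 city (3 x 3 intersections) with made-up weights.
down = [[1, 0, 2],
        [4, 6, 5]]
right = [[3, 2],
         [3, 2],
         [0, 7]]
print(manhattan_tourist(down, right))  # 17
```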

4.1.3 The Knapsack Problem

Next we consider the knapsack problem, which has a more complex dependency structure than the previous problems. A cell may require the entire row it is stored in, from column 0 through to the column of the current cell. A much larger number of previous iterations must therefore be maintained to satisfy the dependencies. Until the wavefront reaches the halfway point of the scoring grid, all previous iterations must be maintained. This is shown in Fig. 4.3, where the dark grey cells are the wavefront and the lighter grey cells are the dependencies.

Fig. 4.4 Dependencies of the wavefront when solving the knapsack problem, once the wavefront is more than halfway through the scoring grid

Only once the wavefront has moved past the centre point can old iterations begin to be transferred off the GPU. Figure 4.4 shows how the dependency structure appears when the wavefront has moved beyond the halfway point of the scoring grid. It can be observed that some cells are maintained that are not needed, for example the entire top row of the scoring grid. This is because some cells of the iterations they belong to are required by the wavefront, so the entire iteration must be maintained.
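To see why knapsack dependencies reach further back than those of the LCS problem, consider the textbook 0/1 recurrence, in which cell (item, cap) reads cells (item-1, cap) and (item-1, cap-w). The sketch below assumes that recurrence and a grid whose anti-diagonals are numbered d = item + cap; both are assumptions of this example and may not match the grid layout used in the model. The names `dependency_diagonals` and `max_lookback` are hypothetical.

```python
def dependency_diagonals(item, cap, weights):
    """Return the diagonal of cell (item, cap) and the diagonals it reads.

    Items are numbered from 1; weights[item - 1] is the item's weight.
    """
    d = item + cap
    deps = {d - 1}           # cell (item-1, cap): the item is skipped
    w = weights[item - 1]
    if w <= cap:
        deps.add(d - 1 - w)  # cell (item-1, cap-w): the item is taken
    return d, sorted(deps)


def max_lookback(weights, capacity):
    """Largest number of diagonals any cell reaches back."""
    return max((1 + w for w in weights if w <= capacity), default=1)


print(dependency_diagonals(2, 10, [3, 7]))  # (12, [4, 11])
print(max_lookback([3, 7, 20], 10))         # 8
```

Under these assumptions the lookback grows with the item weights, whereas an LCS cell never reads further back than two diagonals.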

Due to the higher number of previous iteration dependencies, the memory complexity for data which must be stored on the GPU is O(⌈W/2 · n⌉), where W is the capacity of the knapsack and n is the size of the item set. Test data is stored on the GPU in two vectors: one containing the weight of every item in the input data set, and a second containing the profit of these items. In the case of the bounded knapsack problem, a third vector is stored on the GPU dictating how many times each item can be selected. All of these vectors reside in the constant memory of the GPU. As the memory usage of the algorithm depends on the capacity as well as the size of the item set, a limit on the size of the input problem cannot be defined in terms of item set size alone. However, we seek to provide an example of a problem which will use all of the memory of the test GPU. A problem instance for the 0/1 knapsack problem containing 30,000 items with a knapsack capacity of 400,000 would be a rough limit for the problem size which could be executed on our GPU. This is based on the assumption that all data structures store standard 4-byte integers.
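The vector-based test data layout can be illustrated with a sequential reference solver. This is a plain textbook dynamic program rather than the wavefront scheme; the function name `knapsack` and the optional `counts` argument (standing in for the third vector of the bounded variant) are assumptions of this sketch.

```python
def knapsack(weights, profits, capacity, counts=None):
    """Best total profit within the capacity.

    weights and profits are the two input vectors. counts, if given,
    says how many times each item may be selected (bounded knapsack).
    Without it every item may be selected at most once (0/1 knapsack).
    """
    if counts is None:
        counts = [1] * len(weights)
    best = [0] * (capacity + 1)  # best[c]: best profit with capacity c
    for w, p, k in zip(weights, profits, counts):
        for _ in range(k):  # treat k copies as k separate 0/1 items
            # Iterate downwards so each copy is used at most once.
            for c in range(capacity, w - 1, -1):
                best[c] = max(best[c], best[c - w] + p)
    return best[capacity]


print(knapsack([1, 3, 4, 5], [1, 4, 5, 7], 7))         # 0/1 variant: 9
print(knapsack([2, 3], [3, 4], 7, counts=[2, 1]))      # bounded variant: 10
```

A solver of this kind is mainly useful as a correctness check for small instances, since it makes no attempt to expose parallelism.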
